Python User Guide
Perspective for Python uses the exact same C++ data engine used by the WebAssembly version. The library consists of many of the same abstractions and API as in JavaScript, as well as Python-specific data loading support for NumPy, Pandas (and Apache Arrow, as in JavaScript).
Additionally, perspective-python
provides a session manager suitable for
integration into server systems such as
Tornado websockets,
AIOHTTP, or
Starlette/FastAPI,
which allows fully virtual Perspective tables to be interacted with by
multiple <perspective-viewer>
in a web browser. You can also interact with a
Perspective table from python clients, and to that end client libraries are
implemented for both Tornado and AIOHTTP.
As <perspective-viewer>
will only consume the data necessary to render the
current screen, this runtime mode allows ludicrously-sized datasets with
instant-load after they've been manifest on the server (at the expense of
network latency on UI interaction).
The included PerspectiveWidget
allows running such a viewer in
JupyterLab in either server or
client (via WebAssembly) mode, and the included PerspectiveTornadoHandler
makes it simple to extend a Tornado server with virtual Perspective support.
The perspective
module exports several tools:
Table
, the table constructor for Perspective, which implements thetable
andview
API in the same manner as the JavaScript library.PerspectiveWidget
the JupyterLab widget for interactive visualization.- Perspective webserver handlers that interface seamlessly with
<perspective-viewer>
in JavaScript. PerspectiveManager
the session manager for a shared server deployment ofperspective-python
.
This user's guide provides an overview of the most common ways to use
Perspective in Python: the Table
API, the JupyterLab widget, and the Tornado
handler.
For an understanding of Perspective's core concepts, see the Table, View, and Data Binding documentation. For API documentation, see the Python API.
More Examples are available on GitHub.
Installation
perspective-python
contains full bindings to the Perspective API, a JupyterLab
widget, and a WebSocket handlers for several webserver libraries that allow you
to host Perspective using server-side Python.
In addition to supporting row/columnar formats of data using dict
and list
,
pandas.DataFrame
, dictionaries of NumPy arrays, NumPy structured arrays, and
NumPy record arrays are all supported in perspective-python
.
PyPI
perspective-python
can be installed from PyPI via pip
:
pip install perspective-python
Anaconda
perspective-python
can also be installed for Anaconda
via Conda Forge
conda install -c conda-forge perspective
Jupyterlab
PerspectiveWidget
is a JupyterLab widget that implements the same API as
<perspective-viewer>
, allowing for fast, intuitive
transformations/visualizations of various data formats within JupyterLab.
PerspectiveWidget
is compatible with Jupyterlab 3 and Jupyter Notebook 6 via a
prebuilt extension.
To use it, simply install perspective-python
and the extensions should be
available.
perspective-python
's JupyterLab extension also provides convenient builtin
viewers for csv
, json
, or arrow
files. Simply right-click on a file with
this extension and choose the appropriate Perpective
option from the context
menu.
Table
A Table
can be created from a dataset or a schema, the specifics of which are
discussed in the JavaScript section of the user's
guide. In Python, however, Perspective supports additional data types that are
commonly used when processing data:
pandas.DataFrame
bytes
(encoding an Apache Arrow)objects
(either extracting a repr or via reference)
A Table
is created in a similar fashion to its JavaScript equivalent:
from datetime import date, datetime
import numpy as np
import pandas as pd
import perspective
data = pd.DataFrame({
"int": np.arange(100),
"float": [i * 1.5 for i in range(100)],
"bool": [True for i in range(100)],
"date": [date.today() for i in range(100)],
"datetime": [datetime.now() for i in range(100)],
"string": [str(i) for i in range(100)]
})
table = perspective.Table(data, index="float")
Likewise, a View
can be created via the view()
method:
view = table.view(group_by=["float"], filter=[["bool", "==", True]])
column_data = view.to_dict()
row_data = view.to_records()
Pandas & Numpy Support
Perspective supports dictionaries of one-dimensional numpy.ndarray
, as well as
structured arrays and record arrays. When passing in dictionaries of NumPy
arrays, make sure that your dataset contains only NumPy arrays, and not a
mixture of arrays and Python lists — this will raise an exception. Numpy
structured/record arrays are parsed according to their field name and dtype.
Table
can also be constructed from pandas.DataFrame
and pandas.Series
objects. Because Perspective is designed for applying its own transformations on
top of a flat dataset, dataframes that are passed in will be flattened and have
its index
treated as another column (through the
reset_index()
method).
If the dataframe does not have an index set, an integer-typed column named
"index"
is created. If you want to preserve the indexing behavior of the
dataframe passed into Perspective, simply create the Table
with
index="index"
as a keyword argument. This tells Perspective to once again
treat the index as a primary key:
data.set_index("datetime")
table = perspective.Table(data, index="index")
Schemas & Supported Data Types
Unlike JavaScript, where schemas must be created using string representations of
their types, perspective-python
leverages Python's type system for schema
creation. A schema can be created with the following types:
int
float
bool
datetime.date
datetime.datetime
str
object
Loading Custom Objects
Custom objects can also be loaded into Perspective by using object
in the
schema, or implementing _psp_repr_
to return object
. Perspective stores a
reference to your object as an unsigned 64-bit integer (e.g. a pointer), and
uses __repr__
(or _psp_repr
if implemented) to represent the object.
You can customize how Perspective extracts data from your objects by implementing these two methods into your object:
_psp_repr_
: Since__repr__
can only return strings, this lets you return other values_psp_dtype_
: Perpspective will look at this to determine how to cast your objects' repr.
Time Zone Handling
- "Naive" datetimes are assumed to be local time.
- "Aware"
datetimes use the time zone specified in the
tzinfo
.
All datetime
columns (regardless of input time zone) are output to the user as
datetime.datetime
objects in local time according to the Python runtime.
This behavior is consistent with Perspective's behavior in JavaScript. For more
details, see this in-depth
explanation of
perspective-python
semantics around time zone handling.
Pandas Timestamps
- Naive
pandas.Timestamp
objects are always treated as UTC times, and will be converted to local time when output to the user. - Aware
pandas.Timestamp
objects use the time zone specified intzinfo
. Usetz_localize
ortz_convert
to provide theTimestamp
with a time zone.
Callbacks and Events
perspective.Table
allows for on_update
and on_delete
callbacks to be
set—simply call on_update
or on_delete
with a reference to a function or a
lambda without any parameters:
def update_callback():
print("Updated!")
# set the update callback
view.on_update(update_callback)
def delete_callback():
print("Deleted!")
# set the delete callback
view.on_delete(delete_callback)
# set a lambda as a callback
view.on_delete(lambda: print("Deleted x2!"))
If the callback is a named reference to a function, it can be removed with
remove_update
or remove_delete
:
view.remove_update(update_callback)
view.remove_delete(delete_callback)
Callbacks defined with a lambda function cannot be removed, as lambda functions have no identifier.
PerspectiveManager
PerspectiveManager
offers an interface for hosting multiple
perspective.Table
and perspective.View
instances, extending their interfaces
to operate with the JavaScript library over a websocket connection.
PerspectiveManager
is required to enable perspective-python
to
operate remotely
using a websocket API.
Async Mode
By default, perspective
will run with a synchronous interface. Using the
PerspectiveManager.set_loop_callback()
method, perspective
can be configured
to defer the application of side-effectful calls like update()
to an event
loop, such as tornado.ioloop.IOLoop
. There are a few important differences
when running PerspectiveManager
in this mode:
- Calls to methods like
update()
will return immediately, and the reciprocalon_update()
callbacks will be invoked on an event later scheduled. Calls to other methods which require an up-to-date object, but will still synchronously apply the pending update. - Updates will be conflated when multiple calls to
update()
occur before the scheduled application. In such cases, you may receive a singleon_update()
notification for multipleupdate()
calls.
For example, using Tornado IOLoop
you can create a dedicated thread or pool
for a PerspectiveManager
:
manager = perspective.PerspectiveManager()
def perspective_thread():
loop = tornado.ioloop.IOLoop()
manager.set_loop_callback(loop.add_callback)
loop.start()
thread = threading.Thread(target=perspective_thread)
thread.daemon = True
thread.start()
Multi-threading
When running in Async mode, Perspective will release the GIL while dispatching
to an internal thread pool for some operations, enabling better parallelism and
overall better server performance. However, Perspective's Python interface
itself will still process queries in a single queue. To enable parallel query
processing, call set_loop_callback
with a multi-threaded executor such as
concurrent.futures.ThreadPoolExecutor
:
def perspective_thread():
loop = tornado.ioloop.IOLoop()
with concurrent.futures.ThreadPoolExecutor() as executor:
manager.set_loop_callback(loop.run_in_executor, executor)
loop.start()
Hosting Table
and View
instances
PerspectiveManager
has the ability to "host" perspective.Table
and
perspective.View
instances. Hosted tables/views can have their methods called
from other sources than the Python server, i.e. by a perspective-viewer
running in a JavaScript client over the network, interfacing with
perspective-python
through the websocket API.
The server has full control of all hosted Table
and View
instances, and can
call any public API method on hosted instances. This makes it extremely easy to
stream data to a hosted Table
using .update()
:
manager = PerspectiveManager()
table = Table(data)
manager.host_table("data_source", table)
for i in range(10):
# updates continue to propagate automatically
table.update(new_data)
In situations where clients should only be able to view the table and not modify
it through update
, delete
, etc., initialize the PerspectiveManager
with
lock=True
, or call the lock()
method on a manager instance:
# lock prevents clients from calling methods that may mutate the state
# of the table.
manager = PerspectiveManager(lock=True)
table = Table(data)
manager.host_table("data_source", table)
A PerspectiveManager
instance can host as many Table
s and View
s as
necessary, but each Table
should only be hosted by one PerspectiveManager
.
To host a Table
or a View
, call the corresponding method on an instance of
PerspectiveManager
with a string name and the instance to be hosted:
manager = PerspectiveManager()
table = Table(data)
manager.host_table("data_source", table)
The name
provided is important, as it enables Perspective in JavaScript to
look up a Table
and get a handle to it over the network. This enables several
powerful server/client implementations of Perspective, as explained in the next
section.
Client/Server Replicated Mode
Using Tornado and
PerspectiveTornadoHandler
, as well as
Perspective
's JavaScript library, we can set up "distributed" Perspective
instances that allows multiple browser perspective-viewer
clients to read from
a common perspective-python
server, as in the
Tornado Example Project.
This architecture works by maintaining two Tables
—one on the server, and one
on the client that mirrors the server's Table
automatically using on_update
.
All updates to the table on the server are automatically applied to each client,
which makes this architecture a natural fit for streaming dashboards and other
distributed use-cases. In conjunction with Async Mode,
distributed Perspective offers consistently high performance over large numbers
of clients and large datasets.
server.py
from perspective import Table, PerspectiveManager, PerspectiveTornadoHandler
# Create an instance of PerspectiveManager, and host a Table
MANAGER = PerspectiveManager()
TABLE = Table(data)
# The Table is exposed at `localhost:8888/websocket` with the name `data_source`
MANAGER.host_table("data_source", TABLE)
app = tornado.web.Application([
(r"/", MainHandler),
# create a websocket endpoint that the client JavaScript can access
(r"/websocket", PerspectiveTornadoHandler, {"manager": MANAGER, "check_origin": True})
])
# Start the Tornado server
app.listen(8888)
loop = tornado.ioloop.IOLoop.current()
loop.start()
Instead of calling load(server_table)
, create a View
using server_table
and pass that into viewer.load()
. This will automatically register an
on_update
callback that synchronizes state between the server and the client.
index.html
<perspective-viewer id="viewer" editable></perspective-viewer>
<script>
window.addEventListener("DOMContentLoaded", async function () {
// Create a client that expects a Perspective server
// to accept connections at the specified URL.
const websocket = perspective.websocket(
"ws://localhost:8888/websocket"
);
// Get a handle to the Table on the server
const server_table = websocket.open_table("data_source_one");
// Create a new view
const server_view = await table.view();
// Create a Table on the client using `perspective.worker()`
const worker = perspective.worker();
const client_table = await worker.table(view);
// Load the client table in the `<perspective-viewer>`.
document.getElementById("viewer").load(client_table);
});
</script>
For a more complex example that offers distributed editing of the server dataset, see client_server_editing.html.
We also provide examples for Starlette/FastAPI and AIOHTTP:
Server-only Mode
The server setup is identical to Distributed Mode above,
but instead of creating a view, the client calls load(server_table)
: In
Python, use PerspectiveManager
and PerspectiveTornadoHandler
to create a
websocket server that exposes a Table
. In this example, table
is a proxy for
the Table
we created on the server. All API methods are available on
proxies, the.g.us calling view()
, schema()
, update()
on table
will
pass those operations to the Python Table
, execute the commands, and return
the result back to Javascript.
<perspective-viewer id="viewer" editable></perspective-viewer>
const websocket = perspective.websocket("ws://localhost:8888/websocket");
const table = websocket.open_table("data_source_one");
document.getElementById("viewer").load(table);
PerspectiveWidget
Building on top of the API provided by perspective.Table
, the
PerspectiveWidget
is a JupyterLab plugin that offers the entire functionality
of Perspective within the Jupyter environment. It supports the same API
semantics of <perspective-viewer>
, along with the additional data types
supported by perspective.Table
. PerspectiveWidget
takes keyword arguments
for the managed View
; additioanl arguments index
and limit
will be passed
to the Table
. For convenience are the
Aggregate
,
Sort
,
and
Plugin
enums, which can be used as replacements to string values in the API:
from perspective import PerspectiveWidget, Aggregate, Sort, Plugin
w = perspective.PerspectiveWidget(
data,
plugin=Plugin.XBAR,
aggregates={"datetime": Aggregate.ANY},
sort=[["date", Sort.DESC]]
)
Creating a widget
A widget is created through the PerspectiveWidget
constructor, which takes as
its first, required parameter a perspective.Table
, a dataset, a schema, or
None
, which serves as a special value that tells the Widget to defer loading
any data until later. In maintaining consistency with the Javascript API,
Widgets cannot be created with empty dictionaries or lists—None
should be used
if the intention is to await data for loading later on. A widget can be
constructed from a dataset:
from perspective import PerspectiveWidget, Table
PerspectiveWidget(data, group_by=["date"])
.. or a schema:
PerspectiveWidget({"a": int, "b": str})
.. or an instance of a perspective.Table
:
table = Table(data)
PerspectiveWidget(table)
.. or None
:
PerspectiveWidget(None)
PerspectiveRenderer
Perspective also exposes a JS-only mimerender-extension
. This lets you view
csv
, json
, and arrow
files directly from the file browser. You can see
this by right clicking one of these files and Open With->CSVPerspective
(or
JSONPerspective
or ArrowPerspective
). Perspective will also install itself
as the default handler for opening .arrow
files.
PerspectiveTornadoHandler
Perspective ships with a pre-built Tornado handler that makes integration with
tornado.websockets
extremely easy. This allows you to run an instance of
Perspective
on a server using Python, open a websocket to a Table
, and
access the Table
in JavaScript and through <perspective-viewer>
. All
instructions sent to the Table
are processed in Python, which executes the
commands, and returns its output through the websocket back to Javascript.
Python setup
To use the handler, we need to first have an instance of a Table
and a
PerspectiveManager
. The manager acts as the interface between the JavaScript
and Python layers, implementing a JSON API that allows the two Perspective
runtimes to communicate.
MANAGER = PerspectiveManager()
Once the manager has been created, create a Table
instance and call
host_table
on the manager with a name, passing through a reference to the
Table
you just created. host_table()
registers the Table with the manager
and allows the manager to send instructions to the Table.
The name that you host the table under is important—it acts as a unique accessor on the JavaScript side, which will look for a Table hosted at the websocket with the name you specify.
TABLE = Table(data)
MANAGER.host_table("data_source_one", TABLE)
After the manager and table setup is complete, create a websocket endpoint and
provide it a reference to PerspectiveTornadoHandler
. You must provide the
configuration object in the route tuple, and it must contain manager
, which is
a reference to the PerspectiveManager
you just created.
app = tornado.web.Application([
(r"/", MainHandler),
# create a websocket endpoint that the client JavaScript can access
(r"/websocket", PerspectiveTornadoHandler, {"manager": MANAGER, "check_origin": True})
])
Optionally, the configuration object can also include check_origin
, a boolean
that determines whether the websocket accepts requests from origins other than
where the server is hosted. See
Tornado docs
for more details.
JavaScript setup
Once the server is up and running, you can access the Table you just hosted
using perspective.websocket
and open_table()
. First, create a client that
expects a Perspective server to accept connections at the specified URL:
const websocket = perspective.websocket("ws://localhost:8888/websocket");
Next open the Table
we created on the server by name:
const table = websocket.open_table("data_source_one");
table
is a proxy for the Table
we created on the server. All operations that
are possible through the JavaScript API are possible on the Python API as well,
thus calling view()
, schema()
, update()
etc. on const table
will pass
those operations to the Python Table
, execute the commands, and return the
result back to JavaScript. Similarly, providing this table
to a
<perspective-viewer>
instance will allow virtual rendering:
viewer.load(table);
perspective.websocket
expects a Websocket URL where it will send instructions.
When open_table
is called, the name to a hosted Table is passed through, and a
request is sent through the socket to fetch the Table. No actual Table
instance is passed inbetween the runtimes; all instructions are proxied through
websockets.
This provides for great flexibility — while Perspective.js
is full of
features, browser WebAssembly runtimes currently have some performance
restrictions on memory and CPU feature utilization, and the architecture in
general suffers when the dataset itself is too large to download to the client
in full.
The Python runtime does not suffer from memory limitations, utilizes Apache
Arrow internal threadpools for threading and parallel processing, and generates
architecture optimized code, which currently makes it more suitable as a
server-side runtime than node.js
.