Using the IPython and Jupyter Magic
Work with pandas and PRQL in an IPython terminal or Jupyter notebook.
This is a thin wrapper around the fantastic
all we do is parse PRQL to SQL and pass that through to
A full documentation of the supported features is available at their
Here, we document those places where we differ from them,
plus those features we think you are mostly likely to find useful.
If you have already installed PyPRQL into your environment,
then you should be could to go!
We bundle in
though you’ll need to install
If you haven’t installed PyPRQL,
that’s as simple as:
pip install pyprql
Open up either an
IPython terminal or
Jupyter notebook. First, we need to
load the extension and connect to a database.
In : %load_ext pyprql.magic
Connecting a database
We have two options for connecting a database
Create an in-memory DB. This is the easiest way to get started.
In : %prql duckdb:///:memory:
However, in-memory databases start off empty! So, we need to add some data. We have a two options:
We can easily add a pandas dataframe to the
DuckDBdatabase like so:
In : %prql --persist df
dfis a pandas dataframe. This adds a table named
dfto the in-memory
Or download a CSV and query it directly, with DuckDB:
from products.csvwill work.
Connect to an existing database
When connecting to a database, pass the connection string as an argument to the line magic
%prql. The connection string needs to be in SQLAlchemy format, so any connection supported by
SQLAlchemyis supported by the magic. Additional connection parameters can be passed as a dictionary using the
--connection_argumentsflag to the the
%prqlline magic. We ship with the necessary extensions to use DuckDB as the backend, and here connect to an in-memory database.
Now, let’s do a query! By default,
PRQLMagic always returns the results as
dataframe, and always prints the results. The results of the previous query are
accessible in the
These examples are based on the
products.csv example above.
In : %%prql ...: from p = products.csv ...: filter supplierID == 1 Done. Returning data to local variable _ productID productName supplierID categoryID quantityPerUnit unitPrice unitsInStock unitsOnOrder reorderLevel discontinued 0 1 Chai 1 1 10 boxes x 20 bags 18.0 39 0 10 0 1 2 Chang 1 1 24 - 12 oz bottles 19.0 17 40 25 0 2 3 Aniseed Syrup 1 2 12 - 550 ml bottles 10.0 13 70 25 0
In : %%prql ...: from p = products.csv ...: group categoryID ( ...: aggregate [average unitPrice] ...: ) Done. Returning data to local variable _ categoryID avg("unitPrice") 0 1 37.979167 1 2 23.062500 2 7 32.370000 3 6 54.006667 4 8 20.682500 5 4 28.730000 6 3 25.160000 7 5 20.250000
We can capture the results into a different variable like so:
In : %%prql results << ...: from p = products.csv ...: aggregate [min unitsInStock, max unitsInStock] Done. Returning data to local variable results min("unitsInStock") max("unitsInStock") 0 0 125
Now, the output of the query is saved to
IPython-SQL, we currently only support queries in a cell magic (i.e.
%%prql), and not a line magic (i.e.
%prql). So the following will not
In : %prql from data | select freq
This mainly to work around some parsing challenges, and it may be added as a feature in a future release.
We strive to provide sane defaults;
should you need to change settings,
a list of settings is available using the
%config line magic.
In : %config PRQLMagic PRQLMagic(SqlMagic) options ------------------------- PRQLMagic.autocommit=<Bool> Set autocommit mode Current: True PRQLMagic.autolimit=<Int> Automatically limit the size of the returned result sets Current: 0 PRQLMagic.autopandas=<Bool> Return Pandas DataFrames instead of regular result sets Current: True PRQLMagic.autoview=<Bool> Display results Current: True PRQLMagic.column_local_vars=<Bool> Return data into local variables from column names Current: False PRQLMagic.displaycon=<Bool> Show connection string after execute Current: False PRQLMagic.displaylimit=<Int> Automatically limit the number of rows displayed (full result set is still stored) Current: None PRQLMagic.dsn_filename=<Unicode> Path to DSN file. When the first argument is of the form [section], a sqlalchemy connection string is formed from the matching section in the DSN file. Current: 'odbc.ini' PRQLMagic.feedback=<Bool> Print number of rows affected by DML Current: True PRQLMagic.short_errors=<Bool> Don't display the full traceback on SQL Programming Error Current: True PRQLMagic.style=<Unicode> Set the table printing style to any of prettytable's defined styles (currently DEFAULT, MSWORD_FRIENDLY, PLAIN_COLUMNS, RANDOM) Current: 'DEFAULT'
If you want to change any of these,
you can do that with the
%config line magic as well.
In : %config PRQLMagic.autoview = False