Using the IPython and Jupyter Magic
Work with pandas and PRQL in an IPython terminal or Jupyter notebook.
Implementation
This is a thin wrapper around ploomber/jupysql magic. Full documentation of the supported features is available at their repository. Here, we document the most salient features or those where we differ.
Usage
Installation
If you have already installed pyprql into your environment,
then you should be could to go!
We bundle in IPython
and pandas
,
though you’ll need to install Jupyter
separately.
If you haven’t installed pyprql,
that’s as simple as:
pip install pyprql
Set Up
Open up either an IPython
terminal or Jupyter
notebook. First, we need to
load the extension and connect to a database.
In [1]: %load_ext pyprql.magic
Connecting a database
We have two options for connecting a database
Create an in-memory DB. This is the easiest way to get started.
In [2]: %prql duckdb:///:memory:
However, in-memory databases start off empty! So, we need to add some data. We have a two options:
We can easily add a pandas dataframe to the
DuckDB
database like so:In [3]: %prql --persist df
where
df
is a pandas dataframe. This adds a table nameddf
to the in-memoryDuckDB
instance.Or download a CSV and query it directly, with DuckDB:
!wget https://raw.githubusercontent.com/graphql-compose/graphql-compose-examples/master/examples/northwind/data/csv/products.csv
…and then
from `products.csv`
will work.
Connect to an existing database
When connecting to a database, pass the connection string as an argument to the line magic
%prql
. The connection string needs to be in SQLAlchemy format, so any connection supported bySQLAlchemy
is supported by the magic. Additional connection parameters can be passed as a dictionary using the--connection_arguments
flag to the the%prql
line magic. We ship with the necessary extensions to use DuckDB as the backend, and here connect to an in-memory database.
Querying
Now, let’s do a query! By default, PrqlMagic
always returns the results as
dataframe, and always prints the results. The results of the previous query are
accessible in the _
variable.
These examples are based on the from `products.csv`
example above.
In [4]: %%prql
...: from p = `products.csv`
...: filter supplierID == 1
Done.
Returning data to local variable _
productID productName supplierID categoryID quantityPerUnit unitPrice unitsInStock unitsOnOrder reorderLevel discontinued
0 1 Chai 1 1 10 boxes x 20 bags 18.0 39 0 10 0
1 2 Chang 1 1 24 - 12 oz bottles 19.0 17 40 25 0
2 3 Aniseed Syrup 1 2 12 - 550 ml bottles 10.0 13 70 25 0
In [5]: %%prql
...: from p = `products.csv`
...: group categoryID (
...: aggregate {average unitPrice}
...: )
Done.
Returning data to local variable _
categoryID avg("unitPrice")
0 1 37.979167
1 2 23.062500
2 7 32.370000
3 6 54.006667
4 8 20.682500
5 4 28.730000
6 3 25.160000
7 5 20.250000
We can capture the results into a different variable like so:
In [6]: %%prql results <<
...: from p = `products.csv`
...: aggregate {min unitsInStock, max unitsInStock}
Done.
Returning data to local variable results
min("unitsInStock") max("unitsInStock")
0 0 125
Now, the output of the query is saved to results
.
We can also use a line magic to capture the results like this:
In [7]: results = %prql from p = `products.csv` | aggregate {min unitsInStock, max unitsInStock}
Configuration
We strive to provide sane defaults;
however,
should you need to change settings,
a list of settings is available using the %config
line magic.
In [8]: %config PrqlMagic
PrqlMagic(SqlMagic) options
-------------------------
PrqlMagic.autocommit=<Bool>
Set autocommit mode
Current: True
PrqlMagic.autolimit=<Int>
Automatically limit the size of the returned result sets
Current: 0
PrqlMagic.autopandas=<Bool>
Return Pandas DataFrames instead of regular result sets
Current: True
PrqlMagic.autopolars=<Bool>
Return Polars DataFrames instead of regular result sets
Current: False
PrqlMagic.autoview=<Bool>
Display results
Current: True
PrqlMagic.column_local_vars=<Bool>
Return data into local variables from column names
Current: False
PrqlMagic.displaycon=<Bool>
Show connection string after execute
Current: False
PrqlMagic.displaylimit=<Int>
Automatically limit the number of rows displayed (full result set is still
stored)
Current: None
PrqlMagic.dryrun=<Bool>
Only print the compiled SQL
Current: False
PrqlMagic.dsn_filename=<Unicode>
Path to DSN file. When the first argument is of the form [section], a
sqlalchemy connection string is formed from the matching section in the DSN
file.
Current: 'odbc.ini'
PrqlMagic.feedback=<Bool>
Print number of rows affected by DML
Current: False
PrqlMagic.polars_dataframe_kwargs=<key-1>=<value-1>...
Polars DataFrame constructor keyword arguments(e.g. infer_schema_length,
nan_to_null, schema_overrides, etc)
Current: {}
PrqlMagic.short_errors=<Bool>
Don't display the full traceback on SQL Programming Error
Current: True
PrqlMagic.style=<Unicode>
Set the table printing style to any of prettytable's defined styles
(currently DEFAULT, MSWORD_FRIENDLY, PLAIN_COLUMNS, RANDOM)
Current: 'DEFAULT'
PrqlMagic.target=<Unicode>
Compile target of prql-compiler
Current: 'sql.any'
If you want to change any of these,
you can do that with the %config
line magic as well.
In [9]: %config PrqlMagic.autoview = False