strategy-lab/to_explore/pyquantnews/53_PolarsVsPandas.ipynb
David Brazda e3da60c647 daily update
2024-10-21 20:57:56 +02:00

This notebook retrieves the S&P 500 constituent tickers from Wikipedia and fetches historical data for them using the OpenBB SDK. It converts the resulting Pandas dataframe to a Polars dataframe, then times equivalent operations in each library: writing and reading CSV files, selecting columns, filtering rows, grouping, adding a computed column, filling missing values, sorting, and computing a rolling mean. The timings give a practical comparison of data manipulation performance between the two libraries.

In [ ]:
import pandas as pd
import polars as pl
In [ ]:
from openbb_terminal.sdk import openbb

Retrieve S&P 500 tickers from Wikipedia and create a list of symbols

In [ ]:
table = pd.read_html("http://en.wikipedia.org/wiki/List_of_S%26P_500_companies")[0]
tickers = table.Symbol.tolist()

Fetch historical index data for the retrieved tickers using OpenBB SDK

In [ ]:
df_pandas = openbb.economy.index(tickers, start_date="1990-01-01")
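For readers who cannot run the OpenBB download, the timing cells can be exercised against a synthetic stand-in. The following is a minimal sketch, assuming the fetched data is a wide frame of daily values with one column per ticker. The helper name make_fake_prices, the random-walk values, and the three-ticker list are hypothetical and not part of the original notebook.

```python
import numpy as np
import pandas as pd


def make_fake_prices(tickers, start="1990-01-01", periods=2_000, seed=0):
    """Build a wide DataFrame of synthetic prices, one column per ticker.

    Hypothetical stand-in for the OpenBB download; the values are random
    walks, not market data.
    """
    rng = np.random.default_rng(seed)
    dates = pd.bdate_range(start=start, periods=periods)
    # Cumulative product of (1 + small random return) gives a positive path.
    returns = rng.normal(loc=0.0003, scale=0.01, size=(periods, len(tickers)))
    prices = 100.0 * np.cumprod(1.0 + returns, axis=0)
    return pd.DataFrame(prices, index=dates, columns=list(tickers))


fake_pandas = make_fake_prices(["GE", "IBM", "XOM"])
```

Timings on synthetic data will not match timings on the real download, but the relative behavior of each operation can still be explored.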

Save the fetched data to a CSV file using Pandas

In [ ]:
df_pandas.to_csv("data.csv")

Convert the Pandas dataframe to a Polars dataframe

In [ ]:
df_polars = pl.from_pandas(df_pandas)

Measure the time taken to write the Pandas dataframe to a CSV file

In [ ]:
%timeit df_pandas.to_csv("data.csv", index=False)

Measure the time taken to write the Polars dataframe to a CSV file

In [ ]:
%timeit df_polars.write_csv("data.csv")
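The %timeit magic only works inside IPython or Jupyter. Outside a notebook, a rough equivalent can be built on the standard library timeit module. This is a sketch, not a drop-in replacement: the helper name bench and its defaults are hypothetical, and it reports the best trial rather than the mean and standard deviation that %timeit prints.

```python
import timeit


def bench(func, repeat=5, number=10):
    """Return the best average time per call, in seconds.

    Runs `func` `number` times per trial, over `repeat` trials, and keeps
    the fastest trial.
    """
    trials = timeit.repeat(func, repeat=repeat, number=number)
    return min(trials) / number


# Example with a cheap stand-in workload.
seconds = bench(lambda: sorted(range(1_000)))
print(f"{seconds * 1e6:.1f} microseconds per call")
```

With the notebook's dataframes in scope, the same helper could wrap any of the timed calls, for example bench(lambda: df_polars.write_csv("data.csv")).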

Measure the time taken to read the CSV file into a Pandas dataframe

In [ ]:
%timeit pd.read_csv("data.csv")

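Measure the time taken to read the CSV file into a Polars dataframe. pl.read_csv is used here because pl.scan_csv only builds a lazy query plan and does not load the file, so timing it would not be comparable to pd.read_csv.

In [ ]:
%timeit pl.read_csv("data.csv")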

Select the first 100 tickers from the list

In [ ]:
selected = tickers[:100]

Measure the time taken to select columns in the Pandas dataframe

In [ ]:
%timeit df_pandas[selected]

Measure the time taken to select columns in the Polars dataframe

In [ ]:
%timeit df_polars.select(pl.col(selected))

Measure the time taken to filter rows in the Pandas dataframe where 'GE' > 100

In [ ]:
%timeit df_pandas[df_pandas["GE"] > 100]

Measure the time taken to filter rows in the Polars dataframe where 'GE' > 100

In [ ]:
%timeit df_polars.filter(pl.col("GE") > 100)

Measure the time taken to group by 'GE' and calculate the mean in the Pandas dataframe

In [ ]:
%timeit df_pandas.groupby("GE").mean()

Measure the time taken to group by 'GE' and calculate the mean in the Polars dataframe

In [ ]:
%timeit df_polars.group_by("GE").mean()

Measure the time taken to create a new column 'GE_Return' with percentage change in Pandas

In [ ]:
%timeit df_pandas.assign(GE_Return=df_pandas["GE"].pct_change())

Measure the time taken to create a new column 'GE_return' with percentage change in Polars

In [ ]:
%timeit df_polars.with_columns((pl.col("GE").pct_change()).alias("GE_return"))

Measure the time taken to fill missing values in the 'GE' column with 0 in Pandas

In [ ]:
%timeit df_pandas["GE"].fillna(0)

Measure the time taken to fill missing values in the 'GE' column with 0 in Polars

In [ ]:
%timeit df_polars.with_columns(pl.col("GE").fill_null(0))

Measure the time taken to sort the dataframe by the 'GE' column in Pandas

In [ ]:
%timeit df_pandas.sort_values("GE")

Measure the time taken to sort the dataframe by the 'GE' column in Polars

In [ ]:
%timeit df_polars.sort("GE")

Measure the time taken to calculate the rolling mean for 'GE' with a window of 20 in Pandas

In [ ]:
%timeit df_pandas.GE.rolling(window=20).mean()

Measure the time taken to calculate the rolling mean for 'GE' with a window of 20 in Polars

In [ ]:
%timeit df_polars.with_columns(pl.col("GE").rolling_mean(20))

PyQuant News is where finance practitioners level up with Python for quant finance, algorithmic trading, and market data analysis. Looking to get started? Check out the fastest growing, top-selling course to get started with Python for quant finance. For educational purposes only. Not investment advice. Use at your own risk.