strategy-lab/to_explore/pyquantnews/50_FactorEngineeringML.ipynb
David Brazda e3da60c647 daily update
2024-10-21 20:57:56 +02:00



This code acquires stock price data, computes a volatility factor, and analyzes its relationship with future returns. It uses the OpenBB SDK to screen for the most volatile stocks, keeps US stocks with a price above 5, and computes the 14-day Average True Range (ATR) as a volatility measure, standardized within each ticker. It then calculates historical returns over several lags (1 to 63 trading days) and shifts them to build forward-return targets. Finally, it plots ATR against next-day returns and computes their Spearman rank correlation. The workflow is a quick first check of whether a candidate factor carries information about future returns.

In [ ]:
import pandas as pd
from openbb_terminal.sdk import openbb
from talib import ATR
from scipy.stats import spearmanr
import matplotlib.pyplot as plt
import seaborn as sns

Acquire stock data with the OpenBB SDK screener (the "most_volatile" preset), then keep US stocks with a price above 5

In [ ]:
data = openbb.stocks.screener.screener_data(preset_loaded="most_volatile")
In [ ]:
universe = data[(data.Country == "USA") & (data.Price > 5)]

Fetch historical price data for each ticker and store in a list of DataFrames

In [ ]:
stocks = []
for ticker in universe.Ticker.tolist():
    df = openbb.stocks.load(ticker, start_date="2010-01-01", verbose=False).drop("Close", axis=1)
    df["ticker"] = ticker
    stocks.append(df)

Concatenate all DataFrames into a single DataFrame and rename columns

In [ ]:
prices = pd.concat(stocks)
prices.columns = ["open", "high", "low", "close", "volume", "ticker"]

Keep only tickers with more than 504 observations (2 * 12 * 21, roughly two years of trading days) and remove duplicate rows

In [ ]:
nobs = prices.groupby("ticker").size()
mask = nobs[nobs > 2 * 12 * 21].index
prices = prices[prices.ticker.isin(mask)]
In [ ]:
prices = prices.set_index("ticker", append=True).reorder_levels(["ticker", "date"]).drop_duplicates()
In [ ]:
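# Duplicates were already dropped in the previous cell; this call is not assigned and only displays the result.
prices.drop_duplicates()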
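
The threshold `2 * 12 * 21` works out to 504 rows, about two years at 21 trading days per month. The count-then-filter step can be sketched without pandas as follows. The tickers and row counts are invented for illustration, and `min_obs`, `keep`, and `filtered` are hypothetical names, not part of the notebook.

```python
from collections import Counter

# Hypothetical (ticker, row-number) pairs; the counts are made up for illustration.
rows = (
    [("AAA", i) for i in range(600)]
    + [("BBB", i) for i in range(504)]
    + [("CCC", i) for i in range(100)]
)

min_obs = 2 * 12 * 21  # 2 years * 12 months * ~21 trading days

# Count rows per ticker, mirroring prices.groupby("ticker").size().
nobs = Counter(ticker for ticker, _ in rows)

# Strict ">" as in the notebook: a ticker with exactly min_obs rows is dropped.
keep = {ticker for ticker, n in nobs.items() if n > min_obs}

# Keep only rows whose ticker passed the threshold, mirroring prices.ticker.isin(mask).
filtered = [row for row in rows if row[0] in keep]

print(min_obs, sorted(keep), len(filtered))
```

Because the comparison is strict, a ticker needs at least 505 rows to survive the filter.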

Calculate the 14-day Average True Range (ATR) for each stock and standardize it to a z-score within each ticker

In [ ]:
def atr(data):
    """Calculate and standardize ATR.
    
    Parameters
    ----------
    data : DataFrame
        Data containing high, low, and close prices.
    
    Returns
    -------
    Series
        Standardized (z-scored) ATR values.
    """
    df = ATR(data.high, data.low, data.close, timeperiod=14)
    return df.sub(df.mean()).div(df.std())
In [ ]:
prices["atr"] = prices.groupby('ticker', group_keys=False).apply(atr)
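
To make the factor less of a black box, here is a library-free sketch of what the cell above computes. It follows the textbook Wilder definition of ATR, seeded with a simple average of the first `period` true ranges. TA-Lib's exact initialization may differ, so treat this as an approximation rather than a reimplementation. The helper names `true_range`, `wilder_atr`, and `zscore` are hypothetical, and the prices are toy values.

```python
from statistics import mean, stdev


def true_range(high, low, prev_close):
    """Largest of the three candidate ranges for one bar."""
    return max(high - low, abs(high - prev_close), abs(low - prev_close))


def wilder_atr(highs, lows, closes, period=14):
    """ATR with Wilder smoothing; entries are None until enough bars exist."""
    # True range needs the previous close, so trs[k] belongs to bar k + 1.
    trs = [
        true_range(highs[i], lows[i], closes[i - 1])
        for i in range(1, len(closes))
    ]
    out = [None] * len(closes)
    if len(trs) < period:
        return out
    atr = mean(trs[:period])  # seed: simple average of the first `period` true ranges
    out[period] = atr
    for i in range(period, len(trs)):
        atr = (atr * (period - 1) + trs[i]) / period  # Wilder's recursive smoothing
        out[i + 1] = atr
    return out


def zscore(values):
    """Subtract the mean and divide by the sample standard deviation, skipping None."""
    valid = [v for v in values if v is not None]
    mu, sigma = mean(valid), stdev(valid)
    return [None if v is None else (v - mu) / sigma for v in values]


# Toy bars for one hypothetical ticker; a short period keeps the example small.
highs = [10, 11, 12, 11, 13, 14]
lows = [9, 9, 10, 10, 11, 12]
closes = [9.5, 10.5, 11, 10.5, 12.5, 13]

atr_values = wilder_atr(highs, lows, closes, period=3)
atr_z = zscore(atr_values)
print(atr_values)
print(atr_z)
```

The z-score uses the sample standard deviation, which matches the default of pandas `Series.std()` used in the notebook's `atr` function.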

Calculate historical returns over different time lags and add them to the DataFrame

In [ ]:
lags = [1, 5, 10, 21, 42, 63]
for lag in lags:
    prices[f"return_{lag}d"] = prices.groupby(level="ticker").close.pct_change(lag)

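Set up target variables for future returns by shifting each t-day return back by t rows within each ticker, so the row for a given date holds the return over the following t days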

In [ ]:
for t in [1, 5, 10, 21]:
    prices[f"target_{t}d"] = prices.groupby(level="ticker")[f"return_{t}d"].shift(-t)
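
The shift is easy to get backwards, so a small sketch may help. It uses plain lists and two made-up tickers to show that shifting a trailing t-day return back by t rows gives the forward return `close[i + t] / close[i] - 1`, and that working per ticker keeps one stock's prices from leaking into another's targets. The helpers `pct_change` and `shift_back` are hypothetical stand-ins for the pandas calls, not the pandas implementations.

```python
def pct_change(closes, lag):
    """Trailing return close[i] / close[i - lag] - 1; None where history is missing."""
    return [
        None if i < lag else closes[i] / closes[i - lag] - 1
        for i in range(len(closes))
    ]


def shift_back(values, periods):
    """Stand-in for shift(-periods): pull later values earlier, pad the tail with None."""
    return values[periods:] + [None] * periods


# Made-up closing prices for two hypothetical tickers.
closes_by_ticker = {
    "AAA": [100.0, 110.0, 121.0, 133.1],
    "BBB": [50.0, 45.0, 54.0, 54.0],
}

t = 2
targets = {
    ticker: shift_back(pct_change(closes, t), t)  # computed separately per ticker
    for ticker, closes in closes_by_ticker.items()
}

for ticker, values in targets.items():
    print(ticker, values)
```

The last t rows of each ticker have no target because the future prices are not yet known, which is why the notebook later calls `dropna()` before computing the correlation.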

Visualize the relationship between ATR and future 1-day returns using Seaborn

In [ ]:
target = "target_1d"
metric = "atr"
j = sns.jointplot(x=metric, y=target, data=prices)
plt.tight_layout()

Calculate and print the Spearman rank correlation between ATR and future 1-day returns, followed by its p-value in parentheses

In [ ]:
df = prices[[metric, target]].dropna()
r, p = spearmanr(df[metric], df[target])
print(f"{r:,.2%} ({p:.2%})")
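
Spearman correlation is the Pearson correlation of the ranks, which is why it suits a factor like ATR whose relationship with returns may be monotonic but not linear. The following is a minimal sketch of that definition with average ranks for ties. The function names and sample numbers are hypothetical, and unlike `scipy.stats.spearmanr` it does not compute a p-value.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i + 1 .. j + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented factor values and next-day returns, for illustration only.
atr_sample = [-1.2, -0.3, 0.1, 0.8, 1.5]
ret_sample = [0.02, -0.01, 0.00, 0.01, -0.03]

rho = spearman(atr_sample, ret_sample)
print(f"{rho:.2%}")
```

Because only the ordering matters, any strictly increasing transformation of either input, such as the per-ticker standardization applied to ATR, leaves the coefficient for that ticker's data unchanged.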

PyQuant News is where finance practitioners level up with Python for quant finance, algorithmic trading, and market data analysis. Looking to get started? Check out the fastest-growing, top-selling course to get started with Python for quant finance. For educational purposes. Not investment advice. Use at your own risk.