This code tests pairs of stocks for cointegration using historical price data. It downloads adjusted closing prices, runs a cointegration test on every pair, and computes the spread for one selected pair. Two price series are cointegrated when some linear combination of them is stationary, so their spread tends to revert to a long-run mean; statistical arbitrage relies on this property. A heatmap of p-values and rolling z-scores of the spread help identify candidate pairs and trading signals. The approach is a common starting point for pairs trading strategies in quantitative finance.
import numpy as np
import pandas as pd

import statsmodels.api as sm
from statsmodels.tsa.stattools import coint
from statsmodels.regression.rolling import RollingOLS

import yfinance as yf
import seaborn
import matplotlib.pyplot as plt
Define the list of stock symbols to analyze
symbol_list = ['meta', 'amzn', 'aapl', 'nflx', 'goog']
Download historical adjusted closing prices for the specified symbols
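# auto_adjust=False keeps the 'Adj Close' column; newer yfinance releases adjust prices by default and then omit it
data = yf.download(symbol_list, start='2014-01-01', end='2015-01-01', auto_adjust=False)['Adj Close']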
Define a function to find cointegrated pairs of stocks
def find_cointegrated_pairs(data):
    """Find cointegrated stock pairs

    This function calculates cointegration scores and p-values for stock pairs.

    Parameters
    ----------
    data : pd.DataFrame
        DataFrame of stock prices with stocks as columns.

    Returns
    -------
    score_matrix : np.ndarray
        Matrix of cointegration scores.
    pvalue_matrix : np.ndarray
        Matrix of p-values for cointegration tests.
    pairs : list
        List of cointegrated stock pairs.
    """
    # Initialize matrices for scores and p-values, and a list for pairs
    n = data.shape[1]
    score_matrix = np.zeros((n, n))
    pvalue_matrix = np.ones((n, n))
    keys = data.keys()
    pairs = []

    # Loop over combinations of stock pairs to test for cointegration
    for i in range(n):
        for j in range(i + 1, n):
            S1 = data[keys[i]]
            S2 = data[keys[j]]
            result = coint(S1, S2)
            score = result[0]
            pvalue = result[1]
            score_matrix[i, j] = score
            pvalue_matrix[i, j] = pvalue

            # Add pair to list if p-value is less than 0.05
            if pvalue < 0.05:
                pairs.append((keys[i], keys[j]))

    return score_matrix, pvalue_matrix, pairs
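The nested loop in the function visits each unordered pair exactly once, so n symbols produce n(n-1)/2 tests. As a small illustration (the uppercase tickers are only example labels), `itertools.combinations` enumerates the same pairs in the same order:

```python
from itertools import combinations

# Example column labels; any list of unique names works the same way
symbols = ['AAPL', 'AMZN', 'GOOG', 'META', 'NFLX']
n = len(symbols)

# Pairs visited by the nested loop: every (i, j) with i < j
loop_pairs = [(symbols[i], symbols[j]) for i in range(n) for j in range(i + 1, n)]

# The same pairs from itertools
combo_pairs = list(combinations(symbols, 2))

print(len(loop_pairs))            # 10 cointegration tests for 5 symbols
print(loop_pairs == combo_pairs)  # True
```

With a 0.05 threshold, some of those tests can pass by chance, so a pair flagged by the function is a candidate rather than a confirmed relationship.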
Find cointegrated pairs and store scores, p-values, and pairs
scores, pvalues, pairs = find_cointegrated_pairs(data)
Visualize the p-value matrix as a heatmap
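# Label the axes from data.columns so the tick labels follow the matrix order,
# which may differ from the order of symbol_list
seaborn.heatmap(
    pvalues,
    xticklabels=data.columns,
    yticklabels=data.columns,
    cmap='RdYlGn_r',
    mask=(pvalues >= 0.10)
)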
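The `mask` argument hides every cell where the mask is `True`. Because the p-value matrix starts as all ones and only the upper triangle is filled in, the diagonal and lower triangle are always hidden. A minimal sketch with made-up p-values (not results from the downloaded data):

```python
import numpy as np

# Hypothetical 3x3 p-value matrix: ones everywhere, upper triangle filled in
pvalues_demo = np.ones((3, 3))
pvalues_demo[0, 1] = 0.03
pvalues_demo[0, 2] = 0.40
pvalues_demo[1, 2] = 0.08

# True means "hide this cell" when passed as mask to seaborn.heatmap
mask_demo = pvalues_demo >= 0.10

# Cells that remain visible, as (row, column) index pairs
visible = [(int(i), int(j)) for i, j in np.argwhere(~mask_demo)]
print(visible)  # [(0, 1), (1, 2)]
```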
Select two stocks, Amazon (AMZN) and Apple (AAPL), for further analysis
S1 = data.AMZN
S2 = data.AAPL
Perform a cointegration test on the selected pair
score, pvalue, _ = coint(S1, S2)
Print the p-value of the cointegration test
pvalue
Add a constant term to the Amazon stock prices for regression analysis
S1 = sm.add_constant(S1)
Fit an Ordinary Least Squares (OLS) regression model using Apple as the dependent variable
results = sm.OLS(S2, S1).fit()
Remove the constant term from Amazon stock prices
S1 = S1.AMZN
Extract the regression coefficient (beta) for Amazon
b = results.params['AMZN']
Calculate the spread between Apple and the beta-adjusted Amazon prices
spread = S2 - b * S1
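The regression slope acts as the hedge ratio: the number of units of the first stock that offsets one unit of the second. The sketch below recovers a known slope from synthetic prices with `np.polyfit`, used here as a stand-in for the `sm.OLS` fit with a constant; all names and numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic prices: s2_demo = 5 + 1.5 * s1_demo + small noise
s1_demo = 100 + np.cumsum(rng.normal(size=250))
s2_demo = 5.0 + 1.5 * s1_demo + rng.normal(scale=0.5, size=250)

# A degree-1 least-squares fit returns (slope, intercept)
beta_demo, alpha_demo = np.polyfit(s1_demo, s2_demo, 1)

# Spread as in the text; the intercept is not subtracted,
# so the spread's mean equals the fitted intercept
spread_demo = s2_demo - beta_demo * s1_demo
print(beta_demo, spread_demo.mean(), alpha_demo)
```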
Plot the spread and its mean value
spread.plot()
plt.axhline(spread.mean(), color='black')
plt.legend(['Spread'])
Define a function to calculate the z-score of a series
def zscore(series):
    """Calculate z-score of a series

    This function returns the z-score for a given series.

    Parameters
    ----------
    series : pd.Series
        A pandas Series for which to calculate z-score.

    Returns
    -------
    zscore : pd.Series
        Z-score of the input series.
    """
    return (series - series.mean()) / np.std(series)
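`np.std` uses the population standard deviation (`ddof=0`), whereas pandas' `Series.std()` defaults to the sample version (`ddof=1`), so the two conventions give slightly different z-scores on short series. A small worked example with illustrative values:

```python
import numpy as np
import pandas as pd

series_demo = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Population standard deviation (ddof=0), the convention np.std uses
pop_std = series_demo.std(ddof=0)
# Sample standard deviation (ddof=1), the pandas default
sample_std = series_demo.std()

z_pop = (series_demo - series_demo.mean()) / pop_std
print(z_pop.round(4).tolist())  # [-1.4142, -0.7071, 0.0, 0.7071, 1.4142]
```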
Plot the z-score of the spread with mean and threshold lines
zscore(spread).plot()
plt.axhline(zscore(spread).mean(), color='black')
plt.axhline(1.0, color='red', linestyle='--')
plt.axhline(-1.0, color='green', linestyle='--')
plt.legend(['Spread z-score', 'Mean', '+1', '-1'])
Create a DataFrame with the signal (the spread z-score) and the spread value itself, stored in the position column
trades = pd.concat([zscore(spread), S2 - b * S1], axis=1)
trades.columns = ["signal", "position"]
Add long and short positions based on z-score thresholds
trades["side"] = 0.0
trades.loc[trades.signal <= -1, "side"] = 1
trades.loc[trades.signal >= 1, "side"] = -1
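The thresholds are inclusive, so a z-score of exactly -1 or +1 opens a position, and anything strictly between them leaves the side at zero. A toy example with made-up signal values:

```python
import pandas as pd

# Hypothetical z-score values, one per day
demo = pd.DataFrame({"signal": [-1.5, -0.2, 0.4, 1.2, -1.0]})

demo["side"] = 0.0
demo.loc[demo.signal <= -1, "side"] = 1    # spread unusually low: long the spread
demo.loc[demo.signal >= 1, "side"] = -1    # spread unusually high: short the spread

print(demo.side.tolist())  # [1.0, 0.0, 0.0, -1.0, 1.0]
```

Since the spread is defined as S2 - b * S1, being long the spread means buying AAPL and selling b units of AMZN; being short reverses both legs.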
Calculate and plot cumulative returns from the trading strategy
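# Shift the side by one day so each return is earned by the position set on the previous day
returns = trades.position.pct_change() * trades.side.shift(1)
returns.cumsum().plot()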
Print the trades DataFrame for inspection
trades
Calculate the spread using a rolling OLS model with a 30-day window
# Regress AAPL on AMZN so the rolling beta matches the spread definition S2 - beta * S1
model = RollingOLS(endog=S2, exog=S1, window=30)
rres = model.fit()
spread = S2 - rres.params.AMZN * S1
spread.name = 'spread'
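`RollingOLS` refits the regression on each trailing window. Since no constant is added in this step, each fit is a no-intercept regression whose slope has the closed form sum(x*y) / sum(x*x). The sketch below computes that by hand with pandas on synthetic data; it illustrates the idea and is not how statsmodels implements it.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
window = 30

# Synthetic series where y_demo is exactly 2 * x_demo
x_demo = pd.Series(100 + np.cumsum(rng.normal(size=120)))
y_demo = 2.0 * x_demo

# No-intercept least squares per window: beta = sum(x*y) / sum(x*x)
rolling_beta = (x_demo * y_demo).rolling(window).sum() / (x_demo ** 2).rolling(window).sum()

# The first window - 1 values are NaN because the window is not yet full
print(rolling_beta.isna().sum())  # 29
print(np.allclose(rolling_beta.dropna(), 2.0))  # True
```

The same warm-up gap appears in the rolling spread: its first 29 values are NaN.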
Calculate 1-day and 30-day moving averages of the spread
spread_mavg1 = spread.rolling(1).mean()
spread_mavg1.name = 'spread 1d mavg'
spread_mavg30 = spread.rolling(30).mean()
spread_mavg30.name = 'spread 30d mavg'
Plot the 1-day and 30-day moving averages of the spread
plt.plot(spread_mavg1.index, spread_mavg1.values)
plt.plot(spread_mavg30.index, spread_mavg30.values)
plt.legend(['1 Day Spread MAVG', '30 Day Spread MAVG'])
plt.ylabel('Spread')
Calculate the rolling 30-day standard deviation of the spread
std_30 = spread.rolling(30).std()
std_30.name = 'std 30d'
Compute and plot the z-score of the spread using 30-day moving averages and standard deviation
zscore_30_1 = (spread_mavg1 - spread_mavg30) / std_30
zscore_30_1.name = 'z-score'
zscore_30_1.plot()
plt.axhline(0, color='black')
plt.axhline(1.0, color='red', linestyle='--')
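A rolling z-score compares the latest spread value with its own recent history instead of the full-sample mean, so each value uses only data up to and including that day. Note that a 1-day moving average is just the series itself. A sketch on a synthetic spread (values are illustrative):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
spread_demo = pd.Series(rng.normal(size=100))

mavg1 = spread_demo.rolling(1).mean()    # equals the series itself
mavg30 = spread_demo.rolling(30).mean()
std30 = spread_demo.rolling(30).std()

zscore_demo = (mavg1 - mavg30) / std30

print(np.allclose(mavg1, spread_demo))   # True
print(zscore_demo.isna().sum())          # 29, until the first 30-day window fills
```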
Plot the scaled stock prices and the rolling z-score for comparison
plt.plot(S1.index, S1.values / 10)
plt.plot(S2.index, S2.values / 10)
plt.plot(zscore_30_1.index, zscore_30_1.values)
plt.legend(['S1 Price / 10', 'S2 Price / 10', 'Price Spread Rolling z-Score'])
Update the symbol list and download new data for another analysis period
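symbol_list = ['amzn', 'aapl']
# auto_adjust=False keeps the 'Adj Close' column; newer yfinance releases adjust prices by default and then omit it
data = yf.download(symbol_list, start='2015-01-01', end='2016-01-01', auto_adjust=False)['Adj Close']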
Select Amazon (AMZN) and Apple (AAPL) prices from the new data
S1 = data.AMZN
S2 = data.AAPL
Perform a cointegration test on the new data and print the p-value
score, pvalue, _ = coint(S1, S2)
print(pvalue)
For educational purposes. Not investment advice. Use at your own risk.
