strategy-lab/to_explore/pyquantnews/21_PairsTrading.ipynb
David Brazda e3da60c647 daily update
2024-10-21 20:57:56 +02:00

This notebook tests stock pairs for cointegration using historical price data. It downloads adjusted closing prices, runs pairwise cointegration tests, and computes the spread of a selected pair. Two price series are cointegrated when some linear combination of them is stationary, which is the stable long-run relationship that statistical arbitrage relies on. A p-value heatmap and rolling z-scores of the spread help identify candidate pairs and trading signals. Together these steps form the basis of a pairs trading strategy.
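
To make the idea of cointegration concrete, here is a minimal sketch on synthetic data (not market prices). The series `x` and `y`, the shared trend, and the hedge ratio of 2.0 are all assumptions chosen for illustration: both series wander with the same random walk, yet the hedged difference stays in a narrow band.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the sketch is repeatable
n = 2000

# A shared random walk acts as the common stochastic trend.
common_trend = np.cumsum(rng.normal(size=n))

# Two hypothetical price series: a multiple of the trend plus stationary noise.
x = common_trend + rng.normal(scale=0.5, size=n)
y = 2.0 * common_trend + rng.normal(scale=0.5, size=n)

# With the right hedge ratio (2.0 here, known by construction) the trend cancels
# and only the stationary noise terms remain.
spread = y - 2.0 * x

print("std of y:     ", y.std())
print("std of spread:", spread.std())
```

In real data the hedge ratio is unknown and has to be estimated, which is what the regression step later in the notebook does.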

In [ ]:
import numpy as np
import pandas as pd
In [ ]:
import statsmodels.api as sm
from statsmodels.tsa.stattools import coint
from statsmodels.regression.rolling import RollingOLS
In [ ]:
import yfinance as yf
import seaborn
import matplotlib.pyplot as plt

Define the list of stock symbols to analyze

In [ ]:
symbol_list = ['meta', 'amzn', 'aapl', 'nflx', 'goog']

Download historical adjusted closing prices for the specified symbols

In [ ]:
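# auto_adjust=False keeps the 'Adj Close' column; recent yfinance releases
# adjust prices by default and then omit that column
data = yf.download(symbol_list, start='2014-01-01', end='2015-01-01', auto_adjust=False)['Adj Close']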

Define a function to find cointegrated pairs of stocks

In [ ]:
def find_cointegrated_pairs(data):
    """Find cointegrated stock pairs

    This function calculates cointegration scores and p-values for stock pairs.

    Parameters
    ----------
    data : pd.DataFrame
        DataFrame of stock prices with stocks as columns.

    Returns
    -------
    score_matrix : np.ndarray
        Matrix of cointegration scores.
    pvalue_matrix : np.ndarray
        Matrix of p-values for cointegration tests.
    pairs : list
        List of cointegrated stock pairs.
    """
    
    # Initialize matrices for scores and p-values, and a list for pairs
    n = data.shape[1]
    score_matrix = np.zeros((n, n))
    pvalue_matrix = np.ones((n, n))
    keys = data.keys()
    pairs = []

    # Loop over combinations of stock pairs to test for cointegration
    for i in range(n):
        for j in range(i + 1, n):
            S1 = data[keys[i]]
            S2 = data[keys[j]]
            result = coint(S1, S2)
            score = result[0]
            pvalue = result[1]
            score_matrix[i, j] = score
            pvalue_matrix[i, j] = pvalue
            
            # Add pair to list if p-value is less than 0.05
            if pvalue < 0.05:
                pairs.append((keys[i], keys[j]))
    
    return score_matrix, pvalue_matrix, pairs
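
The double loop above tests every unordered pair once, so the number of tests grows quadratically with the number of symbols. The sketch below counts the tests for the five symbols and shows one conservative way to tighten the 0.05 cutoff. The Bonferroni correction is an addition for illustration and is not part of the original function.

```python
from itertools import combinations

symbols = ["META", "AMZN", "AAPL", "NFLX", "GOOG"]

# Every unordered pair is tested once, mirroring the i < j double loop.
candidate_pairs = list(combinations(symbols, 2))
n_tests = len(candidate_pairs)          # n * (n - 1) / 2

alpha = 0.05
bonferroni_alpha = alpha / n_tests      # stricter per-test threshold (assumption)

print(n_tests, bonferroni_alpha)
```

With ten tests at a 0.05 threshold, a spurious "cointegrated" pair is not unlikely even when no pair is truly cointegrated, so any pair found this way deserves an out-of-sample check.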

Find cointegrated pairs and store scores, p-values, and pairs

In [ ]:
scores, pvalues, pairs = find_cointegrated_pairs(data)

Visualize the p-value matrix as a heatmap

In [ ]:
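# Label the axes from data.columns: that is the order used to fill the matrix,
# and it can differ from the order of symbol_list
seaborn.heatmap(
    pvalues,
    xticklabels=data.columns,
    yticklabels=data.columns,
    cmap='RdYlGn_r',
    mask=(pvalues >= 0.10)
)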

Select two stocks, Amazon (AMZN) and Apple (AAPL), for further analysis

In [ ]:
S1 = data.AMZN
S2 = data.AAPL

Perform a cointegration test on the selected pair

In [ ]:
score, pvalue, _ = coint(S1, S2)

Print the p-value of the cointegration test

In [ ]:
pvalue

Add a constant term to the Amazon stock prices for regression analysis

In [ ]:
S1 = sm.add_constant(S1)

Fit an Ordinary Least Squares (OLS) regression model using Apple as the dependent variable

In [ ]:
results = sm.OLS(S2, S1).fit()

Remove the constant term from Amazon stock prices

In [ ]:
S1 = S1.AMZN

Extract the regression coefficient (beta) for Amazon

In [ ]:
b = results.params['AMZN']

Calculate the spread between Apple and the beta-adjusted Amazon prices

In [ ]:
spread = S2 - b * S1
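
The regression steps above can be sketched with plain NumPy on synthetic data. The series `s1` and `s2`, the true slope of 0.25, and the intercept of 10 are hypothetical stand-ins, not values from the notebook. The sketch also shows that when the regression includes a constant, the mean of the spread equals the fitted intercept.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Hypothetical stand-ins for the two price series (not real market data).
s1 = 300 + np.cumsum(rng.normal(size=n))               # plays the role of S1
s2 = 10 + 0.25 * s1 + rng.normal(scale=0.5, size=n)    # plays the role of S2

# Design matrix with an intercept column, analogous to sm.add_constant(S1).
X = np.column_stack([np.ones(n), s1])
intercept, beta = np.linalg.lstsq(X, s2, rcond=None)[0]

# Same spread definition as above: dependent series minus beta times regressor.
spread = s2 - beta * s1

print("beta:", beta)
print("intercept:", intercept, "spread mean:", spread.mean())
```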

Plot the spread and its mean value

In [ ]:
spread.plot()
plt.axhline(spread.mean(), color='black')
plt.legend(['Spread'])

Define a function to calculate the z-score of a series

In [ ]:
def zscore(series):
    """Calculate z-score of a series

    This function returns the z-score for a given series.

    Parameters
    ----------
    series : pd.Series
        A pandas Series for which to calculate z-score.

    Returns
    -------
    zscore : pd.Series
        Z-score of the input series.
    """
    
    return (series - series.mean()) / np.std(series)
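
A quick check of the helper on a tiny hand-made series (the numbers are illustrative only). Note that `np.std` uses the population standard deviation (`ddof=0`), whereas `Series.std()` defaults to the sample version (`ddof=1`), so the two would give slightly different z-scores.

```python
import numpy as np
import pandas as pd

def zscore(series):
    # Same formula as the helper above: population std (ddof=0) via np.std.
    return (series - series.mean()) / np.std(series)

sample = pd.Series([10.0, 12.0, 14.0, 16.0, 18.0])
z = zscore(sample)

print(z.round(3).tolist())   # [-1.414, -0.707, 0.0, 0.707, 1.414]
print(np.std(sample), sample.std())   # ddof=0 versus ddof=1
```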

Plot the z-score of the spread with mean and threshold lines

In [ ]:
zscore(spread).plot()
plt.axhline(zscore(spread).mean(), color='black')
plt.axhline(1.0, color='red', linestyle='--')
plt.axhline(-1.0, color='green', linestyle='--')
plt.legend(['Spread z-score', 'Mean', '+1', '-1'])

Create a DataFrame holding the z-score signal and the spread value (stored in the "position" column)

In [ ]:
trades = pd.concat([zscore(spread), S2 - b * S1], axis=1)
trades.columns = ["signal", "position"]

Add long and short positions based on z-score thresholds

In [ ]:
trades["side"] = 0.0
trades.loc[trades.signal <= -1, "side"] = 1
trades.loc[trades.signal >= 1, "side"] = -1
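
The threshold rule can be checked on a handful of made-up z-scores. This sketch uses `np.select` as one possible alternative to the two `.loc` assignments; the `signal` values are hypothetical.

```python
import numpy as np
import pandas as pd

signal = pd.Series([-1.6, -1.0, -0.3, 0.0, 0.9, 1.0, 2.1])

# Long the spread when it is at least one standard deviation below its mean,
# short when it is at least one above, flat otherwise.
side = pd.Series(
    np.select([signal <= -1, signal >= 1], [1.0, -1.0], default=0.0),
    index=signal.index,
)

print(side.tolist())   # [1.0, 1.0, 0.0, 0.0, 0.0, -1.0, -1.0]
```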

Calculate and plot cumulative returns from the trading strategy

In [ ]:
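# Lag the side by one day so each return is earned on a position chosen
# from the previous day's signal, not from the same day's close
returns = trades.position.pct_change() * trades.side.shift(1)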
returns.cumsum().plot()
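
One caveat with percentage changes of a spread is that the spread can cross zero, which flips the sign of the result. The sketch below uses made-up numbers to show the effect and an alternative that some practitioners prefer: profit and loss in spread units, using the previous day's side. This is an illustrative assumption, not the notebook's method.

```python
import pandas as pd

spread = pd.Series([2.0, 1.0, -0.5, 0.5, 1.5])
side = pd.Series([0.0, 1.0, 1.0, 0.0, -1.0])

# pct_change divides by the previous value, so a rise from -0.5 to 0.5
# shows up as -2.0 (that is, -200%) even though the spread went up.
pct = spread.pct_change()

# Alternative: hold yesterday's side over today's change in the spread.
pnl = side.shift(1) * spread.diff()
cumulative_pnl = pnl.fillna(0.0).cumsum()

print(pct.tolist())
print(cumulative_pnl.tolist())
```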

Print the trades DataFrame for inspection

In [ ]:
trades

Calculate the spread using a rolling OLS model with a 30-day window

In [ ]:
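# Regress AAPL (S2) on AMZN (S1) so the rolling beta matches the
# spread definition S2 - beta * S1 used earlier
model = RollingOLS(endog=S2, exog=S1, window=30)
rres = model.fit()
spread = S2 - rres.params.AMZN * S1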
spread.name = 'spread'
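
Because no constant is added to the regressor here, the rolling fit has no intercept. For a single regressor that case has a closed form, sketched below with pandas only. The series `x` and `y` are synthetic and built so that the true slope is exactly 0.5.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
x = pd.Series(100 + np.cumsum(rng.normal(size=120)))   # hypothetical regressor
y = 0.5 * x                                            # exact linear relation, no noise

window = 30
# One-regressor OLS without intercept: beta = sum(x * y) / sum(x * x),
# evaluated over each trailing window.
rolling_beta = (x * y).rolling(window).sum() / (x * x).rolling(window).sum()

print(rolling_beta.dropna().head())
```

The first `window - 1` values are NaN because a full window is not yet available, which is also why the rolling spread starts with missing values.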

Calculate 1-day and 30-day moving averages of the spread

In [ ]:
spread_mavg1 = spread.rolling(1).mean()
spread_mavg1.name = 'spread 1d mavg'
spread_mavg30 = spread.rolling(30).mean()
spread_mavg30.name = 'spread 30d mavg'

Plot the 1-day and 30-day moving averages of the spread

In [ ]:
plt.plot(spread_mavg1.index, spread_mavg1.values)
plt.plot(spread_mavg30.index, spread_mavg30.values)
plt.legend(['1 Day Spread MAVG', '30 Day Spread MAVG'])
plt.ylabel('Spread')

Calculate the rolling 30-day standard deviation of the spread

In [ ]:
std_30 = spread.rolling(30).std()
std_30.name = 'std 30d'

Compute and plot the z-score of the spread using 30-day moving averages and standard deviation

In [ ]:
zscore_30_1 = (spread_mavg1 - spread_mavg30) / std_30
zscore_30_1.name = 'z-score'
zscore_30_1.plot()
plt.axhline(0, color='black')
plt.axhline(1.0, color='red', linestyle='--')
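
The rolling z-score can be wrapped in a small helper. The function name `rolling_zscore` and the synthetic `spread` are assumptions for this sketch. Unlike the full-sample `zscore` helper, each value here depends only on the trailing window, so it uses no information from later dates.

```python
import numpy as np
import pandas as pd

def rolling_zscore(series, window):
    """Z-score of each point relative to its trailing window."""
    mean = series.rolling(window).mean()
    std = series.rolling(window).std()   # pandas default: sample std (ddof=1)
    return (series - mean) / std

rng = np.random.default_rng(3)
spread = pd.Series(rng.normal(size=100))   # synthetic stand-in for the spread
z = rolling_zscore(spread, window=30)

# Manual check of the last value against the final 30 observations.
tail = spread.iloc[-30:]
manual_last = (spread.iloc[-1] - tail.mean()) / tail.std()
print(z.iloc[-1], manual_last)
```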

Plot the scaled stock prices and the rolling z-score for comparison

In [ ]:
plt.plot(S1.index, S1.values / 10)
plt.plot(S2.index, S2.values / 10)
plt.plot(zscore_30_1.index, zscore_30_1.values)
plt.legend(['S1 Price / 10', 'S2 Price / 10', 'Price Spread Rolling z-Score'])

Update the symbol list and download new data for another analysis period

In [ ]:
symbol_list = ['amzn', 'aapl']
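# auto_adjust=False keeps the 'Adj Close' column; recent yfinance releases
# adjust prices by default and then omit that column
data = yf.download(symbol_list, start='2015-01-01', end='2016-01-01', auto_adjust=False)['Adj Close']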

Select Amazon (AMZN) and Apple (AAPL) prices from the new data

In [ ]:
S1 = data.AMZN
S2 = data.AAPL

Perform a cointegration test on the new data and print the p-value

In [ ]:
score, pvalue, _ = coint(S1, S2)
print(pvalue)

PyQuant News is where finance practitioners level up with Python for quant finance, algorithmic trading, and market data analysis. Looking to get started? Check out the fastest growing, top-selling course to get started with Python for quant finance. For educational purposes. Not investment advice. Use at your own risk.