strategy-lab/to_explore/pyquantnews/21_PairsTrading.ipynb
David Brazda e3da60c647 daily update
2024-10-21 20:57:56 +02:00


{
"cells": [
{
"cell_type": "markdown",
"id": "7dc325ea",
"metadata": {},
"source": [
"<div style=\"background-color:#000;\"><img src=\"pqn.png\"></img></div>"
]
},
{
"cell_type": "markdown",
"id": "7fbdbb7d",
"metadata": {},
"source": [
"This notebook tests pairs of stocks for cointegration using historical price data. It downloads adjusted closing prices, identifies cointegrated pairs, and computes the spread between them. Cointegration indicates a stable long-run relationship between two price series, which is the basis for statistical arbitrage. A p-value heatmap and rolling z-scores help identify candidate pairs and trading signals. The approach is a common starting point for pairs trading strategies in quantitative finance."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "66c329f2",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "46e13192",
"metadata": {},
"outputs": [],
"source": [
"import statsmodels.api as sm\n",
"from statsmodels.tsa.stattools import coint\n",
"from statsmodels.regression.rolling import RollingOLS"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ace9406d",
"metadata": {},
"outputs": [],
"source": [
"import yfinance as yf\n",
"import seaborn\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"id": "fef80814",
"metadata": {},
"source": [
"Define the list of stock symbols to analyze"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b3656066",
"metadata": {},
"outputs": [],
"source": [
"symbol_list = ['meta', 'amzn', 'aapl', 'nflx', 'goog']"
]
},
{
"cell_type": "markdown",
"id": "aea87a6c",
"metadata": {},
"source": [
"Download historical adjusted closing prices for the specified symbols"
]
},
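{
"cell_type": "code",
"execution_count": null,
"id": "30fcec98",
"metadata": {
"lines_to_next_cell": 1
},
"outputs": [],
"source": [
"# auto_adjust=False keeps the 'Adj Close' column available in recent yfinance releases\n",
"data = yf.download(symbol_list, start='2014-01-01', end='2015-01-01', auto_adjust=False)['Adj Close']"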
]
},
{
"cell_type": "markdown",
"id": "060ae221",
"metadata": {},
"source": [
"Define a function to find cointegrated pairs of stocks"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "56b50383",
"metadata": {
"lines_to_next_cell": 1
},
"outputs": [],
"source": [
"def find_cointegrated_pairs(data):\n",
"    \"\"\"Find cointegrated stock pairs\n",
"\n",
"    This function calculates cointegration scores and p-values for stock pairs.\n",
"\n",
"    Parameters\n",
"    ----------\n",
"    data : pd.DataFrame\n",
"        DataFrame of stock prices with stocks as columns.\n",
"\n",
"    Returns\n",
"    -------\n",
"    score_matrix : np.ndarray\n",
"        Matrix of cointegration scores.\n",
"    pvalue_matrix : np.ndarray\n",
"        Matrix of p-values for cointegration tests.\n",
"    pairs : list\n",
"        List of cointegrated stock pairs.\n",
"    \"\"\"\n",
"\n",
"    # Initialize matrices for scores and p-values, and a list for pairs\n",
"    n = data.shape[1]\n",
"    score_matrix = np.zeros((n, n))\n",
"    pvalue_matrix = np.ones((n, n))\n",
"    keys = data.keys()\n",
"    pairs = []\n",
"\n",
"    # Loop over combinations of stock pairs to test for cointegration\n",
"    for i in range(n):\n",
"        for j in range(i + 1, n):\n",
"            S1 = data[keys[i]]\n",
"            S2 = data[keys[j]]\n",
"            result = coint(S1, S2)\n",
"            score = result[0]\n",
"            pvalue = result[1]\n",
"            score_matrix[i, j] = score\n",
"            pvalue_matrix[i, j] = pvalue\n",
"\n",
"            # Add pair to list if p-value is less than 0.05\n",
"            if pvalue < 0.05:\n",
"                pairs.append((keys[i], keys[j]))\n",
"\n",
"    return score_matrix, pvalue_matrix, pairs"
]
},
{
"cell_type": "markdown",
"id": "c154e226",
"metadata": {},
"source": [
"Find cointegrated pairs and store scores, p-values, and pairs"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "efff3232",
"metadata": {},
"outputs": [],
"source": [
"scores, pvalues, pairs = find_cointegrated_pairs(data)"
]
},
{
"cell_type": "markdown",
"id": "44112349",
"metadata": {},
"source": [
"Visualize the p-value matrix as a heatmap"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "288ff7d1",
"metadata": {},
"outputs": [],
"source": [
"# Label the axes with data.columns so the ticks follow the column order used to build the matrix\n",
"seaborn.heatmap(\n",
"    pvalues,\n",
"    xticklabels=data.columns,\n",
"    yticklabels=data.columns,\n",
"    cmap='RdYlGn_r',\n",
"    mask=(pvalues >= 0.10)\n",
")"
]
},
{
"cell_type": "markdown",
"id": "65dabf09",
"metadata": {},
"source": [
"Select two stocks, Amazon (AMZN) and Apple (AAPL), for further analysis"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2c41310c",
"metadata": {},
"outputs": [],
"source": [
"S1 = data.AMZN\n",
"S2 = data.AAPL"
]
},
{
"cell_type": "markdown",
"id": "5edd06ec",
"metadata": {},
"source": [
"Perform a cointegration test on the selected pair"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "83fc4b52",
"metadata": {},
"outputs": [],
"source": [
"score, pvalue, _ = coint(S1, S2)"
]
},
{
"cell_type": "markdown",
"id": "1ad95d73",
"metadata": {},
"source": [
"Print the p-value of the cointegration test"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "240c1103",
"metadata": {},
"outputs": [],
"source": [
"pvalue"
]
},
{
"cell_type": "markdown",
"id": "51882f40",
"metadata": {},
"source": [
"Add a constant term to the Amazon stock prices for regression analysis"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4bf6bfd7",
"metadata": {},
"outputs": [],
"source": [
"S1 = sm.add_constant(S1)"
]
},
{
"cell_type": "markdown",
"id": "fa8668d3",
"metadata": {},
"source": [
"Fit an Ordinary Least Squares (OLS) regression model using Apple as the dependent variable"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fbd88ec3",
"metadata": {},
"outputs": [],
"source": [
"results = sm.OLS(S2, S1).fit()"
]
},
{
"cell_type": "markdown",
"id": "e8d11ac1",
"metadata": {},
"source": [
"Remove the constant term from Amazon stock prices"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6d1de3f7",
"metadata": {},
"outputs": [],
"source": [
"S1 = S1.AMZN"
]
},
{
"cell_type": "markdown",
"id": "a38f4599",
"metadata": {},
"source": [
"Extract the regression coefficient (beta) for Amazon"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "03b56755",
"metadata": {},
"outputs": [],
"source": [
"b = results.params['AMZN']"
]
},
{
"cell_type": "markdown",
"id": "0199e5b8",
"metadata": {},
"source": [
"Calculate the spread between Apple and the beta-adjusted Amazon prices"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "67d972a7",
"metadata": {},
"outputs": [],
"source": [
"spread = S2 - b * S1"
]
},
{
"cell_type": "markdown",
"id": "da2513be",
"metadata": {},
"source": [
"Plot the spread and its mean value"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "29b6df5f",
"metadata": {
"lines_to_next_cell": 1
},
"outputs": [],
"source": [
"spread.plot()\n",
"plt.axhline(spread.mean(), color='black')\n",
"plt.legend(['Spread'])"
]
},
{
"cell_type": "markdown",
"id": "c0cc2832",
"metadata": {},
"source": [
"Define a function to calculate the z-score of a series"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "56b5ca81",
"metadata": {
"lines_to_next_cell": 1
},
"outputs": [],
"source": [
"def zscore(series):\n",
"    \"\"\"Calculate z-score of a series\n",
"\n",
"    This function returns the z-score for a given series.\n",
"\n",
"    Parameters\n",
"    ----------\n",
"    series : pd.Series\n",
"        A pandas Series for which to calculate z-score.\n",
"\n",
"    Returns\n",
"    -------\n",
"    zscore : pd.Series\n",
"        Z-score of the input series.\n",
"    \"\"\"\n",
"\n",
"    return (series - series.mean()) / np.std(series)"
]
},
{
"cell_type": "markdown",
"id": "ef4d6e5c",
"metadata": {},
"source": [
"Plot the z-score of the spread with mean and threshold lines"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e28d66d0",
"metadata": {},
"outputs": [],
"source": [
"zscore(spread).plot()\n",
"plt.axhline(zscore(spread).mean(), color='black')\n",
"plt.axhline(1.0, color='red', linestyle='--')\n",
"plt.axhline(-1.0, color='green', linestyle='--')\n",
"plt.legend(['Spread z-score', 'Mean', '+1', '-1'])"
]
},
{
"cell_type": "markdown",
"id": "daf80128",
"metadata": {},
"source": [
"Create a DataFrame with the z-score signal and the spread, which serves as a proxy for the value of the pair position"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3e3b8c8f",
"metadata": {},
"outputs": [],
"source": [
"trades = pd.concat([zscore(spread), S2 - b * S1], axis=1)\n",
"trades.columns = [\"signal\", \"position\"]"
]
},
{
"cell_type": "markdown",
"id": "82f2b445",
"metadata": {},
"source": [
"Add long and short positions based on z-score thresholds"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "102994fc",
"metadata": {},
"outputs": [],
"source": [
"trades[\"side\"] = 0.0\n",
"trades.loc[trades.signal <= -1, \"side\"] = 1\n",
"trades.loc[trades.signal >= 1, \"side\"] = -1"
]
},
{
"cell_type": "markdown",
"id": "20667ab3",
"metadata": {},
"source": [
"Calculate and plot cumulative returns from the trading strategy"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "202ffdea",
"metadata": {},
"outputs": [],
"source": [
"# Lag the side by one day so each return is earned on the prior day's signal\n",
"returns = trades.position.pct_change() * trades.side.shift(1)\n",
"returns.cumsum().plot()"
]
},
{
"cell_type": "markdown",
"id": "ade9d6ff",
"metadata": {},
"source": [
"Print the trades DataFrame for inspection"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a176d604",
"metadata": {},
"outputs": [],
"source": [
"trades"
]
},
{
"cell_type": "markdown",
"id": "d7180626",
"metadata": {},
"source": [
"Calculate the spread using a rolling OLS model with a 30-day window"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c7690187",
"metadata": {},
"outputs": [],
"source": [
"# Regress AAPL (S2) on AMZN (S1) so the rolling beta matches the spread definition S2 - beta * S1\n",
"model = RollingOLS(endog=S2, exog=S1, window=30)\n",
"rres = model.fit()\n",
"spread = S2 - rres.params.AMZN * S1\n",
"spread.name = 'spread'"
]
},
{
"cell_type": "markdown",
"id": "9b5f37bb",
"metadata": {},
"source": [
"Calculate 1-day and 30-day moving averages of the spread"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f64b6a8c",
"metadata": {},
"outputs": [],
"source": [
"spread_mavg1 = spread.rolling(1).mean()\n",
"spread_mavg1.name = 'spread 1d mavg'\n",
"spread_mavg30 = spread.rolling(30).mean()\n",
"spread_mavg30.name = 'spread 30d mavg'"
]
},
{
"cell_type": "markdown",
"id": "34db999b",
"metadata": {},
"source": [
"Plot the 1-day and 30-day moving averages of the spread"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "847c6226",
"metadata": {},
"outputs": [],
"source": [
"plt.plot(spread_mavg1.index, spread_mavg1.values)\n",
"plt.plot(spread_mavg30.index, spread_mavg30.values)\n",
"plt.legend(['1 Day Spread MAVG', '30 Day Spread MAVG'])\n",
"plt.ylabel('Spread')"
]
},
{
"cell_type": "markdown",
"id": "f45c9783",
"metadata": {},
"source": [
"Calculate the rolling 30-day standard deviation of the spread"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c4235ea2",
"metadata": {},
"outputs": [],
"source": [
"std_30 = spread.rolling(30).std()\n",
"std_30.name = 'std 30d'"
]
},
{
"cell_type": "markdown",
"id": "63186967",
"metadata": {},
"source": [
"Compute and plot the z-score of the spread using 30-day moving averages and standard deviation"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e3f5ed69",
"metadata": {},
"outputs": [],
"source": [
"zscore_30_1 = (spread_mavg1 - spread_mavg30) / std_30\n",
"zscore_30_1.name = 'z-score'\n",
"zscore_30_1.plot()\n",
"plt.axhline(0, color='black')\n",
"plt.axhline(1.0, color='red', linestyle='--')"
]
},
{
"cell_type": "markdown",
"id": "2639cbd5",
"metadata": {},
"source": [
"Plot the scaled stock prices and the rolling z-score for comparison"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9d976a6a",
"metadata": {},
"outputs": [],
"source": [
"plt.plot(S1.index, S1.values / 10)\n",
"plt.plot(S2.index, S2.values / 10)\n",
"plt.plot(zscore_30_1.index, zscore_30_1.values)\n",
"plt.legend(['S1 Price / 10', 'S2 Price / 10', 'Price Spread Rolling z-Score'])"
]
},
{
"cell_type": "markdown",
"id": "3e889d9b",
"metadata": {},
"source": [
"Update the symbol list and download new data for another analysis period"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "de1483ff",
"metadata": {},
"outputs": [],
"source": [
"symbol_list = ['amzn', 'aapl']\n",
"# auto_adjust=False keeps the 'Adj Close' column available in recent yfinance releases\n",
"data = yf.download(symbol_list, start='2015-01-01', end='2016-01-01', auto_adjust=False)['Adj Close']"
]
},
{
"cell_type": "markdown",
"id": "64d819f3",
"metadata": {},
"source": [
"Select Amazon (AMZN) and Apple (AAPL) prices from the new data"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4ae4d2ee",
"metadata": {},
"outputs": [],
"source": [
"S1 = data.AMZN\n",
"S2 = data.AAPL"
]
},
{
"cell_type": "markdown",
"id": "d48f9db4",
"metadata": {},
"source": [
"Perform a cointegration test on the new data and print the p-value"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1f80f3d8",
"metadata": {},
"outputs": [],
"source": [
"score, pvalue, _ = coint(S1, S2)\n",
"print(pvalue)"
]
},
{
"cell_type": "markdown",
"id": "c0e7a627",
"metadata": {},
"source": [
"<a href=\"https://pyquantnews.com/\">PyQuant News</a> is where finance practitioners level up with Python for quant finance, algorithmic trading, and market data analysis. Looking to get started? Check out the fastest-growing, top-selling course to <a href=\"https://gettingstartedwithpythonforquantfinance.com/\">get started with Python for quant finance</a>. For educational purposes. Not investment advice. Use at your own risk."
]
}
],
"metadata": {
"jupytext": {
"cell_metadata_filter": "-all",
"main_language": "python",
"notebook_metadata_filter": "-all"
}
},
"nbformat": 4,
"nbformat_minor": 5
}