698 lines
16 KiB
Plaintext
698 lines
16 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "7dc325ea",
|
|
"metadata": {},
|
|
"source": [
|
|
"<div style=\"background-color:#000;\"><img src=\"pqn.png\"></img></div>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "7fbdbb7d",
|
|
"metadata": {},
|
|
"source": [
|
|
"This code analyzes the cointegration of stock pairs using historical price data. It downloads stock prices, finds cointegrated pairs, and calculates their spread. Cointegration indicates a stable, long-term relationship between stock pairs, useful for statistical arbitrage. Visualization tools like heatmaps and rolling z-scores help to identify trading signals. The code is practical in pairs trading strategies and quantitative finance."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "66c329f2",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import numpy as np\n",
|
|
"import pandas as pd"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "46e13192",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import statsmodels.api as sm\n",
|
|
"from statsmodels.tsa.stattools import coint\n",
|
|
"from statsmodels.regression.rolling import RollingOLS"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "ace9406d",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import yfinance as yf\n",
|
|
"import seaborn\n",
|
|
"import matplotlib.pyplot as plt"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "fef80814",
|
|
"metadata": {},
|
|
"source": [
|
|
"Define the list of stock symbols to analyze"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "b3656066",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"symbol_list = ['meta', 'amzn', 'aapl', 'nflx', 'goog']"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "aea87a6c",
|
|
"metadata": {},
|
|
"source": [
|
|
"Download historical adjusted closing prices for the specified symbols"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "30fcec98",
|
|
"metadata": {
|
|
"lines_to_next_cell": 1
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"data = yf.download(symbol_list, start='2014-01-01', end='2015-01-01')['Adj Close']"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "060ae221",
|
|
"metadata": {},
|
|
"source": [
|
|
"Define a function to find cointegrated pairs of stocks"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "56b50383",
|
|
"metadata": {
|
|
"lines_to_next_cell": 1
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"def find_cointegrated_pairs(data):\n",
|
|
" \"\"\"Find cointegrated stock pairs\n",
|
|
"\n",
|
|
" This function calculates cointegration scores and p-values for stock pairs.\n",
|
|
"\n",
|
|
" Parameters\n",
|
|
" ----------\n",
|
|
" data : pd.DataFrame\n",
|
|
" DataFrame of stock prices with stocks as columns.\n",
|
|
"\n",
|
|
" Returns\n",
|
|
" -------\n",
|
|
" score_matrix : np.ndarray\n",
|
|
" Matrix of cointegration scores.\n",
|
|
" pvalue_matrix : np.ndarray\n",
|
|
" Matrix of p-values for cointegration tests.\n",
|
|
" pairs : list\n",
|
|
" List of cointegrated stock pairs.\n",
|
|
" \"\"\"\n",
|
|
" \n",
|
|
" # Initialize matrices for scores and p-values, and a list for pairs\n",
|
|
" n = data.shape[1]\n",
|
|
" score_matrix = np.zeros((n, n))\n",
|
|
" pvalue_matrix = np.ones((n, n))\n",
|
|
" keys = data.keys()\n",
|
|
" pairs = []\n",
|
|
"\n",
|
|
" # Loop over combinations of stock pairs to test for cointegration\n",
|
|
" for i in range(n):\n",
|
|
" for j in range(i + 1, n):\n",
|
|
" S1 = data[keys[i]]\n",
|
|
" S2 = data[keys[j]]\n",
|
|
" result = coint(S1, S2)\n",
|
|
" score = result[0]\n",
|
|
" pvalue = result[1]\n",
|
|
" score_matrix[i, j] = score\n",
|
|
" pvalue_matrix[i, j] = pvalue\n",
|
|
" \n",
|
|
" # Add pair to list if p-value is less than 0.05\n",
|
|
" if pvalue < 0.05:\n",
|
|
" pairs.append((keys[i], keys[j]))\n",
|
|
" \n",
|
|
" return score_matrix, pvalue_matrix, pairs"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "c154e226",
|
|
"metadata": {},
|
|
"source": [
|
|
"Find cointegrated pairs and store scores, p-values, and pairs"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "efff3232",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"scores, pvalues, pairs = find_cointegrated_pairs(data)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "44112349",
|
|
"metadata": {},
|
|
"source": [
|
|
"Visualize the p-value matrix as a heatmap"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "288ff7d1",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"seaborn.heatmap(\n",
|
|
" pvalues, \n",
|
|
" xticklabels=symbol_list, \n",
|
|
" yticklabels=symbol_list, \n",
|
|
" cmap='RdYlGn_r', \n",
|
|
" mask=(pvalues >= 0.10)\n",
|
|
")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "65dabf09",
|
|
"metadata": {},
|
|
"source": [
|
|
"Select two stocks, Amazon (AMZN) and Apple (AAPL), for further analysis"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "2c41310c",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"S1 = data.AMZN\n",
|
|
"S2 = data.AAPL"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "5edd06ec",
|
|
"metadata": {},
|
|
"source": [
|
|
"Perform a cointegration test on the selected pair"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "83fc4b52",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"score, pvalue, _ = coint(S1, S2)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "1ad95d73",
|
|
"metadata": {},
|
|
"source": [
|
|
"Print the p-value of the cointegration test"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "240c1103",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"pvalue"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "51882f40",
|
|
"metadata": {},
|
|
"source": [
|
|
"Add a constant term to the Amazon stock prices for regression analysis"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "4bf6bfd7",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"S1 = sm.add_constant(S1)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "fa8668d3",
|
|
"metadata": {},
|
|
"source": [
|
|
"Fit an Ordinary Least Squares (OLS) regression model using Apple as the dependent variable"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "fbd88ec3",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"results = sm.OLS(S2, S1).fit()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "e8d11ac1",
|
|
"metadata": {},
|
|
"source": [
|
|
"Remove the constant term from Amazon stock prices"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "6d1de3f7",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"S1 = S1.AMZN"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "a38f4599",
|
|
"metadata": {},
|
|
"source": [
|
|
"Extract the regression coefficient (beta) for Amazon"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "03b56755",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"b = results.params['AMZN']"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "0199e5b8",
|
|
"metadata": {},
|
|
"source": [
|
|
"Calculate the spread between Apple and the beta-adjusted Amazon prices"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "67d972a7",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"spread = S2 - b * S1"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "da2513be",
|
|
"metadata": {},
|
|
"source": [
|
|
"Plot the spread and its mean value"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "29b6df5f",
|
|
"metadata": {
|
|
"lines_to_next_cell": 1
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"spread.plot()\n",
|
|
"plt.axhline(spread.mean(), color='black')\n",
|
|
"plt.legend(['Spread'])"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "c0cc2832",
|
|
"metadata": {},
|
|
"source": [
|
|
"Define a function to calculate the z-score of a series"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "56b5ca81",
|
|
"metadata": {
|
|
"lines_to_next_cell": 1
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"def zscore(series):\n",
|
|
" \"\"\"Calculate z-score of a series\n",
|
|
"\n",
|
|
" This function returns the z-score for a given series.\n",
|
|
"\n",
|
|
" Parameters\n",
|
|
" ----------\n",
|
|
" series : pd.Series\n",
|
|
" A pandas Series for which to calculate z-score.\n",
|
|
"\n",
|
|
" Returns\n",
|
|
" -------\n",
|
|
" zscore : pd.Series\n",
|
|
" Z-score of the input series.\n",
|
|
" \"\"\"\n",
|
|
" \n",
|
|
" return (series - series.mean()) / np.std(series)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "ef4d6e5c",
|
|
"metadata": {},
|
|
"source": [
|
|
"Plot the z-score of the spread with mean and threshold lines"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "e28d66d0",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"zscore(spread).plot()\n",
|
|
"plt.axhline(zscore(spread).mean(), color='black')\n",
|
|
"plt.axhline(1.0, color='red', linestyle='--')\n",
|
|
"plt.axhline(-1.0, color='green', linestyle='--')\n",
|
|
"plt.legend(['Spread z-score', 'Mean', '+1', '-1'])"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "daf80128",
|
|
"metadata": {},
|
|
"source": [
|
|
"Create a DataFrame with the signal and position size in the pair"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "3e3b8c8f",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"trades = pd.concat([zscore(spread), S2 - b * S1], axis=1)\n",
|
|
"trades.columns = [\"signal\", \"position\"]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "82f2b445",
|
|
"metadata": {},
|
|
"source": [
|
|
"Add long and short positions based on z-score thresholds"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "102994fc",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"trades[\"side\"] = 0.0\n",
|
|
"trades.loc[trades.signal <= -1, \"side\"] = 1\n",
|
|
"trades.loc[trades.signal >= 1, \"side\"] = -1"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "20667ab3",
|
|
"metadata": {},
|
|
"source": [
|
|
"Calculate and plot cumulative returns from the trading strategy"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "202ffdea",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"returns = trades.position.pct_change() * trades.side\n",
|
|
"returns.cumsum().plot()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "ade9d6ff",
|
|
"metadata": {},
|
|
"source": [
|
|
"Print the trades DataFrame for inspection"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "a176d604",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"trades"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "d7180626",
|
|
"metadata": {},
|
|
"source": [
|
|
"Calculate the spread using a rolling OLS model with a 30-day window"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "c7690187",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"model = RollingOLS(endog=S1, exog=S2, window=30)\n",
|
|
"rres = model.fit()\n",
|
|
"spread = S2 - rres.params.AAPL * S1\n",
|
|
"spread.name = 'spread'"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "9b5f37bb",
|
|
"metadata": {},
|
|
"source": [
|
|
"Calculate 1-day and 30-day moving averages of the spread"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "f64b6a8c",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"spread_mavg1 = spread.rolling(1).mean()\n",
|
|
"spread_mavg1.name = 'spread 1d mavg'\n",
|
|
"spread_mavg30 = spread.rolling(30).mean()\n",
|
|
"spread_mavg30.name = 'spread 30d mavg'"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "34db999b",
|
|
"metadata": {},
|
|
"source": [
|
|
"Plot the 1-day and 30-day moving averages of the spread"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "847c6226",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"plt.plot(spread_mavg1.index, spread_mavg1.values)\n",
|
|
"plt.plot(spread_mavg30.index, spread_mavg30.values)\n",
|
|
"plt.legend(['1 Day Spread MAVG', '30 Day Spread MAVG'])\n",
|
|
"plt.ylabel('Spread')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "f45c9783",
|
|
"metadata": {},
|
|
"source": [
|
|
"Calculate the rolling 30-day standard deviation of the spread"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "c4235ea2",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"std_30 = spread.rolling(30).std()\n",
|
|
"std_30.name = 'std 30d'"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "63186967",
|
|
"metadata": {},
|
|
"source": [
|
|
"Compute and plot the z-score of the spread using 30-day moving averages and standard deviation"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "e3f5ed69",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"zscore_30_1 = (spread_mavg1 - spread_mavg30) / std_30\n",
|
|
"zscore_30_1.name = 'z-score'\n",
|
|
"zscore_30_1.plot()\n",
|
|
"plt.axhline(0, color='black')\n",
|
|
"plt.axhline(1.0, color='red', linestyle='--')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "2639cbd5",
|
|
"metadata": {},
|
|
"source": [
|
|
"Plot the scaled stock prices and the rolling z-score for comparison"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "9d976a6a",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"plt.plot(S1.index, S1.values / 10)\n",
|
|
"plt.plot(S2.index, S2.values / 10)\n",
|
|
"plt.plot(zscore_30_1.index, zscore_30_1.values)\n",
|
|
"plt.legend(['S1 Price / 10', 'S2 Price / 10', 'Price Spread Rolling z-Score'])"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "3e889d9b",
|
|
"metadata": {},
|
|
"source": [
|
|
"Update the symbol list and download new data for another analysis period"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "de1483ff",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"symbol_list = ['amzn', 'aapl']\n",
|
|
"data = yf.download(symbol_list, start='2015-01-01', end='2016-01-01')['Adj Close']"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "64d819f3",
|
|
"metadata": {},
|
|
"source": [
|
|
"Select Amazon (AMZN) and Apple (AAPL) prices from the new data"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "4ae4d2ee",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"S1 = data.AMZN\n",
|
|
"S2 = data.AAPL"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "d48f9db4",
|
|
"metadata": {},
|
|
"source": [
|
|
"Perform a cointegration test on the new data and print the p-value"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "1f80f3d8",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"score, pvalue, _ = coint(S1, S2)\n",
|
|
"print(pvalue)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "c0e7a627",
|
|
"metadata": {},
|
|
"source": [
|
|
"<a href=\"https://pyquantnews.com/\">PyQuant News</a> is where finance practitioners level up with Python for quant finance, algorithmic trading, and market data analysis. Looking to get started? Check out the fastest growing, top-selling course to <a href=\"https://gettingstartedwithpythonforquantfinance.com/\">get started with Python for quant finance</a>. For educational purposes. Not investment advise. Use at your own risk."
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"jupytext": {
|
|
"cell_metadata_filter": "-all",
|
|
"main_language": "python",
|
|
"notebook_metadata_filter": "-all"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 5
|
|
}
|