{ "cells": [ { "cell_type": "markdown", "id": "7fbdbb7d", "metadata": {}, "source": [ "This notebook tests pairs of stocks for cointegration using historical price data. It downloads adjusted closing prices, runs the Engle-Granger cointegration test on every pair, and computes the spread of a selected pair. Two cointegrated price series share a stable long-run relationship: some linear combination of them is stationary and tends to revert to its mean, which is the basis of pairs trading and other statistical arbitrage strategies. A heatmap of p-values and rolling z-scores of the spread help identify candidate pairs and trading signals." ] }, { "cell_type": "code", "execution_count": null, "id": "66c329f2", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": null, "id": "46e13192", "metadata": {}, "outputs": [], "source": [ "import statsmodels.api as sm\n", "from statsmodels.tsa.stattools import coint\n", "from statsmodels.regression.rolling import RollingOLS" ] }, { "cell_type": "code", "execution_count": null, "id": "ace9406d", "metadata": {}, "outputs": [], "source": [ "import yfinance as yf\n", "import seaborn\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "id": "fef80814", "metadata": {}, "source": [ "Define the list of stock symbols to analyze" ] }, { "cell_type": "code", "execution_count": null, "id": "b3656066", "metadata": {}, "outputs": [], "source": [ "symbol_list = ['META', 'AMZN', 'AAPL', 'NFLX', 'GOOG']" ] }, { "cell_type": "markdown", "id": "aea87a6c", "metadata": {}, "source": [ "Download historical adjusted closing prices for the specified symbols, keeping the columns in the same order as the symbol list" ] }, { "cell_type": "code", "execution_count": null, "id": "30fcec98", "metadata": { "lines_to_next_cell": 1 }, "outputs": [], "source": [ "# auto_adjust=False keeps the 'Adj Close' column available in recent yfinance releases\n", "data = yf.download(symbol_list, start='2014-01-01', end='2015-01-01', auto_adjust=False)['Adj Close']\n", "# yfinance may reorder the ticker columns; restore symbol_list order so plot labels line up\n", "data = data[symbol_list]" ] }, { "cell_type": "markdown", "id": "060ae221", "metadata": {}, "source": [ "Define a function to find cointegrated pairs of stocks" ] }, { "cell_type": "code", "execution_count": null, "id": 
"56b50383", "metadata": { "lines_to_next_cell": 1 }, "outputs": [], "source": [ "def find_cointegrated_pairs(data):\n", " \"\"\"Find cointegrated stock pairs\n", "\n", " This function calculates cointegration scores and p-values for stock pairs.\n", "\n", " Parameters\n", " ----------\n", " data : pd.DataFrame\n", " DataFrame of stock prices with stocks as columns.\n", "\n", " Returns\n", " -------\n", " score_matrix : np.ndarray\n", " Matrix of cointegration scores.\n", " pvalue_matrix : np.ndarray\n", " Matrix of p-values for cointegration tests.\n", " pairs : list\n", " List of cointegrated stock pairs.\n", " \"\"\"\n", " \n", " # Initialize matrices for scores and p-values, and a list for pairs\n", " n = data.shape[1]\n", " score_matrix = np.zeros((n, n))\n", " pvalue_matrix = np.ones((n, n))\n", " keys = data.keys()\n", " pairs = []\n", "\n", " # Loop over combinations of stock pairs to test for cointegration\n", " for i in range(n):\n", " for j in range(i + 1, n):\n", " S1 = data[keys[i]]\n", " S2 = data[keys[j]]\n", " result = coint(S1, S2)\n", " score = result[0]\n", " pvalue = result[1]\n", " score_matrix[i, j] = score\n", " pvalue_matrix[i, j] = pvalue\n", " \n", " # Add pair to list if p-value is less than 0.05\n", " if pvalue < 0.05:\n", " pairs.append((keys[i], keys[j]))\n", " \n", " return score_matrix, pvalue_matrix, pairs" ] }, { "cell_type": "markdown", "id": "c154e226", "metadata": {}, "source": [ "Find cointegrated pairs and store scores, p-values, and pairs" ] }, { "cell_type": "code", "execution_count": null, "id": "efff3232", "metadata": {}, "outputs": [], "source": [ "scores, pvalues, pairs = find_cointegrated_pairs(data)" ] }, { "cell_type": "markdown", "id": "44112349", "metadata": {}, "source": [ "Visualize the p-value matrix as a heatmap" ] }, { "cell_type": "code", "execution_count": null, "id": "288ff7d1", "metadata": {}, "outputs": [], "source": [ "seaborn.heatmap(\n", " pvalues, \n", " xticklabels=symbol_list, \n", " 
yticklabels=symbol_list, \n", " cmap='RdYlGn_r', \n", " mask=(pvalues >= 0.10)\n", ")" ] }, { "cell_type": "markdown", "id": "65dabf09", "metadata": {}, "source": [ "Select two stocks, Amazon (AMZN) and Apple (AAPL), for further analysis" ] }, { "cell_type": "code", "execution_count": null, "id": "2c41310c", "metadata": {}, "outputs": [], "source": [ "S1 = data.AMZN\n", "S2 = data.AAPL" ] }, { "cell_type": "markdown", "id": "5edd06ec", "metadata": {}, "source": [ "Perform a cointegration test on the selected pair" ] }, { "cell_type": "code", "execution_count": null, "id": "83fc4b52", "metadata": {}, "outputs": [], "source": [ "score, pvalue, _ = coint(S1, S2)" ] }, { "cell_type": "markdown", "id": "1ad95d73", "metadata": {}, "source": [ "Print the p-value of the cointegration test" ] }, { "cell_type": "code", "execution_count": null, "id": "240c1103", "metadata": {}, "outputs": [], "source": [ "pvalue" ] }, { "cell_type": "markdown", "id": "51882f40", "metadata": {}, "source": [ "Add a constant term to the Amazon stock prices for regression analysis" ] }, { "cell_type": "code", "execution_count": null, "id": "4bf6bfd7", "metadata": {}, "outputs": [], "source": [ "S1 = sm.add_constant(S1)" ] }, { "cell_type": "markdown", "id": "fa8668d3", "metadata": {}, "source": [ "Fit an Ordinary Least Squares (OLS) regression model using Apple as the dependent variable" ] }, { "cell_type": "code", "execution_count": null, "id": "fbd88ec3", "metadata": {}, "outputs": [], "source": [ "results = sm.OLS(S2, S1).fit()" ] }, { "cell_type": "markdown", "id": "e8d11ac1", "metadata": {}, "source": [ "Remove the constant term from Amazon stock prices" ] }, { "cell_type": "code", "execution_count": null, "id": "6d1de3f7", "metadata": {}, "outputs": [], "source": [ "S1 = S1.AMZN" ] }, { "cell_type": "markdown", "id": "a38f4599", "metadata": {}, "source": [ "Extract the regression coefficient (beta) for Amazon" ] }, { "cell_type": "code", "execution_count": null, "id": "03b56755", 
"metadata": {}, "outputs": [], "source": [ "b = results.params['AMZN']" ] }, { "cell_type": "markdown", "id": "0199e5b8", "metadata": {}, "source": [ "Calculate the spread between Apple and the beta-adjusted Amazon prices" ] }, { "cell_type": "code", "execution_count": null, "id": "67d972a7", "metadata": {}, "outputs": [], "source": [ "spread = S2 - b * S1" ] }, { "cell_type": "markdown", "id": "da2513be", "metadata": {}, "source": [ "Plot the spread and its mean value" ] }, { "cell_type": "code", "execution_count": null, "id": "29b6df5f", "metadata": { "lines_to_next_cell": 1 }, "outputs": [], "source": [ "spread.plot()\n", "plt.axhline(spread.mean(), color='black')\n", "plt.legend(['Spread'])" ] }, { "cell_type": "markdown", "id": "c0cc2832", "metadata": {}, "source": [ "Define a function to calculate the z-score of a series" ] }, { "cell_type": "code", "execution_count": null, "id": "56b5ca81", "metadata": { "lines_to_next_cell": 1 }, "outputs": [], "source": [ "def zscore(series):\n", " \"\"\"Calculate z-score of a series\n", "\n", " This function returns the z-score for a given series.\n", "\n", " Parameters\n", " ----------\n", " series : pd.Series\n", " A pandas Series for which to calculate z-score.\n", "\n", " Returns\n", " -------\n", " zscore : pd.Series\n", " Z-score of the input series.\n", " \"\"\"\n", " \n", " return (series - series.mean()) / np.std(series)" ] }, { "cell_type": "markdown", "id": "ef4d6e5c", "metadata": {}, "source": [ "Plot the z-score of the spread with mean and threshold lines" ] }, { "cell_type": "code", "execution_count": null, "id": "e28d66d0", "metadata": {}, "outputs": [], "source": [ "zscore(spread).plot()\n", "plt.axhline(zscore(spread).mean(), color='black')\n", "plt.axhline(1.0, color='red', linestyle='--')\n", "plt.axhline(-1.0, color='green', linestyle='--')\n", "plt.legend(['Spread z-score', 'Mean', '+1', '-1'])" ] }, { "cell_type": "markdown", "id": "daf80128", "metadata": {}, "source": [ "Create a DataFrame with the 
signal and position size in the pair" ] }, { "cell_type": "code", "execution_count": null, "id": "3e3b8c8f", "metadata": {}, "outputs": [], "source": [ "trades = pd.concat([zscore(spread), S2 - b * S1], axis=1)\n", "trades.columns = [\"signal\", \"position\"]" ] }, { "cell_type": "markdown", "id": "82f2b445", "metadata": {}, "source": [ "Go long the spread when the z-score is at or below -1 and short when it is at or above +1" ] }, { "cell_type": "code", "execution_count": null, "id": "102994fc", "metadata": {}, "outputs": [], "source": [ "trades[\"side\"] = 0.0\n", "trades.loc[trades.signal <= -1, \"side\"] = 1\n", "trades.loc[trades.signal >= 1, \"side\"] = -1" ] }, { "cell_type": "markdown", "id": "20667ab3", "metadata": {}, "source": [ "Calculate and plot cumulative returns from the trading strategy, lagging the side by one day so each day's return is earned on the position chosen the day before" ] }, { "cell_type": "code", "execution_count": null, "id": "202ffdea", "metadata": {}, "outputs": [], "source": [ "# Shift the side by one day to avoid trading on a signal from the same day's close\n", "returns = trades.position.pct_change() * trades.side.shift(1)\n", "returns.cumsum().plot()" ] }, { "cell_type": "markdown", "id": "ade9d6ff", "metadata": {}, "source": [ "Display the trades DataFrame for inspection" ] }, { "cell_type": "code", "execution_count": null, "id": "a176d604", "metadata": {}, "outputs": [], "source": [ "trades" ] }, { "cell_type": "markdown", "id": "d7180626", "metadata": {}, "source": [ "Calculate the spread using a rolling OLS model with a 30-day window, regressing Apple on Amazon as in the static regression" ] }, { "cell_type": "code", "execution_count": null, "id": "c7690187", "metadata": {}, "outputs": [], "source": [ "# Regress S2 (AAPL) on S1 (AMZN) so the rolling beta matches the spread definition S2 - beta * S1\n", "model = RollingOLS(endog=S2, exog=S1, window=30)\n", "rres = model.fit()\n", "spread = S2 - rres.params.AMZN * S1\n", "spread.name = 'spread'" ] }, { "cell_type": "markdown", "id": "9b5f37bb", "metadata": {}, "source": [ "Calculate 1-day and 30-day moving averages of the spread" ] }, { "cell_type": "code", "execution_count": null, "id": "f64b6a8c", "metadata": {}, "outputs": [], "source": [ "spread_mavg1 = spread.rolling(1).mean()\n", "spread_mavg1.name = 'spread 1d mavg'\n", "spread_mavg30 = 
spread.rolling(30).mean()\n", "spread_mavg30.name = 'spread 30d mavg'" ] }, { "cell_type": "markdown", "id": "34db999b", "metadata": {}, "source": [ "Plot the 1-day and 30-day moving averages of the spread" ] }, { "cell_type": "code", "execution_count": null, "id": "847c6226", "metadata": {}, "outputs": [], "source": [ "plt.plot(spread_mavg1.index, spread_mavg1.values)\n", "plt.plot(spread_mavg30.index, spread_mavg30.values)\n", "plt.legend(['1 Day Spread MAVG', '30 Day Spread MAVG'])\n", "plt.ylabel('Spread')" ] }, { "cell_type": "markdown", "id": "f45c9783", "metadata": {}, "source": [ "Calculate the rolling 30-day standard deviation of the spread" ] }, { "cell_type": "code", "execution_count": null, "id": "c4235ea2", "metadata": {}, "outputs": [], "source": [ "std_30 = spread.rolling(30).std()\n", "std_30.name = 'std 30d'" ] }, { "cell_type": "markdown", "id": "63186967", "metadata": {}, "source": [ "Compute and plot the z-score of the spread using 30-day moving averages and standard deviation" ] }, { "cell_type": "code", "execution_count": null, "id": "e3f5ed69", "metadata": {}, "outputs": [], "source": [ "zscore_30_1 = (spread_mavg1 - spread_mavg30) / std_30\n", "zscore_30_1.name = 'z-score'\n", "zscore_30_1.plot()\n", "plt.axhline(0, color='black')\n", "plt.axhline(1.0, color='red', linestyle='--')" ] }, { "cell_type": "markdown", "id": "2639cbd5", "metadata": {}, "source": [ "Plot the scaled stock prices and the rolling z-score for comparison" ] }, { "cell_type": "code", "execution_count": null, "id": "9d976a6a", "metadata": {}, "outputs": [], "source": [ "plt.plot(S1.index, S1.values / 10)\n", "plt.plot(S2.index, S2.values / 10)\n", "plt.plot(zscore_30_1.index, zscore_30_1.values)\n", "plt.legend(['S1 Price / 10', 'S2 Price / 10', 'Price Spread Rolling z-Score'])" ] }, { "cell_type": "markdown", "id": "3e889d9b", "metadata": {}, "source": [ "Update the symbol list and download new data for another analysis period" ] }, { "cell_type": "code", 
"execution_count": null, "id": "de1483ff", "metadata": {}, "outputs": [], "source": [ "symbol_list = ['AMZN', 'AAPL']\n", "# auto_adjust=False keeps the 'Adj Close' column available in recent yfinance releases\n", "data = yf.download(symbol_list, start='2015-01-01', end='2016-01-01', auto_adjust=False)['Adj Close']" ] }, { "cell_type": "markdown", "id": "64d819f3", "metadata": {}, "source": [ "Select Amazon (AMZN) and Apple (AAPL) prices from the new data" ] }, { "cell_type": "code", "execution_count": null, "id": "4ae4d2ee", "metadata": {}, "outputs": [], "source": [ "S1 = data.AMZN\n", "S2 = data.AAPL" ] }, { "cell_type": "markdown", "id": "d48f9db4", "metadata": {}, "source": [ "Perform a cointegration test on the new data and print the p-value" ] }, { "cell_type": "code", "execution_count": null, "id": "1f80f3d8", "metadata": {}, "outputs": [], "source": [ "score, pvalue, _ = coint(S1, S2)\n", "print(pvalue)" ] }, { "cell_type": "markdown", "id": "c0e7a627", "metadata": {}, "source": [ "PyQuant News is where finance practitioners level up with Python for quant finance, algorithmic trading, and market data analysis. For educational purposes. Not investment advice. Use at your own risk." ] } ], "metadata": { "jupytext": { "cell_metadata_filter": "-all", "main_language": "python", "notebook_metadata_filter": "-all" } }, "nbformat": 4, "nbformat_minor": 5 }