{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "54e72fa4",
   "metadata": {},
   "source": [
    "<div style=\"background-color:#000;\"><img src=\"pqn.png\"></img></div>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0440e960",
   "metadata": {},
   "source": [
    "This notebook downloads historical stock prices and computes daily returns and pairwise correlations. It plots the price history next to a correlation heatmap, then uses OLS regression to strip the market return out of each sector ETF and both the market and sector effects out of each stock. Finally, it plots the cumulative returns of the residuals, reports their average pairwise correlation, and shows how forecast correlation reduces effective portfolio breadth. This is useful for risk management and portfolio optimization."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7c23baec",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import pandas as pd\n",
    "import seaborn as sns\n",
    "import statsmodels.api as sm\n",
    "import yfinance as yf\n",
    "import warnings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f1010b27",
   "metadata": {},
   "outputs": [],
   "source": [
    "warnings.filterwarnings(\"ignore\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "87be8f2d",
   "metadata": {},
   "source": [
    "Define a list of stock tickers to download data for"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b6061c80",
   "metadata": {},
   "outputs": [],
   "source": [
    "tickers = [\"WFC\", \"JPM\", \"USB\", \"XOM\", \"VLO\", \"SLB\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dd06f63f",
   "metadata": {},
   "source": [
    "Download historical stock price data from Yahoo Finance for the specified tickers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3f877b54",
   "metadata": {},
   "outputs": [],
   "source": [
    "data = yf.download(tickers, start=\"2015-01-01\", end=\"2023-12-31\")[\"Close\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cc1b6237",
   "metadata": {},
   "source": [
    "Calculate daily percentage returns and drop any missing values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3bad2658",
   "metadata": {},
   "outputs": [],
   "source": [
    "returns = data.pct_change().dropna()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "341b12c7",
   "metadata": {},
   "source": [
    "Create a figure with two subplots to plot stock prices and correlation heatmap"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3b446c6c",
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, (ax1, ax2) = plt.subplots(ncols=2)\n",
    "fig.tight_layout()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9c7032e5",
   "metadata": {},
   "source": [
    "Calculate the correlation matrix of the returns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "71c19d12",
   "metadata": {},
   "outputs": [],
   "source": [
    "corr = returns.corr()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0d68921e",
   "metadata": {},
   "source": [
    "Plot the historical stock prices on the first subplot"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d00a220b",
   "metadata": {},
   "outputs": [],
   "source": [
    "left = data.plot(ax=ax1)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3fe58d51",
   "metadata": {},
   "source": [
    "Plot the correlation heatmap on the second subplot"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "66f9c906",
   "metadata": {},
   "outputs": [],
   "source": [
    "right = sns.heatmap(\n",
    "    corr, ax=ax2, vmin=-1, vmax=1,\n",
    "    # Label the axes from corr itself so the labels match its column order\n",
    "    xticklabels=corr.columns, yticklabels=corr.columns\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b8b511a8",
   "metadata": {},
   "source": [
    "Calculate and print the average pairwise correlation among the stocks"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "966c48c4",
   "metadata": {},
   "outputs": [],
   "source": [
    "average_corr = np.mean(\n",
    "    corr.values[np.triu_indices_from(corr.values, k=1)]\n",
    ")\n",
    "print(f\"Average pairwise correlation: {average_corr}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "152f6d91",
   "metadata": {},
   "source": [
    "Define the market and sector ETF symbols and the stocks that belong to each sector"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3e749269",
   "metadata": {},
   "outputs": [],
   "source": [
    "market_symbols = [\"XLF\", \"SPY\", \"XLE\"]\n",
    "sector_1_stocks = [\"WFC\", \"JPM\", \"USB\"]\n",
    "sector_2_stocks = [\"XOM\", \"VLO\", \"SLB\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8fb99e34",
   "metadata": {},
   "source": [
    "Combine all tickers into one list"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "31030296",
   "metadata": {},
   "outputs": [],
   "source": [
    "tickers = market_symbols + sector_1_stocks + sector_2_stocks"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "67fa30a3",
   "metadata": {},
   "source": [
    "Download historical price data for the combined list of tickers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0cda0643",
   "metadata": {},
   "outputs": [],
   "source": [
    "price = yf.download(tickers, start=\"2015-01-01\", end=\"2023-12-31\").Close"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7c928e9d",
   "metadata": {},
   "source": [
    "Calculate daily percentage returns and drop any missing values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "45653d87",
   "metadata": {},
   "outputs": [],
   "source": [
    "returns = price.pct_change().dropna()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "53640dce",
   "metadata": {},
   "source": [
    "Separate market returns and sector returns from the combined returns data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b35541cd",
   "metadata": {},
   "outputs": [],
   "source": [
    "market_returns = returns[\"SPY\"]\n",
    "sector_1_returns = returns[\"XLF\"]\n",
    "sector_2_returns = returns[\"XLE\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "349719d1",
   "metadata": {},
   "source": [
    "Initialize DataFrames to store each stock's residuals: one after removing the market, one after removing both the market and the sector"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "28459d4f",
   "metadata": {
    "lines_to_next_cell": 1
   },
   "outputs": [],
   "source": [
    "stock_returns = returns.drop(market_symbols, axis=1)\n",
    "residuals_market = stock_returns.copy() * 0.0\n",
    "residuals = stock_returns.copy() * 0.0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f2d025e1",
   "metadata": {
    "lines_to_next_cell": 1
   },
   "outputs": [],
   "source": [
    "def ols_residual(y, x):\n",
    "    \"\"\"Calculate residuals from an OLS regression of y on x (no intercept)\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    y : pd.Series\n",
    "        Dependent variable series\n",
    "    x : pd.Series\n",
    "        Independent variable series\n",
    "    \n",
    "    Returns\n",
    "    -------\n",
    "    residuals : pd.Series\n",
    "        Residuals from OLS regression\n",
    "    \"\"\"\n",
    "    \n",
    "    results = sm.OLS(y, x).fit()\n",
    "    return results.resid"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "29195a5e",
   "metadata": {},
   "source": [
    "Calculate residuals of sector returns after removing market returns influence"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e742d786",
   "metadata": {},
   "outputs": [],
   "source": [
    "sector_1_excess = ols_residual(sector_1_returns, market_returns)\n",
    "sector_2_excess = ols_residual(sector_2_returns, market_returns)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4b930f72",
   "metadata": {},
   "source": [
    "Calculate residuals for each stock in sector 1 after removing market and sector influence"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ba96972b",
   "metadata": {},
   "outputs": [],
   "source": [
    "for stock in sector_1_stocks:\n",
    "    residuals_market[stock] = ols_residual(returns[stock], market_returns)\n",
    "    residuals[stock] = ols_residual(residuals_market[stock], sector_1_excess)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a990f544",
   "metadata": {},
   "source": [
    "Calculate residuals for each stock in sector 2 after removing market and sector influence"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9d560696",
   "metadata": {},
   "outputs": [],
   "source": [
    "for stock in sector_2_stocks:\n",
    "    residuals_market[stock] = ols_residual(returns[stock], market_returns)\n",
    "    residuals[stock] = ols_residual(residuals_market[stock], sector_2_excess)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4f5a5058",
   "metadata": {},
   "source": [
    "Plot cumulative returns of residuals and the correlation heatmap of residuals"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b85c9ece",
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, (ax1, ax2) = plt.subplots(ncols=2)\n",
    "fig.tight_layout()\n",
    "corr = residuals.corr()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8ae9e522",
   "metadata": {},
   "source": [
    "Plot cumulative returns of residuals"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "38612953",
   "metadata": {},
   "outputs": [],
   "source": [
    "left = (1 + residuals).cumprod().plot(ax=ax1)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "39b1f294",
   "metadata": {},
   "source": [
    "Plot correlation heatmap of residuals"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e0588ecf",
   "metadata": {},
   "outputs": [],
   "source": [
    "right = sns.heatmap(\n",
    "    corr,\n",
    "    ax=ax2,\n",
    "    vmin=-1,\n",
    "    vmax=1,\n",
    "    xticklabels=residuals.columns,\n",
    "    yticklabels=residuals.columns,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4b6393ec",
   "metadata": {},
   "source": [
    "Calculate and print the average pairwise correlation among the residuals"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ab1c97bd",
   "metadata": {
    "lines_to_next_cell": 1
   },
   "outputs": [],
   "source": [
    "average_corr = np.mean(corr.values[np.triu_indices_from(corr.values, k=1)])\n",
    "print(f\"Average pairwise correlation of residuals: {average_corr}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5e9956f6",
   "metadata": {
    "lines_to_next_cell": 1
   },
   "outputs": [],
   "source": [
    "def buckle_BR_const(N, rho):\n",
    "    \"\"\"Calculate effective breadth for N assets with constant pairwise correlation\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    N : int\n",
    "        Number of assets\n",
    "    rho : float or np.ndarray\n",
    "        Pairwise correlation value(s), applied to every pair of assets\n",
    "    \n",
    "    Returns\n",
    "    -------\n",
    "    effective_breadth : np.ndarray\n",
    "        Effective number of independent bets\n",
    "    \"\"\"\n",
    "    \n",
    "    return N / (1 + rho * (N - 1))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2ab38dbf",
   "metadata": {},
   "source": [
    "Generate a range of correlation values and plot the effective breadth"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "973cd58a",
   "metadata": {},
   "outputs": [],
   "source": [
    "corr = np.linspace(start=0, stop=1.0, num=500)\n",
    "plt.plot(corr, buckle_BR_const(6, corr))\n",
    "plt.ylabel('Effective Breadth (Number of Bets)')\n",
    "plt.xlabel('Forecast Correlation')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8f9ff2b9",
   "metadata": {},
   "source": [
    "<a href=\"https://pyquantnews.com/\">PyQuant News</a> is where finance practitioners level up with Python for quant finance, algorithmic trading, and market data analysis. Looking to get started? Check out the fastest growing, top-selling course to <a href=\"https://gettingstartedwithpythonforquantfinance.com/\">get started with Python for quant finance</a>. For educational purposes. Not investment advice. Use at your own risk."
   ]
  }
 ],
 "metadata": {
  "jupytext": {
   "cell_metadata_filter": "-all",
   "main_language": "python",
   "notebook_metadata_filter": "-all"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}