Files
strategy-lab/to_explore/pyquantnews/22_PortfolioPCA.ipynb
David Brazda e3da60c647 daily update
2024-10-21 20:57:56 +02:00

352 lines
8.3 KiB
Plaintext

{
"cells": [
{
"cell_type": "markdown",
"id": "301963b1",
"metadata": {},
"source": [
"<div style=\"background-color:#000;\"><img src=\"pqn.png\"></img></div>"
]
},
{
"cell_type": "markdown",
"id": "667b85b6",
"metadata": {},
"source": [
"This code performs Principal Component Analysis (PCA) on a portfolio of stocks to identify principal components driving the returns. It retrieves historical stock data, calculates daily returns, and applies PCA to these returns. The explained variance and principal components are visualized, and the factor returns and exposures are computed. These statistical risk factors help in understanding how much of the portfolio's returns arise from unobservable features. This is useful for portfolio management, risk assessment, and diversification analysis."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "71d0282c",
"metadata": {},
"outputs": [],
"source": [
"import yfinance as yf"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "746b91c7",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"from sklearn.decomposition import PCA\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"id": "2863a52a",
"metadata": {},
"source": [
"Define a list of stock symbols to retrieve historical data for"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4f98bfc4",
"metadata": {},
"outputs": [],
"source": [
"symbols = [\n",
" 'IBM',\n",
" 'MSFT',\n",
" 'META',\n",
" 'INTC',\n",
" 'NEM',\n",
" 'AU',\n",
" 'AEM',\n",
" 'GFI'\n",
"]"
]
},
{
"cell_type": "markdown",
"id": "02e922c9",
"metadata": {},
"source": [
"Download historical adjusted close prices for the defined stock symbols within the specified date range"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a563405d",
"metadata": {},
"outputs": [],
"source": [
"data = yf.download(symbols, start=\"2020-01-01\", end=\"2022-11-30\")"
]
},
{
"cell_type": "markdown",
"id": "1f52f786",
"metadata": {},
"source": [
"Calculate daily returns for the portfolio by computing percentage change and dropping NaN values"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0b26bb2f",
"metadata": {},
"outputs": [],
"source": [
"portfolio_returns = data['Adj Close'].pct_change().dropna()"
]
},
{
"cell_type": "markdown",
"id": "d6a1198c",
"metadata": {},
"source": [
"Apply Principal Component Analysis (PCA) to the portfolio returns to identify key components"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a9c1ac8b",
"metadata": {},
"outputs": [],
"source": [
"pca = PCA(n_components=3)\n",
"pca.fit(portfolio_returns)"
]
},
{
"cell_type": "markdown",
"id": "7dea06ae",
"metadata": {},
"source": [
"Retrieve the explained variance ratio and the principal components from the PCA model"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3ed0fced",
"metadata": {},
"outputs": [],
"source": [
"pct = pca.explained_variance_ratio_\n",
"pca_components = pca.components_"
]
},
{
"cell_type": "markdown",
"id": "91fe5bfd",
"metadata": {},
"source": [
"Calculate the cumulative explained variance for visualization and create an array for component indices"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4b40fb20",
"metadata": {},
"outputs": [],
"source": [
"cum_pct = np.cumsum(pct)\n",
"x = np.arange(1, len(pct) + 1, 1)"
]
},
{
"cell_type": "markdown",
"id": "a58e88c7",
"metadata": {},
"source": [
"Plot the percentage contribution of each principal component and the cumulative contribution"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "184bb55c",
"metadata": {},
"outputs": [],
"source": [
"plt.subplot(1, 2, 1)\n",
"plt.bar(x, pct * 100, align=\"center\")\n",
"plt.title('Contribution (%)')\n",
"plt.xlabel('Component')\n",
"plt.xticks(x)\n",
"plt.xlim([0, 4])\n",
"plt.ylim([0, 100])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9fbd347d",
"metadata": {},
"outputs": [],
"source": [
"plt.subplot(1, 2, 2)\n",
"plt.plot(x, cum_pct * 100, 'ro-')\n",
"plt.title('Cumulative contribution (%)')\n",
"plt.xlabel('Component')\n",
"plt.xticks(x)\n",
"plt.xlim([0, 4])\n",
"plt.ylim([0, 100])"
]
},
{
"cell_type": "markdown",
"id": "5b0a4722",
"metadata": {},
"source": [
"Construct statistical risk factors using the principal components and portfolio returns"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "027a88cf",
"metadata": {},
"outputs": [],
"source": [
"X = np.asarray(portfolio_returns)\n",
"factor_returns = X.dot(pca_components.T)\n",
"factor_returns = pd.DataFrame(\n",
" columns=[\"f1\", \"f2\", \"f3\"], \n",
" index=portfolio_returns.index,\n",
" data=factor_returns\n",
")"
]
},
{
"cell_type": "markdown",
"id": "3ba5fc5f",
"metadata": {},
"source": [
"Display the first few rows of the factor returns DataFrame"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3f7ee295",
"metadata": {},
"outputs": [],
"source": [
"factor_returns.head()"
]
},
{
"cell_type": "markdown",
"id": "ae7022ed",
"metadata": {},
"source": [
"Calculate and display the factor exposures by transposing the principal components matrix"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "69a1609b",
"metadata": {},
"outputs": [],
"source": [
"factor_exposures = pd.DataFrame(\n",
" index=[\"f1\", \"f2\", \"f3\"], \n",
" columns=portfolio_returns.columns,\n",
" data=pca_components\n",
").T"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "406e4727",
"metadata": {},
"outputs": [],
"source": [
"factor_exposures"
]
},
{
"cell_type": "markdown",
"id": "73a266c3",
"metadata": {},
"source": [
"Sort and plot the factor exposures for the first principal component (f1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7ff5ca2a",
"metadata": {},
"outputs": [],
"source": [
"factor_exposures.f1.sort_values().plot.bar()"
]
},
{
"cell_type": "markdown",
"id": "65c58ba2",
"metadata": {},
"source": [
"Scatter plot to visualize factor exposures of the first two principal components (f1 and f2)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6fd70496",
"metadata": {},
"outputs": [],
"source": [
"labels = factor_exposures.index\n",
"data = factor_exposures.values\n",
"plt.scatter(data[:, 0], data[:, 1])\n",
"plt.xlabel('factor exposure of PC1')\n",
"plt.ylabel('factor exposure of PC2')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "789e384f",
"metadata": {},
"outputs": [],
"source": [
"for label, x, y in zip(labels, data[:, 0], data[:, 1]):\n",
" plt.annotate(\n",
" label,\n",
" xy=(x, y), \n",
" xytext=(-20, 20),\n",
" textcoords='offset points',\n",
" arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0')\n",
" )"
]
},
{
"cell_type": "markdown",
"id": "425b0e57",
"metadata": {},
"source": [
"<a href=\"https://pyquantnews.com/\">PyQuant News</a> is where finance practitioners level up with Python for quant finance, algorithmic trading, and market data analysis. Looking to get started? Check out the fastest growing, top-selling course to <a href=\"https://gettingstartedwithpythonforquantfinance.com/\">get started with Python for quant finance</a>. For educational purposes. Not investment advise. Use at your own risk."
]
}
],
"metadata": {
"jupytext": {
"cell_metadata_filter": "-all",
"main_language": "python",
"notebook_metadata_filter": "-all"
}
},
"nbformat": 4,
"nbformat_minor": 5
}