strategy-lab/to_explore/pyquantnews/22_PortfolioPCA.ipynb
David Brazda e3da60c647 daily update
2024-10-21 20:57:56 +02:00

This code performs Principal Component Analysis (PCA) on the daily returns of a portfolio of stocks to identify the principal components that drive those returns. It retrieves historical stock data, calculates daily returns, and fits a three-component PCA to them. The explained variance of each component is visualized, and the factor returns and factor exposures are computed. These statistical risk factors help quantify how much of the variation in the portfolio's returns is explained by unobserved common factors. This is useful for portfolio management, risk assessment, and diversification analysis.

In [ ]:
import yfinance as yf
In [ ]:
import pandas as pd
import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

Define a list of stock symbols to retrieve historical data for

In [ ]:
symbols = [
    'IBM',
    'MSFT',
    'META',
    'INTC',
    'NEM',
    'AU',
    'AEM',
    'GFI'
]

Download historical price data (including adjusted close prices) for the defined stock symbols within the specified date range

In [ ]:
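# auto_adjust=False keeps the separate 'Adj Close' column used below;
# recent yfinance releases default to auto_adjust=True, which omits it.
data = yf.download(symbols, start="2020-01-01", end="2022-11-30", auto_adjust=False)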

Calculate daily returns for each stock in the portfolio by computing the percentage change of the adjusted close and dropping rows that contain NaN values

In [ ]:
portfolio_returns = data['Adj Close'].pct_change().dropna()
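
To make the return calculation concrete, here is a minimal sketch on synthetic prices. The tickers `AAA` and `BBB` and their values are made up for illustration and are not market data.

```python
import pandas as pd

# Synthetic prices for two hypothetical tickers (illustrative only).
prices = pd.DataFrame(
    {"AAA": [100.0, 102.0, 101.0], "BBB": [50.0, 50.0, 55.0]},
    index=pd.date_range("2020-01-01", periods=3, freq="D"),
)

# pct_change computes p_t / p_{t-1} - 1 column by column. The first row has
# no predecessor and is NaN, so dropna() removes it.
returns = prices.pct_change().dropna()
print(returns)
```

Note that `dropna()` removes every row in which any column is NaN, so a symbol with a shorter price history would shorten the sample for all symbols.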

Apply Principal Component Analysis (PCA) to the portfolio returns to identify key components

In [ ]:
pca = PCA(n_components=3)
pca.fit(portfolio_returns)
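
It can help to see what the fit computes. The sketch below, on synthetic returns (randomly generated, an assumption made only for illustration), compares scikit-learn's result with an eigendecomposition of the sample covariance matrix of the returns.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic "returns": 500 days x 4 assets sharing one common driver.
common = rng.normal(0, 0.01, size=(500, 1))
X = common + rng.normal(0, 0.005, size=(500, 4))

pca_demo = PCA(n_components=3).fit(X)

# Eigenvalues of the sample covariance matrix (np.cov uses ddof=1 by default).
cov = np.cov(X, rowvar=False)
eigvals = np.linalg.eigh(cov)[0]      # returned in ascending order
top = eigvals[::-1][:3]               # three largest eigenvalues

# Each ratio is an eigenvalue divided by the total variance of all assets.
ratio = top / eigvals.sum()
print(np.allclose(pca_demo.explained_variance_, top))
print(np.allclose(pca_demo.explained_variance_ratio_, ratio))
```

Under these assumptions, the explained variances are the largest eigenvalues of the covariance matrix, which is why the first component tends to capture a market-wide or sector-wide move shared by many assets.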

Retrieve the explained variance ratio and the principal components from the PCA model

In [ ]:
pct = pca.explained_variance_ratio_
pca_components = pca.components_

Calculate the cumulative explained variance for visualization and create an array for component indices

In [ ]:
cum_pct = np.cumsum(pct)
x = np.arange(1, len(pct) + 1, 1)
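
The cumulative sum can also be used to decide how many components to keep. The following is a small sketch with hypothetical ratios. The helper `components_needed` is not part of the notebook or of any library.

```python
import numpy as np

def components_needed(cum_pct, threshold):
    """Return the smallest number of components whose cumulative explained
    variance reaches `threshold`, or None if the threshold is never reached."""
    idx = int(np.searchsorted(cum_pct, threshold))
    return idx + 1 if idx < len(cum_pct) else None

# Hypothetical explained variance ratios for three retained components.
demo_pct = np.array([0.55, 0.25, 0.10])
demo_cum = np.cumsum(demo_pct)

print(components_needed(demo_cum, 0.75))
print(components_needed(demo_cum, 0.95))
```

Because the notebook keeps only three of the eight possible components, its cumulative curve tops out below 100%, so a high threshold may not be reachable without refitting with more components.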

Plot the percentage contribution of each principal component and the cumulative contribution

In [ ]:
plt.subplot(1, 2, 1)
plt.bar(x, pct * 100, align="center")
plt.title('Contribution (%)')
plt.xlabel('Component')
plt.xticks(x)
plt.xlim([0, 4])
plt.ylim([0, 100])
# Second panel, kept in the same cell so both subplots share one figure
plt.subplot(1, 2, 2)
plt.plot(x, cum_pct * 100, 'ro-')
plt.title('Cumulative contribution (%)')
plt.xlabel('Component')
plt.xticks(x)
plt.xlim([0, 4])
plt.ylim([0, 100])

Construct statistical risk factors using the principal components and portfolio returns

In [ ]:
X = np.asarray(portfolio_returns)
factor_returns = X.dot(pca_components.T)
factor_returns = pd.DataFrame(
    columns=["f1", "f2", "f3"], 
    index=portfolio_returns.index,
    data=factor_returns
)
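
One detail worth checking is how this projection relates to `pca.transform`. scikit-learn subtracts the per-column mean before projecting, while the cell above projects the raw returns. The sketch below uses synthetic data (an assumption for illustration) and suggests that the two differ only by a constant offset per factor.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X_demo = rng.normal(0.001, 0.01, size=(250, 5))   # synthetic daily returns

pca_demo = PCA(n_components=3).fit(X_demo)

raw_proj = X_demo.dot(pca_demo.components_.T)     # projection of raw returns
centered_proj = pca_demo.transform(X_demo)        # subtracts pca_demo.mean_ first

# The gap between the two is the projected mean return, constant over time.
offset = pca_demo.mean_.dot(pca_demo.components_.T)
print(np.allclose(raw_proj - centered_proj, offset))
```

The raw projection keeps the average return of each factor, which is often what is wanted for factor returns, while `transform` yields zero-mean scores.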

Display the first few rows of the factor returns DataFrame

In [ ]:
factor_returns.head()

Calculate and display the factor exposures by transposing the principal components matrix

In [ ]:
factor_exposures = pd.DataFrame(
    index=["f1", "f2", "f3"], 
    columns=portfolio_returns.columns,
    data=pca_components
).T
In [ ]:
factor_exposures
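
As a sanity check on the exposures, the sketch below uses synthetic returns and hypothetical tickers `A` to `D`. It suggests two properties: the exposure columns are orthonormal, and the share of variance left unexplained by the three-factor reconstruction equals one minus the sum of the explained variance ratios.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
cols = ["A", "B", "C", "D"]                        # hypothetical tickers
demo_returns = pd.DataFrame(rng.normal(0, 0.01, size=(300, 4)), columns=cols)

pca_demo = PCA(n_components=3).fit(demo_returns)
exposures = pd.DataFrame(
    pca_demo.components_, index=["f1", "f2", "f3"], columns=cols
).T

# Exposure columns are orthonormal: B^T B is the 3x3 identity matrix.
gram = exposures.T.dot(exposures)

# Rebuild returns from three factors and measure what is left over.
factors = pca_demo.transform(demo_returns)
approx = factors @ exposures.values.T + pca_demo.mean_
resid = demo_returns.values - approx
resid_share = resid.var(axis=0, ddof=1).sum() / demo_returns.values.var(axis=0, ddof=1).sum()

print(np.allclose(gram.values, np.eye(3)))
print(np.isclose(resid_share, 1 - pca_demo.explained_variance_ratio_.sum()))
```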

Sort and plot the factor exposures for the first principal component (f1)

In [ ]:
factor_exposures.f1.sort_values().plot.bar()

Scatter plot to visualize factor exposures of the first two principal components (f1 and f2)

In [ ]:
labels = factor_exposures.index
# Use a new name so the price DataFrame `data` is not overwritten
exposures = factor_exposures.values
plt.scatter(exposures[:, 0], exposures[:, 1])
plt.xlabel('factor exposure of PC1')
plt.ylabel('factor exposure of PC2')
# Annotate in the same cell so the labels land on the scatter plot above
for label, x_pos, y_pos in zip(labels, exposures[:, 0], exposures[:, 1]):
    plt.annotate(
        label,
        xy=(x_pos, y_pos),
        xytext=(-20, 20),
        textcoords='offset points',
        arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0')
    )

PyQuant News is where finance practitioners level up with Python for quant finance, algorithmic trading, and market data analysis. Looking to get started? Check out the fastest-growing, top-selling course to get started with Python for quant finance. For educational purposes. Not investment advice. Use at your own risk.