strategy-lab/to_explore/pyquantnews/65_KMedoids.ipynb
David Brazda e3da60c647 daily update
2024-10-21 20:57:56 +02:00


This notebook clusters Nasdaq-100 stocks by annualized return and volatility using the K-Medoids algorithm. It fetches historical price data, calculates annualized returns and volatilities, uses the elbow method to choose the number of clusters, and plots the resulting groups. Grouping stocks this way can support portfolio construction and help identify stocks with similar risk and return profiles.

In [ ]:
import numpy as np
import pandas as pd
from sklearn_extra.cluster import KMedoids
import matplotlib.pyplot as plt
from openbb_terminal.sdk import openbb

Configure default plot style and parameters for visualizations

In [ ]:
plt.style.use("default")
plt.rcParams["figure.figsize"] = [5.5, 4.0]
plt.rcParams["figure.dpi"] = 140
plt.rcParams["lines.linewidth"] = 0.75
plt.rcParams["font.size"] = 8

Fetch Nasdaq-100 tickers from Wikipedia and retrieve historical stock data from the OpenBB Terminal SDK

In [ ]:
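# The position of the constituents table on the Wikipedia page can change;
# index 4 is the one used here and may need adjusting
nq = pd.read_html("https://en.wikipedia.org/wiki/Nasdaq-100")[4]
symbols = nq.Ticker.tolist()
data = openbb.stocks.ca.hist(
    symbols,
    start_date="2020-01-01",
    end_date="2022-12-31"
)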

Calculate annualized returns and volatilities from the historical stock data

In [ ]:
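# Daily mean and standard deviation of returns per ticker, annualized:
# the mean is scaled by 252 trading days, the std by sqrt(252)
moments = (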
    data
    .pct_change()
    .describe()
    .T[["mean", "std"]]
    .rename(columns={"mean": "returns", "std": "vol"})
) * [252, np.sqrt(252)]
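
The annualization step can be checked on synthetic data. The sketch below is illustrative only: the tickers `AAA` and `BBB`, the random prices, and the name `moments_demo` are assumptions, not part of the notebook. It computes the same two columns explicitly, which makes the scaling easier to see: the mean daily return is multiplied by 252 and the daily standard deviation by the square root of 252.

```python
import numpy as np
import pandas as pd

# Synthetic daily prices for two hypothetical tickers (not real market data)
rng = np.random.default_rng(42)
daily_returns = rng.normal(loc=0.0005, scale=0.01, size=(500, 2))
prices = pd.DataFrame(
    100 * np.cumprod(1 + daily_returns, axis=0),
    columns=["AAA", "BBB"],
)

# Simple daily returns; the first row is NaN and is dropped
rets = prices.pct_change().dropna()

# Mean scales linearly with time (x 252); std scales with sqrt(time)
moments_demo = pd.DataFrame(
    {
        "returns": rets.mean() * 252,
        "vol": rets.std() * np.sqrt(252),
    }
)
print(moments_demo)
```

The result has one row per ticker and the columns `returns` and `vol`, the same layout as `moments` above.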

Calculate the inertia (for K-Medoids, the sum of distances from each stock to its nearest medoid) for different numbers of clusters to determine the optimal number using the elbow method

In [ ]:
sse = []
for k in range(2, 15):
    km = KMedoids(n_clusters=k).fit(moments)
    sse.append(km.inertia_)
In [ ]:
plt.plot(range(2, 15), sse)
plt.xlabel("Number of clusters (k)")
plt.ylabel("Inertia")
plt.title("Elbow Curve")
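
To make the quantity on the elbow curve concrete, here is a minimal NumPy sketch of K-Medoids that alternates between assigning points to the nearest medoid and re-picking each cluster's medoid. It is an illustration under stated assumptions, not the `sklearn_extra` implementation: the helper names `pairwise_dist` and `k_medoids`, the synthetic data `X_demo`, and the random initialization are all hypothetical. The returned inertia is the sum of Euclidean distances from each point to its nearest medoid.

```python
import numpy as np

def pairwise_dist(X):
    """Euclidean distance between every pair of rows in X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def k_medoids(X, k, max_iter=100, seed=0):
    """Simple alternating K-Medoids; returns (medoid indices, labels, inertia)."""
    rng = np.random.default_rng(seed)
    D = pairwise_dist(X)
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest medoid
        labels = D[:, medoids].argmin(axis=1)
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size == 0:
                continue  # keep the old medoid if a cluster is empty
            # Update step: the member with the smallest total distance
            # to the other members becomes the new medoid
            costs = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[j] = members[costs.argmin()]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = D[:, medoids].argmin(axis=1)
    inertia = D[np.arange(len(X)), medoids[labels]].sum()
    return medoids, labels, inertia

# Synthetic (return, vol) points around three hypothetical centers
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.2], [0.3, 0.4], [0.6, 0.8]])
X_demo = np.vstack([c + rng.normal(scale=0.02, size=(20, 2)) for c in centers])

# Inertia for a range of k, analogous to the `sse` list above
ks = list(range(2, 7))
inertias = [k_medoids(X_demo, k)[2] for k in ks]
print(dict(zip(ks, np.round(inertias, 3))))
```

Because each medoid is an actual data point, the cluster centers are always real stocks rather than averages. The elbow is read off the curve as the value of k after which adding clusters reduces inertia only slightly.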

Fit the K-Medoids algorithm with the optimal number of clusters (in this case, 5) and obtain cluster labels

In [ ]:
km = KMedoids(n_clusters=5).fit(moments)
labels = km.labels_
unique_labels = set(labels)
colors = [
    plt.cm.Spectral(each) for each in np.linspace(0, 1, len(unique_labels))
]
In [ ]:
labels
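
The raw label array is hard to read on its own. One possible way to inspect it is to attach the labels to the moments table and summarize each cluster. The sketch below uses made-up values: the tickers, the numbers in `moments_demo`, and `labels_demo` are hypothetical stand-ins for `moments` and `labels`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for `moments` and `labels` (not real market data)
moments_demo = pd.DataFrame(
    {
        "returns": [0.10, 0.12, 0.45, 0.50, -0.05],
        "vol": [0.20, 0.22, 0.60, 0.55, 0.30],
    },
    index=["AAA", "BBB", "CCC", "DDD", "EEE"],
)
labels_demo = np.array([0, 0, 1, 1, 2])

labeled = moments_demo.assign(cluster=labels_demo)

# Size and average return/volatility of each cluster
summary = labeled.groupby("cluster").agg(
    n=("returns", "size"),
    mean_return=("returns", "mean"),
    mean_vol=("vol", "mean"),
)

# Tickers belonging to each cluster
members = {k: list(v) for k, v in labeled.groupby("cluster").groups.items()}

print(summary)
print(members)
```

With the notebook's own objects, the same pattern would start from `moments.assign(cluster=labels)`.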

Visualize the clustering results by plotting annualized returns and volatilities, color-coded by cluster

In [ ]:
for k, col in zip(unique_labels, colors):
    class_member_mask = labels == k

    xy = moments[class_member_mask]
    plt.plot(
        xy.iloc[:, 0],
        xy.iloc[:, 1],
        "o",
        markerfacecolor=tuple(col),
        markeredgecolor="k",
    )
In [ ]:
plt.plot(
    km.cluster_centers_[:, 0],
    km.cluster_centers_[:, 1],
    "o",
    markerfacecolor="cyan",
    markeredgecolor="k",
)
plt.xlabel("Ann. Return")
plt.ylabel("Ann. Vol.")

PyQuant News is where finance practitioners level up with Python for quant finance, algorithmic trading, and market data analysis. For educational purposes. Not investment advice. Use at your own risk.