{ "cells": [ { "cell_type": "markdown", "id": "344122e6", "metadata": {}, "source": [ "
" ] }, { "cell_type": "markdown", "id": "fedd6a2c", "metadata": {}, "source": [ "This code performs clustering on Nasdaq-100 stock returns and volatilities to identify distinct groups using the K-Medoids algorithm. It fetches historical stock data, calculates annualized returns and volatilities, and visualizes the clustering results. The Elbow method is used to determine the optimal number of clusters. This approach is useful for financial analysis, portfolio management, and identifying patterns in stock performance." ] }, { "cell_type": "code", "execution_count": null, "id": "a3b6f094", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "from sklearn_extra.cluster import KMedoids\n", "import matplotlib.pyplot as plt\n", "from openbb_terminal.sdk import openbb" ] }, { "cell_type": "markdown", "id": "67dda957", "metadata": {}, "source": [ "Configure default plot style and parameters for visualizations" ] }, { "cell_type": "code", "execution_count": null, "id": "a254a022", "metadata": {}, "outputs": [], "source": [ "plt.style.use(\"default\")\n", "plt.rcParams[\"figure.figsize\"] = [5.5, 4.0]\n", "plt.rcParams[\"figure.dpi\"] = 140\n", "plt.rcParams[\"lines.linewidth\"] = 0.75\n", "plt.rcParams[\"font.size\"] = 8" ] }, { "cell_type": "markdown", "id": "144440ca", "metadata": {}, "source": [ "Fetch Nasdaq-100 tickers from Wikipedia and retrieve historical stock data from the OpenBB Terminal SDK" ] }, { "cell_type": "code", "execution_count": null, "id": "9bfb4664", "metadata": {}, "outputs": [], "source": [ "nq = pd.read_html(\"https://en.wikipedia.org/wiki/Nasdaq-100\")[4]\n", "symbols = nq.Ticker.tolist()\n", "data = openbb.stocks.ca.hist(\n", " symbols, \n", " start_date=\"2020-01-01\", \n", " end_date=\"2022-12-31\"\n", ")" ] }, { "cell_type": "markdown", "id": "835e4118", "metadata": {}, "source": [ "Calculate annualized returns and volatilities from the historical stock data" ] }, { "cell_type": "code", "execution_count": null, "id": "293200a9", "metadata": {}, "outputs": [], "source": [ "moments = (\n", " data\n", " .pct_change()\n", " .describe()\n", " .T[[\"mean\", \"std\"]]\n", " .rename(columns={\"mean\": \"returns\", \"std\": \"vol\"})\n", ") * [252, np.sqrt(252)]" ] }, { "cell_type": "markdown", "id": "896c573b", "metadata": {}, "source": [ "Calculate the sum of squared errors (SSE) for different numbers of clusters to determine the optimal number using the Elbow method" ] }, { "cell_type": "code", "execution_count": null, "id": "a4728d89", "metadata": {}, "outputs": [], "source": [ "sse = []\n", "for k in range(2, 15):\n", " km = KMedoids(n_clusters=k).fit(moments)\n", " sse.append(km.inertia_)" ] }, { "cell_type": "code", "execution_count": null, "id": "fe9f1536", "metadata": {}, "outputs": [], "source": [ "plt.plot(range(2, 15), sse)\n", "plt.title(\"Elbow Curve\")" ] }, { "cell_type": "markdown", "id": "69b91a09", "metadata": {}, "source": [ "Fit the K-Medoids algorithm with the optimal number of clusters (in this case, 5) and obtain cluster labels" ] }, { "cell_type": "code", "execution_count": null, "id": "918cbfa6", "metadata": {}, "outputs": [], "source": [ "km = KMedoids(n_clusters=5).fit(moments)\n", "labels = km.labels_\n", "unique_labels = set(labels)\n", "colors = [\n", " plt.cm.Spectral(each) for each in np.linspace(0, 1, len(unique_labels))\n", "]" ] }, { "cell_type": "code", "execution_count": null, "id": "b8687502", "metadata": {}, "outputs": [], "source": [ "labels" ] }, { "cell_type": "markdown", "id": "1cffebb3", "metadata": {}, "source": [ "Visualize the clustering results by plotting annualized returns and volatilities, color-coded by cluster" ] }, { "cell_type": "code", "execution_count": null, "id": "7f465b31", "metadata": {}, "outputs": [], "source": [ "for k, col in zip(unique_labels, colors):\n", " class_member_mask = labels == k\n", "\n", " xy = moments[class_member_mask]\n", " plt.plot(\n", " xy.iloc[:, 0],\n", " xy.iloc[:, 1],\n", " \"o\",\n", " markerfacecolor=tuple(col),\n", " markeredgecolor=\"k\",\n", " )" ] }, { "cell_type": "code", "execution_count": null, "id": "97140a52", "metadata": {}, "outputs": [], "source": [ "plt.plot(\n", " km.cluster_centers_[:, 0],\n", " km.cluster_centers_[:, 1],\n", " \"o\",\n", " markerfacecolor=\"cyan\",\n", " markeredgecolor=\"k\",\n", ")\n", "plt.xlabel(\"Return\")\n", "plt.ylabel(\"Ann. Vol.\")" ] }, { "cell_type": "markdown", "id": "ec75d2f7", "metadata": {}, "source": [ "PyQuant News is where finance practitioners level up with Python for quant finance, algorithmic trading, and market data analysis. Looking to get started? Check out the fastest growing, top-selling course to get started with Python for quant finance. For educational purposes. Not investment advise. Use at your own risk." ] } ], "metadata": { "jupytext": { "cell_metadata_filter": "-all", "main_language": "python", "notebook_metadata_filter": "-all" } }, "nbformat": 4, "nbformat_minor": 5 }