strategy-lab/to_explore/notebooks/PQN_ParamCV.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "b9d33f66-67f4-483f-9613-87b3298a8fc1",
   "metadata": {},
   "source": [
    "# How to cross-validate a parameterized trading strategy"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "23954d08-d7cf-43ec-84b2-cd6dbbfc5c15",
   "metadata": {},
   "source": [
    "Trading strategies often rely on parameters. Enhancing and effectively cross-validating these parameters can provide a competitive advantage in the market. However, creating a reliable cross-validation schema is challenging due to risks like look-ahead bias and other pitfalls that can lead to overestimating a strategy's performance. With [VectorBT PRO](https://vectorbt.pro/), you can easily access and implement a variety of sophisticated cross-validation methods with just a few lines of code."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d522552e-dd90-46de-a579-6df68154f91b",
   "metadata": {},
   "source": [
    "## Imports and data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1d482486-ce8d-4474-8623-7a08dfba157c",
   "metadata": {},
   "source": [
    "Let's import VBT PRO and the few libraries relevant for our analysis."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "039c1b36-bb5a-4272-8887-d2ff7770184e",
   "metadata": {},
   "outputs": [],
   "source": [
    "from vectorbtpro import *\n",
    "# whats_imported()\n",
    "\n",
    "vbt.settings.set_theme(\"dark\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f0b6912a-09c7-4c72-b381-c35135ca627f",
   "metadata": {},
   "source": [
    "The first step involves acquiring data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1f432668-8698-4c21-a9b4-d8c4c40972e9",
   "metadata": {},
   "outputs": [],
   "source": [
    "SYMBOL = \"AAPL\"\n",
    "START = \"2010\"\n",
    "END = \"now\"\n",
    "TIMEFRAME = \"day\"\n",
    "\n",
    "data = vbt.YFData.pull(SYMBOL, start=START, end=END, timeframe=TIMEFRAME)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9058618c-6395-4bbc-8e7a-b0cdc6394a85",
   "metadata": {},
   "source": [
    "## Cross-validation schema"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ea50af21-ff28-4d40-870b-7fcdf6e155e1",
   "metadata": {},
   "source": [
    "Next, we'll set up a \"splitter,\" which divides a date range into smaller segments according to a chosen schema. For instance, let's allocate 12 months for training data and another 12 months for testing data, with this cycle repeating every 3 months."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cc405ebe-ae7d-4687-8470-7d1dbf91aad6",
   "metadata": {},
   "outputs": [],
   "source": [
    "TRAIN = 12\n",
    "TEST = 12\n",
    "EVERY = 3\n",
    "OFFSET = vbt.offset(\"M\")\n",
    "\n",
    "splitter = vbt.Splitter.from_ranges(\n",
    "    data.index, \n",
    "    every=EVERY * OFFSET, \n",
    "    lookback_period=(TRAIN + TEST) * OFFSET,\n",
    "    split=(\n",
    "        vbt.RepFunc(lambda index: index < index[0] + TRAIN * OFFSET),\n",
    "        vbt.RepFunc(lambda index: index >= index[0] + TRAIN * OFFSET),\n",
    "    ),\n",
    "    set_labels=[\"train\", \"test\"]\n",
    ")\n",
    "splitter.plots().show_svg()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7a749de2-6f58-4246-b308-ad64c6b03f90",
   "metadata": {},
   "source": [
    "In the first subplot, we see that each split (or row) contains adjacent training and testing sets, progressively rolling from past to present. The second subplot illustrates the overlap of each data point across different ranges. Tip: For non-overlapping testing sets, use the setting `EVERY = TRAIN`."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ab6ed776-f682-43c8-a6d7-489101f82c70",
   "metadata": {},
   "source": [
    "## Objective function"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9ea17ccf-77e0-4d6c-af87-4c2cdb788d4f",
   "metadata": {},
   "source": [
    "Next, we'll create a function to execute a trading strategy within a specified date range using a single parameter set, returning one key metric. Our strategy will be a simple EMA crossover combined with an ATR trailing stop."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b6a0ac65-5336-4600-870d-dd9202ef78ec",
   "metadata": {},
   "outputs": [],
   "source": [
    "def objective(data, fast_period=10, slow_period=20, atr_period=14, atr_mult=3):\n",
    "    fast_ema = data.run(\"talib:ema\", fast_period, short_name=\"fast_ema\", unpack=True)\n",
    "    slow_ema = data.run(\"talib:ema\", slow_period, short_name=\"slow_ema\", unpack=True)\n",
    "    atr = data.run(\"talib:atr\", atr_period, unpack=True)\n",
    "    pf = vbt.PF.from_signals(\n",
    "        data, \n",
    "        entries=fast_ema.vbt.crossed_above(slow_ema), \n",
    "        exits=fast_ema.vbt.crossed_below(slow_ema), \n",
    "        tsl_stop=atr * atr_mult, \n",
    "        save_returns=True,\n",
    "        freq=TIMEFRAME\n",
    "    )\n",
    "    return pf.sharpe_ratio\n",
    "\n",
    "print(objective(data))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dc93d52c-a4e5-4122-a8e6-51ef5d52adbb",
   "metadata": {},
   "source": [
    "## Parameter optimization"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "41b0f21c-e99e-416a-b2da-dcdcbf7e7fed",
   "metadata": {},
   "source": [
    "Let's harness the power of VBT PRO! By decorating (or wrapping) our function with `parameterized`, we enable `objective` to accept a list of parameters and execute them across all combinations. We'll then further enhance the function with another decorator, `split`, which runs the strategy on each date range specified by the splitter. This approach allows us to apply our strategy across every possible date range and parameter combination, compiling the outcomes into a single Pandas Series."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e150f68a-9ce3-432f-81ee-566acdfdd2dc",
   "metadata": {},
   "outputs": [],
   "source": [
    "param_objective = vbt.parameterized(\n",
    "    objective,\n",
    "    merge_func=\"concat\",\n",
    "    mono_n_chunks=\"auto\",  # merge parameter combinations into chunks\n",
    "    execute_kwargs=dict(warmup=True, engine=\"pathos\")  # run chunks in parallel using Pathos\n",
    ")\n",
    "cv_objective = vbt.split(\n",
    "    param_objective,\n",
    "    splitter=splitter, \n",
    "    takeable_args=[\"data\"],  # select date range from data\n",
    "    merge_func=\"concat\", \n",
    ")\n",
    "\n",
    "sharpe_ratio = cv_objective(\n",
    "    data,\n",
    "    vbt.Param(np.arange(10, 50), condition=\"slow_period - fast_period >= 5\"),\n",
    "    vbt.Param(np.arange(10, 50)),\n",
    "    vbt.Param(np.arange(10, 50), condition=\"fast_period <= atr_period <= slow_period\"),\n",
    "    vbt.Param(np.arange(2, 5))\n",
    ")\n",
    "print(sharpe_ratio)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a5558952-d14d-49a3-9a2d-21152eb3bcdd",
   "metadata": {},
   "source": [
    "We tested over 3 million combinations of date ranges and parameters in just a few minutes."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dbb436c2-6428-48d3-9c58-67e9293c7333",
   "metadata": {},
   "source": [
    "## Analysis"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d92c269d-9641-4bf4-a86d-c415343615d5",
   "metadata": {},
   "source": [
    "Let's find out if there's a correlation between the results of the training and testing sets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9d52a439-cee9-4e6d-9ef6-48dd050e8c7a",
   "metadata": {},
   "outputs": [],
   "source": [
    "train_sharpe_ratio = sharpe_ratio.xs(\"train\", level=\"set\")\n",
    "test_sharpe_ratio = sharpe_ratio.xs(\"test\", level=\"set\")\n",
    "print(train_sharpe_ratio.corr(test_sharpe_ratio))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2c2950f5-bd78-4bbc-b78d-bf3bb7f31f02",
   "metadata": {},
   "source": [
    "The analysis indicates a weak negative correlation or no substantial correlation. This suggests that the strategy tends to perform oppositely compared to its results in previous months."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f565e4ac-2cac-4d81-8b82-8f375333e1ad",
   "metadata": {},
   "source": [
    "And here's an analysis segmented by fast and slow EMA periods. It highlights the minimal variation in the Sharpe ratio from the training to the testing set across at least 50% of the splits, where blue indicates a positive change."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "58be0727-3e27-45bf-bca9-de24674eb5ea",
   "metadata": {},
   "outputs": [],
   "source": [
    "sharpe_ratio_diff = test_sharpe_ratio - train_sharpe_ratio\n",
    "sharpe_ratio_diff_median = sharpe_ratio_diff.groupby([\"fast_period\", \"slow_period\"]).median()\n",
    "sharpe_ratio_diff_median.vbt.heatmap(trace_kwargs=dict(colorscale=\"RdBu\")).show_svg()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c5423a9e-289e-421f-886e-753577b4ba13",
   "metadata": {},
   "source": [
    "## Conclusion"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cfdc909c-6559-4eba-801c-d0792068aba2",
   "metadata": {},
   "source": [
    "Although you might have developed a promising strategy on paper, cross-validating it is essential to confirm its consistent performance over time and to ensure it's not merely a result of random fluctuations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c1eb4e7a-4698-4c2c-896f-1c8d18a4c3ee",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}