{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "d2770d21",
   "metadata": {},
   "source": [
    "<div style=\"background-color:#000;\"><img src=\"pqn.png\"></img></div>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "18f4d52f",
   "metadata": {},
   "source": [
    "This notebook performs a walk-forward analysis of a moving average (MA) crossover strategy. It splits historical stock prices into rolling in-sample (training) and out-of-sample (testing) periods, simulates every pair of MA windows on the in-sample data to find the pair with the highest Sharpe ratio in each split, and then tests those parameters on the out-of-sample data. A one-sided t-test then checks whether the out-of-sample Sharpe ratios are significantly greater than the best in-sample ones. A helper for a simple buy-and-hold benchmark is also defined. Finally, the results are plotted for further analysis."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b64f33c4",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import vectorbt as vbt\n",
    "from datetime import datetime, timedelta\n",
    "import scipy.stats as stats"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "435ada84",
   "metadata": {},
   "source": [
    "Create a date range index for the Series"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2a98a4bb",
   "metadata": {},
   "outputs": [],
   "source": [
    "index = [datetime(2020, 1, 1) + timedelta(days=i) for i in range(10)]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "841c3cbf",
   "metadata": {},
   "source": [
    "Create a pandas Series with the date index and sequential integer values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e00fdf4a",
   "metadata": {},
   "outputs": [],
   "source": [
    "sr = pd.Series(np.arange(len(index)), index=index)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0f9ca485",
   "metadata": {},
   "source": [
    "Perform a rolling split on the Series and plot the in-sample and out-of-sample periods"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c99f8c2d",
   "metadata": {},
   "outputs": [],
   "source": [
    "sr.vbt.rolling_split(\n",
    "    window_len=5, \n",
    "    set_lens=(1,), \n",
    "    left_to_right=False, \n",
    "    plot=True, \n",
    "    trace_names=['in_sample', 'out_sample']\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6e767588",
   "metadata": {},
   "source": [
    "Create a range of window sizes for moving averages to iterate through"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6a1d2df1",
   "metadata": {},
   "outputs": [],
   "source": [
    "windows = np.arange(10, 50)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1c454f62",
   "metadata": {},
   "source": [
    "Download historical closing prices for Apple (AAPL) stock"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e0f6d7a7",
   "metadata": {},
   "outputs": [],
   "source": [
    "price = vbt.YFData.download('AAPL').get('Close')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4a49a8bb",
   "metadata": {},
   "source": [
    "Perform a rolling split on the price data for walk-forward analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3966351e",
   "metadata": {
    "lines_to_next_cell": 1
   },
   "outputs": [],
   "source": [
    "(in_price, in_indexes), (out_price, out_indexes) = price.vbt.rolling_split(\n",
    "    n=30, \n",
    "    window_len=365 * 2,\n",
    "    set_lens=(180,),\n",
    "    left_to_right=False,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "971778fd",
   "metadata": {
    "lines_to_next_cell": 1
   },
   "outputs": [],
   "source": [
    "def simulate_holding(price, **kwargs):\n",
    "    \"\"\"Returns Sharpe ratio for holding strategy\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    price : pd.Series\n",
    "        Historical price data\n",
    "    kwargs : dict\n",
    "        Additional arguments for the Portfolio function\n",
    "    \n",
    "    Returns\n",
    "    -------\n",
    "    float\n",
    "        Sharpe ratio of the holding strategy\n",
    "    \"\"\"\n",
    "    \n",
    "    # Run a backtest for holding the asset and return the Sharpe ratio\n",
    "    pf = vbt.Portfolio.from_holding(price, **kwargs)\n",
    "    return pf.sharpe_ratio()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5fad1f2c",
   "metadata": {
    "lines_to_next_cell": 1
   },
   "outputs": [],
   "source": [
    "def simulate_all_params(price, windows, **kwargs):\n",
    "    \"\"\"Returns Sharpe ratio for all parameter combinations\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    price : pd.Series\n",
    "        Historical price data\n",
    "    windows : iterable\n",
    "        Range of window sizes for moving averages\n",
    "    kwargs : dict\n",
    "        Additional arguments for the Portfolio function\n",
    "    \n",
    "    Returns\n",
    "    -------\n",
    "    pd.Series\n",
    "        Sharpe ratios for all parameter combinations\n",
    "    \"\"\"\n",
    "    \n",
    "    # Run combinations of moving averages for all window sizes\n",
    "    fast_ma, slow_ma = vbt.MA.run_combs(\n",
    "        price, windows, r=2, short_names=[\"fast\", \"slow\"]\n",
    "    )\n",
    "    \n",
    "    # Generate entry signals when fast MA crosses above slow MA\n",
    "    entries = fast_ma.ma_crossed_above(slow_ma)\n",
    "    \n",
    "    # Generate exit signals when fast MA crosses below slow MA\n",
    "    exits = fast_ma.ma_crossed_below(slow_ma)\n",
    "    \n",
    "    # Run the backtest and return the Sharpe ratio\n",
    "    pf = vbt.Portfolio.from_signals(price, entries, exits, **kwargs)\n",
    "    return pf.sharpe_ratio()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "020c7750",
   "metadata": {
    "lines_to_next_cell": 1
   },
   "outputs": [],
   "source": [
    "def get_best_index(performance, higher_better=True):\n",
    "    \"\"\"Returns the best performing index\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    performance : pd.Series\n",
    "        Performance metrics for each split\n",
    "    higher_better : bool, optional\n",
    "        Whether higher values are better, by default True\n",
    "    \n",
    "    Returns\n",
    "    -------\n",
    "    pd.Index\n",
    "        Index of the best performing parameters\n",
    "    \"\"\"\n",
    "    \n",
    "    if higher_better:\n",
    "        return performance[performance.groupby('split_idx').idxmax()].index\n",
    "    return performance[performance.groupby('split_idx').idxmin()].index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7e944870",
   "metadata": {
    "lines_to_next_cell": 1
   },
   "outputs": [],
   "source": [
    "def get_best_params(best_index, level_name):\n",
    "    \"\"\"Returns the best parameters\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    best_index : pd.Index\n",
    "        Index of the best performing parameters\n",
    "    level_name : str\n",
    "        Name of the level to extract values from\n",
    "    \n",
    "    Returns\n",
    "    -------\n",
    "    np.ndarray\n",
    "        Best parameter values\n",
    "    \"\"\"\n",
    "    \n",
    "    return best_index.get_level_values(level_name).to_numpy()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2c0a7d4e",
   "metadata": {
    "lines_to_next_cell": 1
   },
   "outputs": [],
   "source": [
    "def simulate_best_params(price, best_fast_windows, best_slow_windows, **kwargs):\n",
    "    \"\"\"Returns Sharpe ratio for best parameters\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    price : pd.Series\n",
    "        Historical price data\n",
    "    best_fast_windows : np.ndarray\n",
    "        Best fast moving average windows\n",
    "    best_slow_windows : np.ndarray\n",
    "        Best slow moving average windows\n",
    "    kwargs : dict\n",
    "        Additional arguments for the Portfolio function\n",
    "    \n",
    "    Returns\n",
    "    -------\n",
    "    pd.Series\n",
    "        Sharpe ratios for the best parameters\n",
    "    \"\"\"\n",
    "    \n",
    "    # Run the moving average indicators with the best parameters\n",
    "    fast_ma = vbt.MA.run(price, window=best_fast_windows, per_column=True)\n",
    "    slow_ma = vbt.MA.run(price, window=best_slow_windows, per_column=True)\n",
    "    \n",
    "    # Generate entry signals when fast MA crosses above slow MA\n",
    "    entries = fast_ma.ma_crossed_above(slow_ma)\n",
    "    \n",
    "    # Generate exit signals when fast MA crosses below slow MA\n",
    "    exits = fast_ma.ma_crossed_below(slow_ma)\n",
    "    \n",
    "    # Run the backtest and return the Sharpe ratio\n",
    "    pf = vbt.Portfolio.from_signals(price, entries, exits, **kwargs)\n",
    "    return pf.sharpe_ratio()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8700077f",
   "metadata": {},
   "source": [
    "Get the Sharpe ratio of the strategy across all MA windows for in-sample data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1a716951",
   "metadata": {},
   "outputs": [],
   "source": [
    "in_sharpe = simulate_all_params(\n",
    "    in_price, \n",
    "    windows, \n",
    "    direction=\"both\", \n",
    "    freq=\"d\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d87dbf9c",
   "metadata": {},
   "source": [
    "Find the best performing parameter index for in-sample data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7c423f29",
   "metadata": {},
   "outputs": [],
   "source": [
    "in_best_index = get_best_index(in_sharpe)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1331309d",
   "metadata": {},
   "source": [
    "Extract the best fast and slow moving average window values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "603dcbf3",
   "metadata": {},
   "outputs": [],
   "source": [
    "in_best_fast_windows = get_best_params(\n",
    "    in_best_index,\n",
    "    'fast_window'\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4f078e7d",
   "metadata": {},
   "outputs": [],
   "source": [
    "in_best_slow_windows = get_best_params(\n",
    "    in_best_index,\n",
    "    'slow_window'\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2f629c91",
   "metadata": {},
   "source": [
    "Pair the best fast and slow moving average windows"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a2c767ec",
   "metadata": {},
   "outputs": [],
   "source": [
    "in_best_window_pairs = np.array(\n",
    "    list(\n",
    "        zip(\n",
    "            in_best_fast_windows, \n",
    "            in_best_slow_windows\n",
    "        )\n",
    "    )\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9ff01a15",
   "metadata": {},
   "source": [
    "Take the best parameters from each in-sample period and simulate them on the matching out-of-sample period"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5f1762ea",
   "metadata": {},
   "outputs": [],
   "source": [
    "out_test_sharpe = simulate_best_params(\n",
    "    out_price, \n",
    "    in_best_fast_windows, \n",
    "    in_best_slow_windows, \n",
    "    direction=\"both\", \n",
    "    freq=\"d\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "24130e6f",
   "metadata": {},
   "source": [
    "Extract the best in-sample Sharpe ratios"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9038e60e",
   "metadata": {},
   "outputs": [],
   "source": [
    "in_sample_best = in_sharpe[in_best_index].values"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a9323f92",
   "metadata": {},
   "source": [
    "Extract the out-sample Sharpe ratios"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "92069cbb",
   "metadata": {},
   "outputs": [],
   "source": [
    "out_sample_test = out_test_sharpe.values"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b699af77",
   "metadata": {},
   "source": [
    "Run a one-sided, two-sample t-test of whether the out-of-sample Sharpe ratios are greater than the best in-sample Sharpe ratios"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "75e3aa28",
   "metadata": {},
   "outputs": [],
   "source": [
    "t, p = stats.ttest_ind(\n",
    "    a=out_sample_test,\n",
    "    b=in_sample_best,\n",
    "    alternative=\"greater\"\n",
    ")"
   ]
  },
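  {
   "cell_type": "markdown",
   "id": "a7fba885",
   "metadata": {},
   "source": [
    "Check whether the p-value exceeds 0.05. If it does, we cannot reject the null hypothesis at the 5% level, so there is no statistically significant evidence that out-of-sample performance exceeds in-sample performance"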
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "98184273",
   "metadata": {},
   "outputs": [],
   "source": [
    "p > 0.05"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "430c76ed",
   "metadata": {},
   "source": [
    "Print the t-statistic and p-value"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9c7f0dcd",
   "metadata": {},
   "outputs": [],
   "source": [
    "t, p"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b93cfef7",
   "metadata": {},
   "source": [
    "Plot the out-of-sample Sharpe ratios"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9ef9f240",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Out-of-sample Sharpe ratio of the best in-sample parameters, one value per split\n",
    "out_test_sharpe.plot()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5cc318cf",
   "metadata": {},
   "source": [
    "Create a DataFrame to store cross-validation results"
   ]
  },
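  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3e014c89",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sharpe ratios of all MA window pairs on the out-of-sample data\n",
    "out_sharpe = simulate_all_params(out_price, windows, direction=\"both\", freq=\"d\")\n",
    "\n",
    "cv_results_df = pd.DataFrame({\n",
    "    'in_sample_median': in_sharpe.groupby('split_idx').median().values,\n",
    "    'out_sample_median': out_sharpe.groupby('split_idx').median().values,\n",
    "})"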
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5fd42853",
   "metadata": {},
   "source": [
    "Plot the cross-validation results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "227948a3",
   "metadata": {},
   "outputs": [],
   "source": [
    "color_schema = vbt.settings['plotting']['color_schema']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f297af27",
   "metadata": {},
   "outputs": [],
   "source": [
    "cv_results_df.vbt.plot(\n",
    "    trace_kwargs=[\n",
    "        dict(line_color=color_schema['blue']),\n",
    "        dict(line_color=color_schema['blue'], line_dash='dash'),\n",
    "    ]\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fd55424d",
   "metadata": {},
   "source": [
    "<a href=\"https://pyquantnews.com/\">PyQuant News</a> is where finance practitioners level up with Python for quant finance, algorithmic trading, and market data analysis. Looking to get started? Check out the fastest growing, top-selling course to <a href=\"https://gettingstartedwithpythonforquantfinance.com/\">get started with Python for quant finance</a>. For educational purposes. Not investment advice. Use at your own risk."
   ]
  }
 ],
 "metadata": {
  "jupytext": {
   "cell_metadata_filter": "-all",
   "main_language": "python",
   "notebook_metadata_filter": "-all"
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}