import time

from vectorbtpro import *
import pandas as pd
import scipy.stats as st
import statsmodels.tsa.stattools as ts
import numpy as np

import warnings
warnings.filterwarnings("ignore")
Load and save S&P 500 tickers and their data¶
First, we load the S&P 500 tickers from Wikipedia and save their historical data if it doesn't already exist. We will store the data in an HDF5 file format.
sp500_tickers = pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')[0]['Symbol'].tolist()
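One optional step, not part of the original workflow: Wikipedia lists share-class tickers with dots (e.g., BRK.B) while Yahoo Finance expects dashes (BRK-B), so normalizing the symbols up front can avoid some failed downloads.

# Optional (an addition, not in the original workflow): convert
# dot-style share-class tickers to Yahoo's dash style, e.g. BRK.B -> BRK-B.
sp500_tickers = [t.replace(".", "-") for t in sp500_tickers]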
COINT_FILE = "coint_pvalues.pickle" POOL_FILE = "data_pool.h5" START = "2015-01-01" END = "2023-12-31"
if not vbt.file_exists(POOL_FILE):
    with vbt.ProgressBar(total=len(sp500_tickers)) as pbar:
        collected = 0
        for symbol in sp500_tickers:
            try:
                data = vbt.YFData.pull(
                    symbol,
                    start=START,
                    end=END,
                    silence_warnings=True,
                )
                data.to_hdf(POOL_FILE)
                collected += 1
            except Exception:
                pass  # skip symbols that fail to download
            pbar.set_prefix(f"{symbol} ({collected})")
            pbar.update()
We load the S&P 500 tickers from Wikipedia using pandas. We then check if the data file already exists. If it does not, we initialize a progress bar and start collecting historical data for each ticker using the vectorbtpro library's YFData.pull method. The collected data is then saved into an HDF5 file. If there is an error while collecting data for a ticker, we simply pass and move to the next ticker.
Filter and select valid symbols from the data¶
Next, we load the saved data, filter out any symbols with missing data, and keep only valid symbols. This ensures we work with complete datasets.
data = vbt.HDFData.pull(
    POOL_FILE,
    start=START,
    end=END,
    silence_warnings=True
)
data = data.select_symbols([
    k for k, v in data.data.items()
    if not v.isnull().any().any()
])
We load the saved data from the HDF5 file. We then iterate over each symbol's data and check for missing values. If a symbol has any missing values, it is excluded. This ensures that we only work with complete datasets, which is crucial for accurate analysis.
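An optional sanity check, added here for illustration, confirms how many symbols survived the filter before moving on.

# Illustrative check (not in the original workflow): count surviving symbols.
print(f"Working with {len(data.symbols)} complete symbols")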
Identify cointegrated pairs of stocks¶
Now, we identify pairs of stocks that are cointegrated. Cointegration helps us find pairs of stocks that have a stable relationship over time, which is essential for pairs trading strategies.
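If you haven't used it before, statsmodels' ts.coint implements the Engle-Granger two-step test, where the null hypothesis is "no cointegration". Here is a minimal, self-contained sketch on synthetic data (it reuses the np and ts aliases imported at the top):

rng = np.random.default_rng(42)
x = np.cumsum(rng.normal(size=1_000))      # a random walk
y = x + rng.normal(scale=0.5, size=1_000)  # cointegrated with x by construction
print(ts.coint(x, y)[1])                   # tiny p-value -> reject "no cointegration"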
@vbt.parameterized(
    merge_func="concat",
    engine="pathos",
    distribute="chunks",
    n_chunks="auto"
)
def coint_pvalue(close, s1, s2):
    return ts.coint(np.log(close[s1]), np.log(close[s2]))[1]
if not vbt.file_exists(COINT_FILE):
    coint_pvalues = coint_pvalue(
        data.close,
        vbt.Param(data.symbols, condition="s1 != s2"),
        vbt.Param(data.symbols)
    )
    vbt.save(coint_pvalues, COINT_FILE)
else:
    coint_pvalues = vbt.load(COINT_FILE)
coint_pvalues = coint_pvalues.sort_values()
coint_pvalues.head(20)
We define a function to calculate the cointegration p-value between two log-price series, using the parameterized decorator to distribute the computation across processes. If the cointegration p-values file does not exist, we calculate the p-values for all ordered pairs of symbols and save the results; otherwise, we load the saved p-values. We then sort the p-values in ascending order and display the 20 pairs with the lowest values, i.e. the strongest statistical evidence of cointegration.
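As a follow-up, you might keep only the pairs that clear a chosen significance level; a sketch assuming the conventional 5% cutoff:

# Illustrative: retain only pairs significant at the 5% level.
significant_pairs = coint_pvalues[coint_pvalues < 0.05]
print(len(significant_pairs), "candidate pairs")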
Analyze and visualize the selected stock pair¶
Finally, we choose a specific pair of stocks, analyze their price relationship, and visualize their spread and z-score. This helps us understand their mean-reverting behavior for potential trading opportunities.
S1, S2 = "WYNN", "DVN"
data.plot(column="Close", symbol=[S1, S2], base=1).show()
S1_log = np.log(data.get("Close", S1))
S2_log = np.log(data.get("Close", S2))
log_diff = (S2_log - S1_log).rename("Log diff")
fig = log_diff.vbt.plot()
fig.add_hline(y=log_diff.mean(), line_color="yellow", line_dash="dot")
fig.show()
data = vbt.YFData.pull(
    [S1, S2],
    start=START,
    end=END,
    silence_warnings=True,
)
UPPER = st.norm.ppf(1 - 0.05 / 2)
LOWER = -st.norm.ppf(1 - 0.05 / 2)
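These thresholds are the two-sided 5% critical values of the standard normal distribution, roughly ±1.96:

print(f"UPPER = {UPPER:.3f}, LOWER = {LOWER:.3f}")  # UPPER = 1.960, LOWER = -1.960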
S1_close = data.get("Close", S1)
S2_close = data.get("Close", S2)
ols = vbt.OLS.run(S1_close, S2_close, window=vbt.Default(21))
spread = ols.error.rename("Spread")
zscore = ols.zscore.rename("Z-score")
print(pd.concat((spread, zscore), axis=1))
upper_crossed = zscore.vbt.crossed_above(UPPER)
lower_crossed = zscore.vbt.crossed_below(LOWER)
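Note that crossed_above and crossed_below fire only on the bar where the series first moves past the level, not on every bar beyond it; a quick illustration on toy data:

s = pd.Series([0.5, 1.5, 2.5, 2.1, 1.2])
print(s.vbt.crossed_above(2.0).tolist())  # [False, False, True, False, False]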
fig = zscore.vbt.plot()
fig.add_hline(y=UPPER, line_color="orangered", line_dash="dot")
fig.add_hline(y=0, line_color="yellow", line_dash="dot")
fig.add_hline(y=LOWER, line_color="limegreen", line_dash="dot")
upper_crossed.vbt.signals.plot_as_exits(zscore, fig=fig)
lower_crossed.vbt.signals.plot_as_entries(zscore, fig=fig)
fig.show()
long_entries = data.symbol_wrapper.fill(False)
short_entries = data.symbol_wrapper.fill(False)
short_entries.loc[upper_crossed, S1] = True
long_entries.loc[upper_crossed, S2] = True
long_entries.loc[lower_crossed, S1] = True
short_entries.loc[lower_crossed, S2] = True
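Because the z-score cannot cross above UPPER and below LOWER on the same bar, the long and short masks should never overlap; an optional sanity check:

# Illustrative check: a column never gets a long and a short entry on the same bar.
assert not (long_entries & short_entries).any().any()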
pf = vbt.Portfolio.from_signals(
    data,
    entries=long_entries,
    short_entries=short_entries,
    size=10,
    size_type="valuepercent100",
    group_by=True,
    cash_sharing=True,
    call_seq="auto"
)
pf.stats()
We select two specific stocks and plot their closing prices. We then compute their log-price difference and plot it to inspect its mean-reverting behavior. Next, we re-pull data for just the selected pair and compute the spread and z-score using a rolling-window OLS regression. We identify the points where the z-score crosses above or below the thresholds and plot these signals. Finally, we build long and short entry signals from those crossings and construct a portfolio to evaluate the strategy's performance.
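Beyond the summary statistics, it can help to eyeball the equity curve. One way, a sketch assuming pf.value returns the grouped (cash-shared) value series:

# Illustrative: plot the combined portfolio value over time.
pf.value.vbt.plot().show()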
Your next steps¶
Try changing the selected stock pair to explore different cointegrated pairs. You can also adjust the z-score thresholds to see how they affect your trading signals. Experiment with different window sizes in the OLS regression to find an optimal setting for your strategy; a sketch of a window sweep follows.
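As a starting point for the window experiment, vectorbt indicators generally accept a list of parameter values and produce one output column per value; a sketch (the exact column layout is an assumption):

# Hypothetical sweep: one z-score column per window length.
ols_sweep = vbt.OLS.run(S1_close, S2_close, window=[14, 21, 30])
print(ols_sweep.zscore.columns)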
PyQuant News is where finance practitioners level up with Python for quant finance, algorithmic trading, and market data analysis. Looking to get started? Check out the fastest growing, top-selling course to get started with Python for quant finance. For educational purposes. Not investment advice. Use at your own risk.
