import time

from vectorbtpro import *
import pandas as pd
import scipy.stats as st
import statsmodels.tsa.stattools as ts
import numpy as np

import warnings
warnings.filterwarnings("ignore")
Load and save S&P 500 tickers and their data¶
First, we load the S&P 500 tickers from Wikipedia and save their historical data if it doesn't already exist. We will store the data in an HDF5 file format.
sp500_tickers = pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')[0]['Symbol'].tolist()
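One optional step, not part of the original workflow: Wikipedia lists share-class tickers with dots (e.g., BRK.B) while Yahoo Finance expects dashes (BRK-B), so normalizing the symbols up front can avoid some failed downloads.

# Optional (an addition, not in the original workflow): convert
# dot-style share-class tickers to Yahoo's dash style, e.g. BRK.B -> BRK-B.
sp500_tickers = [t.replace(".", "-") for t in sp500_tickers]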
COINT_FILE = "coint_pvalues.pickle" POOL_FILE = "data_pool.h5" START = "2015-01-01" END = "2023-12-31"
if not vbt.file_exists(POOL_FILE):
    with vbt.ProgressBar(total=len(sp500_tickers)) as pbar:
        collected = 0
        for symbol in sp500_tickers:
            try:
                data = vbt.YFData.pull(
                    symbol,
                    start=START,
                    end=END,
                    silence_warnings=True,
                )
                data.to_hdf(POOL_FILE)
                collected += 1
            except Exception:
                pass  # skip symbols that fail to download
            pbar.set_prefix(f"{symbol} ({collected})")
            pbar.update()
We load the S&P 500 tickers from Wikipedia using pandas. We then check if the data file already exists. If it does not, we initialize a progress bar and start collecting historical data for each ticker using the vectorbtpro library's YFData.pull method. The collected data is then saved into an HDF5 file. If there is an error while collecting data for a ticker, we simply pass and move to the next ticker.
Filter and select valid symbols from the data¶
Next, we load the saved data, filter out any symbols with missing data, and keep only valid symbols. This ensures we work with complete datasets.
data = vbt.HDFData.pull(
    POOL_FILE,
    start=START,
    end=END,
    silence_warnings=True
)
data = data.select_symbols([
    k for k, v in data.data.items()
    if not v.isnull().any().any()
])
We load the saved data from the HDF5 file. We then iterate over each symbol's data and check for missing values. If a symbol has any missing values, it is excluded. This ensures that we only work with complete datasets, which is crucial for accurate analysis.
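An optional sanity check, added here for illustration, confirms how many symbols survived the filter before moving on.

# Illustrative check (not in the original workflow): count surviving symbols.
print(f"Working with {len(data.symbols)} complete symbols")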
Identify cointegrated pairs of stocks¶
Now, we identify pairs of stocks that are cointegrated. Cointegration helps us find pairs of stocks that have a stable relationship over time, which is essential for pairs trading strategies.
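If you haven't used it before, statsmodels' ts.coint implements the Engle-Granger two-step test, where the null hypothesis is "no cointegration". Here is a minimal, self-contained sketch on synthetic data (it reuses the np and ts aliases imported at the top):

rng = np.random.default_rng(42)
x = np.cumsum(rng.normal(size=1_000))      # a random walk
y = x + rng.normal(scale=0.5, size=1_000)  # cointegrated with x by construction
print(ts.coint(x, y)[1])                   # tiny p-value -> reject "no cointegration"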
@vbt.parameterized(
    merge_func="concat",
    engine="pathos",
    distribute="chunks",
    n_chunks="auto"
)
def coint_pvalue(close, s1, s2):
    return ts.coint(np.log(close[s1]), np.log(close[s2]))[1]
if not vbt.file_exists(COINT_FILE):
    coint_pvalues = coint_pvalue(
        data.close,
        vbt.Param(data.symbols, condition="s1 != s2"),
        vbt.Param(data.symbols)
    )
    vbt.save(coint_pvalues, COINT_FILE)
else:
    coint_pvalues = vbt.load(COINT_FILE)
coint_pvalues = coint_pvalues.sort_values()
coint_pvalues.head(20)
We define a function to calculate the cointegration p-value between two log-price series, using the parameterized decorator to distribute the computation across processes. If the cointegration p-values file does not exist, we calculate the p-values for all ordered pairs of symbols and save the results; otherwise, we load the saved p-values. We then sort the p-values in ascending order and display the 20 pairs with the lowest values, i.e. the strongest statistical evidence of cointegration.
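As a follow-up, you might keep only the pairs that clear a chosen significance level; a sketch assuming the conventional 5% cutoff:

# Illustrative: retain only pairs significant at the 5% level.
significant_pairs = coint_pvalues[coint_pvalues < 0.05]
print(len(significant_pairs), "candidate pairs")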
Analyze and visualize the selected stock pair¶
Finally, we choose a specific pair of stocks, analyze their price relationship, and visualize their spread and z-score. This helps us understand their mean-reverting behavior for potential trading opportunities.
S1, S2 = "WYNN", "DVN"
data.plot(column="Close", symbol=[S1, S2], base=1).show()
S1_log = np.log(data.get("Close", S1))
S2_log = np.log(data.get("Close", S2))
log_diff = (S2_log - S1_log).rename("Log diff")
fig = log_diff.vbt.plot()
fig.add_hline(y=log_diff.mean(), line_color="yellow", line_dash="dot")
fig.show()
data = vbt.YFData.pull(
    [S1, S2],
    start=START,
    end=END,
    silence_warnings=True,
)
UPPER = st.norm.ppf(1 - 0.05 / 2)
LOWER = -st.norm.ppf(1 - 0.05 / 2)
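These thresholds are the two-sided 5% critical values of the standard normal distribution, roughly ±1.96:

print(f"UPPER = {UPPER:.3f}, LOWER = {LOWER:.3f}")  # UPPER = 1.960, LOWER = -1.960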
S1_close = data.get("Close", S1)
S2_close = data.get("Close", S2)
ols = vbt.OLS.run(S1_close, S2_close, window=vbt.Default(21))
spread = ols.error.rename("Spread")
zscore = ols.zscore.rename("Z-score")
print(pd.concat((spread, zscore), axis=1))
upper_crossed = zscore.vbt.crossed_above(UPPER)
lower_crossed = zscore.vbt.crossed_below(LOWER)
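Note that crossed_above and crossed_below fire only on the bar where the series first moves past the level, not on every bar beyond it; a quick illustration on toy data:

s = pd.Series([0.5, 1.5, 2.5, 2.1, 1.2])
print(s.vbt.crossed_above(2.0).tolist())  # [False, False, True, False, False]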
fig = zscore.vbt.plot()
fig.add_hline(y=UPPER, line_color="orangered", line_dash="dot")
fig.add_hline(y=0, line_color="yellow", line_dash="dot")
fig.add_hline(y=LOWER, line_color="limegreen", line_dash="dot")
upper_crossed.vbt.signals.plot_as_exits(zscore, fig=fig)
lower_crossed.vbt.signals.plot_as_entries(zscore, fig=fig)
fig.show()
long_entries = data.symbol_wrapper.fill(False)
short_entries = data.symbol_wrapper.fill(False)
short_entries.loc[upper_crossed, S1] = True
long_entries.loc[upper_crossed, S2] = True
long_entries.loc[lower_crossed, S1] = True
short_entries.loc[lower_crossed, S2] = True
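Because the z-score cannot cross above UPPER and below LOWER on the same bar, the long and short masks should never overlap; an optional sanity check:

# Illustrative check: a column never gets a long and a short entry on the same bar.
assert not (long_entries & short_entries).any().any()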
pf = vbt.Portfolio.from_signals(
    data,
    entries=long_entries,
    short_entries=short_entries,
    size=10,
    size_type="valuepercent100",
    group_by=True,
    cash_sharing=True,
    call_seq="auto"
)
pf.stats()
We select two specific stocks and plot their closing prices. We then compute their log-price difference and plot it to inspect its mean-reverting behavior. Next, we re-pull data for just the selected pair and compute the spread and z-score using a rolling-window OLS regression. We identify the points where the z-score crosses above or below the thresholds and plot these signals. Finally, we build long and short entry signals from those crossings and construct a portfolio to evaluate the strategy's performance.
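Beyond the summary statistics, it can help to eyeball the equity curve. One way, a sketch assuming pf.value returns the grouped (cash-shared) value series:

# Illustrative: plot the combined portfolio value over time.
pf.value.vbt.plot().show()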
Your next steps¶
Try changing the selected stock pair to explore different cointegrated pairs. You can also adjust the z-score thresholds to see how they affect your trading signals. Experiment with different window sizes in the OLS regression to find an optimal setting for your strategy; a sketch of a window sweep follows.
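As a starting point for the window experiment, vectorbt indicators generally accept a list of parameter values and produce one output column per value; a sketch (the exact column layout is an assumption):

# Hypothetical sweep: one z-score column per window length.
ols_sweep = vbt.OLS.run(S1_close, S2_close, window=[14, 21, 30])
print(ols_sweep.zscore.columns)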
PyQuant News is where finance practitioners level up with Python for quant finance, algorithmic trading, and market data analysis. Looking to get started? Check out the fastest growing, top-selling course to get started with Python for quant finance. For educational purposes. Not investment advice. Use at your own risk.
