ttools/tests/data_loader_tryme.ipynb
David Brazda b23a772836 remote fetch
2024-11-10 14:08:41 +01:00


Load data

Make sure you have a .env file in ttools or any parent directory with your Alpaca keys:

ACCOUNT1_LIVE_API_KEY=api_key
ACCOUNT1_LIVE_SECRET_KEY=secret_key

Cache directories

Daily trade files - DATADIR/tradecache
Agg data cache - DATADIR/aggcache

DATADIR - user_data_dir from appdirs library - see config.py
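The resulting layout can be sketched as follows. The stand-in below mirrors what appdirs.user_data_dir returns per platform; the app name "v2realbot" is an assumption inferred from the cache path logged later in this notebook — config.py in ttools is authoritative:

```python
import sys
from pathlib import Path

def user_data_dir(appname: str) -> Path:
    """Minimal stand-in for appdirs.user_data_dir (ttools uses the real library)."""
    if sys.platform == "darwin":
        return Path.home() / "Library" / "Application Support" / appname
    if sys.platform.startswith("win"):
        return Path.home() / "AppData" / "Local" / appname
    return Path.home() / ".local" / "share" / appname

DATADIR = user_data_dir("v2realbot")   # app name assumed from the fetch log below
TRADE_CACHE = DATADIR / "tradecache"   # daily trade files, e.g. BAC-2024-01-16.parquet
AGG_CACHE = DATADIR / "aggcache"       # aggregated bar files
```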

In [1]:
import pandas as pd
import numpy as np
from ttools.utils import AggType
from datetime import datetime
from ttools.aggregator_vectorized import generate_time_bars_nb, aggregate_trades
from ttools.loaders import load_data, prepare_trade_cache, fetch_daily_stock_trades
from ttools.utils import zoneNY
import vectorbtpro as vbt
from lightweight_charts import PlotDFAccessor, PlotSRAccessor


vbt.settings.set_theme("dark")
vbt.settings['plotting']['layout']['width'] = 1280
vbt.settings.plotting.auto_rangebreaks = True
# Set the option to display with pagination
pd.set_option('display.notebook_repr_html', True)
pd.set_option('display.max_rows', 10)  # Number of rows per page
TTOOLS: Loaded env variables from file /Users/davidbrazda/Documents/Development/python/.env

Fetching aggregated data

Available aggregation types:

  • time based bars - AggType.OHLCV, resolution = bar length in seconds
  • volume based bars - AggType.OHLCV_VOL, resolution = volume threshold
  • dollar based bars - AggType.OHLCV_DOL, resolution = dollar threshold
  • renko bars - AggType.OHLCV_RENKO, resolution = brick size
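The idea behind the threshold-based types can be shown with a toy volume-bar aggregator in plain pandas — a simplification for intuition only, not the ttools implementation (that is the vectorized aggregate_trades):

```python
import pandas as pd

def volume_bars(trades: pd.DataFrame, threshold: int) -> pd.DataFrame:
    """Toy volume bars: a new bar starts every time cumulative traded size
    crosses the threshold. Expects columns 'p' (price) and 's' (size),
    matching the trade frames shown later in this notebook."""
    bar_id = trades["s"].cumsum() // threshold
    grouped = trades.groupby(bar_id)
    return pd.DataFrame({
        "open": grouped["p"].first(),
        "high": grouped["p"].max(),
        "low": grouped["p"].min(),
        "close": grouped["p"].last(),
        "volume": grouped["s"].sum(),
    })
```

Dollar bars follow the same pattern with `(trades["p"] * trades["s"]).cumsum() // dollar_threshold` as the bar id.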
In [7]:
#This is how to call the load_data function
symbol = ["SPY"]
#datetime in zoneNY 
day_start = datetime(2024, 9, 15, 9, 30, 0)
day_stop = datetime(2024, 10, 20, 16, 0, 0)
day_start = zoneNY.localize(day_start)
day_stop = zoneNY.localize(day_stop)

#requested AGG
resolution = 12 #12s bars
agg_type = AggType.OHLCV #other types AggType.OHLCV_VOL, AggType.OHLCV_DOL, AggType.OHLCV_RENKO
exclude_conditions = ['C','O','4','B','7','V','P','W','U','Z','F','9','M','6'] #None to defaults
minsize = 100 #min trade size to include
main_session_only = False
force_remote = False

data = load_data(symbol = symbol,
                     agg_type = agg_type,
                     resolution = resolution,
                     start_date = day_start,
                     end_date = day_stop,
                     #exclude_conditions = None,
                     minsize = minsize,
                     main_session_only = main_session_only,
                     force_remote = force_remote,
                     return_vbt = True, #returns vbt object
                     verbose = False
                     )
data.ohlcv.data[symbol[0]]
#data.ohlcv.data[symbol[0]].lw.plot()
Out[7]:
<style scoped=""> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </style>
open high low close volume
time
2024-09-16 04:01:24-04:00 562.22 562.22 562.22 562.22 200.0
2024-09-16 04:02:24-04:00 562.17 562.17 562.17 562.17 293.0
2024-09-16 04:04:36-04:00 562.54 562.54 562.54 562.54 100.0
2024-09-16 04:10:00-04:00 562.39 562.39 562.39 562.39 102.0
2024-09-16 04:10:24-04:00 562.44 562.44 562.44 562.44 371.0
... ... ... ... ... ...
2024-10-18 19:57:24-04:00 584.80 584.80 584.80 584.80 100.0
2024-10-18 19:57:48-04:00 584.84 584.84 584.84 584.84 622.0
2024-10-18 19:58:48-04:00 584.77 584.79 584.77 584.79 4158.0
2024-10-18 19:59:36-04:00 584.80 584.82 584.80 584.82 298.0
2024-10-18 19:59:48-04:00 584.76 584.76 584.72 584.72 258.0

64218 rows × 5 columns

In [ ]:
data.ohlcv.data[symbol[0]]

Prepare daily trade cache

This is how to prepare the trade cache for given symbols and period (if daily trades are not cached, they are fetched remotely).

In [ ]:
symbols = ["BAC", "AAPL"]
#datetime in zoneNY 
day_start = datetime(2024, 10, 1, 9, 45, 0)
day_stop = datetime(2024, 10, 27, 15, 1, 0)
day_start = zoneNY.localize(day_start)
day_stop = zoneNY.localize(day_stop)
force_remote = False

prepare_trade_cache(symbols, day_start, day_stop, force_remote, verbose = True)

Prepare daily trade cache - cli script

The Python script prepares the trade cache for the specified symbols and date range.

One day usually takes about 35s. The result is stored in the /tradecache/ directory as one file per day, keyed by symbol.

To run this script in the background with specific arguments:

# Running without forcing remote fetch
python3 prepare_cache.py --symbols BAC AAPL --day_start 2024-10-14 --day_stop 2024-10-18 &

# Running with force_remote set to True
python3 prepare_cache.py --symbols BAC AAPL --day_start 2024-10-14 --day_stop 2024-10-18 --force_remote &

Aggregated data are stored per symbol, date range, and conditions. If the requested dates fall within an already stored file with the same conditions but a wider date span, the data are loaded from that file.
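The span-coverage check can be sketched as a toy filename parser, assuming the naming convention visible in the cells below (SYMBOL-AGGTYPE-RESOLUTION-START-END-EXCLUDES-MINSIZE-MAINONLY.parquet, timestamps like 2024-01-15T09-30-00, and no hyphens in symbol or agg type) — the real matching lives in list_matching_files:

```python
from datetime import datetime

def covers(filename: str, start: datetime, end: datetime) -> bool:
    """True if a cached agg file's stored span contains the requested range.
    Toy parser for names like
    SPY-AggType.OHLCV-1-2024-01-15T09-30-00-2024-10-20T16-00-00-4679BCFMOPUVWZ-100-True.parquet
    (naive datetimes for simplicity)."""
    parts = filename.removesuffix(".parquet").split("-")
    # each timestamp splits into 5 hyphen-separated pieces: Y, m, dTH, M, S
    file_start = datetime.strptime("-".join(parts[3:8]), "%Y-%m-%dT%H-%M-%S")
    file_end = datetime.strptime("-".join(parts[8:13]), "%Y-%m-%dT%H-%M-%S")
    return file_start <= start and end <= file_end
```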

This is the matching part:

In [ ]:
from ttools.utils import list_matching_files, print_matching_files_info, zoneNY
from datetime import datetime
from ttools.config import AGG_CACHE

# Find all files covering January 15, 2024 9:30 to 16:00
files = list_matching_files(
    symbol='SPY',
    resolution="1",
    agg_type='AggType.OHLCV',
    start_date=datetime(2024, 1, 15, 9, 30),
    end_date=datetime(2024, 1, 15, 16, 0)
)

#print_matching_files_info(files)

# Example with all parameters specified
specific_files = list_matching_files(
    symbol="SPY",
    agg_type="AggType.OHLCV",
    resolution="12",
    start_date=zoneNY.localize(datetime(2024, 1, 15, 9, 30)),
    end_date=zoneNY.localize(datetime(2024, 1, 15, 16, 0)),
    excludes_str="4679BCFMOPUVWZ",
    minsize=100,
    main_session_only=True
)

print_matching_files_info(specific_files)

From this file, the requested subset of dates is loaded. Usually this is all done automatically by load_data in the loader.

In [1]:
#loading manually a range subset from existing files
import pandas as pd
from datetime import datetime
from ttools.utils import zoneNY
from ttools.config import AGG_CACHE

start = zoneNY.localize(datetime(2024, 1, 15, 9, 30))
end = zoneNY.localize(datetime(2024, 10, 20, 16, 00))

ohlcv_df = pd.read_parquet(
    AGG_CACHE / "SPY-AggType.OHLCV-1-2024-01-15T09-30-00-2024-10-20T16-00-00-4679BCFMOPUVWZ-100-True.parquet", 
    engine='pyarrow',
    filters=[('time', '>=', start), 
            ('time', '<=', end)]
)

ohlcv_df
In [1]:
from ttools.loaders import fetch_daily_stock_trades, fetch_trades_parallel
from ttools.utils import zoneNY
from datetime import datetime
TTOOLS: Loaded env variables from file /Users/davidbrazda/Documents/Development/python/.env

Fetching trades for whole range

In [2]:
#fetching one day
# df = fetch_daily_stock_trades(symbol="SPY",
#                               start=zoneNY.localize(datetime(2024, 1, 16, 9, 30)),
#                               end=zoneNY.localize(datetime(2024, 1, 16, 16, 00)))
# df.info()

#fetching multiple days with parallel
df = fetch_trades_parallel(symbol="BAC",
                              start_date=zoneNY.localize(datetime(2024, 1, 16, 0, 0)),
                              end_date=zoneNY.localize(datetime(2024, 1, 16, 23, 59)),
                              main_session_only=False,
                              exclude_conditions=None,
                              minsize=None,
                              force_remote=True)

df.info()
BAC Contains 1  market days
BAC Remote fetching: 100%|██████████| 1/1 [00:00<00:00, 434.55it/s]
Fetching from remote.
BAC Receiving trades:   0%|          | 0/1 [00:00<?, ?it/s]
Remote fetched completed whole day 2024-01-16
Exact UTC range fetched: 2024-01-16 05:00:00+00:00 - 2024-01-17 04:59:59.999999+00:00
BAC Receiving trades: 100%|██████████| 1/1 [00:42<00:00, 42.76s/it]
Saved to CACHE /Users/davidbrazda/Library/Application Support/v2realbot/tradecache/BAC-2024-01-16.parquet
Trimming 2024-01-16 00:00:00-05:00 2024-01-16 23:59:00-05:00
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 222754 entries, 2024-01-16 04:00:00.009225-05:00 to 2024-01-16 19:59:48.834830-05:00
Data columns (total 6 columns):
 #   Column  Non-Null Count   Dtype  
---  ------  --------------   -----  
 0   x       222754 non-null  object 
 1   p       222754 non-null  float64
 2   s       222754 non-null  int64  
 3   i       222754 non-null  int64  
 4   c       222754 non-null  object 
 5   z       222754 non-null  object 
dtypes: float64(1), int64(2), object(3)
memory usage: 11.9+ MB

In [22]:
df
Out[22]:
<style scoped=""> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </style>
x p s i c z
t
2024-01-16 04:00:00.009225-05:00 K 32.800 1 52983525027912 [ , T, I] A
2024-01-16 04:00:00.012088-05:00 P 32.580 8 52983525027890 [ , T, I] A
2024-01-16 04:00:02.299262-05:00 P 32.750 1 52983525027916 [ , T, I] A
2024-01-16 04:00:03.895322-05:00 P 32.640 1 52983525027920 [ , T, I] A
2024-01-16 04:00:04.145553-05:00 P 32.740 1 52983525027921 [ , T, I] A
... ... ... ... ... ... ...
2024-01-16 18:58:10.081270-05:00 D 32.104 10 79371957716549 [ , T, I] A
2024-01-16 18:58:11.293971-05:00 T 32.090 3 62883460503386 [ , T, I] A
2024-01-16 18:58:24.511348-05:00 D 32.110 1 79371957716560 [ , T, I] A
2024-01-16 18:58:46.648899-05:00 D 32.110 1 79371957716786 [ , T, I] A
2024-01-16 18:59:54.013894-05:00 D 32.100 1 71710070428229 [ , T, I] A

159301 rows × 6 columns
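The trade frame columns follow Alpaca's trade schema: x = exchange, p = price, s = size, i = trade id, c = conditions, z = tape. Turning such ticks into the time bars loaded earlier can be sketched with pandas resample — a slow, simplified analogue of what the numba-vectorized generate_time_bars_nb does:

```python
import pandas as pd

def time_bars(trades: pd.DataFrame, seconds: int) -> pd.DataFrame:
    """Resample raw trades (columns 'p' price and 's' size on a DatetimeIndex)
    into OHLCV time bars; empty intervals with no trades are dropped."""
    ohlc = trades["p"].resample(f"{seconds}s").ohlc()
    ohlc["volume"] = trades["s"].resample(f"{seconds}s").sum()
    return ohlc.dropna(subset=["open"])
```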

In [3]:
#comparing dataframes
from ttools.utils import AGG_CACHE, compare_dataframes
import pandas as pd
file1 = AGG_CACHE / "SPY-AggType.OHLCV-1-2024-02-15T09-30-00-2024-10-20T16-00-00-4679BCFMOPUVWZ-100-False.parquet"
file2 = AGG_CACHE / "SPY-AggType.OHLCV-1-2024-02-15T09-30-00-2024-10-20T16-00-00-4679BCFMOPUVWZ-100-False_older2.parquet"
df1 = pd.read_parquet(file1)
df2 = pd.read_parquet(file2)
df1.equals(df2)

#compare_dataframes(df1, df2)
Out[3]:
True
In [5]:
from ttools.config import TRADE_CACHE
import pandas as pd
file1 = TRADE_CACHE / "BAC-2024-01-16.parquet"
df1 = pd.read_parquet(file1)
In [8]:
df1
Out[8]:
<style scoped=""> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </style>
x p s i c z
t
2024-01-16 04:00:00.009225-05:00 K 32.80 1 52983525027912 [ , T, I] A
2024-01-16 04:00:00.012088-05:00 P 32.58 8 52983525027890 [ , T, I] A
2024-01-16 04:00:00.856156-05:00 K 32.61 14 52983525028705 [ , F, T, I] A
2024-01-16 04:00:02.299262-05:00 P 32.75 1 52983525027916 [ , T, I] A
2024-01-16 04:00:03.895322-05:00 P 32.64 1 52983525027920 [ , T, I] A
... ... ... ... ... ... ...
2024-01-16 19:59:24.796862-05:00 P 32.12 500 52983576997941 [ , T] A
2024-01-16 19:59:24.796868-05:00 P 32.12 500 52983576997942 [ , T] A
2024-01-16 19:59:24.796868-05:00 P 32.12 500 52983576997943 [ , T] A
2024-01-16 19:59:24.796871-05:00 P 32.12 500 52983576997944 [ , T] A
2024-01-16 19:59:48.834830-05:00 K 32.10 25 52983526941511 [ , T, I] A

222754 rows × 6 columns