Learning the Quantstrat and Blotter packages
A small tutorial (mostly for myself) to understand the functionality of blotter and quantstrat
Load packages:
library(knitr)
library(kableExtra)
library(dplyr)
library(ggplot2)
library(quantstrat)
::: Note ::: This post is mostly for my future reference/documentation while learning the quantstrat package. An example of a strategy I developed can be found below: a naive rolling logistic regression model trained on the previous t days to predict the t+1 market movement. ::: END Note :::
I have been playing around with backtesting trading models using the quantstrat package for a while, but the most difficult part was understanding the syntax of blotter and quantstrat. It didn't seem very intuitive to me at first, and there does not seem to be much detailed information online despite the package being around since 2010. In this post I will give my comments and observations on what certain functions do, and I will update this post over time.
Background
The quantstrat package is built on the blotter package, which was developed in 2008. It works best with xts time series objects, which can be easily collected using the quantmod package. The blotter package is the accounting package behind the quantstrat system: it can support multiple accounts and multiple portfolios, and computes the P&L of trading systems.
The main blotter functions are the following:
Initialisation
- initPortf - which initialises a portfolio
- initAcct - which initialises an account
Performance
- addTxn - which adds transactions to the portfolio
- updatePortf - which computes the P&L for each symbol in a given period
- updateAcct - which computes the equity from the assets
- updateEndEq - which updates the final equity for an account
- getEndEq - which provides us with the latest value of our account
- getPosQty - which gets a position at a given date
- chart.Posn - which plots a chart of the position size and cumulative P&L
- PortfReturns - which calculates the portfolio returns
- getAccount - which gets our account info
- getPortfolio - which gets our portfolio info
- getTxns - which gets the transactions for a portfolio
- tradeStats - which collects trade statistics
- perTradeStats - which calculates per-trade statistics
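To see how these functions fit together, here is a minimal sketch of the typical blotter accounting loop. The portfolio/account names, the dates and the single addTxn transaction are illustrative assumptions, not part of the strategy developed later in this post.
library(blotter)
library(quantmod)
# Define the instrument and pull some data to mark against
currency("USD")
stock("SPY", currency = "USD", multiplier = 1)
getSymbols("SPY", from = "2018-01-01", to = "2018-02-01", src = "yahoo", adjust = TRUE)
# Initialise a demo portfolio and account
initPortf("DemoPort", symbols = "SPY", initDate = "2017-12-31")
initAcct("DemoAcct", portfolios = "DemoPort", initDate = "2017-12-31", initEq = 1e5)
# Record a hypothetical purchase of 100 shares
addTxn("DemoPort", Symbol = "SPY", TxnDate = "2018-01-03", TxnQty = 100, TxnPrice = 265, TxnFees = -5)
# Mark to market, roll the P&L up to the account and read off the equity
updatePortf("DemoPort")
updateAcct("DemoAcct")
updateEndEq("DemoAcct")
getEndEq("DemoAcct", Date = "2018-01-31")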
The blotter package loads in a number of additional packages, as we can see by running the code below. The packages are: xts and zoo for time series, and FinancialInstrument, quantmod, TTR and PerformanceAnalytics for finance, where TTR stands for Technical Trading Rules.
The blotter package also creates a new environment called .blotter.
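A rough way to verify both points in a fresh session (assuming nothing else has been loaded) is:
library(blotter)
search()             # xts, zoo, FinancialInstrument, quantmod, TTR, PerformanceAnalytics now on the search path
exists(".blotter")   # TRUE: the hidden accounting environment has been created
ls(envir = .blotter) # empty until a portfolio/account is initialised, then e.g. "portfolio.MyFirstPortfolio"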
Download Financial Data
Let's first download some data for the S&P 500 (via the SPY ETF) from 2018 to 2019. With the quantstrat package it is quite usual to set the initiation date, start date and end date outside the parameters of the model, so I first set these as initDate, startDate and endDate.
initDate <- '2010-01-01' # this is used later but must be before the startDate
startDate <- '2018-01-01'
endDate <- '2019-01-01'
symbols <- c('SPY')
getSymbols(symbols, from = startDate, to = endDate, src = "yahoo", adjust=TRUE)
## [1] "SPY"
| | SPY.Open | SPY.High | SPY.Low | SPY.Close | SPY.Volume | SPY.Adjusted |
|---|---|---|---|---|---|---|
| 2018-01-02 | 262.8473 | 263.7992 | 262.4155 | 263.7599 | 86655700 | 260.1310 |
| 2018-01-03 | 263.9464 | 265.5951 | 263.9464 | 265.4283 | 90070400 | 261.7763 |
| 2018-01-04 | 266.1447 | 267.0868 | 265.4970 | 266.5470 | 80636400 | 262.8796 |
| 2018-01-05 | 267.4302 | 268.4606 | 266.8807 | 268.3233 | 83524000 | 264.6314 |
| 2018-01-08 | 268.2153 | 268.9906 | 267.8915 | 268.8140 | 57319200 | 265.1154 |
| 2018-01-09 | 269.2850 | 270.1191 | 268.9709 | 269.4224 | 57254000 | 265.7154 |
Plot the data
chartSeries(SPY, name = "Daily time series for SPY", type = "candlesticks", theme = chartTheme("white"))
Once we have the data and after loading the blotter package, we must define a few initialisation parameters, namely the currency() and stock() definitions from the FinancialInstrument package.
currency("USD")
## [1] "USD"
stock("SPY", currency = "USD", multiplier = 1)
## [1] "SPY"
ls(all = TRUE)
## [1] ".blotter" ".getSymbols" ".Random.seed" ".strategy"
## [5] "endDate" "initDate" "SPY" "startDate"
## [9] "symbols"
ls(envir = FinancialInstrument:::.instrument)
## [1] "SPY" "USD"
We can see that we have the SPY index and the USD currency set. We can convert the data from a daily time series to a monthly time series using the to.period function.
SPY_monthly <- to.period(SPY, period = "months")
chartSeries(SPY_monthly, name = "Monthly time series for SPY", type = "candlesticks", theme = chartTheme("white"))
head(SPY_monthly)
## SPY.Open SPY.High SPY.Low SPY.Close SPY.Volume SPY.Adjusted
## 2018-01-31 262.8473 281.2870 262.4155 276.6452 1985506700 272.8389
## 2018-02-28 275.8307 277.7836 248.2054 266.5862 2923722000 262.9184
## 2018-03-29 266.3507 275.1830 254.0372 259.2790 2323561800 255.7117
## 2018-04-30 258.6878 267.3091 250.9237 260.6190 1998466500 257.0332
## 2018-05-31 259.9884 270.2157 255.2393 266.9544 1606397200 263.2814
## 2018-06-29 268.4028 275.3688 265.7283 268.4896 1599001000 264.7955
Portfolio parameters
Now that we have our data we should initialise the portfolio with initPortf, which will hold the transactions over the period of analysis. Its main arguments are:
- name - "MyFirstPortfolio" - the name of the portfolio
- symbols - "SPY" - since we are using the SPY (S&P 500) ETF
- initPosQty - 100 - the initial quantity of our position
- initDate - initDate - the date of the initial account equity and position (prior to the closing price of our first position)
- currency - the currency we are using
initDate <- '2010-01-01' # NOTE: we already set this parameter above; it must be before the startDate
initPortf("MyFirstPortfolio", "SPY", initDate = initDate)
## [1] "MyFirstPortfolio"
Account parameters
We also need to initialise the account using initAcct. Its main arguments are:
- name - "MyFirstPortfolio" - the name of the account
- portfolios - the name of the portfolio we created previously
- initDate - initDate - the date of the initial account equity and position (prior to the closing price of our first position)
- initEq - the initial equity we begin with
- currency - the currency we are using
initEq = 1000000
initAcct("MyFirstPortfolio", portfolios = "MyFirstPortfolio", initDate = initDate, initEQ = initEq)
## [1] "MyFirstPortfolio"
first(SPY) # print the first observation in our data
## SPY.Open SPY.High SPY.Low SPY.Close SPY.Volume SPY.Adjusted
## 2018-01-02 262.8473 263.7992 262.4155 263.7599 86655700 260.131
last(SPY) # print the last observation in our data
## SPY.Open SPY.High SPY.Low SPY.Close SPY.Volume SPY.Adjusted
## 2018-12-31 249.56 250.19 247.47 249.92 144299400 246.4814
A simple Strategy
A trading strategy can be broken down into the following blocks (a bare quantstrat skeleton of these blocks is sketched after the list):
- Assets - the stocks we want to trade, e.g. SPY, MSFT, AAPL, etc.
- Indicators - the variables derived from the data: Open, High, Low, Close, SMA, EMA, RSI, Momentum, etc.
- Signals - created from the interaction between the indicators and the time-series data.
- Rules - buy and sell rules based on the signals created.
- Orders - once a rule is activated, push an order through.
- Analysis - analyse the performance of the strategy.
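In quantstrat these blocks map onto a small set of calls. The sketch below is a bare skeleton with made-up names ("DemoStrat", a 10-day SMA crossover) just to show which function handles which block; it is not the strategy built in the rest of this post.
# Assets: instruments defined via FinancialInstrument, data pulled via getSymbols (as above)
strategy("DemoStrat", store = TRUE)
# Indicators: e.g. a 10-day SMA of the close
add.indicator("DemoStrat", name = "SMA",
              arguments = list(x = quote(Cl(mktdata)), n = 10), label = "sma10")
# Signals: the close crossing above the SMA
add.signal("DemoStrat", name = "sigCrossover",
           arguments = list(columns = c("Close", "sma10"), relationship = "gt"),
           label = "longEntry")
# Rules: buy 10 shares at market whenever the signal fires
add.rule("DemoStrat", name = "ruleSignal",
         arguments = list(sigcol = "longEntry", sigval = TRUE, orderqty = 10,
                          ordertype = "market", orderside = "long"),
         type = "enter")
# Orders + Analysis: applyStrategy() generates and fills the orders, then
# updatePortf()/updateAcct()/updateEndEq() and tradeStats() analyse the result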
I wanted to test a basic backtesting concept: what happens if I train a machine learning model each day on the last 100 days of data to predict the next day's stock market direction, and test this over many periods? That is, continuously retrain a machine learning model at every step to predict the next price. One method is to use the rolling_origin function from the rsample package, but I wrote a simpler function for this.
I load in the parameters of the model and download data for MSFT.
library(quantstrat)
library(PerformanceAnalytics)
library(e1071)
initDate = "2010-01-01"
from <- "2018-01-01"
to <- "2019-09-20"
init_equity <- 1000
adjustment <- TRUE
.orderqty <- 10 # The profitability of the strategy depends heavily on this value
.txnfees <- -10
.stoploss <- 3e-3 # 0.003 or 0.3%
currency('USD')
## [1] "USD"
Sys.setenv(TZ="UTC")
symbols <- c('MSFT')
getSymbols(symbols, from = from, to = to, src = "yahoo", adjust = TRUE)
## [1] "MSFT"
I created a simple time series pre-processing function to clean up the data, create some features and convert the data to xts. Ignore the Scale_Me function.
Scale_Me <- function(x){
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}
TS_preprocess <- function(dat){
  dat = data.frame(dat)
  colnames(dat) = c("open", "high", "low", "close", "volume", "adjusted")
  dat$Y = with(dat, ifelse(close >= open, 1, 0))      # target: 1 if the day closed at or above its open
  dat$X1 = SMA(lag(dat$close), n = 10)                # 10-day SMA of the lagged close
  dat$X2 = RSI(lag(dat$close), nFast = 14, nSlow = 26, nSig = 9, maType = SMA)  # RSI of the lagged close
  dat$X3 = momentum(lag(dat$close), n = 12)           # 12-day momentum of the lagged close
  dat = dat[complete.cases(dat), ]                    # drop the leading NAs created by the rolling indicators
  #dat = cbind(dat[, 'Y'], apply(dat[, 8:ncol(dat)], 2, Scale_Me))
  #colnames(dat)[1] = "Y"
  dat = dat[, c("Y", "X1", "X2", "X3")]
  dat = as.xts(dat)
  return(dat)
}
I next set the training period n_train = 100 periods and n_test = 1 period, i.e. train on t days of data and test on day t+1. This can be quite computationally expensive depending on the machine learning model you plug in, but for a simple binary logistic regression classifier it is relatively fast. I create the logistic function, which returns the predicted probabilities.
The RollingBacktest function runs the model on n_train periods of data and makes a prediction on the following n_test periods. That is, say we have 1000 days of data: the model will train on the first 100 days and predict day 101, then retrain on days 2 to 101 and predict day 102, and so on, continuing until all 1000 days have passed.
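As a quick illustration of the indices this produces, here is a toy example with a training window of 5 and a test window of 1 on 10 rows (values chosen only to show the mechanics; the windows mirror the construction used in RollingBacktest below):
toy_train <- 5; toy_test <- 1; toy_rows <- 10
toy_starts <- seq(1, toy_rows - (toy_train + toy_test))
toy_windows <- lapply(toy_starts, function(i)
  list(train = seq(i, i + toy_train), test = seq(i + toy_train + 1, i + toy_train + toy_test)))
toy_windows[[1]] # train on rows 1-6 (seq is inclusive at both ends), predict row 7
toy_windows[[2]] # train on rows 2-7, predict row 8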
df <- TS_preprocess(MSFT)
n_train <- 100
n_test <- 1
LogistFun <- function(frm, dat, trainIndex, testIndex){
  # glm() with the default gaussian family; add family = binomial(link = "logit") for a strict logistic fit
  LogitModel <- glm(frm, data = dat[trainIndex, ])
  pred <- predict(LogitModel, newdata = dat[testIndex, ], type = 'response')
  return(pred)
}
RollingBacktest <- function(dat, ntrain = n_train, ntest = n_test){
  stopifnot('Y' %in% names(dat))
  # build the formula Y ~ X1 + X2 + X3 (seq(2:ncol(dat)) evaluates to 1:3 here)
  frm_ <- formula(reformulate(paste0("X", seq(2:ncol(dat))), "Y"))
  stride <- ntrain + ntest
  startPosn <- seq(1, dim(dat)[1] - stride)
  train_index_list <- lapply(startPosn, function(i) seq(i, i + ntrain))
  test_index_list <- lapply(startPosn, function(i) seq((i + ntrain + 1), (i + ntrain + ntest)))
  mapply(LogistFun, trainIndex = train_index_list, testIndex = test_index_list,
         MoreArgs = list(frm = frm_, dat = dat), SIMPLIFY = FALSE)
}
Now that I have the pre-process function, the logistic function and the backtest function, we can run the model through the TS_postprocess function, which applies everything.
TS_postprocess <- function(dat, ntrain){
results = tail(dat, -(ntrain + 1))
results$probs <- RollingBacktest(dat)
results$predictions <- ifelse(results$probs > 0.6, 1, 0)
print(paste0("Model Accuracy at the 0.60 prob cut-off ", mean(results$Y == results$predictions)))
return(results)
}
out <- TS_postprocess(df, ntrain = n_train)
## [1] "Model Accuracy at the 0.60 prob cut-off 0.518987341772152"
out <- na.omit(cbind(MSFT, out))
The model is not very accurate…
The na.omit removes the NA values created by the SMA, RSI and momentum calculations, which affect the first few observations of the time series.
The data looks like:
out %>%
head() %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
| | MSFT.Open | MSFT.High | MSFT.Low | MSFT.Close | MSFT.Volume | MSFT.Adjusted | Y | X1 | X2 | X3 | probs | predictions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2018-06-19 | 97.73225 | 99.05627 | 97.58513 | 98.91896 | 28653100 | 98.91896 | 1 | 99.33186 | 63.01370 | 1.98113024 | 0.6709360 | 1 |
| 2018-06-20 | 99.41915 | 100.54701 | 99.17396 | 99.90953 | 26180800 | 99.90952 | 1 | 99.20142 | 59.54048 | 0.06865286 | 0.7426882 | 1 |
| 2018-06-21 | 100.11548 | 100.48817 | 98.93857 | 99.19357 | 23198200 | 99.19357 | 0 | 99.14061 | 63.88638 | 0.19615593 | 0.7523711 | 1 |
| 2018-06-22 | 98.47763 | 98.83069 | 97.71263 | 98.47763 | 38923100 | 98.47763 | 1 | 99.16611 | 51.80598 | -1.02979581 | 0.7586200 | 1 |
| 2018-06-25 | 98.07551 | 98.18340 | 95.42748 | 96.49649 | 35433300 | 96.49649 | 0 | 99.04646 | 43.39626 | -2.03996476 | 0.7851164 | 1 |
| 2018-06-26 | 96.91822 | 98.15397 | 96.84957 | 97.17322 | 26897200 | 97.17323 | 1 | 98.78558 | 32.78984 | -2.44207828 | 0.6282847 | 1 |
We can see that the model outputs predicted probabilities. I simply set the predictions column to 1 if the model's predicted probability is > 0.6 and to 0 otherwise. The Y variable is the observed direction, and the X1, X2 and X3 variables are the SMA, RSI and momentum features.
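As a quick sanity check beyond the single accuracy figure, we can cross-tabulate the observed direction against the thresholded predictions; the exact counts will depend on the data pulled from Yahoo at run time.
conf <- table(observed = as.numeric(out$Y), predicted = as.numeric(out$predictions))
conf
prop.table(conf) # proportion of days in each observed/predicted combination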
We can now use the quantstrat package to backtest the model and see how it performed.
MSFT <- out
stock("MSFT", currency = "USD", multiplier = 1)
## [1] "MSFT"
strategy.st <- portfolio.st <- account.st <- "RollingLogitStrategy"
rm.strat(strategy.st)
rm.strat(portfolio.st)
rm.strat(account.st)
initPortf(name = portfolio.st,
symbols = symbols,
initDate = initDate,
currency = 'USD')
## [1] "RollingLogitStrategy"
initAcct(name = account.st,
portfolios = portfolio.st,
initDate = initDate,
currency = 'USD',
initEq = init_equity)
## [1] "RollingLogitStrategy"
initOrders(portfolio.st,
symbols = symbols,
initDate = initDate)
strategy(strategy.st, store = TRUE)
I add the signals to the strategy and give it some rules.
- 1a) The signal fires the first time the logistic model produces a predicted probability greater than 0.6. sigThreshold is a quantstrat function; the others are sigComparison, sigCrossover and sigFormula. It takes the threshold value in the list of arguments, the column it looks at is the probs column (where the predicted probabilities are output to), and gt means greater than. It creates a new column called longSig and, as far as I understand, it is similar to ifelse(df$probs > 0.6, 1, 0), except that with cross = TRUE the signal only fires on the bar where the threshold is first crossed.
- 1b) The rule for the strategy takes sigcol, which is the label we gave our signal in the previous lines, longSig. If longSig = 1, execute a market order going long, buying at the next day's open price; the transaction fees were set at the start of the strategy. We call this rule EnterLONG.
nMult_orderqty <- 2
addPosLimit(portfolio.st, symbol = "MSFT", timestamp = initDate, maxpos = nMult_orderqty * .orderqty)
# Objective: Buy when the probability is gt 0.60, using cross = TRUE
# 1.a)
add.signal(strategy = strategy.st,
name = "sigThreshold",
arguments = list(threshold = 0.6,
column = "probs",
relationship = "gt",
cross = TRUE),
label = "longSig")
## [1] "RollingLogitStrategy"
# 1.b)
# # Adding the rules, enter at the open price when prob > 0.60 for the first time, taking transaction fees into account
add.rule(strategy = strategy.st,
name = "ruleSignal",
arguments = list(sigcol = "longSig", # check the ifelse predictions statement
sigval = 1,
orderqty = .orderqty,
ordertype = "market",
orderside = "long",
osFUN = osMaxPos,
prefer = "Open",
replace = TRUE,
TxnFees = .txnfees),
type = "enter",
label = "EnterLONG")
## [1] "RollingLogitStrategy"
From the logistic model we have added our signal to buy when the predicted probability for the next day is greater than 0.60. So far we just keep buying whenever the predicted probability crosses above 0.60, but we have no rule to sell when the model predicts something different.
I thought it might be interesting to exit the strategy when the model is undecided or produces a very low predicted probability. I set this threshold to less than 0.4 based on the probability density plot below. So we are making trades at the tail ends of the distribution, buying when the model is confident and selling when it is unsure.
plot(density(out$probs))
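We can also check where the 0.4 and 0.6 cut-offs sit in the empirical distribution of the predicted probabilities (the exact numbers will vary with the downloaded data):
quantile(as.numeric(out$probs), probs = c(0.05, 0.25, 0.5, 0.75, 0.95))
mean(out$probs > 0.6) # share of days that could trigger the entry signal
mean(out$probs < 0.4) # share of days that could trigger the exit signal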
Add the signals and rules for exiting the strategy. Using a similar principle as before, I create a signal when the probability is less than 0.4 and call it exitlongSig.
- Be careful here to set type = "exit" and orderside = "long". I previously had orderside = "short", which is completely wrong.
# 2.a) # create the signal of when we should be looking to exit
# #exit when prob drops below 0.4 for the first time
add.signal(strategy = strategy.st,
name = "sigThreshold",
arguments = list(threshold = 0.4,
column = "probs",
relationship = "lt",
cross = TRUE),
label = "exitlongSig")
## [1] "RollingLogitStrategy"
# 2.b) # Add that signal to the rule of exiting
add.rule(strategy.st,
name = "ruleSignal",
arguments = list(sigcol = "exitlongSig",
sigval = 1,
orderqty = "all",
ordertype = "market",
orderside = "long",
osFUN = osMaxPos,
prefer = "Open",
replace = TRUE,
TxnFees = .txnfees),
type = "exit",
label = "ExitLong")
## [1] "RollingLogitStrategy"
We can finally apply the strategy and see how it performs. It doesn't make many transactions because the entry and exit thresholds are quite restrictive.
applyStrategy(strategy.st, portfolios = portfolio.st)
## [1] "2018-07-05 00:00:00 MSFT 10 @ 97.5851340312762"
## [1] "2018-07-11 00:00:00 MSFT 10 @ 99.2033819340086"
## [1] "2018-12-06 00:00:00 MSFT -20 @ 104.632969498356"
## [1] "2018-12-19 00:00:00 MSFT 10 @ 102.487313341245"
## [1] "2019-01-09 00:00:00 MSFT 10 @ 102.694956688076"
## [1] "2019-02-21 00:00:00 MSFT -20 @ 106.15227613615"
## [1] "2019-04-11 00:00:00 MSFT 10 @ 119.69686840234"
## [1] "2019-04-24 00:00:00 MSFT 10 @ 124.91014659961"
## [1] "2019-06-13 00:00:00 MSFT -20 @ 131.541967172209"
## [1] "2019-06-27 00:00:00 MSFT 10 @ 133.694801331394"
## [1] "2019-07-09 00:00:00 MSFT 10 @ 135.548629168169"
## [1] "2019-08-23 00:00:00 MSFT -20 @ 137.190002"
## [1] "2019-08-28 00:00:00 MSFT 10 @ 134.880005"
## [1] "2019-09-05 00:00:00 MSFT -10 @ 139.110001"
updatePortf(portfolio.st)
## [1] "RollingLogitStrategy"
updateAcct(account.st)
## [1] "RollingLogitStrategy"
updateEndEq(account.st)
## [1] "RollingLogitStrategy"
chart.Posn(portfolio.st, Symbol = "MSFT")
#TA="add_SMA(n = 10, col = 2); add_SMA(n = 30, col = 4)")
Look at the mktdata object, which is where our signals and predictions are stored.
mktdata %>%
data.frame() %>%
head(10) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
| | MSFT.Open | MSFT.High | MSFT.Low | MSFT.Close | MSFT.Volume | MSFT.Adjusted | Y | X1 | X2 | X3 | probs | predictions | longSig | exitlongSig |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2018-06-19 | 97.73225 | 99.05627 | 97.58513 | 98.91896 | 28653100 | 98.91896 | 1 | 99.33186 | 63.01370 | 1.9811302 | 0.6709360 | 1 | NA | NA |
| 2018-06-20 | 99.41915 | 100.54701 | 99.17396 | 99.90953 | 26180800 | 99.90952 | 1 | 99.20142 | 59.54048 | 0.0686529 | 0.7426882 | 1 | 0 | 0 |
| 2018-06-21 | 100.11548 | 100.48817 | 98.93857 | 99.19357 | 23198200 | 99.19357 | 0 | 99.14061 | 63.88638 | 0.1961559 | 0.7523711 | 1 | 0 | 0 |
| 2018-06-22 | 98.47763 | 98.83069 | 97.71263 | 98.47763 | 38923100 | 98.47763 | 1 | 99.16611 | 51.80598 | -1.0297958 | 0.7586200 | 1 | 0 | 0 |
| 2018-06-25 | 98.07551 | 98.18340 | 95.42748 | 96.49649 | 35433300 | 96.49649 | 0 | 99.04646 | 43.39626 | -2.0399648 | 0.7851164 | 1 | 0 | 0 |
| 2018-06-26 | 96.91822 | 98.15397 | 96.84957 | 97.17322 | 26897200 | 97.17323 | 1 | 98.78558 | 32.78984 | -2.4420783 | 0.6282847 | 1 | 0 | 0 |
| 2018-06-27 | 97.66360 | 98.09512 | 95.52555 | 95.66285 | 31298400 | 95.66285 | 0 | 98.56687 | 35.08314 | -2.5009206 | 0.7361390 | 1 | 0 | 0 |
| 2018-06-28 | 95.50593 | 97.20264 | 95.38824 | 96.73187 | 26650700 | 96.73187 | 1 | 98.24224 | 35.29932 | -3.4424524 | 0.6387378 | 1 | 0 | 0 |
| 2018-06-29 | 97.02610 | 97.98725 | 96.43765 | 96.71226 | 28053200 | 96.71226 | 0 | 97.96861 | 37.17949 | -2.6284247 | 0.6327528 | 1 | 0 | 0 |
| 2018-07-02 | 96.21207 | 98.13435 | 96.11400 | 98.08532 | 19564500 | 98.08531 | 1 | 97.81954 | 39.04847 | -2.1968885 | 0.5659825 | 0 | 0 | 0 |
So, if we trained a simple logistic model on 100 days of data to give us the next day's prediction every day since June 2018, investing only when the model's predicted probability is greater than 0.60 and selling only when it is less than 0.40, then our cumulative P&L would be $334 with a max drawdown of $61.
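Figures like these can be pulled from the trade statistics that blotter computes once the portfolio and account have been updated. A sketch of the calls I would use (exact values depend on the data at run time):
tstats <- tradeStats(Portfolios = portfolio.st)
tstats[, c("Num.Trades", "Net.Trading.PL", "Max.Drawdown", "Percent.Positive")]
perTradeStats(portfolio.st, Symbol = "MSFT") # per-trade detail
rets <- PortfReturns(Account = account.st)
charts.PerformanceSummary(rets, main = "Rolling logistic strategy returns")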
I have probably (almost certainly) gone wrong at some point using the quantstrat package, so the results will probably not generalise well to other assets, especially using a logistic model with 3 regressors! But it was a fun exercise using the quantstrat package.
I will revisit and modify this markdown file.