Learning the Quantstrat and Blotter packages

A small tutorial (mostly for myself) to understand the functionality of blotter and quantstrat

Load packages:

library(knitr)
library(kableExtra)
library(dplyr)
library(ggplot2)
library(quantstrat)

::: Note ::: This post is mostly for my own future reference while learning the quantstrat package. An example strategy is developed further down: a naive rolling logistic regression model trained on t days of data to predict the market movement on day t+1. ::: END Note :::

I have been playing around with backtesting trading models using the quantstrat package for a while. The most difficult part was understanding the syntax of blotter and quantstrat: it did not seem very intuitive to me at first, and there is not much detailed information online despite the package being around since 2010. In this post I give my comments and observations on what certain functions do. I will update this post over time.

Background

The quantstrat package is built on the blotter package, which was developed in 2008. It works best with xts time series objects, which can be easily collected using the quantmod package. The blotter package is the accounting layer behind the quantstrat system: it can support multiple accounts and multiple portfolios, and it computes the P&L of trading systems.

The main blotter functions are the following:

Initialisation

  • initPortf - which initialises a portfolio
  • initAcct - which initialises an account

Performance

  • addTxn - which adds transactions to the portfolio
  • updatePortf - which computes the P&L for each symbol in a given period
  • updateAcct - which computes the equity from the assets
  • updateEndEq - which updates the final equity for an account
  • getEndEq - which provides us with the latest value of our account
  • getPosQty - which gets the position quantity at a given date
  • chart.Posn - which plots a chart of the position size and cumulative P&L
  • PortfReturns - which calculates the portfolio returns
  • getAccount - which gets our account info
  • getPortfolio - which gets our portfolio info
  • getTxns - which gets the transactions for a symbol
  • tradeStats - which collects trade statistics
  • perTradeStats - which calculates per trade statistics
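
To make the accounting idea concrete, here is a minimal base R sketch of the kind of bookkeeping that addTxn, updatePortf and updateAcct take care of: tracking the position, the cash and the mark-to-market equity. It is a hand-rolled illustration and not blotter's implementation; the prices, trades and starting equity are made up.

```r
# Toy daily closing prices and a transaction log (all values made up).
prices    <- c(100, 102, 101, 105, 107)
txn_qty   <- c(10, 0, 0, -4, 0)        # shares bought (+) or sold (-) each day
txn_price <- c(100, NA, NA, 105, NA)   # execution price on days with a trade
init_eq   <- 1000

# Position held after each day's transactions.
position <- cumsum(txn_qty)

# Cash moves by the value of each trade (buying reduces cash).
cash_flow <- ifelse(txn_qty == 0, 0, -txn_qty * txn_price)
cash <- init_eq + cumsum(cash_flow)

# Equity = cash + market value of the open position.
equity <- cash + position * prices

data.frame(price = prices, position = position, cash = cash, equity = equity)
```

The last equity value minus init_eq is the cumulative P&L, which is the figure blotter reports per symbol and per account.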

The blotter package loads a number of additional packages.

The packages are: xts and zoo for time series, FinancialInstrument, quantmod, TTR, PerformanceAnalytics for finance, where TTR stands for Technical Trading Rules.
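
One way to check which packages a library() call attaches is to compare the search path before and after. The helper below is a small sketch; newly_attached is a name I made up, and "splines" (which ships with R) is only a stand-in so the snippet runs anywhere.

```r
# Report which packages a library() call adds to the search path.
newly_attached <- function(pkg) {
  before <- search()
  suppressPackageStartupMessages(library(pkg, character.only = TRUE))
  setdiff(search(), before)
}

# "splines" is a stand-in that is always available. In a fresh session,
# newly_attached("quantstrat") should list quantstrat together with the
# packages it attaches as dependencies.
attached <- newly_attached("splines")
print(attached)
```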

The blotter package creates a new environment .blotter.

Download Financial Data

Let’s first download some data for the S&P 500 (via the SPY ETF) from 2018 to 2019. With quantstrat it is quite usual to set the initialisation date, start date and end date outside the model itself, so I first define these as initDate, startDate and endDate.

initDate <- '2010-01-01' # used later; it must be before the startDate
startDate <- '2018-01-01'
endDate <- '2019-01-01'
symbols <- c('SPY')
getSymbols(symbols, from = startDate, to = endDate, src = "yahoo", adjust=TRUE) 
## [1] "SPY"
Table 1: SPY stocks
Date        SPY.Open  SPY.High  SPY.Low   SPY.Close  SPY.Volume  SPY.Adjusted
2018-01-02  262.8473  263.7992  262.4155  263.7599   86655700    260.1310
2018-01-03  263.9464  265.5951  263.9464  265.4283   90070400    261.7763
2018-01-04  266.1447  267.0868  265.4970  266.5470   80636400    262.8796
2018-01-05  267.4302  268.4606  266.8807  268.3233   83524000    264.6314
2018-01-08  268.2153  268.9906  267.8915  268.8140   57319200    265.1154
2018-01-09  269.2850  270.1191  268.9709  269.4224   57254000    265.7154

Plot the data

chartSeries(SPY, name = "Daily time series for SPY", type = "candlesticks", theme = chartTheme("white"))

Once we have the data and have loaded the blotter package, we must define a few initialisation parameters, namely the currency and the stock, using currency() and stock() from the FinancialInstrument package.

currency("USD")
## [1] "USD"
stock("SPY", currency = "USD", multiplier = 1)
## [1] "SPY"
ls(all = TRUE)
## [1] ".blotter"     ".getSymbols"  ".Random.seed" ".strategy"   
## [5] "endDate"      "initDate"     "SPY"          "startDate"   
## [9] "symbols"
ls(envir = FinancialInstrument:::.instrument)
## [1] "SPY" "USD"

We can see that the SPY instrument and the USD currency are set. We can convert the data from a daily time series to a monthly time series using the to.period function.

SPY_monthly <- to.period(SPY, period = "months")
chartSeries(SPY_monthly, name = "Monthly time series for SPY", type = "candlesticks", theme = chartTheme("white"))

head(SPY_monthly)
##            SPY.Open SPY.High  SPY.Low SPY.Close SPY.Volume SPY.Adjusted
## 2018-01-31 262.8473 281.2870 262.4155  276.6452 1985506700     272.8389
## 2018-02-28 275.8307 277.7836 248.2054  266.5862 2923722000     262.9184
## 2018-03-29 266.3507 275.1830 254.0372  259.2790 2323561800     255.7117
## 2018-04-30 258.6878 267.3091 250.9237  260.6190 1998466500     257.0332
## 2018-05-31 259.9884 270.2157 255.2393  266.9544 1606397200     263.2814
## 2018-06-29 268.4028 275.3688 265.7283  268.4896 1599001000     264.7955
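
To see what this aggregation does, here is a base R sketch of the usual OHLCV convention: first open, highest high, lowest low, last close and summed volume per month. This matches the output above (the January open and low are those of 2018-01-02 in Table 1). It is an illustration on made-up bars, not the xts implementation.

```r
# Toy daily bars spanning two months (values are made up).
daily <- data.frame(
  date   = as.Date(c("2018-01-30", "2018-01-31", "2018-02-01", "2018-02-02")),
  open   = c(10, 11, 12, 13),
  high   = c(12, 13, 14, 15),
  low    = c( 9, 10, 11, 12),
  close  = c(11, 12, 13, 14),
  volume = c(100, 200, 300, 400)
)

# Group key: calendar month of each bar.
month <- format(daily$date, "%Y-%m")

monthly <- data.frame(
  month  = sort(unique(month)),
  open   = as.numeric(tapply(daily$open,   month, function(x) x[1])),
  high   = as.numeric(tapply(daily$high,   month, max)),
  low    = as.numeric(tapply(daily$low,    month, min)),
  close  = as.numeric(tapply(daily$close,  month, function(x) x[length(x)])),
  volume = as.numeric(tapply(daily$volume, month, sum))
)
monthly
```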

Portfolio parameters

Now that we have our data we should initialise the portfolio with initPortf, which will consist of the transactions over the period of analysis.

  • name - “MyFirstPortfolio” - the name of the portfolio
  • symbols - “SPY” - since we are trading SPY
  • initPosQty - the initial quantity of our position (not passed in the call below, so blotter’s default is used)
  • initDate - the date of the initial account equity and position (prior to the first close price in our data)
  • currency - the currency we are using
initDate <- '2010-01-01' # NOTE: already defined above; it must be before the startDate
initPortf("MyFirstPortfolio", "SPY", initDate = initDate)
## [1] "MyFirstPortfolio"

Account parameters

We also need to initialise the account using initAcct

  • name - “MyFirstPortfolio” - the name of the account (here the same as the portfolio name)
  • portfolios - the name of the portfolio created previously
  • initDate - the date of the initial account equity and position (prior to the first close price in our data)
  • initEq - the initial equity we begin with
  • currency - the currency we are using
initEq <- 1000000
initAcct("MyFirstPortfolio", portfolios = "MyFirstPortfolio", initDate = initDate, initEq = initEq) # argument names are case-sensitive: initEq
## [1] "MyFirstPortfolio"
first(SPY) # print the first observation in our data
##            SPY.Open SPY.High  SPY.Low SPY.Close SPY.Volume SPY.Adjusted
## 2018-01-02 262.8473 263.7992 262.4155  263.7599   86655700      260.131
last(SPY) # print the last observation in our data
##            SPY.Open SPY.High SPY.Low SPY.Close SPY.Volume SPY.Adjusted
## 2018-12-31   249.56   250.19  247.47    249.92  144299400     246.4814

A simple Strategy

A trading strategy can be broken down into the following blocks:

  • Assets - which are the stocks we want to trade: SPY, MSFT, AAPL etc.
  • Indicators - which are the variables, Open, High, Low, Close, SMA, EMA, RSI, Momentum etc.
  • Signals - we can create signals based on the interaction between the indicators and the time-series data.
  • Rules - Create buy and sell rules based on the signals created.
  • Orders - Once a rule is activated, push an order through.
  • Analysis - Analyse the performance of the strategy.
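
The blocks above can be sketched end to end in a few lines of base R. This is a toy illustration on a made-up price series with a 3-period moving average; it is not how quantstrat wires these pieces together internally.

```r
# Asset: a made-up closing price series.
close <- c(10, 11, 12, 11, 10, 9, 10, 12, 13, 12)

# Indicator: 3-period simple moving average (NA until enough data).
sma3 <- as.numeric(stats::filter(close, rep(1 / 3, 3), sides = 1))

# Signal: TRUE when the close is above its moving average.
above <- !is.na(sma3) & close > sma3

# Rule: hold 1 unit while the signal is on, otherwise stay flat.
# The position is lagged one bar so a signal on day t trades on day t + 1.
position <- c(0, head(as.numeric(above), -1))

# Orders: changes in position (+1 buy, -1 sell).
orders <- diff(c(0, position))

# Analysis: P&L earned by the position over each bar.
pnl <- position * c(0, diff(close))
total_pnl <- sum(pnl)
total_pnl
```

Lagging the position by one bar is the design choice that matters most here: it stops the backtest from trading on information it would not yet have had.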

I wanted to test a basic backtesting concept: what happens if I train a machine learning model each day on the last 100 days of data to predict the next day’s market direction, and repeat this over many periods? That is, continuously retrain the model at every step to predict the next move. One method is to use the rolling_origin function from the rsample package, but I wrote a simpler function for this.

I load in the parameters of the model and download data for MSFT.

library(quantstrat)
library(PerformanceAnalytics)
library(e1071)

initDate = "2010-01-01"
from <- "2018-01-01"
to <- "2019-09-20"

init_equity <- 1000
adjustment <- TRUE

.orderqty <- 10                  # The profitability of the strategy depends heavily on this value
.txnfees <- -10
.stoploss <- 3e-3 # 0.003 or 0.3%

currency('USD')
## [1] "USD"
Sys.setenv(TZ="UTC")

symbols <- c('MSFT')

getSymbols(symbols, from = from, to = to, src = "yahoo", adjust = TRUE) 
## [1] "MSFT"

I created a simple time series pre-processing function to clean up the data, create some features and convert the result to xts. Ignore the Scale_Me function; it is only used in the commented-out lines.

Scale_Me <- function(x){
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}


TS_preprocess <- function(dat){
  dat = data.frame(dat)
  colnames(dat) = c("open", "high", "low", "close", "volume", "adjusted")
  # Target: 1 if the day closed at or above its open, 0 otherwise
  dat$Y = with(dat, ifelse(close >= open, 1, 0))
  # Features use yesterday's close. dat$close is a plain vector here, so the
  # shift relies on dplyr::lag (stats::lag would not move the values)
  prev_close = dplyr::lag(dat$close)
  dat$X1 = SMA(prev_close, n = 10)
  dat$X2 = RSI(prev_close, n = 14, maType = SMA)  # 14-period RSI smoothed with an SMA
  dat$X3 = momentum(prev_close, n = 12)
  dat = dat[complete.cases(dat), ]
  #dat = cbind(dat[, 'Y'], apply(dat[, 8:ncol(dat)], 2, Scale_Me))
  #colnames(dat)[1] = "Y" 
  dat = dat[, c("Y", "X1", "X2", "X3")]
  dat = as.xts(dat)
  return(dat)
}
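
A note on lagging plain vectors, since the features above depend on it. Inside TS_preprocess the close column is an ordinary numeric vector, so it matters which lag function is in scope. The base R sketch below shows that stats::lag leaves the values of a plain vector where they are, and builds an explicit one-step shift instead; shift1 is a helper name I made up for the illustration.

```r
x <- c(10, 11, 12, 13, 14)

# stats::lag() only shifts the time attribute of a plain vector;
# the values themselves stay in place.
unshifted <- as.numeric(stats::lag(x, 1))

# An explicit one-step shift, so position t holds the value from t - 1.
shift1 <- function(v) c(NA, head(v, -1))
shifted <- shift1(x)

# A feature built on the shifted series only uses information up to t - 1.
sma2_lagged <- as.numeric(stats::filter(shifted, rep(1 / 2, 2), sides = 1))

rbind(x, unshifted, shifted, sma2_lagged)
```

If the lag silently does nothing, the features at time t include the close at time t, which leaks part of the target into the predictors.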

I next set the training period to n_train = 100 periods and the test period to n_test = 1, that is, train on t days of data and test on day t+1. This can be quite computationally expensive depending on the machine learning model you use, but for a simple binary logistic regression classifier it is relatively fast. I create the LogistFun function, which returns the predicted probabilities.

The RollingBacktest function fits the model on a window of training rows and makes a prediction on the following n_test rows. Say we have 1000 days of data: the model trains on roughly the first 100 days and predicts the next day, then slides forward one day, retrains and predicts again, continuing until all 1000 days have been used. (As written, seq(i, i + ntrain) spans ntrain + 1 rows, so the first window is rows 1 to 101 and the first prediction is for row 102.)
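
The windowing itself can be sketched on its own, without any model. The function below is a hypothetical helper (rolling_indices is my name, and the sizes are illustrative rather than the post's settings) that uses exactly n_train rows per window, so it differs by one row from the indexing in RollingBacktest.

```r
# Build train/test row indices for a rolling-origin backtest.
rolling_indices <- function(n_obs, n_train, n_test = 1) {
  starts <- seq_len(n_obs - n_train - n_test + 1)
  lapply(starts, function(i) {
    list(train = seq(i, length.out = n_train),
         test  = seq(i + n_train, length.out = n_test))
  })
}

windows <- rolling_indices(n_obs = 10, n_train = 4, n_test = 1)
length(windows)        # 6 windows
windows[[1]]$train     # 1 2 3 4
windows[[1]]$test      # 5
windows[[6]]$test      # 10
```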

df <- TS_preprocess(MSFT)

n_train <- 100
n_test <- 1

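LogistFun <- function(frm, dat, trainIndex, testIndex){
  # family = binomial gives a logistic regression; glm() defaults to gaussian
  LogitModel <- glm(frm, data = dat[trainIndex, ], family = binomial(link = "logit"))
  pred <- predict(LogitModel, newdata = dat[testIndex, ], type = 'response')
  return(pred)
}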

RollingBacktest <- function(dat, ntrain = n_train, ntest = n_test){
  stopifnot('Y' %in% names(dat))
  # Regress Y on every other column (X1, X2, X3 here)
  frm_ <- reformulate(setdiff(names(dat), "Y"), response = "Y")
  
  stride <- ntrain + ntest
  startPosn <- seq(1, dim(dat)[1] - stride)
  
  # Note: seq(i, i + ntrain) covers ntrain + 1 rows
  train_index_list <- lapply(startPosn, function(i) seq(i, i + ntrain))
  test_index_list <- lapply(startPosn, function(i) seq((i + ntrain + 1), (i + ntrain + ntest))) 
  
  # One fit and one out-of-sample prediction per window, returned as a numeric vector
  preds <- mapply(LogistFun, trainIndex = train_index_list, testIndex = test_index_list,
                  MoreArgs = list(frm = frm_, dat = dat), SIMPLIFY = FALSE)
  unlist(preds, use.names = FALSE)
}
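
One detail about glm() is worth checking in isolation: it only fits a logistic regression when family = binomial is supplied, and with the default family it fits an ordinary linear model to the 0/1 outcome. The sketch below uses simulated data (not the MSFT data) to show the difference.

```r
set.seed(1)
n <- 200
x <- rnorm(n)
# Simulated binary outcome whose log-odds are linear in x.
y <- rbinom(n, size = 1, prob = plogis(2 * x))
sim <- data.frame(y = y, x = x)

# Default family (gaussian): an ordinary linear model on a 0/1 outcome.
linear_fit <- glm(y ~ x, data = sim)
# Binomial family: a logistic regression.
logit_fit  <- glm(y ~ x, data = sim, family = binomial)

new_x <- data.frame(x = c(-3, 0, 3))
linear_pred <- predict(linear_fit, newdata = new_x, type = "response")
logit_pred  <- predict(logit_fit,  newdata = new_x, type = "response")

# Logistic predictions always lie strictly between 0 and 1;
# the linear fit can stray outside that range at extreme x.
range(logit_pred)
range(linear_pred)
```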

Now that I have the pre-process function, the logistic function and the backtest function, we can run the model through the TS_postprocess function, which applies everything.

TS_postprocess <- function(dat, ntrain){
  # Drop the first ntrain + 1 rows, which have no out-of-sample prediction
  results = tail(dat, -(ntrain + 1))
  results$probs <- RollingBacktest(dat, ntrain = ntrain)
  results$predictions <- ifelse(results$probs > 0.6, 1, 0)
  print(paste0("Model Accuracy at the 0.60 prob cut-off ", mean(results$Y == results$predictions)))
  return(results)
}

out <- TS_postprocess(df, ntrain = n_train)
## [1] "Model Accuracy at the 0.60 prob cut-off 0.518987341772152"
out <- na.omit(cbind(MSFT, out))

The model is not very accurate…

The na.omit call removes the rows that contain NA values after the merge: the first observations lost to the SMA, RSI and momentum calculations, and the rows of the initial training window, which have no prediction.

The data looks like:

out %>%
  head() %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume MSFT.Adjusted Y X1 X2 X3 probs predictions
97.73225 99.05627 97.58513 98.91896 28653100 98.91896 1 99.33186 63.01370 1.98113024 0.6709360 1
99.41915 100.54701 99.17396 99.90953 26180800 99.90952 1 99.20142 59.54048 0.06865286 0.7426882 1
100.11548 100.48817 98.93857 99.19357 23198200 99.19357 0 99.14061 63.88638 0.19615593 0.7523711 1
98.47763 98.83069 97.71263 98.47763 38923100 98.47763 1 99.16611 51.80598 -1.02979581 0.7586200 1
98.07551 98.18340 95.42748 96.49649 35433300 96.49649 0 99.04646 43.39626 -2.03996476 0.7851164 1
96.91822 98.15397 96.84957 97.17322 26897200 97.17323 1 98.78558 32.78984 -2.44207828 0.6282847 1

We can see that the model outputs predicted probabilities. I simply set the predictions column to 1 if the model’s predicted probability is > 0.6 and to 0 otherwise. The Y variable is the observed direction and the X1, X2 and X3 variables are the SMA, RSI and momentum.

We can now use the quantstrat package to backtest the model and see how the performance went.

MSFT <- out

stock("MSFT", currency = "USD", multiplier = 1)
## [1] "MSFT"
strategy.st <- portfolio.st <- account.st <- "RollingLogitStrategy"

rm.strat(strategy.st)
rm.strat(portfolio.st)
rm.strat(account.st)


initPortf(name = portfolio.st,
          symbols = symbols,
          initDate = initDate,
          currency = 'USD')
## [1] "RollingLogitStrategy"
initAcct(name = account.st,
         portfolios = portfolio.st,
         initDate = initDate,
         currency = 'USD',
         initEq = init_equity)
## [1] "RollingLogitStrategy"
initOrders(portfolio.st,
           symbols = symbols,
           initDate = initDate)

strategy(strategy.st, store = TRUE)

I add the signals to the strategy and give it some rules.

  • 1a) The signal fires the first time the logistic model produces a probability greater than 0.6. sigThreshold is a quantstrat function; others include sigComparison, sigCrossover and sigFormula. It uses the threshold value in the list of arguments and looks in the probs column, where the predicted probabilities are stored; gt means greater than. It creates a new column called longSig. Without the cross argument it would be similar to ifelse(df$probs > 0.6, 1, 0), as far as I understand; with cross = TRUE it is 1 only on the bar where the probability crosses above the threshold.

  • 1b) The rule for the strategy takes sigcol, the label we gave our signal in the previous lines (longSig). If longSig = 1, it executes a market order going long, buying at the next day’s open price; the transaction fees were set at the start of the strategy. We call this rule EnterLONG.
nMult_orderqty <- 2
addPosLimit(portfolio.st, symbol = "MSFT", timestamp = initDate, maxpos = nMult_orderqty * .orderqty)

# Objective: Buy when the probability is gt 0.60, using cross = TRUE

# 1.a)
add.signal(strategy = strategy.st,
           name = "sigThreshold",
           arguments = list(threshold = 0.6,
                            column = "probs",
                            relationship = "gt",
                            cross = TRUE),
           label = "longSig")
## [1] "RollingLogitStrategy"
# 1.b)
# # Adding the rules, enter at the open price when prob > 0.60 for the first time, taking transaction fees into account
add.rule(strategy = strategy.st,
         name = "ruleSignal",
         arguments = list(sigcol = "longSig",   # check the ifelse predictions statement
                          sigval = 1,
                          orderqty = .orderqty,
                          ordertype = "market",
                          orderside = "long",
                          osFUN = osMaxPos,
                          prefer = "Open",
                          replace = TRUE,
                          TxnFees = .txnfees),
         type = "enter",
         label = "EnterLONG")
## [1] "RollingLogitStrategy"
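
The effect of cross = TRUE in the signal above can be approximated in base R. This is a sketch on made-up probabilities, not quantstrat's implementation. It compares "above the threshold" with "just crossed above the threshold".

```r
probs <- c(0.67, 0.74, 0.57, 0.62, 0.65, 0.38, 0.61)
threshold <- 0.6

# Without a cross condition: TRUE on every bar above the threshold.
above <- probs > threshold

# With a cross condition: TRUE only on bars where the series moves from
# at-or-below the threshold to above it. The first bar has no predecessor.
crossed <- c(NA, above[-1] & !above[-length(above)])

data.frame(probs = probs, above = above, crossed = crossed)
```

The crossing version fires far less often, so the entry rule triggers once per move above the threshold rather than on every bar spent above it.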

With the logistic model we have added our signal to buy when the predicted probability for the next day is greater than 0.60. So far we just keep buying whenever the probability crosses above 0.60, but we have no rule to sell when the model predicts something different.

I thought it might be interesting to exit the strategy when the model is undecided or makes a very low predicted probability. I set this threshold to be less than 0.4 based on the probability density plot below. So we are making trades at the tail ends of the distribution, buying when the model is confident and selling when it is unsure.

plot(density(out$probs))

Add the signals and rules for exiting the strategy. Using a similar principle as before, I create a signal when the probability is less than 0.4 and call it exitlongSig.

  • Careful here: the exit rule for a long position needs type = "exit" and orderside = "long". I previously had type = "exit" and orderside = "short", which is completely wrong.
# 2.a) # create the signal of when we should be looking to exit
# #exit when prob drops below 0.4 for the first time
add.signal(strategy = strategy.st,
           name = "sigThreshold",
           arguments = list(threshold = 0.4,
                            column = "probs",
                            relationship = "lt",
                            cross = TRUE),
           label = "exitlongSig")
## [1] "RollingLogitStrategy"
# 2.b) # Add that signal to the rule of exiting
add.rule(strategy.st,
         name = "ruleSignal",
         arguments = list(sigcol = "exitlongSig",
                          sigval = 1,
                          orderqty = "all",
                          ordertype = "market",
                          orderside = "long",
                          osFUN = osMaxPos,
                          prefer = "Open",
                          replace = TRUE,
                          TxnFees = .txnfees),
         type = "exit",
         label = "ExitLong")
## [1] "RollingLogitStrategy"
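
How the entry rule, the exit rule and the position limit interact can be sketched with a small loop. This is a simplified illustration with made-up signal columns; it ignores the next-bar execution delay and the fees, and it only mimics the idea of a capped position (order size 10, cap 20), not the internals of osMaxPos.

```r
# Made-up signal columns: 1 marks a bar where the entry or exit signal fires.
long_sig <- c(1, 0, 1, 0, 1, 0, 0, 1)
exit_sig <- c(0, 0, 0, 0, 0, 1, 0, 0)

order_qty <- 10   # shares per entry
max_pos   <- 20   # position cap

position <- numeric(length(long_sig))
current <- 0
for (t in seq_along(long_sig)) {
  if (exit_sig[t] == 1) {
    current <- 0                                  # exit rule: sell everything
  } else if (long_sig[t] == 1) {
    current <- min(current + order_qty, max_pos)  # entry rule, capped
  }
  position[t] <- current
}
position
```

The third entry signal is ignored because the position is already at the cap, and the exit closes the whole position in one order.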

We can finally apply the strategy and see how it performs. It doesn’t make many transactions due to the backtesting being quite restrictive.

applyStrategy(strategy.st, portfolios = portfolio.st)
## [1] "2018-07-05 00:00:00 MSFT 10 @ 97.5851340312762"
## [1] "2018-07-11 00:00:00 MSFT 10 @ 99.2033819340086"
## [1] "2018-12-06 00:00:00 MSFT -20 @ 104.632969498356"
## [1] "2018-12-19 00:00:00 MSFT 10 @ 102.487313341245"
## [1] "2019-01-09 00:00:00 MSFT 10 @ 102.694956688076"
## [1] "2019-02-21 00:00:00 MSFT -20 @ 106.15227613615"
## [1] "2019-04-11 00:00:00 MSFT 10 @ 119.69686840234"
## [1] "2019-04-24 00:00:00 MSFT 10 @ 124.91014659961"
## [1] "2019-06-13 00:00:00 MSFT -20 @ 131.541967172209"
## [1] "2019-06-27 00:00:00 MSFT 10 @ 133.694801331394"
## [1] "2019-07-09 00:00:00 MSFT 10 @ 135.548629168169"
## [1] "2019-08-23 00:00:00 MSFT -20 @ 137.190002"
## [1] "2019-08-28 00:00:00 MSFT 10 @ 134.880005"
## [1] "2019-09-05 00:00:00 MSFT -10 @ 139.110001"
updatePortf(portfolio.st)
## [1] "RollingLogitStrategy"
updateAcct(account.st)
## [1] "RollingLogitStrategy"
updateEndEq(account.st)
## [1] "RollingLogitStrategy"
chart.Posn(portfolio.st, Symbol = "MSFT")

           #TA="add_SMA(n = 10, col = 2); add_SMA(n = 30, col = 4)")

Look at the mktdata which is where our signals and predictions are stored.

mktdata %>%
  data.frame() %>%
  head(10) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume MSFT.Adjusted Y X1 X2 X3 probs predictions longSig exitlongSig
2018-06-19 97.73225 99.05627 97.58513 98.91896 28653100 98.91896 1 99.33186 63.01370 1.9811302 0.6709360 1 NA NA
2018-06-20 99.41915 100.54701 99.17396 99.90953 26180800 99.90952 1 99.20142 59.54048 0.0686529 0.7426882 1 0 0
2018-06-21 100.11548 100.48817 98.93857 99.19357 23198200 99.19357 0 99.14061 63.88638 0.1961559 0.7523711 1 0 0
2018-06-22 98.47763 98.83069 97.71263 98.47763 38923100 98.47763 1 99.16611 51.80598 -1.0297958 0.7586200 1 0 0
2018-06-25 98.07551 98.18340 95.42748 96.49649 35433300 96.49649 0 99.04646 43.39626 -2.0399648 0.7851164 1 0 0
2018-06-26 96.91822 98.15397 96.84957 97.17322 26897200 97.17323 1 98.78558 32.78984 -2.4420783 0.6282847 1 0 0
2018-06-27 97.66360 98.09512 95.52555 95.66285 31298400 95.66285 0 98.56687 35.08314 -2.5009206 0.7361390 1 0 0
2018-06-28 95.50593 97.20264 95.38824 96.73187 26650700 96.73187 1 98.24224 35.29932 -3.4424524 0.6387378 1 0 0
2018-06-29 97.02610 97.98725 96.43765 96.71226 28053200 96.71226 0 97.96861 37.17949 -2.6284247 0.6327528 1 0 0
2018-07-02 96.21207 98.13435 96.11400 98.08532 19564500 98.08531 1 97.81954 39.04847 -2.1968885 0.5659825 0 0 0

So if we train a simple logistic model on 100 days of data to give us the next day’s prediction every day since June 2018, only buy when the model’s predicted probability is greater than 0.60 and only sell when it is less than 0.40, then our cumulative P&L would be $334 with a max drawdown of $61.
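
As a sanity check, the cumulative P&L can be recomputed by hand from the transactions printed by applyStrategy. The sketch below copies the quantities and prices from that output and assumes a flat fee of 10 per transaction (the magnitude of .txnfees).

```r
# Quantities and prices copied from the applyStrategy() output in this post.
qty <- c(10, 10, -20, 10, 10, -20, 10, 10, -20, 10, 10, -20, 10, -10)
price <- c(97.5851340312762, 99.2033819340086, 104.632969498356,
           102.487313341245, 102.694956688076, 106.15227613615,
           119.69686840234, 124.91014659961, 131.541967172209,
           133.694801331394, 135.548629168169, 137.190002,
           134.880005, 139.110001)
fee_per_txn <- 10   # assumed flat fee, the magnitude of .txnfees

final_position <- sum(qty)        # 0, so all of the P&L is realised
gross_pnl <- -sum(qty * price)    # sells add cash, buys remove it
total_fees <- fee_per_txn * length(qty)
net_pnl <- gross_pnl - total_fees

round(c(gross = gross_pnl, fees = total_fees, net = net_pnl), 2)
```

The gross P&L comes to about 474.43 and the fees to 140, leaving about 334.43 net. That agrees with the figure quoted above and shows that fees take a large share of the gross result at this order size.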

I have probably (almost certainly) gone wrong at some point using the quantstrat package, so the results will probably not generalise well to other assets, especially using a logistic model with 3 regressors! But it was a fun exercise using the quantstrat package.

I will revisit and modify this markdown file.

Matthew Smith
Researcher in Dept Finance

I am a researcher with a focus on Machine Learning methods applied to economics and finance.
