# Learning the Quantstrat and Blotter packages

A small tutorial (mostly for myself) to understand the functionality of `blotter` and `quantstrat`.

Load packages:

```
library(knitr)
library(kableExtra)
library(dplyr)
library(ggplot2)
library(quantstrat)
```

**::: Note :::** This post is mostly for my own future reference while learning the `quantstrat` package. An example strategy I developed can be found below; it uses a naive rolling logistic regression model trained on `t` days to predict the market movement on day `t+1`. **::: END Note :::**

I have been playing around with backtesting trading models using the `quantstrat` package for a while, but the most difficult part was understanding the syntax of `blotter` and `quantstrat`. It didn’t seem very intuitive to me at first, and there does not seem to be *much* detailed information online, despite the package being around since 2010. In this post I give my comments and observations on what certain functions do. I will update this post over time.

### Background

The `quantstrat` package is built on the `blotter` package, which was developed in 2008. It works best with time series `xts` objects, which can be easily collected using the `quantmod` package. The `blotter` package is the accounting package behind the `quantstrat` system: it can support multiple accounts or multiple portfolios and computes the P&L of trading systems.

The main blotter functions are the following:

### Initialisation

- `initPortf` - which initialises a portfolio
- `initAcct` - which initialises an account

### Performance

- `addTxn` - which adds transactions to the portfolio
- `updatePortf` - which computes the P&L for each symbol in a given period
- `updateAcct` - which computes the equity from the assets
- `updateEndEq` - which updates the final equity for an account
- `getEndEq` - which provides us with the latest value of our account
- `getPosQty` - which gets a position at a given date
- `chart.Posn` - which plots a chart of the position size and cumulative P&L
- `PortfReturns` - which calculates the portfolio returns
- `getAccount` - which gets our account info
- `getPortfolio` - which does the same for a portfolio
- `getTxns` - which does the same for transactions
- `tradeStats` - which collects trade statistics
- `perTradeStats` - which calculates per trade statistics

The `blotter` package loads a number of additional packages when it is attached. The packages are `xts` and `zoo` for time series, and `FinancialInstrument`, `quantmod`, `TTR` and `PerformanceAnalytics` for finance, where `TTR` stands for Technical Trading Rules.

The `blotter` package also creates a new environment, `.blotter`.

### Download Financial Data

Let's first download a year of data for the S&P 500 (via the `SPY` ticker), from 2018 to 2019. With the `quantstrat` package it's quite usual to find the initialisation date, start date and end date set outside the parameters of the model. Therefore I first set these parameters as `initDate`, `startDate` and `endDate`.

```
initDate <- '2010-01-01' # used later; it must be before startDate
startDate <- '2018-01-01'
endDate <- '2019-01-01'
symbols <- c('SPY')
getSymbols(symbols, from = startDate, to = endDate, src = "yahoo", adjust=TRUE)
```

`## [1] "SPY"`

| | SPY.Open | SPY.High | SPY.Low | SPY.Close | SPY.Volume | SPY.Adjusted |
|---|---|---|---|---|---|---|
| 2018-01-02 | 262.8473 | 263.7992 | 262.4155 | 263.7599 | 86655700 | 260.1310 |
| 2018-01-03 | 263.9464 | 265.5951 | 263.9464 | 265.4283 | 90070400 | 261.7763 |
| 2018-01-04 | 266.1447 | 267.0868 | 265.4970 | 266.5470 | 80636400 | 262.8796 |
| 2018-01-05 | 267.4302 | 268.4606 | 266.8807 | 268.3233 | 83524000 | 264.6314 |
| 2018-01-08 | 268.2153 | 268.9906 | 267.8915 | 268.8140 | 57319200 | 265.1154 |
| 2018-01-09 | 269.2850 | 270.1191 | 268.9709 | 269.4224 | 57254000 | 265.7154 |

### Plot the data

`chartSeries(SPY, name = "Daily time series for SPY", type = "candlesticks", theme = chartTheme("white"))`

Once we have the data, and after loading the `blotter` package, we must define a few initialisation parameters, namely the currency and the stock, using the `currency()` and `stock()` functions from the `FinancialInstrument` package.

`currency("USD")`

`## [1] "USD"`

`stock("SPY", currency = "USD", multiplier = 1)`

`## [1] "SPY"`

`ls(all = TRUE)`

```
## [1] ".blotter" ".getSymbols" ".Random.seed" ".strategy"
## [5] "endDate" "initDate" "SPY" "startDate"
## [9] "symbols"
```

`ls(envir = FinancialInstrument:::.instrument)`

`## [1] "SPY" "USD"`

We can see that the `SPY` instrument and the `USD` currency are now registered. We can convert the data from a daily time series to a monthly time series using the `to.period` function.

```
SPY_monthly <- to.period(SPY, period = "months")
chartSeries(SPY_monthly, name = "Monthly time series for SPY", type = "candlesticks", theme = chartTheme("white"))
```

`head(SPY_monthly)`

```
## SPY.Open SPY.High SPY.Low SPY.Close SPY.Volume SPY.Adjusted
## 2018-01-31 262.8473 281.2870 262.4155 276.6452 1985506700 272.8389
## 2018-02-28 275.8307 277.7836 248.2054 266.5862 2923722000 262.9184
## 2018-03-29 266.3507 275.1830 254.0372 259.2790 2323561800 255.7117
## 2018-04-30 258.6878 267.3091 250.9237 260.6190 1998466500 257.0332
## 2018-05-31 259.9884 270.2157 255.2393 266.9544 1606397200 263.2814
## 2018-06-29 268.4028 275.3688 265.7283 268.4896 1599001000 264.7955
```
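To make explicit what a monthly bar contains, here is a minimal base R sketch on made-up prices. The `daily` data frame and its values are hypothetical (not the SPY data), and the aggregation rule is my assumption of the usual OHLC convention: first open, highest high, lowest low, last close and summed volume. It is not `to.period` itself.

```r
# Hypothetical daily bars spanning two months (values are made up).
daily <- data.frame(
  date   = as.Date(c("2018-01-30", "2018-01-31", "2018-02-01", "2018-02-02")),
  open   = c(100, 102, 101, 99),
  high   = c(103, 104, 102, 100),
  low    = c(99, 101, 98, 97),
  close  = c(102, 101, 99, 98),
  volume = c(10, 20, 30, 40)
)
month <- format(daily$date, "%Y-%m")

# One row per month: first open, max high, min low, last close, total volume.
monthly <- data.frame(
  month  = unique(month),
  open   = as.numeric(tapply(daily$open, month, function(x) x[1])),
  high   = as.numeric(tapply(daily$high, month, max)),
  low    = as.numeric(tapply(daily$low, month, min)),
  close  = as.numeric(tapply(daily$close, month, function(x) x[length(x)])),
  volume = as.numeric(tapply(daily$volume, month, sum))
)
print(monthly)
```

The same pattern is visible in the output above: the January open equals the open of 2018-01-02, and the monthly volume is far larger than any single day's volume.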

### Portfolio parameters

Now that we have our data we should initialise the portfolio with `initPortf`, which will hold the transactions over the period of analysis.

- `name` - “MyFirstPortfolio” - the name of the portfolio
- `symbols` - “SPY” - since we are trading SPY
- `initPosQty` - the initial quantity of our position, e.g. 100 (not set in the call below)
- `initDate` - `initDate` - the date of the initial account equity and position (prior to the closing price of our first position)
- `currency` - the currency we are using

```
initDate <- '2010-01-01' # NOTE: already defined above; it must be before the start of the price data
initPortf("MyFirstPortfolio", "SPY", initDate = initDate)
```

`## [1] "MyFirstPortfolio"`

### Account parameters

We also need to initialise the account using `initAcct`.

- `name` - “MyFirstPortfolio” - the name of the account (here the same as the portfolio)
- `portfolios` - the name of the portfolio we created previously
- `initDate` - `initDate` - the date of the initial account equity and position (prior to the closing price of our first position)
- `initEq` - the initial equity we begin with
- `currency` - the currency we are using

```
initEq <- 1000000
# the argument name is initEq (case-sensitive), matching the later initAcct call
initAcct("MyFirstPortfolio", portfolios = "MyFirstPortfolio", initDate = initDate, initEq = initEq)
```

`## [1] "MyFirstPortfolio"`

`first(SPY) # print the first observation in our data`

```
## SPY.Open SPY.High SPY.Low SPY.Close SPY.Volume SPY.Adjusted
## 2018-01-02 262.8473 263.7992 262.4155 263.7599 86655700 260.131
```

`last(SPY) # print the last observation in our data`

```
## SPY.Open SPY.High SPY.Low SPY.Close SPY.Volume SPY.Adjusted
## 2018-12-31 249.56 250.19 247.47 249.92 144299400 246.4814
```

# A simple Strategy

A trading strategy can be broken down into the following blocks:

- *Assets* - which are the stocks we want to trade, `SPY`, `MSFT`, `AAPL` etc.
- *Indicators* - which are the variables: Open, High, Low, Close, SMA, EMA, RSI, Momentum etc.
- *Signals* - we can create signals based on the interaction between the *indicators* and the time-series data.
- *Rules* - create *buy* and *sell* rules based on the *signals* created.
- *Orders* - once a *rule* is activated, push an *order* through.
- *Analysis* - analyse the performance of the strategy.

I wanted to test a basic backtesting concept: what happens if I train a machine learning model each day on the last 100 days of data to predict the next day's stock market direction, and test this over many periods? That is, continuously retrain the model at every step to predict the next move. One method is to use the `rolling_origin` function from the `rsample` package, but I wrote a simpler function for this.

I load in the parameters of the model and download data for MSFT.

```
library(quantstrat)
library(PerformanceAnalytics)
library(e1071)
initDate = "2010-01-01"
from <- "2018-01-01"
to <- "2019-09-20"
init_equity <- 1000
adjustment <- TRUE
.orderqty <- 10 # The profitability of the strategy depends heavily on this value
.txnfees <- -10
.stoploss <- 3e-3 # 0.003 or 0.3%
currency('USD')
```

`## [1] "USD"`

```
Sys.setenv(TZ="UTC")
symbols <- c('MSFT')
getSymbols(symbols, from = from, to = to, src = "yahoo", adjust = TRUE)
```

`## [1] "MSFT"`

I created a simple time series pre-processing function to clean up the data, create some features and convert the data to `xts`. Ignore the `Scale_Me` function.

```
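# Standardise a vector to zero mean and unit variance (not used below)
Scale_Me <- function(x){
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}
# Build the response Y and three lagged features from an OHLCV xts object
TS_preprocess <- function(dat){
  dat = data.frame(dat)
  colnames(dat) = c("open", "high", "low", "close", "volume", "adjusted")
  dat$Y = with(dat, ifelse(close >= open, 1, 0)) # 1 if the day closed at or above its open
  # dplyr::lag shifts a plain vector by one row; stats::lag would not
  prev_close = dplyr::lag(dat$close)
  dat$X1 = SMA(prev_close, n = 10)
  dat$X2 = RSI(prev_close, n = 14, maType = SMA) # nFast/nSlow/nSig are MACD arguments, not RSI ones
  dat$X3 = momentum(prev_close, n = 12)
  dat = dat[complete.cases(dat), ]
  #dat = cbind(dat[, 'Y'], apply(dat[, 8:ncol(dat)], 2, Scale_Me))
  #colnames(dat)[1] = "Y"
  dat = dat[, c("Y", "X1", "X2", "X3")]
  dat = as.xts(dat)
  return(dat)
}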
```
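The point of lagging the close before building the features is that the row for day `t` should only use information available up to day `t-1`. Below is a small base R sketch of that idea on made-up prices. The helper `roll_mean` and the short windows (3 and 2 instead of 10 and 12) are hypothetical stand-ins for `SMA` and `momentum`, chosen only to keep the toy example readable.

```r
close <- c(10, 11, 12, 13, 14, 15)   # hypothetical closing prices

# Shift by one day so that row t only sees the close of day t - 1.
lagged_close <- c(NA, head(close, -1))

# Trailing mean over the last n values; NA until n lagged values exist.
roll_mean <- function(x, n) {
  sapply(seq_along(x), function(i) {
    if (i < n) NA_real_ else mean(x[(i - n + 1):i])
  })
}
sma3 <- roll_mean(lagged_close, n = 3)

# n-period momentum: the change over the last n observations.
mom2 <- lagged_close - c(rep(NA, 2), head(lagged_close, -2))

features <- data.frame(close, lagged_close, sma3, mom2)
print(features)

# Rows without a full feature history are dropped, as complete.cases() does above.
features[complete.cases(features), ]
```

The first rows come out as `NA` because the lag and the rolling windows need some history, which is why the pre-processing function drops incomplete rows.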

I next set the training period to `n_train` = 100 periods and the test period to `n_test` = 1 period, i.e. train on `t` days of data and test on day `t+1`. This can be quite computationally expensive depending on the machine learning model you use, but for a simple binary logistic regression classifier it's relatively fast. I create the logistic function, which returns the predicted probabilities.
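One thing I find easy to overlook with `glm()` is that the `family` argument defaults to `gaussian`, so a logistic regression needs `family = binomial` spelled out. A minimal sketch on simulated data (the data frame `toy` and its coefficients are made up for illustration):

```r
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(2 * x))  # simulated 0/1 outcome
toy <- data.frame(y = y, x = x)

fit_default  <- glm(y ~ x, data = toy)                     # no family given
fit_logistic <- glm(y ~ x, data = toy, family = binomial)  # logistic regression

p_default  <- predict(fit_default,  newdata = toy, type = "response")
p_logistic <- predict(fit_logistic, newdata = toy, type = "response")

family(fit_default)$family    # the default family
family(fit_logistic)$family   # the requested family
range(p_logistic)             # logistic predictions are probabilities
range(p_default)              # linear predictions are not constrained to [0, 1]
```

With the binomial family the `type = "response"` predictions are probabilities, so a cut-off such as 0.6 has a direct interpretation.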

The `RollingBacktest` function runs the model on `n_train` periods of data and makes a prediction for the next `n_test` periods. That is, say we have 1000 days of data: the model will train on the first 100 days and predict day 101, then retrain on days 2 to 101 and predict day 102, and so on, continuing until all 1000 days have passed.

```
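df <- TS_preprocess(MSFT)
n_train <- 100
n_test <- 1
# Fit on the training rows and return the predicted probability for the test rows
LogistFun <- function(frm, dat, trainIndex, testIndex){
  # family = binomial gives a logistic regression; without it glm() fits a
  # gaussian (linear) model, so results will differ from a run that omits it
  LogitModel <- glm(frm, data = dat[trainIndex, ], family = binomial(link = "logit"))
  pred <- predict(LogitModel, newdata = dat[testIndex, ], type = 'response')
  return(pred)
}
RollingBacktest <- function(dat, ntrain = n_train, ntest = n_test){
  stopifnot('Y' %in% names(dat))
  # Y ~ X1 + X2 + ...: every column other than Y is a regressor
  frm_ <- reformulate(paste0("X", seq_len(ncol(dat) - 1)), response = "Y")
  stride <- ntrain + ntest
  startPosn <- seq(1, dim(dat)[1] - stride)
  # note: seq(i, i + ntrain) spans ntrain + 1 rows, so the test row is i + ntrain + 1
  train_index_list <- lapply(startPosn, function(i) seq(i, i + ntrain))
  test_index_list <- lapply(startPosn, function(i) seq((i + ntrain + 1), (i + ntrain + ntest)))
  mapply(LogistFun, trainIndex = train_index_list, testIndex = test_index_list,
         MoreArgs = list(frm = frm_, dat = dat), SIMPLIFY = FALSE)
}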
```
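The rolling scheme described in the text (train on 100 rows, predict the next one, slide forward by one) can be checked on plain indices without fitting any model. This is a sketch of the description only; the index arithmetic is my own assumption for illustration and may differ by a row from the function above.

```r
n_obs   <- 1000   # hypothetical number of daily observations
n_train <- 100
n_test  <- 1

# One window per possible start position.
starts <- seq_len(n_obs - n_train - n_test + 1)
train_windows <- lapply(starts, function(i) seq(i, length.out = n_train))
test_windows  <- lapply(starts, function(i) seq(i + n_train, length.out = n_test))

range(train_windows[[1]]); test_windows[[1]]   # first window and its test day
range(train_windows[[2]]); test_windows[[2]]   # slides forward by one day
length(starts)                                 # number of refits
```

Each window costs one model fit, which is why the text warns that this can get expensive for slower models.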

Now that I have the *pre-process function*, the *logistic function* and the *backtest function*, we can run the model through the `TS_postprocess` function, which applies everything.

```
TS_postprocess <- function(dat, ntrain){
  results = tail(dat, -(ntrain + 1)) # drop the leading rows that have no prediction
  results$probs <- unlist(RollingBacktest(dat, ntrain = ntrain)) # one probability per remaining row
  results$predictions <- ifelse(results$probs > 0.6, 1, 0)
  print(paste0("Model Accuracy at the 0.60 prob cut-off ", mean(results$Y == results$predictions)))
  return(results)
}
out <- TS_postprocess(df, ntrain = n_train)
```

`## [1] "Model Accuracy at the 0.60 prob cut-off 0.518987341772152"`

`out <- na.omit(cbind(MSFT, out))`

The model is not very accurate…

The `na.omit` call removes the rows containing `NA` values. These are the first observations of the time series, which have no features (the `SMA`, `RSI` and `momentum` calculations need some history) or no prediction (the initial training window).

The data looks like:

```
out %>%
head() %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
```

| MSFT.Open | MSFT.High | MSFT.Low | MSFT.Close | MSFT.Volume | MSFT.Adjusted | Y | X1 | X2 | X3 | probs | predictions |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 97.73225 | 99.05627 | 97.58513 | 98.91896 | 28653100 | 98.91896 | 1 | 99.33186 | 63.01370 | 1.98113024 | 0.6709360 | 1 |
| 99.41915 | 100.54701 | 99.17396 | 99.90953 | 26180800 | 99.90952 | 1 | 99.20142 | 59.54048 | 0.06865286 | 0.7426882 | 1 |
| 100.11548 | 100.48817 | 98.93857 | 99.19357 | 23198200 | 99.19357 | 0 | 99.14061 | 63.88638 | 0.19615593 | 0.7523711 | 1 |
| 98.47763 | 98.83069 | 97.71263 | 98.47763 | 38923100 | 98.47763 | 1 | 99.16611 | 51.80598 | -1.02979581 | 0.7586200 | 1 |
| 98.07551 | 98.18340 | 95.42748 | 96.49649 | 35433300 | 96.49649 | 0 | 99.04646 | 43.39626 | -2.03996476 | 0.7851164 | 1 |
| 96.91822 | 98.15397 | 96.84957 | 97.17322 | 26897200 | 97.17323 | 1 | 98.78558 | 32.78984 | -2.44207828 | 0.6282847 | 1 |

We can see that the model outputs predicted probabilities. I simply set the `predictions` column to `1` if the model's predicted probability is `> 0.6` and to `0` otherwise, matching the cut-off used in `TS_postprocess`. The `Y` variable is the observed direction, and the `X1`, `X2` and `X3` variables are the `SMA`, `RSI` and `momentum`.
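As a small worked check of how the cut-off and the accuracy figure fit together, here is the same calculation on just the six rows shown in the table above. Six rows are far too few to say anything about the model; the sketch only illustrates the mechanics.

```r
# The six rows shown above: observed direction and predicted probability.
Y     <- c(1, 1, 0, 1, 0, 1)
probs <- c(0.6709360, 0.7426882, 0.7523711, 0.7586200, 0.7851164, 0.6282847)

cutoff      <- 0.6
predictions <- ifelse(probs > cutoff, 1, 0)

# Cross-tabulate observed against predicted direction.
print(table(observed = Y, predicted = predictions))

# Share of correct calls on these six rows.
accuracy <- mean(Y == predictions)
accuracy
```

Every probability in these rows is above the cut-off, so the model calls "up" six times and is right on the four days that actually closed up.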

We can now use the `quantstrat` package to backtest the model and see how it performed.

```
MSFT <- out
stock("MSFT", currency = "USD", multiplier = 1)
```

`## [1] "MSFT"`

```
strategy.st <- portfolio.st <- account.st <- "RollingLogitStrategy"
rm.strat(strategy.st)
rm.strat(portfolio.st)
rm.strat(account.st)
initPortf(name = portfolio.st,
          symbols = symbols,
          initDate = initDate,
          currency = 'USD')
```

`## [1] "RollingLogitStrategy"`

```
initAcct(name = account.st,
         portfolios = portfolio.st,
         initDate = initDate,
         currency = 'USD',
         initEq = init_equity)
```

`## [1] "RollingLogitStrategy"`

```
initOrders(portfolio.st,
           symbols = symbols,
           initDate = initDate)
strategy(strategy.st, store = TRUE)
```

I add the signals to the strategy and give it some rules.

- 1a) The signal fires the first time the logistic model produces a probability greater than `0.6`. `sigThreshold` is a `quantstrat` function; the other signal functions are `sigComparison`, `sigCrossover` and `sigFormula`. It takes the `threshold` value from the `list` of `arguments`; the column it looks at is the `probs` column, to which the predicted probabilities were written, and `gt` means greater than. It basically creates a new column called `longSig`, which would be similar to `ifelse(df$probs > 0.6, 1, 0)` as far as I understand, except that with `cross = TRUE` it is `1` only on the day the probability crosses above `0.6`, not on every day it stays above.
- 1b) The rule for the strategy takes `sigcol`, which is the `label` we gave our signal in the previous lines, `longSig`. If `longSig = 1`, execute a market order going long, buying at the next day's open price; the transaction fees were set at the start of the strategy. We call this `rule` `EnterLONG`.
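The difference between a level signal and a crossing signal is easier to see on a toy vector. The sketch below is plain base R on made-up probabilities; it mimics my understanding of `cross = TRUE` and is not how `quantstrat` implements it.

```r
probs <- c(0.55, 0.65, 0.70, 0.58, 0.61, 0.66, 0.40)  # hypothetical probabilities

above      <- probs > 0.6                 # level: TRUE on every day above 0.6
prev_above <- c(NA, head(above, -1))      # the previous day's state (unknown on day 1)
crossed    <- above & !prev_above         # cross: TRUE only on the day it moves above 0.6

data.frame(probs, level = as.integer(above), cross = as.integer(crossed))
```

The level column flags four days, while the cross column flags only the two days on which the probability moved from below to above the threshold, which is when an entry order would be generated.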

```
nMult_orderqty <- 2
addPosLimit(portfolio.st, symbol = "MSFT", timestamp = initDate, maxpos = nMult_orderqty * .orderqty)
# Objective: Buy when the probability is gt 0.60, using cross = TRUE
# 1.a)
add.signal(strategy = strategy.st,
           name = "sigThreshold",
           arguments = list(threshold = 0.6,
                            column = "probs",
                            relationship = "gt",
                            cross = TRUE),
           label = "longSig")
```

`## [1] "RollingLogitStrategy"`

```
# 1.b)
# Adding the rules: enter at the open price when prob > 0.60 for the first time, taking transaction fees into account
add.rule(strategy = strategy.st,
         name = "ruleSignal",
         arguments = list(sigcol = "longSig", # check the ifelse predictions statement
                          sigval = 1,
                          orderqty = .orderqty,
                          ordertype = "market",
                          orderside = "long",
                          osFUN = osMaxPos,
                          prefer = "Open",
                          replace = TRUE,
                          TxnFees = .txnfees),
         type = "enter",
         label = "EnterLONG")
```

`## [1] "RollingLogitStrategy"`

From the logistic model we have added our signal to buy when the predicted probability for the next day is greater than `0.60`. So far we just keep buying when the probability crosses above `0.60`, but we have no rule to sell when the model predicts something different.

I thought it might be interesting to exit the strategy when the model is undecided or makes a very low predicted probability. I set this threshold to be less than 0.4 based on the probability density plot below. So we are making trades at the tail ends of the distribution, buying when the model is *confident* and selling when it is *unsure*.

`plot(density(out$probs))`

Add the signals and rules for exiting the strategy. Using a similar principle as before, I create a `signal` when the probability is less than 0.4 and call it `exitlongSig`.

- Careful here: set `type = "exit"` and `orderside = "long"`. I previously had `type = "exit"` and `orderside = "short"`, which is completely wrong.

```
# 2.a) Create the signal for when we should be looking to exit:
# exit when prob drops below 0.4 for the first time
add.signal(strategy = strategy.st,
           name = "sigThreshold",
           arguments = list(threshold = 0.4,
                            column = "probs",
                            relationship = "lt",
                            cross = TRUE),
           label = "exitlongSig")
```

`## [1] "RollingLogitStrategy"`

```
# 2.b) # Add that signal to the rule of exiting
add.rule(strategy.st,
         name = "ruleSignal",
         arguments = list(sigcol = "exitlongSig",
                          sigval = 1,
                          orderqty = "all",
                          ordertype = "market",
                          orderside = "long",
                          osFUN = osMaxPos,
                          prefer = "Open",
                          replace = TRUE,
                          TxnFees = .txnfees),
         type = "exit",
         label = "ExitLong")
```

`## [1] "RollingLogitStrategy"`

We can finally apply the strategy and see how it performs. It doesn’t make many transactions due to the backtesting being quite restrictive.

`applyStrategy(strategy.st, portfolios = portfolio.st)`

```
## [1] "2018-07-05 00:00:00 MSFT 10 @ 97.5851340312762"
## [1] "2018-07-11 00:00:00 MSFT 10 @ 99.2033819340086"
## [1] "2018-12-06 00:00:00 MSFT -20 @ 104.632969498356"
## [1] "2018-12-19 00:00:00 MSFT 10 @ 102.487313341245"
## [1] "2019-01-09 00:00:00 MSFT 10 @ 102.694956688076"
## [1] "2019-02-21 00:00:00 MSFT -20 @ 106.15227613615"
## [1] "2019-04-11 00:00:00 MSFT 10 @ 119.69686840234"
## [1] "2019-04-24 00:00:00 MSFT 10 @ 124.91014659961"
## [1] "2019-06-13 00:00:00 MSFT -20 @ 131.541967172209"
## [1] "2019-06-27 00:00:00 MSFT 10 @ 133.694801331394"
## [1] "2019-07-09 00:00:00 MSFT 10 @ 135.548629168169"
## [1] "2019-08-23 00:00:00 MSFT -20 @ 137.190002"
## [1] "2019-08-28 00:00:00 MSFT 10 @ 134.880005"
## [1] "2019-09-05 00:00:00 MSFT -10 @ 139.110001"
```
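The fills above follow a pattern: buys of 10 that stack up to 20 shares, then a sale of the whole position. A simplified base R sketch of that sizing logic is below. The signal sequence is hypothetical, fills happen on the signal day rather than at the next open, and the `min()` clipping is my assumption of what the position limit does, not the `osMaxPos` source.

```r
order_qty <- 10
max_pos   <- 2 * order_qty   # mirrors maxpos = nMult_orderqty * .orderqty

# Hypothetical signal for each day: "enter", "exit" or "none".
signals <- c("enter", "enter", "enter", "exit", "enter", "exit", "none")

position <- 0
fills <- numeric(length(signals))
for (i in seq_along(signals)) {
  if (signals[i] == "enter") {
    fills[i] <- min(order_qty, max_pos - position)  # never exceed the position limit
  } else if (signals[i] == "exit") {
    fills[i] <- -position                           # orderqty = "all": sell everything held
  }
  position <- position + fills[i]
}
data.frame(signals, fills, position_after = cumsum(fills))
```

The third entry signal produces no fill because the position is already at the limit, which is the role `addPosLimit` plays in the strategy.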

`updatePortf(portfolio.st)`

`## [1] "RollingLogitStrategy"`

`updateAcct(account.st)`

`## [1] "RollingLogitStrategy"`

`updateEndEq(account.st)`

`## [1] "RollingLogitStrategy"`

`chart.Posn(portfolio.st, Symbol = "MSFT")`

` #TA="add_SMA(n = 10, col = 2); add_SMA(n = 30, col = 4)")`

Look at `mktdata`, which is where our `signals` and `predictions` are stored.

```
mktdata %>%
data.frame() %>%
head(10) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
```

| | MSFT.Open | MSFT.High | MSFT.Low | MSFT.Close | MSFT.Volume | MSFT.Adjusted | Y | X1 | X2 | X3 | probs | predictions | longSig | exitlongSig |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2018-06-19 | 97.73225 | 99.05627 | 97.58513 | 98.91896 | 28653100 | 98.91896 | 1 | 99.33186 | 63.01370 | 1.9811302 | 0.6709360 | 1 | NA | NA |
| 2018-06-20 | 99.41915 | 100.54701 | 99.17396 | 99.90953 | 26180800 | 99.90952 | 1 | 99.20142 | 59.54048 | 0.0686529 | 0.7426882 | 1 | 0 | 0 |
| 2018-06-21 | 100.11548 | 100.48817 | 98.93857 | 99.19357 | 23198200 | 99.19357 | 0 | 99.14061 | 63.88638 | 0.1961559 | 0.7523711 | 1 | 0 | 0 |
| 2018-06-22 | 98.47763 | 98.83069 | 97.71263 | 98.47763 | 38923100 | 98.47763 | 1 | 99.16611 | 51.80598 | -1.0297958 | 0.7586200 | 1 | 0 | 0 |
| 2018-06-25 | 98.07551 | 98.18340 | 95.42748 | 96.49649 | 35433300 | 96.49649 | 0 | 99.04646 | 43.39626 | -2.0399648 | 0.7851164 | 1 | 0 | 0 |
| 2018-06-26 | 96.91822 | 98.15397 | 96.84957 | 97.17322 | 26897200 | 97.17323 | 1 | 98.78558 | 32.78984 | -2.4420783 | 0.6282847 | 1 | 0 | 0 |
| 2018-06-27 | 97.66360 | 98.09512 | 95.52555 | 95.66285 | 31298400 | 95.66285 | 0 | 98.56687 | 35.08314 | -2.5009206 | 0.7361390 | 1 | 0 | 0 |
| 2018-06-28 | 95.50593 | 97.20264 | 95.38824 | 96.73187 | 26650700 | 96.73187 | 1 | 98.24224 | 35.29932 | -3.4424524 | 0.6387378 | 1 | 0 | 0 |
| 2018-06-29 | 97.02610 | 97.98725 | 96.43765 | 96.71226 | 28053200 | 96.71226 | 0 | 97.96861 | 37.17949 | -2.6284247 | 0.6327528 | 1 | 0 | 0 |
| 2018-07-02 | 96.21207 | 98.13435 | 96.11400 | 98.08532 | 19564500 | 98.08531 | 1 | 97.81954 | 39.04847 | -2.1968885 | 0.5659825 | 0 | 0 | 0 |

So suppose we train a simple logistic model on 100 days of data to give us the next day's prediction, every day since June 2018, buy only when the model's predicted probability is greater than 0.60, and sell only when it is less than 0.40. Then our cumulative P&L would be $334, with a max drawdown of $61.
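The cumulative P&L can be cross-checked by hand from the transaction log printed by `applyStrategy`. The sketch below re-types the quantities and prices from that log and assumes a flat cost of 10 per transaction (from `.txnfees <- -10`). It is plain arithmetic, not a call into `blotter`.

```r
# Quantities and prices copied from the applyStrategy output above.
qty <- c(10, 10, -20, 10, 10, -20, 10, 10, -20, 10, 10, -20, 10, -10)
price <- c(97.5851340312762, 99.2033819340086, 104.632969498356,
           102.487313341245, 102.694956688076, 106.15227613615,
           119.69686840234, 124.91014659961, 131.541967172209,
           133.694801331394, 135.548629168169, 137.190002,
           134.880005, 139.110001)
fee_per_txn <- 10   # assumed from .txnfees <- -10

stopifnot(sum(qty) == 0)   # the position ends flat, so all P&L is realised

gross_pnl  <- -sum(qty * price)            # cash received from sales minus cash paid for buys
total_fees <- fee_per_txn * length(qty)    # 14 transactions
net_pnl    <- gross_pnl - total_fees

round(c(gross = gross_pnl, fees = total_fees, net = net_pnl), 2)
```

The gross figure comes to about 474.43 and the fees to 140, leaving a net of about 334.43, which agrees with the $334 quoted above. It also shows how much of the gross profit the fixed fee eats at this order size.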

I have probably (almost certainly) gone wrong at some point using the `quantstrat` package, so the results will probably not generalise well to other assets, especially using a logistic model with 3 regressors! But it was a fun exercise.

I will revisit and modify this markdown file.