This repository contains a comprehensive dashboard for analyzing and visualizing the performance of various statistical arbitrage strategies over historical data. The dashboard supports strategies such as momentum, mean reversion, and volatility-based methods.
- Data Analysis: Tools for analyzing stock data, including statistical measures and visualizations.
- Machine Learning: Implementations of machine learning models for predictive analysis.
- Backtesting: Framework for backtesting different trading strategies.
- Interactive Dashboard: User-friendly interface for exploring and understanding the performance of strategies.
To install and run the dashboard locally:

```bash
git clone https://github.com/hassangaber/statarb.git
cd statarb
pip install -r requirements.txt
gunicorn --preload qstrat:server
```
Once gunicorn is running, open the dashboard in a browser (by default gunicorn listens on http://127.0.0.1:8000). The dashboard lets you select different stocks, set backtesting parameters, and visualize the results of various strategies.
Visit the live site at https://qstrat-e42f91fdc838.herokuapp.com/
In supervised learning, labels are needed to train a model to predict future changes in returns. The `TimeSeriesDataset` class creates a target label for predicting changes in returns based on a dynamic threshold calculated from rolling volatility. The labels fall into three categories:
- 1: Change in returns greater than the positive threshold.
- -1: Change in returns less than the negative threshold.
- 0: Change in returns within the threshold range.
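The change in returns ($\Delta R_t$) over the prediction horizon $h$ is calculated as

$$\Delta R_t = R_{t+h} - R_t$$

where $R_t$ is the return at time $t$.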
In addition to returns, several rolling indicators are calculated to enhance the predictive power of the model:
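- Momentum: Calculated as the mean of returns over the horizon: $\text{Momentum}_t = \frac{1}{h}\sum_{i=0}^{h-1} R_{t-i}$
- Simple Moving Averages (SMA): Computed over different periods to capture trends: $\text{SMA}_n(t) = \frac{1}{n}\sum_{i=0}^{n-1} P_{t-i}$, where $P_t$ is the closing price and $n$ is the window length (9 and 21 days are used below).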
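A minimal pandas sketch of these rolling indicators follows. It assumes a DataFrame with hypothetical `close` and `returns` columns and a hypothetical default horizon of 5; computing the SMAs on closing prices is also an assumption, not something taken from the repository.

```python
import numpy as np
import pandas as pd


def add_rolling_indicators(df: pd.DataFrame, horizon: int = 5) -> pd.DataFrame:
    """Add momentum and SMA columns (hypothetical column names)."""
    out = df.copy()
    # Momentum: mean of the last `horizon` returns
    out["momentum"] = out["returns"].rolling(window=horizon).mean()
    # Simple moving averages of the closing price over 9 and 21 days
    out["sma_9"] = out["close"].rolling(window=9).mean()
    out["sma_21"] = out["close"].rolling(window=21).mean()
    return out


# Usage on a synthetic, steadily rising price series
prices = pd.Series(100 + np.arange(30, dtype=float), name="close")
df = pd.DataFrame({"close": prices, "returns": prices / prices.shift(1) - 1})
df = add_rolling_indicators(df, horizon=5)
print(df[["momentum", "sma_9", "sma_21"]].tail(3))
```

The first rows of each indicator are `NaN` until a full window is available, so they are usually dropped before training.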
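The threshold ($\theta_t$) is derived from rolling volatility:

$$\theta_t = k \cdot \sigma_t$$

where $\sigma_t$ is the rolling standard deviation of returns at time $t$ and $k$ is a scaling factor.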
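The sketch below illustrates a volatility-scaled threshold combined with the three-class labeling described above. It assumes the threshold is a multiple `k` of the rolling standard deviation of returns and that the change in returns is measured `horizon` steps ahead; `k`, `vol_window`, `horizon`, and the function name are hypothetical choices for illustration.

```python
import numpy as np
import pandas as pd


def threshold_labels(returns: pd.Series, horizon: int = 5,
                     vol_window: int = 20, k: float = 1.0) -> pd.Series:
    """Label each date as 1, -1 or 0 using a volatility-scaled threshold."""
    # Dynamic threshold: k times the rolling volatility of returns
    threshold = k * returns.rolling(window=vol_window).std()
    # Change in returns between t and t + horizon (uses future data: target only)
    delta = returns.shift(-horizon) - returns
    # Comparisons with NaN are False, so warm-up and tail rows fall back to 0
    labels = np.select(
        [delta > threshold, delta < -threshold],
        [1, -1],
        default=0,
    )
    return pd.Series(labels, index=returns.index, name="label")


rng = np.random.default_rng(0)
r = pd.Series(rng.normal(0.0, 0.01, 100))
print(threshold_labels(r).value_counts())
```

Because undefined rows silently become 0 here, a real pipeline would typically drop the first `vol_window - 1` and last `horizon` rows rather than treat them as hold labels.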
Labels are assigned based on the future returns and the calculated indicators:
- Buy Signal (1): Assigned when:
  - Future returns are greater than the positive threshold.
  - Momentum is positive.
  - The 9-day SMA is greater than the 21-day SMA.
- Sell Signal (-1): Assigned when:
  - Future returns are less than the negative threshold.
  - Momentum is negative.
  - The 9-day SMA is less than the 21-day SMA.
- Hold Signal (0): Assigned when the conditions for buy and sell signals are not met.
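The combined rule can be expressed with `np.select`. This sketch assumes the indicator and threshold columns have already been computed (the column names `future_return`, `threshold`, `momentum`, `sma_9`, and `sma_21` are hypothetical) and that all three conditions must hold for a buy or sell signal.

```python
import numpy as np
import pandas as pd


def assign_signals(df: pd.DataFrame) -> pd.Series:
    """Return 1 (buy), -1 (sell) or 0 (hold) for each row."""
    buy = (
        (df["future_return"] > df["threshold"])
        & (df["momentum"] > 0)
        & (df["sma_9"] > df["sma_21"])
    )
    sell = (
        (df["future_return"] < -df["threshold"])
        & (df["momentum"] < 0)
        & (df["sma_9"] < df["sma_21"])
    )
    # Anything that is neither a buy nor a sell is a hold
    return pd.Series(np.select([buy, sell], [1, -1], default=0),
                     index=df.index, name="signal")


example = pd.DataFrame({
    "future_return": [0.03, -0.03, 0.03, 0.001],
    "threshold":     [0.02, 0.02, 0.02, 0.02],
    "momentum":      [0.01, -0.01, -0.01, 0.01],
    "sma_9":         [101.0, 99.0, 101.0, 101.0],
    "sma_21":        [100.0, 100.0, 100.0, 100.0],
})
print(assign_signals(example).tolist())  # [1, -1, 0, 0]
```

The third row shows why the extra filters matter: the return clears the threshold, but negative momentum keeps it a hold.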
The implementation in the `TimeSeriesDataset` class involves the following steps:
- Initialize the Class: Set the parameters and prepare the data.
- Preprocess the Data: Calculate changes in returns, compute rolling indicators, and assign labels.
- Scale Features: Normalize the features using `StandardScaler`.
- Data Handling: Implement methods to get the length of the dataset and retrieve individual data points.
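A dependency-light sketch of this dataset pattern: features are standardized with the same zero-mean, unit-variance transform that `StandardScaler` applies, and `__len__` / `__getitem__` expose fixed-length windows. The class name `WindowedDataset`, the window length, and the use of plain NumPy instead of a deep-learning framework's dataset base class are assumptions for illustration; this is not the repository's `TimeSeriesDataset`.

```python
import numpy as np


class WindowedDataset:
    """Sliding-window dataset over standardized features (hypothetical sketch).

    features: 2-D array of shape (n_rows, n_features); labels: 1-D array of length n_rows.
    """

    def __init__(self, features: np.ndarray, labels: np.ndarray, window: int = 10):
        if len(features) != len(labels):
            raise ValueError("features and labels must have the same length")
        features = np.asarray(features, dtype=float)
        # Standardize each column (population std, as StandardScaler does)
        self.mean = features.mean(axis=0)
        self.std = features.std(axis=0)
        self.std[self.std == 0] = 1.0  # leave constant columns unscaled
        self.x = (features - self.mean) / self.std
        self.y = np.asarray(labels)
        self.window = window

    def __len__(self) -> int:
        # Number of complete windows in the series
        return len(self.x) - self.window + 1

    def __getitem__(self, idx: int):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        window_x = self.x[idx: idx + self.window]  # shape (window, n_features)
        label = self.y[idx + self.window - 1]       # label of the last row in the window
        return window_x, label


X = np.arange(40, dtype=float).reshape(20, 2)
ds = WindowedDataset(X, np.arange(20), window=5)
x0, y0 = ds[0]
print(len(ds), x0.shape, y0)
```

Fitting the scaling statistics on the full series, as done here for brevity, leaks future information; in a backtest they should be computed on the training split only.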
The trading signal model is based on a convolutional neural network (CNN) that captures temporal patterns in the data. The architecture includes:
- Conv1D Layers: To capture temporal dependencies in the time series data.
- Adaptive Pooling: To reduce the sequence length to a fixed size.
- Fully Connected Layers: To further process the extracted features.
- Activation Functions: `ReLU` between layers and `Tanh` at the output to constrain the signals between -1 and 1.
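To make the data flow through these layers concrete, here is a NumPy forward pass with randomly initialized weights. The layer sizes (`n_features`, `channels`, `pooled`, `hidden`), the kernel size of 3, the two convolution layers, and the use of average pooling are illustrative assumptions; the sketch only shows shapes and activations and performs no training.

```python
import numpy as np


def conv1d(x, w, b):
    """Valid 1-D convolution (cross-correlation). x: (C_in, L), w: (C_out, C_in, K)."""
    c_out, _, k = w.shape
    length = x.shape[1] - k + 1
    out = np.empty((c_out, length))
    for t in range(length):
        # Contract over input channels and kernel positions
        out[:, t] = np.tensordot(w, x[:, t:t + k], axes=([1, 2], [0, 1])) + b
    return out


def adaptive_avg_pool1d(x, output_size):
    """Average-pool the last axis into `output_size` bins (PyTorch-style bin edges)."""
    length = x.shape[1]
    bins = []
    for i in range(output_size):
        start = (i * length) // output_size
        end = -(-((i + 1) * length) // output_size)  # ceiling division
        bins.append(x[:, start:end].mean(axis=1))
    return np.stack(bins, axis=1)


def relu(x):
    return np.maximum(x, 0.0)


rng = np.random.default_rng(0)
n_features, seq_len, channels, pooled, hidden = 5, 30, 8, 4, 16
params = {
    "conv1_w": rng.normal(0, 0.1, (channels, n_features, 3)), "conv1_b": np.zeros(channels),
    "conv2_w": rng.normal(0, 0.1, (channels, channels, 3)), "conv2_b": np.zeros(channels),
    "fc1_w": rng.normal(0, 0.1, (hidden, channels * pooled)), "fc1_b": np.zeros(hidden),
    "fc2_w": rng.normal(0, 0.1, (1, hidden)), "fc2_b": np.zeros(1),
}


def forward(x, p):
    """x: (n_features, seq_len) -> one trading signal in (-1, 1)."""
    h = relu(conv1d(x, p["conv1_w"], p["conv1_b"]))  # temporal features
    h = relu(conv1d(h, p["conv2_w"], p["conv2_b"]))
    h = adaptive_avg_pool1d(h, pooled).reshape(-1)   # fixed size for any seq_len
    h = relu(p["fc1_w"] @ h + p["fc1_b"])
    return np.tanh(p["fc2_w"] @ h + p["fc2_b"])       # tanh bounds the output


signal = forward(rng.normal(size=(n_features, seq_len)), params)
print(signal.shape)  # (1,)
```

Adaptive pooling is what lets the fully connected layer have a fixed input size (`channels * pooled`) even when the input window length changes.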
The custom loss function, `ExcessReturnLoss`, is designed to maximize the Sharpe ratio, which measures the return of the trading signals relative to their risk. The loss function:
- Calculates Excess Returns: Based on the signals and the actual returns.
- Computes the Sharpe Ratio: As the mean excess return divided by the standard deviation of excess returns.
- Negates the Sharpe Ratio: So that minimizing the loss function maximizes the Sharpe Ratio.
By using this architecture and loss function, the model aims to generate trading signals that optimize returns relative to risk.
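The loss can be sketched in NumPy as follows. It assumes excess returns are the signal-weighted asset returns minus an optional per-period risk-free rate (`risk_free`, a hypothetical parameter) and adds a small `eps` to the denominator for numerical stability; the repository's `ExcessReturnLoss` may define excess returns differently, and as a training loss it would need to operate on differentiable tensors rather than NumPy arrays.

```python
import numpy as np


def negative_sharpe_loss(signals, returns, risk_free=0.0, eps=1e-8):
    """Negative Sharpe ratio of a signal-weighted return stream."""
    signals = np.asarray(signals, dtype=float)
    returns = np.asarray(returns, dtype=float)
    # Position (in [-1, 1]) times realized return, minus the risk-free rate
    excess = signals * returns - risk_free
    sharpe = excess.mean() / (excess.std() + eps)
    # Minimizing the negative Sharpe ratio maximizes the Sharpe ratio
    return -sharpe


returns = [0.01, 0.02, -0.01, 0.0]
good = negative_sharpe_loss([1, 1, -1, 1], returns)
bad = negative_sharpe_loss([-1, -1, 1, -1], returns)
print(good, bad)
```

Signals that go long before gains and short before losses produce a lower (more negative) loss than the reversed signals.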
Monte Carlo simulations are used to project the future performance of investment portfolios by running multiple scenarios based on historical data.
- Generate random variables $Z_t \sim N(0, I)$.
- Calculate the Cholesky decomposition of the covariance matrix, $\Sigma = L L^\top$.
- Compute daily returns: $\text{dailyReturns}_t = \mu + L Z_t$.
- Compute portfolio values: $\text{portfolioValues}_T = \text{initialPortfolio} \cdot \prod_{t=1}^{T} (1 + w^\top \text{dailyReturns}_t)$, where $w$ is the vector of portfolio weights.
- Calculate performance metrics (e.g., Sharpe ratio, VaR, CVaR).
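The Cholesky step relies on the identity $\Sigma = L L^\top$: if $Z$ has independent standard normal entries, then $LZ$ has covariance $\Sigma$. A quick numerical check, using an arbitrary two-asset covariance matrix chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
cov = np.array([[0.0004, 0.0002],
                [0.0002, 0.0009]])  # illustrative 2-asset daily covariance

L = np.linalg.cholesky(cov)         # lower-triangular factor
Z = rng.standard_normal(size=(2, 200_000))
samples = L @ Z                     # correlated draws, one column per scenario

print(np.allclose(L @ L.T, cov))    # True
print(np.cov(samples))              # close to cov, up to sampling error
```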
Monte Carlo simulation is a statistical method used to model the probability of different outcomes in a process that cannot easily be predicted due to the intervention of random variables. In the context of portfolio management, it is used to simulate the future returns of a portfolio by generating a wide range of possible outcomes based on historical data and statistical properties of asset returns.
- Number of Simulations (mc_sims): The number of simulated paths to generate.
- Time Horizon (T): The number of time periods (e.g., days) for each simulation.
- Portfolio Weights (weights): The allocation of the initial portfolio value across different assets.
- Mean Returns (meanReturns): The expected returns of the assets.
- Covariance Matrix (covMatrix): The covariance matrix of asset returns.
- Initial Portfolio Value (initial_portfolio): The starting value of the portfolio.
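Putting these parameters together, here is a minimal vectorized sketch that runs `mc_sims` paths at once and returns a `(T, mc_sims)` array of portfolio values. The function name `simulate_portfolio`, the optional `seed` argument, and the example inputs are hypothetical; the per-path logic follows the snippet at the end of this section.

```python
import numpy as np


def simulate_portfolio(mc_sims, T, weights, meanReturns, covMatrix,
                       initial_portfolio, seed=None):
    """Simulate mc_sims portfolio paths; returns an array of shape (T, mc_sims)."""
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    mean = np.asarray(meanReturns, dtype=float)
    L = np.linalg.cholesky(np.asarray(covMatrix, dtype=float))  # covMatrix = L @ L.T

    # Independent standard normals for every simulation, day and asset
    Z = rng.standard_normal(size=(mc_sims, T, len(weights)))
    # Correlated daily asset returns mu + L z, computed for all paths at once
    daily_returns = mean + Z @ L.T                 # (mc_sims, T, n_assets)
    portfolio_returns = daily_returns @ weights    # (mc_sims, T)
    # Compound the daily portfolio returns from the initial value
    values = initial_portfolio * np.cumprod(1.0 + portfolio_returns, axis=1)
    return values.T                                # (T, mc_sims)


# Usage with illustrative two-asset inputs
paths = simulate_portfolio(
    mc_sims=1000, T=100, weights=[0.6, 0.4],
    meanReturns=[0.0005, 0.0003],
    covMatrix=[[0.0004, 0.0002], [0.0002, 0.0009]],
    initial_portfolio=10_000, seed=0,
)
print(paths.shape)  # (100, 1000)
```

Drawing all random numbers in one call avoids a Python loop over simulations; passing a `seed` makes runs reproducible.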
The simulation uses the Cholesky decomposition of the covariance matrix so that the simulated returns reproduce the variances and correlations of the assets observed in the historical data.
- Cholesky Decomposition (L): The covariance matrix is factored as $\text{covMatrix} = L L^\top$, where $L$ is lower triangular.
- Random Samples (Z): Draw independent samples from a standard normal distribution.
- Daily Returns: Simulated by adding the mean returns to the random samples transformed by the Cholesky factor.
- Portfolio Values: Calculated by compounding the weighted daily portfolio returns, starting from the initial portfolio value.
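```python
import numpy as np

L = np.linalg.cholesky(covMatrix)                 # lower-triangular, covMatrix = L @ L.T
meanM = np.tile(meanReturns, (T, 1)).T            # mean returns, shape (n_assets, T)
Z = np.random.normal(size=(T, len(weights)))      # i.i.d. standard normal draws
dailyReturns = meanM + np.inner(L, Z)             # correlated returns, shape (n_assets, T)
portfolio_values = np.cumprod(np.dot(weights, dailyReturns) + 1) * initial_portfolio  # one path
```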
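The risk metrics listed above can be estimated from terminal values collected across many simulated paths (for example, the last row of a `(T, mc_sims)` array of portfolio values). This sketch uses the common empirical definitions and reports VaR and CVaR as positive losses relative to the initial value at a hypothetical 95% confidence level (`alpha=0.05`); the repository may use different conventions, such as computing them on returns instead of values.

```python
import numpy as np


def var_cvar(final_values, initial_portfolio, alpha=0.05):
    """Empirical VaR and CVaR of terminal P&L at confidence 1 - alpha, as positive losses."""
    pnl = np.asarray(final_values, dtype=float) - initial_portfolio
    cutoff = np.percentile(pnl, 100 * alpha)   # alpha-quantile of the P&L distribution
    var = -cutoff
    cvar = -pnl[pnl <= cutoff].mean()          # average loss at or beyond the VaR cutoff
    return var, cvar


final_values = 1000 + np.arange(-50, 50)       # toy terminal values
var, cvar = var_cvar(final_values, 1000)
print(var, cvar)
```

CVaR is always at least as large as VaR because it averages the losses in the tail beyond the VaR cutoff.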