Dataset to use for backtesting

Hi, I have a dataset of stocks (2007 - 2021) that I want to backtest my strategy on. I have split this dataset into two parts:

  1. In sample: the past data that I will run the backtest on, optimising the strategy based on the backtest results. This covers approximately 2007-2015.
  2. Out of sample: the data on which I will then run the optimised strategy, to understand how it would have performed between 2016 and 2021.
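For illustration, the split described above could be sketched as follows. This is a minimal sketch, not the poster's actual code: the `(date, symbol, close)` row layout, the `split_in_out_of_sample` helper, and the exact cut-off date of 31 Dec 2015 are all assumptions.

```python
from datetime import date

# Assumed cut-off: last in-sample day, based on the approximate 2007-2015 / 2016-2021 split.
IN_SAMPLE_END = date(2015, 12, 31)

def split_in_out_of_sample(rows, in_sample_end=IN_SAMPLE_END):
    """Split (date, symbol, close) rows into in-sample and out-of-sample lists."""
    in_sample = [r for r in rows if r[0] <= in_sample_end]
    out_of_sample = [r for r in rows if r[0] > in_sample_end]
    return in_sample, out_of_sample

# Made-up rows for a hypothetical symbol "AAA".
rows = [
    (date(2007, 1, 2), "AAA", 100.0),
    (date(2015, 12, 31), "AAA", 180.0),
    (date(2016, 1, 4), "AAA", 182.0),
    (date(2021, 6, 30), "AAA", 260.0),
]
in_sample, out_of_sample = split_in_out_of_sample(rows)
```

Splitting strictly by date keeps every out-of-sample bar later than every in-sample bar, so optimisation never sees the later period.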

I have already run some backtests and through that, one thing that I have come to realise is that my strategy is momentum based and hence will work only on stocks that have momentum or have had momentum in the past.

Considering this, does it make sense to exclude stocks that have not had any momentum in the past and run the strategy only on the set of stocks that have exhibited momentum?

Well, I wouldn’t remove the stocks which didn’t show any momentum. If your system looks for momentum to enter, then it should automatically filter these out. The more stocks you have, the more patterns you have access to, which is a good thing.
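To show why no manual exclusion is needed, here is a rough sketch of an entry rule that simply never fires on stocks without momentum. The rule itself (trailing return over `lookback` bars above `min_return`), the `has_momentum` name, and the sample prices are hypothetical; the actual strategy is not described in this thread.

```python
def has_momentum(closes, lookback=3, min_return=0.10):
    """Hypothetical entry rule: trailing return over `lookback` bars exceeds `min_return`."""
    if len(closes) <= lookback:
        return False  # not enough history to measure the trailing return
    past = closes[-lookback - 1]
    return past > 0 and closes[-1] / past - 1.0 > min_return

# Made-up closing prices for two hypothetical symbols.
universe = {
    "TRENDING": [100.0, 104.0, 109.0, 115.0],
    "FLAT": [100.0, 101.0, 99.0, 100.0],
}

# The full universe is scanned; stocks without momentum just produce no signal.
signals = [symbol for symbol, closes in universe.items() if has_momentum(closes)]
```

Keeping the whole universe also avoids choosing stocks with hindsight, since which stocks "had momentum" is only known after the fact.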

Also, for stocks, note the following:

  1. Data should be adjusted for splits, bonus issues, etc.
  2. Liquidity should be considered. Remove illiquid stocks from the dataset.
  3. If a stock is circuit prone, it is better to remove it. Your system will generate a signal in simulation, but in the real world the trade cannot be executed reliably if the stock has hit the upper or lower circuit.
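Points 2 and 3 could be sketched as a simple universe filter like the one below. Everything here is an assumption for illustration: the bar layout (`close`, `volume`), the thresholds, the 5% band (actual circuit bands differ per stock), and the shortcut of treating a close-to-close move at or beyond the band as a circuit hit. Prices are assumed to be already adjusted as per point 1.

```python
def avg_traded_value(bars):
    """Average daily traded value (close * volume) over the given bars."""
    return sum(b["close"] * b["volume"] for b in bars) / len(bars)

def circuit_hit_ratio(bars, band=0.05, tol=1e-4):
    """Fraction of days whose close-to-close move reaches the assumed circuit band."""
    hits = 0
    for prev, cur in zip(bars, bars[1:]):
        move = abs(cur["close"] / prev["close"] - 1.0)
        if move >= band - tol:
            hits += 1
    return hits / max(len(bars) - 1, 1)

def is_tradable(bars, min_value=1_000_000.0, max_circuit_ratio=0.2):
    """Keep a stock only if it is liquid enough and rarely hits the circuit."""
    return (avg_traded_value(bars) >= min_value
            and circuit_hit_ratio(bars) <= max_circuit_ratio)

def make_bars(closes, volume):
    return [{"close": c, "volume": volume} for c in closes]

# Made-up data for three hypothetical symbols.
universe = {
    "LIQUID": make_bars([100.0, 101.0, 102.0, 101.0], 50_000),
    "CIRCUIT_PRONE": make_bars([100.0, 105.0, 110.25, 115.7625], 50_000),
    "ILLIQUID": make_bars([100.0, 100.5, 100.0, 100.2], 100),
}
tradable = [name for name, bars in universe.items() if is_tradable(bars)]
```

In a real backtest these checks would be computed on a rolling window using only data available up to each date, rather than once over the full history.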

Thanks for your inputs. What you said makes sense.

And yes, I haven’t gone beyond Nifty 500 stocks. Even then, I have a liquidity check in place.

OK, then you seem to be on the right track.