Backtesting is the process of evaluating a trading strategy or investment model using historical data. It answers the question: "If I had followed these rules in the past, what would my returns have been?" When done rigorously, backtesting provides evidence for or against a strategy's viability. When done carelessly, it becomes a tool for self-deception. The difference between a reliable backtest and a misleading one often determines whether a strategy succeeds or fails with real capital.
Point-in-time data is the most critical requirement for accurate backtesting. Financial data is frequently revised: a company's initial earnings report may be restated weeks or months later. Using the restated data in a backtest creates look-ahead bias because the revised figures were not available when the trading decision would have been made. Point-in-time databases, such as those from Compustat or FactSet, record data as it was originally reported, preserving the information set that was actually available to investors at each point in history.
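In practice, this discipline shows up at the data-join level: fundamentals should be matched to trading dates by the date each figure became public, not by the fiscal period it describes. A minimal sketch using pandas, with hypothetical column names (available_date, eps), to illustrate the as-of join:

```python
import pandas as pd

# Hypothetical point-in-time fundamentals: each row records an earnings figure
# together with the date it actually became public (original report or restatement).
fundamentals = pd.DataFrame({
    "available_date": pd.to_datetime(["2023-02-01", "2023-04-15"]),
    "eps": [1.10, 0.95],  # the 2023-04-15 row restates the same quarter
})

trades = pd.DataFrame({"trade_date": pd.to_datetime(["2023-03-01", "2023-05-01"])})

# As-of join: each trade sees only the figure that was public on its trade date.
# The 2023-03-01 trade gets the original 1.10, not the restated 0.95.
aligned = pd.merge_asof(
    trades.sort_values("trade_date"),
    fundamentals.sort_values("available_date"),
    left_on="trade_date",
    right_on="available_date",
    direction="backward",
)
print(aligned)
```

Joining on the fiscal period end instead of the availability date would silently feed the restated figure into the earlier trade, which is exactly the look-ahead bias described above.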
Transaction costs and market impact must be modeled realistically. A strategy that trades frequently in small-cap stocks may look spectacular on paper but become unprofitable once commissions, bid-ask spreads, and market impact are included. Commissions are typically a minor cost today, but bid-ask spreads can be substantial for less liquid stocks. Market impact, the price movement caused by your own trading, increases with position size and is particularly important for institutional investors. A common rule of thumb is to assume round-trip transaction costs of 10-50 basis points for large-cap stocks and 50-200 basis points for small-cap stocks.
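A rough way to apply this rule of thumb is to deduct a flat round-trip charge, in basis points, from each trade's gross return. The figures below are illustrative assumptions, and a flat charge deliberately ignores size-dependent market impact:

```python
def net_trade_return(gross_return: float, round_trip_cost_bps: float) -> float:
    """Deduct a flat round-trip cost, quoted in basis points, from one trade's gross return."""
    return gross_return - round_trip_cost_bps / 10_000

# Illustrative: a 2% gross gain per trade, costed at 20 bps for a liquid
# large-cap name versus 150 bps for an illiquid small-cap name.
print(net_trade_return(0.02, 20))    # 0.018
print(net_trade_return(0.02, 150))   # 0.005
```

Even this crude model shows how quickly a high-turnover small-cap strategy can lose most of its paper profit to costs.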
Out-of-sample testing is the gold standard for evaluating backtested results. The historical data is split into an in-sample period (used to develop and calibrate the strategy) and an out-of-sample period (used to evaluate it). Crucially, the strategy must not be modified based on out-of-sample results; otherwise the out-of-sample period effectively becomes part of the in-sample data. If performance degrades significantly out-of-sample, the strategy likely overfit the in-sample data. Walk-forward analysis extends this concept by repeatedly re-calibrating the strategy on rolling windows of data and testing on the subsequent period.
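A walk-forward scheme can be expressed as a generator of rolling train/test index windows. The sketch below uses placeholder window lengths, and fit_strategy and evaluate stand for whatever calibration and evaluation code the strategy requires (they are not a specific library API):

```python
import numpy as np

def walk_forward_splits(n_obs: int, train_size: int, test_size: int):
    """Yield (train_indices, test_indices) pairs for a rolling walk-forward scheme."""
    start = 0
    while start + train_size + test_size <= n_obs:
        train = np.arange(start, start + train_size)
        test = np.arange(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # roll the window forward by one test period

# Example: 10 years of monthly data, calibrate on 36 months, test on the next 12.
for train_idx, test_idx in walk_forward_splits(n_obs=120, train_size=36, test_size=12):
    # params = fit_strategy(returns[train_idx])      # hypothetical calibration step
    # results = evaluate(returns[test_idx], params)  # hypothetical evaluation step
    pass
```

Stitching the out-of-sample test segments together gives a performance record in which every decision was made using only data available at the time.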
Multiple testing bias is perhaps the most insidious threat to backtest validity. When you test many variations of a strategy (different parameters, signals, and holding periods), some will appear profitable purely by chance. If you test 100 random strategies, approximately 5 will show "statistically significant" results at the 5% level even if none of them have any real edge. Correcting for multiple comparisons using methods like the Bonferroni correction or the false discovery rate is essential but rarely done in practice.
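Both corrections are straightforward to implement directly. The sketch below applies a Bonferroni cutoff and a Benjamini-Hochberg false-discovery-rate procedure to 100 hypothetical p-values drawn uniformly at random, i.e., strategies with no real edge:

```python
import numpy as np

def bonferroni(p_values, alpha=0.05):
    """Reject only p-values below alpha divided by the number of tests."""
    p = np.asarray(p_values)
    return p < alpha / len(p)

def benjamini_hochberg(p_values, alpha=0.05):
    """FDR control: reject the largest set of ordered p-values with p_(k) <= (k/m) * alpha."""
    p = np.asarray(p_values)
    m = len(p)
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    k = np.max(np.nonzero(passed)[0]) + 1 if passed.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject

# 100 simulated strategies with no edge: a naive 5% cutoff would "discover"
# about five of them by chance; both corrections are far stricter.
rng = np.random.default_rng(0)
p_vals = rng.uniform(size=100)
print(bonferroni(p_vals).sum(), benjamini_hochberg(p_vals).sum())
```

The same machinery applies when the "tests" are parameter variations of a single strategy rather than independent strategies, although correlated tests make the corrections conservative.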
A credible backtest report should include: annualized return and volatility, Sharpe ratio, maximum drawdown and drawdown duration, the number of trades, average trade return, win rate, profit factor, and performance across different market regimes (bull, bear, sideways). The results should be robust to reasonable parameter variations. If changing a lookback period from 12 months to 11 months destroys the strategy's profitability, the original result was likely noise rather than signal.
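A minimal summary function along these lines is sketched below, assuming daily returns, a zero risk-free rate for the Sharpe ratio, and geometric annualization; it covers only a subset of the report described above:

```python
import numpy as np

def summarize(returns: np.ndarray, periods_per_year: int = 252) -> dict:
    """Core summary statistics from a series of periodic strategy returns."""
    ann_return = (1 + returns).prod() ** (periods_per_year / len(returns)) - 1
    ann_vol = returns.std(ddof=1) * np.sqrt(periods_per_year)
    sharpe = ann_return / ann_vol if ann_vol > 0 else np.nan  # zero risk-free rate assumed
    equity = (1 + returns).cumprod()
    drawdown = equity / np.maximum.accumulate(equity) - 1  # distance from running peak
    return {
        "annualized_return": ann_return,
        "annualized_volatility": ann_vol,
        "sharpe_ratio": sharpe,
        "max_drawdown": drawdown.min(),
        "win_rate": (returns > 0).mean(),
    }

# Illustrative: three years of simulated daily returns.
rng = np.random.default_rng(1)
daily = rng.normal(0.0004, 0.01, size=756)
print(summarize(daily))
```

Re-running such a summary across parameter variations and market regimes is the simplest robustness check: a genuine signal should survive a lookback of 11 months about as well as one of 12.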