By Maksym Lytvynov, Founder of AlphaStocks | Last updated: March 2026

Walk-Forward Backtest Results

Does the AlphaStocks scoring formula actually work on data it has never seen? This page presents the full walk-forward validation results for our composite scoring system. A portfolio of the 30 highest-scoring S&P 500 stocks delivered a +132.9% total return versus +98.7% for the S&P 500 over January 2021 through February 2026, with +8.4% annual alpha during the out-of-sample period, which the formula never saw during development.

Below you will find the methodology, complete results tables, drawdown analysis, honest limitations, and the full disclaimer. We disclose everything because transparency is not optional when real money is at stake.

What Is Walk-Forward Validation?

Walk-forward validation is the gold standard for testing quantitative investment strategies. The concept is straightforward: you split your data into two non-overlapping periods. The first period (in-sample) is used to design, calibrate, and optimize your model. The second period (out-of-sample) is used to test it on data the model has never seen.

This approach directly addresses the most dangerous problem in quantitative finance: overfitting. With enough parameters, any model can be made to look spectacular on historical data. A model that “predicts the past” with 99% accuracy might be completely useless going forward because it memorized noise rather than discovering genuine patterns.

Walk-forward validation prevents this by separating discovery from confirmation. If a strategy performs well on data it was specifically designed for, that tells you very little. If it performs well on data it has never seen, that is meaningful evidence that the underlying signals are real.

Most retail trading tools skip this step entirely. They show you one set of backtested numbers without disclosing whether the model was designed on the same data being used to evaluate it. That is like a student writing the test questions and then claiming a perfect score proves intelligence.

How We Structured Our Test

AlphaStocks split the January 2021 through February 2026 period into two distinct windows:

Phase	Period	Purpose
In-sample	Jan 2021 – Dec 2023	Design and calibrate the scoring formula. Choose model weights, test axis combinations, set thresholds.
Out-of-sample	Jan 2024 – Feb 2026	Lock the formula and test it on unseen data. No changes, no re-optimization, no parameter tweaking.

The in-sample period includes the post-COVID rally, the 2022 bear market, and the 2023 recovery — a diverse set of market conditions that stress-tested the formula across multiple regimes.

Once the formula was finalized, it was locked. The out-of-sample period (2024 through early 2026) includes the AI-driven tech rally, sector rotations, and several correction events. The formula was not modified in response to any of these conditions.
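The split described above can be sketched in a few lines. This is a minimal illustration, not the AlphaStocks implementation: the function name, the dictionary layout, and the dummy return values are assumptions; only the January 2024 boundary comes from the table.

```python
from datetime import date

# Boundary taken from the phase table: out-of-sample starts in January 2024.
OUT_OF_SAMPLE_START = date(2024, 1, 1)

def split_walk_forward(monthly_returns, cutoff=OUT_OF_SAMPLE_START):
    """Split {month_start_date: return} into (in_sample, out_of_sample) dicts."""
    in_sample = {d: r for d, r in monthly_returns.items() if d < cutoff}
    out_of_sample = {d: r for d, r in monthly_returns.items() if d >= cutoff}
    return in_sample, out_of_sample

# Placeholder series for Jan 2021 through Feb 2026 (dummy zeros, not real data).
months = [date(2021 + i // 12, i % 12 + 1, 1) for i in range(62)]
dummy_returns = {d: 0.0 for d in months}

in_sample, out_of_sample = split_walk_forward(dummy_returns)
print(len(in_sample), len(out_of_sample))  # months in each window
```

The point of the design is that the two windows never overlap: anything tuned on the first dictionary is evaluated only on the second.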

Scoring Formula Used (v22)

The backtest uses the v22 composite scoring formula, which blends four axes into a single 0–10 rating for every S&P 500 stock:

Composite = Quality × 0.40 + Value × 0.10 + Momentum × 0.35 + Timing × 0.15

Axis	Weight	What It Measures
Quality	40%	Fundamental strength via Piotroski F-Score (35%) and Buffett-style quality assessment (65%)
Value	10%	Valuation attractiveness via Graham Fair Value (45%), Lynch PEG (25%), Greenblatt Magic Formula (30%)
Momentum	35%	6-month price performance, percentile-ranked within the S&P 500
Timing	15%	min(Value, Momentum) — the value-trap killer, requires both cheapness and rising prices

These weights are intentional. Quality gets the largest share because long-term business strength is the most reliable predictor of stock returns. Momentum is second because price trends capture information not yet reflected in quarterly filings. Value is deliberately modest to avoid chasing value traps. Timing ties them together. For a full explanation of each axis and the five investment models behind them, see the methodology page.
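As a rough sketch of how these weights combine, the following uses the percentages from the table. The function names are hypothetical, and it assumes every sub-score is already normalized to the same 0–10 scale as the final rating, which this page does not spell out.

```python
def quality_score(piotroski, buffett):
    """Quality axis: 35% Piotroski F-Score, 65% Buffett-style assessment."""
    return 0.35 * piotroski + 0.65 * buffett

def value_score(graham, lynch, greenblatt):
    """Value axis: 45% Graham Fair Value, 25% Lynch PEG, 30% Greenblatt."""
    return 0.45 * graham + 0.25 * lynch + 0.30 * greenblatt

def composite_score(quality, value, momentum):
    """v22 composite; Timing is min(Value, Momentum), the value-trap killer."""
    timing = min(value, momentum)
    return 0.40 * quality + 0.10 * value + 0.35 * momentum + 0.15 * timing

# Hypothetical inputs: a cheap stock with falling prices vs. a strong, rising one.
cheap_but_falling = composite_score(quality=6.0, value=9.0, momentum=2.0)
strong_and_rising = composite_score(quality=8.0, value=6.0, momentum=9.0)
print(round(cheap_but_falling, 2), round(strong_and_rising, 2))
```

In the first case the high Value score contributes little: Timing is capped by the weak Momentum score, so cheapness alone cannot lift the composite.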

Top-10 Portfolio: Buy-and-Hold Results

The simplest test: take the 10 highest-scoring stocks at the start of the backtest period (January 2021) and hold them through February 2026 with no rebalancing. This is a pure buy-and-hold test that eliminates any benefit from monthly rotation.

Total Return	S&P 500 (SPY)	Annual Alpha
+185%	+99%	+8.4%

The top-10 portfolio nearly doubled the S&P 500's return over the same period. An annual alpha of +8.4% means that each year, on average, the portfolio outperformed the benchmark by 8.4 percentage points. Over five years, that compounds into a substantial gap.
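One common way to turn total returns into an annual figure is to annualize each return and take the difference. The sketch below does that with the rounded headline numbers; the 62-month count (January 2021 through February 2026, inclusive) and the choice of a simple annualized-return gap are assumptions, since this page does not state exactly how its alpha is calculated.

```python
def annualized_return(total_return, months):
    """Convert a cumulative return (1.85 = +185%) into a compound annual rate."""
    return (1.0 + total_return) ** (12.0 / months) - 1.0

MONTHS = 62  # assumed length of the test window

portfolio_cagr = annualized_return(1.85, MONTHS)  # top-10 portfolio, +185%
benchmark_cagr = annualized_return(0.99, MONTHS)  # S&P 500 (SPY), +99%
annual_gap = portfolio_cagr - benchmark_cagr

print(f"{portfolio_cagr:.1%} vs {benchmark_cagr:.1%}, gap {annual_gap:.1%}")
```

With these rounded inputs the gap comes out at roughly 8 percentage points per year, in the neighborhood of the reported +8.4%.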

This buy-and-hold result is particularly significant because it requires no ongoing management. The formula identified high-quality companies at attractive prices with strong momentum, and those characteristics persisted over the following five years.

* Hypothetical backtested results. Does not reflect actual trading. No transaction costs, taxes, or slippage included.

Top-30 Portfolio: Monthly Rebalanced Results

The more rigorous test uses a larger portfolio with monthly rebalancing. Each month, the 30 highest-scoring stocks are selected, equally weighted, and held until the next rebalance. This tests whether the formula can consistently identify outperforming stocks over time, not just at a single entry point.
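The rebalancing loop described above might look like the following. It is a simplified sketch under stated assumptions: the function names, the data layout (one score dictionary and one return dictionary per month), the tie-breaking rule, and the three placeholder tickers are all hypothetical, and costs are ignored just as in the reported results.

```python
def top_n(scores, n=30):
    """Return the n highest-scoring tickers (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [ticker for ticker, _ in ranked[:n]]

def equal_weight_return(holdings, returns):
    """Average return of the holdings, i.e. an equally weighted portfolio."""
    return sum(returns[t] for t in holdings) / len(holdings)

def backtest(monthly_scores, monthly_returns, n=30):
    """monthly_scores[i] is known at the start of month i;
    monthly_returns[i] is what each stock returned during month i."""
    growth = 1.0
    for scores, returns in zip(monthly_scores, monthly_returns):
        holdings = top_n(scores, n)  # reselect at every rebalance
        growth *= 1.0 + equal_weight_return(holdings, returns)
    return growth - 1.0

# Toy two-month example with placeholder tickers and n=2.
scores = [{"AAA": 9, "BBB": 5, "CCC": 7}, {"AAA": 4, "BBB": 8, "CCC": 9}]
returns = [{"AAA": 0.10, "BBB": -0.05, "CCC": 0.02},
           {"AAA": 0.00, "BBB": 0.04, "CCC": -0.02}]
total = backtest(scores, returns, n=2)
print(f"{total:.2%}")
```

Scores for a given month must be computed only from information available before that month begins; otherwise the simulation leaks future data into the selection.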

Period	Top-30*	S&P 500 (SPY)	Alpha	Sharpe
In-sample (2021–2023)	+48.2%	+34.4%	+13.9%	0.61
Out-of-sample (2024–2026)	+51.8%	+45.6%	+8.4%	1.12
Full period (2021–2026)	+132.9%	+98.7%	+34.2%	0.84

* Hypothetical backtested results. Top-30 portfolio = 30 highest-scoring stocks, equally weighted, rebalanced monthly. Simulated, not actual trading. Does not include transaction costs, taxes, or slippage.

Several observations stand out from this data. The top-30 portfolio outperformed the S&P 500 in both the in-sample and out-of-sample periods. The out-of-sample alpha (+8.4%) is lower than the in-sample alpha (+13.9%), which is expected — no model works as well on unseen data as on its training data. However, the alpha remained substantial and positive.

The full-period cumulative alpha of +34.2 percentage points demonstrates the power of consistent, modest annual outperformance compounded over multiple years.

Risk-Adjusted Returns: The Sharpe Ratio

Raw returns tell only half the story. The Sharpe ratio measures return per unit of risk. A higher Sharpe means the portfolio delivered more return for each unit of volatility endured.

The out-of-sample Sharpe ratio of 1.12 is notably higher than the in-sample Sharpe of 0.61. This is the opposite of what you see with overfit strategies, where risk-adjusted returns typically degrade dramatically on unseen data. An improving Sharpe ratio out-of-sample is one of the strongest signals that a model has captured genuine market patterns rather than noise.

For context, a Sharpe ratio above 1.0 is generally considered good for a long-only equity strategy. The S&P 500's historical Sharpe ratio typically falls between 0.3 and 0.7 depending on the measurement period.
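For readers who want to compute the ratio themselves, here is a minimal sketch of a conventional annualized Sharpe calculation from monthly returns. The zero default risk-free rate and the square-root-of-12 annualization are assumptions; this page does not specify the exact inputs behind the reported figures, and the sample returns are made up.

```python
import math
import statistics

def sharpe_ratio(monthly_returns, monthly_risk_free=0.0, periods_per_year=12):
    """Annualized Sharpe: mean excess return over its sample standard deviation."""
    excess = [r - monthly_risk_free for r in monthly_returns]
    per_period = statistics.mean(excess) / statistics.stdev(excess)
    return per_period * math.sqrt(periods_per_year)

# Hypothetical monthly returns, for illustration only.
sample = [0.02, -0.01, 0.03, 0.01]
sharpe = sharpe_ratio(sample)
print(round(sharpe, 2))
```

Because the ratio divides return by volatility, doubling every return (with a zero risk-free rate) leaves it unchanged: it rewards smoothness, not size.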

Drawdown Analysis

Maximum drawdown measures the largest peak-to-trough decline during the test period. It answers the question every investor actually cares about: how bad did it get at the worst point?
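The peak-to-trough definition translates directly into code. The sketch below is illustrative: the function name and the sample portfolio values are made up, and it assumes a series of positive portfolio values in chronological order.

```python
def max_drawdown(values):
    """Largest decline from a running peak, as a negative fraction."""
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)                 # highest value seen so far
        worst = min(worst, v / peak - 1.0)  # decline relative to that peak
    return worst

# Hypothetical portfolio values: the deepest fall is from 120 to 102.
dd = max_drawdown([100, 110, 95, 105, 120, 102])
print(f"{dd:.1%}")
```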

Portfolio	Maximum Drawdown
Top-30 Portfolio	-14.3%
S&P 500 (SPY)	-23.9%

The top-30 portfolio's maximum drawdown was roughly 40% smaller than the S&P 500's. A maximum drawdown of -14.3% versus -23.9% means the formula-selected stocks fell significantly less during the worst market conditions of the test period.

This matters more than most investors realize. A 24% drawdown requires a 31.6% recovery to break even. A 14% drawdown requires only a 16.3% recovery. Smaller drawdowns mean faster recoveries and less temptation to panic-sell at the bottom.
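The recovery figures above follow from simple arithmetic: after losing a fraction d, the portfolio is worth 1 - d of its peak, so it must gain 1 / (1 - d) - 1 to get back. A short check, with a hypothetical function name:

```python
def recovery_needed(drawdown):
    """Gain required to regain the prior peak after a loss of `drawdown` (0.24 = 24%)."""
    return 1.0 / (1.0 - drawdown) - 1.0

for dd in (0.24, 0.14):
    print(f"{dd:.0%} drawdown -> {recovery_needed(dd):.1%} recovery")
# 24% drawdown -> 31.6% recovery
# 14% drawdown -> 16.3% recovery
```

The relationship is not linear: a 50% loss needs a 100% gain, which is why deeper drawdowns are disproportionately costly.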

The lower drawdown is likely a consequence of the Quality axis receiving the largest weight (40%). High-quality companies with strong balance sheets and consistent profitability tend to decline less during market stress. The formula is inherently biased toward resilient businesses, and that bias shows up most clearly during corrections.

In-Sample vs. Out-of-Sample: Why It Matters

The distinction between in-sample and out-of-sample performance is the single most important concept in quantitative finance testing. Many backtests you encounter online are exclusively in-sample: the model was designed and tested on the same data. This guarantees nothing about future performance.

Here is what our walk-forward split reveals:

In-sample alpha (+13.9%): The formula was designed to capture these returns. While encouraging, in-sample performance is always inflated because the model was tuned to this specific environment. It tells us the formula can describe the past, not that it can predict the future.

Out-of-sample alpha (+8.4%): This is the number that matters. The formula was locked before this period began. No weights were adjusted, no thresholds tweaked, no models added or removed. The formula simply ran forward on data it had never seen, and it still delivered meaningful alpha.

Alpha decay of ~40%: The out-of-sample alpha is approximately 40% lower than in-sample. This is normal and expected. Strategies typically perform somewhat worse on unseen data. The fact that alpha remained substantially positive, rather than falling to zero or turning negative, is the key finding.
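The decay figure is the relative drop from the in-sample alpha to the out-of-sample alpha. A quick check using the two reported numbers (the function name is illustrative):

```python
def relative_decay(in_sample, out_of_sample):
    """Fraction of the in-sample figure lost out of sample."""
    return (in_sample - out_of_sample) / in_sample

decay = relative_decay(13.9, 8.4)  # reported alphas, in percentage points
print(f"{decay:.1%}")
```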

Consider what an overfit strategy would look like: spectacular in-sample returns followed by zero or negative out-of-sample alpha. The Sharpe ratio would collapse. Drawdowns would increase. Our results show the opposite pattern — risk-adjusted returns improved and drawdowns remained controlled.

Known Weaknesses and Limitations

No investment strategy works in all market conditions. Transparency requires being explicit about where this formula struggles:

Underperforms in sideways, range-bound markets. During flat market periods within the 2024–2026 out-of-sample window, the top-30 portfolio returned -3.2% while the S&P 500 returned +3.0%. The formula relies on momentum as its second-largest axis (35%). When prices move sideways, momentum signals become noisy and unreliable, leading to poor stock selection.
Hypothetical backtested results only. The backtest is a simulation, not a track record of actual trades. No real money was invested during the test period. Simulated results are inherently more optimistic than real-world results.
No transaction costs, taxes, or slippage modeled. Monthly rebalancing of 30 stocks generates meaningful trading costs. Commission-free brokers still impose bid-ask spreads. Taxes on short-term capital gains (from monthly turnover) would reduce net returns. Market impact when entering and exiting positions is not reflected.
Monthly rebalancing assumed. The backtest rebalances on the first trading day of each month. Real investors may rebalance at different frequencies or in response to different triggers, producing different results.
S&P 500 universe introduces survivorship bias. The formula scores current S&P 500 constituents. Companies that were in the index during the test period but were later removed (through bankruptcy, acquisition, or loss of index eligibility) are not fully accounted for, which tends to overstate backtested returns.
Past performance does not guarantee future results. Market regimes change. The factors that drove alpha in 2021–2026 may not persist in future periods. Correlation structures shift, monetary policy evolves, and new market dynamics emerge that historical models cannot anticipate.

What These Results Mean

The backtest results suggest — but do not prove — that the AlphaStocks composite formula identifies genuinely useful investment signals. Specifically:

The combination of quality, value, momentum, and timing factors appears to produce returns above the benchmark across multiple market conditions. The formula outperformed in the post-COVID rally, the 2022 bear market, and the 2024–2025 AI-driven bull run. It underperformed in sideways periods.

The value-trap killer (Timing = min(Value, Momentum)) appears to reduce drawdowns. By requiring both cheapness and rising prices, the formula avoids the classic trap of buying stocks that are cheap and getting cheaper. This likely explains the substantially lower maximum drawdown.

The out-of-sample results are weaker than in-sample, as expected, but remain meaningfully positive. A strategy that maintains alpha on unseen data, with improving risk-adjusted returns, is a stronger signal than spectacular in-sample numbers that collapse forward.

These results do not mean the formula will continue to outperform. They mean it has demonstrated the ability to do so in a controlled, honestly disclosed test.

How to Use Backtest Results in Your Research

Backtest results are a starting point, not a conclusion. Here is how to incorporate them into a sound investment process:

Use scores as a screening tool, not a decision tool. The stock screener and rankings help you narrow 500+ stocks to a manageable shortlist. The final investment decision should incorporate your own research, risk tolerance, and financial situation.
Understand why a stock scores well, not just that it does. A composite score of 8.5 is meaningless without understanding which axes are driving it. A stock scoring 8.5 because of strong quality and momentum is a very different investment than one scoring 8.5 because of extreme cheapness.
Do not ignore the limitations. If you are investing during a sideways market, the formula's momentum-heavy weighting may produce weaker signals. Be aware of when the strategy's known weaknesses are most likely to manifest.
Diversify beyond the formula. Even a backtest-validated strategy can underperform for extended periods. Do not allocate 100% of your portfolio to any single approach.
Consult a qualified financial adviser. AlphaStocks provides research tools, not personalized investment advice. Your individual circumstances — tax situation, time horizon, risk tolerance, income needs — should drive your investment decisions.

Frequently Asked Questions

What is walk-forward validation and why is it important?

Walk-forward validation is a backtesting method where a model is designed using one time period (in-sample) and then tested on a separate, future time period (out-of-sample) that was never seen during development. It is the gold standard for evaluating quantitative strategies because it reveals whether a model genuinely captures market patterns or merely overfits to historical noise. Without walk-forward validation, backtested results are essentially meaningless — any model can be made to look good on data it was designed to explain.

Is the AlphaStocks backtest overfit to historical data?

The strongest evidence against overfitting is the out-of-sample performance. The Sharpe ratio improved from 0.61 in-sample to 1.12 out-of-sample, which is the opposite of what you see with overfit strategies. Additionally, the formula uses only five well-documented investment models with fixed weights, rather than hundreds of data-mined parameters. The formula was intentionally kept simple to reduce overfitting risk. That said, no backtest can definitively prove a strategy is not overfit — only future live performance can do that.

Does the scoring formula always outperform the S&P 500?

No. The formula underperforms in sideways, range-bound markets. During such periods in the 2024–2026 test window, the top-30 portfolio returned -3.2% while the S&P 500 returned +3.0%. The strategy works best in trending markets — both up and down — where quality and momentum factors can express themselves. No strategy outperforms in all market conditions, and anyone claiming otherwise should be treated with skepticism.

Can past backtest results predict future performance?

No. Past performance, including walk-forward validated results, does not guarantee future returns. Market conditions change, correlations shift, and strategies that worked historically may not work in the future. What walk-forward validation does provide is evidence that a strategy captured real market patterns rather than noise. That evidence is useful context for research, but it is not a prediction.

Does the backtest include transaction costs and taxes?

No. The backtest is hypothetical and does not include transaction costs, taxes, slippage, or market impact. Real-world returns would be lower. Monthly rebalancing of 30 stocks generates meaningful trading costs that are not reflected in the reported numbers. For tax-advantaged accounts (like IRAs), the tax impact is reduced. For taxable accounts, the monthly turnover would generate short-term capital gains taxed at ordinary income rates.


Important Disclaimer: All backtest results presented on this page are hypothetical and do not represent actual trading or real portfolio returns. The AlphaStocks scoring formula was backtested using historical data and simulated portfolio construction. Results do not include transaction costs, commissions, taxes, slippage, or market impact. Actual results would differ, likely unfavorably. Monthly rebalancing was assumed; actual rebalancing frequency and timing would affect results.

Past performance, whether actual or backtested, does not guarantee future results. Investment in securities involves risk, including the possible loss of principal. AlphaStocks provides algorithm-generated research tools, not personalized investment advice. Always conduct your own due diligence and consult a qualified financial adviser before making investment decisions.

Model names reference the published investment methodologies of their respective authors. AlphaStocks is not affiliated with, endorsed by, or sponsored by Warren Buffett, Berkshire Hathaway, Peter Lynch, Fidelity Investments, Joel Greenblatt, Gotham Asset Management, or the estates of Benjamin Graham or Joseph Piotroski. Data sourced from SEC EDGAR filings and Alpaca Markets.
