I ran a simple Asset Class Allocation backtest on ChatGPT. The results are interesting.

Asset Class Allocation Backtest (20th May 2019 – 19th April 2025), annually rebalanced. I chose 20th May 2019 as the starting point because it was just before the election results for the new government were announced.

XIRR vs. Estimated Max Drawdown

  • 80/10/10 (Equity/Debt/Gold) → 11.86% XIRR, 40.1% Drawdown
  • 60/20/20 → 11.70% XIRR, 30.7% Drawdown
  • 55/25/20 → 11.65% XIRR, 28.3% Drawdown
  • 40/30/30 → 11.54% XIRR, 22.1% Drawdown
  • 35/50/15 → 11.43% XIRR, 18.8% Drawdown

Equity - Nifty BEES
Debt - Bharat Bond ETF
Gold - Gold BEES
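For anyone who wants to reproduce numbers like these instead of relying on an LLM, here is a minimal Python sketch of an annually rebalanced three-asset backtest that also tracks maximum drawdown. The function names (`backtest_rebalanced`, `max_drawdown`) are hypothetical, and the price series are made-up toy numbers, not actual Nifty BEES, Bharat Bond ETF or Gold BEES data; real price/NAV history would have to be loaded in their place.

```python
from datetime import date  # not needed here; kept out on purpose
from typing import Dict, List, Sequence


def max_drawdown(values: Sequence[float]) -> float:
    """Largest peak-to-trough fall, as a fraction of the running peak."""
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)
        worst = max(worst, (peak - v) / peak)
    return worst


def backtest_rebalanced(
    prices: Dict[str, List[float]],
    weights: Dict[str, float],
    rebalance_every: int,
    initial: float = 100.0,
) -> List[float]:
    """Portfolio value at each period, resetting holdings to the target
    weights every `rebalance_every` periods (e.g. 12 for annual
    rebalancing on monthly prices)."""
    n = len(next(iter(prices.values())))
    # Buy units at the first available price according to the target weights.
    units = {a: initial * w / prices[a][0] for a, w in weights.items()}
    values = []
    for t in range(n):
        value = sum(units[a] * prices[a][t] for a in weights)
        values.append(value)
        # Rebalancing redistributes the current value; it does not change it.
        if t > 0 and t % rebalance_every == 0:
            units = {a: value * w / prices[a][t] for a, w in weights.items()}
    return values


# Hypothetical toy prices, purely to exercise the code (not real data).
toy_prices = {
    "equity": [100.0, 120.0, 90.0, 130.0],
    "debt": [100.0, 102.0, 104.0, 106.0],
    "gold": [100.0, 95.0, 110.0, 115.0],
}
values = backtest_rebalanced(
    toy_prices, {"equity": 0.6, "debt": 0.2, "gold": 0.2}, rebalance_every=2
)
print(f"final value {values[-1]:.2f}, max drawdown {max_drawdown(values):.1%}")
# final value 123.87, max drawdown 13.1%
```

Note that drawdown measured on monthly or yearly sample points will understate the true peak-to-trough fall seen in daily prices, so the sampling frequency matters when comparing against published figures.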

Returns across the allocations are close to one another (11.43% to 11.86% XIRR), but the drawdowns differ significantly (18.8% to 40.1%).

I ran a simple Asset Class Allocation backtest on Chat GPT.

What do you mean by “ran on ChatGPT”?

(Hopefully not just prompting it to “Run a backtest …” and
accepting plausible-looking output, which is not guaranteed to be factually accurate,
without fact-checking it against actual data sources.)

I simply prompted GPT to do the math with various allocations. The instruments I have selected have been around for more than 15 years.

Hmmm… please do cross-verify the reported results.
LLMs often fail at basic math and logic, let alone calculating XIRR and drawdowns.
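XIRR is straightforward to check independently: it is the annualised rate r at which the net present value of all dated cash flows, each discounted by (1 + r) raised to (days since the first flow / 365), equals zero. Below is a minimal Python sketch that finds that rate by bisection, assuming the usual investor pattern of outflows first and a terminal inflow (so there is a single root). The names `xnpv` and `xirr` are illustrative, not a specific library's API.

```python
from datetime import date
from typing import List, Tuple


def xnpv(rate: float, flows: List[Tuple[date, float]]) -> float:
    """Net present value of dated cash flows on an actual/365 basis."""
    d0 = flows[0][0]
    return sum(cf / (1.0 + rate) ** ((d - d0).days / 365.0) for d, cf in flows)


def xirr(
    flows: List[Tuple[date, float]],
    lo: float = -0.99,
    hi: float = 10.0,
    tol: float = 1e-10,
    max_iter: int = 200,
) -> float:
    """Annualised rate r with xnpv(r) == 0, found by bisection on [lo, hi]."""
    f_lo, f_hi = xnpv(lo, flows), xnpv(hi, flows)
    if f_lo * f_hi > 0:
        raise ValueError("XIRR root not bracketed by [lo, hi]")
    for _ in range(max_iter):
        mid = (lo + hi) / 2.0
        f_mid = xnpv(mid, flows)
        if abs(f_mid) < tol or (hi - lo) / 2.0 < tol:
            return mid
        # Keep the half-interval where the sign change (the root) lies.
        if f_lo * f_mid < 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return (lo + hi) / 2.0


# Lump sum of 100 growing to 110 over exactly one 365-day year.
flows = [(date(2019, 1, 1), -100.0), (date(2020, 1, 1), 110.0)]
print(round(xirr(flows), 6))  # 0.1
```

For a single lump-sum investment, XIRR reduces to the CAGR; for an SIP-style series, each purchase goes in as a negative flow on its date and the final portfolio value as one positive flow on the end date.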

Source: A Categorical Archive of ChatGPT Failures.
(The above paper is a bit dated,
but over the past year or so, nothing relevant to this behaviour
has fundamentally changed in the underlying LLM architecture.)

Thanks for sharing the paper. I wouldn’t put too much time into cross-verifying, as the results are fairly in line with the scheme presentations of most multi-asset mutual funds. My XIRR numbers are a bit on the lower side because I chose a market peak as the starting point. The XIRR changes to some degree when you choose a major market bottom as the starting point (which is rarely the case for most investors).