Backtest results stay the same regardless of model used
🐛 Bug Report
I noticed that when I switch between different models (e.g., LGBModel, XGBModel, DNNModel), the backtest results remain exactly the same — including cumulative return, daily return, and other portfolio metrics.
📄 Code Snippet
Here is the simplified version of my model training code and backtesting code:
model training code:
Each time I train, I only modify the task["model"] param and the experiment_name.
data_handler_config = {
    "freq": "day",  # 1min
    "start_time": "2020-01-01",
    "end_time": "2025-03-31",
    "fit_start_time": "2020-01-01",
    "fit_end_time": "2025-03-30",
    "instruments": ["BTC", "ETH", "BNB", "NEO", "ONG"],
}
lgb_model = {
    "class": "LGBModel",
    "module_path": "qlib.contrib.model.gbdt",
    "kwargs": {
        "loss": "mse",
        "colsample_bytree": 0.8879,
        "learning_rate": 0.0421,
        "subsample": 0.8789,
        "lambda_l1": 205.6999,
        "lambda_l2": 580.9768,
        "max_depth": 8,
        "num_leaves": 210,
        "num_threads": 20,
    },
}
d_ensemble_model = {
    "class": "DEnsembleModel",
    "module_path": "qlib.contrib.model.double_ensemble",
    "kwargs": {
        "model_class": "LGBModel",
        "n_estimators": 5,
        "decay": 0.9,
    },
}
xgb_model = {
    "class": "XGBModel",
    "module_path": "qlib.contrib.model.xgboost",
}
task = {
    "model": xgb_model,  # this is the 1st param I changed when training
    "dataset": {
        "class": "DatasetH",
        "module_path": "qlib.data.dataset",
        "kwargs": {
            "handler": {
                "class": "Alpha158",
                "module_path": "qlib.contrib.data.handler",
                "kwargs": data_handler_config,
            },
            "segments": {
                "train": ("2020-01-01", "2023-12-31"),
                "valid": ("2024-01-01", "2024-12-31"),
                "test": ("2025-01-01", "2025-03-30"),
            },
        },
    },
}

# model initialization
model = init_instance_by_config(task["model"])
dataset = init_instance_by_config(task["dataset"])

# start exp to train model, this is the 2nd param I changed when training
with R.start(experiment_name="xgb_exp_3"):
    R.log_params(**flatten_dict(task))
    model.fit(dataset)
    R.save_objects(trained_model=model)
    R.save_objects(trained_dataset=dataset)
    rid = R.get_recorder().id
    print(f"rid of xgb_exp_3 = {rid}")
backtesting code:
It is always the same, except for the experiment_name param.
port_analysis_config = {
    "executor": {
        "class": "SimulatorExecutor",
        "module_path": "qlib.backtest.executor",
        "kwargs": {
            "time_per_step": "day",
            "generate_portfolio_metrics": True,
        },
    },
    "strategy": {
        "class": "TopkDropoutStrategy",
        "module_path": "qlib.contrib.strategy.signal_strategy",
        "kwargs": {
            "model": model,
            "dataset": dataset,
            "topk": 50,
            "n_drop": 5,
        },
    },
    "backtest": {
        "start_time": "2025-01-01",
        "end_time": "2025-03-30",
        "account": 100000000,
        "benchmark": benchmark,
        "exchange_kwargs": {
            "limit_threshold": 0.99,
            "deal_price": "close",
            "open_cost": 0.0005,
            "close_cost": 0.0005,
            "min_cost": 1,
        },
    },
}

with R.start(experiment_name="xgb_backtest"):
    recorder = R.get_recorder(recorder_id=rid, experiment_name="xgb_exp")
    model = recorder.load_object("trained_model")
    recorder = R.get_recorder()
    sr = SignalRecord(model, dataset, recorder)
    sr.generate()
    par = PortAnaRecord(recorder, port_analysis_config, "day")
    par.generate()
This is the result I get every time:
'The following are analysis results of benchmark return(1day).'
risk
mean -0.001070
std 0.026967
annualized_return -0.254626
information_ratio -0.612033
max_drawdown -0.280937
'The following are analysis results of the excess return without cost(1day).'
risk
mean -0.000667
std 0.022932
annualized_return -0.158777
information_ratio -0.448802
max_drawdown -0.148111
'The following are analysis results of the excess return with cost(1day).'
risk
mean -0.001136
std 0.022926
annualized_return -0.270377
information_ratio -0.764456
max_drawdown -0.164187
'The following are analysis results of indicators(1day).'
value
ffr 1.0
pa 0.0
pos 0.0
Expected Behavior
I expect backtest results to vary when I change the underlying model.
Environment
- Qlib version: 0.9.6
- Python version: 3.8.10
- OS (Windows, Linux, MacOS): Windows 10
- Commit number (optional, please provide it if you are using the dev version):
Additional Notes
@kyleluc Is it open? If it is, can I solve it?
Sure. If there are any details I haven't explained clearly, or if you need me to provide additional information, please feel free to let me know.
Qlib’s Recorder system (R.get_recorder and recorder.load_object("trained_model")) saves and loads models using experiment names and recorder IDs. In the original code, the same experiment name (xgb_exp_3) was used for all models, causing Qlib to overwrite or reuse the same model file. This means the backtest always uses the same model (e.g., the last one saved), regardless of the intended model.
The experiment_name="xgb_exp_3" is reused for all models (LGBModel, XGBModel, DEnsembleModel). Qlib’s Recorder saves models to a directory based on the experiment name (e.g., ~/.qlib/qlib_data/experiments/xgb_exp_3). Reusing the name overwrites the previous model file, so all backtests load the same model.
I think this might be the issue @kyleluc
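To illustrate the pattern being suggested, here is a minimal sketch (it assumes model and dataset are already initialized as in the snippet above; the experiment name lgb_exp and the trained_model key are placeholders for each run's own values, not the original code):

from qlib.workflow import R

# Minimal sketch: one unique experiment name per model, and the run's recorder
# id is kept so the backtest can load exactly that model later.
with R.start(experiment_name="lgb_exp"):   # unique name for this model
    model.fit(dataset)
    R.save_objects(trained_model=model)
    lgb_rid = R.get_recorder().id          # remember which run produced the model

# Later, load the artifact of that specific run (not just "the latest" one).
recorder = R.get_recorder(recorder_id=lgb_rid, experiment_name="lgb_exp")
model = recorder.load_object("trained_model")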
I have encountered some version issues while trying to resolve this. In the meantime, please check the above comment and let me know if there is any update!
Sorry, perhaps I didn’t emphasize clearly enough in my original description that I modified parameters each time I conducted an experiment. I did notice the experiment_name parameter. Let me simplify the code I ran and explain step by step what I did and what results I obtained:
- I ran the following model training code (1st training):
data_handler_config = {
    ...
}
lgb_model = {
    ...
}
d_ensemble_model = {
    ...
}
xgb_model = {
    "class": "XGBModel",
    "module_path": "qlib.contrib.model.xgboost"
}
task = {
    "model": xgb_model,  # This is the first parameter I changed
    ...other params...
}
...
# This is the second parameter I changed
with R.start(experiment_name="xgb_exp_3"):
    ...training code...
- I then ran the following code for backtesting (1st backtest):
port_analysis_config = {
    ...,
    "strategy": {
        ...
    },
    "backtest": {
        ...
    },
}
# Here I changed to a new experiment_name each time
with R.start(experiment_name="xgb_backtest3"):
    # When getting the recorder, I would set experiment_name to match the one used in step 1.
    # In this case, it is "xgb_exp_3"
    recorder = R.get_recorder(recorder_id=rid, experiment_name="xgb_exp_3")
    ...
- The backtest in step 2 produced a segment of output data, which I included in my original question. Let's temporarily refer to it as Result A.
- I then tried training a new model and performed another backtest using the new model. Training code (2nd training):
data_handler_config = {
    ...
}
lgb_model = {
    ...
}
d_ensemble_model = {
    ...
}
xgb_model = {
    ...
}
task = {
    "model": lgb_model,  # This is the first parameter I changed — I switched to a different model config
    ...other params...
}
...
# This is the second parameter I changed — I used a new experiment name to save the model
with R.start(experiment_name="lgb_exp"):
    ...training code...
- I used the new model for backtesting, with the following code (2nd backtest):
port_analysis_config = {
    ...,
    "strategy": {
        ...
    },
    "backtest": {
        ...
    },
}
# The experiment_name for backtesting was also changed
with R.start(experiment_name="lgb_backtest"):
    # The experiment_name used in training was "lgb_exp", so I updated it here to match.
    # All other code remains the same
    recorder = R.get_recorder(recorder_id=rid, experiment_name="lgb_exp")
    ...
- I got the same Result A as in the first backtest.
@Akashpaila There’s one additional piece of information that might be useful — the core code I used for backtesting is as follows:
# The experiment_name parameter is changed each time
with R.start(experiment_name="xgb_backtest_3"):
    # The experiment_name parameter used to get the recorder is also changed each time
    recorder = R.get_recorder(recorder_id=rid, experiment_name="xgb_exp_3")
    print(f"recorder(rid={rid}) loaded, its experiment id = {recorder.experiment_id}")
    model = recorder.load_object("trained_model")
    print(f"model of {rid} loaded: {model}")

    # prediction
    recorder = R.get_recorder()
    ba_rid = recorder.id
    sr = SignalRecord(model, dataset, recorder)
    # This is the first key point — I used the loaded model to make predictions and got prediction results
    sr.generate()

    port_analysis_config["strategy"]["kwargs"]["model"] = model
    port_analysis_config["strategy"]["kwargs"]["dataset"] = dataset

    # backtest & analysis
    par = PortAnaRecord(recorder, port_analysis_config, "day")
    # This is the second key point — I used the loaded model to run backtesting and got backtest results
    par.generate()
In the code above, I used the loaded model to perform both prediction and backtesting.
Each time I changed the trained model, the prediction results varied — different models produced completely different prediction results.
However, the backtest results did not behave the same way. Regardless of which model was used, the backtest results always remained the same.
So, could the problem lie in the backtesting code — specifically, this part:
port_analysis_config["strategy"]["kwargs"]["model"] = model
port_analysis_config["strategy"]["kwargs"]["dataset"] = dataset
# backtest & analysis
# Could there be an issue with how the PortAnaRecord object is constructed here?
par = PortAnaRecord(recorder, port_analysis_config, "day")
par.generate()
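One way to narrow this down is to compare the saved artifacts of two backtest runs directly. The following is only a sketch: xgb_bt_rid and lgb_bt_rid are placeholders for the real recorder ids, and the artifact paths (pred.pkl, portfolio_analysis/report_normal_1day.pkl) are the defaults written by SignalRecord and PortAnaRecord in recent Qlib versions, so they may need adjusting for other versions.

from qlib.workflow import R

# Placeholder ids/names for the two backtest runs described above.
xgb_bt = R.get_recorder(recorder_id=xgb_bt_rid, experiment_name="xgb_backtest_3")
lgb_bt = R.get_recorder(recorder_id=lgb_bt_rid, experiment_name="lgb_backtest")

# The signals should differ between models ...
same_pred = xgb_bt.load_object("pred.pkl").equals(lgb_bt.load_object("pred.pkl"))
print("predictions identical:", same_pred)

# ... while the portfolio report is what reportedly stays the same.
xgb_report = xgb_bt.load_object("portfolio_analysis/report_normal_1day.pkl")
lgb_report = lgb_bt.load_object("portfolio_analysis/report_normal_1day.pkl")
print("reports identical:", xgb_report.equals(lgb_report))

If the predictions differ but the reports are identical, the divergence is lost somewhere between the signal and the portfolio construction.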
After sr.generate(), confirm that the predictions are saved correctly and are model-specific. Also, explicitly set the signal file in port_analysis_config: many Qlib strategies (e.g., TopkDropoutStrategy) rely on a signal file rather than directly calling model.predict(), so explicitly setting the signal ensures the strategy uses the new predictions.
Training
with R.start(experiment_name="test_exp"):
    model = XGBModel(max_depth=3)  # Simple model
    dataset = ...  # Your dataset
    model.fit(dataset)  # fit the model before generating signals
    recorder = R.get_recorder()
    sr = SignalRecord(model, dataset, recorder)
    sr.generate()
    rid = recorder.id  # Save recorder ID
Backtesting
with R.start(experiment_name="xgb_backtest_3"):
    # Ensure recorder matches the training run
    recorder = R.get_recorder(recorder_id=rid, experiment_name="xgb_exp_3")
    print(f"Recorder loaded: experiment_id={recorder.experiment_id}, objects={recorder.list_objects()}")

    # Load and verify model
    model = recorder.load_object("trained_model")
    print(f"Model loaded: {model}")

    # Generate predictions
    sr = SignalRecord(model, dataset, recorder)
    sr.generate()
    predictions = recorder.load_object("predictions.pkl")
    print(f"Predictions: {predictions.head()}")  # Verify predictions

    # Update port_analysis_config with correct signal
    port_analysis_config["strategy"]["kwargs"]["model"] = model
    port_analysis_config["strategy"]["kwargs"]["dataset"] = dataset
    port_analysis_config["strategy"]["kwargs"]["signal"] = "predictions.pkl"  # Explicitly set signal

    # Verify strategy signals
    strategy = port_analysis_config["strategy"]["class"](**port_analysis_config["strategy"]["kwargs"])
    print(f"Strategy signals: {strategy.get_signal().head()}")  # Debug signals

    # Run backtest
    par = PortAnaRecord(recorder, port_analysis_config, "day")
    par.generate()

    # Verify backtest output
    positions = recorder.load_object("positions.pkl")  # Adjust filename as needed
    print(f"Backtest positions: {positions.head()}")
The positions printed from different models are always the same, regardless of whether this line of code is used to explicitly set the signal:
port_analysis_config["strategy"]["kwargs"]["signal"] = "predictions.pkl"
@Akashpaila Hi, I've found the real cause of the issue. If you're still interested, here are the conclusions and the reasons behind this phenomenon:
This is not a bug. The key lies in three parameters:
data_handler_config["instruments"] used during training, and
port_analysis_config["strategy"]["kwargs"]["topk"] and n_drop used during backtesting.
Since the strategy used is TopkDropoutStrategy, and I specified only 5 instruments in the training via data_handler_config["instruments"], based on the buy-sell logic of this strategy and my parameter settings (topk=50, n_drop=5), it ends up selling all 5 holdings and then buying all 5 available instruments every single day.
As a result, regardless of what prediction scores the model generates, the behavior of the strategy is always the same—repeating the same buy/sell cycle—so the final backtest results remain unchanged.
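For anyone who runs into the same thing, here is a minimal sketch of the kind of adjustment that makes the ranking matter again with such a small universe (the exact values below are illustrative, not a recommendation):

# With only 5 tradable instruments, topk must be smaller than the universe and
# n_drop smaller than topk; otherwise every instrument is bought and sold every
# day and the model's scores never influence the resulting portfolio.
port_analysis_config["strategy"]["kwargs"]["topk"] = 3    # hold the 3 best-ranked instruments
port_analysis_config["strategy"]["kwargs"]["n_drop"] = 1  # rotate at most 1 holding per day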
I’ll go ahead and close this issue. Thank you for your attention and efforts on this matter.
@kyleluc Yeah Got it now. Thanks for that !!