[20964:MainThread](2024-07-26 23:06:46,892) INFO - qlib.ALSTM - [pytorch_alstm.py:245] - train nan, valid nan

Open SweetCone1 opened this issue 1 year ago • 4 comments

I encountered a problem: running LightGBM with Alpha158 from the example files works fine, but both ALSTM and KRNN report "train nan, valid nan", with either Alpha158 or Alpha360. Here is the log:

[20964:MainThread](2024-07-26 23:03:01,094) INFO - qlib.qrun - [cli.py:78] - Render the template with the context: {}
[20964:MainThread](2024-07-26 23:03:01,107) INFO - qlib.Initialization - [config.py:416] - default_conf: client.
[20964:MainThread](2024-07-26 23:03:01,109) INFO - qlib.Initialization - [__init__.py:74] - qlib successfully initialized based on client settings.
[20964:MainThread](2024-07-26 23:03:01,109) INFO - qlib.Initialization - [__init__.py:76] - data_path={'__DEFAULT_FREQ': WindowsPath('C:/quant_data/qlib_bin')}
[20964:MainThread](2024-07-26 23:03:01,111) INFO - qlib.workflow - [exp.py:258] - Experiment 1 starts running ...
[20964:MainThread](2024-07-26 23:03:01,219) INFO - qlib.workflow - [recorder.py:341] - Recorder d73d6db63d0f4230ad2eba04096c6eb0 starts running under Experiment 1 ...
warning: in the working copy of 'examples/workflow_by_code.ipynb', LF will be replaced by CRLF the next time Git touches it
ModuleNotFoundError. XGBModel is skipped(optional: maybe installing xgboost can fix it).
[20964:MainThread](2024-07-26 23:03:02,806) INFO - qlib.ALSTM - [pytorch_alstm.py:59] - ALSTM pytorch version...
[20964:MainThread](2024-07-26 23:03:02,822) INFO - qlib.ALSTM - [pytorch_alstm.py:76] - ALSTM parameters setting:
d_feat : 6
hidden_size : 64
num_layers : 2
dropout : 0.0
n_epochs : 200
lr : 0.001
metric : loss
batch_size : 800
early_stop : 20
optimizer : adam
loss_type : mse
device : cuda:0
use_GPU : True
seed : None
[20964:MainThread](2024-07-26 23:03:02,824) INFO - qlib.ALSTM - [pytorch_alstm.py:119] - model:
ALSTMModel(
  (net): Sequential(
    (fc_in): Linear(in_features=6, out_features=64, bias=True)
    (act): Tanh()
  )
  (rnn): GRU(64, 64, num_layers=2, batch_first=True)
  (fc_out): Linear(in_features=128, out_features=1, bias=True)
  (att_net): Sequential(
    (att_fc_in): Linear(in_features=64, out_features=32, bias=True)
    (att_dropout): Dropout(p=0.0, inplace=False)
    (att_act): Tanh()
    (att_fc_out): Linear(in_features=32, out_features=1, bias=False)
    (att_softmax): Softmax(dim=1)
  )
)
[20964:MainThread](2024-07-26 23:03:02,824) INFO - qlib.ALSTM - [pytorch_alstm.py:120] - model size: 0.0502 MB
[20964:MainThread](2024-07-26 23:06:07,731) INFO - qlib.timer - [log.py:127] - Time cost: 183.120s | Loading data Done
[20964:MainThread](2024-07-26 23:06:27,218) INFO - qlib.timer - [log.py:127] - Time cost: 17.138s | RobustZScoreNorm Done
[20964:MainThread](2024-07-26 23:06:28,487) INFO - qlib.timer - [log.py:127] - Time cost: 1.265s | Fillna Done
[20964:MainThread](2024-07-26 23:06:30,228) INFO - qlib.timer - [log.py:127] - Time cost: 0.522s | DropnaLabel Done
C:\Users\31878\miniconda3\envs\quant\lib\site-packages\qlib\data\dataset\processor.py:363: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[cols] = t
[20964:MainThread](2024-07-26 23:06:30,653) INFO - qlib.timer - [log.py:127] - Time cost: 0.424s | CSRankNorm Done
[20964:MainThread](2024-07-26 23:06:30,788) INFO - qlib.timer - [log.py:127] - Time cost: 23.055s | fit & process data Done
[20964:MainThread](2024-07-26 23:06:30,789) INFO - qlib.timer - [log.py:127] - Time cost: 206.178s | Init data Done
[20964:MainThread](2024-07-26 23:06:30,810) WARNING - qlib.utils - [__init__.py:847] - The parameter reweighter with value None is ignored.
[20964:MainThread](2024-07-26 23:06:32,877) INFO - qlib.ALSTM - [pytorch_alstm.py:235] - training...
[20964:MainThread](2024-07-26 23:06:32,878) INFO - qlib.ALSTM - [pytorch_alstm.py:239] - Epoch0:
[20964:MainThread](2024-07-26 23:06:32,899) INFO - qlib.ALSTM - [pytorch_alstm.py:240] - training...
[20964:MainThread](2024-07-26 23:06:43,309) INFO - qlib.ALSTM - [pytorch_alstm.py:242] - evaluating...
[20964:MainThread](2024-07-26 23:06:46,892) INFO - qlib.ALSTM - [pytorch_alstm.py:245] - train nan, valid nan
[20964:MainThread](2024-07-26 23:06:46,894) INFO - qlib.ALSTM - [pytorch_alstm.py:239] - Epoch1:
[20964:MainThread](2024-07-26 23:06:46,895) INFO - qlib.ALSTM - [pytorch_alstm.py:240] - training...
[20964:MainThread](2024-07-26 23:06:55,218) INFO - qlib.ALSTM - [pytorch_alstm.py:242] - evaluating...
[20964:MainThread](2024-07-26 23:06:59,019) INFO - qlib.ALSTM - [pytorch_alstm.py:245] - train nan, valid nan
[20964:MainThread](2024-07-26 23:06:59,020) INFO - qlib.ALSTM - [pytorch_alstm.py:239] - Epoch2:
[20964:MainThread](2024-07-26 23:06:59,020) INFO - qlib.ALSTM - [pytorch_alstm.py:240] - training...
[20964:MainThread](2024-07-26 23:07:03,095) ERROR - qlib.workflow - [utils.py:41] - An exception has been raised[KeyboardInterrupt: ].
  File "C:\Users\31878\miniconda3\envs\quant\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\31878\miniconda3\envs\quant\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\31878\miniconda3\envs\quant\Scripts\qrun.exe\__main__.py", line 7, in <module>
    sys.exit(run())
  File "C:\Users\31878\miniconda3\envs\quant\lib\site-packages\qlib\workflow\cli.py", line 151, in run
    fire.Fire(workflow)
  File "C:\Users\31878\miniconda3\envs\quant\lib\site-packages\fire\core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "C:\Users\31878\miniconda3\envs\quant\lib\site-packages\fire\core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "C:\Users\31878\miniconda3\envs\quant\lib\site-packages\fire\core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "C:\Users\31878\miniconda3\envs\quant\lib\site-packages\qlib\workflow\cli.py", line 145, in workflow
    recorder = task_train(config.get("task"), experiment_name=experiment_name)
  File "C:\Users\31878\miniconda3\envs\quant\lib\site-packages\qlib\model\trainer.py", line 127, in task_train
    _exe_task(task_config)
  File "C:\Users\31878\miniconda3\envs\quant\lib\site-packages\qlib\model\trainer.py", line 49, in _exe_task
    auto_filter_kwargs(model.fit)(dataset, reweighter=reweighter)
  File "C:\Users\31878\miniconda3\envs\quant\lib\site-packages\qlib\utils\__init__.py", line 850, in _func
    return func(*args, **new_kwargs)
  File "C:\Users\31878\miniconda3\envs\quant\lib\site-packages\qlib\contrib\model\pytorch_alstm.py", line 241, in fit
    self.train_epoch(x_train, y_train)
  File "C:\Users\31878\miniconda3\envs\quant\lib\site-packages\qlib\contrib\model\pytorch_alstm.py", line 169, in train_epoch
    feature = torch.from_numpy(x_train_values[indices[i : i + self.batch_size]]).float().to(self.device)
KeyboardInterrupt:
[20964:MainThread](2024-07-26 23:07:03,104) INFO - qlib.timer - [log.py:127] - Time cost: 0.003s | waiting async_log Done
^C
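
Since the loss is already NaN in epoch 0, one thing that may be worth ruling out is non-finite values reaching the model in the prepared features or labels. The log does not confirm that this is the cause. The following is only a minimal diagnostic sketch: `count_non_finite` is a hypothetical helper, not part of qlib, and it assumes the values are passed in as a flat iterable of numbers (for a NumPy array, something like `arr.ravel()`).

```python
import math


def count_non_finite(values):
    """Count NaN and +/-inf entries in a flat iterable of numbers.

    Hypothetical helper for illustration; not part of qlib.
    """
    nan_count = 0
    inf_count = 0
    for v in values:
        v = float(v)
        if math.isnan(v):
            nan_count += 1
        elif math.isinf(v):
            inf_count += 1
    return nan_count, inf_count


# Toy data standing in for one flattened training batch.
sample = [0.1, float("nan"), 2.0, float("inf"), float("-inf")]
print(count_non_finite(sample))  # (1, 2)
```

If both counts are zero for the features and labels that go into training, the data itself is probably not the source of the NaN, and the environment (library versions) becomes the more likely suspect.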

SweetCone1 avatar Jul 26 '24 15:07 SweetCone1

Hello, I ran into the same problem: the LSTM and Transformer models also produce NaN losses. Have you managed to solve it on your end?

gh8808 avatar Jan 17 '25 08:01 gh8808

Same question here.

xzYue avatar Feb 28 '25 13:02 xzYue

What's your pandas version? I ran into a similar problem with pandas 2.2.3 and solved it by downgrading to pandas 1.5.3.
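
To answer the version question quickly, a small sketch like the one below prints the pandas version installed in the active environment. It uses only the standard library's `importlib.metadata`. The helper names `major_version` and `installed_pandas_version` are made up for illustration, and the 1.x versus 2.x split simply mirrors the two versions mentioned in this comment, not a tested compatibility boundary.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string: str) -> int:
    """Return the leading integer of a version string such as '2.2.3'."""
    return int(version_string.split(".")[0])


def installed_pandas_version():
    """Return the installed pandas version string, or None if pandas is absent."""
    try:
        return version("pandas")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_pandas_version()
    if found is None:
        print("pandas is not installed in this environment")
    elif major_version(found) >= 2:
        # Assumption: only 2.2.3 is reported as problematic in this thread.
        print(f"pandas {found}: 2.x series (NaN losses were reported here with 2.2.3)")
    else:
        print(f"pandas {found}: 1.x series (1.5.3 was reported here as working)")
```

If it reports a 2.x release, pinning the version with `pip install "pandas==1.5.3"` in the same environment is one way to try the downgrade described above.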

DarkLink avatar Mar 25 '25 01:03 DarkLink

Thanks! That solved the problem.

xzYue avatar Apr 17 '25 12:04 xzYue