RD-Agent: The initial implementation of the code was incorrect, but RD-Agent failed to detect it.

During the first loop, the implementation of SMA10 is wrong. The wrong code is: df['daily_pct_change'] = df['$close'].pct_change(), which leads all the following loops in the wrong direction. Can I fix this issue through some config or prompts?

Jan 13 '25 15:01 liujianliuku

It's supposed to be df['daily_pct_change'] = df['$close'].groupby(level='instrument').pct_change()

Jan 13 '25 15:01 liujianliuku

The prompts are intended to be general, so they do not target specific factor code. However, if you do wish to provide specific code hints, this can be achieved. You can try editing the prompts in rdagent/components/coder, although this is not recommended.

Jan 14 '25 10:01 TPLin22

Do you have a better idea for this issue?

Jan 14 '25 15:01 liujianliu

Hi @liujianliu @liujianliuku! Thank you for bringing this important issue to our attention and for your interest in improving RD-Agent!

You are correct that there is a bug in the implementation of SMA10 in the first loop, where the code df['daily_pct_change'] = df['$close'].pct_change() does not handle the two-level index in the data correctly. The correct implementation should be df['daily_pct_change'] = df['$close'].groupby(level='instrument').pct_change() as you pointed out.
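To make the difference concrete, here is a minimal sketch on a toy frame. It assumes the index levels are named datetime and instrument and that rows are sorted by date first; the prices are made up for illustration and are not taken from the RD-Agent data.

```python
import pandas as pd

# Toy frame with an assumed (datetime, instrument) two-level index.
idx = pd.MultiIndex.from_product(
    [pd.to_datetime(["2025-01-01", "2025-01-02"]), ["A", "B"]],
    names=["datetime", "instrument"],
)
df = pd.DataFrame({"$close": [10.0, 100.0, 11.0, 110.0]}, index=idx)

# Wrong: each row is compared with the previous row in the frame,
# which belongs to a different instrument.
wrong = df["$close"].pct_change()

# Right: the change is computed within each instrument's own series.
right = df["$close"].groupby(level="instrument").pct_change()

print(pd.DataFrame({"wrong": wrong, "right": right}))
```

In this toy data both instruments rise by 10% on the second day, which only the grouped version reports; the ungrouped version mixes prices of A and B.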

To address this issue, a good approach would be to add a new evaluation step in the evaluator. The problem arises because the LLM fails to understand the two-level index in the data. To mitigate this, we can randomly select half of the instruments and feed only those into the code to generate the factors. The evaluator will then compare the factor values for the selected instruments with the values for the same instruments calculated from the whole dataset. The evaluator will only give a pass signal if every value matches.

Also, this evaluator can randomly select dates in the dataset in the same way, which also tests for data leakage. 😄
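A rough sketch of what such a check could look like is below. This is not existing RD-Agent code: the function names, the factor_fn callable (takes the frame, returns a Series with the same index), and the datetime / instrument level names are all assumptions for illustration. For the date check, the sketch truncates the data at a cutoff date rather than sampling arbitrary dates, which is a simplification so that rolling factors are not broken by gaps.

```python
import numpy as np
import pandas as pd


def _matches(factor_fn, df, mask):
    """Compare factor values from the full run with a run on the masked rows only."""
    full = factor_fn(df)[mask]   # computed on everything, then restricted
    part = factor_fn(df[mask])   # computed on the restricted data only
    if len(full) != len(part):
        return False
    return bool(np.allclose(full.to_numpy(), part.to_numpy(), equal_nan=True))


def instrument_subset_check(factor_fn, df, frac=0.5, seed=0):
    """Pass only if a random subset of instruments gives identical factor values."""
    level = df.index.get_level_values("instrument")
    instruments = level.unique().to_numpy()
    rng = np.random.default_rng(seed)
    n_pick = max(1, int(len(instruments) * frac))
    picked = rng.choice(instruments, size=n_pick, replace=False)
    return _matches(factor_fn, df, level.isin(picked))


def date_truncation_check(factor_fn, df):
    """Pass only if dropping later dates leaves earlier factor values unchanged."""
    level = df.index.get_level_values("datetime")
    dates = level.unique().sort_values()
    cutoff = dates[len(dates) // 2]  # hypothetical choice: the middle date
    return _matches(factor_fn, df, np.asarray(level <= cutoff))
```

With these helpers, a factor built from df['$close'].pct_change() should fail instrument_subset_check, because removing instruments changes which row precedes which, while the groupby(level='instrument') version should pass. A factor that peeks at the next day, for example through shift(-1), should fail date_truncation_check.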

Implementing this new evaluator is not very complex. However, as our team is currently busy working on new features, we encourage you to participate in our open-source repository and draft a pull request (PR) to address this bug. We will do our best to help review and refine the code to ensure the issue is fixed.

Thank you once again for your valuable contribution!

Jan 16 '25 07:01 peteryang1

Thank you for providing the solution. I will try this approach and submit a PR, but I can't guarantee a timeline.

Jan 16 '25 15:01 liujianliu