fix(evaluator_storage): correct docstring on directory organization
AgentScope Version
commit: 5c3a7705c3a922e8a41c88f91666c870737f9075
I am using the latest code in the main branch.
Description
fix(evaluator_storage): correct save path ordering in FileEvaluatorStorage
In the docstring, the directory structure is described as:

The files are organized in a directory structure:

- save_dir/
    - evaluation_result.json
    - evaluation_meta.json
    - {task_id}/
        - {repeat_id}/
            - solution.json
            - evaluation/
                - {metric_name}.json
But the implementation doesn't follow this structure.
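For clarity, the documented layout can be expressed as path builders. This is a minimal sketch, assuming hypothetical helper names (`solution_path`, `metric_path`); the actual `FileEvaluatorStorage` methods and signatures may differ.

```python
import os


def solution_path(save_dir: str, task_id: str, repeat_id: str) -> str:
    # Per the docstring: save_dir/{task_id}/{repeat_id}/solution.json
    return os.path.join(save_dir, task_id, repeat_id, "solution.json")


def metric_path(
    save_dir: str, task_id: str, repeat_id: str, metric_name: str
) -> str:
    # Per the docstring: save_dir/{task_id}/{repeat_id}/evaluation/{metric_name}.json
    return os.path.join(
        save_dir, task_id, repeat_id, "evaluation", f"{metric_name}.json"
    )
```

The reported issue is that the actual implementation swaps the order of the `{task_id}` and `{repeat_id}` components relative to this documented layout.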
Checklist
Please check the following items before code is ready to be reviewed.
- [x] Code has been formatted with the `pre-commit run --all-files` command
- [x] All tests are passing
- [x] Docstrings are in Google style
- [x] Related documentation has been updated (e.g. links, examples, etc.)
- [x] Code is ready for review
@qbc2016 @DavdGao Please take a review; this is a minor change.
@kristol07 Thanks for pointing out the issue, but it seems to be a typo in the docstring rather than an error in the code implementation. Since we are developing evaluation visualization in agentscope-studio based on the current directory organization, maybe we should just fix the wrong description in the docstring instead?
@DavdGao I think the best approach depends on how you want to interpret or evaluate the results. In my situation, there are multiple distinct testing scenarios and I want to assess my agent's stability in each one, so I'm more interested in the outcomes of each repeated task within the same scenario. Grouping the results by task ID is therefore preferable in my case, which is why I thought it was a code error. On the other hand, if all the testing scenarios are of the same type, it makes more sense to group by repeat ID and review the overall results across all test scenarios; that may be the case for agentscope-studio.
Grouped by task (test case):
Grouped by repeatId:
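The two groupings discussed above can be sketched as follows. The records and field names here are made up for illustration; the real storage writes JSON files to disk rather than grouping in-memory dicts.

```python
from collections import defaultdict

# Hypothetical evaluation records (illustration only).
results = [
    {"task_id": "task_a", "repeat_id": "0", "passed": True},
    {"task_id": "task_a", "repeat_id": "1", "passed": False},
    {"task_id": "task_b", "repeat_id": "0", "passed": True},
]


def group_by(records: list[dict], key: str) -> dict[str, list[dict]]:
    """Group records by the given field."""
    grouped: defaultdict[str, list[dict]] = defaultdict(list)
    for record in records:
        grouped[record[key]].append(record)
    return dict(grouped)


# Grouped by task: inspect an agent's stability within each scenario.
by_task = group_by(results, "task_id")

# Grouped by repeat: inspect the overall outcome of each full run.
by_repeat = group_by(results, "repeat_id")
```

The directory ordering (`{task_id}/{repeat_id}` vs. `{repeat_id}/{task_id}`) simply bakes one of these two groupings into the filesystem layout.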
Hi @DavdGao, do you have any suggestions on how much flexibility should be provided to developers here? As for your comment, the PR has already been updated.

