fix(evaluator_storage): correct docstring on directory organization

Open kristol07 opened this issue 2 months ago • 4 comments

AgentScope Version

commit: 5c3a7705c3a922e8a41c88f91666c870737f9075

I am working against the latest code on the main branch.

Description

fix(evaluator_storage): correct save path ordering in FileEvaluatorStorage

The docstring describes the directory structure as follows:

The files are organized in a directory structure:
    - save_dir/
        - evaluation_result.json
        - evaluation_meta.json
        - {task_id}/
            - {repeat_id}/
                - solution.json
                - evaluation/
                    - {metric_name}.json

But the implementation doesn't follow this structure.
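
For illustration, the two orderings at stake can be sketched as follows. This is a hypothetical helper, not the actual `FileEvaluatorStorage` code: `solution_path` and its `group_by` parameter are names invented here. The task-first layout is the one the docstring shows; the repeat-first layout is an assumption inferred from the discussion below.

```python
from pathlib import Path


def solution_path(save_dir, task_id, repeat_id, group_by="task"):
    """Build the path of solution.json under one of two layouts.

    Hypothetical helper for illustration only; it is not part of AgentScope.
    """
    root = Path(save_dir)
    if group_by == "task":
        # Layout described in the docstring: save_dir/{task_id}/{repeat_id}/
        return root / task_id / repeat_id / "solution.json"
    if group_by == "repeat":
        # Assumed layout of the implementation: save_dir/{repeat_id}/{task_id}/
        return root / repeat_id / task_id / "solution.json"
    raise ValueError(f"Unknown group_by value: {group_by!r}")


print(solution_path("save_dir", "task_1", "0").as_posix())
print(solution_path("save_dir", "task_1", "0", group_by="repeat").as_posix())
```

Only the order of the two middle path components differs, so either the docstring or the path construction has to change for the two to agree.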

image

Checklist

Please check the following items before code is ready to be reviewed.

  • [x] Code has been formatted with pre-commit run --all-files command
  • [x] All tests are passing
  • [x] Docstrings are in Google style
  • [x] Related documentation has been updated (e.g. links, examples, etc.)
  • [x] Code is ready for review

kristol07 avatar Nov 12 '25 10:11 kristol07

@qbc2016 @DavdGao Please take a look; this is a minor change.

kristol07 avatar Nov 12 '25 10:11 kristol07

@kristol07 Thanks for pointing out the issue, but it looks like a typo in the docstring rather than a bug in the implementation. Since we are developing evaluation visualization in agentscope-studio on top of the current directory organization, could you just fix the incorrect description in the docstring instead?

DavdGao avatar Nov 17 '25 04:11 DavdGao

@DavdGao I think the best approach depends on how you want to interpret or evaluate the results. In my situation, there are multiple distinct testing scenarios and I want to assess my agent's stability in each one, so I’m more interested in the outcomes of each repeated run within the same scenario. Grouping the results by task ID is therefore preferable in my case, which is why I thought it was a code error. On the other hand, if all the testing scenarios are of the same type, it makes more sense to group by repeat ID and review the overall results across all test scenarios, which may be the case for agentscope-studio.

Grouped by task (test case): image

Grouped by repeatId: image
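
As a rough sketch of how a task-grouped view could be derived without changing the storage layout, the snippet below scans a results directory and indexes the solution files by task ID. It assumes the files are stored as `save_dir/{repeat_id}/{task_id}/solution.json`; `group_solutions_by_task` is a hypothetical helper and not an AgentScope API.

```python
from collections import defaultdict
from pathlib import Path


def group_solutions_by_task(save_dir):
    """Index solution files by task ID, then by repeat ID.

    Assumes a repeat-first layout: save_dir/{repeat_id}/{task_id}/solution.json.
    Hypothetical helper for illustration only.
    """
    grouped = defaultdict(dict)
    # Match only files exactly two directory levels below save_dir.
    for path in sorted(Path(save_dir).glob("*/*/solution.json")):
        task_id = path.parent.name
        repeat_id = path.parent.parent.name
        grouped[task_id][repeat_id] = path
    return dict(grouped)
```

With an index like this, all repeats of one task can be compared side by side even though the files on disk stay grouped by repeat ID.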

kristol07 avatar Nov 17 '25 04:11 kristol07

Hi @DavdGao, do you have any suggestions on how much flexibility should be offered to developers here? I have already updated the PR per your comment.

kristol07 avatar Nov 20 '25 02:11 kristol07

CLA assistant check
All committers have signed the CLA.

cla-assistant[bot] avatar Dec 02 '25 09:12 cla-assistant[bot]