Wenting Zhao issues

Results: 8 issues of Wenting Zhao

same result on F1, Accuracy, precision

Hi, I have applied a customized dataset in the few-shot setting: the training set contains 500 examples, the validation set 1k, and the test set 10k. Then, the result on F1,...
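For readers who hit a similar symptom: one well-known situation in which F1, accuracy, and precision coincide is micro-averaging on a single-label multiclass task. The issue text above is truncated, so this is not necessarily the cause reported there. The sketch below is a plain-Python illustration with a hypothetical helper name (`micro_scores`) and made-up labels; it is not taken from the repository in question.

```python
def micro_scores(y_true, y_pred):
    """Compute accuracy and micro-averaged precision/recall/F1.

    Hypothetical helper for illustration only. Assumes each example
    has exactly one true label and one predicted label.
    """
    labels = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    # Pool true positives, false positives, and false negatives over all labels.
    for label in labels:
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1
            elif p == label and t != label:
                fp += 1
            elif p != label and t == label:
                fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels: 4 of 6 predictions are correct.
scores = micro_scores([0, 1, 2, 2, 1, 0], [0, 2, 2, 1, 1, 0])
print(scores)
```

Each wrong prediction counts once as a false positive (for the predicted label) and once as a false negative (for the true label), so the pooled precision and recall both reduce to the fraction of correct predictions. Macro-averaging, which averages per-class scores instead, generally gives values that differ from accuracy.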

Config file for Synthetic Datasets

Hi, I'm wondering if there is a config file for training the synthetic dataset? Thank you.

example file of processed data

I'm wondering if it is possible to release the processed data? I ran into bugs while processing the data according to the instructions. If it's not possible, could you provide an...

dependency parsing code and deduplication script

Hey, can you share a small portion of the data, along with the dependency parsing code and deduplication script? That would be of great value! Thank you!

dependency parsing code, deduplication script

Hey, congrats on the great work! I wonder if you can share the dependency parsing code and deduplication script? Thank you!

Image Build Fails When Running SWE-bench Evaluation with OpenHands

Hi, I'm new to OpenHands and trying to evaluate SWE-bench using the following command:

```
./evaluation/benchmarks/swe_bench/scripts/run_infer.sh llm.gpt-4o-mini HEAD CodeActAgent 1 1 1 princeton-nlp/SWE-bench_Verified test
```

However, I ran into a...

[Dataset & Code Fix] Resolve gold patch validation failures caused by environment and dependency mismatches (Astropy & PSF Requests)

## Summary

This issue reports test failures in the SWE-bench Verified dataset due to outdated package dependencies and external service failures. The affected instances fail gold patch validation not due to...

[Bug Fix] Add retry mechanism in PSF Requests test to prevent false negatives

### Reference Issue

- Fixes [#484](https://github.com/SWE-bench/SWE-bench/issues/484) for psf_requests_* gold patch failure on test cases

#### What does this implement/fix?

This PR adds a simple retry mechanism to the **PSF Requests**...
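As a rough illustration of what a simple retry mechanism can look like, here is a minimal Python sketch. The description above is truncated, so this is not the PR's actual code: the helper name `call_with_retry` and its parameters (`attempts`, `delay`, `exceptions`) are assumptions chosen for the example.

```python
import time


def call_with_retry(func, attempts=3, delay=1.0, exceptions=(Exception,)):
    """Call func() and retry on failure (hypothetical helper, not the PR's code).

    attempts:   maximum number of calls before giving up
    delay:      seconds to wait between attempts
    exceptions: exception types that trigger a retry
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions as err:
            last_error = err
            # Wait before the next attempt, but not after the final one.
            if attempt < attempts:
                time.sleep(delay)
    # Every attempt failed: re-raise the last error so the failure stays visible.
    raise last_error


# Usage with a stand-in for a flaky external call: fails twice, then succeeds.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated transient failure")
    return "ok"


result = call_with_retry(flaky, attempts=3, delay=0.0, exceptions=(ConnectionError,))
print(result, calls["count"])
```

Restricting `exceptions` to transient error types (here `ConnectionError`) keeps genuine test failures from being retried and masked, which matters when the goal is only to avoid false negatives from an unreliable external service.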