Wenting Zhao
Hi, I have applied a customized dataset in the few-shot setting: the training set contains 500 examples, the validation set 1k, and the test set 10k. Then, the result on F1,...
Hi, I'm wondering if there is a config file for training on the synthetic dataset? Thank you.
I'm wondering if it is possible to release the processed data? I ran into bugs while processing the data according to the instructions. If it's not possible, could you provide an...
Hey, can you share a small portion of the data, along with the dependency parsing code and the deduplication script? That would be of great value! Thank you!
Hey, congrats on the great work! I wonder if you can share the dependency parsing code and the deduplication script? Thank you!
Hi, I'm new to OpenHands and trying to evaluate SWE-bench using the following command:

```
./evaluation/benchmarks/swe_bench/scripts/run_infer.sh llm.gpt-4o-mini HEAD CodeActAgent 1 1 1 princeton-nlp/SWE-bench_Verified test
```

However, I ran into a...
## Summary

This issue reports test failures in the SWE-bench Verified dataset due to outdated package dependencies and external service failures. The affected instances fail gold patch validation not due to...
### Reference Issue

- Fixes [#484](https://github.com/SWE-bench/SWE-bench/issues/484) for psf_requests_* gold patch failure on test cases

#### What does this implement/fix?

This PR adds a simple retry mechanism to the **PSF Requests**...
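For readers unfamiliar with the idea, a simple retry mechanism of the kind mentioned above might look roughly like the sketch below. This is not the code from the PR: the helper name `call_with_retries`, its parameters, and the `flaky` demo function are all hypothetical and only illustrate the general pattern of re-running a call that fails for transient reasons, such as an external service being briefly unavailable.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    attempts: int = 3,
    delay_seconds: float = 1.0,
    exceptions: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Call func() up to `attempts` times, sleeping between failed tries.

    Hypothetical helper for illustration only; it is not part of SWE-bench.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            # Re-raise the original error once the last attempt has failed.
            if attempt == attempts:
                raise
            time.sleep(delay_seconds)
    # Unreachable: the loop either returns or re-raises.
    raise RuntimeError("retry loop exited unexpectedly")


# Demo: a stand-in for a flaky network call that fails twice, then succeeds.
calls = {"count": 0}


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated transient failure")
    return "ok"


result = call_with_retries(flaky, attempts=3, delay_seconds=0.0)
print(result, "after", calls["count"], "calls")  # prints "ok after 3 calls"
```

Restricting `exceptions` to network-related errors (for example `ConnectionError`) keeps genuine test failures from being retried and masked.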