Results 1 comments of Akhil Appana

Proposal: Use [SWE-bench Verified](https://huggingface.co/datasets/princeton-nlp/SWE-bench_Verified) as the benchmark for evaluating Gemini CLI's performance. Or is there already a different plan in place?
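
For illustration, here is a minimal sketch of how a score could be tallied. It assumes each benchmark instance ends up with a simple resolved/unresolved outcome. The `resolved_rate` helper and the placeholder instance IDs are hypothetical, not part of SWE-bench or Gemini CLI, and the outcomes are made up rather than real results.

```python
from typing import Mapping


def resolved_rate(outcomes: Mapping[str, bool]) -> float:
    """Return the fraction of instances marked resolved (0.0 if there are none)."""
    if not outcomes:
        return 0.0
    # True counts as 1 and False as 0, so the sum is the number resolved.
    return sum(outcomes.values()) / len(outcomes)


# Hypothetical outcomes keyed by placeholder instance IDs (illustration only).
example_outcomes = {
    "placeholder-instance-1": True,
    "placeholder-instance-2": False,
    "placeholder-instance-3": True,
    "placeholder-instance-4": False,
}

print(f"resolved: {resolved_rate(example_outcomes):.1%}")
```

A single resolved percentage like this would make runs easy to compare over time, though how each instance's outcome gets produced is left open here.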