
Agent regression tests

Open akiradev0x opened this issue 1 year ago • 2 comments

In order to make it easy for people to contribute, we need a good system for checking regressions.

For now, I propose a simple file-editing task in which the model has to search for, create, and edit files. We would run it against each supported model, for example on every PR. I'm not sure yet how often it should run.
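A minimal sketch of what such a check could look like, assuming the agent can be called as a function on a working directory. Everything here is hypothetical: `run_agent_stub` stands in for a real Devon run, and the fixture files, the task wording, and the `summary.md` output are made up for illustration.

```python
import tempfile
from pathlib import Path

# Hypothetical task text; a real test would pass this to the agent.
TASK = "Find the file containing TODO, change it to DONE, and write summary.md"


def setup_fixture(workdir: Path) -> None:
    """Create a small project the agent has to search through."""
    (workdir / "notes").mkdir()
    (workdir / "notes" / "a.txt").write_text("nothing here\n")
    (workdir / "notes" / "b.txt").write_text("status: TODO\n")


def run_agent_stub(workdir: Path, task: str) -> None:
    """Stand-in for an agent run (assumption: not Devon's real entry point)."""
    # Search: locate the file that contains the marker.
    target = next(
        p for p in sorted(workdir.rglob("*.txt")) if "TODO" in p.read_text()
    )
    # Edit: replace the marker in place.
    target.write_text(target.read_text().replace("TODO", "DONE"))
    # Create: write a new file describing the change.
    (workdir / "summary.md").write_text(f"edited {target.name}\n")


def check_result(workdir: Path) -> list:
    """Return a list of failure messages; empty means the task passed."""
    failures = []
    if (workdir / "notes" / "b.txt").read_text() != "status: DONE\n":
        failures.append("b.txt was not edited")
    if (workdir / "notes" / "a.txt").read_text() != "nothing here\n":
        failures.append("a.txt was modified unexpectedly")
    if not (workdir / "summary.md").exists():
        failures.append("summary.md was not created")
    return failures


def run_regression(agent=run_agent_stub) -> list:
    """Build the fixture in a temp dir, run the agent, and check the outcome."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        setup_fixture(workdir)
        agent(workdir, TASK)
        return check_result(workdir)
```

Checking the final state of the files rather than the agent's intermediate steps keeps the test independent of which model is used, so the same fixture can run once per supported model.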

akiradev0x avatar May 20 '24 19:05 akiradev0x

As of now, Devon can't be run in headless mode, or am I wrong here? So wouldn't a requirement for this issue be a headless mode for Devon?

Idea: spin off this issue into a "Regression Testing Spec document" where we can draft a TESTING.md file with a step-by-step guide following your file editing task (search, create, edit files).

amphetamarina avatar May 23 '24 16:05 amphetamarina

Really good idea, let's get a headless mode working. This shouldn't be too hard: basically we just remove the user tools and force the task to be one set via the CLI in __main__.py
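Roughly, the idea could be sketched like this. The flag names `--headless` and `--task`, the tool names, and the `configure` helper are all assumptions for illustration, not Devon's actual CLI.

```python
import argparse

# Hypothetical tool names; "ask_user" stands for any tool that needs a human.
ALL_TOOLS = ["search_files", "create_file", "edit_file", "ask_user"]
USER_TOOLS = {"ask_user"}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="devon")
    parser.add_argument("--headless", action="store_true",
                        help="run without any user interaction")
    parser.add_argument("--task", default=None,
                        help="task to run; required in headless mode")
    return parser


def configure(argv: list) -> dict:
    """Parse CLI arguments and decide which tools and task the agent gets."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.headless and not args.task:
        # parser.error prints usage and exits with status 2.
        parser.error("--headless requires --task")
    tools = [t for t in ALL_TOOLS if not (args.headless and t in USER_TOOLS)]
    return {"headless": args.headless, "task": args.task, "tools": tools}
```

For example, `configure(["--headless", "--task", "edit a file"])` would return a configuration whose tool list omits `ask_user`, while a call without `--headless` keeps every tool.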

akiradev0x avatar May 27 '24 07:05 akiradev0x

Just added headless mode.

akiradev0x avatar May 27 '24 22:05 akiradev0x

Closing this issue and creating another for step-by-step testing workflows.

akiradev0x avatar May 27 '24 22:05 akiradev0x