Fast-LLM
Parallel tests v2
✨ Description
A simplified version of #273, where resources are allocated statically to each worker. This works fine, with some big caveats:
- Multi-GPU tests and spawned processes run at unpredictable times and ignore memory limits, so OOMs are possible with many workers. On my 4-GPU runs the limit is around 20 workers.
- Tests with dependencies are skipped if they don't run in the same worker as their dependencies. I'm planning to fix this with smart scheduling in a follow-up PR. The skipped tests include most multi-GPU tests, which means the 20-worker limit above is likely an overestimate. [Edit: found a really simple solution, working on it.]
- Tests have a hard-coded memory limit of 5 GB (though spawned processes ignore it). All current tests seem OK with this, so it's fine for now.
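For illustration, here is a minimal sketch of what static per-worker allocation could look like under pytest-xdist. The `PYTEST_XDIST_WORKER` environment variable is real pytest-xdist behavior, but the round-robin GPU mapping, the `num_gpus` default, and the `resource`-based 5 GB cap are assumptions for the sketch, not this PR's actual implementation:

```python
import os
import resource  # POSIX-only

MEMORY_LIMIT = 5 * 2**30  # 5 GiB, matching the hard-coded limit above


def gpu_for_worker(worker_id: str, num_gpus: int) -> int:
    """Map a pytest-xdist worker id ("gw0", "gw1", ...) to a GPU index, round-robin."""
    index = int(worker_id.removeprefix("gw"))
    return index % num_gpus


def setup_worker(num_gpus: int = 4) -> None:
    """Statically pin this worker to one GPU and cap its host memory."""
    # pytest-xdist sets PYTEST_XDIST_WORKER to "gw0", "gw1", ... in each worker.
    worker_id = os.environ.get("PYTEST_XDIST_WORKER", "gw0")
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_for_worker(worker_id, num_gpus))
    # Cap the worker's address space; spawned child processes may or may not
    # end up covered by this, depending on how they are launched.
    _, hard = resource.getrlimit(resource.RLIMIT_AS)
    resource.setrlimit(resource.RLIMIT_AS, (MEMORY_LIMIT, hard))
```

With 4 GPUs and more than 4 workers, several workers share each GPU, which is consistent with the OOM risk described above once the worker count grows.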
This PR isn't that useful on its own given the skipped tests, but it's a good step forward. I suggest merging right away to keep PRs small, and doing the rest in follow-up PRs.
🔍 Type of change
Select all that apply:
- [ ] 🐛 Bug fix (non-breaking change that addresses a specific issue)
- [x] 🚀 New feature (non-breaking change that adds functionality)
- [ ] ⚠️ Breaking change (a change that could affect existing functionality)
- [x] 📈 Performance improvement/optimization (improves speed, memory usage, or efficiency)
- [x] 🛠️ Code refactor (non-functional changes that improve code readability, structure, etc.)
- [ ] 📦 Dependency bump (updates dependencies, including Dockerfile or package changes)
- [ ] 📝 Documentation change (updates documentation, including new content or typo fixes)
- [ ] 🔧 Infrastructure/Build change (affects build process, CI/CD, or dependencies)