Grantham/add-attach_shm-template
Why are the changes needed?
Adding SHM is a necessity for multi-GPU ML training workloads. It is currently not immediately obvious how to do this.
What changes were proposed in this pull request?
This PR simply adds a convenience function to generate a PodTemplate configured to attach SHM to a task.
Additionally, this PR adds a directory for future contributions around similar PodTemplate wrappers in the future.
How was this patch tested?
I have used this function for my workflows to attach SHM.
More tests to be added soon.
Check all the applicable boxes
- [ ] I updated the documentation accordingly.
- [ ] All new and existing tests passed.
- [ ] All commits are signed-off.
Summary by Bito
This PR implements and validates shared memory (SHM) attachment functionality for multi-GPU ML training workloads. It introduces a pod_templates package with attach_shm utility function for configuring shared memory in tasks. The implementation includes core functionality and comprehensive testing suite that verifies SHM pod template properties including name, size (5Gi), and proper template attachment.Unit tests added: True
Estimated effort to review (1-5, lower is better): 1
Codecov Report
Attention: Patch coverage is 0% with 6 lines in your changes missing coverage. Please review.
Project coverage is 75.72%. Comparing base (
bc0e8c0) to head (87aa4a7). Report is 3 commits behind head on master.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| flytekit/extras/pod_templates/attach_shm.py | 0.00% | 4 Missing :warning: |
| flytekit/extras/pod_templates/__init__.py | 0.00% | 2 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## master #3020 +/- ##
===========================================
+ Coverage 51.35% 75.72% +24.37%
===========================================
Files 204 203 -1
Lines 21446 21280 -166
Branches 2729 2733 +4
===========================================
+ Hits 11014 16115 +5101
+ Misses 9834 4361 -5473
- Partials 598 804 +206
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Code Review Agent Run #dd1f99
Actionable Suggestions - 3
-
flytekit/extras/pod_templates/attach_shm.py - 2
- Consider adding size parameter validation · Line 4-4
- Potential insecure temp file usage · Line 16-16
-
tests/flytekit/unit/extras/templates/test_pod_templates.py - 1
- Test lacks assertions for functionality verification · Line 4-10
Review Details
-
Files reviewed - 3 · Commit Range:
a77aab8..44e2ea3- flytekit/extras/pod_templates/__init__.py
- flytekit/extras/pod_templates/attach_shm.py
- tests/flytekit/unit/extras/templates/test_pod_templates.py
-
Files skipped - 0
-
Tools
- Whispers (Secret Scanner) - ✔︎ Successful
- Detect-secrets (Secret Scanner) - ✔︎ Successful
- MyPy (Static Code Analysis) - ✔︎ Successful
- Astral Ruff (Static Code Analysis) - ✔︎ Successful
Changelist by Bito
This pull request implements the following key changes.
| Key Change | Files Impacted |
|---|---|
| New Feature - Shared Memory Pod Template Implementation |
- - - |
Code Review Agent Run #8c9989
Actionable Suggestions - 1
-
tests/flytekit/unit/extras/templates/test_pod_templates.py - 1
- Missing task decorator for pod template · Line 14-14
Review Details
-
Files reviewed - 1 · Commit Range:
44e2ea3..067c902- tests/flytekit/unit/extras/templates/test_pod_templates.py
-
Files skipped - 0
-
Tools
- Whispers (Secret Scanner) - ✔︎ Successful
- Detect-secrets (Secret Scanner) - ✔︎ Successful
- MyPy (Static Code Analysis) - ✔︎ Successful
- Astral Ruff (Static Code Analysis) - ✔︎ Successful
Should this be API for
PodTemplateitself, to allow for mutating an existing pod template?from flytekit.core.pod_template import PodTemplate pod_template = ( PodTemplate() .with_container(...) .with_shim(...) )
I had put this together as a MVP to add SHM to a pod, but I would like this more extensible solution. I think that it could allow for the more intuitive ability to define SHM within flytekit.Resources