Feature request: Implement mechanism to update `classesloaded.txt` file for automatic priming
Use case
PLEASE READ: Priming documentation: https://github.com/aws-powertools/powertools-lambda-java/blob/main/Priming.md
PR introducing class-preloading: https://github.com/aws-powertools/powertools-lambda-java/pull/1861.
This project uses class pre-loading to implement automatic priming, which reduces AWS Lambda SnapStart restore duration. The class pre-loader reads the classesloaded.txt file of a Powertools module that implements automatic priming and attempts to load each class listed in it before SnapStart takes a memory snapshot. If a class is not found, it is ignored (this is the case for test classes, for example).
The goal of this issue is to design and implement a mechanism that keeps the classesloaded.txt file automatically up to date as the project and the code in each module evolve. An individual contributor should not need any knowledge of SnapStart or priming techniques when contributing a change to this project. This process should be as automated as possible.
Solution/User Experience
Idea (please suggest alternatives if you have another idea)
Create a GitHub workflow that runs when a merge to the main branch happens.
Workflow Steps:
- Merge to Main - Trigger on push to main branch (after PR merge)
- Checkout Code - Get the latest main branch code
- Java Files Changed? - Check if any `.java` files were modified in the merge
- Identify Affected Powertools Modules - Determine which modules need updates
- Generate classesloaded.txt - Create the runtime classes file for each module
- Clean Files - Apply `sed` commands as per Priming documentation
  - Example `sed` command: `sed 's/.*\[class,load\] \([^ ]*\) source:.*/\1/' classloaded.txt > classloaded_clean.txt`
- Sort File Contents - Sort file contents to ensure stable diffs
- Files Have Diff? - Check if generated files differ from existing ones
- Create Update PR - Create a new PR with the updated classesloaded.txt files (if there is a diff)
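The clean and sort steps above could be sketched as a single shell function. This is only a rough sketch: the function name `clean_class_list` is hypothetical, and it assumes the raw log lines have the `[class,load] <class> source: ...` shape that the example `sed` expression matches. Unlike the example command, it uses `sed -n` with a trailing `p` so that lines which do not match are dropped, and it de-duplicates with `sort -u`; both are assumptions rather than something the Priming documentation prescribes.

```shell
# Hypothetical helper: extract class names from JVM class-load log lines,
# then sort and de-duplicate them so regenerated files give stable diffs.
clean_class_list() {
  # $1: path to the raw log file; the cleaned list is written to stdout.
  # -n plus the trailing 'p' prints only lines that matched the pattern.
  # LC_ALL=C makes the sort order independent of the runner's locale.
  sed -n 's/.*\[class,load\] \([^ ]*\) source:.*/\1/p' "$1" | LC_ALL=C sort -u
}
```

It might be called as `clean_class_list classloaded.txt > classesloaded.txt`, where both file names are placeholders.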
flowchart TD
A[Merge to Main Branch] --> B[Checkout Code]
B --> C{Java Files Changed?}
C -->|No| D[Stop - No Action Needed]
C -->|Yes| E[Identify Affected Powertools Modules]
E --> F[Generate classesloaded.txt for Each Module]
F --> G[Clean Affected Files Using sed Command]
G --> H[Sort File Contents to Ensure Stable Diffs]
H --> I{Files Have Diff?}
I -->|No| J[Stop - No Changes]
I -->|Yes| K[Create New Branch]
K --> L[Commit Changes to New Branch]
L --> M[Create PR to Update classesloaded.txt]
M --> N[End - PR Ready for Review]
style A fill:#e1f5fe
style D fill:#ffebee
style J fill:#ffebee
style N fill:#e8f5e8
Alternative solutions
Acknowledgment
- [x] This feature request meets Powertools for AWS Lambda (Java) Tenets
- [ ] Should this be considered in other Powertools for AWS Lambda languages? i.e. Python, TypeScript, and .NET
Future readers
Please react with 👍 and your use case to help us understand customer demand.
Hi @phipag, great details. I am happy to look into this automation.
Hey @subhash686, thanks for engaging. This sounds awesome 🚀 .
Let me assign this issue to you and add it to our current iteration. Feel free to post questions and let me know if you need any assistance testing things.
Hey @subhash686,
I made a small update to my initial design proposal. I think it is better that we do not commit directly into a PR opened by a contributor. Instead, we should trigger the workflow when a merge to the main branch happens and create a new PR automatically if needed. This can be reviewed separately by a maintainer. Similar to Dependabot, but for classesloaded.txt.
Hi @phipag and @subhash686, thanks for working on this. The priming pattern is super useful for customers working with Java on Lambda.
While the initial idea of mutating the pull request by regenerating files and including them works, I'm not a big fan of it, and I think we should have an automation for this. For more context, we were doing something similar in Powertools Python, and at some point all the PRs got modified with new files; it became hard to review PRs and to understand responsibilities and who modified what.
That said, I think we have 2 options and I don't have preference:
1/ Create a new workflow that runs on push to the main branch, detects the folders modified by the PR, and regenerates the classesloaded.txt file.
2/ Create a workflow that runs on a schedule, for example every day at 9:00 AM, and iterates over all files, regenerating each classesloaded.txt file. In this case, you don't need to worry about detecting the changed code; it works on a best-effort basis.
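For option 1, detecting the affected folders could be sketched roughly as below. The function name `affected_modules` is hypothetical, and the sketch assumes every module lives in its own top-level directory, which may not hold for nested modules in this repository.

```shell
# Hypothetical helper: read changed file paths on stdin and print the unique
# top-level directories that contain a modified .java file.
affected_modules() {
  # Assumption: each module is a top-level directory of the repository.
  # The sed expression keeps only paths ending in .java and prints the
  # first path segment; sort -u removes duplicates.
  sed -n 's|^\([^/]*\)/.*\.java$|\1|p' | LC_ALL=C sort -u
}
```

One way to feed it would be `git diff --name-only HEAD~1 HEAD | affected_modules`, assuming the merge lands as a single commit on main.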
I'm not sure whether running this script will change files that shouldn't be changed. If it doesn't, you don't need to worry about sed/sha and other stuff: just check whether any file has changed and stop the workflow if not. You can use `git status --porcelain` for that. But I might be wrong here.
Please let me know if you need any help with this workflow.
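A minimal sketch of that check, assuming it runs inside the checked-out repository after the files have been regenerated. The function name `has_changes` is hypothetical.

```shell
# Hypothetical helper: succeed (exit status 0) when the working tree has
# uncommitted changes according to `git status --porcelain`.
has_changes() {
  # --porcelain prints one line per changed or untracked path and nothing
  # for a clean tree, so non-empty output means something changed.
  [ -n "$(git status --porcelain)" ]
}
```

A workflow step could then branch on it, for example `if has_changes; then echo "open a PR"; else echo "nothing to do"; fi`. Note that untracked files also count as changes here, so a newly generated file would trigger the PR path as well.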
Hey @subhash686,
let me know if you would still like to work on this issue. After giving it some thought, I believe it might be hard for you to test the automation in a repository fork with GitHub Actions. For this one, it is potentially easier for the maintainers to work on it.
Let me know what you think. I can propose an initial draft of automation and we can review it together as well.
I also created a full list of priming related tasks as sub-issues here in case you would like to work on a different Snapstart priming topic: https://github.com/aws-powertools/powertools-lambda-java/issues/1588
Hi @phipag, I was wondering how much I could play with GitHub Actions as a contributor. Happy to collaborate and review with you while you or other maintainers take care of it. Thanks.