Model parallel v2 llama finetuning notebook fixes
Description of changes:
- Updating the model parallel v2 README to clarify usage of shared-scripts directory
- Disabling fp8 by default for backward compatibility
- Updating the llama finetuning example with inline comments for FSX args and an upgrade command for pytest
Testing done: Ran smp-finetuning-llama-fsdp-tp.ipynb in a SageMaker notebook and confirmed that the SageMaker training job succeeded
Merge Checklist
Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.
- [x] I have read the CONTRIBUTING doc and adhered to the example notebook best practices
- [x] I have updated any necessary documentation, including READMEs
- [x] I have tested my notebook(s) and ensured it runs end-to-end
- [x] I have linted my notebook(s) and code using `black-nb -l 100 {path}/{notebook-name}.ipynb`
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
You use something like:
ON_SIT|...|LINKMSG|206|.... ?
And it happens with the V3.10 slave script?
AFAIK, we haven't touched the slave script for a long time (besides some compatibility changes for OpenSim), so I guess V3.00 and perhaps even V2.x are affected?
Can we test it with the "animesh slave script"? All (important) messages are queued there, so it shouldn't be affected ...
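To illustrate why queuing should help, here is a rough Python sketch of the idea (not the actual slave script, which is not shown in this thread). The class name `QueuedReceiver` and the message names `POSE`, `MOVE` and `ANIM` are made up for the example. The assumption is that a message arriving while another one is still being handled gets buffered and processed in order instead of being lost:

```python
from collections import deque


class QueuedReceiver:
    """Hypothetical receiver that buffers messages while it is busy."""

    def __init__(self):
        self.pending = deque()  # messages waiting to be handled
        self.busy = False       # True while the queue is being drained
        self.handled = []       # handled messages, in processing order

    def receive(self, msg):
        # Always enqueue first, so nothing is dropped even mid-handling.
        self.pending.append(msg)
        if not self.busy:
            self._drain()

    def _drain(self):
        self.busy = True
        try:
            while self.pending:
                self._handle(self.pending.popleft())
        finally:
            self.busy = False

    def _handle(self, msg):
        self.handled.append(msg)
        # Assumption for illustration: a pose change triggers a follow-up
        # "MOVE" message while the receiver is still busy.
        if msg == "POSE":
            self.receive("MOVE")


receiver = QueuedReceiver()
receiver.receive("POSE")
receiver.receive("ANIM")
print(receiver.handled)
```

If the follow-up message were handled only when the receiver is idle and dropped otherwise, the move step would be the one to go missing, which would look a lot like avatars switching animation without being repositioned.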
We have seen something very similar while changing poses in the new alpha system for animesh adjusters.
Can you give me a hint to reproduce this?
I tested in V3.00 and yes, it does happen there. It is probably in V2.01 too, but I did not test that. I believe earlier versions would not have the issue, as card contents were not cached there. Things run much faster now from cache.
With the new animesh issue, there were 4 sitters using SCHMO lines. The first 2 sitters were fine, and sometimes the 3rd was also fine, but not always. The 4th sitter just would not work properly. When poses were changed, the AVs would change to the new animation but would not move to the proper locations. Another user reports they have not seen this with their usage. I will see if I have that build in my inventory.