Michael Clifford
Not sure how you plan to implement this, but it sounds like it would require the addition of an ever-growing set of template repos (is that right?). Have you considered...
> wouldn't that single repo become too big/complex? e.g. the NLP stack alone already has 4 overlays. It's certainly a trade-off to consider: managing one complex repo vs. the complexity...
@Sara-KS yes, `DDPJobDefinition._dry_run(cluster)` will generate the dry_run output.
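For anyone following along, a minimal sketch of that call might look like the following. All names, namespaces, and script paths here are illustrative, and the import is guarded since it requires codeflare-sdk and cluster access:

```python
# Minimal sketch (hypothetical names/values) of getting dry-run output
# for a DDPJobDefinition against an existing Ray cluster.
job_kwargs = {
    "name": "demo-job",               # illustrative job name
    "script": "train.py",             # illustrative training script
    "script_args": ["--epochs", "1"], # illustrative args
}

try:
    # Requires codeflare-sdk and a reachable cluster; guarded so the
    # sketch can be read/run anywhere.
    from codeflare_sdk.cluster.cluster import Cluster
    from codeflare_sdk.cluster.config import ClusterConfiguration
    from codeflare_sdk.job.jobs import DDPJobDefinition

    cluster = Cluster(ClusterConfiguration(name="demo", namespace="default"))
    jobdef = DDPJobDefinition(**job_kwargs)
    print(jobdef._dry_run(cluster))  # emits the generated job spec
except Exception:
    pass  # SDK not installed or no cluster access; the call shape is the point
```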
There is a parameter in `DDPJobDefinition()` that allows you to define mounts. See https://github.com/project-codeflare/codeflare-sdk/blob/baec8585b2bd918becd030951bf43e3504d43ada/src/codeflare_sdk/job/jobs.py#L62C11-L62C11 And the syntax should be similar to how we handle `script_args`. So something like: ```...
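A rough sketch of what that might look like. The mount string format is assumed from TorchX-style mount specs and the paths are hypothetical, so double-check against the linked `jobs.py`:

```python
# Illustrative mounts syntax, mirroring how script_args is passed.
# Mount string format assumed (TorchX-style); paths are hypothetical.
script_args = ["--epochs", "10"]
mounts = ["type=bind,src=/data/imagenet,dst=/mnt/imagenet"]

try:
    from codeflare_sdk.job.jobs import DDPJobDefinition  # requires codeflare-sdk
    jobdef = DDPJobDefinition(
        name="resnet50",
        script="train.py",
        script_args=script_args,
        mounts=mounts,
    )
except Exception:
    pass  # SDK unavailable here; the parameter shape above is the point
```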
Never mind; sorry @dfeddema, I just fully read your last comment and see that you still got errors with that approach.
Since you are not using a Ray cluster for this, I think you need to do the following to see the dry_run output. ``` jobdef = DDPJobDefinition(name="resnet50", script="pytorch/pytorch_imagenet_resnet50.py", script_args=arg_list, scheduler_args={"namespace":...
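Filling in the shape of that flow as a hedged sketch: the namespace and image values below are placeholders, and the no-cluster dry-run method name is assumed from the SDK's `jobs.py`, so verify it against your installed version:

```python
# Sketch of a dry run without a Ray cluster (MCAD scheduler path).
# namespace/image are placeholders; _dry_run_no_cluster is assumed from jobs.py.
try:
    from codeflare_sdk.job.jobs import DDPJobDefinition  # requires codeflare-sdk
except Exception:
    DDPJobDefinition = None  # SDK not installed

scheduler_args = {"namespace": "default"}  # placeholder namespace

if DDPJobDefinition is not None:
    jobdef = DDPJobDefinition(
        name="resnet50",
        script="pytorch/pytorch_imagenet_resnet50.py",
        script_args=["--epochs", "1"],            # illustrative args
        scheduler_args=scheduler_args,
        image="quay.io/example/training:latest",  # placeholder image
    )
    print(jobdef._dry_run_no_cluster())  # assumed no-cluster dry-run helper
```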
Thanks for pointing this out, @tedhtchang. I recall we ran into this issue before; we could not determine a regex that provided stable results, so we added a list...
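The pattern described there (try a regex first, keep an explicit list as a backstop for the cases it can't classify reliably) can be sketched generically; the regex and list contents below are hypothetical, not the actual ones used:

```python
import re

# Hypothetical backstop list: names the regex misclassifies are pinned
# explicitly, since no single pattern proved stable for every case.
KNOWN_VALID = {"my-odd-name_v2", "legacy.image"}
PATTERN = re.compile(r"^[a-z][a-z0-9-]*$")  # illustrative regex

def is_valid(name: str) -> bool:
    # Explicit list wins first; the regex handles the common shape.
    return name in KNOWN_VALID or bool(PATTERN.match(name))

print(is_valid("my-job"))        # True: matches the regex
print(is_valid("legacy.image"))  # True: caught only by the list
print(is_valid("9bad"))          # False: fails both
```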
@slemeur We've already started documenting how to use our images with Continue: https://github.com/containers/ai-lab-recipes/blob/main/recipes/natural_language_processing/code_generation/llms-vscode-integration.md But a tighter integration with AI Lab would be nice.
Do you get an error like this in the model server pod?
```
result = context.run(func, *args)
         ^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/llama_cpp/llama_chat_format.py", line 247, in _convert_text_completion_chunks_to_chat
    for i, chunk in enumerate(chunks):
...
```