Sean Huver

Results: 7 comments by Sean Huver

MegaMolBART (released last month) would be interesting to compare against the other generative models: https://ngc.nvidia.com/catalog/models/nvidia:clara:megamolbart

You can find the solution here: https://stackoverflow.com/questions/50677544/reflection-padding-conv2d You'll need to add a `get_config()` method (calling the superclass implementation) to the `ReflectionPadding2D` class. It worked for me.
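
For reference, a minimal sketch of that pattern, assuming a TensorFlow/Keras custom layer like the one in the linked answer; the `padding` constructor argument is an assumption:

```python
import tensorflow as tf
from tensorflow.keras.layers import Layer

class ReflectionPadding2D(Layer):
    def __init__(self, padding=(1, 1), **kwargs):
        self.padding = tuple(padding)
        super().__init__(**kwargs)

    def call(self, inputs):
        # Pad height and width with reflected values; batch and channel
        # dimensions are left untouched.
        pad_h, pad_w = self.padding
        return tf.pad(
            inputs,
            [[0, 0], [pad_h, pad_h], [pad_w, pad_w], [0, 0]],
            mode="REFLECT",
        )

    def get_config(self):
        # Merge this layer's arguments into the base config so the model
        # can be serialized and reloaded without "unknown layer" errors.
        config = super().get_config()
        config.update({"padding": self.padding})
        return config
```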

I'm seeing the same issue on ARM under Ubuntu with a pip install (no Docker). My hardware is an NVIDIA IGX Orin Devkit with an A6000 dGPU.

llama.cpp has a neat [api_like_OAI.py](https://github.com/ggerganov/llama.cpp/blob/916a9acdd0a411426690400ebe2bb7ce840a6bba/examples/server/api_like_OAI.py#L4) drop-in Flask server that mimics the OpenAI API calls. You should be able to run that, then set `openai.api_base = "http://127.0.0.1:8081"`, and llama2...
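
A minimal sketch of pointing the client at that shim, assuming the pre-1.0 `openai` Python package (which exposes a module-level `api_base`) and the port shown above; the model name is an assumption, since the shim serves whatever model llama.cpp has loaded:

```python
import openai

# Redirect the OpenAI client to the local llama.cpp Flask shim.
openai.api_base = "http://127.0.0.1:8081"
openai.api_key = "sk-not-needed"  # assumption: the shim ignores the key, but the client requires one

response = openai.ChatCompletion.create(
    model="llama-2",  # placeholder; the shim routes to the locally loaded model
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response["choices"][0]["message"]["content"])
```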

@parththakor -- NVIDIA NIMs work for me. Here's my code snippet:

```python
if args.model == "NIM-llama3.1-405b":
    print(f"Using NVIDIA NIM API with {args.model}.")
    os.environ["OPENAI_API_BASE"] = "https://integrate.api.nvidia.com/v1"
    os.environ["OPENAI_API_KEY"] = os.environ.get("NVIDIA_API_KEY")
    client_model = "openai/meta/llama-3.1-405b-instruct"
    ...
```