Dung M. Dao

6 comments by Dung M. Dao

@hwchase17 This issue is still there in `0.0.171`

Same question here

Thanks for replying. I don't know but somehow it still uses 34GB even when I switched to the `fp16` branch of the HuggingFace model weights you linked to, and I...

I guess you can try inference with GPU, after making some modifications to the code:

```python
# llama/memory_pool.py
self.sess = ort.InferenceSession(onnxfile, providers=['CUDAExecutionProvider'])
```

Find all the files with `import onnxruntime` and...
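A minimal sketch of the same provider switch with a CPU fallback, in case the CUDA provider is not available on a given machine. The helper names `pick_providers` and `make_session` are hypothetical and not part of the project being discussed; `ort.get_available_providers()` and the `providers=` argument of `ort.InferenceSession` are standard ONNX Runtime calls.

```python
def pick_providers(available,
                   preferred=("CUDAExecutionProvider", "CPUExecutionProvider")):
    """Return the preferred providers that are actually available, in order.

    Hypothetical helper: `available` is the list reported by ONNX Runtime.
    """
    chosen = [p for p in preferred if p in available]
    if not chosen:
        raise RuntimeError(f"none of {preferred} found in {available}")
    return chosen


def make_session(onnxfile):
    """Create an inference session on the GPU if possible, otherwise on the CPU."""
    import onnxruntime as ort  # imported lazily so the helper above stays testable

    providers = pick_providers(ort.get_available_providers())
    return ort.InferenceSession(onnxfile, providers=providers)
```

With this, a line such as `self.sess = make_session(onnxfile)` could stand in for the hard-coded provider list, assuming the GPU build of ONNX Runtime is installed.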

> I am struggling to get it to run, did you already make it run? Could you please tell me how many tokens/second you get out of the 7b or...

> Hi, can you provide information about the size of files?

Sure, the file is 1.9 GB and was recorded for about 25 seconds.