elife33

8 comments of elife33

GPU:

```
+ python multilayer_perceptron.py
Extracting /tmp/data/train-images-idx3-ubyte.gz
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz
OpenCL platform: Apple
OpenCL device: AMD Radeon R9 M370X Compute Engine
I tensorflow/core/common_runtime/gpu/gpu_device.cc:989] Found device 0 with properties:...
```
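To confirm that a build like this actually registered the OpenCL device, one can list the devices TensorFlow sees and turn on device-placement logging. This is a minimal sketch assuming the TF 1.x-style API that matches the log above; it is not code from the repo itself:

```python
# List every device this TensorFlow build registered (CPU plus any
# GPU/OpenCL devices), then log op placement at run time.
import tensorflow as tf
from tensorflow.python.client import device_lib

for dev in device_lib.list_local_devices():
    print(dev.name, dev.physical_device_desc)

# log_device_placement prints which device each op actually runs on.
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    a = tf.constant([1.0, 2.0])
    b = tf.constant([3.0, 4.0])
    print(sess.run(a + b))
```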

CompuBench 1.5 benchmark scores, data courtesy CompuBench:

| Benchmark | Radeon R9 M370X (Mac) | Core i7 4870HQ |
|---|---|---|
| Bitcoin mining | 111.14 mHash/s | 30.62 mHash/s |
| Face detection | 25.65 mPixels/s | 18.08 mPixels/s |
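For what it's worth, a quick sanity check of the GPU-versus-CPU ratios implied by these scores (a minimal sketch; the numbers are copied from the table above):

```python
# Compute the GPU/CPU ratio for each CompuBench 1.5 benchmark.
bitcoin = {"Radeon R9 M370X": 111.14, "Core i7 4870HQ": 30.62}  # mHash/s
face    = {"Radeon R9 M370X": 25.65,  "Core i7 4870HQ": 18.08}  # mPixels/s

for name, scores in (("Bitcoin mining", bitcoin), ("Face detection", face)):
    ratio = scores["Radeon R9 M370X"] / scores["Core i7 4870HQ"]
    print(f"{name}: GPU is {ratio:.2f}x the CPU score")
# Bitcoin mining: GPU is 3.63x the CPU score
# Face detection: GPU is 1.42x the CPU score
```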

I also got a segmentation fault when running `./bin/caffe time -model lenet.prototxt`. @strin @sh1r0 Have you found a solution?

Hoping Mac support lands soon!!!

I can run inference with 13B on 2x 3090 24 GB with the same command as @carlos-gemmell:

```
elife@rtx:/Extra/work/lab/llama$ CUDA_VISIBLE_DEVICES="0,1" torchrun --nproc_per_node 2 example.py --ckpt_dir $TARGET_FOLDER/13B --tokenizer_path $TARGET_FOLDER/tokenizer.model
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable...
```
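For context, `torchrun --nproc_per_node 2` spawns one worker process per GPU and passes each worker its rank through environment variables. Below is a minimal sketch of the per-process setup it drives; it is similar in spirit to what llama's example.py does, not a copy of it, and the `setup_distributed` name is mine:

```python
# Minimal per-worker setup under torchrun: read the rank/world-size
# variables torchrun exports, join the process group, bind one GPU.
import os
import torch
import torch.distributed as dist

def setup_distributed():
    rank = int(os.environ["RANK"])
    world_size = int(os.environ["WORLD_SIZE"])
    local_rank = int(os.environ["LOCAL_RANK"])

    # NCCL backend handles GPU-to-GPU communication between the two 3090s.
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(local_rank)
    return local_rank

if __name__ == "__main__":
    local_rank = setup_distributed()
    print(f"rank {dist.get_rank()} using GPU {local_rank}")
```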

After moving the .pth checkpoints to an SSD, the load time dropped to 48.42 seconds. The machine has 32 GB of RAM.

```
elife@rtx:/Extra/work/lab/llama$ CUDA_VISIBLE_DEVICES="0,1" torchrun --nproc_per_node 2 example.py --ckpt_dir $TARGET_FOLDER/13B --tokenizer_path $TARGET_FOLDER/tokenizer.model
WARNING:torch.distributed.run:
*****************************************...
```
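A minimal sketch of how such a load time can be measured; `ckpt_path` is a hypothetical placeholder, not the repo's actual loading code:

```python
# Time how long it takes to read one checkpoint shard into host RAM.
import time
import torch

ckpt_path = "/Extra/work/lab/llama/13B/consolidated.00.pth"  # hypothetical path

t0 = time.perf_counter()
checkpoint = torch.load(ckpt_path, map_location="cpu")  # load weights to CPU memory
print(f"loaded {ckpt_path} in {time.perf_counter() - t0:.2f} s")
```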

Also, https://github.com/tloen/llama-int8 is able to load the 13B model and run inference on a single 3090.
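For reference, the general trick that makes 13B fit in 24 GB is quantizing the linear-layer weights to int8 at load time. The sketch below uses the Hugging Face transformers 8-bit path (backed by bitsandbytes) rather than llama-int8's own loader, and the model id is a placeholder for a converted checkpoint:

```python
# Load a LLaMA-sized model with int8 weight quantization so it fits on one GPU.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/converted-llama-13b"  # hypothetical converted HF checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_8bit=True,   # quantize linear layers to int8 via bitsandbytes
    device_map="auto",   # place weights on the single available GPU
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```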