elife33

8 comments of elife33

GPU:

```
+ python multilayer_perceptron.py
Extracting /tmp/data/train-images-idx3-ubyte.gz
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz
OpenCL platform: Apple
OpenCL device: AMD Radeon R9 M370X Compute Engine
I tensorflow/core/common_runtime/gpu/gpu_device.cc:989] Found device 0 with properties:...
```
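To confirm that a build like this actually registered the OpenCL device, one can list the devices TensorFlow sees and turn on device-placement logging. This is a minimal sketch assuming the TF 1.x-style API that matches the log above; it is not code from the repo itself:

```python
# List every device this TensorFlow build registered (CPU plus any
# GPU/OpenCL devices), then log op placement at run time.
import tensorflow as tf
from tensorflow.python.client import device_lib

for dev in device_lib.list_local_devices():
    print(dev.name, dev.physical_device_desc)

# log_device_placement prints which device each op actually runs on.
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    a = tf.constant([1.0, 2.0])
    b = tf.constant([3.0, 4.0])
    print(sess.run(a + b))
```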

CompuBench 1.5 benchmark scores, data courtesy CompuBench:

| Benchmark | Radeon R9 M370X (Mac) | Core i7 4870HQ |
|---|---|---|
| Bitcoin mining | 111.14 mHash/s | 30.62 mHash/s |
| Face detection | 25.65 mPixels/s | 18.08 mPixels/s |
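For what it's worth, a quick sanity check of the GPU-versus-CPU ratios implied by these scores (a minimal sketch; the numbers are copied from the table above):

```python
# Compute the GPU/CPU ratio for each CompuBench 1.5 benchmark.
bitcoin = {"Radeon R9 M370X": 111.14, "Core i7 4870HQ": 30.62}  # mHash/s
face    = {"Radeon R9 M370X": 25.65,  "Core i7 4870HQ": 18.08}  # mPixels/s

for name, scores in (("Bitcoin mining", bitcoin), ("Face detection", face)):
    ratio = scores["Radeon R9 M370X"] / scores["Core i7 4870HQ"]
    print(f"{name}: GPU is {ratio:.2f}x the CPU score")
# Bitcoin mining: GPU is 3.63x the CPU score
# Face detection: GPU is 1.42x the CPU score
```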

I also got a segmentation fault when running `./bin/caffe time -model lenet.prototxt`. @strin @sh1r0 Have you found a solution?

Hoping Mac support lands soon!!!

I can run inference with 13B on 2x 3090 24 GB with the same command as @carlos-gemmell:

```
elife@rtx:/Extra/work/lab/llama$ CUDA_VISIBLE_DEVICES="0,1" torchrun --nproc_per_node 2 example.py --ckpt_dir $TARGET_FOLDER/13B --tokenizer_path $TARGET_FOLDER/tokenizer.model
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable...
```
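For context, `torchrun --nproc_per_node 2` spawns one worker process per GPU and passes each worker its rank through environment variables. Below is a minimal sketch of the per-process setup it drives; it is similar in spirit to what llama's example.py does, not a copy of it, and the `setup_distributed` name is mine:

```python
# Minimal per-worker setup under torchrun: read the rank/world-size
# variables torchrun exports, join the process group, bind one GPU.
import os
import torch
import torch.distributed as dist

def setup_distributed():
    rank = int(os.environ["RANK"])
    world_size = int(os.environ["WORLD_SIZE"])
    local_rank = int(os.environ["LOCAL_RANK"])

    # NCCL backend handles GPU-to-GPU communication between the two 3090s.
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(local_rank)
    return local_rank

if __name__ == "__main__":
    local_rank = setup_distributed()
    print(f"rank {dist.get_rank()} using GPU {local_rank}")
```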

After moving the .pth checkpoints to an SSD, the load time dropped to 48.42 seconds. The machine has 32 GB of RAM.

```
elife@rtx:/Extra/work/lab/llama$ CUDA_VISIBLE_DEVICES="0,1" torchrun --nproc_per_node 2 example.py --ckpt_dir $TARGET_FOLDER/13B --tokenizer_path $TARGET_FOLDER/tokenizer.model
WARNING:torch.distributed.run:
*****************************************...
```
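A minimal sketch of how such a load time can be measured; `ckpt_path` is a hypothetical placeholder, not the repo's actual loading code:

```python
# Time how long it takes to read one checkpoint shard into host RAM.
import time
import torch

ckpt_path = "/Extra/work/lab/llama/13B/consolidated.00.pth"  # hypothetical path

t0 = time.perf_counter()
checkpoint = torch.load(ckpt_path, map_location="cpu")  # load weights to CPU memory
print(f"loaded {ckpt_path} in {time.perf_counter() - t0:.2f} s")
```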

Also, https://github.com/tloen/llama-int8 is able to load the 13B model and run inference on a single 3090.
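For reference, the general trick that makes 13B fit in 24 GB is quantizing the linear-layer weights to int8 at load time. The sketch below uses the Hugging Face transformers 8-bit path (backed by bitsandbytes) rather than llama-int8's own loader, and the model id is a placeholder for a converted checkpoint:

```python
# Load a LLaMA-sized model with int8 weight quantization so it fits on one GPU.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/converted-llama-13b"  # hypothetical converted HF checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_8bit=True,   # quantize linear layers to int8 via bitsandbytes
    device_map="auto",   # place weights on the single available GPU
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```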