ROCm Port
Currently I can say that for regular users the CLBlast version is much easier to run. If you want the most performance, though, HIP is for you.
Remember to tweak the new settings LLAMA_CUDA_DMMV_X and LLAMA_CUDA_DMMV_Y.
I get the best results with 64 and 4, for example.
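For example (a hedged sketch, assuming these settings are exposed as Makefile variables in this branch; the exact names and defaults may differ):
make -j4 LLAMA_HIPBLAS=1 LLAMA_CUDA_DMMV_X=64 LLAMA_CUDA_DMMV_Y=4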
Note for unsupported GPU users:
You need to use an environment variable to force ROCm to run:
export HSA_OVERRIDE_GFX_VERSION=10.3.0
This makes it work in the currently running shell; after that, ./main and the other llama.cpp commands will run.
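You can also set it for a single invocation instead of exporting it (a trivial usage example; the model path is only a placeholder):
HSA_OVERRIDE_GFX_VERSION=10.3.0 ./main -m ./models/llama-7b-q4_0.bin -p "Hello"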
rocBLAS is only released for a limited number of GPUs: gfx900 gfx906 gfx908 gfx90a gfx1030
If you look in /opt/rocm/lib/rocblas/library/ you should see a lot of files, but only for some GPUs. For the others you need to find a target that is close enough, like gfx1030 instead of gfx1033, and that then becomes 10.3.0 for the environment variable.
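To see which gfx targets your rocBLAS installation actually ships, you can list the directory, for example (just a suggestion, any way of inspecting the files works):
ls /opt/rocm/lib/rocblas/library/ | grep -o 'gfx[0-9a-f]*' | sort -u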
ROCm port
I just define all the cudaXxx functions to hipXxx etc. This may seem stupidly simple but it's exactly the same kind of trick AMD uses to make HIP code compile with nvcc, you can see it in /opt/rocm/include/hip/nvidia_detail/nvidia_hip_runtime_api.h (for some reason I can't find the source for this anywhere online but it has a free license, so if you want, I can post it).
HIP can also compile the Cuda kernel programs without any major modifications, just some header stuff.
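For illustration, a minimal sketch of the kind of aliasing described, using a hypothetical handful of names (the real list in the PR is longer and may differ):
// Map a few CUDA runtime / cuBLAS names to their HIP equivalents so the
// existing CUDA-style code builds with the ROCm toolchain.
#include <hip/hip_runtime.h>
#include <hipblas.h>
#define cudaError_t    hipError_t
#define cudaSuccess    hipSuccess
#define cudaMalloc     hipMalloc
#define cudaMemcpy     hipMemcpy
#define cudaFree       hipFree
#define cudaStream_t   hipStream_t
#define cublasHandle_t hipblasHandle_t
#define cublasCreate   hipblasCreate
#define cublasSgemm    hipblasSgemm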
Compiling
To do this, you need the ROCm developer kit and hipBLAS, which may be a separate package.
With CMake I have to invoke:
CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ cmake -DLLAMA_HIPBLAS=ON
It is probably unavoidable to use the LLVM Clang compiler. You can use the one included with ROCm or the system one, but mixing it with GCC objects is just asking for trouble.
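Put together, a full configure-and-build sequence might look like this (a sketch assuming an out-of-source build directory called build, which is also what the results below use):
CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ cmake -B build -DLLAMA_HIPBLAS=ON
cmake --build build -j 4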
The Makefile should work, too; pass in LLAMA_HIPBLAS=1. You can use the env variable ROCM_PATH if ROCm is not installed at /opt/rocm:
make -j4 LLAMA_HIPBLAS=1
The Makefile will override the compilers to ROCm's LLVM, so compiling should be a single command. But you can still override the compilers on the make command line.
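For example, if ROCm lives somewhere else (hypothetical path):
ROCM_PATH=/opt/rocm-5.5.0 make -j4 LLAMA_HIPBLAS=1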
Docker
Probably the best option right now is using Docker with AMD's images:
FROM rocm/dev-ubuntu-22.04:5.5-complete AS build
WORKDIR /app
COPY . ./
RUN make LLAMA_HIPBLAS=1
ENV PATH="/app:$PATH"
CMD [ "main" ]
Save it somewhere as rocm.Dockerfile then in llama.cpp's source do:
docker build -f /path/to/rocm.Dockerfile . -t llama.cpp:rocm
Then run it like this:
docker run --rm -it --init \
--device /dev/dri --device /dev/kfd \
-v/my/models:/models llama.cpp:rocm \
main -m /models/llama-7b-q4_2.bin -p "$(cat prompts/dan.txt)"
You can also add the override like this: -e HSA_OVERRIDE_GFX_VERSION=10.3.0
All the other commands are there besides main; you can also run /bin/bash for a dev shell, mount the llama.cpp source somewhere, and use it for development. It is a bit of a thick image, maybe too big for end users; I want to trim it down, but the AMD stuff is bloated.
What's up with the compilers?
Regarding hipcc: it is not really a compiler. I had a lot of problems with it; it couldn't compile and link .cpp and .o files together (like hipcc main.cpp llama.o ggml.o ...). If you open it in a text editor you'll see it's a Perl script, and all it does is provide some default flags for the Clang compiler. It might work in CMake, since CMake always compiles to objects first.
It shouldn't be a requirement to use AMD's version of Clang, it is possible to use any normal Clang or LLVM (maybe even Zig?) to compile the device code. In the CMake build I added a warning if the compiler is not Clang but it won't stop you from experimenting (well, it will probably fail to compile the .cu file).
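As a rough idea of what hipcc ends up invoking, a HIP translation unit can be compiled and linked with plain Clang along these lines (a hedged sketch; exact flags and library names can differ between ROCm versions):
/opt/rocm/llvm/bin/clang++ -x hip --offload-arch=gfx1030 -O3 -c ggml-cuda.cu -o ggml-cuda.o
/opt/rocm/llvm/bin/clang++ main.o llama.o ggml.o ggml-cuda.o -o main -L/opt/rocm/lib -lhipblas -lrocblas -lamdhip64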
If you use VS Code, the C/C++ plugin doesn't support HIP correctly: it sees in compile_commands.json (part of CMake's output) that the .cu file uses the language argument -x hip, and it doesn't know what that is, so the whole file is locked to the C language even though it's actually C++, and you'll see some red squiggles. This flag comes from the hip::device package in CMake.
In CMake it is harder to use different compilers in the same project (you may need to use a subdirectory) than in Make, so currently the .cu file is handled as a C++ file and compiled with the rest of the C++ files; this is AMD's vision for HIP -- they should just be normal C++ files.
I also tried adding another language, HIP (enable_language(HIP)), to CMake, but I had trouble getting CMake to configure consistently in all environments; maybe it needs some package that was missing in the container. In that case it would work more like Cuda: I define the .cu file's language to be HIP, whatever compiler is configured for HIP compiles it, and a compiler that can link it correctly links it into an executable. When it was working on Arch, it configured itself automatically like CMAKE_CXX_COMPILER=/usr/bin/g++ and CMAKE_HIP_COMPILER=/usr/bin/clang++, and it worked correctly, using the HIP compiler to link in the end. This would be the ideal solution; it would give the user the most control over the config -- if I got it to work, that is :stuck_out_tongue_winking_eye:. If someone more experienced with this knows how to do it, please go ahead.
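For reference, a minimal sketch of what that approach could look like (assuming CMake >= 3.21 with its built-in HIP language support; this is not what the PR currently does):
cmake_minimum_required(VERSION 3.21)
project(llama C CXX)
set(CMAKE_HIP_ARCHITECTURES gfx1030)  # pick your GPU target
enable_language(HIP)
find_package(hipblas REQUIRED)
set_source_files_properties(ggml-cuda.cu PROPERTIES LANGUAGE HIP)
add_library(ggml-cuda STATIC ggml-cuda.cu)
target_link_libraries(ggml-cuda PRIVATE roc::hipblas)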
For the Makefile I thought it would be easier to override the compilers, because it is supposed to be more beginner friendly and you can get a result in one command (that is, if everything is installed properly). But it has some variables you can override as well.
What does hipBLAS do?
hipBLAS is just basically a wrapper around rocBLAS or cuBLAS. Well, all of HIP is supposed to be.
I have started moving all the CUDA-specific stuff to ggml-cuda.h/cu in ROCm/rocBLAS#1094; you could also move all the HIP stuff to ggml-cuda.h to keep ggml.c a bit cleaner. If this works well, it could be a nice way to support AMD GPUs. Do you have any performance numbers?
I'll try to rebase on your code. As for perf, it's about 38 ms for 7B; the GPU is a Vega64. What's the best way to do a measurement?
Either the perplexity time per pass or the prompt eval times with a big prompt seems good enough to measure performance; that's what I have been doing anyway. Use --no-mmap to make sure that there isn't any loading happening in the first eval.
7b-q4_0: 15.21 seconds per pass - ETA 2.77 hours
7b-f16: 16.30 seconds per pass - ETA 2.97 hours
13b-q4_0: 19.60 seconds per pass - ETA 3.57 hours
30b-q4_0: 29.70 seconds per pass - ETA 5.40 hours
GPU is used at about 30%, VRAM 2G
I'm now building it in AMD's official Docker image and it is giving me double the performance... 🤯
7b-q4_0: 5.84 seconds per pass - ETA 1.06 hours
7b-f16: 6.47 seconds per pass - ETA 1.18 hours
13b-q4_0: 9.89 seconds per pass - ETA 1.80 hours
30b-q4_0: 20.40 seconds per pass - ETA 3.71 hours
This is the rocprof trace from the Docker image:

And this one from the Arch:

It just seems faster because it loads the BLAS libraries faster.
@slaren can you check in Cuda? Currently --memory_f32 is broken for me.
--memory_f32 seems to work fine for me with Cuda, I couldn't notice any issues.
Thank you for the great work.
~~Currently perplexity is not working in that PR.~~ ~~Running perplexity, it gets stuck after showing the 655 chunks, batch_size=512; the GPU is still working. Let me try waiting longer...~~ @SlyEcho it works, sorry, I didn't get it working earlier because I hadn't deleted all the other flags.
Llama 30B Q4_2 with F32: ETA 9h28m, [1] 3.2521, [2] 3.6665, [3] 4.3870, [4] 4.3477, [5] 4.2213
Without the F32 flag: ETA 8h47m, [1] 3.2520, [2] 3.6665, [3] 4.3869, [4] 4.3476, [5] 4.2213, [6] 4.2205, [7] 4.4011, [8] 4.4856, [9] 4.7332, [10] 4.9523, [11] 5.1126, [12] 5.1601, [13] 5.1378, [14] 5.2206, [15] 5.3794 ...... [100] 4.3098, and I decided to abort it 😅
So far, compared with the 30B Q4_1 result in the discussion post (30B Q4_1 result by Jason Titus), it stays accurate and performs better.
And my PC's test run with the ROCm suite 5.4.2 is below: 30B llama Q4_2, running with DAN.
Master 50cb666, OpenBLAS: real 1m29.206s, user 14m47.035s, sys 6m13.047s
Master 50cb666 with your ggml.c and ggml-cuda.cu, hipBLAS: real 0m57.723s, user 7m23.156s, sys 0m3.356s
Meanwhile, maybe it's better to mention that CXX also needs to be changed to hipcc.
Peak VRAM usage is about 1.4 GB; while running perplexity it is about 2 GB.
@slaren can you check in Cuda? Currently --memory_f32 is broken for me.
This --memory_f32 is working with gfx1035 (HSA override gfx1030), indeed the integrated 680M GPU.
More detail: I didn't set CXX=clang, I set CXX=hipcc. Maybe that's the reason?
I think the issue with --memory_f32 is resolved, for me at least. I will try to do a perplexity run.
Bonus picture: running on a Steam Deck with SteamOS. I have installed containerd so I don't have to install any ROCm stuff.

To achieve this, the env var HSA_OVERRIDE_GFX_VERSION=10.3.0 was used and llama.cpp built with GPU_TARGETS=gfx1030 because the native gfx1033 is not supported by rocBLAS yet. I'm sure a properly tuned rocBLAS build could be faster. Note that I have changed the GPU VRAM split in the BIOS to 4GB.
hipBLAS eval (plugged in 🔌): 49 ms per token.
CPU eval (🔋🔌): 118 ms per token.
OpenBLAS eval (🔋🔌): 84 ms per token.
I was trying to make it work on HIP too (here is my fork https://github.com/DGdev91/llama.cpp) but I wasn't able to: it got stuck after showing the "llama_model_load_internal" rows. I have the same problem with this code, so I guess the issue wasn't in the code, but in my own setup. Also, this solution is indeed much cleaner than mine, so let's just work on this PR. My GPU is an RX 5700 XT, and I use HSA_OVERRIDE_GFX_VERSION=10.3.0 too; it's a common workaround for pytorch-related programs as well, like StableDiffusion.
Any idea how I can figure out what is going on?
@DGdev91 that means it is crashing when trying to initialize HIP or hipBLAS.
What compiler did you use? The hipcc perl script is probably legacy and the integrated LLVM is the way to go, also the program should be linked with it and not GCC.
What is the GPU target that you used? Should be --offload-arch=gfx1030 if you want to use that.
The CMake file seems to be just broken.
EDIT: I forgot to mention, but when I managed to compile your code, it was running fine on the GPU :smiley:
You are right, but forget my fork, it was just an experiment. I already said I prefer your solution, and I had the same exact issue even there. If my code worked for you (after correcting the Makefile), we have another confirmation that it's an issue on my end. What is really weird is that it works just fine with StableDiffusion.
I suspect it has something to do with the GPU architecture that is being built. My Makefile changes will detect the GPU of your system but that may not work if you're overriding it on the command line. On the Steam Deck I had to build it for one specific one (gfx1030) because that's the one rocBLAS supports.
This is something that should happen automatically and not be on the user to fix. I need to figure it out.
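One way to check which architecture your GPU actually reports (just a suggestion, not something the Makefile runs for you):
rocminfo | grep -o 'gfx[0-9a-f]*' | sort -u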
I compiled it with make LLAMA_HIPBLAS=1 GPU_TARGETS=gfx1030 and launched export HSA_OVERRIDE_GFX_VERSION=10.3.0 before launching main. There must be something else.
Perplexity Testing for hipBLAS version
Code
Commit: 3a004b2a0166e412d8d54052c50bfd093611ad95
Models
I should mention that the Q4_0 models were converted some time ago so I don't know if they are "fresh" with the latest quantization fixes. The other ones I made recently from the F16 version.
find models -name 'llama-7b-*.bin' -exec sem -j4 shasum {} ';'
8c5fe788ceaf8077e505f8f43efaa8f8cfd6e3eb models/llama-7b-q4_0.bin
80a9d0bdf85dcddc83533a3aecf70eb9c542fdfa models/llama-7b-q4_2.bin
da9ebf470350d8912caa04bf54fc6aced8d9ef19 models/llama-7b-q4_1.bin
1cbe22cfd2600f4e3b2d247ed1b82504cde3be78 models/llama-7b-q4_3.bin
0512fdf961215612db5a47cb1f6539c55936523c models/llama-7b-f16.bin
Hardware
CPU: Intel Core i7 7700K (4c/8t), 4.7 GHz (OC)
RAM: 32 GB DDR4, 2666 MT/s
GPU: AMD Radeon Vega64 (8GB)
Arch Linux testing with:
OS: Arch Linux 6.2.11-arch1-1
BLAS: OpenBLAS 0.3.23-1
ROCm: 5.4.3
AMD official Docker with this Dockerfile:
rocm.Dockerfile
FROM rocm/dev-ubuntu-22.04
ARG GPU_TARGETS="gfx900"
ARG MAKE_JOBS=4
RUN apt-get update && \
apt-get --no-install-recommends install -y hipblas-dev
WORKDIR /app
COPY . ./
RUN make \
LLAMA_HIPBLAS=1 \
GPU_TARGETS="$GPU_TARGETS" \
-j $MAKE_JOBS \
main perplexity
STOPSIGNAL SIGKILL
ENV PATH="/app:$PATH"
CMD [ "main" ]
Compile with:
docker build -f ~/Desktop/rocm.Dockerfile . -t llama.cpp:rocm
Results
7B Q4_0, Arch: [655]6.2818
./build/bin/perplexity --no-mmap -m ./models/llama-7b-q4_0.bin -f ./models/wiki.test.raw
main: seed = 1682276609
llama.cpp: loading model from ./models/llama-7b-q4_0.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4113739.11 KB
llama_model_load_internal: mem required = 5809.32 MB (+ 1026.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size = 256.00 MB
system_info: n_threads = 8 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
15.40 seconds per pass - ETA 2 hours 48 minutes
[1]4.3749,[2]4.9540,[3]5.8254,[4]6.4669,[5]6.5409,[6]6.5395,[7]6.7155,[8]6.8046,[9]7.1737,[10]7.4103,[11]7.6549,[12]7.6926,[13]7.6022,[14]7.6783,[15]7.9331,[16]7.5386,[17]7.4157,[18]7.3768,[19]7.0052,[20]6.9921,[21]6.8947,[22]6.7102,[23]6.6723,[24]6.5850,[25]6.5848,[26]6.4125,[27]6.2326,[28]6.1317,[29]6.0477,[30]5.8916,[31]5.8634,[32]5.8812,[33]5.8164,[34]5.8511,[35]5.8769,[36]5.9208,[37]5.9247,[38]5.9419,[39]5.9800,[40]6.0387,[41]6.0458,[42]6.0802,[43]6.0373,[44]6.0921,[45]6.0965,[46]6.0707,[47]6.0944,[48]6.0652,[49]6.0722,[50]6.0328,[51]6.0287,[52]6.0177,[53]6.0619,[54]6.0454,[55]6.0230,[56]6.0572,[57]6.0803,[58]6.1021,[59]6.1159,[60]6.1624,[61]6.1512,[62]6.2143,[63]6.2479,[64]6.2630,[65]6.3095,[66]6.3197,[67]6.3378,[68]6.3518,[69]6.3767,[70]6.4090,[71]6.4305,[72]6.4602,[73]6.5254,[74]6.5308,[75]6.5453,[76]6.5616,[77]6.5749,[78]6.5597,[79]6.5892,[80]6.5817,[81]6.5943,[82]6.5980,[83]6.5443,[84]6.5297,[85]6.5182,[86]6.4971,[87]6.4318,[88]6.4033,[89]6.3827,[90]6.3661,[91]6.3922,[92]6.3884,[93]6.3909,[94]6.3884,[95]6.4171,[96]6.4150,[97]6.4078,[98]6.4007,[99]6.3867,[100]6.3867,[101]6.4126,[102]6.4062,[103]6.4280,[104]6.4347,[105]6.4333,[106]6.4510,[107]6.4497,[108]6.4621,[109]6.4568,[110]6.4523,[111]6.4751,[112]6.4941,[113]6.4955,[114]6.4921,[115]6.5003,[116]6.4930,[117]6.4985,[118]6.5270,[119]6.5479,[120]6.5844,[121]6.6007,[122]6.6254,[123]6.6644,[124]6.6822,[125]6.6735,[126]6.7126,[127]6.7497,[128]6.7772,[129]6.7603,[130]6.7699,[131]6.7646,[132]6.7558,[133]6.7430,[134]6.7542,[135]6.7507,[136]6.7376,[137]6.7296,[138]6.7125,[139]6.7009,[140]6.6979,[141]6.6681,[142]6.6633,[143]6.6354,[144]6.6153,[145]6.6066,[146]6.5931,[147]6.6006,[148]6.6029,[149]6.5969,[150]6.5928,[151]6.5940,[152]6.5845,[153]6.5678,[154]6.5587,[155]6.5655,[156]6.5605,[157]6.5789,[158]6.5824,[159]6.5866,[160]6.5892,[161]6.6017,[162]6.5716,[163]6.5595,[164]6.5334,[165]6.5015,[166]6.4728,[167]6.4354,[168]6.4028,[169]6.3893,[170]6.3768,[171]6.3479,[172]6.3298,[173]6.3113,[174]6.2805,[175]6.2584,[176]6.2482,[177]6.2271,[178]6.2036,[179]6.1864,[180]6.1775,[181]6.1551,[182]6.1359,[183]6.1217,[184]6.1215,[185]6.1142,[186]6.1160,[187]6.1214,[188]6.1178,[189]6.1362,[190]6.1371,[191]6.1575,[192]6.1738,[193]6.1916,[194]6.2032,[195]6.2242,[196]6.2412,[197]6.2633,[198]6.2788,[199]6.2818,[200]6.2863,[201]6.2822,[202]6.3027,[203]6.3093,[204]6.3092,[205]6.3201,[206]6.3279,[207]6.3239,[208]6.3323,[209]6.3375,[210]6.3426,[211]6.3524,[212]6.3598,[213]6.3704,[214]6.3739,[215]6.3780,[216]6.3927,[217]6.4106,[218]6.4241,[219]6.4244,[220]6.4208,[221]6.4145,[222]6.4110,[223]6.4002,[224]6.3935,[225]6.3888,[226]6.4103,[227]6.4191,[228]6.4249,[229]6.4317,[230]6.4273,[231]6.4441,[232]6.4311,[233]6.4140,[234]6.3983,[235]6.3825,[236]6.3747,[237]6.3644,[238]6.3678,[239]6.3516,[240]6.3413,[241]6.3446,[242]6.3483,[243]6.3468,[244]6.3348,[245]6.3322,[246]6.3201,[247]6.3077,[248]6.3010,[249]6.2989,[250]6.3037,[251]6.2960,[252]6.2927,[253]6.2824,[254]6.2784,[255]6.2668,[256]6.2477,[257]6.2366,[258]6.2279,[259]6.2259,[260]6.2178,[261]6.2135,[262]6.2076,[263]6.2030,[264]6.1838,[265]6.1829,[266]6.1814,[267]6.1745,[268]6.1842,[269]6.1822,[270]6.1828,[271]6.1906,[272]6.1952,[273]6.1948,[274]6.1962,[275]6.2052,[276]6.2107,[277]6.2267,[278]6.2375,[279]6.2461,[280]6.2497,[281]6.2596,[282]6.2656,[283]6.2803,[284]6.2881,[285]6.2975,[286]6.3122,[287]6.3116,[288]6.3176,[289]6.3085,[290]6.2934,[291]6.2780,[292]6.2622,[293]6.2484,[294]6.2509,[295]6.2503,[296]6.2547,[297]6.2533,[298]6.2559,[299]6.2531,[300]6.2418,[301]6.2419,[302]6.2339,[303]6.2262,[304]6.2184,[305]6.2159,[30
6]6.2027,[307]6.2051,[308]6.2084,[309]6.1921,[310]6.1860,[311]6.1796,[312]6.1818,[313]6.1762,[314]6.1749,[315]6.1584,[316]6.1541,[317]6.1375,[318]6.1159,[319]6.1278,[320]6.1408,[321]6.1446,[322]6.1401,[323]6.1335,[324]6.1310,[325]6.1410,[326]6.1410,[327]6.1431,[328]6.1473,[329]6.1533,[330]6.1559,[331]6.1682,[332]6.1651,[333]6.1720,[334]6.1662,[335]6.1597,[336]6.1635,[337]6.1605,[338]6.1592,[339]6.1534,[340]6.1491,[341]6.1568,[342]6.1593,[343]6.1648,[344]6.1648,[345]6.1647,[346]6.1619,[347]6.1666,[348]6.1708,[349]6.1726,[350]6.1692,[351]6.1698,[352]6.1698,[353]6.1646,[354]6.1644,[355]6.1699,[356]6.1729,[357]6.1693,[358]6.1783,[359]6.1814,[360]6.1777,[361]6.1772,[362]6.1839,[363]6.1951,[364]6.2016,[365]6.2074,[366]6.2081,[367]6.2169,[368]6.2147,[369]6.2156,[370]6.2166,[371]6.2106,[372]6.2159,[373]6.2215,[374]6.2202,[375]6.2198,[376]6.2282,[377]6.2233,[378]6.2259,[379]6.2319,[380]6.2235,[381]6.2192,[382]6.2135,[383]6.2125,[384]6.2118,[385]6.2105,[386]6.2100,[387]6.2092,[388]6.2047,[389]6.1993,[390]6.1924,[391]6.1843,[392]6.1803,[393]6.1784,[394]6.1810,[395]6.1793,[396]6.1720,[397]6.1795,[398]6.1833,[399]6.1916,[400]6.1912,[401]6.1926,[402]6.1932,[403]6.1950,[404]6.2014,[405]6.1918,[406]6.1884,[407]6.1877,[408]6.1887,[409]6.2011,[410]6.2121,[411]6.2246,[412]6.2408,[413]6.2524,[414]6.2599,[415]6.2652,[416]6.2732,[417]6.2863,[418]6.2897,[419]6.2971,[420]6.3058,[421]6.3179,[422]6.3236,[423]6.3308,[424]6.3428,[425]6.3519,[426]6.3583,[427]6.3628,[428]6.3711,[429]6.3756,[430]6.3846,[431]6.3992,[432]6.4035,[433]6.4022,[434]6.3976,[435]6.3983,[436]6.4008,[437]6.4102,[438]6.4181,[439]6.4145,[440]6.4140,[441]6.4089,[442]6.4080,[443]6.4093,[444]6.4096,[445]6.4076,[446]6.4100,[447]6.4129,[448]6.4172,[449]6.4145,[450]6.4148,[451]6.4106,[452]6.3987,[453]6.3904,[454]6.3843,[455]6.3851,[456]6.3898,[457]6.3915,[458]6.3894,[459]6.3903,[460]6.3990,[461]6.3962,[462]6.3946,[463]6.3997,[464]6.3988,[465]6.3957,[466]6.3877,[467]6.3879,[468]6.3878,[469]6.3900,[470]6.3905,[471]6.3858,[472]6.3904,[473]6.3848,[474]6.3861,[475]6.3802,[476]6.3826,[477]6.3754,[478]6.3745,[479]6.3808,[480]6.3860,[481]6.3880,[482]6.3834,[483]6.3793,[484]6.3815,[485]6.3798,[486]6.3743,[487]6.3743,[488]6.3724,[489]6.3674,[490]6.3647,[491]6.3616,[492]6.3558,[493]6.3528,[494]6.3510,[495]6.3507,[496]6.3473,[497]6.3419,[498]6.3402,[499]6.3351,[500]6.3255,[501]6.3185,[502]6.3184,[503]6.3181,[504]6.3088,[505]6.3113,[506]6.3122,[507]6.3060,[508]6.3018,[509]6.3007,[510]6.3046,[511]6.3092,[512]6.3127,[513]6.3145,[514]6.3212,[515]6.3156,[516]6.3149,[517]6.3159,[518]6.3160,[519]6.3190,[520]6.3218,[521]6.3234,[522]6.3263,[523]6.3273,[524]6.3336,[525]6.3373,[526]6.3385,[527]6.3405,[528]6.3351,[529]6.3355,[530]6.3308,[531]6.3298,[532]6.3347,[533]6.3370,[534]6.3351,[535]6.3374,[536]6.3320,[537]6.3297,[538]6.3345,[539]6.3357,[540]6.3397,[541]6.3405,[542]6.3412,[543]6.3426,[544]6.3438,[545]6.3417,[546]6.3423,[547]6.3378,[548]6.3323,[549]6.3325,[550]6.3298,[551]6.3260,[552]6.3239,[553]6.3197,[554]6.3175,[555]6.3146,[556]6.3143,[557]6.3166,[558]6.3126,[559]6.3122,[560]6.3117,[561]6.3118,[562]6.3100,[563]6.3100,[564]6.3143,[565]6.3160,[566]6.3157,[567]6.3135,[568]6.3140,[569]6.3124,[570]6.3150,[571]6.3156,[572]6.3166,[573]6.3168,[574]6.3132,[575]6.3127,[576]6.3126,[577]6.3115,[578]6.3095,[579]6.3103,[580]6.3037,[581]6.2999,[582]6.2989,[583]6.2997,[584]6.3001,[585]6.2924,[586]6.2856,[587]6.2858,[588]6.2908,[589]6.2966,[590]6.2996,[591]6.3018,[592]6.3003,[593]6.2966,[594]6.2977,[595]6.2954,[596]6.2991,[597]6.2967,[598]6.2930,[599]6.2952,[600]6.2949,[601]6.2935,[602]6
.2952,[603]6.2982,[604]6.2992,[605]6.3025,[606]6.3045,[607]6.3029,[608]6.2993,[609]6.3000,[610]6.3036,[611]6.3018,[612]6.3043,[613]6.3007,[614]6.2956,[615]6.2879,[616]6.2909,[617]6.2846,[618]6.2794,[619]6.2738,[620]6.2595,[621]6.2523,[622]6.2506,[623]6.2521,[624]6.2525,[625]6.2525,[626]6.2510,[627]6.2531,[628]6.2536,[629]6.2533,[630]6.2567,[631]6.2631,[632]6.2685,[633]6.2668,[634]6.2701,[635]6.2707,[636]6.2674,[637]6.2640,[638]6.2666,[639]6.2637,[640]6.2647,[641]6.2650,[642]6.2718,[643]6.2740,[644]6.2752,[645]6.2731,[646]6.2774,[647]6.2735,[648]6.2742,[649]6.2743,[650]6.2782,[651]6.2839,[652]6.2846,[653]6.2889,[654]6.2825,[655]6.2818,
llama_print_timings: load time = 16830.86 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 3533301.29 ms / 335360 tokens ( 10.54 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 3572772.20 ms
7B Q4_0 --memory_f32, Arch: [655]6.2838,
./build/bin/perplexity --no-mmap -m ./models/llama-7b-q4_0.bin --memory_f32 -f ./models/wiki.test.raw
main: seed = 1682280920
llama.cpp: loading model from ./models/llama-7b-q4_0.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4113739.11 KB
llama_model_load_internal: mem required = 5809.32 MB (+ 2052.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size = 512.00 MB
system_info: n_threads = 8 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
15.63 seconds per pass - ETA 2 hours 50 minutes
[1]4.3801,[2]4.9556,[3]5.8270,[4]6.4693,[5]6.5437,[6]6.5414,[7]6.7176,[8]6.8070,[9]7.1757,[10]7.4122,[11]7.6567,[12]7.6957,[13]7.6057,[14]7.6821,[15]7.9367,[16]7.5419,[17]7.4189,[18]7.3798,[19]7.0077,[20]6.9948,[21]6.8969,[22]6.7124,[23]6.6743,[24]6.5868,[25]6.5871,[26]6.4149,[27]6.2349,[28]6.1341,[29]6.0498,[30]5.8938,[31]5.8659,[32]5.8839,[33]5.8189,[34]5.8538,[35]5.8795,[36]5.9233,[37]5.9272,[38]5.9444,[39]5.9825,[40]6.0413,[41]6.0483,[42]6.0827,[43]6.0398,[44]6.0944,[45]6.0989,[46]6.0730,[47]6.0968,[48]6.0675,[49]6.0746,[50]6.0352,[51]6.0310,[52]6.0201,[53]6.0642,[54]6.0477,[55]6.0251,[56]6.0595,[57]6.0825,[58]6.1044,[59]6.1183,[60]6.1648,[61]6.1536,[62]6.2166,[63]6.2503,[64]6.2654,[65]6.3119,[66]6.3221,[67]6.3402,[68]6.3541,[69]6.3791,[70]6.4114,[71]6.4328,[72]6.4626,[73]6.5277,[74]6.5331,[75]6.5475,[76]6.5638,[77]6.5771,[78]6.5619,[79]6.5915,[80]6.5839,[81]6.5968,[82]6.6005,[83]6.5468,[84]6.5323,[85]6.5209,[86]6.4997,[87]6.4344,[88]6.4059,[89]6.3854,[90]6.3688,[91]6.3949,[92]6.3910,[93]6.3936,[94]6.3911,[95]6.4198,[96]6.4178,[97]6.4106,[98]6.4036,[99]6.3896,[100]6.3896,[101]6.4155,[102]6.4091,[103]6.4309,[104]6.4376,[105]6.4362,[106]6.4539,[107]6.4526,[108]6.4649,[109]6.4596,[110]6.4551,[111]6.4779,[112]6.4970,[113]6.4984,[114]6.4950,[115]6.5033,[116]6.4959,[117]6.5014,[118]6.5299,[119]6.5508,[120]6.5872,[121]6.6035,[122]6.6283,[123]6.6673,[124]6.6850,[125]6.6763,[126]6.7154,[127]6.7524,[128]6.7799,[129]6.7630,[130]6.7725,[131]6.7673,[132]6.7584,[133]6.7457,[134]6.7568,[135]6.7534,[136]6.7402,[137]6.7322,[138]6.7151,[139]6.7035,[140]6.7005,[141]6.6707,[142]6.6659,[143]6.6379,[144]6.6178,[145]6.6092,[146]6.5957,[147]6.6031,[148]6.6054,[149]6.5994,[150]6.5953,[151]6.5965,[152]6.5870,[153]6.5703,[154]6.5613,[155]6.5680,[156]6.5630,[157]6.5813,[158]6.5849,[159]6.5890,[160]6.5916,[161]6.6041,[162]6.5739,[163]6.5619,[164]6.5357,[165]6.5039,[166]6.4751,[167]6.4377,[168]6.4051,[169]6.3916,[170]6.3791,[171]6.3502,[172]6.3322,[173]6.3136,[174]6.2829,[175]6.2607,[176]6.2505,[177]6.2295,[178]6.2059,[179]6.1887,[180]6.1798,[181]6.1574,[182]6.1382,[183]6.1240,[184]6.1238,[185]6.1165,[186]6.1182,[187]6.1237,[188]6.1200,[189]6.1384,[190]6.1393,[191]6.1597,[192]6.1761,[193]6.1938,[194]6.2054,[195]6.2264,[196]6.2434,[197]6.2655,[198]6.2811,[199]6.2840,[200]6.2886,[201]6.2844,[202]6.3049,[203]6.3115,[204]6.3114,[205]6.3224,[206]6.3302,[207]6.3262,[208]6.3347,[209]6.3398,[210]6.3449,[211]6.3547,[212]6.3621,[213]6.3727,[214]6.3763,[215]6.3803,[216]6.3951,[217]6.4129,[218]6.4264,[219]6.4267,[220]6.4231,[221]6.4168,[222]6.4133,[223]6.4024,[224]6.3958,[225]6.3910,[226]6.4126,[227]6.4212,[228]6.4271,[229]6.4338,[230]6.4294,[231]6.4463,[232]6.4332,[233]6.4160,[234]6.4004,[235]6.3846,[236]6.3768,[237]6.3664,[238]6.3698,[239]6.3536,[240]6.3433,[241]6.3466,[242]6.3504,[243]6.3488,[244]6.3368,[245]6.3342,[246]6.3221,[247]6.3098,[248]6.3030,[249]6.3010,[250]6.3057,[251]6.2981,[252]6.2947,[253]6.2844,[254]6.2804,[255]6.2688,[256]6.2497,[257]6.2386,[258]6.2299,[259]6.2279,[260]6.2197,[261]6.2154,[262]6.2095,[263]6.2050,[264]6.1858,[265]6.1850,[266]6.1835,[267]6.1766,[268]6.1863,[269]6.1843,[270]6.1850,[271]6.1928,[272]6.1974,[273]6.1969,[274]6.1983,[275]6.2073,[276]6.2128,[277]6.2288,[278]6.2397,[279]6.2483,[280]6.2518,[281]6.2617,[282]6.2678,[283]6.2825,[284]6.2902,[285]6.2997,[286]6.3144,[287]6.3138,[288]6.3198,[289]6.3107,[290]6.2956,[291]6.2802,[292]6.2644,[293]6.2505,[294]6.2530,[295]6.2524,[296]6.2567,[297]6.2553,[298]6.2579,[299]6.2551,[300]6.2439,[301]6.2440,[302]6.2359,[303]6.2282,[304]6.2204,[305]6.2180,[30
6]6.2047,[307]6.2072,[308]6.2104,[309]6.1941,[310]6.1880,[311]6.1816,[312]6.1838,[313]6.1782,[314]6.1769,[315]6.1604,[316]6.1562,[317]6.1395,[318]6.1179,[319]6.1298,[320]6.1428,[321]6.1466,[322]6.1422,[323]6.1355,[324]6.1331,[325]6.1431,[326]6.1430,[327]6.1451,[328]6.1494,[329]6.1554,[330]6.1579,[331]6.1703,[332]6.1671,[333]6.1741,[334]6.1682,[335]6.1618,[336]6.1655,[337]6.1625,[338]6.1612,[339]6.1555,[340]6.1511,[341]6.1589,[342]6.1614,[343]6.1669,[344]6.1668,[345]6.1667,[346]6.1638,[347]6.1686,[348]6.1727,[349]6.1746,[350]6.1712,[351]6.1717,[352]6.1717,[353]6.1665,[354]6.1664,[355]6.1718,[356]6.1749,[357]6.1712,[358]6.1802,[359]6.1833,[360]6.1795,[361]6.1791,[362]6.1858,[363]6.1970,[364]6.2035,[365]6.2093,[366]6.2100,[367]6.2188,[368]6.2166,[369]6.2175,[370]6.2185,[371]6.2125,[372]6.2178,[373]6.2234,[374]6.2221,[375]6.2217,[376]6.2301,[377]6.2252,[378]6.2278,[379]6.2338,[380]6.2254,[381]6.2211,[382]6.2154,[383]6.2144,[384]6.2137,[385]6.2124,[386]6.2119,[387]6.2111,[388]6.2066,[389]6.2012,[390]6.1943,[391]6.1862,[392]6.1822,[393]6.1803,[394]6.1828,[395]6.1812,[396]6.1738,[397]6.1814,[398]6.1852,[399]6.1935,[400]6.1931,[401]6.1945,[402]6.1950,[403]6.1969,[404]6.2032,[405]6.1937,[406]6.1903,[407]6.1895,[408]6.1905,[409]6.2029,[410]6.2139,[411]6.2264,[412]6.2427,[413]6.2542,[414]6.2618,[415]6.2670,[416]6.2750,[417]6.2881,[418]6.2916,[419]6.2990,[420]6.3077,[421]6.3197,[422]6.3255,[423]6.3326,[424]6.3446,[425]6.3537,[426]6.3602,[427]6.3647,[428]6.3730,[429]6.3775,[430]6.3865,[431]6.4011,[432]6.4054,[433]6.4041,[434]6.3995,[435]6.4002,[436]6.4027,[437]6.4121,[438]6.4200,[439]6.4164,[440]6.4158,[441]6.4108,[442]6.4099,[443]6.4112,[444]6.4115,[445]6.4095,[446]6.4118,[447]6.4147,[448]6.4191,[449]6.4164,[450]6.4167,[451]6.4124,[452]6.4006,[453]6.3922,[454]6.3862,[455]6.3869,[456]6.3917,[457]6.3934,[458]6.3912,[459]6.3922,[460]6.4009,[461]6.3981,[462]6.3965,[463]6.4016,[464]6.4007,[465]6.3976,[466]6.3895,[467]6.3898,[468]6.3897,[469]6.3919,[470]6.3924,[471]6.3876,[472]6.3923,[473]6.3866,[474]6.3880,[475]6.3821,[476]6.3844,[477]6.3773,[478]6.3764,[479]6.3827,[480]6.3879,[481]6.3899,[482]6.3854,[483]6.3813,[484]6.3835,[485]6.3818,[486]6.3763,[487]6.3763,[488]6.3744,[489]6.3694,[490]6.3668,[491]6.3637,[492]6.3579,[493]6.3549,[494]6.3531,[495]6.3528,[496]6.3493,[497]6.3440,[498]6.3422,[499]6.3372,[500]6.3275,[501]6.3206,[502]6.3204,[503]6.3202,[504]6.3109,[505]6.3134,[506]6.3143,[507]6.3081,[508]6.3038,[509]6.3027,[510]6.3067,[511]6.3113,[512]6.3148,[513]6.3166,[514]6.3233,[515]6.3177,[516]6.3169,[517]6.3180,[518]6.3181,[519]6.3211,[520]6.3238,[521]6.3255,[522]6.3284,[523]6.3294,[524]6.3357,[525]6.3394,[526]6.3406,[527]6.3426,[528]6.3372,[529]6.3376,[530]6.3329,[531]6.3319,[532]6.3368,[533]6.3391,[534]6.3372,[535]6.3395,[536]6.3341,[537]6.3318,[538]6.3366,[539]6.3378,[540]6.3417,[541]6.3426,[542]6.3433,[543]6.3447,[544]6.3459,[545]6.3437,[546]6.3444,[547]6.3398,[548]6.3343,[549]6.3345,[550]6.3318,[551]6.3280,[552]6.3260,[553]6.3217,[554]6.3195,[555]6.3166,[556]6.3163,[557]6.3186,[558]6.3146,[559]6.3142,[560]6.3137,[561]6.3139,[562]6.3120,[563]6.3120,[564]6.3163,[565]6.3180,[566]6.3177,[567]6.3155,[568]6.3160,[569]6.3144,[570]6.3170,[571]6.3176,[572]6.3186,[573]6.3188,[574]6.3151,[575]6.3147,[576]6.3145,[577]6.3135,[578]6.3114,[579]6.3122,[580]6.3056,[581]6.3018,[582]6.3008,[583]6.3016,[584]6.3020,[585]6.2943,[586]6.2875,[587]6.2878,[588]6.2927,[589]6.2985,[590]6.3015,[591]6.3037,[592]6.3022,[593]6.2985,[594]6.2996,[595]6.2973,[596]6.3010,[597]6.2987,[598]6.2949,[599]6.2971,[600]6.2969,[601]6.2954,[602]6
.2971,[603]6.3001,[604]6.3012,[605]6.3044,[606]6.3065,[607]6.3048,[608]6.3013,[609]6.3019,[610]6.3056,[611]6.3037,[612]6.3062,[613]6.3026,[614]6.2975,[615]6.2898,[616]6.2928,[617]6.2865,[618]6.2814,[619]6.2757,[620]6.2615,[621]6.2542,[622]6.2525,[623]6.2540,[624]6.2545,[625]6.2544,[626]6.2529,[627]6.2550,[628]6.2555,[629]6.2552,[630]6.2586,[631]6.2650,[632]6.2704,[633]6.2687,[634]6.2720,[635]6.2726,[636]6.2694,[637]6.2659,[638]6.2686,[639]6.2657,[640]6.2666,[641]6.2669,[642]6.2738,[643]6.2759,[644]6.2772,[645]6.2750,[646]6.2793,[647]6.2755,[648]6.2761,[649]6.2762,[650]6.2801,[651]6.2858,[652]6.2865,[653]6.2908,[654]6.2844,[655]6.2838,
llama_print_timings: load time = 17052.08 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 4082758.39 ms / 335360 tokens ( 12.17 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 4118136.22 ms
7B Q4_0, Docker: [655]6.2819,
docker run -it --rm -v$PWD/models:/models --device /dev/dri --device /dev/kfd llama.cpp:rocm perplexity -m /models/llama-7b-q4_0.bin --no-mmap -f /models/wiki.test.raw
main: seed = 1682287852
llama.cpp: loading model from /models/llama-7b-q4_0.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4113739.11 KB
llama_model_load_internal: mem required = 5809.32 MB (+ 1026.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size = 256.00 MB
system_info: n_threads = 8 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
5.45 seconds per pass - ETA 59 minutes
[1]4.3749,[2]4.9542,[3]5.8256,[4]6.4671,[5]6.5411,[6]6.5396,[7]6.7157,[8]6.8048,[9]7.1739,[10]7.4104,[11]7.6551,[12]7.6928,[13]7.6023,[14]7.6784,[15]7.9332,[16]7.5387,[17]7.4157,[18]7.3769,[19]7.0053,[20]6.9922,[21]6.8948,[22]6.7103,[23]6.6723,[24]6.5850,[25]6.5849,[26]6.4126,[27]6.2327,[28]6.1319,[29]6.0478,[30]5.8917,[31]5.8635,[32]5.8813,[33]5.8165,[34]5.8513,[35]5.8770,[36]5.9209,[37]5.9248,[38]5.9420,[39]5.9801,[40]6.0388,[41]6.0459,[42]6.0803,[43]6.0375,[44]6.0922,[45]6.0966,[46]6.0707,[47]6.0945,[48]6.0653,[49]6.0723,[50]6.0329,[51]6.0287,[52]6.0178,[53]6.0619,[54]6.0455,[55]6.0231,[56]6.0573,[57]6.0803,[58]6.1021,[59]6.1160,[60]6.1625,[61]6.1512,[62]6.2143,[63]6.2479,[64]6.2631,[65]6.3095,[66]6.3198,[67]6.3379,[68]6.3518,[69]6.3767,[70]6.4090,[71]6.4305,[72]6.4603,[73]6.5255,[74]6.5309,[75]6.5453,[76]6.5617,[77]6.5750,[78]6.5597,[79]6.5892,[80]6.5817,[81]6.5943,[82]6.5980,[83]6.5443,[84]6.5297,[85]6.5182,[86]6.4971,[87]6.4318,[88]6.4033,[89]6.3828,[90]6.3662,[91]6.3922,[92]6.3884,[93]6.3909,[94]6.3885,[95]6.4172,[96]6.4151,[97]6.4078,[98]6.4007,[99]6.3868,[100]6.3867,[101]6.4126,[102]6.4063,[103]6.4280,[104]6.4347,[105]6.4333,[106]6.4510,[107]6.4497,[108]6.4621,[109]6.4568,[110]6.4523,[111]6.4752,[112]6.4942,[113]6.4955,[114]6.4921,[115]6.5004,[116]6.4931,[117]6.4986,[118]6.5271,[119]6.5480,[120]6.5844,[121]6.6008,[122]6.6255,[123]6.6645,[124]6.6823,[125]6.6736,[126]6.7127,[127]6.7497,[128]6.7772,[129]6.7604,[130]6.7699,[131]6.7647,[132]6.7558,[133]6.7431,[134]6.7543,[135]6.7508,[136]6.7377,[137]6.7297,[138]6.7126,[139]6.7009,[140]6.6979,[141]6.6682,[142]6.6633,[143]6.6354,[144]6.6154,[145]6.6067,[146]6.5931,[147]6.6007,[148]6.6030,[149]6.5969,[150]6.5928,[151]6.5940,[152]6.5846,[153]6.5678,[154]6.5587,[155]6.5655,[156]6.5605,[157]6.5789,[158]6.5824,[159]6.5866,[160]6.5893,[161]6.6017,[162]6.5716,[163]6.5596,[164]6.5334,[165]6.5016,[166]6.4728,[167]6.4355,[168]6.4028,[169]6.3893,[170]6.3768,[171]6.3479,[172]6.3299,[173]6.3113,[174]6.2806,[175]6.2585,[176]6.2483,[177]6.2272,[178]6.2037,[179]6.1865,[180]6.1776,[181]6.1552,[182]6.1360,[183]6.1217,[184]6.1216,[185]6.1143,[186]6.1160,[187]6.1215,[188]6.1178,[189]6.1363,[190]6.1371,[191]6.1575,[192]6.1739,[193]6.1916,[194]6.2033,[195]6.2243,[196]6.2413,[197]6.2633,[198]6.2789,[199]6.2818,[200]6.2864,[201]6.2822,[202]6.3027,[203]6.3093,[204]6.3092,[205]6.3201,[206]6.3279,[207]6.3239,[208]6.3324,[209]6.3376,[210]6.3426,[211]6.3524,[212]6.3598,[213]6.3704,[214]6.3740,[215]6.3780,[216]6.3928,[217]6.4107,[218]6.4241,[219]6.4244,[220]6.4209,[221]6.4146,[222]6.4110,[223]6.4002,[224]6.3936,[225]6.3888,[226]6.4104,[227]6.4191,[228]6.4250,[229]6.4317,[230]6.4273,[231]6.4442,[232]6.4311,[233]6.4140,[234]6.3984,[235]6.3825,[236]6.3748,[237]6.3644,[238]6.3678,[239]6.3516,[240]6.3413,[241]6.3446,[242]6.3484,[243]6.3468,[244]6.3349,[245]6.3323,[246]6.3201,[247]6.3078,[248]6.3010,[249]6.2990,[250]6.3037,[251]6.2961,[252]6.2927,[253]6.2825,[254]6.2785,[255]6.2669,[256]6.2477,[257]6.2367,[258]6.2280,[259]6.2260,[260]6.2178,[261]6.2135,[262]6.2076,[263]6.2031,[264]6.1838,[265]6.1830,[266]6.1815,[267]6.1745,[268]6.1842,[269]6.1822,[270]6.1829,[271]6.1907,[272]6.1953,[273]6.1948,[274]6.1963,[275]6.2052,[276]6.2107,[277]6.2268,[278]6.2376,[279]6.2462,[280]6.2497,[281]6.2596,[282]6.2657,[283]6.2804,[284]6.2882,[285]6.2976,[286]6.3123,[287]6.3117,[288]6.3177,[289]6.3086,[290]6.2935,[291]6.2781,[292]6.2623,[293]6.2485,[294]6.2509,[295]6.2504,[296]6.2547,[297]6.2533,[298]6.2559,[299]6.2531,[300]6.2419,[301]6.2420,[302]6.2339,[303]6.2262,[304]6.2184,[305]6.2160,[30
6]6.2028,[307]6.2052,[308]6.2084,[309]6.1921,[310]6.1860,[311]6.1796,[312]6.1819,[313]6.1763,[314]6.1750,[315]6.1585,[316]6.1542,[317]6.1375,[318]6.1159,[319]6.1278,[320]6.1409,[321]6.1447,[322]6.1402,[323]6.1335,[324]6.1311,[325]6.1411,[326]6.1410,[327]6.1432,[328]6.1474,[329]6.1534,[330]6.1559,[331]6.1683,[332]6.1651,[333]6.1721,[334]6.1662,[335]6.1598,[336]6.1635,[337]6.1605,[338]6.1592,[339]6.1535,[340]6.1492,[341]6.1569,[342]6.1594,[343]6.1649,[344]6.1648,[345]6.1648,[346]6.1619,[347]6.1666,[348]6.1708,[349]6.1727,[350]6.1693,[351]6.1698,[352]6.1698,[353]6.1646,[354]6.1644,[355]6.1699,[356]6.1730,[357]6.1693,[358]6.1784,[359]6.1815,[360]6.1777,[361]6.1773,[362]6.1839,[363]6.1951,[364]6.2016,[365]6.2075,[366]6.2082,[367]6.2169,[368]6.2147,[369]6.2156,[370]6.2167,[371]6.2107,[372]6.2159,[373]6.2216,[374]6.2202,[375]6.2199,[376]6.2283,[377]6.2234,[378]6.2259,[379]6.2320,[380]6.2235,[381]6.2193,[382]6.2135,[383]6.2125,[384]6.2118,[385]6.2106,[386]6.2100,[387]6.2092,[388]6.2047,[389]6.1993,[390]6.1924,[391]6.1844,[392]6.1803,[393]6.1784,[394]6.1810,[395]6.1794,[396]6.1720,[397]6.1795,[398]6.1834,[399]6.1917,[400]6.1913,[401]6.1927,[402]6.1932,[403]6.1951,[404]6.2014,[405]6.1919,[406]6.1885,[407]6.1877,[408]6.1887,[409]6.2011,[410]6.2121,[411]6.2246,[412]6.2409,[413]6.2524,[414]6.2600,[415]6.2652,[416]6.2732,[417]6.2863,[418]6.2898,[419]6.2972,[420]6.3058,[421]6.3179,[422]6.3237,[423]6.3308,[424]6.3428,[425]6.3519,[426]6.3583,[427]6.3628,[428]6.3711,[429]6.3756,[430]6.3846,[431]6.3992,[432]6.4035,[433]6.4022,[434]6.3976,[435]6.3983,[436]6.4008,[437]6.4102,[438]6.4181,[439]6.4146,[440]6.4140,[441]6.4089,[442]6.4080,[443]6.4094,[444]6.4097,[445]6.4076,[446]6.4100,[447]6.4129,[448]6.4172,[449]6.4145,[450]6.4149,[451]6.4106,[452]6.3987,[453]6.3904,[454]6.3843,[455]6.3851,[456]6.3898,[457]6.3915,[458]6.3894,[459]6.3904,[460]6.3991,[461]6.3962,[462]6.3946,[463]6.3997,[464]6.3988,[465]6.3958,[466]6.3877,[467]6.3880,[468]6.3879,[469]6.3901,[470]6.3906,[471]6.3858,[472]6.3904,[473]6.3848,[474]6.3862,[475]6.3803,[476]6.3826,[477]6.3754,[478]6.3745,[479]6.3808,[480]6.3860,[481]6.3880,[482]6.3835,[483]6.3793,[484]6.3816,[485]6.3798,[486]6.3743,[487]6.3743,[488]6.3724,[489]6.3674,[490]6.3647,[491]6.3617,[492]6.3559,[493]6.3528,[494]6.3510,[495]6.3508,[496]6.3473,[497]6.3419,[498]6.3402,[499]6.3352,[500]6.3255,[501]6.3185,[502]6.3184,[503]6.3182,[504]6.3088,[505]6.3113,[506]6.3122,[507]6.3061,[508]6.3018,[509]6.3007,[510]6.3046,[511]6.3092,[512]6.3127,[513]6.3146,[514]6.3212,[515]6.3157,[516]6.3149,[517]6.3159,[518]6.3160,[519]6.3190,[520]6.3218,[521]6.3234,[522]6.3263,[523]6.3274,[524]6.3336,[525]6.3373,[526]6.3385,[527]6.3405,[528]6.3351,[529]6.3356,[530]6.3308,[531]6.3298,[532]6.3347,[533]6.3370,[534]6.3351,[535]6.3375,[536]6.3321,[537]6.3297,[538]6.3345,[539]6.3357,[540]6.3397,[541]6.3406,[542]6.3413,[543]6.3426,[544]6.3438,[545]6.3417,[546]6.3423,[547]6.3378,[548]6.3323,[549]6.3325,[550]6.3298,[551]6.3260,[552]6.3240,[553]6.3197,[554]6.3175,[555]6.3146,[556]6.3143,[557]6.3166,[558]6.3126,[559]6.3122,[560]6.3117,[561]6.3119,[562]6.3100,[563]6.3100,[564]6.3144,[565]6.3161,[566]6.3158,[567]6.3135,[568]6.3141,[569]6.3124,[570]6.3150,[571]6.3157,[572]6.3167,[573]6.3168,[574]6.3132,[575]6.3127,[576]6.3126,[577]6.3116,[578]6.3095,[579]6.3103,[580]6.3037,[581]6.2999,[582]6.2990,[583]6.2997,[584]6.3001,[585]6.2924,[586]6.2856,[587]6.2858,[588]6.2908,[589]6.2966,[590]6.2996,[591]6.3018,[592]6.3003,[593]6.2966,[594]6.2977,[595]6.2954,[596]6.2991,[597]6.2968,[598]6.2930,[599]6.2952,[600]6.2950,[601]6.2935,[602]6
.2953,[603]6.2982,[604]6.2992,[605]6.3025,[606]6.3046,[607]6.3029,[608]6.2994,[609]6.3000,[610]6.3037,[611]6.3018,[612]6.3043,[613]6.3007,[614]6.2956,[615]6.2879,[616]6.2909,[617]6.2846,[618]6.2795,[619]6.2738,[620]6.2596,[621]6.2524,[622]6.2506,[623]6.2521,[624]6.2526,[625]6.2525,[626]6.2510,[627]6.2531,[628]6.2536,[629]6.2534,[630]6.2568,[631]6.2631,[632]6.2685,[633]6.2668,[634]6.2701,[635]6.2707,[636]6.2675,[637]6.2640,[638]6.2667,[639]6.2638,[640]6.2647,[641]6.2650,[642]6.2719,[643]6.2740,[644]6.2753,[645]6.2732,[646]6.2774,[647]6.2736,[648]6.2743,[649]6.2744,[650]6.2782,[651]6.2839,[652]6.2846,[653]6.2889,[654]6.2825,[655]6.2819,
llama_print_timings: load time = 6811.20 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 3334178.62 ms / 335360 tokens ( 9.94 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 3367197.45 ms
7B Q4_0 --memory_f32, Docker: [655]6.2838,
docker run -it --rm -v$PWD/models:/models --device /dev/dri --device /dev/kfd llama.cpp:rocm perplexity -m /models/llama-7b-q4_0.bin --memory_f32 -f /models/wiki.test.raw
main: seed = 1682331507
llama.cpp: loading model from /models/llama-7b-q4_0.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 59.11 KB
llama_model_load_internal: mem required = 5809.32 MB (+ 2052.00 MB per state)
llama_init_from_file: kv self size = 512.00 MB
system_info: n_threads = 8 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
6.72 seconds per pass - ETA 1 hours 13 minutes
[1]4.3801,[2]4.9555,[3]5.8269,[4]6.4692,[5]6.5436,[6]6.5413,[7]6.7175,[8]6.8070,[9]7.1756,[10]7.4121,[11]7.6567,[12]7.6957,[13]7.6057,[14]7.6821,[15]7.9367,[16]7.5419,[17]7.4189,[18]7.3798,[19]7.0077,[20]6.9948,[21]6.8969,[22]6.7124,[23]6.6744,[24]6.5868,[25]6.5871,[26]6.4149,[27]6.2349,[28]6.1341,[29]6.0499,[30]5.8939,[31]5.8660,[32]5.8840,[33]5.8189,[34]5.8538,[35]5.8795,[36]5.9233,[37]5.9272,[38]5.9444,[39]5.9825,[40]6.0413,[41]6.0483,[42]6.0827,[43]6.0398,[44]6.0945,[45]6.0989,[46]6.0730,[47]6.0968,[48]6.0675,[49]6.0746,[50]6.0352,[51]6.0311,[52]6.0201,[53]6.0642,[54]6.0477,[55]6.0251,[56]6.0595,[57]6.0826,[58]6.1044,[59]6.1183,[60]6.1648,[61]6.1537,[62]6.2167,[63]6.2503,[64]6.2654,[65]6.3119,[66]6.3221,[67]6.3402,[68]6.3542,[69]6.3791,[70]6.4114,[71]6.4328,[72]6.4626,[73]6.5278,[74]6.5331,[75]6.5475,[76]6.5638,[77]6.5771,[78]6.5619,[79]6.5915,[80]6.5840,[81]6.5968,[82]6.6005,[83]6.5468,[84]6.5323,[85]6.5209,[86]6.4998,[87]6.4344,[88]6.4060,[89]6.3854,[90]6.3688,[91]6.3949,[92]6.3910,[93]6.3936,[94]6.3911,[95]6.4198,[96]6.4178,[97]6.4106,[98]6.4036,[99]6.3896,[100]6.3896,[101]6.4155,[102]6.4091,[103]6.4309,[104]6.4377,[105]6.4362,[106]6.4539,[107]6.4526,[108]6.4649,[109]6.4596,[110]6.4551,[111]6.4780,[112]6.4970,[113]6.4984,[114]6.4950,[115]6.5033,[116]6.4959,[117]6.5014,[118]6.5299,[119]6.5508,[120]6.5872,[121]6.6035,[122]6.6283,[123]6.6673,[124]6.6850,[125]6.6763,[126]6.7154,[127]6.7524,[128]6.7799,[129]6.7630,[130]6.7725,[131]6.7673,[132]6.7585,[133]6.7457,[134]6.7569,[135]6.7534,[136]6.7402,[137]6.7322,[138]6.7151,[139]6.7035,[140]6.7005,[141]6.6707,[142]6.6659,[143]6.6379,[144]6.6178,[145]6.6092,[146]6.5957,[147]6.6032,[148]6.6054,[149]6.5994,[150]6.5953,[151]6.5965,[152]6.5870,[153]6.5703,[154]6.5613,[155]6.5680,[156]6.5630,[157]6.5814,[158]6.5849,[159]6.5891,[160]6.5916,[161]6.6041,[162]6.5739,[163]6.5619,[164]6.5357,[165]6.5039,[166]6.4751,[167]6.4378,[168]6.4051,[169]6.3916,[170]6.3791,[171]6.3502,[172]6.3322,[173]6.3136,[174]6.2829,[175]6.2608,[176]6.2505,[177]6.2295,[178]6.2059,[179]6.1887,[180]6.1798,[181]6.1574,[182]6.1382,[183]6.1240,[184]6.1238,[185]6.1165,[186]6.1182,[187]6.1237,[188]6.1200,[189]6.1384,[190]6.1393,[191]6.1597,[192]6.1761,[193]6.1938,[194]6.2054,[195]6.2264,[196]6.2434,[197]6.2655,[198]6.2811,[199]6.2840,[200]6.2886,[201]6.2844,[202]6.3049,[203]6.3115,[204]6.3114,[205]6.3224,[206]6.3302,[207]6.3262,[208]6.3347,[209]6.3398,[210]6.3449,[211]6.3547,[212]6.3621,[213]6.3727,[214]6.3763,[215]6.3803,[216]6.3951,[217]6.4129,[218]6.4264,[219]6.4267,[220]6.4231,[221]6.4168,[222]6.4133,[223]6.4024,[224]6.3958,[225]6.3910,[226]6.4126,[227]6.4212,[228]6.4271,[229]6.4338,[230]6.4294,[231]6.4463,[232]6.4332,[233]6.4160,[234]6.4004,[235]6.3846,[236]6.3768,[237]6.3664,[238]6.3698,[239]6.3536,[240]6.3433,[241]6.3466,[242]6.3504,[243]6.3488,[244]6.3368,[245]6.3342,[246]6.3221,[247]6.3098,[248]6.3030,[249]6.3010,[250]6.3057,[251]6.2981,[252]6.2947,[253]6.2844,[254]6.2804,[255]6.2688,[256]6.2497,[257]6.2386,[258]6.2299,[259]6.2279,[260]6.2197,[261]6.2154,[262]6.2095,[263]6.2050,[264]6.1858,[265]6.1850,[266]6.1835,[267]6.1766,[268]6.1863,[269]6.1843,[270]6.1850,[271]6.1928,[272]6.1974,[273]6.1969,[274]6.1983,[275]6.2073,[276]6.2128,[277]6.2288,[278]6.2397,[279]6.2483,[280]6.2518,[281]6.2617,[282]6.2678,[283]6.2825,[284]6.2903,[285]6.2997,[286]6.3144,[287]6.3138,[288]6.3198,[289]6.3107,[290]6.2956,[291]6.2802,[292]6.2644,[293]6.2505,[294]6.2530,[295]6.2524,[296]6.2567,[297]6.2553,[298]6.2579,[299]6.2551,[300]6.2439,[301]6.2440,[302]6.2359,[303]6.2282,[304]6.2204,[305]6.2180,[30
6]6.2047,[307]6.2072,[308]6.2104,[309]6.1941,[310]6.1880,[311]6.1816,[312]6.1838,[313]6.1782,[314]6.1769,[315]6.1604,[316]6.1562,[317]6.1395,[318]6.1179,[319]6.1298,[320]6.1429,[321]6.1466,[322]6.1422,[323]6.1356,[324]6.1331,[325]6.1431,[326]6.1430,[327]6.1451,[328]6.1494,[329]6.1554,[330]6.1579,[331]6.1703,[332]6.1671,[333]6.1741,[334]6.1682,[335]6.1618,[336]6.1655,[337]6.1625,[338]6.1612,[339]6.1555,[340]6.1511,[341]6.1589,[342]6.1614,[343]6.1669,[344]6.1668,[345]6.1667,[346]6.1638,[347]6.1686,[348]6.1727,[349]6.1746,[350]6.1712,[351]6.1717,[352]6.1717,[353]6.1665,[354]6.1664,[355]6.1718,[356]6.1749,[357]6.1712,[358]6.1802,[359]6.1833,[360]6.1795,[361]6.1791,[362]6.1858,[363]6.1970,[364]6.2035,[365]6.2093,[366]6.2100,[367]6.2188,[368]6.2166,[369]6.2175,[370]6.2185,[371]6.2125,[372]6.2178,[373]6.2234,[374]6.2221,[375]6.2217,[376]6.2301,[377]6.2252,[378]6.2278,[379]6.2338,[380]6.2254,[381]6.2211,[382]6.2154,[383]6.2144,[384]6.2137,[385]6.2124,[386]6.2119,[387]6.2111,[388]6.2066,[389]6.2012,[390]6.1943,[391]6.1862,[392]6.1822,[393]6.1803,[394]6.1828,[395]6.1812,[396]6.1738,[397]6.1814,[398]6.1852,[399]6.1935,[400]6.1931,[401]6.1945,[402]6.1950,[403]6.1969,[404]6.2032,[405]6.1937,[406]6.1903,[407]6.1895,[408]6.1905,[409]6.2029,[410]6.2139,[411]6.2264,[412]6.2427,[413]6.2542,[414]6.2618,[415]6.2670,[416]6.2750,[417]6.2881,[418]6.2916,[419]6.2990,[420]6.3077,[421]6.3197,[422]6.3255,[423]6.3326,[424]6.3446,[425]6.3537,[426]6.3602,[427]6.3647,[428]6.3730,[429]6.3775,[430]6.3865,[431]6.4011,[432]6.4054,[433]6.4041,[434]6.3995,[435]6.4002,[436]6.4027,[437]6.4121,[438]6.4200,[439]6.4164,[440]6.4158,[441]6.4108,[442]6.4099,[443]6.4112,[444]6.4115,[445]6.4095,[446]6.4118,[447]6.4147,[448]6.4191,[449]6.4164,[450]6.4167,[451]6.4124,[452]6.4006,[453]6.3922,[454]6.3862,[455]6.3869,[456]6.3917,[457]6.3934,[458]6.3912,[459]6.3922,[460]6.4009,[461]6.3981,[462]6.3965,[463]6.4016,[464]6.4007,[465]6.3976,[466]6.3895,[467]6.3898,[468]6.3897,[469]6.3919,[470]6.3924,[471]6.3876,[472]6.3923,[473]6.3866,[474]6.3880,[475]6.3821,[476]6.3844,[477]6.3773,[478]6.3764,[479]6.3827,[480]6.3879,[481]6.3899,[482]6.3854,[483]6.3813,[484]6.3835,[485]6.3818,[486]6.3763,[487]6.3763,[488]6.3744,[489]6.3694,[490]6.3667,[491]6.3637,[492]6.3579,[493]6.3549,[494]6.3531,[495]6.3528,[496]6.3493,[497]6.3440,[498]6.3422,[499]6.3372,[500]6.3275,[501]6.3206,[502]6.3204,[503]6.3202,[504]6.3109,[505]6.3134,[506]6.3143,[507]6.3081,[508]6.3038,[509]6.3027,[510]6.3067,[511]6.3113,[512]6.3148,[513]6.3166,[514]6.3233,[515]6.3177,[516]6.3169,[517]6.3180,[518]6.3181,[519]6.3211,[520]6.3238,[521]6.3255,[522]6.3283,[523]6.3294,[524]6.3357,[525]6.3394,[526]6.3406,[527]6.3426,[528]6.3372,[529]6.3376,[530]6.3329,[531]6.3319,[532]6.3368,[533]6.3391,[534]6.3372,[535]6.3395,[536]6.3341,[537]6.3318,[538]6.3366,[539]6.3378,[540]6.3417,[541]6.3426,[542]6.3433,[543]6.3447,[544]6.3459,[545]6.3437,[546]6.3444,[547]6.3398,[548]6.3343,[549]6.3345,[550]6.3318,[551]6.3280,[552]6.3260,[553]6.3217,[554]6.3195,[555]6.3166,[556]6.3163,[557]6.3186,[558]6.3146,[559]6.3142,[560]6.3137,[561]6.3139,[562]6.3120,[563]6.3120,[564]6.3163,[565]6.3180,[566]6.3177,[567]6.3155,[568]6.3160,[569]6.3144,[570]6.3170,[571]6.3176,[572]6.3186,[573]6.3188,[574]6.3151,[575]6.3147,[576]6.3145,[577]6.3135,[578]6.3114,[579]6.3122,[580]6.3056,[581]6.3018,[582]6.3008,[583]6.3016,[584]6.3020,[585]6.2943,[586]6.2875,[587]6.2877,[588]6.2927,[589]6.2985,[590]6.3015,[591]6.3037,[592]6.3022,[593]6.2985,[594]6.2996,[595]6.2973,[596]6.3010,[597]6.2987,[598]6.2949,[599]6.2971,[600]6.2969,[601]6.2954,[602]6
.2971,[603]6.3001,[604]6.3011,[605]6.3044,[606]6.3065,[607]6.3048,[608]6.3013,[609]6.3019,[610]6.3056,[611]6.3037,[612]6.3062,[613]6.3026,[614]6.2975,[615]6.2898,[616]6.2928,[617]6.2865,[618]6.2814,[619]6.2757,[620]6.2614,[621]6.2542,[622]6.2525,[623]6.2540,[624]6.2545,[625]6.2544,[626]6.2529,[627]6.2550,[628]6.2555,[629]6.2552,[630]6.2586,[631]6.2650,[632]6.2704,[633]6.2687,[634]6.2720,[635]6.2726,[636]6.2694,[637]6.2659,[638]6.2686,[639]6.2657,[640]6.2666,[641]6.2669,[642]6.2738,[643]6.2759,[644]6.2772,[645]6.2750,[646]6.2793,[647]6.2755,[648]6.2761,[649]6.2762,[650]6.2801,[651]6.2858,[652]6.2865,[653]6.2908,[654]6.2844,[655]6.2838,
llama_print_timings: load time = 11650.39 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 4610430.68 ms / 335360 tokens ( 13.75 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 4646862.42 ms
7B F16, Docker: [655]5.9564,
docker run -it --rm -v$PWD/models:/models --device /dev/dri --device /dev/kfd llama.cpp:rocm perplexity -m /models/llama-7b-f16.bin --no-mmap -f /models/wiki.test.raw
main: seed = 1682338603
llama.cpp: loading model from /models/llama-7b-f16.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 1 (mostly F16)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 13161547.11 KB
llama_model_load_internal: mem required = 14645.07 MB (+ 1026.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size = 256.00 MB
system_info: n_threads = 8 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
5.79 seconds per pass - ETA 1 hours 3 minutes
[1]4.2324,[2]4.7328,[3]5.5850,[4]6.1712,[5]6.2985,[6]6.2643,[7]6.4560,[8]6.5508,[9]6.8799,[10]7.1226,[11]7.3342,[12]7.3542,[13]7.2699,[14]7.3195,[15]7.5602,[16]7.1907,[17]7.0812,[18]7.0277,[19]6.6794,[20]6.6695,[21]6.5787,[22]6.4057,[23]6.3766,[24]6.2860,[25]6.2829,[26]6.1235,[27]5.9531,[28]5.8556,[29]5.7688,[30]5.6166,[31]5.5870,[32]5.6072,[33]5.5522,[34]5.5821,[35]5.6047,[36]5.6413,[37]5.6418,[38]5.6522,[39]5.6845,[40]5.7347,[41]5.7433,[42]5.7807,[43]5.7433,[44]5.7999,[45]5.8026,[46]5.7772,[47]5.7975,[48]5.7731,[49]5.7745,[50]5.7360,[51]5.7323,[52]5.7230,[53]5.7679,[54]5.7525,[55]5.7312,[56]5.7595,[57]5.7791,[58]5.7980,[59]5.8151,[60]5.8558,[61]5.8487,[62]5.9057,[63]5.9365,[64]5.9499,[65]5.9914,[66]5.9998,[67]6.0170,[68]6.0314,[69]6.0549,[70]6.0847,[71]6.1056,[72]6.1367,[73]6.1949,[74]6.1989,[75]6.2124,[76]6.2242,[77]6.2354,[78]6.2210,[79]6.2483,[80]6.2418,[81]6.2527,[82]6.2567,[83]6.2069,[84]6.1891,[85]6.1766,[86]6.1557,[87]6.0916,[88]6.0670,[89]6.0476,[90]6.0336,[91]6.0562,[92]6.0504,[93]6.0511,[94]6.0487,[95]6.0758,[96]6.0754,[97]6.0698,[98]6.0639,[99]6.0511,[100]6.0501,[101]6.0737,[102]6.0689,[103]6.0889,[104]6.0961,[105]6.0961,[106]6.1125,[107]6.1118,[108]6.1251,[109]6.1202,[110]6.1167,[111]6.1388,[112]6.1588,[113]6.1608,[114]6.1570,[115]6.1628,[116]6.1539,[117]6.1588,[118]6.1868,[119]6.2082,[120]6.2423,[121]6.2567,[122]6.2808,[123]6.3170,[124]6.3342,[125]6.3251,[126]6.3631,[127]6.3985,[128]6.4280,[129]6.4134,[130]6.4216,[131]6.4180,[132]6.4108,[133]6.3979,[134]6.4077,[135]6.4038,[136]6.3934,[137]6.3862,[138]6.3688,[139]6.3586,[140]6.3551,[141]6.3264,[142]6.3230,[143]6.2934,[144]6.2733,[145]6.2644,[146]6.2529,[147]6.2563,[148]6.2567,[149]6.2515,[150]6.2474,[151]6.2494,[152]6.2398,[153]6.2242,[154]6.2158,[155]6.2226,[156]6.2179,[157]6.2344,[158]6.2386,[159]6.2430,[160]6.2457,[161]6.2574,[162]6.2300,[163]6.2188,[164]6.1960,[165]6.1661,[166]6.1397,[167]6.1037,[168]6.0740,[169]6.0606,[170]6.0500,[171]6.0241,[172]6.0076,[173]5.9916,[174]5.9623,[175]5.9412,[176]5.9301,[177]5.9106,[178]5.8884,[179]5.8719,[180]5.8626,[181]5.8416,[182]5.8242,[183]5.8109,[184]5.8101,[185]5.8029,[186]5.8040,[187]5.8101,[188]5.8063,[189]5.8232,[190]5.8240,[191]5.8445,[192]5.8602,[193]5.8764,[194]5.8872,[195]5.9080,[196]5.9233,[197]5.9438,[198]5.9585,[199]5.9614,[200]5.9663,[201]5.9611,[202]5.9793,[203]5.9863,[204]5.9848,[205]5.9948,[206]6.0016,[207]5.9979,[208]6.0061,[209]6.0101,[210]6.0151,[211]6.0257,[212]6.0326,[213]6.0428,[214]6.0451,[215]6.0475,[216]6.0614,[217]6.0792,[218]6.0920,[219]6.0918,[220]6.0883,[221]6.0832,[222]6.0811,[223]6.0719,[224]6.0648,[225]6.0611,[226]6.0812,[227]6.0890,[228]6.0942,[229]6.1002,[230]6.0970,[231]6.1133,[232]6.1020,[233]6.0861,[234]6.0718,[235]6.0519,[236]6.0454,[237]6.0361,[238]6.0388,[239]6.0245,[240]6.0147,[241]6.0165,[242]6.0202,[243]6.0185,[244]6.0076,[245]6.0047,[246]5.9939,[247]5.9826,[248]5.9756,[249]5.9732,[250]5.9778,[251]5.9710,[252]5.9679,[253]5.9586,[254]5.9534,[255]5.9426,[256]5.9254,[257]5.9136,[258]5.9058,[259]5.9036,[260]5.8957,[261]5.8916,[262]5.8862,[263]5.8811,[264]5.8589,[265]5.8584,[266]5.8566,[267]5.8502,[268]5.8588,[269]5.8570,[270]5.8580,[271]5.8655,[272]5.8688,[273]5.8691,[274]5.8716,[275]5.8797,[276]5.8855,[277]5.9009,[278]5.9107,[279]5.9200,[280]5.9227,[281]5.9323,[282]5.9380,[283]5.9524,[284]5.9602,[285]5.9686,[286]5.9819,[287]5.9814,[288]5.9871,[289]5.9791,[290]5.9639,[291]5.9494,[292]5.9350,[293]5.9221,[294]5.9243,[295]5.9235,[296]5.9281,[297]5.9269,[298]5.9297,[299]5.9273,[300]5.9169,[301]5.9169,[302]5.9093,[303]5.9010,[304]5.8929,[305]5.8895,[30
6]5.8773,[307]5.8795,[308]5.8825,[309]5.8673,[310]5.8620,[311]5.8558,[312]5.8579,[313]5.8525,[314]5.8509,[315]5.8356,[316]5.8304,[317]5.8147,[318]5.7950,[319]5.8065,[320]5.8185,[321]5.8229,[322]5.8190,[323]5.8124,[324]5.8097,[325]5.8197,[326]5.8199,[327]5.8220,[328]5.8258,[329]5.8316,[330]5.8342,[331]5.8463,[332]5.8435,[333]5.8502,[334]5.8449,[335]5.8390,[336]5.8428,[337]5.8406,[338]5.8399,[339]5.8350,[340]5.8308,[341]5.8387,[342]5.8415,[343]5.8462,[344]5.8463,[345]5.8468,[346]5.8444,[347]5.8484,[348]5.8518,[349]5.8541,[350]5.8509,[351]5.8517,[352]5.8517,[353]5.8461,[354]5.8462,[355]5.8512,[356]5.8542,[357]5.8508,[358]5.8597,[359]5.8622,[360]5.8590,[361]5.8586,[362]5.8654,[363]5.8764,[364]5.8823,[365]5.8874,[366]5.8887,[367]5.8971,[368]5.8948,[369]5.8957,[370]5.8971,[371]5.8919,[372]5.8966,[373]5.9012,[374]5.8997,[375]5.8998,[376]5.9063,[377]5.9020,[378]5.9047,[379]5.9104,[380]5.9027,[381]5.8994,[382]5.8945,[383]5.8938,[384]5.8934,[385]5.8924,[386]5.8919,[387]5.8917,[388]5.8882,[389]5.8832,[390]5.8765,[391]5.8691,[392]5.8652,[393]5.8636,[394]5.8661,[395]5.8649,[396]5.8579,[397]5.8648,[398]5.8685,[399]5.8760,[400]5.8762,[401]5.8776,[402]5.8786,[403]5.8805,[404]5.8869,[405]5.8775,[406]5.8743,[407]5.8739,[408]5.8755,[409]5.8868,[410]5.8975,[411]5.9086,[412]5.9240,[413]5.9348,[414]5.9422,[415]5.9476,[416]5.9552,[417]5.9669,[418]5.9704,[419]5.9770,[420]5.9856,[421]5.9969,[422]6.0009,[423]6.0078,[424]6.0182,[425]6.0267,[426]6.0329,[427]6.0372,[428]6.0453,[429]6.0503,[430]6.0583,[431]6.0720,[432]6.0758,[433]6.0751,[434]6.0711,[435]6.0720,[436]6.0745,[437]6.0839,[438]6.0912,[439]6.0882,[440]6.0873,[441]6.0824,[442]6.0810,[443]6.0823,[444]6.0828,[445]6.0810,[446]6.0833,[447]6.0862,[448]6.0903,[449]6.0879,[450]6.0888,[451]6.0850,[452]6.0715,[453]6.0631,[454]6.0575,[455]6.0585,[456]6.0631,[457]6.0651,[458]6.0629,[459]6.0635,[460]6.0719,[461]6.0692,[462]6.0679,[463]6.0717,[464]6.0706,[465]6.0679,[466]6.0604,[467]6.0605,[468]6.0603,[469]6.0623,[470]6.0627,[471]6.0581,[472]6.0623,[473]6.0572,[474]6.0584,[475]6.0523,[476]6.0539,[477]6.0469,[478]6.0458,[479]6.0513,[480]6.0557,[481]6.0574,[482]6.0531,[483]6.0491,[484]6.0510,[485]6.0489,[486]6.0432,[487]6.0429,[488]6.0407,[489]6.0360,[490]6.0337,[491]6.0308,[492]6.0253,[493]6.0226,[494]6.0209,[495]6.0204,[496]6.0167,[497]6.0112,[498]6.0095,[499]6.0053,[500]5.9962,[501]5.9897,[502]5.9899,[503]5.9894,[504]5.9808,[505]5.9830,[506]5.9837,[507]5.9780,[508]5.9741,[509]5.9735,[510]5.9769,[511]5.9814,[512]5.9849,[513]5.9869,[514]5.9930,[515]5.9877,[516]5.9867,[517]5.9878,[518]5.9874,[519]5.9904,[520]5.9928,[521]5.9940,[522]5.9967,[523]5.9974,[524]6.0030,[525]6.0061,[526]6.0070,[527]6.0087,[528]6.0038,[529]6.0043,[530]5.9994,[531]5.9984,[532]6.0029,[533]6.0052,[534]6.0035,[535]6.0056,[536]6.0004,[537]5.9984,[538]6.0032,[539]6.0043,[540]6.0080,[541]6.0083,[542]6.0094,[543]6.0109,[544]6.0120,[545]6.0102,[546]6.0110,[547]6.0069,[548]6.0023,[549]6.0025,[550]5.9996,[551]5.9963,[552]5.9941,[553]5.9906,[554]5.9886,[555]5.9857,[556]5.9852,[557]5.9875,[558]5.9838,[559]5.9834,[560]5.9833,[561]5.9835,[562]5.9814,[563]5.9810,[564]5.9853,[565]5.9873,[566]5.9872,[567]5.9850,[568]5.9856,[569]5.9844,[570]5.9871,[571]5.9876,[572]5.9886,[573]5.9887,[574]5.9852,[575]5.9846,[576]5.9845,[577]5.9831,[578]5.9812,[579]5.9818,[580]5.9755,[581]5.9719,[582]5.9708,[583]5.9717,[584]5.9719,[585]5.9646,[586]5.9579,[587]5.9585,[588]5.9633,[589]5.9684,[590]5.9714,[591]5.9735,[592]5.9724,[593]5.9692,[594]5.9702,[595]5.9679,[596]5.9711,[597]5.9691,[598]5.9663,[599]5.9684,[600]5.9679,[601]5.9664,[602]5
.9672,[603]5.9700,[604]5.9708,[605]5.9742,[606]5.9761,[607]5.9745,[608]5.9713,[609]5.9721,[610]5.9755,[611]5.9738,[612]5.9764,[613]5.9729,[614]5.9680,[615]5.9610,[616]5.9637,[617]5.9578,[618]5.9532,[619]5.9479,[620]5.9347,[621]5.9282,[622]5.9266,[623]5.9281,[624]5.9286,[625]5.9288,[626]5.9278,[627]5.9300,[628]5.9301,[629]5.9297,[630]5.9328,[631]5.9384,[632]5.9439,[633]5.9425,[634]5.9459,[635]5.9466,[636]5.9432,[637]5.9398,[638]5.9422,[639]5.9392,[640]5.9401,[641]5.9403,[642]5.9468,[643]5.9489,[644]5.9501,[645]5.9483,[646]5.9522,[647]5.9482,[648]5.9491,[649]5.9493,[650]5.9531,[651]5.9583,[652]5.9594,[653]5.9632,[654]5.9571,[655]5.9564,
llama_print_timings: load time = 11891.56 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 3755163.27 ms / 335360 tokens ( 11.20 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 3794021.15 ms
7B Q4_1, Docker: [655]6.1290,
docker run -it --rm -v$PWD/models:/models --device /dev/dri --device /dev/kfd llama.cpp:rocm perplexity -m /models/llama-7b-q4_1.bin --no-mmap -f /models/wiki.test.raw
main: seed = 1682342791
llama.cpp: loading model from /models/llama-7b-q4_1.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 3 (mostly Q4_1)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4936267.11 KB
llama_model_load_internal: mem required = 6612.57 MB (+ 1026.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size = 256.00 MB
system_info: n_threads = 8 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
5.21 seconds per pass - ETA 56 minutes
[1]4.4323,[2]4.8863,[3]5.7761,[4]6.3814,[5]6.4911,[6]6.4638,[7]6.6548,[8]6.7572,[9]7.0838,[10]7.3394,[11]7.5618,[12]7.6045,[13]7.5323,[14]7.5955,[15]7.8405,[16]7.4403,[17]7.3181,[18]7.2617,[19]6.8886,[20]6.8673,[21]6.7698,[22]6.5975,[23]6.5679,[24]6.4790,[25]6.4809,[26]6.3162,[27]6.1360,[28]6.0296,[29]5.9400,[30]5.7779,[31]5.7483,[32]5.7658,[33]5.7091,[34]5.7423,[35]5.7643,[36]5.8049,[37]5.8079,[38]5.8115,[39]5.8458,[40]5.8938,[41]5.9067,[42]5.9474,[43]5.9071,[44]5.9658,[45]5.9726,[46]5.9454,[47]5.9647,[48]5.9383,[49]5.9370,[50]5.8961,[51]5.8903,[52]5.8785,[53]5.9269,[54]5.9100,[55]5.8886,[56]5.9167,[57]5.9356,[58]5.9544,[59]5.9729,[60]6.0160,[61]6.0048,[62]6.0620,[63]6.0919,[64]6.1024,[65]6.1472,[66]6.1572,[67]6.1761,[68]6.1893,[69]6.2130,[70]6.2412,[71]6.2616,[72]6.2930,[73]6.3496,[74]6.3531,[75]6.3683,[76]6.3813,[77]6.3933,[78]6.3796,[79]6.4081,[80]6.4020,[81]6.4178,[82]6.4235,[83]6.3713,[84]6.3554,[85]6.3430,[86]6.3217,[87]6.2604,[88]6.2376,[89]6.2164,[90]6.2009,[91]6.2245,[92]6.2182,[93]6.2174,[94]6.2142,[95]6.2430,[96]6.2421,[97]6.2365,[98]6.2300,[99]6.2155,[100]6.2138,[101]6.2387,[102]6.2327,[103]6.2523,[104]6.2604,[105]6.2595,[106]6.2763,[107]6.2764,[108]6.2882,[109]6.2820,[110]6.2782,[111]6.2997,[112]6.3200,[113]6.3235,[114]6.3198,[115]6.3254,[116]6.3161,[117]6.3214,[118]6.3491,[119]6.3717,[120]6.4076,[121]6.4225,[122]6.4466,[123]6.4839,[124]6.5025,[125]6.4928,[126]6.5324,[127]6.5693,[128]6.6014,[129]6.5853,[130]6.5951,[131]6.5913,[132]6.5826,[133]6.5700,[134]6.5797,[135]6.5762,[136]6.5649,[137]6.5579,[138]6.5414,[139]6.5304,[140]6.5264,[141]6.4978,[142]6.4955,[143]6.4675,[144]6.4464,[145]6.4385,[146]6.4268,[147]6.4309,[148]6.4313,[149]6.4269,[150]6.4230,[151]6.4258,[152]6.4149,[153]6.3993,[154]6.3906,[155]6.3969,[156]6.3919,[157]6.4084,[158]6.4117,[159]6.4171,[160]6.4203,[161]6.4325,[162]6.4049,[163]6.3934,[164]6.3699,[165]6.3385,[166]6.3114,[167]6.2734,[168]6.2433,[169]6.2304,[170]6.2190,[171]6.1931,[172]6.1760,[173]6.1603,[174]6.1305,[175]6.1096,[176]6.0981,[177]6.0784,[178]6.0553,[179]6.0387,[180]6.0287,[181]6.0072,[182]5.9899,[183]5.9766,[184]5.9759,[185]5.9687,[186]5.9694,[187]5.9750,[188]5.9709,[189]5.9890,[190]5.9906,[191]6.0118,[192]6.0274,[193]6.0442,[194]6.0558,[195]6.0776,[196]6.0935,[197]6.1144,[198]6.1302,[199]6.1334,[200]6.1386,[201]6.1341,[202]6.1530,[203]6.1607,[204]6.1598,[205]6.1706,[206]6.1776,[207]6.1744,[208]6.1829,[209]6.1874,[210]6.1918,[211]6.2028,[212]6.2108,[213]6.2210,[214]6.2242,[215]6.2267,[216]6.2408,[217]6.2595,[218]6.2735,[219]6.2740,[220]6.2702,[221]6.2643,[222]6.2621,[223]6.2519,[224]6.2450,[225]6.2412,[226]6.2615,[227]6.2703,[228]6.2761,[229]6.2819,[230]6.2788,[231]6.2952,[232]6.2836,[233]6.2669,[234]6.2516,[235]6.2328,[236]6.2266,[237]6.2167,[238]6.2190,[239]6.2039,[240]6.1932,[241]6.1956,[242]6.1985,[243]6.1965,[244]6.1854,[245]6.1824,[246]6.1716,[247]6.1596,[248]6.1522,[249]6.1487,[250]6.1530,[251]6.1460,[252]6.1421,[253]6.1328,[254]6.1282,[255]6.1171,[256]6.0992,[257]6.0866,[258]6.0783,[259]6.0760,[260]6.0677,[261]6.0633,[262]6.0578,[263]6.0518,[264]6.0313,[265]6.0306,[266]6.0293,[267]6.0225,[268]6.0305,[269]6.0293,[270]6.0292,[271]6.0371,[272]6.0405,[273]6.0408,[274]6.0431,[275]6.0518,[276]6.0575,[277]6.0727,[278]6.0826,[279]6.0913,[280]6.0939,[281]6.1042,[282]6.1099,[283]6.1250,[284]6.1326,[285]6.1406,[286]6.1534,[287]6.1526,[288]6.1586,[289]6.1498,[290]6.1340,[291]6.1184,[292]6.1034,[293]6.0905,[294]6.0925,[295]6.0918,[296]6.0968,[297]6.0962,[298]6.0997,[299]6.0974,[300]6.0864,[301]6.0859,[302]6.0783,[303]6.0693,[304]6.0607,[305]6.0572,[30
6]6.0448,[307]6.0469,[308]6.0498,[309]6.0338,[310]6.0280,[311]6.0217,[312]6.0240,[313]6.0182,[314]6.0166,[315]6.0009,[316]5.9962,[317]5.9799,[318]5.9594,[319]5.9713,[320]5.9835,[321]5.9877,[322]5.9835,[323]5.9766,[324]5.9733,[325]5.9843,[326]5.9843,[327]5.9863,[328]5.9897,[329]5.9953,[330]5.9982,[331]6.0104,[332]6.0076,[333]6.0148,[334]6.0091,[335]6.0027,[336]6.0059,[337]6.0036,[338]6.0026,[339]5.9973,[340]5.9932,[341]6.0011,[342]6.0040,[343]6.0086,[344]6.0088,[345]6.0089,[346]6.0060,[347]6.0100,[348]6.0137,[349]6.0159,[350]6.0131,[351]6.0139,[352]6.0140,[353]6.0077,[354]6.0081,[355]6.0134,[356]6.0164,[357]6.0133,[358]6.0226,[359]6.0250,[360]6.0220,[361]6.0216,[362]6.0284,[363]6.0395,[364]6.0459,[365]6.0509,[366]6.0528,[367]6.0615,[368]6.0588,[369]6.0599,[370]6.0617,[371]6.0565,[372]6.0615,[373]6.0661,[374]6.0647,[375]6.0648,[376]6.0715,[377]6.0670,[378]6.0694,[379]6.0753,[380]6.0675,[381]6.0642,[382]6.0597,[383]6.0588,[384]6.0583,[385]6.0573,[386]6.0570,[387]6.0571,[388]6.0535,[389]6.0483,[390]6.0418,[391]6.0341,[392]6.0298,[393]6.0284,[394]6.0312,[395]6.0298,[396]6.0224,[397]6.0291,[398]6.0330,[399]6.0406,[400]6.0403,[401]6.0417,[402]6.0429,[403]6.0448,[404]6.0512,[405]6.0421,[406]6.0390,[407]6.0387,[408]6.0405,[409]6.0521,[410]6.0632,[411]6.0746,[412]6.0906,[413]6.1016,[414]6.1093,[415]6.1144,[416]6.1222,[417]6.1344,[418]6.1378,[419]6.1451,[420]6.1543,[421]6.1657,[422]6.1697,[423]6.1766,[424]6.1871,[425]6.1958,[426]6.2025,[427]6.2071,[428]6.2153,[429]6.2208,[430]6.2288,[431]6.2426,[432]6.2466,[433]6.2459,[434]6.2413,[435]6.2424,[436]6.2449,[437]6.2548,[438]6.2623,[439]6.2590,[440]6.2580,[441]6.2531,[442]6.2512,[443]6.2522,[444]6.2528,[445]6.2507,[446]6.2529,[447]6.2559,[448]6.2602,[449]6.2578,[450]6.2586,[451]6.2546,[452]6.2425,[453]6.2342,[454]6.2284,[455]6.2291,[456]6.2343,[457]6.2365,[458]6.2345,[459]6.2351,[460]6.2436,[461]6.2409,[462]6.2395,[463]6.2438,[464]6.2425,[465]6.2398,[466]6.2324,[467]6.2331,[468]6.2329,[469]6.2352,[470]6.2358,[471]6.2311,[472]6.2362,[473]6.2308,[474]6.2321,[475]6.2264,[476]6.2283,[477]6.2213,[478]6.2203,[479]6.2260,[480]6.2304,[481]6.2322,[482]6.2276,[483]6.2235,[484]6.2252,[485]6.2232,[486]6.2172,[487]6.2169,[488]6.2149,[489]6.2100,[490]6.2079,[491]6.2052,[492]6.1996,[493]6.1968,[494]6.1950,[495]6.1947,[496]6.1910,[497]6.1854,[498]6.1839,[499]6.1794,[500]6.1700,[501]6.1636,[502]6.1636,[503]6.1631,[504]6.1542,[505]6.1565,[506]6.1573,[507]6.1519,[508]6.1481,[509]6.1475,[510]6.1511,[511]6.1558,[512]6.1596,[513]6.1615,[514]6.1679,[515]6.1625,[516]6.1617,[517]6.1627,[518]6.1623,[519]6.1655,[520]6.1676,[521]6.1690,[522]6.1718,[523]6.1726,[524]6.1784,[525]6.1817,[526]6.1826,[527]6.1841,[528]6.1791,[529]6.1797,[530]6.1745,[531]6.1728,[532]6.1777,[533]6.1800,[534]6.1785,[535]6.1807,[536]6.1755,[537]6.1733,[538]6.1784,[539]6.1792,[540]6.1829,[541]6.1831,[542]6.1838,[543]6.1854,[544]6.1864,[545]6.1844,[546]6.1852,[547]6.1813,[548]6.1765,[549]6.1762,[550]6.1735,[551]6.1698,[552]6.1675,[553]6.1638,[554]6.1616,[555]6.1585,[556]6.1580,[557]6.1602,[558]6.1564,[559]6.1562,[560]6.1561,[561]6.1566,[562]6.1542,[563]6.1539,[564]6.1585,[565]6.1607,[566]6.1607,[567]6.1588,[568]6.1592,[569]6.1577,[570]6.1605,[571]6.1609,[572]6.1614,[573]6.1611,[574]6.1576,[575]6.1571,[576]6.1570,[577]6.1551,[578]6.1529,[579]6.1531,[580]6.1468,[581]6.1431,[582]6.1423,[583]6.1431,[584]6.1434,[585]6.1360,[586]6.1291,[587]6.1297,[588]6.1344,[589]6.1400,[590]6.1429,[591]6.1451,[592]6.1438,[593]6.1405,[594]6.1415,[595]6.1392,[596]6.1426,[597]6.1404,[598]6.1379,[599]6.1401,[600]6.1401,[601]6.1388,[602]6
.1407,[603]6.1432,[604]6.1441,[605]6.1479,[606]6.1499,[607]6.1483,[608]6.1447,[609]6.1452,[610]6.1488,[611]6.1473,[612]6.1499,[613]6.1463,[614]6.1415,[615]6.1340,[616]6.1366,[617]6.1305,[618]6.1256,[619]6.1201,[620]6.1063,[621]6.0995,[622]6.0979,[623]6.0996,[624]6.1001,[625]6.1002,[626]6.0993,[627]6.1019,[628]6.1021,[629]6.1016,[630]6.1047,[631]6.1103,[632]6.1160,[633]6.1145,[634]6.1179,[635]6.1184,[636]6.1149,[637]6.1115,[638]6.1141,[639]6.1109,[640]6.1119,[641]6.1120,[642]6.1185,[643]6.1204,[644]6.1215,[645]6.1198,[646]6.1240,[647]6.1202,[648]6.1213,[649]6.1215,[650]6.1256,[651]6.1310,[652]6.1322,[653]6.1361,[654]6.1297,[655]6.1290,
llama_print_timings: load time = 8444.43 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 3298326.00 ms / 335360 tokens ( 9.84 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 3338225.33 ms
7B Q4_2, Docker: [655]6.2002,
docker run -it --rm -v$PWD/models:/models --device /dev/dri --device /dev/kfd llama.cpp:rocm perplexity -m /models/llama-7b-q4_2.bin --no-mmap -f /models/wiki.test.raw
main: seed = 1682346906
llama.cpp: loading model from /models/llama-7b-q4_2.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 5 (mostly Q4_2)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4113739.11 KB
llama_model_load_internal: mem required = 5809.32 MB (+ 1026.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size = 256.00 MB
system_info: n_threads = 8 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
5.62 seconds per pass - ETA 1 hours 1 minutes
[1]4.4374,[2]4.8772,[3]5.7681,[4]6.3925,[5]6.5142,[6]6.4884,[7]6.6803,[8]6.7903,[9]7.1376,[10]7.3772,[11]7.5927,[12]7.6131,[13]7.5335,[14]7.6098,[15]7.8668,[16]7.4772,[17]7.3521,[18]7.3157,[19]6.9532,[20]6.9431,[21]6.8483,[22]6.6732,[23]6.6360,[24]6.5422,[25]6.5428,[26]6.3758,[27]6.1929,[28]6.0919,[29]6.0021,[30]5.8428,[31]5.8105,[32]5.8319,[33]5.7736,[34]5.8103,[35]5.8352,[36]5.8781,[37]5.8817,[38]5.8969,[39]5.9326,[40]5.9921,[41]6.0049,[42]6.0453,[43]6.0030,[44]6.0581,[45]6.0611,[46]6.0371,[47]6.0585,[48]6.0306,[49]6.0337,[50]5.9922,[51]5.9876,[52]5.9770,[53]6.0202,[54]6.0027,[55]5.9780,[56]6.0049,[57]6.0244,[58]6.0453,[59]6.0629,[60]6.1072,[61]6.0985,[62]6.1567,[63]6.1927,[64]6.2088,[65]6.2552,[66]6.2626,[67]6.2811,[68]6.2987,[69]6.3254,[70]6.3581,[71]6.3803,[72]6.4105,[73]6.4727,[74]6.4784,[75]6.4929,[76]6.5055,[77]6.5178,[78]6.5028,[79]6.5305,[80]6.5214,[81]6.5310,[82]6.5349,[83]6.4802,[84]6.4626,[85]6.4516,[86]6.4288,[87]6.3638,[88]6.3353,[89]6.3155,[90]6.3010,[91]6.3247,[92]6.3185,[93]6.3207,[94]6.3176,[95]6.3453,[96]6.3436,[97]6.3384,[98]6.3311,[99]6.3161,[100]6.3170,[101]6.3427,[102]6.3372,[103]6.3573,[104]6.3649,[105]6.3640,[106]6.3790,[107]6.3764,[108]6.3901,[109]6.3834,[110]6.3793,[111]6.4015,[112]6.4218,[113]6.4244,[114]6.4213,[115]6.4285,[116]6.4199,[117]6.4258,[118]6.4552,[119]6.4756,[120]6.5110,[121]6.5286,[122]6.5540,[123]6.5918,[124]6.6105,[125]6.6001,[126]6.6398,[127]6.6765,[128]6.7069,[129]6.6906,[130]6.7006,[131]6.6967,[132]6.6882,[133]6.6753,[134]6.6859,[135]6.6817,[136]6.6691,[137]6.6608,[138]6.6454,[139]6.6349,[140]6.6299,[141]6.5998,[142]6.5962,[143]6.5663,[144]6.5457,[145]6.5365,[146]6.5232,[147]6.5297,[148]6.5298,[149]6.5236,[150]6.5186,[151]6.5200,[152]6.5081,[153]6.4913,[154]6.4825,[155]6.4894,[156]6.4842,[157]6.5023,[158]6.5054,[159]6.5108,[160]6.5127,[161]6.5255,[162]6.4955,[163]6.4823,[164]6.4575,[165]6.4257,[166]6.3978,[167]6.3596,[168]6.3278,[169]6.3146,[170]6.3033,[171]6.2754,[172]6.2583,[173]6.2412,[174]6.2109,[175]6.1888,[176]6.1788,[177]6.1583,[178]6.1346,[179]6.1178,[180]6.1090,[181]6.0874,[182]6.0696,[183]6.0557,[184]6.0553,[185]6.0480,[186]6.0489,[187]6.0550,[188]6.0504,[189]6.0677,[190]6.0690,[191]6.0907,[192]6.1069,[193]6.1243,[194]6.1357,[195]6.1569,[196]6.1728,[197]6.1938,[198]6.2088,[199]6.2128,[200]6.2175,[201]6.2129,[202]6.2335,[203]6.2417,[204]6.2405,[205]6.2511,[206]6.2578,[207]6.2542,[208]6.2626,[209]6.2669,[210]6.2724,[211]6.2821,[212]6.2895,[213]6.3000,[214]6.3023,[215]6.3061,[216]6.3214,[217]6.3395,[218]6.3527,[219]6.3533,[220]6.3488,[221]6.3441,[222]6.3413,[223]6.3307,[224]6.3234,[225]6.3195,[226]6.3405,[227]6.3493,[228]6.3539,[229]6.3599,[230]6.3564,[231]6.3734,[232]6.3605,[233]6.3436,[234]6.3285,[235]6.3116,[236]6.3042,[237]6.2939,[238]6.2969,[239]6.2813,[240]6.2712,[241]6.2738,[242]6.2775,[243]6.2755,[244]6.2639,[245]6.2611,[246]6.2492,[247]6.2367,[248]6.2292,[249]6.2272,[250]6.2312,[251]6.2243,[252]6.2207,[253]6.2107,[254]6.2066,[255]6.1956,[256]6.1775,[257]6.1655,[258]6.1570,[259]6.1551,[260]6.1478,[261]6.1437,[262]6.1383,[263]6.1331,[264]6.1137,[265]6.1127,[266]6.1113,[267]6.1047,[268]6.1140,[269]6.1122,[270]6.1132,[271]6.1211,[272]6.1241,[273]6.1240,[274]6.1258,[275]6.1337,[276]6.1394,[277]6.1549,[278]6.1651,[279]6.1738,[280]6.1767,[281]6.1859,[282]6.1921,[283]6.2069,[284]6.2145,[285]6.2233,[286]6.2369,[287]6.2367,[288]6.2425,[289]6.2335,[290]6.2178,[291]6.2024,[292]6.1871,[293]6.1732,[294]6.1753,[295]6.1749,[296]6.1789,[297]6.1773,[298]6.1800,[299]6.1770,[300]6.1657,[301]6.1659,[302]6.1582,[303]6.1501,[304]6.1420,[305]6.1395,[30
6]6.1268,[307]6.1291,[308]6.1326,[309]6.1167,[310]6.1106,[311]6.1045,[312]6.1074,[313]6.1017,[314]6.1001,[315]6.0837,[316]6.0788,[317]6.0625,[318]6.0413,[319]6.0535,[320]6.0661,[321]6.0703,[322]6.0661,[323]6.0592,[324]6.0566,[325]6.0669,[326]6.0667,[327]6.0684,[328]6.0720,[329]6.0782,[330]6.0808,[331]6.0932,[332]6.0902,[333]6.0973,[334]6.0916,[335]6.0847,[336]6.0880,[337]6.0852,[338]6.0850,[339]6.0796,[340]6.0752,[341]6.0831,[342]6.0853,[343]6.0901,[344]6.0899,[345]6.0898,[346]6.0868,[347]6.0912,[348]6.0944,[349]6.0964,[350]6.0928,[351]6.0934,[352]6.0935,[353]6.0875,[354]6.0880,[355]6.0933,[356]6.0961,[357]6.0928,[358]6.1020,[359]6.1049,[360]6.1012,[361]6.1008,[362]6.1077,[363]6.1193,[364]6.1256,[365]6.1313,[366]6.1324,[367]6.1415,[368]6.1389,[369]6.1394,[370]6.1407,[371]6.1349,[372]6.1398,[373]6.1451,[374]6.1436,[375]6.1435,[376]6.1506,[377]6.1458,[378]6.1484,[379]6.1541,[380]6.1461,[381]6.1422,[382]6.1368,[383]6.1359,[384]6.1353,[385]6.1347,[386]6.1344,[387]6.1338,[388]6.1298,[389]6.1246,[390]6.1177,[391]6.1100,[392]6.1058,[393]6.1042,[394]6.1067,[395]6.1053,[396]6.0976,[397]6.1055,[398]6.1095,[399]6.1178,[400]6.1175,[401]6.1192,[402]6.1200,[403]6.1221,[404]6.1287,[405]6.1186,[406]6.1151,[407]6.1145,[408]6.1158,[409]6.1277,[410]6.1385,[411]6.1499,[412]6.1658,[413]6.1777,[414]6.1852,[415]6.1903,[416]6.1981,[417]6.2105,[418]6.2143,[419]6.2215,[420]6.2303,[421]6.2420,[422]6.2469,[423]6.2538,[424]6.2655,[425]6.2744,[426]6.2811,[427]6.2856,[428]6.2940,[429]6.2990,[430]6.3074,[431]6.3216,[432]6.3258,[433]6.3247,[434]6.3202,[435]6.3210,[436]6.3232,[437]6.3328,[438]6.3403,[439]6.3371,[440]6.3367,[441]6.3315,[442]6.3301,[443]6.3314,[444]6.3317,[445]6.3299,[446]6.3325,[447]6.3355,[448]6.3401,[449]6.3376,[450]6.3389,[451]6.3346,[452]6.3218,[453]6.3130,[454]6.3073,[455]6.3084,[456]6.3130,[457]6.3151,[458]6.3129,[459]6.3132,[460]6.3217,[461]6.3188,[462]6.3170,[463]6.3219,[464]6.3209,[465]6.3177,[466]6.3098,[467]6.3096,[468]6.3093,[469]6.3113,[470]6.3116,[471]6.3067,[472]6.3116,[473]6.3060,[474]6.3068,[475]6.3005,[476]6.3025,[477]6.2953,[478]6.2940,[479]6.3000,[480]6.3046,[481]6.3063,[482]6.3018,[483]6.2976,[484]6.2999,[485]6.2983,[486]6.2928,[487]6.2928,[488]6.2905,[489]6.2857,[490]6.2834,[491]6.2805,[492]6.2745,[493]6.2715,[494]6.2699,[495]6.2703,[496]6.2667,[497]6.2611,[498]6.2592,[499]6.2545,[500]6.2448,[501]6.2381,[502]6.2383,[503]6.2377,[504]6.2288,[505]6.2314,[506]6.2324,[507]6.2268,[508]6.2228,[509]6.2220,[510]6.2257,[511]6.2306,[512]6.2338,[513]6.2358,[514]6.2422,[515]6.2366,[516]6.2357,[517]6.2366,[518]6.2366,[519]6.2397,[520]6.2422,[521]6.2438,[522]6.2468,[523]6.2477,[524]6.2532,[525]6.2568,[526]6.2580,[527]6.2598,[528]6.2548,[529]6.2550,[530]6.2503,[531]6.2492,[532]6.2542,[533]6.2564,[534]6.2548,[535]6.2572,[536]6.2516,[537]6.2493,[538]6.2539,[539]6.2550,[540]6.2588,[541]6.2590,[542]6.2601,[543]6.2615,[544]6.2627,[545]6.2603,[546]6.2611,[547]6.2567,[548]6.2518,[549]6.2515,[550]6.2485,[551]6.2450,[552]6.2428,[553]6.2389,[554]6.2365,[555]6.2336,[556]6.2331,[557]6.2354,[558]6.2316,[559]6.2310,[560]6.2308,[561]6.2307,[562]6.2286,[563]6.2286,[564]6.2330,[565]6.2351,[566]6.2348,[567]6.2328,[568]6.2333,[569]6.2317,[570]6.2343,[571]6.2348,[572]6.2358,[573]6.2360,[574]6.2327,[575]6.2322,[576]6.2321,[577]6.2308,[578]6.2287,[579]6.2293,[580]6.2224,[581]6.2186,[582]6.2174,[583]6.2183,[584]6.2185,[585]6.2112,[586]6.2044,[587]6.2047,[588]6.2096,[589]6.2150,[590]6.2178,[591]6.2199,[592]6.2185,[593]6.2150,[594]6.2158,[595]6.2136,[596]6.2170,[597]6.2149,[598]6.2118,[599]6.2139,[600]6.2133,[601]6.2118,[602]6
.2134,[603]6.2166,[604]6.2175,[605]6.2209,[606]6.2229,[607]6.2211,[608]6.2179,[609]6.2185,[610]6.2220,[611]6.2202,[612]6.2229,[613]6.2191,[614]6.2138,[615]6.2065,[616]6.2093,[617]6.2031,[618]6.1980,[619]6.1923,[620]6.1781,[621]6.1710,[622]6.1694,[623]6.1710,[624]6.1715,[625]6.1717,[626]6.1704,[627]6.1724,[628]6.1725,[629]6.1719,[630]6.1751,[631]6.1808,[632]6.1864,[633]6.1847,[634]6.1880,[635]6.1888,[636]6.1857,[637]6.1824,[638]6.1851,[639]6.1822,[640]6.1831,[641]6.1834,[642]6.1899,[643]6.1920,[644]6.1931,[645]6.1912,[646]6.1953,[647]6.1914,[648]6.1925,[649]6.1926,[650]6.1967,[651]6.2023,[652]6.2032,[653]6.2071,[654]6.2008,[655]6.2002,
llama_print_timings: load time = 8085.81 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 3414572.54 ms / 335360 tokens ( 10.18 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 3449029.12 ms
7B Q4_3, Docker: [655]6.0619,
docker run -it --rm -v$PWD/models:/models --device /dev/dri --device /dev/kfd llama.cpp:rocm perplexity -m /models/llama-7b-q4_3.bin --no-mmap -f /models/wiki.test.raw
main: seed = 1682356946
llama.cpp: loading model from /models/llama-7b-q4_3.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 6 (mostly Q4_3)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4936267.11 KB
llama_model_load_internal: mem required = 6612.57 MB (+ 1026.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size = 256.00 MB
system_info: n_threads = 8 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
5.95 seconds per pass - ETA 1 hours 4 minutes
[1]4.3494,[2]4.7745,[3]5.6675,[4]6.2874,[5]6.4224,[6]6.3707,[7]6.5479,[8]6.6453,[9]6.9852,[10]7.2514,[11]7.4535,[12]7.4779,[13]7.3964,[14]7.4636,[15]7.7126,[16]7.3279,[17]7.2090,[18]7.1599,[19]6.8048,[20]6.7914,[21]6.6994,[22]6.5295,[23]6.5008,[24]6.4058,[25]6.4156,[26]6.2545,[27]6.0814,[28]5.9817,[29]5.8923,[30]5.7342,[31]5.7033,[32]5.7206,[33]5.6627,[34]5.6956,[35]5.7173,[36]5.7587,[37]5.7623,[38]5.7694,[39]5.8020,[40]5.8530,[41]5.8653,[42]5.9052,[43]5.8678,[44]5.9249,[45]5.9253,[46]5.8998,[47]5.9191,[48]5.8935,[49]5.8922,[50]5.8505,[51]5.8451,[52]5.8341,[53]5.8787,[54]5.8591,[55]5.8357,[56]5.8629,[57]5.8831,[58]5.9029,[59]5.9213,[60]5.9619,[61]5.9534,[62]6.0125,[63]6.0403,[64]6.0536,[65]6.0964,[66]6.1041,[67]6.1232,[68]6.1375,[69]6.1625,[70]6.1932,[71]6.2158,[72]6.2466,[73]6.3047,[74]6.3090,[75]6.3245,[76]6.3370,[77]6.3496,[78]6.3362,[79]6.3624,[80]6.3552,[81]6.3677,[82]6.3722,[83]6.3214,[84]6.3041,[85]6.2914,[86]6.2693,[87]6.2097,[88]6.1854,[89]6.1655,[90]6.1488,[91]6.1725,[92]6.1670,[93]6.1680,[94]6.1655,[95]6.1930,[96]6.1916,[97]6.1878,[98]6.1812,[99]6.1680,[100]6.1683,[101]6.1917,[102]6.1871,[103]6.2064,[104]6.2133,[105]6.2121,[106]6.2295,[107]6.2300,[108]6.2432,[109]6.2365,[110]6.2303,[111]6.2516,[112]6.2716,[113]6.2740,[114]6.2700,[115]6.2758,[116]6.2669,[117]6.2726,[118]6.2999,[119]6.3217,[120]6.3564,[121]6.3711,[122]6.3946,[123]6.4319,[124]6.4494,[125]6.4394,[126]6.4789,[127]6.5147,[128]6.5449,[129]6.5290,[130]6.5368,[131]6.5312,[132]6.5237,[133]6.5113,[134]6.5209,[135]6.5165,[136]6.5048,[137]6.4964,[138]6.4786,[139]6.4684,[140]6.4647,[141]6.4374,[142]6.4326,[143]6.4037,[144]6.3828,[145]6.3746,[146]6.3634,[147]6.3664,[148]6.3664,[149]6.3613,[150]6.3575,[151]6.3596,[152]6.3501,[153]6.3342,[154]6.3260,[155]6.3326,[156]6.3281,[157]6.3447,[158]6.3487,[159]6.3539,[160]6.3562,[161]6.3678,[162]6.3399,[163]6.3276,[164]6.3041,[165]6.2736,[166]6.2468,[167]6.2097,[168]6.1796,[169]6.1652,[170]6.1539,[171]6.1269,[172]6.1090,[173]6.0932,[174]6.0639,[175]6.0424,[176]6.0305,[177]6.0112,[178]5.9887,[179]5.9718,[180]5.9625,[181]5.9414,[182]5.9235,[183]5.9097,[184]5.9083,[185]5.9006,[186]5.9010,[187]5.9075,[188]5.9036,[189]5.9212,[190]5.9223,[191]5.9434,[192]5.9589,[193]5.9754,[194]5.9867,[195]6.0079,[196]6.0237,[197]6.0440,[198]6.0586,[199]6.0620,[200]6.0665,[201]6.0616,[202]6.0801,[203]6.0874,[204]6.0859,[205]6.0967,[206]6.1035,[207]6.0994,[208]6.1080,[209]6.1120,[210]6.1170,[211]6.1269,[212]6.1338,[213]6.1440,[214]6.1465,[215]6.1486,[216]6.1633,[217]6.1805,[218]6.1938,[219]6.1935,[220]6.1896,[221]6.1843,[222]6.1824,[223]6.1736,[224]6.1670,[225]6.1634,[226]6.1834,[227]6.1920,[228]6.1971,[229]6.2033,[230]6.2003,[231]6.2162,[232]6.2048,[233]6.1881,[234]6.1737,[235]6.1548,[236]6.1483,[237]6.1384,[238]6.1405,[239]6.1260,[240]6.1157,[241]6.1175,[242]6.1210,[243]6.1193,[244]6.1085,[245]6.1052,[246]6.0943,[247]6.0828,[248]6.0761,[249]6.0736,[250]6.0782,[251]6.0713,[252]6.0678,[253]6.0581,[254]6.0525,[255]6.0405,[256]6.0226,[257]6.0107,[258]6.0026,[259]6.0003,[260]5.9921,[261]5.9881,[262]5.9824,[263]5.9771,[264]5.9585,[265]5.9581,[266]5.9564,[267]5.9498,[268]5.9584,[269]5.9570,[270]5.9575,[271]5.9655,[272]5.9693,[273]5.9691,[274]5.9714,[275]5.9798,[276]5.9857,[277]6.0012,[278]6.0114,[279]6.0208,[280]6.0234,[281]6.0337,[282]6.0395,[283]6.0545,[284]6.0628,[285]6.0712,[286]6.0841,[287]6.0842,[288]6.0898,[289]6.0816,[290]6.0666,[291]6.0517,[292]6.0368,[293]6.0239,[294]6.0261,[295]6.0251,[296]6.0296,[297]6.0280,[298]6.0312,[299]6.0286,[300]6.0177,[301]6.0176,[302]6.0096,[303]6.0007,[304]5.9919,[305]5.9884,[30
6]5.9764,[307]5.9785,[308]5.9813,[309]5.9657,[310]5.9603,[311]5.9538,[312]5.9560,[313]5.9502,[314]5.9487,[315]5.9330,[316]5.9279,[317]5.9122,[318]5.8926,[319]5.9047,[320]5.9170,[321]5.9211,[322]5.9171,[323]5.9105,[324]5.9077,[325]5.9179,[326]5.9179,[327]5.9202,[328]5.9239,[329]5.9299,[330]5.9332,[331]5.9456,[332]5.9430,[333]5.9500,[334]5.9448,[335]5.9389,[336]5.9427,[337]5.9405,[338]5.9398,[339]5.9350,[340]5.9309,[341]5.9389,[342]5.9418,[343]5.9461,[344]5.9466,[345]5.9471,[346]5.9449,[347]5.9488,[348]5.9522,[349]5.9546,[350]5.9512,[351]5.9519,[352]5.9524,[353]5.9464,[354]5.9477,[355]5.9528,[356]5.9562,[357]5.9527,[358]5.9621,[359]5.9646,[360]5.9614,[361]5.9612,[362]5.9680,[363]5.9789,[364]5.9852,[365]5.9901,[366]5.9914,[367]5.9997,[368]5.9971,[369]5.9980,[370]5.9997,[371]5.9944,[372]5.9992,[373]6.0039,[374]6.0024,[375]6.0024,[376]6.0089,[377]6.0041,[378]6.0069,[379]6.0130,[380]6.0056,[381]6.0024,[382]5.9974,[383]5.9965,[384]5.9962,[385]5.9950,[386]5.9946,[387]5.9944,[388]5.9910,[389]5.9861,[390]5.9792,[391]5.9717,[392]5.9678,[393]5.9661,[394]5.9690,[395]5.9677,[396]5.9601,[397]5.9669,[398]5.9713,[399]5.9791,[400]5.9791,[401]5.9804,[402]5.9813,[403]5.9832,[404]5.9894,[405]5.9803,[406]5.9772,[407]5.9766,[408]5.9783,[409]5.9897,[410]6.0010,[411]6.0123,[412]6.0280,[413]6.0389,[414]6.0467,[415]6.0521,[416]6.0601,[417]6.0720,[418]6.0755,[419]6.0825,[420]6.0912,[421]6.1028,[422]6.1064,[423]6.1134,[424]6.1238,[425]6.1330,[426]6.1393,[427]6.1438,[428]6.1518,[429]6.1570,[430]6.1652,[431]6.1790,[432]6.1826,[433]6.1817,[434]6.1775,[435]6.1784,[436]6.1808,[437]6.1905,[438]6.1980,[439]6.1948,[440]6.1937,[441]6.1888,[442]6.1876,[443]6.1888,[444]6.1896,[445]6.1875,[446]6.1900,[447]6.1929,[448]6.1967,[449]6.1942,[450]6.1950,[451]6.1909,[452]6.1783,[453]6.1702,[454]6.1647,[455]6.1653,[456]6.1701,[457]6.1718,[458]6.1699,[459]6.1706,[460]6.1790,[461]6.1765,[462]6.1752,[463]6.1787,[464]6.1776,[465]6.1750,[466]6.1674,[467]6.1680,[468]6.1677,[469]6.1699,[470]6.1703,[471]6.1656,[472]6.1701,[473]6.1647,[474]6.1659,[475]6.1598,[476]6.1614,[477]6.1545,[478]6.1536,[479]6.1597,[480]6.1641,[481]6.1658,[482]6.1614,[483]6.1573,[484]6.1591,[485]6.1573,[486]6.1517,[487]6.1515,[488]6.1493,[489]6.1445,[490]6.1422,[491]6.1395,[492]6.1340,[493]6.1311,[494]6.1292,[495]6.1289,[496]6.1252,[497]6.1198,[498]6.1182,[499]6.1138,[500]6.1045,[501]6.0981,[502]6.0982,[503]6.0975,[504]6.0887,[505]6.0905,[506]6.0915,[507]6.0862,[508]6.0823,[509]6.0817,[510]6.0850,[511]6.0897,[512]6.0931,[513]6.0953,[514]6.1016,[515]6.0961,[516]6.0952,[517]6.0962,[518]6.0956,[519]6.0986,[520]6.1009,[521]6.1022,[522]6.1050,[523]6.1057,[524]6.1114,[525]6.1145,[526]6.1156,[527]6.1172,[528]6.1122,[529]6.1131,[530]6.1078,[531]6.1064,[532]6.1112,[533]6.1135,[534]6.1118,[535]6.1138,[536]6.1085,[537]6.1063,[538]6.1114,[539]6.1124,[540]6.1160,[541]6.1162,[542]6.1174,[543]6.1189,[544]6.1198,[545]6.1179,[546]6.1189,[547]6.1149,[548]6.1098,[549]6.1100,[550]6.1070,[551]6.1036,[552]6.1014,[553]6.0976,[554]6.0953,[555]6.0922,[556]6.0915,[557]6.0940,[558]6.0903,[559]6.0901,[560]6.0899,[561]6.0902,[562]6.0881,[563]6.0879,[564]6.0922,[565]6.0943,[566]6.0943,[567]6.0921,[568]6.0930,[569]6.0915,[570]6.0942,[571]6.0944,[572]6.0951,[573]6.0949,[574]6.0913,[575]6.0909,[576]6.0908,[577]6.0892,[578]6.0872,[579]6.0877,[580]6.0813,[581]6.0774,[582]6.0766,[583]6.0774,[584]6.0776,[585]6.0700,[586]6.0632,[587]6.0638,[588]6.0684,[589]6.0739,[590]6.0767,[591]6.0790,[592]6.0777,[593]6.0747,[594]6.0756,[595]6.0732,[596]6.0766,[597]6.0745,[598]6.0715,[599]6.0736,[600]6.0729,[601]6.0715,[602]6
.0729,[603]6.0756,[604]6.0764,[605]6.0799,[606]6.0823,[607]6.0807,[608]6.0775,[609]6.0783,[610]6.0818,[611]6.0802,[612]6.0826,[613]6.0789,[614]6.0741,[615]6.0668,[616]6.0694,[617]6.0634,[618]6.0587,[619]6.0531,[620]6.0395,[621]6.0328,[622]6.0311,[623]6.0325,[624]6.0329,[625]6.0328,[626]6.0317,[627]6.0341,[628]6.0343,[629]6.0341,[630]6.0374,[631]6.0430,[632]6.0488,[633]6.0473,[634]6.0507,[635]6.0514,[636]6.0479,[637]6.0444,[638]6.0470,[639]6.0439,[640]6.0448,[641]6.0450,[642]6.0516,[643]6.0538,[644]6.0549,[645]6.0530,[646]6.0572,[647]6.0530,[648]6.0541,[649]6.0544,[650]6.0582,[651]6.0636,[652]6.0648,[653]6.0686,[654]6.0624,[655]6.0619,
llama_print_timings: load time = 9035.02 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 3789898.46 ms / 335360 tokens ( 11.30 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 3830652.16 ms
Looks good. I assume the problems some people are reporting are just due to incomplete ROCm support for some GPUs, and there probably isn't much we can do about that.
On a side note, we should probably change the perplexity tool to report times on the second pass instead.
Yes, this is a problem with some ROCm packages like the one in Arch. It takes up to 5 seconds to load the BLAS routines. In AMD's Docker image this doesn't happen, but there the rocBLAS library is something like 3 GB in size :grimacing:
For compatibility, I want to create a Docker image that people can just run with one command (like the existing CPU ones).
On the Radeon 5700 XT everything works correctly except the perplexity test (the --memory_f32 perplexity test works). The card isn't officially supported by ROCm, but that doesn't seem to affect the functionality of the main program.
What did you do to build it and make it work? Did you use the Docker image or build it directly on your system? What flags/environment variables did you use?
I suspect it has something to do with the GPU architecture it is being built for. My Makefile changes detect the GPU of your system, but that may not work if you're overriding it on the command line. On the Steam Deck I had to build for one specific target (gfx1030) because that's the one rocBLAS supports.
This is something that should happen automatically and not be on the user to fix. I need to figure it out.
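For reference, here is one rough way to check which gfx target the GPU reports and build for it explicitly (just a sketch; rocminfo ships with ROCm, and on systems with more than one agent you may need to pick the right match yourself):
# Ask ROCm which gfx architecture the GPU reports, then build for that target.
gfx=$(/opt/rocm/bin/rocminfo | grep -m1 -o 'gfx[0-9a-f]*')
echo "Building for $gfx"
make -j4 LLAMA_HIPBLAS=1 GPU_TARGETS=$gfx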
I compiled it with make LLAMA_HIPBLAS=1 GPU_TARGETS=gfx1030 and exported HSA_OVERRIDE_GFX_VERSION=10.3.0 before launching main. There must be something else.
Try export CXX=hipcc before compiling?
I built it directly on my system. I've set HSA_OVERRIDE_GFX_VERSION=10.3.0 and HCC_AMDGPU_TARGET=gfx1030 in order to fake the compatibility of the GPU with ROCm.
I don't think that's correct. I was using hipcc in my code, but SlyEcho is using ROCm's LLVM instead; his approach is likely the most correct. Anyway, I tried it and it didn't change anything.
HCC_AMDGPU_TARGET=gfx1030 probably isn't needed, since --offload-arch is set in the Makefile, but I tried it anyway and it didn't help.
I'm pretty sure there's something weird with my specific setup, but I don't get what's wrong. I'm on Arch; I tried both the rocm packages in the official repos and the opencl-amd-dev AUR package.
Anyway, even if I still haven't managed to test it myself, the code looks perfectly fine to me.
I cannot say for sure; for me I just did:
export CXX=hipcc
export HSA_OVERRIDE_GFX_VERSION=10.3.0
mkdir build
cd build
CMAKE_PREFIX_PATH=/opt/rocm cmake ..
At the top I wrote about the compiler situation.
I can also share my dumb hipBLAS test code. It doesn't require any HIP device code, only hipBLAS. It computes a matmul over two 4D tensors, once with hipBLAS and once manually on the CPU, so you can compare the results and see if it is correct (it took some time to achieve even that!)
blaster.c
#include <stdio.h>
#include <assert.h>
#include <stdlib.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <hip/hip_runtime_api.h>
#include "hipblas/hipblas.h"
#define HIP_CHECK(x) do { \
hipError_t ret = (x); \
if (ret != hipSuccess) { \
fprintf(stderr, "HIP_CHECK failed: %s:%d: %s == %d (%s)\n", \
__FILE__, __LINE__, #x, ret, hipGetErrorString(ret)); \
exit(1); \
} \
} while(0)
#define HIPBLAS_CHECK(x) do { \
hipblasStatus_t ret = (x); \
if (ret != HIPBLAS_STATUS_SUCCESS) { \
fprintf(stderr, "HIPBLAS_CHECK failed: %s:%d: %s == %d (%s)\n", \
__FILE__, __LINE__, #x, ret, hipblasStatusToString(ret)); \
exit(1); \
} \
} while(0)
static float gemm_alpha_f32 = 1.0f;
static float gemm_beta_f32 = 0.0f;
static uint16_t gemm_alpha_f16 = 0x3c00;
static uint16_t gemm_beta_f16 = 0x0000;
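// A tensor here is a 4D array: q*p batches of m-by-n row-major matrices, stored contiguously.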
typedef struct {
int q;
int p;
int m;
int n;
} dims;
typedef struct {
dims d;
float *a;
} tensor;
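// Allocate a tensor in pinned (page-locked) host memory so the GPU can access it directly, and optionally copy in initial data.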
tensor *tensor_create(int q, int p, int m, int n, float *data) {
tensor *t = malloc(sizeof(tensor));
t->d = (dims) { q, p, m, n };
size_t s = q*p*m*n*sizeof(float);
HIP_CHECK(hipHostMalloc((void **)&t->a, s, hipHostMallocCoherent));
if (data != NULL) {
memcpy(t->a, data, s);
}
return t;
}
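// Print the tensor: for each of the q outer slices, the p inner m-by-n matrices are shown side by side.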
void printmat(tensor *m) {
int d0 = m->d.q;
int d1 = m->d.p;
int d2 = m->d.m;
int d3 = m->d.n;
int w = 5 * d3 * d1 + d1 * 1 + 1;
for (int i = 0; i < w; i++) printf("-");printf("\n");
for (int i0 = 0; i0 < d0; i0++) {
for (int i2 = 0; i2 < d2; i2++) {
printf("|");
for (int i1 = 0; i1 < d1; i1++) {
for (int i3 = 0; i3 < d3; i3++) {
int i = i0*d1*d2*d3 + i1*d2*d3 + i2*d3 + i3;
printf(" % 3.0f ", m->a[i]);
}
printf("|");
}
printf("\n");
}
for (int i = 0; i < w; i++) printf("-");printf("\n");
}
}
inline static int idx(tensor *T, int l, int k, int j, int i) {
return l*T->d.p*T->d.m*T->d.n + k*T->d.m*T->d.n + j*T->d.n + i;
}
inline static float *get(tensor *T, int l, int k, int j, int i) {
return &T->a[idx(T, l, k, j, i)];
}
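// Naive CPU reference: C[m x p] = A[m x n] * B[n x p], all row-major.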
void matmul2d(float *C, float *A, float *B, int m, int n, int p) {
for (int i = 0; i < m; i++) {
for (int j = 0; j < p; j++) {
float sum = 0;
for (int k = 0; k < n; k++) {
sum += A[i*n + k] * B[k*p + j];
}
C[i*p + j] = sum;
}
}
}
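// CPU reference for the batched case: multiply each of the q*p matrix pairs independently.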
int matmul(tensor *C, tensor *A, tensor *B) {
assert(C->d.q == A->d.q && A->d.q == B->d.q);
assert(C->d.p == A->d.p && A->d.p == B->d.p);
assert(A->d.n == B->d.m);
int q = A->d.q;
int p = A->d.p;
for (int l = 0; l < q; l++) {
for (int k = 0; k < p; k++) {
int t = l*p + k;
matmul2d(
C->a+t*C->d.m*C->d.n,
A->a+t*A->d.m*A->d.n,
B->a+t*B->d.m*B->d.n,
A->d.m, A->d.n, B->d.n
);
}
}
return 0;
}
static hipblasHandle_t handle;
static hipStream_t stream;
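// The same batched matmul on the GPU with a single hipblasSgemmStridedBatched call. The buffers live in
// pinned host memory, so we only look up device pointers instead of copying; A and B are passed swapped
// because rocBLAS works in column-major order, which effectively yields the row-major product.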
void matmul_hip(tensor *C, tensor *A, tensor *B) {
assert(C->d.q == A->d.q && A->d.q == B->d.q);
assert(C->d.p == A->d.p && A->d.p == B->d.p);
int tm = C->d.q * C->d.p;
int ae = A->d.m * A->d.n;
int be = B->d.m * B->d.n;
int ce = C->d.m * C->d.n;
float *Ad, *Bd, *Cd;
//HIP_CHECK(hipMallocAsync((void **)&Ad, tm*ae*sizeof(float), stream));
//HIP_CHECK(hipMallocAsync((void **)&Bd, tm*be*sizeof(float), stream));
//HIP_CHECK(hipMallocAsync((void **)&Cd, tm*ce*sizeof(float), stream));
//HIP_CHECK(hipMemcpyAsync(Ad, A->a, tm*ae*sizeof(float), hipMemcpyHostToDevice, stream));
//HIP_CHECK(hipMemcpyAsync(Bd, B->a, tm*be*sizeof(float), hipMemcpyHostToDevice, stream));
HIP_CHECK(hipHostGetDevicePointer((void **)&Ad, A->a, 0));
HIP_CHECK(hipHostGetDevicePointer((void **)&Bd, B->a, 0));
HIP_CHECK(hipHostGetDevicePointer((void **)&Cd, C->a, 0));
//HIP_CHECK(hipDeviceSynchronize());
//HIP_CHECK(hipDeviceSynchronize());
HIPBLAS_CHECK(hipblasSgemmStridedBatched(handle,
HIPBLAS_OP_N, HIPBLAS_OP_N,
C->d.n, C->d.m, B->d.m,
&gemm_alpha_f32,
Bd, B->d.n, be,
Ad, A->d.n, ae,
&gemm_beta_f32,
Cd, C->d.n, ce,
tm));
//HIP_CHECK(hipMemcpyAsync(C->a, Cd, tm*ce*sizeof(float), hipMemcpyHostToDevice, stream));
//HIP_CHECK(hipFreeAsync(Ad, stream));
//HIP_CHECK(hipFreeAsync(Bd, stream));
//HIP_CHECK(hipFreeAsync(Cd, stream));
//HIP_CHECK(hipDeviceSynchronize());
HIP_CHECK(hipStreamSynchronize(stream));
}
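// Build two small example tensors, multiply them with hipBLAS and then on the CPU, and print both results for comparison.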
int main() {
tensor *A = tensor_create(2, 2, 4, 2, (float[]) {
0, 1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23,
24, 25, 26, 27, 28, 29, 30, 31,
});
printf("A = \n");
printmat(A);
tensor *B = tensor_create(2, 2, 2, 4, (float[]) {
1, 0, 0, 0,
0, 1, 0, 0,
0, 0, 1, 0,
0, 0, 0, 1,
2, 0, 0, 0,
0, 0, 1, 0,
0, 0, 1, 0,
0, 0, 0, 1,
});
printf("B = \n");
printmat(B);
tensor *C = tensor_create(2, 2, 4, 4, NULL);
// hipblas time
HIP_CHECK(hipStreamCreateWithFlags(&stream, hipStreamNonBlocking));
HIPBLAS_CHECK(hipblasCreate(&handle));
HIPBLAS_CHECK(hipblasSetStream(handle, stream));
int device;
int managed_memory;
HIP_CHECK(hipGetDevice(&device));
HIP_CHECK(hipDeviceGetAttribute(&managed_memory, hipDeviceAttributeManagedMemory, device));
printf("device %d, managed memory: %d\n", device, managed_memory);
matmul_hip(C, A, B);
printf("C = A*B (via hipblas)\n");
printmat(C);
// check
matmul(C, A, B);
printf("C = A*B\n");
printmat(C);
HIPBLAS_CHECK(hipblasDestroy(handle));
HIP_CHECK(hipStreamDestroy(stream));
return 0;
}
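For completeness, this is roughly how the test can be built as plain host code (a sketch; it assumes ROCm under /opt/rocm with the hipBLAS development files installed, and the exact paths may differ between versions):
# Build the host-only hipBLAS test with ROCm's clang and run it.
/opt/rocm/llvm/bin/clang -O2 blaster.c -o blaster \
    -D__HIP_PLATFORM_AMD__ -I/opt/rocm/include \
    -L/opt/rocm/lib -lhipblas -lamdhip64
./blaster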
I cannot say for sure; for me I just did:
export CXX=hipcc
export HSA_OVERRIDE_GFX_VERSION=10.3.0
mkdir build
cd build
CMAKE_PREFIX_PATH=/opt/rocm cmake ..
this worked for me! thanks :)