Add phi3 128K model support
The only difference between the phi3 4k and 128k models is in the rotary embedding. The 128k model adds long/short rope scaling factors (freq_factors) and an attn factor to each hidden dimension. The choice between the long and short factors is based on the total length of the input sequence, i.e., the kv context size.
seq_len = torch.max(position_ids) + 1
if seq_len > self.original_max_position_embeddings:
    ext_factors = torch.tensor(self.long_factor, dtype=torch.float32, device=x.device)
else:
    ext_factors = torch.tensor(self.short_factor, dtype=torch.float32, device=x.device)
inv_freq_shape = torch.arange(0, self.dim, 2, dtype=torch.int64, device=x.device).float() / self.dim
self.inv_freq = 1.0 / (ext_factors * self.base**inv_freq_shape)
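The selection logic above can be sketched without torch. This is only an illustration under stated assumptions: the function name `scaled_inv_freq` and the toy factor lists are hypothetical, and real long/short factors would come from the model's config.

```python
def scaled_inv_freq(seq_len, original_max_pos, dim, base, short_factor, long_factor):
    """Return per-pair inverse frequencies divided by the long or short factors."""
    # Long factors apply only once the sequence exceeds the original window.
    ext_factors = long_factor if seq_len > original_max_pos else short_factor
    assert len(ext_factors) == dim // 2, "one factor per rotated pair"
    # Exponents 0/dim, 2/dim, ..., (dim-2)/dim, as in standard RoPE.
    exponents = [i / dim for i in range(0, dim, 2)]
    return [1.0 / (f * base ** e) for f, e in zip(ext_factors, exponents)]


# Toy values (hypothetical, not taken from any real config).
short_f = [1.0, 1.0, 1.0, 1.0]
long_f = [1.0, 2.0, 4.0, 8.0]

inv_short = scaled_inv_freq(100, 4096, 8, 10000.0, short_f, long_f)
inv_long = scaled_inv_freq(5000, 4096, 8, 10000.0, short_f, long_f)
```

With these toy factors, each entry of `inv_long` is the matching entry of `inv_short` divided by the long factor, so larger factors slow the rotation of the corresponding dimensions.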
The attn factor value is based on the positional embedding size.
scale = self.max_position_embeddings / self.original_max_position_embeddings
if scale <= 1.0:
    scaling_factor = 1.0
else:
    scaling_factor = math.sqrt(1 + math.log(scale) / math.log(self.original_max_position_embeddings))
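As a worked example of the formula above, here is a small standalone sketch. The values 131072 and 4096 are assumed for max_position_embeddings and original_max_position_embeddings (check the model's config.json), and the helper name `attn_factor` is hypothetical.

```python
import math


def attn_factor(max_pos, original_max_pos):
    """Attention scaling factor derived from the context extension ratio."""
    scale = max_pos / original_max_pos
    if scale <= 1.0:
        return 1.0
    return math.sqrt(1 + math.log(scale) / math.log(original_max_pos))


# Assumed values: 128k extended context over a 4k original window (scale = 32).
factor_128k = attn_factor(131072, 4096)
# No extension: the factor stays at 1.0.
factor_4k = attn_factor(4096, 4096)
```

Under these assumptions log(32) / log(4096) is exactly 5/12, so the factor is sqrt(17/12), roughly 1.19.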
Workflow
- convert-hf-to-gguf.py: Write long/short freq factors to gguf metadata for phi3 model
- llama.cpp:
  - load the freq factors and attn factor from metadata
  - take freq factors as an input tensor of phi3 model, and a source of k/q rope tensor
  - choose the long or short freq_factors based on the context size when setting the tensor value
- ggml: update rope op to support long/short freq factors:
  - [x] CPU
  - [x] CUDA
  - [ ] Metal
  - [ ] SYCL
  - [ ] Vulkan
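To illustrate what a rope op with freq factors computes, here is a rough pure-Python sketch for a single head vector. It is not the ggml kernel: the function name `rope_neox`, the NeoX-style pairing of element i with element i + n/2, and applying the attn factor to both cos and sin are assumptions made for this example.

```python
import math


def rope_neox(x, pos, base, freq_factors, attn_factor):
    """Rotate pairs (x[i], x[i + n/2]) by pos * theta_i,
    where theta_i = base**(-2i/n) / freq_factors[i]."""
    n = len(x)
    half = n // 2
    out = [0.0] * n
    for i in range(half):
        # Dividing by the freq factor slows the rotation of this pair.
        theta = pos * base ** (-2.0 * i / n) / freq_factors[i]
        c = math.cos(theta) * attn_factor
        s = math.sin(theta) * attn_factor
        out[i] = x[i] * c - x[i + half] * s
        out[i + half] = x[i] * s + x[i + half] * c
    return out


x = [1.0, 2.0, 3.0, 4.0]
y0 = rope_neox(x, 0, 10000.0, [1.0, 1.0], 1.0)   # position 0: no rotation
y = rope_neox(x, 7, 10000.0, [1.0, 2.0], 1.5)    # rotated and scaled
```

Since each pair is only rotated and then scaled, the norm of `y` is the norm of `x` times the attn factor, which is a convenient sanity check when comparing backends.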
Test
- clone the model (https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/tree/main) and use convert-hf-to-gguf.py to convert to gguf format.
- use passkey to test the model, e.g.:
  passkey phi3_128k_fp16.gguf 500
Thanks @ggerganov for your help. I don't have a device to test Metal on, so I did not implement that part.
Looking into this now
I would like to refactor the ggml_rope_custom API and remove ggml_rope_with_freq_factors before merging - will push in a bit
The refactor makes the API look much cleaner, truly great.
I would prefer if the scaling factors were exported as a tensor rather than metadata, it would remove quite a bit of code and it would be more efficient.
Yup, would be better to have the factors as tensors. @liuwei-git would you like to give this a go?
📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 538 iterations 🚀
- Concurrent users: 8, duration: 10m
- HTTP request : avg=8734.28ms p(95)=21554.83ms fails=, finish reason: stop=477 truncated=61
- Prompt processing (pp): avg=104.82tk/s p(95)=509.78tk/s
- Token generation (tg): avg=32.71tk/s p(95)=46.82tk/s
- ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=master commit=7528c705b0c741a68a1d85a523d827374c258195
Charts (bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 538 iterations), plotted over the run from 1716324379 to 1716325011: llamacpp:prompt_tokens_seconds, llamacpp:predicted_tokens_seconds, llamacpp:kv_cache_usage_ratio, llamacpp:requests_processing.
I think this is going to cause the rope to be run on the CPU always, because the scheduler prefers running ops that use weights in the backend of the weights. I will fix that after this is merged.
Ok. Btw, do you see anything that could affect the performance of phi-2 (no rope factors)? The benchmark shows about half the usual performance (217 iters) and I'm wondering if it is a fluke, because I can't reproduce it on my RTX 2060.
Looking at the graphs, it seems that the load time increased, but the throughput looks similar. Maybe it was a fluke?
I can't reproduce it on my system either.
| GPU | Model | Test | t/s master | t/s liuwei-git/master | Speedup |
|---|---|---|---|---|---|
| RTX 3090 Ti | phi2 3B Q8_0 | pp512 | 8543.90 | 8519.48 | 1.00 |
| RTX 3090 Ti | phi2 3B Q8_0 | tg128 | 185.05 | 184.47 | 1.00 |
| RTX 3090 Ti | phi2 3B Q8_0 | pp512+tg128 | 808.40 | 808.70 | 1.00 |
The model seems to be doing rather poorly. I cannot tell if it's a tokenizer issue or just the model itself, but I quantized the 128k medium instruct model to q8_0 and it's failing pretty simple logic questions. Perhaps it's just not good with rather basic math?
I tried temperatures from 1 down to 0.6 and even down to 0, and it's still not faring well on logical questions. I was expecting more from a phi model, which leads me to think it may be some other underlying issue.
The question I asked that it specifically struggled on was:
There is a roll of tape. The tape is 100 meters long when unrolled. When rolled up, the outer diameter is 10 cm, and the inner diameter is 5 cm. How thick is the tape?
It gave some pretty dumb answers such as the tape being over 1300 cm thick, and kept trying to correct itself, giving equally incorrect answers.
Did you try running a 16 bit gguf model and seeing how that performs?
I tried the prompt on the 128k mini instruct model (f16 and Q4_K_M from https://huggingface.co/AlessandroW/Phi-3-mini-128k-instruct-gguf) and both models performed similarly to an older ChatGPT version https://neil.fraser.name/news/2023/02/17/. The quantized model replied:
<|user|> There is a roll of tape. The tape is 100 meters long when unrolled. When rolled up, the outer diameter is 10 cm, and the inner diameter is 5 cm. How thick is the tape?<|end|><|assistant|> To find the thickness of the tape, we need to calculate the volume of the tape when it is rolled up and then use that to find the thickness.
First, let's calculate the volume of the cylinder formed by the outer diameter. The formula for the volume of a cylinder is V = πr²h, where r is the radius and h is the height (or length in this case).
The outer radius (r_outer) is half of the outer diameter, so r_outer = 10 cm / 2 = 5 cm.
The length of the tape (h) is 100 meters, but we need to convert it to centimeters because the radius is in centimeters. So, h = 100 meters * 100 cm/meter = 10000 cm.
Now, let's calculate the volume of the outer cylinder (V_outer):
V_outer = π * (r_outer)² * h
V_outer = π * (5 cm)² * 10000 cm
V_outer = π * 25 cm² * 10000 cm
V_outer = 250000π cm³
Next, let's calculate the volume of the inner cylinder (V_inner), which represents the empty space inside the tape. The inner radius (r_inner) is half of the inner diameter, so r_inner = 5 cm / 2 = 2.5 cm.
The volume of the inner cylinder (V_inner) is:
V_inner = π * (r_inner)² * h
V_inner = π * (2.5 cm)² * 10000 cm
V_inner = π * 6.25 cm² * 10000 cm
V_inner = 62500π cm³
The volume of the tape itself (V_tape) is the difference between the outer and are not standard mathematical operations, and thus are not applicable in this context.
Since the question seems to be asking for the volume of the tape itself, we will subtract the inner volume from the outer volume to find the volume of the tape:
V_tape = V_outer - V_inner
V_tape = 250000π cm³ - 62500π cm³
V_tape = 187500π cm³
To get the numerical value, we use the approximation π ≈ 3.14159:
V_tape ≈ 187500 * 3.14159 cm³
V_tape ≈ 588746.25 cm³
Therefore, the volume of the tape itself is approximately 588746.25 cubic centimeters.<|endoftext|>
The f16 model replied
<|user|> There is a roll of tape. The tape is 100 meters long when unrolled. When rolled up, the outer diameter is 10 cm, and the inner diameter is 5 cm. How thick is the tape?<|end|><|assistant|> To find the thickness of the tape, we need to calculate the volume of the tape when it is rolled up and then use that to find the thickness.
First, let's find the volume of the entire length of the tape if it were a cylinder with the outer diameter. The formula for the volume of a cylinder is V = πr²h, where r is the radius and h is the height (or length in this case).
The outer radius (R) is half of the outer diameter, so R = 10 cm / 2 = 5 cm.
The length (h) of the tape is 100 meters, which we need to convert to centimeters because the diameter is in centimeters. 100 meters = 100 * 100 cm = 10000 cm.
Now, let's calculate the volume of the outer cylinder (V_outer):
V_outer = π * R² * h
V_outer = π * (5 cm)² * 10000 cm
V_outer = π * 25 cm² * 10000 cm
V_outer = 250000π cm³
Next, we need to calculate the volume of the inner cylinder, which represents the space that is not occupied by the tape. The inner radius (r) is half of the inner diameter, so r = 5 cm / 2 = 2.5 cm.
The volume of the inner cylinder (V_inner) is:
V_inner = π * r² * h
V_inner = π * (2.5 cm)² * 10000 cm
V_inner = π * 6.25 cm² * 10000 cm
V_inner = 62500π cm³
Now, to find the volume of the tape itself, we subtract the volume of the inner cylinder from the volume of the outer cylinder:
V_tile = V_outer - V_inner
V_tile = 250000π cm³ - 62500π cm³
V_tile = 187500π cm³
Finally, to find the thickness of the tape, we divide the volume of the tape by the surface area of the inner cylinder (since the thickness will be uniform and we are considering the volume that the tape occupies over the inner surface area):
Surface area of the inner cylinder (A_inner) is:
A_inner = 2π * r * h
A_inner = 2π * (2.5 cm) * 10000 cm
A_inner = 50000π cm²
The thickness of the tape (t) is:
t = V_tile / A_inner
t = 187500π cm³ / 50000π cm²
t = 3.75 cm
So, the thickness of the tape is 3.75 cm
Since the thickness of the tape is not a practical length (it's too large and not realistic for a thin tape), we need to reconsider our calculation. The thickness should be in the same order of magnitude as the width of the tape, which is 0.2 cm. Let's correct this by using the correct formula for the volume of the cylindrical shell (the tape):
V_tile = A_inner * t
We can now solve for the thickness (t):
t = V_tile / A_inner
t = 187500π cm³ / (π * 50000 cm * 100 cm)
t = 187500 / (50000 * 100)
t = 0.375 cm
Therefore, the thickness of the tape is 0.375 cm.<|endoftext|> [end of text]
The fp16 looks a bit better; at least it gets to an answer. How does the PyTorch model's answer compare?
I guess I got lucky with a seed. Phi-3-medium (q4 from https://huggingface.co/bartowski/Phi-3-medium-128k-instruct-GGUF, temp 0.1):
There is a roll of tape. The tape is 100 meters long when unrolled. When rolled up, the outer diameter is 10 cm, and the inner diameter is 5 cm. How thick is the tape?
Answer:
To find out how thick the tape is, we can use the formula for the volume of a cylindrical shell (the shape that represents the unrolled tape). The formula is V = πh(R^2 - r^2), where h is the height or length of the cylinder, R is the outer radius and r is the inner radius.
First, let's convert all measurements to meters for consistency:
- Length (height) of tape, h = 100 m
- Outer diameter = 10 cm = 0.1 m
- Inner diameter = 5 cm = 0.05 m
Now we can find the outer and inner radii by dividing the diameters by 2:
- Outer radius, R = 0.1 m / 2 = 0.05 m
- Inner radius, r = 0.05 m / 2 = 0.025 m
Now we can plug these values into the formula for volume of a cylindrical shell:
V = πh(R^2 - r^2)
We know that V (volume) is equal to the length times the thickness, so let's call the thickness "t" and set up an equation:
100 m * t = π * 100 m * ((0.05 m)^2 - (0.025 m)^2)
Now we can solve for t:
t = (π * ((0.05 m)^2 - (0.025 m)^2)) / 100
t ≈ (3.1416 * (0.0025 m^2 - 0.000625 m^2)) / 100
t ≈ (3.1416 * 0.001875 m^2) / 100
t ≈ 0.0059 m
So the thickness of the tape is approximately 0.0059 meters, or about 5.9 millimeters.
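As a sanity check on the numbers above, the cylindrical-shell argument (cross-section area of the roll equals length times thickness) can be evaluated directly. This is a small sketch using only the values given in the question; the variable names are illustrative.

```python
import math

length_m = 100.0          # unrolled tape length
outer_r_m = 0.10 / 2      # outer diameter 10 cm
inner_r_m = 0.05 / 2      # inner diameter 5 cm

# Area of the annulus seen from the side of the roll.
cross_section_m2 = math.pi * (outer_r_m**2 - inner_r_m**2)

# The same area, viewed as a long thin rectangle: length * thickness.
thickness_m = cross_section_m2 / length_m
thickness_mm = thickness_m * 1000

print(f"{thickness_m:.3e} m = {thickness_mm:.4f} mm")
```

Evaluating this gives roughly 5.9e-5 m, i.e. about 0.059 mm. The setup in the answer above matches this formula, but its last step appears to drop the division by 100, so the quoted 5.9 mm is about two orders of magnitude too large.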