llama.cpp

Add phi3 128K model support

Open liuwei-git opened this issue 1 year ago • 1 comments

The only difference between the phi3 4k and 128k models is in the rotary embedding. The 128k model adds long/short rope scaling factors (freq_factors), one per rotary dimension pair, and an attn factor. The choice between the long and short factors is based on the total length of the input sequence, i.e., the kv context size.

seq_len = torch.max(position_ids) + 1
if seq_len > self.original_max_position_embeddings:
    ext_factors = torch.tensor(self.long_factor, dtype=torch.float32, device=x.device)
else:
    ext_factors = torch.tensor(self.short_factor, dtype=torch.float32, device=x.device)

inv_freq_shape = torch.arange(0, self.dim, 2, dtype=torch.int64, device=x.device).float() / self.dim
self.inv_freq = 1.0 / (ext_factors * self.base**inv_freq_shape)
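
The snippet above can be restated without torch as a minimal sketch. The function name `phi3_inv_freq` and the toy factor values are illustrative assumptions, not part of the model code; only the formula follows the excerpt.

```python
def phi3_inv_freq(seq_len, original_max_pos, long_factor, short_factor, dim, base=10000.0):
    """Return the per-pair inverse frequencies for a given sequence length.

    Hypothetical helper mirroring the excerpt above in plain Python.
    """
    # Use the long factors once the sequence exceeds the original training window.
    ext_factors = long_factor if seq_len > original_max_pos else short_factor
    assert len(ext_factors) == dim // 2, "one factor per rotary pair"
    # Standard RoPE exponent 2i/dim for pair i, divided by that pair's factor.
    return [1.0 / (f * base ** (2 * i / dim)) for i, f in enumerate(ext_factors)]


# Toy values (not from any real model config): dim=4, window of 16 positions.
short = [1.0, 1.0]
long = [2.0, 4.0]
print(phi3_inv_freq(10, 16, long, short, dim=4, base=100.0))  # short factors apply
print(phi3_inv_freq(17, 16, long, short, dim=4, base=100.0))  # long factors apply
```

With all factors equal to 1.0 this reduces to the usual RoPE frequencies, which is why the 4k model needs no extra data.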

The attn factor value is based on the positional embedding size.

scale = self.max_position_embeddings / self.original_max_position_embeddings
if scale <= 1.0:
    scaling_factor = 1.0
else:
    scaling_factor = math.sqrt(1 + math.log(scale) / math.log(self.original_max_position_embeddings))
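
As a worked example of the formula above, the sketch below evaluates the attn factor for one pair of sizes. The values 131072 and 4096 are assumed here to match the Phi-3-mini-128k config and should be checked against the model's `config.json`; the function name is illustrative.

```python
import math


def phi3_attn_factor(max_pos, original_max_pos):
    """Attention scaling factor as in the excerpt above (hypothetical helper)."""
    scale = max_pos / original_max_pos
    if scale <= 1.0:
        # No context extension, so no extra scaling.
        return 1.0
    return math.sqrt(1 + math.log(scale) / math.log(original_max_pos))


# Assumed sizes: 128k extended context over a 4k original window (scale = 32).
factor = phi3_attn_factor(131072, 4096)
print(round(factor, 4))
```

Because 32 = 2^5 and 4096 = 2^12, the log ratio is exactly 5/12, so the factor is sqrt(17/12), roughly 1.19.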

Workflow

  • convert-hf-to-gguf.py: Write long/short freq factors to gguf metadata for phi3 model

  • llama.cpp:

    • load the freq factors and attn factor from metadata
    • take freq factors as an input tensor of phi3 model, and a source of k/q rope tensor
    • choose the long or short freq_factors based on the context size when setting the tensor value
  • ggml: update rope op to support long/short freq factors:

    • [x] CPU
    • [x] CUDA
    • [ ] Metal
    • [ ] SYCL
    • [ ] Vulkan
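
To illustrate what supporting freq factors in a rope op means, here is a minimal sketch of rotating a single pair of values. It is not the ggml kernel: the function name, argument layout, and the way the attn factor is applied to cos/sin are assumptions made for illustration.

```python
import math


def rope_rotate_pair(x0, x1, pos, pair_index, dim, base=10000.0,
                     freq_factor=1.0, attn_factor=1.0):
    """Rotate one (x0, x1) pair of a query/key vector at position `pos`.

    Hypothetical helper; a real rope op processes whole tensors.
    """
    # Base angle for this pair, divided by its per-pair frequency factor.
    theta = pos * base ** (-2.0 * pair_index / dim) / freq_factor
    # The attn factor scales both cos and sin, i.e. the magnitude of the result.
    c = math.cos(theta) * attn_factor
    s = math.sin(theta) * attn_factor
    return x0 * c - x1 * s, x0 * s + x1 * c


# A factor of 2 at position 2 gives the same angle as factor 1 at position 1.
print(rope_rotate_pair(1.0, 0.0, pos=2, pair_index=1, dim=4, freq_factor=2.0))
print(rope_rotate_pair(1.0, 0.0, pos=1, pair_index=1, dim=4, freq_factor=1.0))
```

In other words, a freq factor larger than 1 slows the rotation of that pair, which stretches the positions it can distinguish.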

Test

  • https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/tree/main
    • clone the model and use convert-hf-to-gguf.py to convert to gguf format.
    • use passkey to test the model, e.g.:
    passkey phi3_128k_fp16.gguf 500

liuwei-git avatar May 11 '24 19:05 liuwei-git

Thanks @ggerganov for your help. I do not have a device to test Metal on, so I did not implement that part.

liuwei-git avatar May 12 '24 18:05 liuwei-git

Looking into this now

ggerganov avatar May 16 '24 08:05 ggerganov

I would like to refactor the ggml_rope_custom API and remove ggml_rope_with_freq_factors before merging - will push in a bit

ggerganov avatar May 16 '24 09:05 ggerganov

I would like to refactor the ggml_rope_custom API and remove ggml_rope_with_freq_factors before merging - will push in a bit

The refactor makes the API look much cleaner, truly great.

liuwei-git avatar May 16 '24 10:05 liuwei-git

I would prefer if the scaling factors were exported as a tensor rather than metadata, it would remove quite a bit of code and it would be more efficient.

slaren avatar May 16 '24 11:05 slaren

Yup, would be better to have the factors as tensors. @liuwei-git would you like to give this a go?

ggerganov avatar May 16 '24 12:05 ggerganov

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 538 iterations 🚀

Expand details for performance related PR only
  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=8734.28ms p(95)=21554.83ms fails=, finish reason: stop=477 truncated=61
  • Prompt processing (pp): avg=104.82tk/s p(95)=509.78tk/s
  • Token generation (tg): avg=32.71tk/s p(95)=46.82tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=master commit=7528c705b0c741a68a1d85a523d827374c258195

Charts (llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 538 iterations; x-axis timestamps 1716324379 --> 1716325011):

  • llamacpp:prompt_tokens_seconds
  • llamacpp:predicted_tokens_seconds
  • llamacpp:kv_cache_usage_ratio
  • llamacpp:requests_processing


github-actions[bot] avatar May 21 '24 16:05 github-actions[bot]

I think this is going to cause the rope to be run on the CPU always, because the scheduler prefers running ops that use weights in the backend of the weights. I will fix that after this is merged.

slaren avatar May 21 '24 16:05 slaren

Ok. Btw, do you see anything that could affect the performance of phi-2 (no rope factors)? The benchmark shows half the usual performance (217 iters) and I'm wondering if it is a fluke, because I can't reproduce it on my RTX 2060

ggerganov avatar May 21 '24 16:05 ggerganov

Looking at the graphs, it seems that the load time increased, but the throughput looks similar. Maybe it was a fluke?

I can't reproduce it on my system either.

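| GPU         | Model        | Test        | t/s master | t/s liuwei-git/master | Speedup |
|-------------|--------------|-------------|------------|-----------------------|---------|
| RTX 3090 Ti | phi2 3B Q8_0 | pp512       | 8543.90    | 8519.48               | 1.00    |
| RTX 3090 Ti | phi2 3B Q8_0 | tg128       | 185.05     | 184.47                | 1.00    |
| RTX 3090 Ti | phi2 3B Q8_0 | pp512+tg128 | 808.40     | 808.70                | 1.00    |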
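
The Speedup column can be reproduced from the two throughput columns. The sketch below assumes speedup is the branch throughput divided by the master throughput, rounded to two decimals; that definition is an assumption, not something stated in the thread.

```python
def speedup(master_tps, branch_tps):
    """Ratio of branch throughput to master throughput (assumed definition)."""
    return branch_tps / master_tps


# (test name, t/s on master, t/s on liuwei-git/master), copied from the table.
rows = [
    ("pp512", 8543.90, 8519.48),
    ("tg128", 185.05, 184.47),
    ("pp512+tg128", 808.40, 808.70),
]

for name, master_tps, branch_tps in rows:
    print(f"{name}: {speedup(master_tps, branch_tps):.2f}")
```

All three ratios differ from 1 by less than 0.5%, which is consistent with the regression not reproducing on this system.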

slaren avatar May 21 '24 16:05 slaren

The model seems to be doing rather poorly. I cannot tell if it's a tokenizer issue or just the model itself, but I quantized the 128k medium instruct model to q8_0 and it's failing pretty simple logic questions. Perhaps it's just not good with rather basic math?

I tried a temperature of 1 down to 0.6 and even down to 0 and it's still not faring well on logical questions. I was expecting more from a phi model, which leads me to think it may be some other underlying issue.

The question I asked that it specifically struggled on was:

There is a roll of tape. The tape is 100 meters long when unrolled. When rolled up, the outer diameter is 10 cm, and the inner diameter is 5 cm. How thick is the tape?

It gave some pretty dumb answers such as the tape being over 1300 cm thick, and kept trying to correct itself, giving equally incorrect answers.
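
For reference, the expected answer can be checked with a short calculation. This is a sketch that assumes the tape is wound tightly with no air gaps, so the ring-shaped side area of the roll equals the tape length times its thickness; the function name is illustrative.

```python
import math


def tape_thickness_cm(length_cm, outer_diameter_cm, inner_diameter_cm):
    """Thickness from: pi * (R^2 - r^2) = length * thickness."""
    outer_r = outer_diameter_cm / 2
    inner_r = inner_diameter_cm / 2
    ring_area = math.pi * (outer_r ** 2 - inner_r ** 2)  # cm^2, side view of the roll
    return ring_area / length_cm


# 100 m = 10000 cm of tape, 10 cm outer and 5 cm inner diameter.
t_cm = tape_thickness_cm(10000, 10, 5)
print(f"{t_cm * 10:.4f} mm")
```

The result is about 0.0059 cm, i.e. roughly 0.059 mm, which is a plausible thickness for thin tape.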

dillfrescott avatar May 21 '24 18:05 dillfrescott

Did you try running a 16 bit gguf model and seeing how that performs?


RonanKMcGovern avatar May 22 '24 13:05 RonanKMcGovern

Did you try running a 16 bit gguf model and seeing how that performs?

I tried the prompt on the 128k mini instruct model (f16 and Q4_K_M from https://huggingface.co/AlessandroW/Phi-3-mini-128k-instruct-gguf) and both models performed similarly to an older ChatGPT version https://neil.fraser.name/news/2023/02/17/. The quantized model replied:

<|user|> There is a roll of tape. The tape is 100 meters long when unrolled. When rolled up, the outer diameter is 10 cm, and the inner diameter is 5 cm. How thick is the tape?<|end|><|assistant|> To find the thickness of the tape, we need to calculate the volume of the tape when it is rolled up and then use that to find the thickness.

First, let's calculate the volume of the cylinder formed by the outer diameter. The formula for the volume of a cylinder is V = πr²h, where r is the radius and h is the height (or length in this case).

The outer radius (r_outer) is half of the outer diameter, so r_outer = 10 cm / 2 = 5 cm.

The length of the tape (h) is 100 meters, but we need to convert it to centimeters because the radius is in centimeters. So, h = 100 meters * 100 cm/meter = 10000 cm.

Now, let's calculate the volume of the outer cylinder (V_outer):
V_outer = π * (r_outer)² * h
V_outer = π * (5 cm)² * 10000 cm
V_outer = π * 25 cm² * 10000 cm
V_outer = 250000π cm³

Next, let's calculate the volume of the inner cylinder (V_inner), which represents the empty space inside the tape. The inner radius (r_inner) is half of the inner diameter, so r_inner = 5 cm / 2 = 2.5 cm.

The volume of the inner cylinder (V_inner) is:
V_inner = π * (r_inner)² * h
V_inner = π * (2.5 cm)² * 10000 cm
V_inner = π * 6.25 cm² * 10000 cm
V_inner = 62500π cm³

The volume of the tape itself (V_tape) is the difference between the outer and are not standard mathematical operations, and thus are not applicable in this context.

Since the question seems to be asking for the volume of the tape itself, we will subtract the inner volume from the outer volume to find the volume of the tape:

V_tape = V_outer - V_inner
V_tape = 250000π cm³ - 62500π cm³
V_tape = 187500π cm³

To get the numerical value, we use the approximation π ≈ 3.14159:

V_tape ≈ 187500 * 3.14159 cm³
V_tape ≈ 588746.25 cm³

Therefore, the volume of the tape itself is approximately 588746.25 cubic centimeters.<|endoftext|>

The f16 model replied

<|user|> There is a roll of tape. The tape is 100 meters long when unrolled. When rolled up, the outer diameter is 10 cm, and the inner diameter is 5 cm. How thick is the tape?<|end|><|assistant|> To find the thickness of the tape, we need to calculate the volume of the tape when it is rolled up and then use that to find the thickness.

First, let's find the volume of the entire length of the tape if it were a cylinder with the outer diameter. The formula for the volume of a cylinder is V = πr²h, where r is the radius and h is the height (or length in this case).

The outer radius (R) is half of the outer diameter, so R = 10 cm / 2 = 5 cm.

The length (h) of the tape is 100 meters, which we need to convert to centimeters because the diameter is in centimeters. 100 meters = 100 * 100 cm = 10000 cm.

Now, let's calculate the volume of the outer cylinder (V_outer):
V_outer = π * R² * h
V_outer = π * (5 cm)² * 10000 cm
V_outer = π * 25 cm² * 10000 cm
V_outer = 250000π cm³

Next, we need to calculate the volume of the inner cylinder, which represents the space that is not occupied by the tape. The inner radius (r) is half of the inner diameter, so r = 5 cm / 2 = 2.5 cm.

The volume of the inner cylinder (V_inner) is:
V_inner = π * r² * h
V_inner = π * (2.5 cm)² * 10000 cm
V_inner = π * 6.25 cm² * 10000 cm
V_inner = 62500π cm³

Now, to find the volume of the tape itself, we subtract the volume of the inner cylinder from the volume of the outer cylinder:

V_tile = V_outer - V_inner
V_tile = 250000π cm³ - 62500π cm³
V_tile = 187500π cm³

Finally, to find the thickness of the tape, we divide the volume of the tape by the surface area of the inner cylinder (since the thickness will be uniform and we are considering the volume that the tape occupies over the inner surface area):

Surface area of the inner cylinder (A_inner) is:
A_inner = 2π * r * h
A_inner = 2π * (2.5 cm) * 10000 cm
A_inner = 50000π cm²

The thickness of the tape (t) is:
t = V_tile / A_inner
t = 187500π cm³ / 50000π cm²
t = 3.75 cm

So, the thickness of the tape is 3.75 cm

Since the thickness of the tape is not a practical length (it's too large and not realistic for a thin tape), we need to reconsider our calculation. The thickness should be in the same order of magnitude as the width of the tape, which is 0.2 cm. Let's correct this by using the correct formula for the volume of the cylindrical shell (the tape):

V_tile = A_inner * t

We can now solve for the thickness (t):

t = V_tile / A_inner
t = 187500π cm³ / (π * 50000 cm * 100 cm)
t = 187500 / (50000 * 100)
t = 0.375 cm

Therefore, the thickness of the tape is 0.375 cm.<|endoftext|> [end of text]

AlessandroW avatar May 22 '24 14:05 AlessandroW

the fp16 looks a bit better, at least it gets to an answer. How does the pytorch model answer compare?


RonanKMcGovern avatar May 23 '24 10:05 RonanKMcGovern

I guess I got lucky with a seed. Phi-3-medium (q4 from https://huggingface.co/bartowski/Phi-3-medium-128k-instruct-GGUF, temp 0.1):

There is a roll of tape. The tape is 100 meters long when unrolled. When rolled up, the outer diameter is 10 cm, and the inner diameter is 5 cm. How thick is the tape?

Answer:

To find out how thick the tape is, we can use the formula for the volume of a cylindrical shell (the shape that represents the unrolled tape). The formula is V = πh(R^2 - r^2), where h is the height or length of the cylinder, R is the outer radius and r is the inner radius.
First, let's convert all measurements to meters for consistency:
- Length (height) of tape, h = 100 m
- Outer diameter = 10 cm = 0.1 m
- Inner diameter = 5 cm = 0.05 m
Now we can find the outer and inner radii by dividing the diameters by 2:
- Outer radius, R = 0.1 m / 2 = 0.05 m
- Inner radius, r = 0.05 m / 2 = 0.025 m
Now we can plug these values into the formula for volume of a cylindrical shell:
V = πh(R^2 - r^2)
We know that V (volume) is equal to the length times the thickness, so let's call the thickness "t" and set up an equation:
100 m * t = π * 100 m * ((0.05 m)^2 - (0.025 m)^2)
Now we can solve for t:
t = (π * ((0.05 m)^2 - (0.025 m)^2)) / 100
t ≈ (3.1416 * (0.0025 m^2 - 0.000625 m^2)) / 100
t ≈ (3.1416 * 0.001875 m^2) / 100
t ≈ 0.0059 m
So the thickness of the tape is approximately 0.0059 meters, or about 5.9 millimeters.


RachidAR avatar May 23 '24 12:05 RachidAR