bert.cpp
ggml implementation of BERT
We ran a FastAPI app on a single machine on two different ports. Each API makes a socket connection to the bert.cpp socket server, but only the first app connects, and...
When I run the `build/bin/main` example with a larger input I get a segfault: ``` ggml_new_tensor_impl: not enough space in the context's memory pool (needed 271388624, available 260703040) Segmentation...
After modifying n_max_tokens in bert.cpp from `int32_t n_max_tokens = 512;` to `int32_t n_max_tokens = 10000;`, I rebuilt the project. However, upon testing, the value of...
This is good work, but since ggml is being phased out, are there any plans to support GGUF?
Is this repository ever going to be updated and/or worked on, or has it been abandoned?
I have seen that I can set GGML_USE_CUBLAS, and I can follow the few #defines that activate the code, but the tensors are all on the CPU. I'm not...
For example, a model like this: https://huggingface.co/aloxatel/bert-base-mnli If so, how would I do inference on it?
As mentioned in the title, `https://github.com/mlc-ai/tokenizers-cpp` is a good tokenizer implementation. Some may not want another dependency, but it is worth it.
When I try to run the server example I get an error ``` bert_load_from_file: loading model from 'models/all-MiniLM-L6-v2/ggml-model-q4_0.bin' - please wait ... bert_load_from_file: n_vocab = 30522 bert_load_from_file: n_max_tokens = 512...
- `type_vocab_size` is also a hparam (it cannot be hard-coded as the constant 2).
- The converter needs the same change.