
ggml implementation of BERT

28 bert.cpp issues

We ran one FastAPI app on a machine on two different ports. Each app makes a socket connection to the bert.cpp socket server, but only the first app gets connected and...
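One possible explanation, offered as an assumption since I have not checked the bert.cpp server code: if the example server handles a single client at a time, the second connection is not serviced until the first client disconnects. A minimal POSIX-sockets sketch of that single-client pattern (not the project's actual code):

```cpp
// Hypothetical sketch of a single-client socket server loop (POSIX sockets).
// If the bert.cpp example server follows this pattern, a second client can
// complete the TCP handshake but is not serviced until the first one closes.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int main() {
    int srv = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr{};
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = INADDR_ANY;
    addr.sin_port        = htons(8085);        // port number is an assumption
    bind(srv, (sockaddr *)&addr, sizeof(addr));
    listen(srv, 1);

    for (;;) {
        int cli = accept(srv, nullptr, nullptr);   // blocks: one client at a time
        char buf[4096];
        ssize_t n;
        while ((n = read(cli, buf, sizeof(buf))) > 0) {
            // ... tokenize, run the model, write embeddings back ...
            write(cli, buf, n);                    // placeholder echo
        }
        close(cli);                                // only now can the next app be served
    }
}
```

Serving each accepted socket on its own thread, or multiplexing with poll/select, would let both FastAPI instances be handled at the same time.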

When I run the `build/bin/main` example with a larger input I get a segfault: ``` ggml_new_tensor_impl: not enough space in the context's memory pool (needed 271388624, available 260703040) Segmentation...
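The message comes from ggml's fixed-size memory pool: every tensor in the compute graph is carved out of a buffer whose size is chosen up front, so a longer input can overrun it. A minimal sketch of that allocation pattern (the buffer size here is an assumption, not bert.cpp's actual number):

```cpp
// Sketch of how a ggml compute context gets its fixed memory pool.
// "not enough space in the context's memory pool" means the graph for the
// longer input needed more than the amount reserved here.
#include "ggml.h"
#include <cstdint>
#include <vector>

int main() {
    size_t buf_size = 256u * 1024 * 1024;        // too small for long inputs
    std::vector<uint8_t> buf(buf_size);

    struct ggml_init_params params = {
        /*.mem_size   =*/ buf.size(),
        /*.mem_buffer =*/ buf.data(),
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // ... build the graph here; every ggml_new_tensor_* call carves space
    // out of buf, and overflowing it triggers the error above ...

    ggml_free(ctx);
}
```

Growing that reservation, or scaling it with the input length, is the usual fix in ggml-era code.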

After modifying the value of n_max_tokens in bert.cpp from "int32_t n_max_tokens = 512;" to "int32_t n_max_tokens = 10000;", I proceeded to rebuild the project. However, upon testing, the value of...
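One likely reason, offered as an assumption since the issue text is truncated: n_max_tokens is a hyperparameter stored in the converted model file, so the loader overwrites the in-source default when it reads the header. A sketch of that pattern (the struct mirrors bert.cpp; the reader is illustrative):

```cpp
// Sketch of the hparams-loading pattern used by ggml-era loaders.
// The struct default is only a fallback; the value actually used comes from
// the converted model file, so editing the default has no effect unless the
// file (or the converter) is changed as well.
#include <cstdint>
#include <fstream>

struct bert_hparams {
    int32_t n_vocab      = 30522;
    int32_t n_max_tokens = 512;     // default, replaced by the file header
    // ...
};

static void load_hparams(std::ifstream & fin, bert_hparams & hparams) {
    fin.read(reinterpret_cast<char *>(&hparams.n_vocab),      sizeof(hparams.n_vocab));
    fin.read(reinterpret_cast<char *>(&hparams.n_max_tokens), sizeof(hparams.n_max_tokens));
    // from here on, n_max_tokens is whatever the converter wrote (512)
}
```

If that is the cause, raising the limit means changing the converter output as well, and BERT's learned position embeddings only cover 512 positions in any case, so a value like 10000 would also require different weights.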

This is good work, but since the legacy ggml file format is being phased out, are there any plans to support GGUF?

Is this repository ever going to be updated and/or worked on, or has it been abandoned?

I have seen where I can set GGML_USE_CUBLAS, and I can follow the few #defines that activate the code, but the tensors are all on the CPU. I'm not...

For example, a model like this: https://huggingface.co/aloxatel/bert-base-mnli If so, how would I do inference on it?

As mentioned in the title, `https://github.com/mlc-ai/tokenizers-cpp` is a good tokenizer implementation. Some people may not like adding another dependency, but it is worth it.
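For reference, the tokenizers-cpp README sketches an API along these lines; exact signatures may differ between versions, and tokenizer.json here is just the usual HuggingFace artifact shipped with a BERT checkpoint:

```cpp
// Hedged sketch of driving a WordPiece tokenizer.json through tokenizers-cpp,
// based on that project's README.
#include <tokenizers_cpp.h>
#include <cstdio>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

int main() {
    // Load the HuggingFace tokenizer.json shipped with the BERT checkpoint.
    std::ifstream fs("tokenizer.json", std::ios::binary);
    std::stringstream ss;
    ss << fs.rdbuf();

    auto tok = tokenizers::Tokenizer::FromBlobJSON(ss.str());

    std::vector<int> ids = tok->Encode("hello world");
    for (int id : ids) {
        std::printf("%d ", id);     // these ids would feed the ggml graph
    }
    std::printf("\n");
}
```

Reusing the upstream tokenizer.json would avoid re-implementing WordPiece details (lowercasing, accent stripping, unknown-token handling) inside bert.cpp.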

When I try to run the server example I get an error ``` bert_load_from_file: loading model from 'models/all-MiniLM-L6-v2/ggml-model-q4_0.bin' - please wait ... bert_load_from_file: n_vocab = 30522 bert_load_from_file: n_max_tokens = 512...

- `type_vocab_size` is also a hparam (it cannot be hard-coded as a constant 2).
- The converter needs the same change.
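A minimal sketch of the C++ side of that change, assuming an hparams struct like the one used elsewhere in bert.cpp and the standard ggml tensor API; the converter would have to write the extra field in the same order:

```cpp
// Hedged sketch: type_vocab_size read from the model header and used when
// allocating the token-type embedding, instead of a hard-coded 2.
#include "ggml.h"
#include <cstdint>
#include <fstream>

struct bert_hparams {
    int32_t n_embd          = 384;  // illustrative default
    int32_t type_vocab_size = 2;    // overwritten from the file
};

static struct ggml_tensor * make_token_type_embeddings(
        struct ggml_context * ctx, std::ifstream & fin, bert_hparams & hparams) {
    fin.read(reinterpret_cast<char *>(&hparams.type_vocab_size),
             sizeof(hparams.type_vocab_size));
    // allocate [n_embd, type_vocab_size] rather than assuming 2 rows
    return ggml_new_tensor_2d(ctx, GGML_TYPE_F32,
                              hparams.n_embd, hparams.type_vocab_size);
}
```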