bert.cpp
ggml implementation of BERT
We ran a FastAPI app on a single machine on two different ports. Each API makes a socket connection to the bert.cpp socket server, but only the first app connects, and...
When I run the `build/bin/main` example with a larger input I get a segfault: ``` ggml_new_tensor_impl: not enough space in the context's memory pool (needed 271388624, available 260703040) Segmentation...
After modifying n_max_tokens in bert.cpp from `int32_t n_max_tokens = 512;` to `int32_t n_max_tokens = 10000;`, I rebuilt the project. However, upon testing, the value of...
This is good work, but since ggml is being phased out, are there any plans to support GGUF?
Is this repository ever going to be updated and/or worked on, or has it been abandoned?
I have seen that I can set GGML_USE_CUBLAS, and I can follow the few #defines that activate the code, but the tensors are all on the CPU. I'm not...
For example, a model like this: https://huggingface.co/aloxatel/bert-base-mnli If so, how would I do inference on it?
As mentioned in the title, `https://github.com/mlc-ai/tokenizers-cpp` is a good tokenizer implementation. Some may not want another dependency, but it is worth it.
When I try to run the server example I get an error ``` bert_load_from_file: loading model from 'models/all-MiniLM-L6-v2/ggml-model-q4_0.bin' - please wait ... bert_load_from_file: n_vocab = 30522 bert_load_from_file: n_max_tokens = 512...
- `type_vocab_size` is also a hparam (it cannot be hard-coded as the constant 2).
- The converter needs the same change.