stopes
A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB team.
## Why? This is example code showing how to change the mining pipeline to add extra filtering in the mining phase, as a starting point for the Prague's...
I am trying to mine bitext where one language has 63,292,172 lines. My mining fails with the following error: CUDA out of memory. Tried to allocate 22.62 GiB...
I want to train the NLLB model. As instructed by the data [ReadMe](https://github.com/facebookresearch/fairseq/tree/nllb/examples/nllb/data) documentation, I have tried the filtering pipeline and got the output of `populate_data_conf.py` and `compute_length_factors.py`. But I...
## Why? Wikipedia expressed interest in hosting the model themselves, so I'm sharing the code to push the model to AWS. ## How? The deploy.py script will: 1....
I am trying to run the demo `python -m stopes.pipelines.bitext.global_mining_pipeline src_lang=fuv tgt_lang=zul +preset=demo embed_text=laser3`, but I am receiving the following error: `2024-03-11 11:48 INFO 1036353:stopes.moses - Preprocess fuv`...
Can you offer support for the ALTI attribution method for LLMs such as LLaMA?
Hello, I am trying to run AutoPCP on my computer, but I get the following error. Any help would be greatly appreciated. `Error in call to target 'stopes.core.launcher.Launcher': AttributeError("'posix_ipc.Semaphore' object...
## Why? I added a few extra methods to CompareAudiosModule for the tutorial. ## How? Document the technical decisions you made. If some parts are WIP, please...
At the moment you can't use the Tibetan language tokenizer. It gives the error message: `TypeError: "module" object is not callable`. The error is thrown here in `sentence_split.py`: ``` elif split_algo...
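The `TypeError` in this report is the generic Python error raised when a module object is called as if it were a function. A common cause is writing `import x` and then `x(...)` where a function or class inside the module (`x.some_function(...)`) was intended. The sketch below reproduces that pattern with a stand-in module built at runtime. `tokenizer_mod`, `tokenize`, and the tsheg-based splitting are hypothetical, illustrative assumptions, not the actual stopes code or any real Tibetan tokenizer API.

```python
import types

# Hypothetical stand-in module, built at runtime so the example is
# self-contained. Not a stopes or third-party tokenizer API.
tokenizer_mod = types.ModuleType("tokenizer_mod")


def _tokenize(text):
    # Placeholder logic: split on the Tibetan intersyllabic tsheg (U+0F0B),
    # which yields syllables, not true word tokens.
    return [s for s in text.split("\u0f0b") if s]


tokenizer_mod.tokenize = _tokenize


def call_module_directly(text):
    # Reproduces the bug pattern: calling the module object itself
    # raises TypeError ("'module' object is not callable").
    return tokenizer_mod(text)


def call_function_in_module(text):
    # The usual fix: call the function defined inside the module.
    return tokenizer_mod.tokenize(text)


try:
    call_module_directly("\u0f40\u0f0b\u0f41")
except TypeError as err:
    print("TypeError:", err)

print(call_function_in_module("\u0f40\u0f0b\u0f41"))
```

The exact wording of the error message can vary slightly between Python versions, so checking for the `not callable` substring is more robust than matching the full text.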
