stopes

A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB team.

Results 35 stopes issues

## Why ? This is example code showing how to change the mining pipeline to add some extra filtering in the mining phase, as a starting point for the Prague's...

CLA Signed

I am trying to mine bitext with one language having 63292172 lines. My mining fails with the following error: CUDA out of memory. Tried to allocate 22.62 GiB...

I want to train the NLLB model. As instructed by the data [ReadMe](https://github.com/facebookresearch/fairseq/tree/nllb/examples/nllb/data) documentation, I have tried the filtering pipeline and got the output of `populate_data_conf.py` and `compute_length_factors.py`. But I...

## Why ? Wikipedia expressed interest in hosting the model themselves, so I'm sharing the code to push the model to AWS. ## How ? The deploy.py script will: 1....

CLA Signed

I am trying to run the demo:

```bash
python -m stopes.pipelines.bitext.global_mining_pipeline src_lang=fuv tgt_lang=zul +preset=demo embed_text=laser3
```

But I am receiving the following error: `2024-03-11 11:48 INFO 1036353:stopes.moses - Preprocess fuv...`

Can you offer support for the ALTI attribution method for LLMs such as LLAMA?

Hello, I am trying to run AutoPCP on my computer but I get the following error. Any help would be greatly appreciated. `Error in call to target 'stopes.core.launcher.Launcher': AttributeError("'posix_ipc.Semaphore' object...

## Why ? I added a few extra methods to CompareAudiosModule for tutorial purposes. ## How ? Document the technical decisions you made. If some parts are WIP, please...

CLA Signed

At the moment you can't use the Tibetan language tokenizer. It gives the error message: `TypeError: "module" object is not callable` The error is thrown here in sentence_split.py: ``` elif split_algo...
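For context, Python generally raises this kind of `TypeError` when code calls a module object itself instead of a function or class defined inside it. The sketch below illustrates only that general mechanism. It does not reproduce the stopes code, and whether this is the actual cause of the reported failure is an assumption. The names `fake_tokenizer` and `tokenize` are hypothetical stand-ins.

```python
import types

# Hypothetical stand-in for an imported tokenizer module (NOT the real stopes
# or Tibetan tokenizer code; the names here are made up for illustration).
fake_tokenizer = types.ModuleType("fake_tokenizer")
fake_tokenizer.tokenize = lambda text: text.split()


def call_and_capture(obj, text):
    """Call obj(text) and return either the result or the TypeError message."""
    try:
        return obj(text)
    except TypeError as exc:
        return str(exc)


# Calling the module object itself fails, because modules are not callable.
error_message = call_and_capture(fake_tokenizer, "a b c")

# Calling a function that lives inside the module works as expected.
tokens = fake_tokenizer.tokenize("a b c")

print(error_message)
print(tokens)
```

If the same pattern applies here, the usual remedy is to call the specific function or class exported by the module rather than the module itself.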

![image](https://github.com/facebookresearch/stopes/assets/78583932/1c36ab85-2711-429f-8cb0-e161cd993304)

alti