stopes issues

MT Marathon 2022 starting point

Why? This is example code showing how to change the mining pipeline to add extra filtering in the mining phase, as a starting point for the Prague's...

Mortimerp9

CLA Signed

Index training failing with CUDA out of memory error.

3 comments

I am trying to mine bitext where one language has 63,292,172 lines. My mining run fails with the following error: CUDA out of memory. Tried to allocate 22.62 GiB...

oneraghavan

How to create training data through the pipeline

4 comments

I want to train the NLLB model. Following the data [README](https://github.com/facebookresearch/fairseq/tree/nllb/examples/nllb/data) documentation, I ran the filtering pipeline and got the output of `populate_data_conf.py` and `compute_length_factors.py`. But I...

b3y0nd

open source publishing code

Why? Wikipedia expressed interest in hosting the model themselves, so I'm sharing the code to push the model to AWS. How? The `deploy.py` script will: 1....

gwenzek

CLA Signed

Runtime Error when running Demo

I am trying to run the demo:

```bash
python -m stopes.pipelines.bitext.global_mining_pipeline src_lang=fuv tgt_lang=zul +preset=demo embed_text=laser3
```

But I am receiving the following error:

```
2024-03-11 11:48 INFO 1036353:stopes.moses - Preprocess fuv...
```

rumourscape

attribution of LLMs

1 comment

Can you offer support for the ALTI attribution method for LLMs such as LLAMA?

Wafaa014

Error with posix_ipc semaphore when trying to run AutoPCP

2 comments

Hello, I am trying to run AutoPCP on my computer, but I get the following error. Any help would be greatly appreciated. `Error in call to target 'stopes.core.launcher.Launcher': AttributeError("'posix_ipc.Semaphore' object...`

andysegura89

[WIP] More flexible AutoPCP interface

Why? I added a few extra methods to `CompareAudiosModule` for tutorial purposes. How? Document the technical decisions you made. If some parts are WIP, please...

avidale

CLA Signed

Bug in tokenizer for Tibetan language

1 comment

At the moment you can't use the Tibetan language tokenizer. It gives the error message `TypeError: "module" object is not callable`. The error is thrown here in `sentence_split.py`:

```
elif split_algo...
```

asusdisciple

List index overflow

1 comment

![image](https://github.com/facebookresearch/stopes/assets/78583932/1c36ab85-2711-429f-8cb0-e161cd993304)

zhenghuawang6

alti
