
Where to find French punctuation models

Status: Open. Sircam19 opened this issue 3 years ago • 5 comments

Hello. I love what your team has created; it's an amazingly impressive tool. I noticed that when exporting subtitles (SRT), all of the text is contained within a single time span. I then realised that there isn't a punctuation model for French. Where could I find one? Or how could I fix the punctuation? Your tool has so much potential, and I'm glad I came across it.
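The symptom described above, where the whole transcript lands in one subtitle, is what you would expect if a subtitle exporter only starts a new cue at sentence-ending punctuation. The sketch below assumes that behaviour; it is not taken from the Audapolis source. `Word`, `split_into_cues`, `srt_timestamp`, and `to_srt` are hypothetical names, and the optional `max_words` cap is just one possible fallback when no punctuation model is available.

```python
from dataclasses import dataclass
from typing import List, Optional

SENTENCE_END = (".", "?", "!")


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def split_into_cues(words: List[Word], max_words: Optional[int] = None) -> List[List[Word]]:
    """Group timed words into cues, closing a cue after sentence-ending punctuation.

    Without punctuation (and without max_words), every word lands in one cue.
    """
    cues: List[List[Word]] = []
    current: List[Word] = []
    for word in words:
        current.append(word)
        ends_sentence = word.text.endswith(SENTENCE_END)
        too_long = max_words is not None and len(current) >= max_words
        if ends_sentence or too_long:
            cues.append(current)
            current = []
    if current:  # flush trailing words that were not closed by punctuation
        cues.append(current)
    return cues


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(cues: List[List[Word]]) -> str:
    """Render cues as numbered SRT blocks separated by blank lines."""
    blocks = []
    for i, cue in enumerate(cues, start=1):
        text = " ".join(w.text for w in cue)
        span = f"{srt_timestamp(cue[0].start)} --> {srt_timestamp(cue[-1].end)}"
        blocks.append(f"{i}\n{span}\n{text}\n")
    return "\n".join(blocks)


words = [
    Word("Bonjour", 0.0, 0.5), Word("tout", 0.5, 0.8), Word("le", 0.8, 0.9),
    Word("monde.", 0.9, 1.4), Word("Comment", 1.6, 1.9), Word("allez-vous?", 1.9, 2.4),
]
print(to_srt(split_into_cues(words)))
```

With punctuation, the two sentences become two cues; strip the punctuation and, under this assumption, everything collapses into a single cue unless `max_words` forces a break.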


Sircam19, Jan 20 '23 21:01

If I find additional punctuation models, how can I add them directly to Audapolis? Where in the app's file structure should additional punctuation models be placed?

Sircam19, Feb 12 '23 11:02

Sadly, using out-of-tree punctuation models is currently not supported. However, if you link us to a French punctuation model, we could add it, or you could make a pull request.

anuejn, Feb 12 '23 18:02

Hello anuejn. I'm so happy to see progress on this tool; I think it is amazing and SO full of potential. I am not a coder, but I am trying to learn, so perhaps what I found (linked below) is not the way to go. However, from what I can tell, a lot of punctuation models are based on the Europarl project. I found a multilingual model among oliverguhr's models on Hugging Face, and I wondered whether it could be used to support the Audapolis languages that are missing punctuation models. Here's the link: https://huggingface.co/oliverguhr/fullstop-punctuation-multilang-large. Again, perhaps I am off base, as I am not a coder, but I am very interested. Merci.

Sircam19, Feb 12 '23 18:02

We currently use the punctuator2 Python library for punctuation restoration, so a drop-in addition would have to be a punctuator2 model. The model you linked uses a different library, which would require additional work to integrate.
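To illustrate why a model built for another library is not a drop-in replacement, here is a hypothetical adapter sketch. It assumes a common `Punctuator` interface and a backend that predicts one punctuation label per word; the interface, the label set in `LABEL_TO_MARK`, `LabelPerWordAdapter`, and `fake_predictor` are all illustrative assumptions and do not come from Audapolis or punctuator2. A real integration would likely also need to handle model loading, tokenization, and packaging, which this sketch leaves out.

```python
from typing import Callable, List, Protocol, Sequence


class Punctuator(Protocol):
    """Hypothetical common interface: unpunctuated words in, punctuated words out."""

    def punctuate(self, words: Sequence[str]) -> List[str]: ...


# Hypothetical label set for a backend that predicts one label per word;
# "0" means "no punctuation after this word".
LABEL_TO_MARK = {"0": "", ".": ".", ",": ",", "?": "?"}


class LabelPerWordAdapter:
    """Adapts a per-word label predictor to the Punctuator interface."""

    def __init__(self, predict_labels: Callable[[List[str]], List[str]]):
        # predict_labels stands in for whatever call the other library exposes.
        self._predict_labels = predict_labels

    def punctuate(self, words: Sequence[str]) -> List[str]:
        words = list(words)
        labels = self._predict_labels(words)
        if len(labels) != len(words):
            raise ValueError("backend must return exactly one label per word")
        # Unknown labels are treated as "no punctuation".
        return [w + LABEL_TO_MARK.get(label, "") for w, label in zip(words, labels)]


def fake_predictor(words: List[str]) -> List[str]:
    """Toy stand-in for a real model: puts a period after the last word."""
    return ["0"] * (len(words) - 1) + ["."]


punctuator: Punctuator = LabelPerWordAdapter(fake_predictor)
print(punctuator.punctuate(["bonjour", "tout", "le", "monde"]))
```

Keeping each backend behind a small interface like this is one way the rest of a transcription pipeline could stay unchanged when a punctuation backend is swapped.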

anuejn, Feb 12 '23 19:02

Thanks, anuejn. I knew it couldn't be that simple :-) I'll look around to see what I can find; I'd like to help AND learn at the same time. Again, I'm happy to help, and I really enjoy Audapolis. It's amazing. Merci.

Sircam19, Feb 12 '23 19:02