Bing Han
Hi. I want to implement a feature to train a generic translation model on a large dataset, where we have many translator users, each of whom has a personal memory...
I have read the [API doc](https://github.com/OpenNMT/CTranslate2/blob/master/docs/python.md) and found that the decoding options do not support top-p (nucleus) sampling, which is described in this blog post: https://huggingface.co/blog/how-to-generate#top-p-nucleus-sampling Is it possible to support this feature...
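For readers unfamiliar with the technique being requested, here is a minimal sketch of top-p (nucleus) sampling over a single next-token distribution. It is a plain-Python illustration, not CTranslate2 code: the function names `top_p_filter` and `sample_top_p` are hypothetical, and the input is assumed to be a list of probabilities that already sums to 1.

```python
import random


def top_p_filter(probs, p):
    """Keep the smallest set of highest-probability tokens whose cumulative
    probability reaches p, and renormalize over that set.

    Returns a dict mapping token index -> renormalized probability.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    # Rank token indices from most to least probable.
    ranked = sorted(enumerate(probs), key=lambda kv: kv[1], reverse=True)
    kept = []
    cumulative = 0.0
    for idx, prob in ranked:
        kept.append((idx, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {idx: prob / total for idx, prob in kept}


def sample_top_p(probs, p, rng=None):
    """Draw one token index from the nucleus defined by p."""
    rng = rng or random.Random()
    nucleus = top_p_filter(probs, p)
    indices = list(nucleus)
    weights = [nucleus[i] for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]
```

With `probs = [0.5, 0.3, 0.15, 0.05]` and `p = 0.75`, only the first two tokens survive the filter, so sampling never returns the low-probability tail.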
I'm working on a machine translation task. When I encode my corpus with bpemb, the output is always lowercase. Is it possible to retain case information after encoding my corpus?
```
(掌声)这个是,盛装舞步。
```
The result generated by the command `sacremoses -l zh -j 4 tokenize < input > output` is
```
( 掌声 ) 这个是 , 盛装舞步。
```
I think it...
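I ran the official example, in which the default Solr engine is named collection1. I want to add another collection to Solr, and I found the add_engine method in solr_tools. Calling add_engine("localhost", "collection2", 8900) raised the following error, and I don't know how to resolve it:
```bash
*****[Request Error]: Solr instance is not running in SolrCloud mode.
```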
# Before submitting
- [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [ ] Did...
```
@pytest.mark.server(url=f'/termprojsbypage/{moke_terms_db_id}', response=fake_terms_db_1, method='GET')
@pytest.mark.server_settings(port=free_port)
def test_word_level_mask(self):
    import time
    time.sleep(2)
    config = {
        "term_mask_url": f"http://127.0.0.1:{self.free_port}",
        "translate_src_lang": "en",
    }
    protector = CodeSwitchProtector(config)
    for case in self._word_level_test_cases:
        sent, terms = protector.mask(case.text, self.moke_terms_db_id)...
```
Hi! After reading a lot of material, it seems that in machine translation it is more common to fine-tune on a small parallel corpus,...