
Noisy data generated by back translation

Open · zwjyyc opened this issue 6 years ago · 1 comment

Very interesting work and thanks for sharing the code!

I am very interested in translation-based augmentation. I generated some examples by running run.sh, but found some noisy ones, listed below:

(1) In forward generation, the input "could i get the address , phone number , and postcode of yu garden ?" produced the output "The hotel is small location, the location is ideal and the food is fantastic."

(2) In forward generation, the input "hi , i 'm looking for a nice german restaurant ." produced the output "I was at listening to my room and we were even coming in the main area from 9 weeks. I also liked this hotel, this is a great boutique hotel."

(3) In forward generation, the input "i do n't care ." produced the output "Sinon pour la plupart, je ne pense pas qu'il y ait un tel problème qui se pose à vous. Je n'ai pas l'intention de le faire."

Do you have any suggestions to avoid these errors?

Thanks!

zwjyyc, Aug 22 '19 07:08

You can lower the temperature; please refer to the README for more information. As a sanity check, set the temperature to 0: the model should then generate perfectly valid, but identical, paraphrases.
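For readers unfamiliar with the knob, here is a minimal sketch of how temperature typically reshapes a decoder's next-token distribution. It is not the code from this repository: the function names, the example logits, and the convention of treating temperature 0 as greedy (argmax) decoding are all assumptions made for illustration.

```python
import math
import random


def temperature_softmax(logits, temperature):
    """Return sampling probabilities for `logits` at the given temperature.

    Hypothetical helper for illustration; not part of the UDA code base.
    """
    if temperature <= 0:
        # Assumption: temperature 0 means greedy decoding, so all
        # probability mass goes to the highest-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng=random):
    """Draw one token index from the temperature-scaled distribution."""
    probs = temperature_softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Hypothetical scores for three candidate tokens.
logits = [2.0, 1.0, 0.1]
for t in (1.0, 0.5, 0.0):
    print(t, [round(p, 3) for p in temperature_softmax(logits, t)])
```

Lower temperatures concentrate probability on the highest-scoring tokens, which tends to reduce off-topic outputs like the ones reported above, at the cost of less diverse paraphrases. At temperature 0 every draw returns the same token, which is why repeated generations come out identical.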

michaelpulsewidth, Aug 23 '19 17:08