
Model "pythainlp/thaitts-onnx" in this repository requires validation

Open MendyPanelli232 opened this issue 6 months ago • 0 comments

Hi @wannaphong, I'd like to report that potentially risky pretrained models are being used in this project, which may pose a backdoor threat. Please check the following code example:

pythaitts/pretrained/lunarlist_onnx.py

import onnxruntime as ort
from huggingface_hub import hf_hub_download

# Excerpt only: inference() is defined elsewhere in the project and is not shown here.
class LunarlistONNX:
    def __init__(self) -> None:
        self.encoder = ort.InferenceSession(hf_hub_download(repo_id="pythainlp/thaitts-onnx",filename="tacotron2encoder-th.onnx"))
        self.decoder = ort.InferenceSession(hf_hub_download(repo_id="pythainlp/thaitts-onnx",filename="tacotron2decoder-th.onnx"))
        self.postnet = ort.InferenceSession(hf_hub_download(repo_id="pythainlp/thaitts-onnx",filename="tacotron2postnet-th.onnx"))
        self.hifi = ort.InferenceSession(hf_hub_download(repo_id="pythainlp/thaitts-onnx",filename="vocoder.onnx"))

    def tts(self, text: str):
        mel = inference(text, self.encoder, self.decoder, self.postnet)
        return self.hifi.run(None, {"spec": mel[0]})

Issue Description

As shown above, pythaitts/pretrained/lunarlist_onnx.py hard-codes the model repository "pythainlp/thaitts-onnx" as the repo_id argument of hf_hub_download and downloads four ONNX files from it. Each file is loaded into an ort.InferenceSession; the encoder, decoder, and postnet sessions are passed to inference(), and the vocoder.onnx session is then run via self.hifi.run.
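
To make the load path above easier to audit, here is a minimal sketch that lists the same four downloads in one place. The repository ID and file names come from the excerpt; the names REPO_ID, MODEL_FILES, download_plan, and load_sessions are hypothetical, and the optional revision argument (a real hf_hub_download parameter for pinning a commit, branch, or tag) is my assumption, not something the project currently does.

```python
# Sketch only: mirrors the downloads in LunarlistONNX.__init__.
# REPO_ID, MODEL_FILES, download_plan and load_sessions are hypothetical names.
REPO_ID = "pythainlp/thaitts-onnx"
MODEL_FILES = {
    "encoder": "tacotron2encoder-th.onnx",
    "decoder": "tacotron2decoder-th.onnx",
    "postnet": "tacotron2postnet-th.onnx",
    "hifi": "vocoder.onnx",
}


def download_plan(revision=None):
    """Return the hf_hub_download keyword arguments for each session name."""
    return {
        name: {"repo_id": REPO_ID, "filename": filename, "revision": revision}
        for name, filename in MODEL_FILES.items()
    }


def load_sessions(revision=None):
    """Download every file and wrap it in an ONNX Runtime session.

    Needs network access plus the third-party packages onnxruntime and
    huggingface_hub, so the imports are kept local to this function.
    """
    import onnxruntime as ort
    from huggingface_hub import hf_hub_download

    return {
        name: ort.InferenceSession(hf_hub_download(**kwargs))
        for name, kwargs in download_plan(revision).items()
    }
```

Calling download_plan() with no argument reproduces the current behavior (no pinned revision), so whatever is on the default branch of the model repository at download time is what gets loaded.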

At the same time, the model is flagged as risky on the Hugging Face platform. The encode.onnx files in these models are marked as risky and may contain a backdoor. For certain specific inputs, such a backdoor could be activated, altering the model's behavior.


Related Risk Reports: model risk report

Suggested Repair Methods

  1. Convert the model to the safer safetensors format and re-upload it
  2. Try regenerating the model with the latest onnx library
  3. Visually inspect the model using an OSS tool such as Netron. If no issues are found, report the false positive to the scanning platform
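
As a rough programmatic complement to inspecting the graph in Netron (method 3), the sketch below counts the operator types in a downloaded ONNX file and lists any that are not on an allowlist. It assumes the third-party onnx package is installed; the function names and the contents of the allowlist are hypothetical and would need to be chosen by someone who knows what these Tacotron 2 and vocoder graphs should contain. It only looks at top-level nodes, so operators nested inside Loop or If subgraphs are not covered.

```python
from collections import Counter


def load_op_types(path):
    """Return the op_type of every top-level node in an ONNX file.

    Uses the third-party onnx package. Nodes nested inside subgraphs
    (for example the body of a Loop or If node) are NOT visited.
    """
    import onnx

    model = onnx.load(path)
    return [node.op_type for node in model.graph.node]


def summarize_ops(op_types, allowed):
    """Count operator types and list the ones missing from the allowlist."""
    counts = Counter(op_types)
    unexpected = sorted(op for op in counts if op not in allowed)
    return counts, unexpected


if __name__ == "__main__":
    import sys

    # Hypothetical allowlist; replace with the operators you expect to see.
    ALLOWED = {"Conv", "ConvTranspose", "Relu", "Tanh", "Add", "Mul", "MatMul"}
    counts, unexpected = summarize_ops(load_op_types(sys.argv[1]), ALLOWED)
    print("operator counts:", dict(counts))
    print("not on allowlist:", unexpected)
```

An empty "not on allowlist" result is not proof that a model is safe; it only narrows down which nodes deserve a closer manual look.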

Since this is a popular machine learning library, any potential risk could be propagated and amplified. Could you please address the above issues?

Thanks for your help~

Best regards, Mendy

MendyPanelli232, Jul 14 '25 14:07