DeepSpeech icon indicating copy to clipboard operation
DeepSpeech copied to clipboard

Change Alphabet::EncodeSingle and Alphabet::DecodeSingle to use an outparam and return an error code instead of calling `abort`

Open godefv opened this issue 5 years ago • 14 comments

I run into the error "Invalid label 0". I assume it is from the function Alphabet::DecodeSingle() in the file native_client/alphabet.cc.

I don't know why I am running into this error and it is not important for now. What is important is the fact that the code calls abort() just after printing this error.

I don't know your guidelines. But I can tell that aborting the whole process, including the python interpreter which might be interactive, is a terrible way of handling errors, especially in such a trivial function, and I call it a bug.

godefv avatar Jul 22 '20 14:07 godefv

I don't know your guidelines. But I can tell that aborting the whole process, including the python interpreter which might be interactive, is a terrible way of handling errors, especially in such a trivial function, and I call it a bug.

it sounds like we broke badly some of your workflow, sorry about that.

I think that this abort() comes back from a long time and is here to avoid people getting into other bad situation because of invalid alphabets, and recent refactoring of Alphabet code made that exposed to Python when it was not the case in the past.

lissyx avatar Jul 22 '20 14:07 lissyx

Not sure though if we can properly fix that very quickly, it'd need better / proper error handling from C++ to Python CTC decoder?

lissyx avatar Jul 22 '20 14:07 lissyx

I have tried to replace the abort() call by returning something, but I then get a segfault. I will let you know if I find something.

godefv avatar Jul 22 '20 15:07 godefv

I have tried to replace the abort() call by returning something, but I then get a segfault.

Well, I suspect because of your alphabet issue, since it's playing with array index and you have error with label 0, it would not be surprising that it segfault if you remove the abort()

lissyx avatar Jul 22 '20 15:07 lissyx

The behavior is documented, although not in the Python bindings. I'll make a PR to expose the docs there as well.

reuben avatar Jul 23 '20 10:07 reuben

#3176 - basically, use the CanEncode and CanEncodeSingle methods to test for presence of a character in the alphabet.

reuben avatar Jul 23 '20 11:07 reuben

I was missing the fact that our own training code is not using these APIs correctly. Pushed another commit fixing that as well.

reuben avatar Jul 23 '20 11:07 reuben

I was not using the Alphabet API myself. My issue was that I initialized my Scorer improperly before calling ctc_beam_search_decoder(). Maybe, the code there is also not using the Alphabet API correctly ?

godefv avatar Jul 23 '20 11:07 godefv

By the way, the time before getting answered on this project is amazingly short. It is pretty cool !

godefv avatar Jul 23 '20 11:07 godefv

I have fixed the training code path that could hit this problem, but the abort is still there. We're happy to review a PR converting the function to return an error code and take an outparam. All callers would also have to be adapted. Because this is a non-trivial work and it can only be reached by directly calling into the Scorer and decoder APIs outside of our training interface, we're gonna put it on the backlog for now.

reuben avatar Jul 23 '20 14:07 reuben

@godefv Given the recent changes, I think we should take the time to properly do that change. Do you want to take care of that?

lissyx avatar Sep 02 '20 10:09 lissyx

I can do it eventually, but I don't know when. I wonder if it would be a good case to throw an exception instead of an error code. I usually don't like exception, but python already have them and I guess C++ exceptions could be converted somehow to python exceptions.

godefv avatar Sep 02 '20 13:09 godefv

Hey Developers, I am Yash Sehgal. I just started this project as my first open-source project. I think I can work on this feature and help you to remove the bug, Please guide me throughout the process. Thanks

yashsehgal avatar Sep 28 '20 15:09 yashsehgal

Hi @yashsehgal you could start by joining us on #machinelearning on Matrix. It will be a bit difficult to guide you through GitHub issues! Thanks for your interest :)

ftyers avatar Sep 28 '20 15:09 ftyers