Change Alphabet::EncodeSingle and Alphabet::DecodeSingle to use an outparam and return an error code instead of calling `abort`
I run into the error "Invalid label 0". I assume it is from the function Alphabet::DecodeSingle() in the file native_client/alphabet.cc.
I don't know why I am running into this error and it is not important for now. What is important is the fact that the code calls abort() just after printing this error.
I don't know your guidelines. But I can tell that aborting the whole process, including the python interpreter which might be interactive, is a terrible way of handling errors, especially in such a trivial function, and I call it a bug.
I don't know your guidelines. But I can tell that aborting the whole process, including the python interpreter which might be interactive, is a terrible way of handling errors, especially in such a trivial function, and I call it a bug.
it sounds like we broke badly some of your workflow, sorry about that.
I think that this abort() comes back from a long time and is here to avoid people getting into other bad situation because of invalid alphabets, and recent refactoring of Alphabet code made that exposed to Python when it was not the case in the past.
Not sure though if we can properly fix that very quickly, it'd need better / proper error handling from C++ to Python CTC decoder?
I have tried to replace the abort() call by returning something, but I then get a segfault.
I will let you know if I find something.
I have tried to replace the
abort()call by returning something, but I then get a segfault.
Well, I suspect because of your alphabet issue, since it's playing with array index and you have error with label 0, it would not be surprising that it segfault if you remove the abort()
The behavior is documented, although not in the Python bindings. I'll make a PR to expose the docs there as well.
#3176 - basically, use the CanEncode and CanEncodeSingle methods to test for presence of a character in the alphabet.
I was missing the fact that our own training code is not using these APIs correctly. Pushed another commit fixing that as well.
I was not using the Alphabet API myself. My issue was that I initialized my Scorer improperly before calling ctc_beam_search_decoder(). Maybe, the code there is also not using the Alphabet API correctly ?
By the way, the time before getting answered on this project is amazingly short. It is pretty cool !
I have fixed the training code path that could hit this problem, but the abort is still there. We're happy to review a PR converting the function to return an error code and take an outparam. All callers would also have to be adapted. Because this is a non-trivial work and it can only be reached by directly calling into the Scorer and decoder APIs outside of our training interface, we're gonna put it on the backlog for now.
@godefv Given the recent changes, I think we should take the time to properly do that change. Do you want to take care of that?
I can do it eventually, but I don't know when. I wonder if it would be a good case to throw an exception instead of an error code. I usually don't like exception, but python already have them and I guess C++ exceptions could be converted somehow to python exceptions.
Hey Developers, I am Yash Sehgal. I just started this project as my first open-source project. I think I can work on this feature and help you to remove the bug, Please guide me throughout the process. Thanks
Hi @yashsehgal you could start by joining us on #machinelearning on Matrix. It will be a bit difficult to guide you through GitHub issues! Thanks for your interest :)