List index out of range error in PoFileParser._add_message
The below code triggers a list index out of range error when provided a malformed input. This occurs in _add_message at line 235 in babel/messages/pofile.py.
import sys
import io
import babel.messages.pofile
import babel
def main():
input = io.StringIO(open(sys.argv[1], "r").read())
babel.messages.pofile.read_po(input)
if __name__ == "__main__":
main()
Version
babel version 2.16.0
POC File
https://github.com/FuturesLab/POC/blob/main/babel/poc-01
How to trigger:
python filename.py poc-01
Trace report
Traceback (most recent call last):
File "rep.py", line 11, in <module>
main()
File "rep.py", line 7, in main
babel.messages.pofile.read_po(input)
File "/home/gabe/extras/atheris_venv/lib/python3.8/site-packages/babel/messages/pofile.py", line 434, in read_po
parser.parse(fileobj)
File "/home/gabe/extras/atheris_venv/lib/python3.8/site-packages/babel/messages/pofile.py", line 361, in parse
self._finish_current_message()
File "/home/gabe/extras/atheris_venv/lib/python3.8/site-packages/babel/messages/pofile.py", line 250, in _finish_current_message
self._add_message()
File "/home/gabe/extras/atheris_venv/lib/python3.8/site-packages/babel/messages/pofile.py", line 235, in _add_message
string = self.translations[0][1].denormalize()
IndexError: list index out of range
Here's a smaller reproducer:
import io
from babel.messages.pofile import read_po
po = io.StringIO('msgid ""')
read_po(po)
Neither is a valid PO file so we should ideally raise a PoFileError in this case. @gabe-sherman would you like to open a PR?
Hey Tomas! Thanks for the response. I don't have a deep understanding of the way the PO files should be processed so a full PR may be challenging, but if I'm able to get some time I'll certainly check it out :)
I think checking if self.translations is empty inside _finish_current_message() and raising a PoFileError if that's the case might work. In any case, feel free to ask if you need help :)
Sounds good, thanks! Should the process exit when this is detected or should we only raise the warning and leave the exception raising up to the value of self.abort_invalid.
Right, there's this abort_invalid flag. I guess if it's set to True, we should just raise, and if not maybe we could insert a dummy translation, something like (0, '') (and emit a warning)?
Is the invariant here that if the messages list has an element in it, the translations list should also be populated? Adding in this check at the beginning of _finish_current_message triggers this new warning on valid files in cases where both self.translations and self.messages are empty. However, when this check to the size of self.translations is moved inside the if self.messages: block it behaves as expected, although I haven't done robust testing yet. I just want to make sure we're not triggering this exception in cases where it's okay that the translations list is empty.
To be honest, I'm not that familiar with the parser code either, but I think you are right. We should only raise/warn when we call _finish_current_message while self.messages is not empty and self.translations is, i.e. something like this:
def _finish_current_message(self) -> None:
if self.messages:
if not self.translations:
# Handle error here
self._add_message()
Yea that's what I was thinking as well :). I'll do some testing on it.
I think this can be closed now :)