List index out of range error in PoFileParser._add_message

Open gabe-sherman opened this issue 1 year ago • 8 comments

The code below raises an IndexError (list index out of range) when given malformed input. The error occurs in _add_message at line 235 of babel/messages/pofile.py.

import sys
import io
import babel.messages.pofile

def main():
    input = io.StringIO(open(sys.argv[1], "r").read())  # PoC file contents as a text stream
    babel.messages.pofile.read_po(input)  # raises IndexError on the malformed input


if __name__ == "__main__":
    main()

Version

babel version 2.16.0

POC File

https://github.com/FuturesLab/POC/blob/main/babel/poc-01

How to trigger:

python rep.py poc-01

Trace report

Traceback (most recent call last):
  File "rep.py", line 11, in <module>
    main()
  File "rep.py", line 7, in main
    babel.messages.pofile.read_po(input)
  File "/home/gabe/extras/atheris_venv/lib/python3.8/site-packages/babel/messages/pofile.py", line 434, in read_po
    parser.parse(fileobj)
  File "/home/gabe/extras/atheris_venv/lib/python3.8/site-packages/babel/messages/pofile.py", line 361, in parse
    self._finish_current_message()
  File "/home/gabe/extras/atheris_venv/lib/python3.8/site-packages/babel/messages/pofile.py", line 250, in _finish_current_message
    self._add_message()
  File "/home/gabe/extras/atheris_venv/lib/python3.8/site-packages/babel/messages/pofile.py", line 235, in _add_message
    string = self.translations[0][1].denormalize()
IndexError: list index out of range

Sep 24 '24 18:09 gabe-sherman

Here's a smaller reproducer:

import io
from babel.messages.pofile import read_po

po = io.StringIO('msgid ""')
read_po(po)

Neither is a valid PO file so we should ideally raise a PoFileError in this case. @gabe-sherman would you like to open a PR?
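
To illustrate the failure mode only: the traceback ends at self.translations[0][1], and that expression raises exactly this error when the list is empty. The snippet below is a stand-alone sketch that does not import babel. It assumes (not verified against the parser source) that an entry with a msgid but no msgstr leaves the translations list empty, and the name translations is just a plain list standing in for the parser attribute.

```python
# Stand-alone illustration; `translations` is a plain list standing in for
# the parser's self.translations (an assumption, not babel's actual state).
translations = []  # assumed: no msgstr line was seen, so nothing was appended

try:
    first = translations[0][1]  # same indexing pattern as in the traceback
except IndexError as exc:
    error_message = str(exc)

print(error_message)
```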

Sep 24 '24 21:09 tomasr8

Hey Tomas! Thanks for the response. I don't have a deep understanding of the way the PO files should be processed so a full PR may be challenging, but if I'm able to get some time I'll certainly check it out :)

Sep 24 '24 21:09 gabe-sherman

I think checking if self.translations is empty inside _finish_current_message() and raising a PoFileError if that's the case might work. In any case, feel free to ask if you need help :)

Sep 24 '24 22:09 tomasr8

Sounds good, thanks! Should the process exit when this is detected, or should we only emit a warning and let the value of self.abort_invalid decide whether an exception is raised?

Sep 24 '24 23:09 gabe-sherman

Right, there's this abort_invalid flag. I guess if it's set to True, we should just raise, and if not maybe we could insert a dummy translation, something like (0, '') (and emit a warning)?
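
A rough sketch of that policy, written without babel so it runs on its own. ToyPoFileError and ensure_translation are hypothetical names made up for this illustration; they are not part of babel, and the real PoFileError may take different arguments.

```python
import warnings


class ToyPoFileError(Exception):
    """Hypothetical stand-in for babel's PoFileError (not the real class)."""


def ensure_translation(translations, abort_invalid):
    """Return a non-empty translations list, or raise if abort_invalid is set."""
    if translations:
        return translations
    if abort_invalid:
        # Strict mode: refuse the malformed entry outright.
        raise ToyPoFileError("missing msgstr for msgid")
    # Lenient mode: warn and fall back to a dummy translation.
    warnings.warn("missing msgstr for msgid; using an empty translation")
    return [(0, "")]
```

One caveat: the traceback shows _add_message calling .denormalize() on the second element, so in the real parser the dummy entry would presumably need to hold whatever object provides that method rather than a plain string.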

Sep 25 '24 07:09 tomasr8

Is the invariant here that if the messages list has an element in it, the translations list should also be populated? Adding this check at the beginning of _finish_current_message triggers the new warning on valid files, in cases where both self.translations and self.messages are empty. However, when the check on the size of self.translations is moved inside the if self.messages: block, it behaves as expected, although I haven't done robust testing yet. I just want to make sure we're not triggering this exception in cases where it's okay for the translations list to be empty.

Sep 25 '24 16:09 gabe-sherman

To be honest, I'm not that familiar with the parser code either, but I think you are right. We should only raise/warn when we call _finish_current_message while self.messages is not empty and self.translations is, i.e. something like this:

def _finish_current_message(self) -> None:
    if self.messages:
        if not self.translations:
            # Handle error here (raise or warn, depending on abort_invalid)
            ...
        self._add_message()
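
To make the placement of the check concrete, here is a toy model of the state involved. ToyParser, its errors list, and its added list are hypothetical and only imitate the shape of the real parser; the point is that the check fires for a msgid without a msgstr but stays silent when both lists are empty.

```python
class ToyParser:
    """Hypothetical miniature of the parser state; not babel's PoFileParser."""

    def __init__(self):
        self.messages = []
        self.translations = []
        self.added = []   # finished (msgid, msgstr) pairs
        self.errors = []  # stands in for raising or warning

    def _add_message(self):
        # Mirrors the indexing from the traceback, minus denormalize().
        self.added.append((self.messages[0], self.translations[0][1]))
        self.messages, self.translations = [], []

    def _finish_current_message(self):
        if self.messages:
            if not self.translations:
                # Only reached when a msgid exists without any msgstr.
                self.errors.append("msgid without msgstr")
                self.translations.append((0, ""))
            self._add_message()


parser = ToyParser()
parser._finish_current_message()  # both lists empty: nothing happens
parser.messages.append("hello")
parser._finish_current_message()  # msgid only: error recorded, dummy used
```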

Sep 25 '24 17:09 tomasr8

Yeah, that's what I was thinking as well :). I'll do some testing on it.

Sep 25 '24 18:09 gabe-sherman

I think this can be closed now :)

Oct 27 '24 20:10 tomasr8