babel icon indicating copy to clipboard operation
babel copied to clipboard

Check pofile string delimiters

Open rtobar opened this issue 1 year ago • 3 comments

This PR adds checks to the pofile parser code to validate that message strings are correctly delimited by double quotes. Keeping with the current design, an error is only raised if requested, otherwise a warning is printed, the faulty lines are corrected and parsing goes on.

I found this issue while processing a pofile used in the Spanish translation of the CPython documentation. One of our files was incorrectly written, and from all our tooling only the msgcat tool of GNU's gettext package complained, while babel, polib and others didn't. See https://github.com/python/python-docs-es/pull/2873, https://github.com/izimobil/polib/pull/161 and https://git.afpy.org/AFPy/powrap/pulls/4 for further reference.

While implementing this change I found that the _NormalizedString class not only was used to contain message lines, but also participated in the parsing process (and hid some parsing as well). I thus broke down my changes into three separate commits:

  • I first clarified the usage of the current _NormalizedString class across the codebase (see details in commit).
  • I then added the double quote delimitation check logic I wanted to add to the parser
  • Now that all strings have the same form, I more formally constrained how _NormalizedString behaves

Along the way I also implemented three small quality-of-life changes. They are included as the first three commits of this PR, happy to submit these separately if required:

  • Avoid re-compiling a regular expression
  • Remove a duplicate test assertion
  • Perform a better assertion in a particular test, allegedly what was intended in the first place

rtobar avatar Nov 17 '24 15:11 rtobar

Gentle ping, at least to kick off CI and check if there's any obvious mistakes to be fixed

rtobar avatar Nov 21 '24 07:11 rtobar

Codecov Report

Attention: Patch coverage is 95.45455% with 1 line in your changes missing coverage. Please review.

Project coverage is 91.26%. Comparing base (740ee3d) to head (1a9a142). Report is 7 commits behind head on master.

Files with missing lines Patch % Lines
babel/messages/pofile.py 95.45% 1 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1151      +/-   ##
==========================================
- Coverage   91.27%   91.26%   -0.02%     
==========================================
  Files          27       27              
  Lines        4619     4634      +15     
==========================================
+ Hits         4216     4229      +13     
- Misses        403      405       +2     
Flag Coverage Δ
macos-12-3.10 ?
macos-12-3.11 ?
macos-12-3.12 ?
macos-12-3.13-dev ?
macos-12-3.8 ?
macos-12-3.9 ?
macos-12-pypy3.10 ?
ubuntu-22.04-3.10 90.05% <95.45%> (-0.02%) :arrow_down:
ubuntu-22.04-3.11 89.98% <95.45%> (-0.02%) :arrow_down:
ubuntu-22.04-3.12 90.20% <95.45%> (-0.02%) :arrow_down:
ubuntu-22.04-3.13-dev 89.72% <95.45%> (-0.01%) :arrow_down:
ubuntu-22.04-3.8 89.98% <95.45%> (-0.02%) :arrow_down:
ubuntu-22.04-3.9 89.98% <95.45%> (-0.02%) :arrow_down:
ubuntu-22.04-pypy3.10 90.05% <95.45%> (-0.02%) :arrow_down:
windows-2022-3.10 90.17% <95.45%> (-0.02%) :arrow_down:
windows-2022-3.11 90.10% <95.45%> (-0.02%) :arrow_down:
windows-2022-3.12 90.32% <95.45%> (-0.02%) :arrow_down:
windows-2022-3.13-dev 89.84% <95.45%> (-0.02%) :arrow_down:
windows-2022-3.8 90.09% <95.45%> (-0.02%) :arrow_down:
windows-2022-3.9 90.09% <95.45%> (-0.02%) :arrow_down:
windows-2022-pypy3.10 90.17% <95.45%> (-0.02%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Dec 09 '24 12:12 codecov[bot]

@akx thanks for the patience .I finally found some time today to work on this. I opted for adding yet more checks (and tests) for the msgstr plural entries, hence ensuring their messages are as well-formed as the rest. I re-based on top of the latest master and push-forced, and I marked all conversations as resolved, mostly to keep track of what I needed to work on, so don't necessarily interpret that as me not being open to further suggestions.

rtobar avatar Dec 17 '24 08:12 rtobar