Check pofile string delimiters
This PR adds checks to the pofile parser code to validate that message strings are correctly delimited by double quotes. Keeping with the current design, an error is only raised if requested, otherwise a warning is printed, the faulty lines are corrected and parsing goes on.
I found this issue while processing a pofile used in the Spanish translation of the CPython documentation. One of our files was incorrectly written, and from all our tooling only the msgcat tool of GNU's gettext package complained, while babel, polib and others didn't. See https://github.com/python/python-docs-es/pull/2873, https://github.com/izimobil/polib/pull/161 and https://git.afpy.org/AFPy/powrap/pulls/4 for further reference.
While implementing this change I found that the _NormalizedString class not only was used to contain message lines, but also participated in the parsing process (and hid some parsing as well). I thus broke down my changes into three separate commits:
- I first clarified the usage of the current
_NormalizedStringclass across the codebase (see details in commit). - I then added the double quote delimitation check logic I wanted to add to the parser
- Now that all strings have the same form, I more formally constrained how
_NormalizedStringbehaves
Along the way I also implemented three small quality-of-life changes. They are included as the first three commits of this PR, happy to submit these separately if required:
- Avoid re-compiling a regular expression
- Remove a duplicate test assertion
- Perform a better assertion in a particular test, allegedly what was intended in the first place
Gentle ping, at least to kick off CI and check if there's any obvious mistakes to be fixed
Codecov Report
Attention: Patch coverage is 95.45455% with 1 line in your changes missing coverage. Please review.
Project coverage is 91.26%. Comparing base (
740ee3d) to head (1a9a142). Report is 7 commits behind head on master.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| babel/messages/pofile.py | 95.45% | 1 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## master #1151 +/- ##
==========================================
- Coverage 91.27% 91.26% -0.02%
==========================================
Files 27 27
Lines 4619 4634 +15
==========================================
+ Hits 4216 4229 +13
- Misses 403 405 +2
| Flag | Coverage Δ | |
|---|---|---|
| macos-12-3.10 | ? |
|
| macos-12-3.11 | ? |
|
| macos-12-3.12 | ? |
|
| macos-12-3.13-dev | ? |
|
| macos-12-3.8 | ? |
|
| macos-12-3.9 | ? |
|
| macos-12-pypy3.10 | ? |
|
| ubuntu-22.04-3.10 | 90.05% <95.45%> (-0.02%) |
:arrow_down: |
| ubuntu-22.04-3.11 | 89.98% <95.45%> (-0.02%) |
:arrow_down: |
| ubuntu-22.04-3.12 | 90.20% <95.45%> (-0.02%) |
:arrow_down: |
| ubuntu-22.04-3.13-dev | 89.72% <95.45%> (-0.01%) |
:arrow_down: |
| ubuntu-22.04-3.8 | 89.98% <95.45%> (-0.02%) |
:arrow_down: |
| ubuntu-22.04-3.9 | 89.98% <95.45%> (-0.02%) |
:arrow_down: |
| ubuntu-22.04-pypy3.10 | 90.05% <95.45%> (-0.02%) |
:arrow_down: |
| windows-2022-3.10 | 90.17% <95.45%> (-0.02%) |
:arrow_down: |
| windows-2022-3.11 | 90.10% <95.45%> (-0.02%) |
:arrow_down: |
| windows-2022-3.12 | 90.32% <95.45%> (-0.02%) |
:arrow_down: |
| windows-2022-3.13-dev | 89.84% <95.45%> (-0.02%) |
:arrow_down: |
| windows-2022-3.8 | 90.09% <95.45%> (-0.02%) |
:arrow_down: |
| windows-2022-3.9 | 90.09% <95.45%> (-0.02%) |
:arrow_down: |
| windows-2022-pypy3.10 | 90.17% <95.45%> (-0.02%) |
:arrow_down: |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
@akx thanks for the patience .I finally found some time today to work on this. I opted for adding yet more checks (and tests) for the msgstr plural entries, hence ensuring their messages are as well-formed as the rest. I re-based on top of the latest master and push-forced, and I marked all conversations as resolved, mostly to keep track of what I needed to work on, so don't necessarily interpret that as me not being open to further suggestions.