Issue 242 - Fix court detection when parenthetical includes date
As described in #242, there is currently a problem in detecting the court in a full citation when the parenthetical includes the specific date on which the opinion was issued, e.g., (C.D. Cal. Feb. 9, 2015).
The problem is that we currently treat everything before the year as a potential court string, which in the example above is C.D. Cal. Feb. 9. This causes the court lookup in helpers.get_court_by_paren() to fail.
This PR fixes this by modifying POST_FULL_CITATION_REGEX to include optional capturing groups for a month and day. If these are present, we capture them separately so the <court> group stays clean (i.e., only C.D. Cal.). If they are absent, the behavior should be unchanged.
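To illustrate the idea, here is a minimal sketch using deliberately simplified, hypothetical patterns (OLD_PAREN, NEW_PAREN, and MONTH are illustrative names, not eyecite's actual POST_FULL_CITATION_REGEX). It contrasts treating everything before the year as the court with adding an optional month/day group between the court and the year.

```python
import re

# Hypothetical, simplified pieces; these are NOT eyecite's actual regexes.
MONTH = r"(?:Jan|Feb|Mar|Apr|May|June?|July?|Aug|Sept?|Oct|Nov|Dec)\.?"

# "Before": everything up to the year is treated as the court.
OLD_PAREN = re.compile(r"\((?P<court>[^)]*?),?\s*(?P<year>\d{4})\)")

# "After": an optional month/day group sits between court and year, so the
# date pieces land in their own groups instead of leaking into <court>.
NEW_PAREN = re.compile(
    r"\((?P<court>[^)]*?)\s*"
    rf"(?:(?P<month>{MONTH})\s+(?P<day>\d{{1,2}}),?\s+)?"
    r"(?P<year>\d{4})\)"
)

text = "(C.D. Cal. Feb. 9, 2015)"
print(OLD_PAREN.search(text).group("court"))  # C.D. Cal. Feb. 9
m = NEW_PAREN.search(text)
print(m.group("court", "month", "day", "year"))  # ('C.D. Cal.', 'Feb.', '9', '2015')
```

Because the court group excludes `)`, a search over a full citation also skips earlier parentheticals such as (SHS) that contain no year, and a parenthetical without a date, like (S.D.N.Y. 2000), simply leaves month and day as None.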
Thanks @mattdahl
(1) I edited the month regexes, but didn't follow your proposed format because it would have captured full month names with a period after them (e.g., January.). I think my version has the flexibility you wanted, but let me know if it's missing a case you had in mind.
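One way to get that flexibility without accepting a full month name followed by a period is to split the alternation into two branches: full names with no period, and abbreviations with an optional period. The sketch below is hypothetical (MONTH and is_month_token are illustrative names, not the PR's actual regex) and checks whole tokens with re.fullmatch.

```python
import re

# Hypothetical month alternation (a sketch, not the PR's actual regex):
# full month names with no trailing period, or common abbreviations with an
# optional period. "May", "June", and "July" fall under the full-name branch.
MONTH = (
    r"(?:January|February|March|April|May|June|July|August|September"
    r"|October|November|December"
    r"|(?:Jan|Feb|Mar|Apr|Jun|Jul|Aug|Sept?|Oct|Nov|Dec)\.?)"
)


def is_month_token(token: str) -> bool:
    """Return True if the whole token is an accepted month spelling."""
    return re.fullmatch(MONTH, token) is not None


for tok in ["Jan.", "January", "Sept.", "May", "January.", "May."]:
    print(tok, is_month_token(tok))
```

When the alternation is embedded in a larger pattern rather than used with fullmatch, the whitespace and day that must follow it play the role of the end anchor, so "January. 5" still fails to match.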
(2) I exposed the captured month/day information on the metadata object. I did not, however, do any normalization of this information or expose it on the base citation object itself. (For the year data, we store the original captured string under citation.metadata.year AND a normalized int version under citation.year. I wasn't sure if we should also put the month/day at the top level like that, but since it's not particularly common I decided to just leave it in the metadata.)
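Since the month and day are left as raw strings, a caller who needs a real date could normalize them alongside the already-normalized int year. The helper below is a hypothetical sketch (decision_date and _MONTHS are illustrative names, not part of eyecite); its inputs would presumably be the values stored under the "month" and "day" metadata keys, as in the test case quoted later in this thread.

```python
from datetime import date
from typing import Optional

# Hypothetical lookup: first three letters of a month name -> month number.
_MONTHS = {
    "jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
    "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12,
}


def decision_date(year: int, month: Optional[str], day: Optional[str]) -> Optional[date]:
    """Build a date from an int year plus raw month/day strings.

    Returns None when the parenthetical carried no month/day, or when the
    strings can't be interpreted (e.g. an impossible day like Feb. 30).
    """
    if not month or not day:
        return None
    # "Sept." -> "Sept" -> "sep"; "August" -> "aug"
    month_num = _MONTHS.get(month.strip().rstrip(".")[:3].lower())
    if month_num is None or not day.isdigit():
        return None
    try:
        return date(year, month_num, int(day))
    except ValueError:
        return None
```

For example, decision_date(2000, "Aug.", "25") gives date(2000, 8, 25), while a citation with no month/day yields None.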
(3) I considered editing corrected_citation_full() to include the month/day in its output when we have it, but I ran into a question about the court normalization that function performs, and I'd rather not change anything until we decide whether that behavior should itself change. See https://github.com/freelawproject/eyecite/issues/135#issuecomment-2819212924.
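If corrected_citation_full() were later extended this way, rebuilding the parenthetical might look roughly like the hypothetical helper below (format_parenthetical is an illustrative name, not eyecite code). It takes the court string exactly as given and deliberately sidesteps the court-normalization question raised above.

```python
from typing import Optional


def format_parenthetical(
    court: str, year: str, month: Optional[str] = None, day: Optional[str] = None
) -> str:
    """Rebuild a citation parenthetical, adding month/day only when both exist."""
    parts = [court] if court else []
    if month and day:
        parts.append(f"{month} {day},")
    parts.append(year)
    return "(" + " ".join(parts) + ")"


print(format_parenthetical("S.D.N.Y.", "2000", "Aug.", "25"))  # (S.D.N.Y. Aug. 25, 2000)
print(format_parenthetical("S.D.N.Y.", "2000"))  # (S.D.N.Y. 2000)
```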
@mattdahl
That looks great.
I think a larger PR landed with an expanded test that includes a month and day:
("Corp. v. Nature's Farm Prods., No. 99 Civ. 9404 (SHS), 2000 U.S. Dist. LEXIS 12335 (S.D.N.Y. Aug. 25, 2000)",
 [case_citation(
     volume='2000',
     reporter='U.S. Dist. LEXIS',
     page='12335',
     year=2000,
     metadata={'plaintiff': "Corp.",
               'defendant': "Nature's Farm Prods., No. 99 Civ. 9404 (SHS)",
               'month': "Aug.",
               'day': "25",
               'court': "nysd"})
 ],),
This is the fixed version. I think if we add that in, we can land this.
Thanks, I just fixed that test in 71a46d4e98e53888f588280a74380b5ce546c66c. Also, v2.7.1 just landed so I rebased again.
You have permission to run these tests, right? Or am I going crazy?
I think the tests are fine, but the benchmark check has never been able to run properly for me, for some reason.
@mattdahl I ran it locally and it looks fine. Can you do one last rebase so we can get this merged today?
I think it's already fully rebased against main (last commit on main: c1edf7acb50b3b7710839b378e4828fc1e0c7a69)
@mlissner I think I need you to merge this.
I will rebase again
@mattdahl sorry
I haven't looked at this in ages, but I just marked it as approved so that my review wasn't needed anymore. @flooie, your call completely on this one and hopefully I'm out of the way now.