Issue 242 - Fix court detection when parenthetical includes date

Open mattdahl opened this issue 10 months ago • 1 comments

As described in #242, there is currently a problem in detecting the court in a full citation when the parenthetical includes the specific date on which the opinion was issued, e.g., (C.D. Cal. Feb. 9, 2015).

The problem is that we currently treat everything before the year as a potential court string, which in the example above is C.D. Cal. Feb. 9. This causes the court lookup in helpers.get_court_by_paren() to fail.

This PR fixes this by modifying POST_FULL_CITATION_REGEX to include capturing groups for a potential month and day. If these are present, we capture them separately so that the <court> group remains pristine (i.e., only C.D. Cal.). If they are not present, the behavior should be the same as it is now.
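To illustrate the idea, here is a minimal sketch of a parenthetical pattern with optional month and day groups. It is not the actual POST_FULL_CITATION_REGEX from eyecite; the names PAREN_REGEX, MONTH, and parse_paren are hypothetical, and the pattern is much simpler than the real one.

```python
import re
from typing import Dict, Optional

# Hypothetical, simplified stand-in for the parenthetical portion of the
# citation regex. The real eyecite pattern is more involved.
MONTH = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sept|Sep|Oct|Nov|Dec)\.?"

PAREN_REGEX = re.compile(
    rf"""
    \(
    (?P<court>.*?)                # court string, captured lazily
    \s*
    (?:(?P<month>{MONTH})\s+)?    # optional month, e.g. "Feb."
    (?:(?P<day>\d{{1,2}}),?\s+)?  # optional day, e.g. "9,"
    (?P<year>\d{{4}})             # four-digit year
    \)
    """,
    re.VERBOSE,
)


def parse_paren(text: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the named groups of the first parenthetical match, or None."""
    match = PAREN_REGEX.search(text)
    return match.groupdict() if match else None
```

Because the court group is lazy and the month and day groups are optional, a parenthetical such as (C.D. Cal. Feb. 9, 2015) should leave only C.D. Cal. in the court group, while a parenthetical with no date, such as (9th Cir. 1999), should yield the same court string as before with month and day unset.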

mattdahl avatar Apr 06 '25 16:04 mattdahl

Thanks @mattdahl

flooie avatar Apr 07 '25 14:04 flooie

(1) I edited the month regexes, but didn't follow your proposed format because it would have captured full month names with a period after them (e.g., January.). I think my version has the flexibility you wanted, but let me know if it's missing a case you had in mind.

(2) I exposed the captured month/day information on the metadata object. I did not, however, do any normalization of this information or expose it on the base citation object itself. (For the year data, we store the original captured string under citation.metadata.year AND a normalized int version under citation.year. I wasn't sure if we should also put the month/day at the top level like that, but since it's not particularly common I decided to just leave it in the metadata.)

(3) I considered editing corrected_citation_full() to include the month/day in its output when we have it, but I encountered an issue regarding the court normalization that takes place in that function that made me hesitate to do anything before we decide whether that behavior should itself be changed. See https://github.com/freelawproject/eyecite/issues/135#issuecomment-2819212924.
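For reference, here is a rough sketch of the behavior described in (1) and of what normalization in (2) could look like if we decide we want it. This is not code from the PR: MONTH_REGEX, MONTH_NUMBERS, and normalize_date are hypothetical names, and the pattern is only my approximation of "abbreviations may take a period, full month names may not".

```python
import re
from datetime import date
from typing import Optional

# Hypothetical pattern: abbreviated months may carry a trailing period,
# full month names may not, so "Jan." and "January" match but "January."
# does not.
MONTH_REGEX = re.compile(
    r"(?:(?:Jan|Feb|Mar|Apr|Jun|Jul|Aug|Sept|Sep|Oct|Nov|Dec)\.?"
    r"|January|February|March|April|May|June|July|August"
    r"|September|October|November|December)"
)

# Map the first three letters of a month string to its number.
MONTH_NUMBERS = {
    name: number
    for number, name in enumerate(
        ["jan", "feb", "mar", "apr", "may", "jun",
         "jul", "aug", "sep", "oct", "nov", "dec"],
        start=1,
    )
}


def normalize_date(
    year: int, month: Optional[str], day: Optional[str]
) -> Optional[date]:
    """Build a date from captured strings, or return None if incomplete."""
    if not month or not day or not MONTH_REGEX.fullmatch(month):
        return None
    return date(year, MONTH_NUMBERS[month[:3].lower()], int(day))
```

With something like this, the raw strings could stay in the metadata (as they do for the year) and a normalized value could be derived only when both month and day were captured.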

mattdahl avatar Apr 21 '25 18:04 mattdahl

@mattdahl

That looks great.
I think a larger PR landed that had an expanded test that included a month and day:

            ("Corp. v. Nature's Farm Prods., No. 99 Civ. 9404 (SHS), 2000 U.S. Dist. LEXIS 12335 (S.D.N.Y. Aug. 25, 2000)",
             [case_citation(
                 volume='2000',
                 reporter='U.S. Dist. LEXIS',
                 page='12335',
                 year=2000,
                 metadata={'plaintiff': "Corp.", 'defendant': "Nature's Farm Prods., No. 99 Civ. 9404 (SHS)", "month": "Aug.", "day": "25", "court": "nysd"})
              ],),

This is the fixed version. I think if we add that in, we can land this.

flooie avatar Apr 24 '25 19:04 flooie

Thanks, I just fixed that test in 71a46d4e98e53888f588280a74380b5ce546c66c. Also, v2.7.1 just landed so I rebased again.

mattdahl avatar Apr 25 '25 17:04 mattdahl

You have permission to run these tests, right? Or am I going crazy?

flooie avatar Apr 25 '25 20:04 flooie

I think the tests are fine, but the benchmark check has never been able to run properly for me, for some reason.

mattdahl avatar Apr 25 '25 22:04 mattdahl

@mattdahl I ran it locally and it looks fine. Can you do one last rebase so we can get this merged today?

flooie avatar Apr 30 '25 13:04 flooie

I think it's already fully rebased against main (last commit on main: c1edf7acb50b3b7710839b378e4828fc1e0c7a69)

mattdahl avatar Apr 30 '25 15:04 mattdahl

@mlissner I think I need you to merge this.

flooie avatar May 14 '25 22:05 flooie

I will rebase again

mattdahl avatar May 14 '25 23:05 mattdahl

@mattdahl sorry

flooie avatar May 14 '25 23:05 flooie

I haven't looked at this in ages, but I just marked it as approved so that my review wasn't needed anymore. @flooie, your call completely on this one and hopefully I'm out of the way now.

mlissner avatar May 15 '25 00:05 mlissner