German weekday "Montag" (Monday) only works with 'PREFER_DATES_FROM': 'future'?

Open helfrichp opened this issue 10 months ago • 2 comments

I found some strange behaviour: the German weekday "Montag" (Monday) is only parsed correctly with 'PREFER_DATES_FROM': 'future'.

I'm not sure what's going on here. All other weekdays work as expected, though.

    import dateparser.search  # import the search submodule explicitly

    all_found_dates = dateparser.search.search_dates("Montag", languages=['de'], settings={'PREFER_DATES_FROM': 'current_period'})
    print(all_found_dates)
    all_found_dates = dateparser.search.search_dates("Montag", languages=['de'], settings={'PREFER_DATES_FROM': 'past'})
    print(all_found_dates)
    all_found_dates = dateparser.search.search_dates("Montag", languages=['de'], settings={'PREFER_DATES_FROM': 'future'})
    print(all_found_dates)

Output:

    [('Montag', datetime.datetime(2025, 12, 31, 0, 0))]
    [('Montag', datetime.datetime(2025, 12, 31, 0, 0))]
    [('Montag', datetime.datetime(2025, 4, 7, 0, 0))]

helfrichp avatar Apr 02 '25 10:04 helfrichp

Problems with parsing a weekday in the past at the beginning of the month are not language-specific. I show a more comprehensive reproducer below with English.

The date on which the test is run matters. With settings={'PREFER_DATES_FROM': 'past'}, parsing a weekday that hasn't yet occurred in the current month unexpectedly returns a date in the future instead of one in the past.
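
Because the result depends on the run date, a reproducer is easier to share if "today" is pinned. A minimal sketch, assuming dateparser's RELATIVE_BASE setting is used to fix the reference date; the helper name weekday_results is made up for illustration, and the parser is passed in as a callable so the helper itself has no hard dependency. I have not checked that the outputs match an unpinned run on the same day.

```python
from datetime import datetime

WEEKDAYS = ['Sunday', 'Monday', 'Tuesday', 'Wednesday',
            'Thursday', 'Friday', 'Saturday']


def weekday_results(parse, base, prefer='past'):
    """Parse every weekday name relative to a fixed reference date.

    `parse` is expected to behave like dateparser.parse(text, settings=...).
    `weekday_results` is a hypothetical helper, not part of dateparser.
    """
    settings = {
        'PREFER_DATES_FROM': prefer,
        # Pin "now" so the outcome does not depend on the day the test runs.
        'RELATIVE_BASE': base,
    }
    return {name: parse(name, settings=settings) for name in WEEKDAYS}
```

Usage would be something like weekday_results(dateparser.parse, datetime(2025, 4, 4)) after import dateparser, which pins the reference date to the Friday on which the tests below were run.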

Our workaround is to revert to an older version. The last release in which a date in the past is returned when input is a weekday that hasn't happened yet in the current month is 1.1.8.

dateparser 1.2.0 and 1.2.1

Note: April 1, 2025 is a Tuesday. Today (the date of this test) is a Friday. The only days that work are Tuesday, Wednesday, and Thursday.

>>> import dateparser
>>> weekdays = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']
>>> kwargs = {'settings': {'PREFER_DATES_FROM': 'past'}}
>>> [dateparser.parse(wday, **kwargs) for wday in weekdays]
[datetime.datetime(2025, 4, 30, 0, 0), datetime.datetime(2025, 12, 31, 0, 0), datetime.datetime(2025, 4, 1, 0, 0), datetime.datetime(2025, 4, 2, 0, 0), datetime.datetime(2025, 4, 3, 0, 0), datetime.datetime(2025, 4, 28, 0, 0), datetime.datetime(2025, 4, 29, 0, 0)]

dateparser 1.1.8

The parsed dates for each day of the week are all in the past as expected, either past weekdays in the current month (Tuesday through Thursday) or else past days for the previous month (Friday through Monday).

>>> import dateparser
>>> weekdays = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']
>>> kwargs = {'settings': {'PREFER_DATES_FROM': 'past'}}
>>> [dateparser.parse(wday, **kwargs) for wday in weekdays]
[datetime.datetime(2025, 3, 30, 0, 0), datetime.datetime(2025, 3, 31, 0, 0), datetime.datetime(2025, 4, 1, 0, 0), datetime.datetime(2025, 4, 2, 0, 0), datetime.datetime(2025, 4, 3, 0, 0), datetime.datetime(2025, 3, 28, 0, 0), datetime.datetime(2025, 3, 29, 0, 0)]
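
For reference, the 1.1.8 results coincide with "the most recent occurrence strictly before today". A minimal stdlib sketch of that rule, independent of dateparser; the helper name most_recent_past is made up for illustration, and this is only a statement of the behaviour we expect, not a claim about how dateparser computes it.

```python
from datetime import date, timedelta

# Ordered to match date.weekday(), where Monday == 0.
WEEKDAY_NAMES = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
                 'Friday', 'Saturday', 'Sunday']


def most_recent_past(weekday_name, today):
    """Return the latest date strictly before `today` that falls on `weekday_name`."""
    target = WEEKDAY_NAMES.index(weekday_name)
    # 1..7 days back; if today is that weekday, go back a full week.
    days_back = (today.weekday() - target) % 7 or 7
    return today - timedelta(days=days_back)
```

With today = date(2025, 4, 4), a Friday, this gives March 30 for Sunday, March 31 for Monday and March 28 for Friday, the same dates 1.1.8 returns.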

synrg avatar Apr 04 '25 08:04 synrg

This may be part of the same issue so I mention it here in case the two problems share a common root cause.

We have a similar problem with dates in the future being returned for months that have not yet occurred in the current year when we request past dates. I show both last and first day of the month preferred, as that matches our use cases (for implementing 'since' and 'until' qualifiers in a query language) and affects the outcome.

That is, the last day of April 2025 hasn't occurred yet, but when we ask for a date in the past for 'April' with the last day of the month preferred, it returns April 30, 2025, a date in the future. Compare with the second test, where the first day of the month is preferred: it returns the expected result, April 1, 2025. I don't know whether that's the intended behaviour. It's not what we expected, though.

dateparser 1.1.7, 1.1.8, 1.2.0, 1.2.1

>>> import dateparser
>>> months = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
>>> kwargs = {'settings': {'PREFER_DATES_FROM': 'past', 'PREFER_DAY_OF_MONTH': 'last'}}
>>> [dateparser.parse(month, **kwargs) for month in months]
[datetime.datetime(2025, 1, 31, 0, 0), datetime.datetime(2025, 2, 28, 0, 0), datetime.datetime(2025, 3, 31, 0, 0), datetime.datetime(2025, 4, 30, 0, 0), datetime.datetime(2024, 5, 31, 0, 0), datetime.datetime(2024, 6, 30, 0, 0), datetime.datetime(2024, 7, 31, 0, 0), datetime.datetime(2024, 8, 31, 0, 0), datetime.datetime(2024, 9, 30, 0, 0), datetime.datetime(2024, 10, 31, 0, 0), datetime.datetime(2024, 11, 30, 0, 0), datetime.datetime(2024, 12, 31, 0, 0)]
>>> kwargs = {'settings': {'PREFER_DATES_FROM': 'past', 'PREFER_DAY_OF_MONTH': 'first'}}
>>> [dateparser.parse(month, **kwargs) for month in months]
[datetime.datetime(2025, 1, 1, 0, 0), datetime.datetime(2025, 2, 1, 0, 0), datetime.datetime(2025, 3, 1, 0, 0), datetime.datetime(2025, 4, 1, 0, 0), datetime.datetime(2024, 5, 1, 0, 0), datetime.datetime(2024, 6, 1, 0, 0), datetime.datetime(2024, 7, 1, 0, 0), datetime.datetime(2024, 8, 1, 0, 0), datetime.datetime(2024, 9, 1, 0, 0), datetime.datetime(2024, 10, 1, 0, 0), datetime.datetime(2024, 11, 1, 0, 0), datetime.datetime(2024, 12, 1, 0, 0)]

I gave up testing versions older than 1.1.7. It looks like it may always have behaved this way, and in any case we would not want to revert to a release older than that, even if one of them worked the way we think it should.

Although I don't want to get into reproducers for it, parsing an unqualified year, e.g. 2025, as an end date that is supposed to be in the past also seems awkward in the current implementation of dateparser.

workaround

The only workaround I can think of off the top of my head is:

  • parse the date portion of the user's input (e.g. 'April' for 'since April' or 'until April') with preferred day of month determined by the qualifier 'since' ('first') or 'until' ('last')
  • for the 'until' case, where the parser may incorrectly return a date in the future:
    • compare the resulting date with the current date
    • if the result is in the future, try parsing the input again with settings={'PREFER_DATES_FROM': 'past', 'PREFER_DAY_OF_MONTH': 'current'}
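
A rough sketch of the steps above, untested against real inputs. The function name parse_bound and its parameters are made up for illustration; the parser is passed in as a callable that is assumed to behave like dateparser.parse(text, settings=...), and the comparison assumes naive datetimes on both sides.

```python
from datetime import datetime


def parse_bound(text, qualifier, parse, now=None):
    """Parse `text` as a past date for a 'since' or 'until' qualifier.

    `parse_bound` is a hypothetical helper, not part of dateparser.
    """
    now = now or datetime.now()
    # 'since' wants the start of the period, 'until' wants the end.
    day_pref = 'first' if qualifier == 'since' else 'last'
    settings = {'PREFER_DATES_FROM': 'past', 'PREFER_DAY_OF_MONTH': day_pref}
    result = parse(text, settings=settings)
    # Only the 'until' case is expected to overshoot into the future.
    if qualifier == 'until' and result is not None and result > now:
        settings['PREFER_DAY_OF_MONTH'] = 'current'
        result = parse(text, settings=settings)
    return result
```

It would be called as parse_bound('April', 'until', dateparser.parse) after import dateparser. If timezone-aware results are in play, `now` would need to be made aware as well before comparing.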

This could have been a better workaround for the weekday case as well, except for the bizarre treatment of Monday as Dec 31, 2025, which is unlike all of the others. For that reason, we'll at least pin dateparser at 1.1.8 for now, and may also implement the workaround above to guarantee that 'until' dates for the current month or year are never in the future.

synrg avatar Apr 04 '25 09:04 synrg