Fast-F1 [BUG] Timing data contains laps with incorrect duplicate lap times

Describe the issue:

For the Qualifying of the 2023 Canadian GP, the timing data for some drivers contains laps that have the exact same lap time as a previous lap.

For example: Perez's first two laps and Verstappen's last two timed laps (laps 24 and 25 in the output below).

Reference: https://www.fia.com/sites/default/files/2023_09_can_f1_q0_timing_qualifyingsessionlaptimes_v01.pdf

Edit after first investigation:

The laps that have incorrect lap times (and sector 3 times) are laps during which the session was red-flagged. The lap time and sector 3 time of the previous lap are then received again from the API, i.e. the incorrectly duplicated data already exists in the source data.

Expected Behaviour

FastF1 should detect that these values are incorrect and ignore them.

Reproduce the code example:

import fastf1

session = fastf1.get_session(2023, 'Canada', 'Q')
session.load(telemetry=False)

ver = session.laps.pick_driver('VER')

print(ver.loc[:, ('LapNumber', 'Time', 'LapTime')])

Error message:

core           INFO 	Loading data for Canadian Grand Prix - Qualifying [v3.0.4]
req            INFO 	Using cached data for driver_info
req            INFO 	Using cached data for session_status_data
req            INFO 	Using cached data for track_status_data
req            INFO 	Using cached data for timing_data
req            INFO 	Using cached data for timing_app_data
core           INFO 	Processing timing data...
req            INFO 	Using cached data for weather_data
req            INFO 	Using cached data for race_control_messages
core           INFO 	Finished loading data for 20 drivers: ['1', '27', '14', '44', '63', '31', '4', '55', '81', '23', '16', '11', '18', '20', '77', '22', '10', '21', '2', '24']

    LapNumber                   Time                LapTime
0         1.0 0 days 00:25:02.786000                    NaT
1         2.0 0 days 00:26:36.836000                    NaT
2         3.0 0 days 00:28:00.942000 0 days 00:01:24.106000
3         4.0 0 days 00:29:23.785000 0 days 00:01:22.843000
4         5.0 0 days 00:30:54.954000 0 days 00:01:31.169000
5         6.0 0 days 00:32:16.942000 0 days 00:01:21.988000
6         7.0 0 days 00:33:56.601000 0 days 00:01:39.659000
7         8.0 0 days 00:35:22.770000 0 days 00:01:26.169000
8         9.0 0 days 00:36:44.509000 0 days 00:01:21.739000
9        10.0 0 days 00:38:41.690000 0 days 00:01:57.181000
10       11.0 0 days 00:40:02.541000 0 days 00:01:20.851000
11       12.0 0 days 00:47:02.809000                    NaT
12       13.0 0 days 00:48:34.656000                    NaT
13       14.0 0 days 00:49:54.791000 0 days 00:01:20.135000
14       15.0 0 days 00:51:39.433000 0 days 00:01:44.642000
15       16.0 0 days 00:53:10.331000 0 days 00:01:30.898000
16       17.0 0 days 00:54:30.708000 0 days 00:01:20.377000
17       18.0 0 days 00:55:49.800000 0 days 00:01:19.092000
18       19.0 0 days 00:57:16.184000 0 days 00:01:26.384000
19       20.0 0 days 00:58:42.033000 0 days 00:01:25.849000
20       21.0 0 days 01:10:02.660000                    NaT
21       22.0 0 days 01:11:38.752000                    NaT
22       23.0 0 days 01:13:05.811000 0 days 00:01:27.059000
23       24.0 0 days 01:14:31.669000 0 days 00:01:25.858000
24       25.0 0 days 01:21:59.694000 0 days 00:01:25.858000
25       26.0 0 days 01:23:44.223000                    NaT

Jun 20 '23 22:06 theOehrly

I was wondering whether we could check the race_control_messages for RED FLAG entries and figure out whether any laps satisfy red_flag_time < Time < immediate_green_flag. This would require some processing and helper methods for further analysis of the laps and race_control_messages.
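A minimal, library-independent sketch of that idea, using plain datetime.timedelta values rather than FastF1 objects. Because a red flag can fall in the middle of a lap, it tests whether a lap's interval overlaps a red-flag window instead of comparing a single timestamp. The helper name laps_overlapping_windows is hypothetical, the lap times are the "Time" values of Verstappen's laps 23 to 25 from the output above, and the red-flag window is a placeholder, not a value read from race_control_messages.

```python
from datetime import timedelta


def laps_overlapping_windows(lap_end_times, windows):
    """Return indices of laps whose interval overlaps any (start, end) window.

    A lap's start is approximated by the previous lap's end time, so the
    first entry has no known start and is never reported.
    """
    flagged = []
    for i in range(1, len(lap_end_times)):
        lap_start, lap_end = lap_end_times[i - 1], lap_end_times[i]
        if any(lap_start < w_end and lap_end > w_start
               for w_start, w_end in windows):
            flagged.append(i)
    return flagged


# "Time" values of Verstappen's laps 23, 24 and 25 from the output above
lap_end_times = [
    timedelta(hours=1, minutes=13, seconds=5, milliseconds=811),
    timedelta(hours=1, minutes=14, seconds=31, milliseconds=669),
    timedelta(hours=1, minutes=21, seconds=59, milliseconds=694),
]

# Placeholder red-flag window (assumption, NOT read from the session data)
red_flag_windows = [
    (timedelta(hours=1, minutes=16), timedelta(hours=1, minutes=20)),
]

flagged = laps_overlapping_windows(lap_end_times, red_flag_windows)
print(flagged)  # [2], i.e. the third entry (lap 25)
```

Whether the real red-flag and restart times can be derived reliably from the race control messages would still need to be checked.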

I also noticed that the lap "Time" (start time) is, I think, in GMT, while the race control messages may be in local time at the track. I think the race control message times should be converted to GMT?
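If the two sources did turn out to use different time references, they would need a common one before any comparison. A small standard-library sketch, assuming (unverified) that a message timestamp is naive local time. The helper name local_to_utc and the timestamp are placeholders; the fixed UTC-4 offset is Montreal's summer time.

```python
from datetime import datetime, timedelta, timezone

# Montreal observes UTC-4 in June (Eastern Daylight Time)
TRACK_TZ = timezone(timedelta(hours=-4))


def local_to_utc(naive_local, tz=TRACK_TZ):
    """Attach the track's UTC offset to a naive timestamp and convert to UTC."""
    return naive_local.replace(tzinfo=tz).astimezone(timezone.utc)


# Placeholder timestamp, not taken from race_control_messages
message_time = datetime(2023, 6, 17, 16, 30, 0)
print(local_to_utc(message_time))  # 2023-06-17 20:30:00+00:00
```

Note that the lap "Time" values in the output above are offsets (timedeltas), so they would additionally need a session reference time before they can be compared with absolute timestamps.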

Sep 15 '23 18:09 AND2797

import fastf1
import pandas as pd

session = fastf1.get_session(2023, 'Canada', 'Q')
session.load(telemetry=False)

ver = session.laps.pick_driver('VER')

# work on a copy so the helper columns do not modify session.laps
ver_df = ver.loc[:, ['LapNumber', 'Time', 'LapTime']].copy()

# difference between this row's 'Time' and the previous row's 'Time'
ver_df['sub_time'] = ver_df['Time'].diff()

# True if 'sub_time' equals 'LapTime' (expected for a normal lap)
ver_df['bool_check'] = ver_df['sub_time'] == ver_df['LapTime']
# True if 'LapTime' equals the 'LapTime' of the previous row
ver_df['bool_previous_lap'] = ver_df['LapTime'] == ver_df['LapTime'].shift(1)

# if "bool_check" is False and "bool_previous_lap" is True, set "LapTime" to NaT
suspect = ~ver_df['bool_check'] & ver_df['bool_previous_lap']
ver_df.loc[suspect, 'LapTime'] = pd.NaT

# remove the helper columns that were used to find the duplicates
ver_df = ver_df.drop(columns=['sub_time', 'bool_check', 'bool_previous_lap'])

I tried something like this, but please correct me if it could end up discarding "useful" laps. The code adds a few temporary columns to check two conditions:

The first condition checks whether the difference to the previous lap's "Time" equals the current "LapTime". For a normal lap, this should always be True;
The second condition checks whether the current "LapTime" equals the previous "LapTime". This strengthens the first condition by checking that the lap is also a possible duplicate.

So, if the first condition is False and the second condition is True, we can clear that lap time. Finally, the temporary columns are removed from the dataframe.
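As a worked example, the two conditions can be evaluated by hand on Verstappen's laps 22 to 25 from the output in the issue description. This sketch uses plain datetime.timedelta values instead of the dataframe; the helper td and the variable names are only for illustration.

```python
from datetime import timedelta


def td(h, m, s, ms):
    """Shorthand for building a timedelta from the printed values."""
    return timedelta(hours=h, minutes=m, seconds=s, milliseconds=ms)


# (LapNumber, Time, LapTime) copied from the output; None stands for NaT
laps = [
    (22, td(1, 11, 38, 752), None),
    (23, td(1, 13, 5, 811), td(0, 1, 27, 59)),
    (24, td(1, 14, 31, 669), td(0, 1, 25, 858)),
    (25, td(1, 21, 59, 694), td(0, 1, 25, 858)),
]

suspect_laps = []
for (_, prev_time, prev_lap_time), (number, time, lap_time) in zip(laps, laps[1:]):
    if lap_time is None:
        continue
    # condition 1: difference to the previous "Time" equals "LapTime"
    consistent = (time - prev_time) == lap_time
    # condition 2: "LapTime" equals the previous "LapTime"
    repeated = prev_lap_time is not None and lap_time == prev_lap_time
    if not consistent and repeated:
        suspect_laps.append(number)

print(suspect_laps)  # [25]
```

Lap 24 passes the first condition (01:14:31.669 minus 01:13:05.811 is exactly 01:25.858), while lap 25 spans about seven and a half minutes but repeats the 01:25.858 lap time, so only lap 25 is reported.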

Sep 18 '23 21:09 d-tomasino

@d-tomasino this seems to work, although I'm not entirely happy with a solution like this because it just assumes that any lap time that matches these criteria is incorrect. F1 drivers surprisingly often set two successive laps with exactly the same time (this can actually happen multiple times per race).

So this would need more extensive testing on multiple sessions, with manual verification that the removed laps were correctly detected.

Additionally, your first check is in theory already implemented in the API parser. It should warn the user about "timing integrity errors", but apparently it is not triggered here. Before fixing this we should figure out why this warning is not shown because there has to be something else that's going on.

Sep 19 '23 15:09 theOehrly

@theOehrly thanks for the reply! You're right, two consecutive laps with exactly the same time are understandably not that rare. However, in that case (as far as I understand) the difference in "Time" between the two adjacent rows should match the "LapTime" value. For two consecutive real laps, the two conditions should therefore be True and True, instead of False and True as in this case (the red flag issue). I could be wrong, of course, so please correct me if any of this is inaccurate.
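To make the distinction concrete, here is a small sketch of the combined check as a single predicate. The function name is_suspect is hypothetical, the "genuine repeat" values are synthetic, and the red-flag values are Verstappen's laps 24 and 25 from the issue description. It only illustrates the argument and does not replace testing on real sessions.

```python
from datetime import timedelta


def is_suspect(prev_time, time, prev_lap_time, lap_time):
    """True only if the lap time repeats AND disagrees with the 'Time' difference."""
    consistent = (time - prev_time) == lap_time
    repeated = lap_time == prev_lap_time
    return repeated and not consistent


# Synthetic case: two real laps that happen to have the same lap time
genuine_repeat = is_suspect(
    prev_time=timedelta(minutes=10),
    time=timedelta(minutes=11, seconds=25),
    prev_lap_time=timedelta(minutes=1, seconds=25),
    lap_time=timedelta(minutes=1, seconds=25),
)

# Red-flag case: Verstappen's laps 24 and 25 from the output
red_flag_duplicate = is_suspect(
    prev_time=timedelta(hours=1, minutes=14, seconds=31, milliseconds=669),
    time=timedelta(hours=1, minutes=21, seconds=59, milliseconds=694),
    prev_lap_time=timedelta(minutes=1, seconds=25, milliseconds=858),
    lap_time=timedelta(minutes=1, seconds=25, milliseconds=858),
)

print(genuine_repeat, red_flag_duplicate)  # False True
```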

In any case, as soon as I can, I could first take a look at why the "timing integrity errors" warning is not shown, so that we can solve everything step by step.

Sep 19 '23 22:09 d-tomasino

Note that this may not be limited to laps near red flags, see #612. Also remember to investigate a potential relation to #473.

Jul 09 '24 17:07 theOehrly