You can now create visualizations with your data.
Task:
To get the data I needed for the visuals, my first intuition was to look at the cumulative distance column for each runner, identify when each runner completed a lap distance (1000, 2000, 3000, and so on) and calculate the difference in timestamps.
Although this algorithm looks simple and might work, it had some limitations that needed to be addressed:
- The exact lap distance is usually completed between two recorded data points. To be more accurate, I interpolate both position and time.
- Because of differences in device accuracy, there may be misalignments between runners. Most typically, one runner's lap notification goes off before another's, even when the two of them ran the whole track together. To minimise this, I use the reference runner to set position marks for each lap of the track. The time difference is calculated when the other runners cross these marks (regardless of whether their accumulated distance is ahead of or behind the lap). This is closer to the reality of racing: if someone passes a point first, they are in the lead (regardless of the accumulated distance on their device).
- The previous point brings another problem: the latitude and longitude of a reference mark may not be recorded exactly in the other runners' data. I use Nearest Neighbors to find the closest data point by location.
- Finally, Nearest Neighbors can return the wrong data point if the track passes the same location at different times. Therefore, the population in which Nearest Neighbors looks for the best match is reduced to a smaller set of candidates. I defined a window size of 20 data points around the target distance (distance_cum).
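To illustrate the last limitation, here is a minimal sketch on a hypothetical out-and-back track, where the runner passes the same stretch of road twice. The track, the reference mark and the helper names (nearest, windowed_nearest) are assumptions made up for this illustration, and plain Euclidean distance on metre coordinates stands in for the latitude/longitude search.

```python
import math

# Hypothetical out-and-back track. Each point is
# (index, x_metres, y_metres, cumulative_distance_metres).
track = []
for i in range(11):          # outbound leg: 0 m to 1000 m
    track.append((i, i * 100.0, 0.0, i * 100.0))
for i in range(1, 11):       # return leg, 5 m to the side of the outbound leg
    track.append((10 + i, 1000.0 - i * 100.0, 5.0, 1000.0 + i * 100.0))

def nearest(points, target_xy):
    # Closest point by Euclidean distance, searching all the given points.
    return min(points, key=lambda p: math.hypot(p[1] - target_xy[0], p[2] - target_xy[1]))

def windowed_nearest(points, target_xy, target_distance, window_size=6):
    # Centre the window on the first point beyond the target distance,
    # then search only the points inside that window.
    centre = next((i for i, p in enumerate(points) if p[3] > target_distance), len(points) - 1)
    half = window_size // 2
    window = points[max(0, centre - half): centre + half + 1]
    return nearest(window, target_xy)

# Reference mark recorded on the return leg (1700 m), with some GPS noise.
mark = (300.0, 2.0)
global_match = nearest(track, mark)                     # picks the outbound pass
windowed_match = windowed_nearest(track, mark, 1700.0)  # picks the return pass

print(global_match[3], windowed_match[3])
```

The unrestricted search matches the point at 300 m on the outbound leg, while the windowed search matches the point at 1700 m on the return leg, which is the one the mark belongs to.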
Algorithm
Taking all the previous limitations into account, the algorithm looks like this:
1. Choose the reference runner and the lap distance (default = 1 km).
2. Use the reference data to identify the position and the moment each lap was completed (the reference marks).
3. Go to the other runners' data and identify the moment they cross those position marks. Then calculate the difference in time between both runners crossing each mark. Finally, the delta of this time difference represents the evolution of the gap.
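As a small worked example of step 3, the sketch below uses made-up crossing times (in seconds since the start) for a reference and one other runner. The numbers and variable names are hypothetical and only illustrate the arithmetic.

```python
# Hypothetical crossing times at the start and at each 1 km mark, in seconds.
reference_crossings = [0, 300, 605, 915]
runner_crossings = [0, 304, 607, 912]

# Gap to the reference at each mark: positive means the runner crossed later.
gaps = [runner - ref for runner, ref in zip(runner_crossings, reference_crossings)]

# Gap evolution: how much the gap changed during each lap.
gap_deltas = [0] + [later - earlier for earlier, later in zip(gaps, gaps[1:])]

print(gaps)        # [0, 4, 2, -3]
print(gap_deltas)  # [0, 4, -2, -5]
```

Here the runner loses 4 seconds in the first lap, recovers 2 in the second, and gains 5 in the third to finish 3 seconds ahead of the reference.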
Code example
1. Choose the reference runner and the lap distance (default = 1 km)
- Juan will be the reference (juan_df) in the examples.
- The other runners will be Pedro (pedro_df) and Jimena (jimena_df).
- The lap distance is 1000 metres.
2. Create interpolate_laps(): a function that finds or interpolates the exact point for each completed lap and returns them in a new dataframe. The interpolation is done with the function interpolate_value(), which was also created.
## Function: interpolate_value()
# Input:
# - start: The starting value.
# - end: The ending value.
# - fraction: A value between 0 and 1 that represents the position between
#   the start and end values where the interpolation should occur.
# Return:
# - The interpolated value that lies between the start and end values
#   at the specified fraction.
def interpolate_value(start, end, fraction):
    return start + (end - start) * fraction
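A quick usage sketch of the same linear interpolation, applied to a timestamp and a latitude. The two GPS records on either side of the 1000 m mark are hypothetical values chosen for the illustration, and the one-line function is repeated so the snippet runs on its own.

```python
from datetime import datetime

def interpolate_value(start, end, fraction):
    # Linear interpolation between start and end.
    return start + (end - start) * fraction

# Hypothetical records just before and just after the 1000 m mark.
distance_before, distance_after = 996.0, 1004.0
time_before = datetime(2024, 1, 1, 10, 5, 0)
time_after = datetime(2024, 1, 1, 10, 5, 2)

# Share of the segment covered when the lap distance is reached.
fraction = (1000.0 - distance_before) / (distance_after - distance_before)

lap_time = interpolate_value(time_before, time_after, fraction)
lap_latitude = interpolate_value(40.0, 40.0002, fraction)

print(fraction, lap_time, lap_latitude)
```

The same function works for timestamps because subtracting two datetimes yields a timedelta, which can be scaled by a float and added back to the start.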
## Function: interpolate_laps()
# Input:
# - track_df: dataframe with track data (assumed to have a default integer index).
# - lap_distance: metres per lap (default 1000)
# Return:
# - track_laps: dataframe with lap metrics. As many rows as laps identified.
import pandas as pd

def interpolate_laps(track_df, lap_distance=1000):
    columns = ['latitude', 'longitude', 'elevation', 'date_time', 'distance_cum']

    #### 1. Initialise track_laps with the first row of track_df
    track_laps = track_df.loc[0][columns].copy()
    # Set distance_cum = 0
    track_laps['distance_cum'] = 0
    # Turn the Series into a one-row dataframe
    track_laps = pd.DataFrame(track_laps).transpose()

    #### 2. Calculate number_of_laps = total distance // lap_distance
    number_of_laps = int(track_df['distance_cum'].max() // lap_distance)

    #### 3. For each lap i from 1 to number_of_laps:
    for i in range(1, number_of_laps + 1):
        # a. Calculate target_distance = i * lap_distance
        target_distance = i * lap_distance

        # b. Find first_crossing_index, the first row where distance_cum >= target_distance
        #    (>= rather than >, so that the exact-match branch below can be reached)
        first_crossing_index = (track_df['distance_cum'] >= target_distance).idxmax()
        previous_index = first_crossing_index - 1

        # c. If the match is exactly the lap distance, copy that row
        if track_df.loc[first_crossing_index, 'distance_cum'] == target_distance:
            new_row = track_df.loc[first_crossing_index][columns].rename(f'lap_{i}')
        # Else: create new_row with interpolated values
        else:
            fraction = (target_distance - track_df.loc[previous_index, 'distance_cum']) / (track_df.loc[first_crossing_index, 'distance_cum'] - track_df.loc[previous_index, 'distance_cum'])
            # Create the new row
            new_row = pd.Series({
                'latitude': interpolate_value(track_df.loc[previous_index, 'latitude'], track_df.loc[first_crossing_index, 'latitude'], fraction),
                'longitude': interpolate_value(track_df.loc[previous_index, 'longitude'], track_df.loc[first_crossing_index, 'longitude'], fraction),
                'elevation': interpolate_value(track_df.loc[previous_index, 'elevation'], track_df.loc[first_crossing_index, 'elevation'], fraction),
                'date_time': interpolate_value(track_df.loc[previous_index, 'date_time'], track_df.loc[first_crossing_index, 'date_time'], fraction),
                'distance_cum': target_distance
            }, name=f'lap_{i}')

        # d. Add the new row to the dataframe that stores the laps
        new_row_df = pd.DataFrame(new_row).transpose()
        track_laps = pd.concat([track_laps, new_row_df])

    #### 4. Convert date_time to datetime format and remove the timezone
    track_laps['date_time'] = pd.to_datetime(track_laps['date_time'], format='%Y-%m-%d %H:%M:%S.%f%z')
    track_laps['date_time'] = track_laps['date_time'].dt.tz_localize(None)

    #### 5. Calculate seconds_diff between consecutive rows in track_laps
    track_laps['seconds_diff'] = track_laps['date_time'].diff()
    return track_laps
Applying the interpolation function to the reference dataframe produces the following dataframe:
juan_laps = interpolate_laps(juan_df, lap_distance=1000)
Since it is a 10 km race, 10 laps of 1000 m have been identified (see column distance_cum). The column seconds_diff holds the time per lap. The remaining columns (latitude, longitude, elevation and date_time) mark the position and time of each lap as a result of the interpolation.
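As a cross-check on the lap times, the same linear interpolation can be sketched in vectorised form with numpy.interp. This is not the function used in this article, just an alternative under simplifying assumptions: the sample arrays are hypothetical, time is expressed as elapsed seconds, and the cumulative distance is strictly increasing.

```python
import numpy as np

# Hypothetical track: cumulative distance (m) and elapsed time (s) per record.
distance_cum = np.array([0.0, 400.0, 900.0, 1300.0, 2100.0])
elapsed_seconds = np.array([0.0, 120.0, 270.0, 390.0, 630.0])

lap_distance = 1000
number_of_laps = int(distance_cum.max() // lap_distance)

# Distances at which each lap is completed: 1000, 2000, ...
lap_marks = np.arange(1, number_of_laps + 1) * lap_distance

# Elapsed time at each lap mark, linearly interpolated between records.
lap_times = np.interp(lap_marks, distance_cum, elapsed_seconds)

# Time per lap: difference between consecutive lap marks, starting from 0.
lap_splits = np.diff(lap_times, prepend=0.0)

print(lap_times, lap_splits)
```

With these sample values the 1000 m mark falls a quarter of the way between the third and fourth records, and the 2000 m mark falls seven eighths of the way between the last two.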
3. To calculate the time gaps between the reference and the other runners, I created the function gap_to_reference().
## Helper functions:
# - get_seconds(): Convert timedelta to total seconds
# - format_timedelta(): Format timedelta as a string (e.g., "+01:23" or "-00:45")

# Convert timedelta to total seconds
def get_seconds(td):
    return td.total_seconds()

# Format timedelta as a string (e.g., "+01:23" or "-00:45")
def format_timedelta(td):
    # Convert to total seconds
    total_seconds = td.total_seconds()
    # Determine the sign
    sign = '+' if total_seconds >= 0 else '-'
    # Take the absolute value for the calculation
    total_seconds = abs(total_seconds)
    # Calculate minutes and remaining seconds
    minutes = int(total_seconds // 60)
    seconds = int(total_seconds % 60)
    # Format the string
    return f"{sign}{minutes:02d}:{seconds:02d}"
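The reason for handling the sign separately becomes clearer when looking at how Python prints a negative timedelta by default. The sketch below uses the standard library only; format_gap is a hypothetical compact variant written for this illustration, not the helper used in the rest of the article.

```python
from datetime import timedelta

gap = timedelta(seconds=-45)

# Negative timedeltas are normalised to negative days plus positive seconds,
# so the default string is hard to read as a race gap.
default_text = str(gap)  # '-1 day, 23:59:15'

def format_gap(td):
    # Work on the absolute value and add the sign back at the end.
    total_seconds = td.total_seconds()
    sign = '+' if total_seconds >= 0 else '-'
    minutes, seconds = divmod(int(abs(total_seconds)), 60)
    return f"{sign}{minutes:02d}:{seconds:02d}"

print(default_text)
print(format_gap(gap))                    # -00:45
print(format_gap(timedelta(seconds=83)))  # +01:23
```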
## Function: gap_to_reference()
# Input:
# - laps_dict: dictionary containing the df_laps for all the runners' names
# - df_dict: dictionary containing the track_df for all the runners' names
# - reference_name: name of the reference
# Return:
# - matches: processed data with time differences.
import pandas as pd
from sklearn.neighbors import NearestNeighbors

def gap_to_reference(laps_dict, df_dict, reference_name):
    #### 1. Get the reference's lap data from laps_dict
    matches = laps_dict[reference_name][['latitude', 'longitude', 'date_time', 'distance_cum']].copy()

    #### 2. For each racer (name) and their data (df) in df_dict:
    for name, df in df_dict.items():
        # If racer is the reference:
        if name == reference_name:
            # Set time difference to zero for all laps
            matches[f'seconds_to_reference_{reference_name}'] = 0
            continue

        # If racer is not the reference:
        # a. For each lap find the nearest point in racer's data based on lat, lon.
        for lap in matches.index:
            # Step 1: set the position and lap distance from the reference
            target_coordinates = matches.loc[lap, ['latitude', 'longitude']].to_numpy(dtype=float)
            target_distance = matches.loc[lap, 'distance_cum']

            # Step 2: find the datapoint that will be in the centre of the window
            first_crossing_index = (df['distance_cum'] > target_distance).idxmax()

            # Step 3: select the 20 candidate datapoints to look for the match
            window_size = 20
            window_sample = df.loc[first_crossing_index - (window_size // 2):first_crossing_index + (window_size // 2)]
            candidates = window_sample[['latitude', 'longitude']].values

            # Step 4: get the nearest match using the coordinates
            nn = NearestNeighbors(n_neighbors=1, metric='euclidean')
            nn.fit(candidates)
            distance, indice = nn.kneighbors([target_coordinates])

            nearest_timestamp = window_sample.iloc[indice.flatten()]['date_time'].values
            nearest_distance_cum = window_sample.iloc[indice.flatten()]['distance_cum'].values
            euclidean_distance = distance[0][0]

            matches.loc[lap, f'nearest_timestamp_{name}'] = nearest_timestamp[0]
            matches.loc[lap, f'nearest_distance_cum_{name}'] = nearest_distance_cum[0]
            matches.loc[lap, f'euclidean_distance_{name}'] = euclidean_distance

        # b. Calculate time difference between racer and reference at each mark
        matches[f'time_to_ref_{name}'] = pd.to_datetime(matches[f'nearest_timestamp_{name}']) - matches['date_time']

        # c. Store the lap-to-lap change of the time difference
        matches[f'time_to_ref_diff_{name}'] = matches[f'time_to_ref_{name}'].diff()
        matches[f'time_to_ref_diff_{name}'] = matches[f'time_to_ref_diff_{name}'].fillna(pd.Timedelta(seconds=0))

        # d. Format data using the helper functions
        matches[f'lap_difference_seconds_{name}'] = matches[f'time_to_ref_diff_{name}'].apply(get_seconds)
        matches[f'lap_difference_formatted_{name}'] = matches[f'time_to_ref_diff_{name}'].apply(format_timedelta)
        matches[f'seconds_to_reference_{name}'] = matches[f'time_to_ref_{name}'].apply(get_seconds)
        matches[f'time_to_reference_formatted_{name}'] = matches[f'time_to_ref_{name}'].apply(format_timedelta)

    #### 3. Return processed data with time differences
    return matches
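One edge case may be worth guarding against when centring the window: if a runner's recorded distance never exceeds the target distance (for example, their device measured the course slightly short), the boolean mask is all False and idxmax() returns the first index instead of failing. The sketch below shows the behaviour on a hypothetical Series; first_crossing() is an illustrative helper name, not part of the code in this article.

```python
import pandas as pd

# Hypothetical cumulative distances for a runner whose device stopped at 950 m.
distance_cum = pd.Series([0.0, 400.0, 800.0, 950.0])

def first_crossing(distance_cum, target_distance):
    # Index of the first record beyond the target, or None if it is never reached.
    mask = distance_cum > target_distance
    return mask.idxmax() if mask.any() else None

# Without the guard, an all-False mask silently points at the first record.
unguarded_index = (distance_cum > 1000).idxmax()

print(unguarded_index)                      # 0
print(first_crossing(distance_cum, 500))    # 2
print(first_crossing(distance_cum, 1000))   # None
```

How to handle the missing crossing (skip the lap, or fall back to the last record) depends on the data, so the sketch only makes the situation visible.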
Below is the code that implements the logic and stores the results in the dataframe matches_gap_to_reference:
# Lap distance
lap_distance = 1000

# Store the DataFrames in a dictionary
df_dict = {
    'jimena': jimena_df,
    'juan': juan_df,
    'pedro': pedro_df,
}

# Store the lap DataFrames in a dictionary
laps_dict = {
    'jimena': interpolate_laps(jimena_df, lap_distance),
    'juan': interpolate_laps(juan_df, lap_distance),
    'pedro': interpolate_laps(pedro_df, lap_distance),
}

# Calculate gaps to reference
reference_name = 'juan'
matches_gap_to_reference = gap_to_reference(laps_dict, df_dict, reference_name)
The columns of the resulting dataframe contain the key information that will be displayed in the graphs.

