Demo 5: Clustering Trajectories by Shape
Using Tracktable’s box-DBSCAN capabilities, we can detect similarly shaped trajectories regardless of location or scale.
Algorithm Details
For every trajectory, we create a feature vector, which we call the distance geometry feature vector, as follows:
1st value: Take the distance between the endpoints of the trajectory (as the crow flies) and divide by the total distance traveled, giving us a measure of “straightness”.
2nd and 3rd values: Find the midpoint along the trajectory such that it splits the distance traveled along the trajectory in half. Do the same calculation again, but for each half. That is, take the distance between the start point and the midpoint of the trajectory (as the crow flies) and divide by the distance traveled along the trajectory from the start point to the midpoint. Save that as the second value in the feature vector. Then take the distance between the midpoint and the endpoint and divide by the distance traveled along the trajectory between those points. This is the third value in the feature vector.
4th, 5th and 6th values: Do the same calculation, but with thirds.
7th, 8th, 9th and 10th values: Do the same calculation, but with fourths.
… and so on. We stop when we reach a preset “depth”, which is the largest number of segments that we divide our trajectory into. Each level k contributes k values, so a depth of d yields 1 + 2 + … + d = d(d + 1)/2 values. For instance, a depth of 4 will yield a 1 + 2 + 3 + 4 = 10-dimensional feature vector.
In this way, the distance geometry feature vector encapsulates the straightness of the trajectory across proportionally smaller and smaller segments, and thus gives a concise and comparable representation of shape for the trajectory. Because every value is a ratio of distances, this representation is scale, rotation, translation and reflection invariant.
Using box-DBSCAN, Tracktable clusters trajectories together based on the similarity of their feature vectors. Trajectories that are similar in shape should occupy the same cluster.
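The construction above can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions, not Tracktable's implementation: points are treated as planar (x, y) pairs with Euclidean distance, whereas real trajectories use longitude and latitude, and the names distance_geometry_sketch, _dist and _point_at are hypothetical helpers introduced here.

```python
import math


def _dist(p, q):
    """Straight-line (as the crow flies) distance between two planar points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def _point_at(points, cumulative, target):
    """Return the point located `target` units of travel along the polyline."""
    for i in range(1, len(points)):
        if target <= cumulative[i]:
            span = cumulative[i] - cumulative[i - 1]
            t = 0.0 if span == 0 else (target - cumulative[i - 1]) / span
            (x0, y0), (x1, y1) = points[i - 1], points[i]
            return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
    # Rounding can push the last target slightly past the end of the path.
    return points[-1]


def distance_geometry_sketch(points, depth):
    """Hypothetical helper: straightness ratios for 1, 2, ..., depth pieces."""
    # cumulative[i] is the distance traveled from the start to points[i].
    cumulative = [0.0]
    for p, q in zip(points, points[1:]):
        cumulative.append(cumulative[-1] + _dist(p, q))
    total = cumulative[-1]
    if total == 0:
        raise ValueError("trajectory has zero length")

    features = []
    for pieces in range(1, depth + 1):
        piece_length = total / pieces
        # Control points that split the traveled distance into equal pieces.
        controls = [
            _point_at(points, cumulative, k * piece_length)
            for k in range(pieces + 1)
        ]
        # Ratio of straight-line distance to traveled distance for each piece.
        for a, b in zip(controls, controls[1:]):
            features.append(_dist(a, b) / piece_length)
    return features


# An out-and-back path: the endpoints coincide, but each half is straight.
print(distance_geometry_sketch([(0, 0), (2, 0), (0, 0)], depth=2))
# prints [0.0, 1.0, 1.0]
```

A perfectly straight path gives a vector of all ones at any depth, while the out-and-back path above scores 0 overall and 1 on each half, which is the kind of contrast the clustering step relies on.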
In [1]:
from tracktable.render.render_trajectories import render_trajectories_separate, render_trajectories
import tracktable.examples.tutorials.tutorial_helper as tutorial
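# Import only the function this demo uses rather than everything in the module.
from tracktable.applications.cluster import cluster_trajectories_shape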
Import Trajectories
We will use some sample maritime data for this demo, obtained from BOEM/NOAA.\(^1\)
In [2]:
trajectories = tutorial.get_trajectory_list('shape')
Loading Trajectories: 1395 trajectory [00:00, 3922.77 trajectory/s]
[2025-06-12 00:48:57.151661] [0x00000001f1aadf00] [info] Read a total of 1395 trajectories.
Trajectory Clustering using Distance Geometry Feature Vectors
Create a distance geometry feature vector (as defined in “Algorithm Details” above) for each trajectory and use Tracktable’s box-DBSCAN to cluster the trajectories.
The epsilon parameter defines the radius of the “nearness” box for box-DBSCAN. Increasing it makes trajectories more likely to cluster together and reduces the number of outliers; decreasing it has the opposite effect.
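To make the role of epsilon concrete, here is a small sketch of a box-style nearness test. It assumes that two feature vectors count as near when every component differs by at most epsilon; that reading of the “nearness” box is an assumption made for illustration, and within_box is a hypothetical helper, not part of Tracktable.

```python
def within_box(a, b, epsilon):
    """Hypothetical helper: True if b lies inside the epsilon box around a."""
    # Assumption: the box test compares the vectors one component at a time.
    return all(abs(x - y) <= epsilon for x, y in zip(a, b))


# Two made-up feature vectors that differ by about 0.01 in every component.
first = [0.90, 0.95, 0.80]
second = [0.91, 0.94, 0.81]

print(within_box(first, second, epsilon=0.02))   # prints True
print(within_box(first, second, epsilon=0.005))  # prints False
```

Under this assumption, a larger epsilon lets more pairs pass the test, so more trajectories join clusters and fewer are left as outliers.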
In [3]:
clusters = cluster_trajectories_shape(trajectories, depth=4, epsilon=0.02)
The clusters dictionary maps each cluster number (key) to the list of trajectories in that cluster (value). The call above prints these clusters and their sizes.
Each cluster corresponds to an underlying shape that is prevalent in our dataset of 1,395 maritime trajectories, the number reported when the data was loaded. The cells below reference cluster numbers as high as 8.
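If you want to tabulate cluster sizes yourself, a sketch like the following works on any dictionary with the layout described above. The stand-in data and the summarize_clusters helper are hypothetical; the sketch also assumes, as described in the outlier section of this demo, that key 0 holds the trajectories that did not cluster.

```python
def summarize_clusters(clusters):
    """Hypothetical helper: return ({cluster number: size}, outlier count)."""
    sizes = {label: len(members) for label, members in sorted(clusters.items())}
    # Assumption: key 0 holds the outliers, so report it separately.
    outlier_count = sizes.pop(0, 0)
    return sizes, outlier_count


# Stand-in data: strings take the place of trajectory objects.
example = {
    0: ["t1", "t2"],
    1: ["t3", "t4", "t5"],
    2: ["t6", "t7"],
}

sizes, outlier_count = summarize_clusters(example)
for label, size in sizes.items():
    print(f"cluster {label}: {size} trajectories")
print(f"outliers: {outlier_count}")
```

Passing the real clusters dictionary in place of the stand-in data should give the same kind of summary, provided it follows the layout described here.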
Cluster Visualization
Let’s look at what the trajectories in some of these clusters look like.
Cluster 3: Trajectories that double back and return to their origin.
In [4]:
render_trajectories(clusters[3])
Out[4]:
In [5]:
render_trajectories_separate(clusters[3])
Cluster 4: Trajectories with a sharp turn at one end.
In [6]:
render_trajectories(clusters[4])
Out[6]:
In [7]:
render_trajectories_separate(clusters[4])
Cluster 7: Trajectories that are ferrying in a mostly straight line.
In [8]:
render_trajectories(clusters[7])
Out[8]:
In [9]:
render_trajectories_separate(clusters[7])
Cluster 8: Trajectories that are ferrying with sharp turns.
In [10]:
render_trajectories(clusters[8])
Out[10]:
In [11]:
render_trajectories_separate(clusters[8])
Outlier Visualization
The trajectories that did not cluster (outliers) are stored under key 0 of the clusters dictionary. Let’s look at the first five of these anomalously shaped trajectories.
In [12]:
render_trajectories_separate(clusters[0][:5])
\(^1\) Bureau of Ocean Energy Management (BOEM) and National Oceanic and Atmospheric Administration (NOAA). MarineCadastre.gov. AIS Data for 2020. Retrieved February 2021 from marinecadastre.gov/data. US coastal maritime traffic trimmed down to June 30, 2020.