Tutorial 6: Filtering Trajectories
In [1]:
import tracktable.examples.tutorials.tutorial_helper as tutorial
Purpose
It is often desirable to remove unwanted trajectories from a dataset prior to beginning analysis. Tracktable’s geomath package includes a large suite of functions to support trajectory filtering, as well as general analysis. We will demonstrate several possible filtering techniques here.
Import Example Trajectories
The function below will assemble trajectories from our sample data file \(^1\). For details, please see Tutorials 1 & 2.
In [2]:
trajectories = tutorial.get_trajectory_list()
Loading Trajectories: 279 trajectory [00:01, 251.54 trajectory/s]
Let’s print out some info about our trajectories. How each of these values is calculated is explained in each corresponding filter below.
In [3]:
tutorial.print_statistics(trajectories, 'all')
There are 279 total trajectories.
The total number of points in the given trajectory list is 8633.
The length of the given trajectories ranges from 0.0 km to 43.019866804260204 km.
The straightness of the given trajectories ranges from 0.0 to 1.0.
The convex hull area of the given trajectories ranges from 0.0 km^2 to 172.30630866896283 km^2.
The average speed of the given trajectories ranges from 0.0 km/hr to 34.13708002699722 km/hr.
Filtering using Tracktable’s geomath
Filtering by Distance Traveled
Suppose we know that our dataset contains many short trajectories (say, less than 5km in length) that we wish to filter out. This can be done using geomath’s length function.
In [4]:
length_threshold = 5 # km
In [5]:
from tracktable.core.geomath import length
trajectories_filtered_by_length = [trajectory for trajectory in trajectories if length(trajectory) > length_threshold]
How many are left?
In [6]:
len(trajectories_filtered_by_length)
Out[6]:
54
What lengths do the remaining trajectories have?
In [7]:
tutorial.print_statistics(trajectories_filtered_by_length, 'length')
The length of the given trajectories ranges from 5.095141135887496 km to 43.019866804260204 km.
Filtering by Straightness
In some datasets (such as air traffic) we can anticipate that many of our trajectories will be nearly straight, and we may wish to filter these out as “uninteresting” to make analysis faster on the remaining trajectories.
The function below will calculate the “straightness” of a trajectory. This is done by comparing the distance between the trajectory’s endpoints to the distance traveled along the trajectory. If their ratio is 1, the trajectory traveled in a straight line. As the ratio decreases, we consider the trajectory to be less straight. A ratio of zero means the trajectory’s origin and destination are the same, so it could not have traveled a straight line.
In [8]:
from tracktable.core.geomath import length, end_to_end_distance
def calculate_straightness(trajectory):
# get the distance between endpoints of a trajectory
end_to_end_dist = end_to_end_distance(trajectory)
# get the distance traveled along the trajectory
dist_traveled = length(trajectory)
# if the trajectory doesn't move, it is not straight
if dist_traveled == 0:
return 0
# measure how well the trajectory followed the straight path
return end_to_end_dist / dist_traveled
Since straightness varies from 0 to 1, so must our threshold. Increasing our threshold means we will discard few trajectories based on straightness.
In [9]:
straightness_threshold = 0.9
Let’s discard all trajectories with straightness 0.9 or higher.
In [10]:
trajectories_filtered_by_straightness = [trajectory for trajectory in trajectories if calculate_straightness(trajectory) < straightness_threshold]
How many are left?
In [11]:
len(trajectories_filtered_by_straightness)
Out[11]:
256
How straight are the remaining trajectories?
In [12]:
tutorial.print_statistics(trajectories_filtered_by_straightness, 'straightness')
The straightness of the given trajectories ranges from 0.0 to 0.898311689786312.
Filtering by Area Covered
The convex hull is the smallest convex polygon that encloses the entire trajectory (imagine each trajectory point to be a peg on a board, and we are stretching a rubber band around this set of pegs). The area of this polygon can give us insight into the breadth of travel of a trajectory, and can be calculated using the convex_hull_area function of geomath.
For example, in some maritime datasets, we may wish to remove anchored boats from our dataset. This can be done by filtering out trajectories with a small convex_hull_area, e.g. less than 0.2 km \(^2\), as shown below.
In [13]:
convex_hull_area_threshold = 0.2 # square km
In [14]:
from tracktable.core.geomath import convex_hull_area
trajectories_filtered_by_area = [trajectory for trajectory in trajectories if convex_hull_area(trajectory) > convex_hull_area_threshold]
How many are left?
In [15]:
len(trajectories_filtered_by_area)
Out[15]:
61
What are the convex hull areas of the remaining trajectories?
In [16]:
tutorial.print_statistics(trajectories_filtered_by_area, 'convex hull area')
The convex hull area of the given trajectories ranges from 0.24882849366082138 km^2 to 172.30630866896283 km^2.
Filtering by Average Speed
We may wish to remove slow moving objects from our dataset (such as tugboats in maritime data). We can use the speed_between function on the first and last trajectory points to filter slow trajectories, as shown below.
In [17]:
avg_speed_threshold = 1 # km/s
In [18]:
from tracktable.core.geomath import speed_between
trajectories_filtered_by_avg_speed = [trajectory for trajectory in trajectories if speed_between(trajectory[0], trajectory[-1]) > avg_speed_threshold]
How many are left?
In [19]:
len(trajectories_filtered_by_avg_speed)
Out[19]:
60
What average speeds do the remaining trajectories have?
In [20]:
tutorial.print_statistics(trajectories_filtered_by_avg_speed, 'average speed')
The average speed of the given trajectories ranges from 1.3010998797767819 km/hr to 34.13708002699722 km/hr.
Filtering by Spatial Window
If we only want to keep trajectories within a given spatial window, we can filter as demonstrated below. The algorithm below is a quick filtering algorithm that uses a trajectory’s bounding box.
Reminder: Tracktable uses the ordering (longitude, latitude) to match the traditional Cartesian (x,y) ordering.
In [21]:
from tracktable.core.geomath import compute_bounding_box
def remove_trajectories_outside(trajectories, min_lon, min_lat, max_lon, max_lat, strictly_within=True):
trajectories_in_window = []
for trajectory in trajectories:
# Get the bounding box of the current trajectory.
bounding_box = compute_bounding_box(trajectory)
# Check if the bottom left corner of the trajectory's bounding box is inside our window.
min_corner_in_window = (min_lon < bounding_box.min_corner[0] < max_lon and min_lat < bounding_box.min_corner[1] < max_lat)
# Check if the top right corner of the trajectory's bounding box is inside our window.
max_corner_in_window = (min_lon < bounding_box.max_corner[0] < max_lon and min_lat < bounding_box.max_corner[1] < max_lat)
# For a trajectory to be entirely within the box, both corners must be in the box.
# For a trajectory to be somewhat within the box, at least one corner must be in the box.
if ((strictly_within and min_corner_in_window and max_corner_in_window)
or ((not strictly_within) and (min_corner_in_window or max_corner_in_window))):
trajectories_in_window.append(trajectory)
return trajectories_in_window
In [22]:
trajectories_in_window = remove_trajectories_outside(trajectories, -74.1, 40.5, -73.9, 40.6, strictly_within=True) # change strictly_within to False to include trajectories that cross the boundary of our window
How many trajectories are strictly within the window with bottom left corner (-74.1, 40.5) and top right corner (-73.9, 40.6)?
In [23]:
len(trajectories_in_window)
Out[23]:
7
Other Filtering Techniques
Trim Redundant Points
The storage footprints can be significantly reduced by removing redundant points, meaning sequential points with unchanged lat/long coordinates. We can use Tracktable to create the function trim_redundant_points that will remove these points.
Example: If our trajectory consists of sequential points \(p_0\), \(p_1\), \(p_2\), \(p_3\), \(p_4\), \(p_5\), \(p_6\), \(p_7\), \(p_8\), and points \(p_1\), \(p_2\), \(p_3\), \(p_4\), \(p_6\) all occur in the exact same location, we can remove \(p_2\) and \(p_3\) without losing any information.
In [24]:
from tracktable.domain.terrestrial import Trajectory
# Determines if the latitude and longitude of two points match.
def colocated(point1, point2):
return point1[0] == point2[0] and point1[1] == point2[1]
# Determines if the timestamps of two points match.
def cotimed(point1, point2):
return point1.timestamp == point2.timestamp
# Removes the middle points from any sequence of colocated points
def trim_redundant_points(trajectory):
# Initialize our interval of points at the same location to include only the first point.
first_point_at_location = trajectory[0]
last_point_at_location = trajectory[0]
trimmed_trajectory = Trajectory()
for point in trajectory[1:]:
if colocated(first_point_at_location, point):
# Extend our interval of points at the same location to include this point,
last_point_at_location = point
else:
# Keep the first point of the interval of points at the same location.
trimmed_trajectory.append(first_point_at_location)
# Check that the endpoints of our interval of points at the same location aren't indentically timed as well.
if not cotimed(first_point_at_location, last_point_at_location):
# Keep the last point of the interval of points at the same location.
trimmed_trajectory.append(last_point_at_location)
# Reinitialize the interval of points at the same location to include only the current point.
first_point_at_location = point
last_point_at_location = point
if not cotimed(first_point_at_location, last_point_at_location):
trimmed_trajectory.extend([first_point_at_location, last_point_at_location])
return trimmed_trajectory
In [25]:
trajectories_with_redundant_points_removed = [trim_redundant_points(trajectory) for trajectory in trajectories]
We still have the same number of trajectories…
In [26]:
len(trajectories)
Out[26]:
279
In [27]:
len(trajectories_with_redundant_points_removed)
Out[27]:
279
… but we have reduced the number of trajectory points in our data:
In [28]:
tutorial.print_statistics(trajectories, 'total points')
The total number of points in the given trajectory list is 8633.
In [29]:
tutorial.print_statistics(trajectories_with_redundant_points_removed, 'total points')
The total number of points in the given trajectory list is 7914.
Filtering by Time Window
We can remove all trajectories occurring outside of a given time range as follows:
In [30]:
from datetime import timedelta, timezone, datetime
# This format can be changed to match your data.
format = '%Y-%m-%d %H:%M:%S'
# Let's trim down to the first thirty minutes of the day.
begin = datetime.strptime('2020-06-30 00:00:00', format)
end = datetime.strptime('2020-06-30 00:30:00', format)
# Matching our begin/end timestamps to time zones of the Tracktable trajectories.
begin = begin.replace(tzinfo=timezone.utc)
end = end.replace(tzinfo=timezone.utc)
In [31]:
trajectories_in_first_thirty_minutes = [trajectory for trajectory in trajectories
if trajectory[0].timestamp - begin >= timedelta(0) and end - trajectory[-1].timestamp >= timedelta(0)]
How many trajectories occurred only in the first thirty minutes of the day?
In [32]:
len(trajectories_in_first_thirty_minutes)
Out[32]:
13
Filtering by Polygon
We can remove all trajectories that are within a given polygon, such as shorelines, rivers and borders, as follows:
First, we need to import the relevant modules so we can filter and load polygons/trajectories.
In [33]:
from tracktable.info.shorelines import shoreline_information, all_shorelines, buffer
from tracktable.filter.trajectory import FilterByPolygon
from tracktable.rw.load import load_trajectories
from tracktable_data.data import retrieve
# These imports are for rendering our trajectories
from tracktable.render.render_trajectories import render_trajectories
import folium
Next, we need to grab the North American polygon and create a 10km buffer around it.
In [34]:
# Grab the North American Polygon
north_america_polygon = shoreline_information(3, resolution="medium")
# Create a 10km buffered polygon
buffer_distance = 10 # km
buffered_north_america_polygon = buffer(north_america_polygon.points, buffer=buffer_distance)
Now that we have our buffered and unbuffered polygons we need to load in some trajectories.
In [35]:
trajectories = load_trajectories(retrieve('NYHarbor_2020_12_first_week.traj'))
print(f'{len(trajectories)} historical trajectories successfully imported.')
Loading Trajectories: 513 trajectory [00:07, 71.01 trajectory/s]
513 historical trajectories successfully imported.
With the trajectories loaded we can now filter out trajectories that are within the buffered polygon.
In [36]:
filter = FilterByPolygon()
filter.input = trajectories
filter.polygon = buffered_north_america_polygon
filtered_trajectories = list(filter.trajectories())
print(f'{len(list(trajectories)) - len(list(filtered_trajectories))} trajectories within {buffer_distance} km of the coastline were removed.')
354 trajectories within 10 km of the coastline were removed.
To help visualize this filtering, lets render our original and filtered trajectory sets.
In [37]:
# render the unfiltered trajectories using Tracktable
map_canvas = render_trajectories(trajectories[:20])
# add in the buffered north america polygon
folium.GeoJson(buffered_north_america_polygon).add_to(map_canvas)
map_canvas
Out[37]:
In [38]:
# render the filtered trajectories using Tracktable
map_canvas = render_trajectories(filtered_trajectories[:20])
# add in the buffered north america polygon
folium.GeoJson(buffered_north_america_polygon).add_to(map_canvas)
map_canvas
Out[38]:
Additional Functionality
Tracktable’s geomath module contains many additional functions not shown here. For a complete list, please see the geomath documentation.
\(^1\) Bureau of Ocean Energy Management (BOEM) and National Oceanic and Atmospheric Administration (NOAA). MarineCadastre.gov. AIS Data for 2020. Retrieved February 2021 from marinecadastre.gov/data. Trimmed down to the first hour of June 30, 2020, restricted to in NY Harbor.