Tutorial 2: How to Create Trajectories from a Delimited File

In [1]:

import tracktable.examples.tutorials.tutorial_helper as tutorial

Purpose

This notebook demonstrates how to create Tracktable trajectories from points stored in a delimited text (CSV, TSV, etc.) data file. A data file must contain the following columns in order to be compatible with Tracktable:

an identifier that is unique to each object
a timestamp
longitude
latitude

Both ordering and headers for these columns can vary, but they must exist in the file. Each row of the data file should represent the information for a single trajectory point.

IMPORTANT: delimited files must be sorted in increasing order by timestamp to be compatible with Tracktable.
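
If a file is not already in timestamp order, it can be sorted before it is handed to Tracktable. The following is a minimal sketch using only the Python standard library. The column names (`object_id`, `timestamp`, `longitude`, `latitude`), the timestamp format, and the sample rows are assumptions made for illustration; adjust them to match your own file.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample data; the column names and values are illustrative only.
raw = """object_id,timestamp,longitude,latitude
B,2020-06-30 00:05:00,-74.01,40.70
A,2020-06-30 00:01:00,-74.20,40.61
A,2020-06-30 00:03:00,-74.21,40.60
"""

def sort_rows_by_timestamp(text, column="timestamp", fmt="%Y-%m-%d %H:%M:%S"):
    """Return (fieldnames, rows) with rows sorted by the parsed timestamp column."""
    reader = csv.DictReader(io.StringIO(text))
    # Parse each timestamp so the sort is chronological rather than lexical.
    rows = sorted(reader, key=lambda row: datetime.strptime(row[column], fmt))
    return reader.fieldnames, rows

fieldnames, sorted_rows = sort_rows_by_timestamp(raw)

# Write the sorted rows back out as delimited text.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
writer.writeheader()
writer.writerows(sorted_rows)
sorted_text = out.getvalue()
print(sorted_text)
```

For a file on disk, replace the `io.StringIO` objects with files opened using `newline=""`, as the `csv` module documentation recommends.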

Note: If you want to create individual TrajectoryPoint objects but not assemble them into Trajectory objects, please see Tutorial 1.

Step 1: Set up a TrajectoryPointReader object.

We will use the provided example data \(^1\) for this tutorial. We’ll re-use the code from Tutorial 1 to create a TrajectoryPointReader object.

In [2]:

reader = tutorial.create_point_reader()

Step 2: Create an AssembleTrajectoryFromPoints object.

This will build trajectories from the individual points.

In [3]:

from tracktable.applications.assemble_trajectories import AssembleTrajectoryFromPoints

builder = AssembleTrajectoryFromPoints()

The builder expects an iterable of points as its input. As we saw in Tutorial 1, the reader provides that iterable.

In [4]:

builder.input = reader

Identifying New Trajectories

The trajectory assembler turns points into trajectories by grouping them according to their object ID. It has three additional parameters. Separation time and separation distance control whether two points with the same object ID belong to the same trajectory. The minimum point count controls whether or not to discard too-short trajectories.

Separation Distance: How far apart (in km) should sequential points (with the same object ID) have to be before we consider them separate trajectories? This parameter is optional and defaults to None, which disables the separation distance check.

In [5]:

builder.separation_distance = 10 # km

Separation Time: How far apart (in time) should sequential points (with the same object ID) have to be before we consider them separate trajectories? This parameter is optional and defaults to 30 minutes. Setting it to None disables the separation time check.

In [6]:

from datetime import timedelta

builder.separation_time = timedelta(minutes = 20)

Minimum Point Count: What is the minimum number of points that a trajectory must have? Any trajectories assembled with fewer than this number will be discarded. This parameter is optional and defaults to 2 points. Setting it to 0 or None disables the minimum length check.

In [7]:

builder.minimum_length = 5 # points
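
To make the roles of the three parameters concrete, here is a simplified, standard-library-only sketch of the grouping logic described above. It is not Tracktable's implementation. The tuple layout `(object_id, timestamp, longitude, latitude)`, the haversine distance with a 6371 km Earth radius, and the strict greater-than comparisons are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta

def haversine_km(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Great-circle distance in km between two (longitude, latitude) pairs."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def assemble(points, separation_distance=None,
             separation_time=timedelta(minutes=30), minimum_length=2):
    """Group timestamp-sorted (object_id, timestamp, lon, lat) tuples into trajectories."""
    by_object = defaultdict(list)
    for point in points:
        by_object[point[0]].append(point)

    trajectories = []
    for pts in by_object.values():
        current = [pts[0]]
        segments = [current]
        for prev, pt in zip(pts, pts[1:]):
            # Assumption: a gap must strictly exceed the threshold to split.
            too_late = separation_time is not None and pt[1] - prev[1] > separation_time
            too_far = (separation_distance is not None and
                       haversine_km(prev[2], prev[3], pt[2], pt[3]) > separation_distance)
            if too_late or too_far:
                current = [pt]          # start a new trajectory
                segments.append(current)
            else:
                current.append(pt)
        # A minimum length of 0 or None disables the length check.
        trajectories.extend(s for s in segments
                            if not minimum_length or len(s) >= minimum_length)
    return trajectories

# Hypothetical points: object "A" has a 50-minute gap; object "B" has one point.
t0 = datetime(2020, 6, 30, 0, 0)
points = [
    ("A", t0, -74.00, 40.60),
    ("A", t0 + timedelta(minutes=5), -74.00, 40.61),
    ("B", t0 + timedelta(minutes=6), -73.90, 40.70),
    ("A", t0 + timedelta(minutes=10), -74.00, 40.62),
    ("A", t0 + timedelta(minutes=60), -74.00, 40.63),
    ("A", t0 + timedelta(minutes=65), -74.00, 40.64),
]
result = assemble(points, separation_distance=10,
                  separation_time=timedelta(minutes=20), minimum_length=2)
lengths = [len(t) for t in result]
print(lengths)  # → [3, 2]
```

In this sketch, object "A" is split into two trajectories at the 50-minute gap, and the single point from object "B" is discarded because it falls below the minimum length.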

Step 3: Assemble Trajectories from Point Data

Like the point reader, the trajectory builder provides its output as an iterable. We pull the contents of that iterable into a list so that we can access it at our convenience.

In [8]:

trajectories = list(builder)

INFO:tracktable.applications.assemble_trajectoriesAssembleTrajectoryFromPoints:New trajectories will be declared after a separation of 10 distance units between two points or a time lapse of at least 0:20:00 (hours, minutes, seconds).
INFO:tracktable.applications.assemble_trajectoriesAssembleTrajectoryFromPoints:Trajectories with fewer than 5 points will be discarded.
INFO:tracktable.applications.assemble_trajectoriesAssembleTrajectoryFromPoints:Done assembling trajectories. 279 trajectories produced and 22 discarded for having fewer than 5 points.
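
Collecting an iterable into a list is a general Python pattern, and it matters most when the source can only be traversed once. The sketch below uses a plain generator (a hypothetical `countdown` function, unrelated to Tracktable) to show the difference. Whether the trajectory builder itself can be iterated more than once is not stated here, so treat this only as background on why the list is convenient.

```python
def countdown(n):
    """A generator: yields values lazily and is exhausted after one pass."""
    while n > 0:
        yield n
        n -= 1

source = countdown(3)
first_pass = list(source)    # consumes the generator
second_pass = list(source)   # nothing is left to yield

print(first_pass)    # → [3, 2, 1]
print(second_pass)   # → []

# A list supports len(), indexing, and slicing as often as needed.
print(len(first_pass), first_pass[0], first_pass[-1])
```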

How many trajectories do we have?

In [9]:

len(trajectories)

Out[9]:

279

Step 4: Accessing Trajectory Information

For each trajectory, we can access the following information:

trajectory.object_id: a string identifier that is unique to each moving object. It is possible (and common, in some data sets) for multiple trajectories to share a single object ID.
trajectory.trajectory_id: a string identifier that is mostly unique to each trajectory, created by concatenating the object ID, start timestamp, and end timestamp.

This is demonstrated below for our first ten trajectories.

In [10]:

for trajectory in trajectories[:10]:
    object_id      = trajectory.object_id
    trajectory_id  = trajectory.trajectory_id

    print(f'Object ID: {object_id}')
    print(f'Trajectory ID: {trajectory_id}\n')

Object ID: 367109000
Trajectory ID: 367109000_20200630000104_20200630002505

Object ID: 367484710
Trajectory ID: 367484710_20200630000243_20200630002642

Object ID: 367000140
Trajectory ID: 367000140_20200630000000_20200630005959

Object ID: 366999618
Trajectory ID: 366999618_20200630000000_20200630005949

Object ID: 367776270
Trajectory ID: 367776270_20200630000000_20200630005952

Object ID: 367022550
Trajectory ID: 367022550_20200630000000_20200630005919

Object ID: 367515850
Trajectory ID: 367515850_20200630000000_20200630005941

Object ID: 367531640
Trajectory ID: 367531640_20200630000000_20200630005950

Object ID: 338531000
Trajectory ID: 338531000_20200630000000_20200630005955

Object ID: 366516370
Trajectory ID: 366516370_20200630000000_20200630005940
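
The trajectory IDs printed above appear to follow the pattern `<object ID>_<start timestamp>_<end timestamp>`, with each timestamp written as `YYYYMMDDHHMMSS`. The sketch below reproduces that pattern with the standard library. The function name `make_trajectory_id` is hypothetical, and the format string is inferred from the output rather than taken from Tracktable's source.

```python
from datetime import datetime, timezone

def make_trajectory_id(object_id, start, end, fmt="%Y%m%d%H%M%S"):
    """Join an object ID with its start and end timestamps, separated by underscores."""
    return f"{object_id}_{start.strftime(fmt)}_{end.strftime(fmt)}"

# Start and end times read off the first trajectory ID printed above.
start = datetime(2020, 6, 30, 0, 1, 4, tzinfo=timezone.utc)
end = datetime(2020, 6, 30, 0, 25, 5, tzinfo=timezone.utc)

example_id = make_trajectory_id("367109000", start, end)
print(example_id)  # → 367109000_20200630000104_20200630002505
```

Because the ID depends only on the object ID and the two timestamps, two trajectories from the same object with identical start and end times would collide, which is consistent with the ID being described as only mostly unique.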

Step 5: Accessing Trajectory Point Information

Let’s look at just the first trajectory in our list:

In [11]:

trajectory = trajectories[0]

Trajectory points can be accessed in a trajectory object using list indexing. So, we can get the first point in our trajectory as follows:

In [12]:

trajectory_point = trajectory[0]

The information from the required columns of the CSV file can be accessed on a single trajectory_point object as follows:

unique object identifier: trajectory_point.object_id
timestamp: trajectory_point.timestamp
longitude: trajectory_point[0]
latitude: trajectory_point[1]

Information from optional columns is available through the point's properties member, keyed by the name you gave the column: trajectory_point.properties['what-you-named-it'].

Below, we access all of the information stored in our trajectory_point object.

In [13]:

object_id    = trajectory_point.object_id
timestamp    = trajectory_point.timestamp
longitude    = trajectory_point[0]
latitude     = trajectory_point[1]
heading      = trajectory_point.properties["heading"]
vessel_name  = trajectory_point.properties["vessel-name"]
eta          = trajectory_point.properties["eta"]

print(f'Unique ID: {object_id}')
print(f'Timestamp: {timestamp}')
print(f'Longitude: {longitude}')
print(f'Latitude: {latitude}')
print(f'Heading: {heading}')
print(f'Vessel Name: {vessel_name}')
print(f'ETA: {eta}\n')

Unique ID: 367109000
Timestamp: 2020-06-30 00:01:04+00:00
Longitude: -74.2053
Latitude: 40.60922
Heading: 189.0
Vessel Name: OVERSEAS HOUSTON
ETA: 2020-06-30 13:40:04+00:00
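
The split between required columns (exposed as attributes and indices) and optional columns (exposed through `properties`) can be mimicked with a small stand-in class. Everything in this sketch is hypothetical: `PointStandIn`, `row_to_point`, and the column names are illustrative and are not part of Tracktable. The values are copied from the output above.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class PointStandIn:
    """Hypothetical stand-in that mimics the access pattern described above."""
    object_id: str
    timestamp: datetime
    coordinates: tuple                      # (longitude, latitude)
    properties: dict = field(default_factory=dict)

    def __getitem__(self, index):
        # point[0] is the longitude, point[1] is the latitude.
        return self.coordinates[index]

REQUIRED = ("object_id", "timestamp", "longitude", "latitude")

def row_to_point(row):
    """Split a parsed CSV row (a dict) into required fields and optional properties."""
    extras = {key: value for key, value in row.items() if key not in REQUIRED}
    return PointStandIn(
        object_id=row["object_id"],
        timestamp=datetime.fromisoformat(row["timestamp"]),
        coordinates=(float(row["longitude"]), float(row["latitude"])),
        properties=extras,
    )

row = {
    "object_id": "367109000",
    "timestamp": "2020-06-30 00:01:04+00:00",
    "longitude": "-74.2053",
    "latitude": "40.60922",
    "heading": "189.0",
    "vessel-name": "OVERSEAS HOUSTON",
}
point = row_to_point(row)
print(point.object_id, point.timestamp)
print(point[0], point[1])
print(point.properties["vessel-name"])
```

Note that this stand-in leaves optional values as strings, whereas the output above shows Tracktable returning typed values such as a numeric heading.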

What can we do with trajectories in Tracktable?

Compact trajectory storage using .traj files is discussed in Tutorials 3 (Write Trajectories to File) & 4 (Read Traj File).
Tutorials 5A (Interactive Visualization) & 5B (Static Visualization) demonstrate how to visualize trajectories with Tracktable.
Filtering trajectories using the capabilities provided by the geomath module is shown in Tutorial 6 (Trajectory Filtering).

\(^1\) Bureau of Ocean Energy Management (BOEM) and National Oceanic and Atmospheric Administration (NOAA). MarineCadastre.gov. AIS Data for 2020. Retrieved February 2021 from marinecadastre.gov/data. Trimmed down to the first hour of June 30, 2020, and restricted to NY Harbor.