Tutorial 1: How to Create Trajectory Points from a Delimited File

Purpose

This notebook demonstrates how to create Tracktable Trajectory Point objects from a delimited text file (comma-separated, tab-separated, et cetera). A data file must contain the following columns in order to be compatible with Tracktable:

  • an identifier that is unique to each object

  • a timestamp

  • longitude

  • latitude

Both ordering and headers for these columns can vary, but they must exist in the file. Each row of the data file should represent the information for a single trajectory point.
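Before configuring a reader, it can help to confirm that a file actually contains the four required columns. The following is a minimal sketch that uses only the Python standard library. The header names (object_id, timestamp, longitude, latitude) and the inline sample are hypothetical, since headers vary from file to file.

```python
import csv
import io

# Hypothetical header names; substitute whatever your file actually uses.
REQUIRED = {"object_id", "timestamp", "longitude", "latitude"}

def missing_required_columns(text_stream, delimiter=","):
    """Return the sorted list of required column names absent from the header row."""
    header = next(csv.reader(text_stream, delimiter=delimiter))
    present = {name.strip() for name in header}
    return sorted(REQUIRED - present)

# Column order does not matter here, only presence.
sample = io.StringIO(
    "timestamp,longitude,latitude,object_id,heading\n"
    "2020-06-30 00:00:00,-74.07157,40.64409,367000140,246.0\n"
)
missing = missing_required_columns(sample)
print(missing)
```

An empty list means every required column was found.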

IMPORTANT: delimited files must be sorted by timestamp in increasing order to be compatible with Tracktable.
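If a file is not already sorted, one option is to sort its rows before handing it to Tracktable. The sketch below is one way to do that with the standard library, assuming a single header row and timestamps in the format "%Y-%m-%d %H:%M:%S". Both assumptions, along with the sample rows, are illustrative and may not match your data.

```python
import csv
import io
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed format; adjust to your data

def sort_rows_by_timestamp(text_stream, timestamp_column, delimiter=","):
    """Return (header, rows) with rows sorted by ascending timestamp."""
    all_rows = list(csv.reader(text_stream, delimiter=delimiter))
    header, body = all_rows[0], all_rows[1:]
    # list.sort is stable, so rows sharing a timestamp keep their original order.
    body.sort(key=lambda row: datetime.strptime(row[timestamp_column], TIMESTAMP_FORMAT))
    return header, body

unsorted = io.StringIO(
    "timestamp,longitude,latitude,object_id\n"
    "2020-06-30 00:05:00,-74.02,40.54,B\n"
    "2020-06-30 00:00:00,-74.07,40.64,A\n"
    "2020-06-30 00:02:30,-73.97,40.70,C\n"
)
header, rows = sort_rows_by_timestamp(unsorted, timestamp_column=0)
for row in rows:
    print(row)
```

The sorted rows can then be written back out with csv.writer before the file is read by Tracktable.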

Note: This notebook does not cover how to create a Trajectory object (as opposed to a list of trajectory point objects). See Tutorial 2 for an example of how to create Trajectory objects from a CSV file containing trajectory point information.

Step 1: Identify your CSV/TSV File

We will use the provided example data \(^1\) for this tutorial. To use a different file, set data_filename to a string containing the path to your delimited file.

In [1]:
from tracktable_data.data import retrieve

data_filename = retrieve(filename='NYHarbor_2020_06_30_first_hour.csv')

Step 2: Create a TrajectoryPointReader Object

We will create a Terrestrial point reader, which will expect (longitude, latitude) coordinates. Alternatively, if our data points were in a Cartesian coordinate system, we would import the TrajectoryPointReader object from tracktable.domain.cartesian2d or tracktable.domain.cartesian3d.

In [2]:
from tracktable.domain.terrestrial import TrajectoryPointReader

reader = TrajectoryPointReader()

Step 3: Give the TrajectoryPointReader an Input Source

The input source must be a Python file-like object. Files opened with open() are the most common use case.

In [3]:
reader.input = open(data_filename, 'r')
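A file-like object is anything that offers the same reading interface as an open text file. As an illustration, the standard library's io.StringIO wraps an in-memory string in that interface, which can be handy for small experiments. This sketch only demonstrates the file-like behavior itself. Whether your Tracktable version accepts a StringIO as reader.input is an assumption you should verify, and the sample text is made up.

```python
import io

# In-memory text that behaves like a text file returned by open().
fake_file = io.StringIO(
    "# object_id,timestamp,longitude,latitude\n"
    "A,2020-06-30 00:00:00,-74.07157,40.64409\n"
)

# Like a real file, it can be iterated line by line.
lines = [line.rstrip("\n") for line in fake_file]
print(len(lines))
print(lines[1])
```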

Additional Settings

Identify the comment character for the data file. Any line with this as its first non-whitespace character will be ignored. This setting is optional and defaults to #.

In [4]:
reader.comment_character = '#'
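To make the "first non-whitespace character" rule concrete, here is a small standalone sketch of the same idea. It illustrates the rule as described above and is not Tracktable's actual implementation. The function name is_comment and the sample lines are hypothetical.

```python
def is_comment(line, comment_character="#"):
    """True when the first non-whitespace character is the comment character."""
    return line.lstrip().startswith(comment_character)

lines = [
    "# header: id,timestamp,lon,lat",
    "   # indented comment",
    "A,2020-06-30 00:00:00,-74.07,40.64",
    "B,2020-06-30 00:00:00,-74.02,40.54  # not a comment line",
]

# Only lines that begin (after whitespace) with '#' are dropped.
data_lines = [line for line in lines if not is_comment(line)]
print(len(data_lines))
```

Note that a # appearing later in a line does not make that line a comment under this rule.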

Identify the file’s delimiter. For comma-separated (CSV) files, the delimiter should be set to ,. For tab-separated files, this should be \t. This is optional, and the default value is ,. If your field delimiter is some other character, substitute it here.

In [5]:
reader.field_delimiter = ','
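If you are unsure which delimiter a file uses, the standard library's csv.Sniffer can often guess it from a sample of the text. This is a hedged sketch: the helper name guess_delimiter, the candidate set, and the sample text are assumptions for illustration, and the sniffer raises csv.Error when it cannot decide.

```python
import csv

def guess_delimiter(sample_text, candidates=",\t;|"):
    """Guess the field delimiter from a text sample, limited to the candidates."""
    return csv.Sniffer().sniff(sample_text, delimiters=candidates).delimiter

tsv_sample = (
    "object_id\ttimestamp\tlongitude\tlatitude\n"
    "A\t2020-06-30 00:00:00\t-74.07\t40.64\n"
    "B\t2020-06-30 00:00:00\t-74.02\t40.54\n"
)
delimiter = guess_delimiter(tsv_sample)
print(repr(delimiter))
```

The result could then be assigned to reader.field_delimiter, though checking the guess by eye on a few rows is still worthwhile.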

Identify the string associated with a null value in a cell. This is optional and defaults to an empty string.

In [6]:
reader.null_value = 'NaN'
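Conceptually, the null value is a marker string that means "no data in this cell". The sketch below shows that idea in plain Python. How Tracktable represents a null cell internally is not covered in this notebook, so the use of None here is purely an assumption for illustration, as is the sample row.

```python
def parse_cell(text, null_value=""):
    """Return None for cells equal to the null marker; otherwise the text unchanged."""
    return None if text == null_value else text

# Made-up row in which the last cell carries the null marker.
row = "367022550,SAMANTHA MILLER,NaN".split(",")
parsed = [parse_cell(cell, null_value="NaN") for cell in row]
print(parsed)
```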

Required Columns

We must tell the reader where to find the unique object ID, timestamp, longitude and latitude columns. Column numbering starts at zero.

If no column numbers are given, the reader will assume they are in the order listed above: object ID in column 0, timestamp in column 1, longitude in column 2, latitude in column 3.

Note: Tracktable stores geodetic (terrestrial) coordinates with the longitude first and latitude second.

In [7]:
reader.object_id_column = 3
reader.timestamp_column = 0
reader.coordinates[0] = 1     # longitude
reader.coordinates[1] = 2     # latitude
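Counting columns by hand is error-prone in wide files. If your file has a header row, a sketch like the following can look up the zero-based positions by name. The header names here are hypothetical and were chosen so that the resulting positions match the settings above.

```python
import csv
import io

def column_indices(header_line, names, delimiter=","):
    """Map each requested column name to its zero-based position in the header."""
    header = next(csv.reader(io.StringIO(header_line), delimiter=delimiter))
    header = [name.strip() for name in header]
    # list.index raises ValueError if a requested name is missing.
    return {name: header.index(name) for name in names}

# Hypothetical header, laid out like the column settings above.
header_line = "timestamp,longitude,latitude,object_id"
indices = column_indices(
    header_line, ["object_id", "timestamp", "longitude", "latitude"]
)
print(indices)
```

The values could then be assigned directly, for example reader.object_id_column = indices["object_id"] and reader.coordinates[0] = indices["longitude"].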

Optional Columns

Your data file may contain additional information (e.g. speed, heading, altitude) that you wish to store with your trajectory points. These values can be stored as floats, strings, or datetime objects. An example of each is shown below, in that order.

In [8]:
reader.set_real_field_column('heading', 6)
reader.set_string_field_column('vessel-name', 7)
reader.set_time_field_column('eta', 17)
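The reader performs these conversions for you. To show what the three types look like once parsed, here is a standalone sketch in plain Python. The timestamp format, the UTC assumption, and the helper names to_real and to_time are all illustrative and are not part of Tracktable.

```python
from datetime import datetime, timezone

def to_real(text):
    """Parse a numeric cell as a float."""
    return float(text)

def to_time(text, fmt="%Y-%m-%d %H:%M:%S"):
    """Parse a timestamp cell; the format and UTC time zone are assumptions."""
    return datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)

# Raw cell text as it might appear in one row of a delimited file.
raw = {
    "heading": "246.0",
    "vessel-name": "SAMUEL I NEWHOUSE",
    "eta": "2020-06-30 12:01:00",
}
properties = {
    "heading": to_real(raw["heading"]),   # float
    "vessel-name": raw["vessel-name"],    # string, kept as-is
    "eta": to_time(raw["eta"]),           # timezone-aware datetime
}
print(properties["eta"])
```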

Step 4: Convert the Reader to a List of Trajectory Points

Once an instance has been configured with an input source and a list of fields, TrajectoryPointReader functions as an iterable of points.

The sometimes-inconvenient thing about an iterable like this one, which consumes its input as it goes, is that it can only be traversed once. Here we store all the points in a list so that we can access them at will.

In [9]:
trajectory_points = list(reader)
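The single-pass behavior is easy to see with an ordinary Python generator. This sketch uses a stand-in generator, not a real TrajectoryPointReader, and the names point_source, first_pass, and second_pass are made up for the demonstration.

```python
def point_source():
    """Stand-in for a reader: yields items once, like reading through a file."""
    for name in ["p0", "p1", "p2"]:
        yield name

source = point_source()
first_pass = list(source)    # consumes everything the source has
second_pass = list(source)   # nothing is left to read

print(first_pass)
print(second_pass)
```

The second list comes back empty, which is why the points are copied into a list up front.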

How many trajectory points do we have?

In [10]:
len(trajectory_points)
Out[10]:
8689

Step 5: Accessing Trajectory Point Info

The information from the required columns of the CSV file can be accessed for a single TrajectoryPoint object as follows:

  • unique object identifier: trajectory_point.object_id

  • timestamp: trajectory_point.timestamp

  • longitude: trajectory_point[0]

  • latitude: trajectory_point[1]

The optional column information is available through the member variable properties as follows: trajectory_point.properties['what-you-named-it'].

This is demonstrated below for our first ten trajectory points.

In [11]:
for traj_point in trajectory_points[:10]:
    object_id    = traj_point.object_id
    timestamp    = traj_point.timestamp
    longitude    = traj_point[0]
    latitude     = traj_point[1]
    heading      = traj_point.properties["heading"]
    vessel_name  = traj_point.properties["vessel-name"]
    eta          = traj_point.properties["eta"]

    print(f'Unique ID: {object_id}')
    print(f'Timestamp: {timestamp}')
    print(f'Longitude: {longitude}')
    print(f'Latitude: {latitude}')
    print(f'Heading: {heading}')
    print(f'Vessel Name: {vessel_name}')
    print(f'ETA: {eta}\n')
Unique ID: 367000140
Timestamp: 2020-06-30 00:00:00+00:00
Longitude: -74.07157
Latitude: 40.64409
Heading: 246.0
Vessel Name: SAMUEL I NEWHOUSE
ETA: 2020-06-30 12:01:00+00:00

Unique ID: 366999618
Timestamp: 2020-06-30 00:00:00+00:00
Longitude: -74.02433
Latitude: 40.54291
Heading: 349.0
Vessel Name: CG SHRIKE
ETA: 2020-06-30 19:40:00+00:00

Unique ID: 367776270
Timestamp: 2020-06-30 00:00:00+00:00
Longitude: -73.97656
Latitude: 40.70324
Heading: 290.0
Vessel Name: H200
ETA: 2020-06-30 20:04:00+00:00

Unique ID: 367022550
Timestamp: 2020-06-30 00:00:00+00:00
Longitude: -74.07281
Latitude: 40.63668
Heading: 511.0
Vessel Name: SAMANTHA MILLER
ETA: 2020-06-30 08:10:00+00:00

Unique ID: 367515850
Timestamp: 2020-06-30 00:00:00+00:00
Longitude: -74.11926
Latitude: 40.64217
Heading: 163.0
Vessel Name: DISCOVERY COAST
ETA: 2020-06-30 09:53:00+00:00

Unique ID: 367531640
Timestamp: 2020-06-30 00:00:00+00:00
Longitude: -74.07176
Latitude: 40.62947
Heading: 511.0
Vessel Name: FDNY M9B
ETA: 2020-06-30 13:45:00+00:00

Unique ID: 338531000
Timestamp: 2020-06-30 00:00:00+00:00
Longitude: -74.05089
Latitude: 40.64413
Heading: 96.0
Vessel Name: GENESIS VIGILANT
ETA: 2020-06-30 09:15:00+00:00

Unique ID: 366516370
Timestamp: 2020-06-30 00:00:00+00:00
Longitude: -74.14805
Latitude: 40.64346
Heading: 302.0
Vessel Name: STEPHEN REINAUER
ETA: 2020-06-30 04:51:00+00:00

Unique ID: 367779550
Timestamp: 2020-06-30 00:00:00+00:00
Longitude: -74.00551
Latitude: 40.70308
Heading: 234.0
Vessel Name: SUNSET CROSSING
ETA: 2020-06-30 06:36:00+00:00

Unique ID: 367797260
Timestamp: 2020-06-30 00:00:00+00:00
Longitude: -73.9741
Latitude: 40.70235
Heading: 51.0
Vessel Name: H208
ETA: 2020-06-30 05:39:00+00:00
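Two of the points above report a heading of 511.0, which lies outside the usual 0 to 359 degree range. In the AIS message format, 511 is the value that signals that no heading is available, so you may want to filter such points before analysis. The sketch below uses stand-in dictionaries rather than real TrajectoryPoint objects, with values copied from the output above. The constant name is hypothetical.

```python
AIS_HEADING_NOT_AVAILABLE = 511.0  # AIS sentinel meaning "no heading reported"

# Stand-ins for trajectory points; values copied from the output above.
points = [
    {"vessel-name": "SAMUEL I NEWHOUSE", "heading": 246.0},
    {"vessel-name": "SAMANTHA MILLER", "heading": 511.0},
    {"vessel-name": "FDNY M9B", "heading": 511.0},
    {"vessel-name": "H208", "heading": 51.0},
]

# Keep only the points that carry a usable heading.
with_heading = [p for p in points if p["heading"] != AIS_HEADING_NOT_AVAILABLE]
print([p["vessel-name"] for p in with_heading])
```

With real trajectory points, the same comparison would be made against traj_point.properties["heading"].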

\(^1\) Bureau of Ocean Energy Management (BOEM) and National Oceanic and Atmospheric Administration (NOAA). MarineCadastre.gov. AIS Data for 2020. Retrieved February 2021 from marinecadastre.gov/data. Trimmed down to the first hour of June 30, 2020, and restricted to NY Harbor.