Implemented rules

Rules are declared in the ValidationRules class. Below are details of currently implemented rules.

Table of Errors

Error ID Error Title
E001 Not in POSIX time
E002 stop_time_updates not strictly sorted
E003 GTFS-rt trip_id does not exist in GTFS data
E004 GTFS-rt route_id does not exist in GTFS data
E006 Missing required trip field for frequency-based exact_times = 0
E009 GTFS-rt stop_sequence isn't provided for trip that visits same stop_id more than once
E010 location_type not 0 in stops.txt (Note that this is implemented but not executed because it's specific to GTFS - see issue #126)
E011 GTFS-rt stop_id does not exist in GTFS data
E012 Header timestamp should be greater than or equal to all other timestamps
E013 Frequency type 0 trip schedule_relationship should be UNSCHEDULED or empty
E015 All stop_ids referenced in GTFS-rt TripUpdates and VehiclePositions feeds must have the location_type = 0
E016 trip_ids with schedule_relationship ADDED must not be in GTFS data
E017 GTFS-rt content changed but has the same header timestamp
E018 GTFS-rt header timestamp decreased between two sequential iterations
E019 GTFS-rt frequency type 1 trip start_time must be a multiple of GTFS headway_secs later than GTFS start_time
E020 Invalid start_time format
E021 Invalid start_date format
E022 Sequential stop_time_update times are not increasing
E023 trip start_time does not match first GTFS arrival_time
E024 trip direction_id does not match GTFS data
E025 stop_time_update departure time is before arrival time
E026 Invalid vehicle position
E027 Invalid vehicle bearing
E028 Vehicle position outside agency coverage area
E029 Vehicle position far from trip shape
E030 GTFS-rt alert trip_id does not belong to GTFS-rt alert route_id in GTFS trips.txt
E031 Alert informed_entity.route_id does not match informed_entity.trip.route_id
E032 Alert does not have an informed_entity
E033 Alert informed_entity does not have any specifiers
E034 GTFS-rt agency_id does not exist in GTFS data
E035 GTFS-rt trip.trip_id does not belong to GTFS-rt trip.route_id in GTFS trips.txt
E036 Sequential stop_time_updates have the same stop_sequence
E037 Sequential stop_time_updates have the same stop_id
E038 Invalid header.gtfs_realtime_version
E039 FULL_DATASET feeds should not include entity.is_deleted
E040 stop_time_update doesn't contain stop_id or stop_sequence
E041 trip doesn't have any stop_time_updates
E042 arrival or departure provided for NO_DATA stop_time_update
E043 stop_time_update doesn't have arrival or departure
E044 stop_time_update arrival/departure doesn't have delay or time
E045 GTFS-rt stop_time_update stop_sequence and stop_id do not match GTFS
E046 GTFS-rt stop_time_update without time doesn't have arrival/departure time in GTFS
E047 VehiclePosition and TripUpdate ID pairing mismatch
E048 header timestamp not populated (GTFS-rt v2.0 and higher)
E049 header incrementality not populated (GTFS-rt v2.0 and higher)
E050 timestamp is in the future
E051 GTFS-rt stop_sequence not found in GTFS data
E052 vehicle.id is not unique

Table of Warnings

Warning ID Warning Title
W001 timestamps not populated
W002 vehicle_id not populated
W003 ID in one feed missing from the other
W004 vehicle speed is unrealistic
W005 Missing vehicle_id in trip_update for frequency-based exact_times = 0
W006 trip_update missing trip_id
W007 Refresh interval is more than 35 seconds
W008 Header timestamp is older than 65 seconds
W009 schedule_relationship not populated

Errors

E001 - Not in POSIX time

All times and timestamps must be in POSIX time (i.e., number of seconds since January 1st 1970 00:00:00 UTC).

Common mistakes - Accidentally using Java's System.currentTimeMillis(), which is the number of milliseconds since January 1st 1970 00:00:00 UTC.

Possible solution - Use TimeUnit.MILLISECONDS.toSeconds(System.currentTimeMillis()) to convert from milliseconds to seconds.
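The conversion above can be wrapped in a small helper, sketched below. Several parts are assumptions for illustration and are not the validator's actual E001 logic: the class name `PosixTimeCheck`, the `looksLikeMilliseconds` heuristic, and its 10^10 threshold. The threshold works because 10^10 seconds after the epoch falls in the year 2286, so any larger value was most likely written in milliseconds.

```java
import java.time.Instant;
import java.util.concurrent.TimeUnit;

final class PosixTimeCheck {
    // Hypothetical heuristic: 10^10 seconds after the epoch is in the year 2286,
    // so a larger value was most likely produced in milliseconds by mistake.
    static final long MAX_PLAUSIBLE_SECONDS = 10_000_000_000L;

    private PosixTimeCheck() {}

    /** Current time in POSIX seconds, using the conversion suggested above. */
    static long nowSeconds() {
        return TimeUnit.MILLISECONDS.toSeconds(System.currentTimeMillis());
    }

    /** Equivalent alternative using java.time. */
    static long nowSecondsViaInstant() {
        return Instant.now().getEpochSecond();
    }

    /** Returns true if the timestamp is probably in milliseconds rather than seconds. */
    static boolean looksLikeMilliseconds(long timestamp) {
        return timestamp > MAX_PLAUSIBLE_SECONDS;
    }
}
```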

References:

E002 - stop_time_updates not strictly sorted

stop_time_updates for a given trip_id must be strictly ordered by increasing stop_sequence - this also means that no stop_sequence should be repeated.

From Stop Time Updates description:

Updates should be sorted by stop_sequence (or stop_ids in the order they occur in the trip).

From GTFS stop_times.txt:

The values for stop_sequence must be non-negative integers, and they must increase along the trip.

This rule is checked both when stop_sequence is provided in the GTFS-rt feed and when stop_sequence is omitted from the GTFS-rt feed.

Common mistakes - Assuming that the GTFS stop_times.txt file will be grouped by trip_id and sorted by stop_sequence - while sorting the data is a good practice, it's not strictly required by the spec.

Possible solution - Group the GTFS stop_times.txt records by trip_id and sort by stop_sequence. Also, make sure that no stop_sequence is repeated in GTFS stop_times.txt.
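Below is a minimal sketch of the possible solution. It assumes stop_times.txt rows have already been parsed into a hypothetical `StopTime` holder. The steps are: group the rows by trip_id, sort each group by stop_sequence, then confirm that the sorted values strictly increase. A repeated stop_sequence shows up as two equal neighbors. `StopSequenceOrder` is a hypothetical name, not a class from the validator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class StopSequenceOrder {
    /** Minimal stand-in for a stop_times.txt row (hypothetical type). */
    static final class StopTime {
        final String tripId;
        final int stopSequence;

        StopTime(String tripId, int stopSequence) {
            this.tripId = tripId;
            this.stopSequence = stopSequence;
        }
    }

    private StopSequenceOrder() {}

    /** True if every value is greater than the one before it (no repeats, no decreases). */
    static boolean isStrictlyIncreasing(List<Integer> stopSequences) {
        for (int i = 1; i < stopSequences.size(); i++) {
            if (stopSequences.get(i) <= stopSequences.get(i - 1)) {
                return false;
            }
        }
        return true;
    }

    /** Groups rows by trip_id and sorts each group's stop_sequence values ascending. */
    static Map<String, List<Integer>> groupAndSort(List<StopTime> rows) {
        Map<String, List<Integer>> byTrip = new TreeMap<>();
        for (StopTime row : rows) {
            byTrip.computeIfAbsent(row.tripId, k -> new ArrayList<>()).add(row.stopSequence);
        }
        for (List<Integer> seqs : byTrip.values()) {
            seqs.sort(Comparator.naturalOrder());
        }
        return byTrip;
    }
}
```

The same `isStrictlyIncreasing` check can also be applied to the stop_sequence values of a trip's GTFS-rt stop_time_updates, taken in feed order.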

References:

E003 - GTFS-rt trip_id does not exist in GTFS data

All trip_ids provided in the GTFS-rt feed must exist in the GTFS data, unless their schedule_relationship is set to ADDED.

trip says:

trip_id - The trip_id from the GTFS feed that this selector refers to.

schedule_relationship says:

If a trip is done in accordance with temporary schedule, not reflected in GTFS, then it shouldn't be marked as SCHEDULED, but marked as ADDED...

ADDED - An extra trip that was added in addition to a running schedule, for example, to replace a broken vehicle or to respond to sudden passenger load.
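E003 and E016 (below) are two sides of the same lookup, so one sketch covers both. `TripIdCheck` and its enum are hypothetical. The enum lists only some of the GTFS-rt TripDescriptor schedule_relationship values. `gtfsTripIds` is assumed to hold every trip_id from trips.txt.

```java
import java.util.Optional;
import java.util.Set;

final class TripIdCheck {
    /** Subset of the GTFS-rt TripDescriptor schedule_relationship values used here. */
    enum ScheduleRelationship { SCHEDULED, ADDED, UNSCHEDULED, CANCELED }

    private TripIdCheck() {}

    /** Returns the error ID to log for this trip_id, or empty if the trip_id is acceptable. */
    static Optional<String> check(String tripId, ScheduleRelationship relationship, Set<String> gtfsTripIds) {
        boolean inGtfs = gtfsTripIds.contains(tripId);
        if (relationship == ScheduleRelationship.ADDED) {
            // ADDED trips are, by definition, not part of the static schedule.
            return inGtfs ? Optional.of("E016") : Optional.empty();
        }
        // Every other trip must be found in trips.txt.
        return inGtfs ? Optional.empty() : Optional.of("E003");
    }
}
```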

References:

E004 - GTFS-rt route_id does not exist in GTFS data

All route_ids provided in the GTFS-rt feed must exist in the GTFS data.

trip says:

route_id - The route_id from the GTFS that this selector refers to.

References:

E006 - Missing required trip field for frequency-based exact_times = 0

Frequency-based exact_times = 0 trip_updates must contain trip_id, start_time, and start_date.

References:

E009 - GTFS-rt stop_sequence isn't provided for trip that visits same stop_id more than once

If a GTFS trip contains multiple references to the same stop_id (i.e., the vehicle visits the same stop_id more than once in the same trip), then GTFS-rt stop_time_updates for this trip must include stop_sequence.

From stop_time_update:

If the same stop_id is visited more than once in a trip, then stop_sequence should be provided in all StopTimeUpdates for that stop_id on that trip.
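Detecting the affected trips needs only the trip's stop_ids from stop_times.txt. The sketch below uses hypothetical names. It follows the rule statement above, flagging any stop_time_update without stop_sequence in a trip that repeats a stop_id. The quoted spec text is narrower: it only requires stop_sequence for the repeated stop_id itself.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class RepeatedStopCheck {
    private RepeatedStopCheck() {}

    /** Returns the stop_ids that appear more than once in a trip's GTFS stop_times. */
    static Set<String> repeatedStopIds(List<String> stopIdsInTrip) {
        Map<String, Integer> counts = new HashMap<>();
        for (String stopId : stopIdsInTrip) {
            counts.merge(stopId, 1, Integer::sum);
        }
        Set<String> repeated = new HashSet<>();
        counts.forEach((stopId, n) -> {
            if (n > 1) {
                repeated.add(stopId);
            }
        });
        return repeated;
    }

    /**
     * E009 sketch: stopSequence is null when the GTFS-rt stop_time_update omits it.
     * A violation is an update without stop_sequence in a trip that repeats any stop_id.
     */
    static boolean violatesE009(Integer stopSequence, Set<String> repeatedInTrip) {
        return stopSequence == null && !repeatedInTrip.isEmpty();
    }
}
```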

References:

E010 - location_type not 0 in stops.txt

(Note that this is implemented but not executed because it's specific to GTFS - see issue #126.)

If location_type is used in stops.txt, all stops referenced in stop_times.txt must have location_type of 0.

E011 - GTFS-rt stop_id does not exist in GTFS data

All stop_ids referenced in GTFS-rt feeds must exist in the GTFS data in stops.txt.

From stop_time_update:

stop_id - Must be the same as in stops.txt in the corresponding GTFS feed.

From position:

stop_id - Identifies the current stop. The value must be the same as in stops.txt in the corresponding GTFS feed.

References:

E012 - Header timestamp should be greater than or equal to all other timestamps

No timestamps for individual entities (TripUpdate, VehiclePosition, Alerts) in the feeds should be greater than the header timestamp.

From header:

timestamp - This timestamp identifies the moment when the content of this feed has been created (in server time). In POSIX time (i.e., number of seconds since January 1st 1970 00:00:00 UTC). To avoid time skew between systems producing and consuming realtime information it is strongly advised to derive timestamp from a time server. It is completely acceptable to use Stratum 3 or even lower strata servers since time differences up to a couple of seconds are tolerable.

References:

E013 - Frequency type 0 trip schedule_relationship should be UNSCHEDULED or empty

For frequency-based exact_times=0 trips, schedule_relationship should be UNSCHEDULED or empty.

From Trip Updates -> Trip Descriptor description:

UNSCHEDULED - This trip is running and is never associated with a schedule. For example, if there is no schedule and the buses run on a shuttle service.

From trip_update.trip.schedule_relationship:

UNSCHEDULED - A trip that is running with no schedule associated to it, for example, if there is no schedule at all.

References:

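E015 - All stop_ids referenced in GTFS-rt TripUpdates and VehiclePositions feeds must have the location_type = 0

All stop_ids referenced in GTFS-rt TripUpdates and VehiclePositions feeds must have the location_type = 0 in GTFS stops.txt.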

Alerts may reference stops with a location_type other than 0 (e.g., location_type 2-4, used for entrances/exits, generic nodes, and boarding areas in pathways).

From GTFS stop_times.txt:

stop_id - ...The stop_id is referenced from the stops.txt file. If location_type is used in stops.txt, all stops referenced in stop_times.txt must have location_type of 0.

References:

E016 - trip_ids with schedule_relationship ADDED must not be in GTFS data

Trips that have a schedule_relationship of ADDED must NOT be included in the GTFS data.

From trip.schedule_relationship:

ADDED - An extra trip that was added in addition to a running schedule, for example, to replace a broken vehicle or to respond to sudden passenger load.

From Trip Updates -> Trip Descriptor description:

Added - This trip was not scheduled and has been added. For example, to cope with demand, or replace a broken down vehicle.

References:

E017 - GTFS-rt content changed but has the same header timestamp

The GTFS-rt header timestamp value should always change if the feed contents change - the feed contents must not change without updating the header timestamp.

Common mistakes - If there are multiple instances of a GTFS-realtime feed behind a load balancer, each instance may pull information from the real-time data source and publish it to consumers slightly out of sync. If a GTFS-rt consumer makes two back-to-back requests, and each request is served by a different GTFS-rt feed instance, the same feed contents could be returned to the consumer with different timestamps.

Possible solution - Configure the load balancer for "sticky routes", so that the consumer always receives the GTFS-rt feed contents from the same GTFS-rt instance.

References:

E018 - GTFS-rt header timestamp decreased between two sequential iterations

The GTFS-rt header timestamp should be monotonically increasing - if the feed contents are different, it should always be the same value as, or greater than, the timestamps of previous feed iterations.

Common mistakes - If there are multiple instances of a GTFS-realtime feed behind a load balancer, each instance may pull information from the real-time data source and publish it to consumers slightly out of sync. If a GTFS-rt consumer makes two back-to-back requests, and each request is served by a different GTFS-rt feed instance, the same feed contents could be returned to the consumer with the most recent feed response having a timestamp that is less than the previous feed response.

Possible solution - Configure the load balancer for "sticky routes", so that the GTFS-rt consumer always receives the GTFS-rt feed contents from the same GTFS-rt instance.
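Both E017 and E018 compare the current iteration with the previous one. A consumer-side checker therefore only has to remember the last header timestamp and content. In this sketch, `HeaderTimestampHistory` is a hypothetical class. The `content` argument is assumed to be the serialized feed with the header timestamp excluded, so that only real content changes are compared.

```java
import java.util.Arrays;

final class HeaderTimestampHistory {
    private long previousTimestamp;
    private byte[] previousContent; // null until the first iteration has been seen

    /**
     * Compares this iteration to the previous one and returns "E017", "E018", or null.
     * content is assumed to exclude the header timestamp itself.
     */
    String check(long headerTimestamp, byte[] content) {
        String result = null;
        if (previousContent != null) {
            boolean contentChanged = !Arrays.equals(previousContent, content);
            if (headerTimestamp < previousTimestamp) {
                result = "E018"; // timestamp went backwards
            } else if (contentChanged && headerTimestamp == previousTimestamp) {
                result = "E017"; // content changed without a new timestamp
            }
        }
        previousTimestamp = headerTimestamp;
        previousContent = content.clone(); // defensive copy
        return result;
    }
}
```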

References:

E019 - GTFS-rt frequency type 1 trip start_time must be a multiple of GTFS headway_secs later than GTFS start_time

For frequency-based trips defined in frequencies.txt with exact_times = 1, the GTFS-rt trip start_time must be some multiple (including zero) of headway_secs later than the start_time in frequencies.txt for the corresponding time period. Note that this does not apply to frequency-based trips defined in frequencies.txt with exact_times = 0.

From trip.start_time:

start_time - ...If the trip corresponds to exact_times=1 GTFS record, then start_time must be some multiple (including zero) of headway_secs later than frequencies.txt start_time for the corresponding time period.
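The arithmetic behind E019 is a modulo check on seconds. This sketch uses hypothetical names. It treats the period's end_time as exclusive, which is an assumption not stated above.

```java
final class ExactTimesStartTime {
    private ExactTimesStartTime() {}

    /** Parses a GTFS time such as "7:05:00" or "25:15:35" into seconds (hours may exceed 23). */
    static int toSeconds(String hhmmss) {
        String[] parts = hhmmss.split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected H:MM:SS or HH:MM:SS: " + hhmmss);
        }
        return Integer.parseInt(parts[0]) * 3600
                + Integer.parseInt(parts[1]) * 60
                + Integer.parseInt(parts[2]);
    }

    /**
     * E019 sketch: rtStartTime must equal freqStartTime + k * headwaySecs for some k >= 0,
     * and (assumption) must fall before the period's end_time.
     */
    static boolean isValidExactTimesStart(String rtStartTime, String freqStartTime,
                                          String freqEndTime, int headwaySecs) {
        int rt = toSeconds(rtStartTime);
        int offset = rt - toSeconds(freqStartTime);
        return offset >= 0
                && rt < toSeconds(freqEndTime)
                && offset % headwaySecs == 0;
    }
}
```

For example, with a frequencies.txt start_time of 06:00:00 and headway_secs of 600, a GTFS-rt start_time of 06:20:00 passes, because its offset of 1200 seconds is twice the headway. A start_time of 06:25:00 fails, because its offset of 1500 seconds is not a multiple of 600.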

References:

E020 - Invalid start_time format

start_time must be in the format HH:MM:SS or H:MM:SS. Note that the hour can be 24 or greater if the trip runs past midnight into the next service day.

From trip.start_time:

start_time - ...Format and semantics of the field is same as that of GTFS/frequencies.txt/start_time, e.g., 1:15:35 or 25:15:35.

References:

E021 - Invalid start_date format

start_date must be in the YYYYMMDD format.

From trip.start_date:

start_date - The scheduled start date of this trip instance...In YYYYMMDD format.
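The E020 and E021 format checks can be sketched with a regular expression and java.time. The class name is hypothetical, and the two-digit limit on hours is an assumption. Rejecting impossible calendar dates such as 20240230 also goes slightly beyond a pure YYYYMMDD pattern check.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.regex.Pattern;

final class TripDescriptorFormats {
    // H:MM:SS or HH:MM:SS; hours may exceed 23 for trips running past midnight of the service day.
    private static final Pattern START_TIME = Pattern.compile("\\d{1,2}:[0-5]\\d:[0-5]\\d");
    // Exactly eight digits; the calendar check below rejects impossible dates.
    private static final Pattern START_DATE = Pattern.compile("\\d{8}");

    private TripDescriptorFormats() {}

    /** E020 sketch. */
    static boolean isValidStartTime(String startTime) {
        return START_TIME.matcher(startTime).matches();
    }

    /** E021 sketch: eight digits that also form a real calendar date. */
    static boolean isValidStartDate(String startDate) {
        if (!START_DATE.matcher(startDate).matches()) {
            return false;
        }
        try {
            // BASIC_ISO_DATE parses yyyyMMdd with the STRICT resolver, so 20240230 is rejected.
            LocalDate.parse(startDate, DateTimeFormatter.BASIC_ISO_DATE);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }
}
```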

References:

E022 - Sequential stop_time_update times are not increasing

stop_time_update arrival/departure times between sequential stops should always increase - they should never be the same or decrease.
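A simplified E022 sketch follows. It assumes one predicted POSIX time per stop_time_update (for example, the arrival time), listed in trip order. The validator's actual comparison of arrival and departure fields may differ. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class SequentialTimesCheck {
    private SequentialTimesCheck() {}

    /**
     * times holds one predicted POSIX time per stop_time_update, in trip order;
     * null means no time was provided for that stop, and it is skipped.
     * Returns the indexes whose time is not strictly later than the previous provided time.
     */
    static List<Integer> nonIncreasingIndexes(List<Long> times) {
        List<Integer> bad = new ArrayList<>();
        Long previous = null;
        for (int i = 0; i < times.size(); i++) {
            Long t = times.get(i);
            if (t == null) {
                continue;
            }
            if (previous != null && t <= previous) {
                bad.add(i); // same or earlier than the previous stop
            }
            previous = t;
        }
        return bad;
    }
}
```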

References:

E023 - trip start_time does not match first GTFS arrival_time

For normal scheduled trips (i.e., not defined in frequencies.txt), the GTFS-realtime trip start_time must match the first GTFS arrival_time in stop_times.txt for this trip.

From trip.start_time:

start_time - The initially scheduled start time of this trip instance. When the trip_id corresponds to a non-frequency-based trip, this field should either be omitted or be equal to the value in the GTFS feed.

Common mistakes - Accidentally providing a GTFS-realtime start_time that has been wrapped modulo 24 hours, such as 00:02:00, when the trip's start time in GTFS stop_times.txt is after midnight of the service day, such as 24:02:00.

Possible solution - Make sure that any start_times in GTFS-realtime match that same trip start time in GTFS stop_times.txt, especially if the trip starts after midnight of the service day.
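The common mistake is easy to catch if times are compared as seconds rather than as strings. In this hypothetical sketch, comparing seconds lets 7:05:00 match 07:05:00. It also makes 00:02:00 correctly fail to match 24:02:00, since the two values are 24 hours apart.

```java
final class StartTimeMatch {
    private StartTimeMatch() {}

    /** Converts a GTFS time (H:MM:SS or HH:MM:SS, hours may exceed 23) to seconds. */
    static int toSeconds(String time) {
        String[] p = time.split(":");
        return Integer.parseInt(p[0]) * 3600 + Integer.parseInt(p[1]) * 60 + Integer.parseInt(p[2]);
    }

    /**
     * E023 sketch: an omitted start_time (null) is allowed; otherwise it must equal
     * the trip's first arrival_time in stop_times.txt.
     */
    static boolean matchesFirstArrival(String rtStartTime, String firstGtfsArrivalTime) {
        return rtStartTime == null || toSeconds(rtStartTime) == toSeconds(firstGtfsArrivalTime);
    }
}
```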

References:

E024 - trip direction_id does not match GTFS data

GTFS-rt trip direction_id must match the direction_id in GTFS trips.txt.

From trip.direction_id:

direction_id - The direction_id from the GTFS feed trips.txt file, indicating the direction of travel for trips this selector refers to.

References:

E025 - stop_time_update departure time is before arrival time

Within the same stop_time_update, arrival and departure times can be the same, or the departure time can be later than the arrival time - the departure time should never come before the arrival time.

References:

E026 - Invalid vehicle position

Vehicle position must be valid WGS84 coordinates - latitude must be between -90 and 90 (inclusive), and vehicle longitude must be between -180 and 180 (inclusive).

From vehicle.position:

  • latitude - Degrees North, in the WGS-84 coordinate system.
  • longitude - Degrees East, in the WGS-84 coordinate system.

References:

E027 - Invalid vehicle bearing

Vehicle bearing must be between 0 and 360 degrees (inclusive). The GTFS-rt spec says bearing is:

...in degrees, clockwise from True North, i.e., 0 is North and 90 is East. This can be the compass bearing, or the direction towards the next stop or intermediate location. This should not be deduced from the sequence of previous positions, which clients can compute from previous data.
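Both range checks are simple inclusive comparisons; a hypothetical sketch follows. The float parameters mirror the float fields of the GTFS-rt Position message. Because NaN fails every comparison, it is reported as invalid too.

```java
final class VehiclePositionRanges {
    private VehiclePositionRanges() {}

    /** E026 sketch: WGS84 latitude in [-90, 90] and longitude in [-180, 180], inclusive. */
    static boolean isValidPosition(float latitude, float longitude) {
        return latitude >= -90f && latitude <= 90f
                && longitude >= -180f && longitude <= 180f;
    }

    /** E027 sketch: bearing in [0, 360] degrees, inclusive. */
    static boolean isValidBearing(float bearing) {
        return bearing >= 0f && bearing <= 360f;
    }
}
```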

References:

E028 - Vehicle position outside agency coverage area

The vehicle position should be inside the agency coverage area. Coverage area is defined by a buffer surrounding the GTFS shapes.txt data, or stops.txt locations if the GTFS feed doesn't include shapes.txt.

Buffer distance is defined by GtfsMetadata.REGION_BUFFER_METERS, and is currently 1609 meters (roughly 1 mile).

References:

E029 - Vehicle position far from trip shape

The vehicle position should be within a buffer surrounding the GTFS shapes.txt data for the current trip unless there is an alert with the effect of DETOUR for this trip_id.

Buffer distance is defined by GtfsMetadata.TRIP_BUFFER_METERS, and is currently 200 meters (roughly 1/8 of a mile).
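Below is a simplified sketch of the E028/E029 buffer test. It assumes a haversine great-circle distance and measures only to the given points (shape vertices or stop locations). The validator builds buffers around the shapes instead. As a result, this approximation can report a vehicle that sits between two widely spaced shape vertices as outside the buffer. Apart from the two quoted buffer values, all names are hypothetical.

```java
import java.util.List;

final class BufferCheck {
    static final double EARTH_RADIUS_METERS = 6_371_000.0; // mean Earth radius
    // Values quoted above for GtfsMetadata.REGION_BUFFER_METERS and TRIP_BUFFER_METERS.
    static final double REGION_BUFFER_METERS = 1609;
    static final double TRIP_BUFFER_METERS = 200;

    private BufferCheck() {}

    /** Great-circle distance in meters between two lat/lon points (haversine formula). */
    static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        // min() guards against rounding pushing a slightly above 1.
        return 2 * EARTH_RADIUS_METERS * Math.asin(Math.sqrt(Math.min(1.0, a)));
    }

    /**
     * True if the vehicle is within bufferMeters of at least one point.
     * points are {lat, lon} pairs, e.g. shapes.txt vertices or stops.txt locations.
     */
    static boolean isWithinBuffer(double lat, double lon, List<double[]> points, double bufferMeters) {
        for (double[] p : points) {
            if (haversineMeters(lat, lon, p[0], p[1]) <= bufferMeters) {
                return true;
            }
        }
        return false;
    }
}
```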

References:

E030 - GTFS-rt alert trip_id does not belong to GTFS-rt alert route_id in GTFS trips.txt

The GTFS-rt alert.informed_entity.trip.trip_id should belong to the specified GTFS-rt alert.informed_entity.route_id in GTFS trips.txt.

References:

E031 - Alert informed_entity.route_id does not match informed_entity.trip.route_id

The alert.informed_entity.trip.route_id should be the same as the specified alert.informed_entity.route_id.

References:

E032 - Alert does not have an informed_entity

All alerts must have at least one informed_entity.

From alert.informed_entity:

The values of the fields should correspond to the appropriate fields in the GTFS feed. At least one specifier must be given. If several are given, then the matching has to apply to all the given specifiers.

References:

E033 - Alert informed_entity does not have any specifiers

Alert informed_entity should have at least one specified value (route_id, trip_id, stop_id, etc.) to which the alert applies.

References:

E034 - GTFS-rt agency_id does not exist in GTFS data

All agency_ids provided in the GTFS-rt alert.informed_entity.agency_id should also exist in GTFS agency.txt.

References:

E035 - GTFS-rt trip.trip_id does not belong to GTFS-rt trip.route_id in GTFS trips.txt

The GTFS-rt trip.trip_id should belong to the specified trip.route_id in GTFS trips.txt.

trip says:

If route_id is also set, then it should be same as one that the given trip corresponds to.

References:

E036 - Sequential stop_time_updates have the same stop_sequence

Sequential GTFS-rt trip stop_time_updates should never have the same stop_sequence - stop_sequence must increase for each stop_time_update.

From GTFS stop_times.txt:

The values for stop_sequence must be non-negative integers, and they must increase along the trip.

Common mistakes - Repeated records in the GTFS stop_times.txt file.

Possible solution - Make sure that no stop_sequence is repeated in GTFS stop_times.txt.

References:

E037 - Sequential stop_time_updates have the same stop_id

Sequential GTFS-rt trip stop_time_updates shouldn't have the same stop_id - sequential stop_ids should be different. This may not be an error in one case, when all of the following hold: a stop_id is visited more than once in a trip (i.e., a loop); no stop_time_updates inside the loop are provided in the feed; and the stop_sequence of the stop where the loop starts and ends is provided in the GTFS-rt feed for that stop_id.

References:

E038 - Invalid header.gtfs_realtime_version

header.gtfs_realtime_version is required and must be a valid value. Currently, the only valid values are 1.0 and 2.0.

References:

E039 - FULL_DATASET feeds should not include entity.is_deleted

The entity.is_deleted field should only be included in GTFS-rt feeds with header.incrementality of DIFFERENTIAL.

References:

E040 - stop_time_update doesn't contain stop_id or stop_sequence

All stop_time_updates must contain stop_id or stop_sequence - both fields cannot be left blank.

From trip.stop_time_update:

The update is linked to a specific stop either through stop_sequence or stop_id, so one of these fields must necessarily be set.

References:

E041 - trip doesn't have any stop_time_updates

Unless a trip's schedule_relationship is CANCELED, a trip must have at least one stop_time_update.

References:

E042 - arrival or departure provided for NO_DATA stop_time_update

If a stop_time_update has a schedule_relationship of NO_DATA, then neither arrival nor departure should be provided.

From stop_time_update.schedule_relationship:

NO_DATA -> No data is given for this stop. It indicates that there is no realtime information available. When set NO_DATA is propagated through subsequent stops so this is the recommended way of specifying from which stop you do not have realtime information. When NO_DATA is set neither arrival nor departure should be supplied.

References:

E043 - stop_time_update doesn't have arrival or departure

If a stop_time_update doesn't have a schedule_relationship of SKIPPED or NO_DATA, then either arrival or departure must be provided.

From stop_time_update.schedule_relationship:

SCHEDULED -> The vehicle is proceeding in accordance with its static schedule of stops, although not necessarily according to the times of the schedule. This is the default behavior. At least one of arrival and departure must be provided. If the schedule for this stop contains both arrival and departure times then so must this update.

References:

E044 - stop_time_update arrival/departure doesn't have delay or time

If the stop_time_update.schedule_relationship is not SKIPPED, stop_time_update.arrival and stop_time_update.departure must have either delay or time - both fields cannot be missing.

Stop Time Updates description says:

The update can provide a exact timing for arrival and/or departure at a stop in StopTimeUpdates using StopTimeEvent. This should contain either an absolute time or a delay (i.e. an offset from the scheduled time in seconds).

stop_time_update.schedule_relationship says:

SKIPPED - The stop is skipped, i.e., the vehicle will not stop at this stop. Arrival and departure are optional.
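The E042, E043, and E044 rules combine into one decision over the stop_time_update's schedule_relationship and its arrival/departure events. In this hypothetical sketch, `null` stands for an unset field. The enum covers only the relationship values discussed above.

```java
import java.util.ArrayList;
import java.util.List;

final class StopTimeUpdateFields {
    /** Subset of StopTimeUpdate schedule_relationship values used here. */
    enum Relationship { SCHEDULED, SKIPPED, NO_DATA }

    /** Hypothetical stand-in for StopTimeEvent; null fields are "not set". */
    static final class Event {
        final Integer delay;
        final Long time;

        Event(Integer delay, Long time) {
            this.delay = delay;
            this.time = time;
        }
    }

    private StopTimeUpdateFields() {}

    /** Returns the E042/E043/E044 IDs that apply; arrival or departure is null when absent. */
    static List<String> check(Relationship relationship, Event arrival, Event departure) {
        List<String> errors = new ArrayList<>();
        if (relationship == Relationship.NO_DATA) {
            if (arrival != null || departure != null) {
                errors.add("E042"); // NO_DATA must not carry arrival or departure
            }
            return errors;
        }
        if (relationship == Relationship.SKIPPED) {
            return errors; // arrival and departure are optional for skipped stops
        }
        if (arrival == null && departure == null) {
            errors.add("E043");
        }
        if (hasNeitherDelayNorTime(arrival) || hasNeitherDelayNorTime(departure)) {
            errors.add("E044");
        }
        return errors;
    }

    private static boolean hasNeitherDelayNorTime(Event e) {
        return e != null && e.delay == null && e.time == null;
    }
}
```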

References:

E045 - GTFS-rt stop_time_update stop_sequence and stop_id do not match GTFS

If a GTFS-rt stop_time_update contains both stop_sequence and stop_id, the values must match the GTFS data in stop_times.txt.

References:

E046 - GTFS-rt stop_time_update without time doesn't have arrival/departure time in GTFS

If only delay is provided in a stop_time_update arrival or departure (and not a time), then the GTFS stop_times.txt must contain arrival_times and/or departure_times for these corresponding stops. A delay value in the real-time feed is meaningless unless you have a clock time to add it to in the GTFS stop_times.txt file.

Common mistakes - Providing an arrival/departure.delay value, but not providing an arrival/departure.time value for non-timepoint stops that do not have an arrival_time or departure_time in GTFS stop_times.txt.

Possible solution - Add a time value to the GTFS-rt feed for the arrival and departure, or add an arrival_time and departure_time in GTFS stop_times.txt.

References:

E047 - VehiclePosition and TripUpdate ID pairing mismatch

If separate VehiclePositions and TripUpdates feeds are provided, VehicleDescriptor or TripDescriptor ID value pairing should match between the two feeds.

In other words, if the VehiclePosition has a vehicle_id A that is assigned to trip_id 4, then the TripUpdate feed should have a prediction for trip_id 4 that includes a reference to vehicle_id A. If the trip_id of 4 is paired with a different vehicle_id B in one of the two feeds, this is an error.

Note that this is different from W003, which simply checks to see if an ID that is provided in one feed is provided in the other - that is a warning.
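Below is a minimal E047 sketch. It assumes each feed has been reduced to a hypothetical map from trip_id to vehicle_id. Trips that appear in only one of the maps are skipped, since that case belongs to W003.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class FeedPairing {
    private FeedPairing() {}

    /**
     * Each map goes from trip_id to vehicle_id as seen in one feed.
     * Returns the trip_ids that are paired with different vehicle_ids in the two feeds.
     */
    static List<String> mismatchedTrips(Map<String, String> vehiclePositions, Map<String, String> tripUpdates) {
        List<String> mismatched = new ArrayList<>();
        for (Map.Entry<String, String> vp : vehiclePositions.entrySet()) {
            String tuVehicle = tripUpdates.get(vp.getKey());
            // A missing trip (null) is a W003 concern, not an E047 mismatch.
            if (tuVehicle != null && !tuVehicle.equals(vp.getValue())) {
                mismatched.add(vp.getKey());
            }
        }
        return mismatched;
    }
}
```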

References:

E048 - header timestamp not populated (GTFS-rt v2.0 and higher)

timestamp must be populated in FeedHeader for gtfs_realtime_version v2.0 and higher.

References:

E049 - header incrementality not populated (GTFS-rt v2.0 and higher)

incrementality must be populated in FeedHeader for gtfs_realtime_version v2.0 and higher.

References:

E050 - timestamp is in the future

All timestamps should be less than the current time.

header.timestamp says:

This timestamp identifies the moment when the content of this feed has been created (in server time). In POSIX time (i.e., number of seconds since January 1st 1970 00:00:00 UTC). To avoid time skew between systems producing and consuming realtime information it is strongly advised to derive timestamp from a time server. It is completely acceptable to use Stratum 3 or even lower strata servers since time differences up to a couple of seconds are tolerable.

Timestamps are flagged as being in the future if they are greater than the current time plus TimestampValidator.MAX_IN_FUTURE_SECONDS, which is currently set to 60 seconds.
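The E050 comparison is sketched below using a hypothetical class. The current time is passed in as a parameter rather than read from the clock inside the check. This design choice keeps the check deterministic and easy to test.

```java
final class FutureTimestampCheck {
    // Tolerance quoted above for TimestampValidator.MAX_IN_FUTURE_SECONDS.
    static final long MAX_IN_FUTURE_SECONDS = 60;

    private FutureTimestampCheck() {}

    /** E050 sketch: both arguments are POSIX seconds. */
    static boolean isInFuture(long timestamp, long nowSeconds) {
        return timestamp > nowSeconds + MAX_IN_FUTURE_SECONDS;
    }
}
```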

References:

E051 - GTFS-rt stop_sequence not found in GTFS data

All stop_time_update stop_sequences in GTFS-realtime data must appear in GTFS stop_times.txt for that trip.

To keep GTFS-rt validator runtime performance at O(n) for GTFS stop_times.txt (i.e., so we don't have to loop through the entire GTFS stop_times.txt for each GTFS-rt stop_time_update, which would be O(n*m)), if E051 is logged for a stop_time_update, subsequent stop_time_updates in that same GTFS-rt trip will not be checked for other errors or warnings (e.g., E046 - GTFS-rt stop_time_update without time doesn't have arrival/departure_time in GTFS).

See this issue for details.
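One way to get the linear-time behavior described above is a single forward pointer over the trip's sorted GTFS stop_sequences, sketched below with hypothetical names. It assumes the GTFS-rt stop_time_updates are already in increasing stop_sequence order (see E002). An out-of-order update could be misreported as missing.

```java
import java.util.List;

final class StopSequenceWalk {
    private StopSequenceWalk() {}

    /**
     * gtfsSequences: the trip's stop_sequences from stop_times.txt, sorted ascending.
     * rtSequences: stop_sequences of the GTFS-rt stop_time_updates, in feed order.
     * Returns the index of the first update whose stop_sequence is not in GTFS, or -1.
     * The GTFS pointer only moves forward, so each list is scanned at most once.
     */
    static int firstMissing(List<Integer> gtfsSequences, List<Integer> rtSequences) {
        int g = 0;
        for (int i = 0; i < rtSequences.size(); i++) {
            int wanted = rtSequences.get(i);
            while (g < gtfsSequences.size() && gtfsSequences.get(g) < wanted) {
                g++;
            }
            if (g == gtfsSequences.size() || gtfsSequences.get(g) != wanted) {
                return i; // log E051 and stop checking the rest of this trip
            }
        }
        return -1;
    }
}
```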

References:

E052 - vehicle.id is not unique

Each vehicle should have a unique ID.

From VehiclePosition.VehicleDescriptor for vehicle.id:

Internal system identification of the vehicle. Should be unique per vehicle, and is used for tracking the vehicle as it proceeds through the system. This id should not be made visible to the end-user; for that purpose use the label field

References:

Warnings

W001 - timestamps not populated

timestamps should be populated for FeedHeader, TripUpdates, VehiclePositions, and Alerts.

Including timestamps for each entity type enhances the transit rider experience, as consumers can show timestamp information to end users to give them an idea of how old certain information is.

For example, when a vehicle position is shown on a map, the marker may say "Data updated 17 sec ago" (see screenshot below). If vehicle position timestamps aren't included, then the consumer must use the GTFS-rt header timestamp, which may be much more recent than the actual vehicle position, resulting in misleading information being shown to end users.

[Screenshot: a vehicle marker on a map labeled "Data updated 17 sec ago"]

W002 - vehicle_id not populated

vehicle_id should be populated for TripUpdates and VehiclePositions.

Populating vehicle_ids in TripUpdates is important so consumers can relate a given arrival/departure prediction to a particular vehicle.

W003 - ID in one feed missing from the other

If separate VehiclePositions and TripUpdates feeds are provided, a trip_id that is provided in the VehiclePositions feed should be provided in the TripUpdates feed, and a vehicle_id that is provided in the TripUpdates feed should be provided in the VehiclePositions feed.

In other words, if the VehiclePosition has a vehicle that is assigned to trip_id 4, then the TripUpdate feed should have a prediction for trip_id 4.

Note that when a vehicle is serving more than one trip in a block, it is recommended to include not only a TripUpdate for the currently served trip, but also a TripUpdate for the next trip to be served. In this case, there will not yet be a VehiclePosition for the next TripUpdate, and the W003 warning can be ignored.

Note that this is different from E047, which checks for a mismatch of IDs between the feeds - that is an error.

References:

W004 - vehicle speed is unrealistic

vehicle.position.speed has an unrealistic speed that may be incorrect.

Speeds are flagged as unrealistic if they are greater than VehicleValidator.MAX_REALISTIC_SPEED_METERS_PER_SECOND, which is currently set to 26 meters per second (approx. 60 miles per hour).

Common mistakes - Accidentally setting the speed value in miles per hour, instead of meters per second.

Possible solution - Check to make sure the speed units are meters per second.
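Below is a sketch of the W004 threshold together with the unit conversion from the possible solution. The class name is hypothetical. One mph is exactly 0.44704 m/s (1609.344 m per 3600 s). A 35 mph speed published without conversion therefore triggers the warning, while the converted value of about 15.6 m/s does not.

```java
final class SpeedCheck {
    // Limit quoted above for VehicleValidator.MAX_REALISTIC_SPEED_METERS_PER_SECOND.
    static final float MAX_REALISTIC_SPEED_METERS_PER_SECOND = 26f;
    // 1 mile = 1609.344 m and 1 hour = 3600 s, so 1 mph = 0.44704 m/s exactly.
    static final float METERS_PER_SECOND_PER_MPH = 0.44704f;

    private SpeedCheck() {}

    /** W004 sketch: speed is in meters per second, as GTFS-rt requires. */
    static boolean isUnrealistic(float speedMetersPerSecond) {
        return speedMetersPerSecond > MAX_REALISTIC_SPEED_METERS_PER_SECOND;
    }

    /** Producer-side fix for the common mistake: convert mph to m/s before publishing. */
    static float mphToMetersPerSecond(float mph) {
        return mph * METERS_PER_SECOND_PER_MPH;
    }
}
```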

References:

W005 - Missing vehicle_id in trip_update for frequency-based exact_times = 0

Frequency-based exact_times = 0 trip_updates should contain vehicle_id. This helps disambiguate predictions in situations where more than one vehicle is running the same trip instance simultaneously.

References:

W006 - trip_update missing trip_id

trips should include a trip_id. A missing trip_id is usually an error in the feed (especially for frequency-based exact_times = 0 trips - see E006), although the section on "Alternative trip matching" includes one exception:

Trips which are not frequency based may also be uniquely identified by a TripDescriptor including the combination of:

  • route_id
  • direction_id
  • start_time
  • start_date

...where start_time is the scheduled start time as defined in the static schedule, as long as the combination of ids provided resolves to a unique trip.

References:

W007 - Refresh interval is more than 35 seconds

GTFS-realtime feeds should be refreshed at least every 35 seconds.

W008 - Header timestamp is older than 65 seconds

The data in a GTFS-realtime feed should always be less than one minute old.
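Below is a hypothetical sketch of W007 and W008, using the 35- and 65-second thresholds from the warning titles in the table above. How the validator actually measures the refresh interval is not described here. This sketch assumes the interval is the difference between the header timestamps of two sequential fetches.

```java
final class FeedFreshness {
    // Thresholds taken from the W007 and W008 titles in the table above.
    static final long MAX_REFRESH_INTERVAL_SECONDS = 35;
    static final long MAX_HEADER_AGE_SECONDS = 65;

    private FeedFreshness() {}

    /** W007 sketch: header timestamps (POSIX seconds) of two sequential fetches. */
    static boolean refreshTooSlow(long previousHeaderTimestamp, long currentHeaderTimestamp) {
        return currentHeaderTimestamp - previousHeaderTimestamp > MAX_REFRESH_INTERVAL_SECONDS;
    }

    /** W008 sketch: all values are POSIX seconds. */
    static boolean headerTooOld(long headerTimestamp, long nowSeconds) {
        return nowSeconds - headerTimestamp > MAX_HEADER_AGE_SECONDS;
    }
}
```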

W009 - schedule_relationship not populated

trip.schedule_relationship and stop_time_update.schedule_relationship should be populated.

References: