Transforms a GTFS file into a directed acyclic graph of actual connections.
A connection is the combination of a departure and its successive arrival within the same trip. Our goal is to retrieve a list of connections sorted by departure time, which amounts to a topologically sorted Directed Acyclic Graph. Route planning algorithms can then be run over this list in a single pass.
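To illustrate why the departure-time ordering matters, here is a minimal sketch of an earliest-arrival scan in the spirit of the Connection Scan Algorithm. It is not part of gtfs2lc: the field names, the stops and the use of minutes since midnight are assumptions made for the example, and transfer times are ignored.

```javascript
// Hypothetical connections; times are minutes since midnight.
const unsorted = [
  { trip: 'T2', departureStop: 'B', departureTime: 500, arrivalStop: 'C', arrivalTime: 515 },
  { trip: 'T3', departureStop: 'A', departureTime: 485, arrivalStop: 'C', arrivalTime: 530 },
  { trip: 'T1', departureStop: 'A', departureTime: 480, arrivalStop: 'B', arrivalTime: 490 },
];

// Sorting by departure time gives the order a single-pass scan relies on.
const sortedConnections = [...unsorted].sort((a, b) => a.departureTime - b.departureTime);

// Scan the sorted list once, keeping the earliest known arrival per stop.
function earliestArrival(connections, from, to, startTime) {
  const best = new Map([[from, startTime]]);
  for (const c of connections) {
    const reachableAt = best.get(c.departureStop);
    if (reachableAt === undefined || reachableAt > c.departureTime) continue;
    const known = best.get(c.arrivalStop);
    if (known === undefined || c.arrivalTime < known) {
      best.set(c.arrivalStop, c.arrivalTime);
    }
  }
  return best.get(to); // undefined when the target cannot be reached
}

const arrivalAtC = earliestArrival(sortedConnections, 'A', 'C', 470);
console.log(arrivalAtC);
```

Because every connection departs no earlier than the ones before it in the list, each stop's earliest arrival is already final by the time a later connection needs it.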
More information and live demo at https://linkedconnections.org
Install it using the Node Package Manager (npm).
npm install -g gtfs2lc
If you haven’t yet picked a GTFS file you want to work with, different repositories exist. Our favorite ones:
Yet, you may also directly ask your local public transport authority for a copy.
Mind that we have not tested our code with all GTFS files yet, and there are known limitations.
You can use your favorite unzipper. E.g., unzip gtfs.zip should work fine.
This process is now run automatically, so you can skip to Step 4. But you can still use it independently using the enclosed bash script gtfs2lc-clean <path>. Next to cleaning and sorting, it also unifies newlines and removes UTF-8 artifacts.
If step 4 does not give the desired result, you might want to tweak the script manually. In order for our script to work:
- stop_times.txt must be ordered by trip_id and stop_sequence.
- calendar.txt must be ordered by service_id.
- calendar_dates.txt must be ordered by service_id.
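The ordering requirement for stop_times.txt can be checked with a small script. The following is only a sketch: it assumes a simple CSV without quoted commas, compares trip_id as a plain string and stop_sequence as a number, and the sample rows are made up. The bundled cleaning script may order rows differently in detail.

```javascript
// Made-up sample data in the stop_times.txt column layout.
const csv = `trip_id,arrival_time,departure_time,stop_id,stop_sequence
T2,08:00:00,08:01:00,S1,1
T1,07:10:00,07:11:00,S2,2
T1,07:00:00,07:01:00,S1,1
T1,07:20:00,07:21:00,S3,10`;

// Naive CSV parser: assumes no quoted fields and no embedded commas.
function parseCsv(text) {
  const [header, ...lines] = text.trim().split('\n');
  const columns = header.split(',');
  return lines.map((line) => {
    const values = line.split(',');
    return Object.fromEntries(columns.map((name, i) => [name, values[i]]));
  });
}

// Order by trip_id first, then by stop_sequence compared numerically.
function compareStopTimes(a, b) {
  if (a.trip_id !== b.trip_id) return a.trip_id < b.trip_id ? -1 : 1;
  return Number(a.stop_sequence) - Number(b.stop_sequence);
}

// True when every row is greater than or equal to its predecessor.
function isOrdered(rows) {
  return rows.every((row, i) => i === 0 || compareStopTimes(rows[i - 1], row) <= 0);
}

const rows = parseCsv(csv);
const orderedRows = [...rows].sort(compareStopTimes);
console.log(isOrdered(rows), isOrdered(orderedRows)); // false true
```

Comparing stop_sequence as a number matters here: as strings, "10" would sort before "2".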
Successfully finished the previous steps? Then you can now generate actual departure and arrival pairs (connections) as follows:
gtfs2lc /path/to/extracted/gtfs -f json
We support other formats such as csv as well.
For big GTFS files, your memory may not be sufficient. Luckily, we’ve implemented a way to use your hard disk instead of your RAM. You can enable this with an option: gtfs2lc /path/to/extracted/gtfs -f json --store LevelStore
It may also be the case that your disk has limited storage space. In that case you may want to use the --compressed option.
When you download a new GTFS file, all identifiers in there might change and conflict with your previous export. Therefore, we need to think about a way to create global identifiers for the connections, trips, routes and stops in our system. As we are publishing our data on the Web, we will also use Web addresses for these global identifiers.
See baseUris-example.json for an example of URI templates that shows what a stable identifier strategy could look like. Copy it and edit it to your liking. For a more detailed explanation of how to use the URI templates, see the description at our GTFS-RT2LC tool, which uses the same strategy.
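As a rough illustration of how a URI template turns GTFS values into a stable identifier, consider the sketch below. The template keys, variable names and the expand function are hypothetical and only handle simple {name} placeholders; the actual baseUris-example.json and the tool's template handling may differ.

```javascript
// Hypothetical templates; not copied from baseUris-example.json.
const baseUris = {
  stop: 'http://example.org/stops/{stop_id}',
  connection: 'http://example.org/connections/{date}/{departure_stop}/{trip_id}',
};

// Replace each {name} placeholder with the URL-encoded value of that name.
function expand(template, values) {
  return template.replace(/\{([^}]+)\}/g, (match, name) => {
    if (!(name in values)) throw new Error(`Missing value for ${name}`);
    return encodeURIComponent(values[name]);
  });
}

const connectionUri = expand(baseUris.connection, {
  date: '20240101',
  departure_stop: 'S1',
  trip_id: 'T 1',
});
console.log(connectionUri);
```

As long as the values come from identifiers that stay the same between GTFS exports, the resulting Web addresses stay the same as well.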
Now you can generate Linked Data in JSON-LD as follows:
gtfs2lc /path/to/extracted/gtfs -f jsonld -b baseUris.json
That’s it! Want to serve your Linked Connections over HTTP? Take a look at our work over here: The Linked Connection’s server (WIP)
In GTFS, joining and splitting trains are handled in a horrible way. See https://support.google.com/transitpartners/answer/7084064?hl=en for more details.
In Linked Connections, we can solve this gracefully by adding a nextConnection array to every connection. A splitting train then indicates 2 nextConnection items on the last connection before it is split.
On your newline-delimited JSON-LD file, you can run this script to make that work: linkedconnections-joinandsort yourconnectionsfile.nldjsonld
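The idea of the nextConnection array can be sketched as follows. This is not how linkedconnections-joinandsort is implemented: the identifiers, the continuesAs lookup and the linking function are assumptions made up for the illustration.

```javascript
// A train runs A -> B as trip T0, then splits into T1 (to C) and T2 (to D).
const tripConnections = [
  { '@id': 'c1', trip: 'T0', departureStop: 'A', arrivalStop: 'B' },
  { '@id': 'c2', trip: 'T1', departureStop: 'B', arrivalStop: 'C' },
  { '@id': 'c3', trip: 'T2', departureStop: 'B', arrivalStop: 'D' },
];

// Hypothetical lookup: which trips a trip continues as after it ends.
const continuesAs = { T0: ['T1', 'T2'] };

// Attach to every connection the ids of the connections that follow it
// on a continuing trip and depart from its arrival stop.
function linkNextConnections(list, continuations) {
  return list.map((c) => {
    const followers = continuations[c.trip] || [];
    const next = list
      .filter((n) => followers.includes(n.trip) && n.departureStop === c.arrivalStop)
      .map((n) => n['@id']);
    return { ...c, nextConnection: next };
  });
}

const linked = linkNextConnections(tripConnections, continuesAs);
console.log(linked[0].nextConnection);
```

The last connection before the split carries two nextConnection items, one for each part of the train.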
Next to the jsonld format, we’ve also implemented the “mongold” format. It can be used directly with the mongoimport command as follows:
gtfs2lc /path/to/extracted/gtfs -f mongold -b baseUris.json | mongoimport -c myconnections
Mind that only MongoDB version 2.6 and later is supported, and that this format does not currently work well together with the post-processing step of joining trips.
For more options, check gtfs2lc --help
We first convert stop_times.txt to connection rules called connections.txt.
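A minimal sketch of this first step, assuming the rows are already ordered by trip_id and stop_sequence: each pair of successive stop times within one trip becomes one connection rule. The field names of the resulting rules are assumptions for illustration and need not match the actual connections.txt layout.

```javascript
// Made-up stop times, ordered by trip_id and stop_sequence.
const stopTimes = [
  { trip_id: 'T1', stop_id: 'S1', arrival_time: '07:00:00', departure_time: '07:01:00', stop_sequence: 1 },
  { trip_id: 'T1', stop_id: 'S2', arrival_time: '07:10:00', departure_time: '07:11:00', stop_sequence: 2 },
  { trip_id: 'T1', stop_id: 'S3', arrival_time: '07:20:00', departure_time: '07:21:00', stop_sequence: 3 },
  { trip_id: 'T2', stop_id: 'S1', arrival_time: '08:00:00', departure_time: '08:01:00', stop_sequence: 1 },
  { trip_id: 'T2', stop_id: 'S3', arrival_time: '08:15:00', departure_time: '08:16:00', stop_sequence: 2 },
];

// Pair each stop time with its successor, but never across two trips.
function toConnectionRules(orderedStopTimes) {
  const rules = [];
  for (let i = 1; i < orderedStopTimes.length; i++) {
    const previous = orderedStopTimes[i - 1];
    const current = orderedStopTimes[i];
    if (previous.trip_id !== current.trip_id) continue;
    rules.push({
      trip_id: current.trip_id,
      departure_stop: previous.stop_id,
      departure_time: previous.departure_time,
      arrival_stop: current.stop_id,
      arrival_time: current.arrival_time,
    });
  }
  return rules;
}

const connectionRules = toConnectionRules(stopTimes);
console.log(connectionRules.length);
```

This also shows why the input ordering matters: with unordered rows, neighbouring lines would not be successive stops of the same trip.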
Service dates are derived from calendar_dates.txt and calendar.txt, which are processed at the same time.
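How the two files combine into a list of service dates can be sketched as below. The sketch follows the GTFS convention that calendar.txt gives a weekly pattern within a date range and that calendar_dates.txt adds (exception_type 1) or removes (exception_type 2) single dates. The sample values and function names are made up, and this is not the tool's own code.

```javascript
// Made-up calendar.txt row: Monday to Friday in the first week of 2024.
const calendar = {
  service_id: 'weekday',
  monday: '1', tuesday: '1', wednesday: '1', thursday: '1', friday: '1',
  saturday: '0', sunday: '0',
  start_date: '20240101', end_date: '20240107',
};

// Made-up calendar_dates.txt rows.
const calendarDates = [
  { service_id: 'weekday', date: '20240103', exception_type: '2' }, // removed
  { service_id: 'weekday', date: '20240106', exception_type: '1' }, // added
];

const DAY_NAMES = ['sunday', 'monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday'];
const DAY_MS = 24 * 60 * 60 * 1000;

// Parse YYYYMMDD as a UTC date to avoid local time zone shifts.
function parseDate(s) {
  return new Date(Date.UTC(Number(s.slice(0, 4)), Number(s.slice(4, 6)) - 1, Number(s.slice(6, 8))));
}

function formatDate(d) {
  return d.toISOString().slice(0, 10).replace(/-/g, '');
}

function serviceDates(cal, exceptions) {
  const dates = new Set();
  const end = parseDate(cal.end_date);
  for (let d = parseDate(cal.start_date); d <= end; d = new Date(d.getTime() + DAY_MS)) {
    if (cal[DAY_NAMES[d.getUTCDay()]] === '1') dates.add(formatDate(d));
  }
  for (const e of exceptions) {
    if (e.service_id !== cal.service_id) continue;
    if (e.exception_type === '1') dates.add(e.date);
    else if (e.exception_type === '2') dates.delete(e.date);
  }
  return [...dates].sort();
}

const days = serviceDates(calendar, calendarDates);
console.log(days);
```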
In the final step, the connection rules are expanded into connections by joining the days, service ids and connectionRules.
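The join in this final step might look roughly like the sketch below. All names and data structures are hypothetical, and the date handling is simplified: GTFS times past 24:00:00 would need to roll over to the next day, which is ignored here.

```javascript
// Hypothetical inputs: dates per service, service per trip, and one rule.
const datesByService = { weekday: ['2024-01-01', '2024-01-02'], holiday: [] };
const serviceByTrip = { T1: 'weekday' };
const rules = [
  { trip_id: 'T1', departure_stop: 'S1', departure_time: '07:01:00', arrival_stop: 'S2', arrival_time: '07:10:00' },
];

// For every rule, emit one connection per date on which its service runs.
function expandRules(ruleList, services, dates) {
  return ruleList.flatMap((rule) => {
    const runningDates = dates[services[rule.trip_id]] || [];
    return runningDates.map((date) => ({
      trip: rule.trip_id,
      departureStop: rule.departure_stop,
      departureTime: `${date}T${rule.departure_time}`,
      arrivalStop: rule.arrival_stop,
      arrivalTime: `${date}T${rule.arrival_time}`,
    }));
  });
}

const expanded = expandRules(rules, serviceByTrip, datesByService);
console.log(expanded.map((c) => c.departureTime));
```

A rule whose trip has no known service, or whose service runs on no dates, simply produces no connections.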
Post-processing steps work directly on the output stream, and can map the output stream to Linked Data. Connections2JSONLD is the main class to look at.
Another post-processing step is introduced to fix joining and splitting trips.
At this moment we've only implemented a conversion from the Stop Times to connections. However, in future work we will also implement a system for describing trips and routes, a system for transit stops and a system for transfers in Linked Data.
Furthermore, frequencies.txt is not supported at this time. We hope to support it in the future.
- Pieter Colpaert
- Julián Rojas