Skip to content

TraSGen: A Distributed, Scalable and Continuous Trajectory Stream Generator

License

Notifications You must be signed in to change notification settings

aistairc/TraSGen

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

TraSGen: A Distributed, Scalable and Continuous Realistic Trajectory Stream Generator

A brief description of essential requirements for operating TraSGen are given below:

External Dependencies

TraSGen accepts any road network directly in GeoJSON format provided the geometry types are "LineString" or "MultiLineString". The geometries can have additional properties, for instance max road segment speed, number of lanes, road type, etc. which can be used by moving object speed model to compute a moving object's next location on the network. TraSGen converts GeoJSON file into directed weighted graph using JGraphT library, which is a Java library of graph data structures and algorithms. TraSGen converts GeoJSON file to Graph only once in the beginning when the file is read, thus it does not affect the overall TraSGen performance.

The GeoJSON network of any area, city, or country is available for download for free from OpenStreetMap. Geojson.io provides an interactive tool to build custom GeoJSON network. Similarly, GeoJSONMaps provides a web API to build and download GeoJSON maps.

Configuration Parameters

To run TrasGen, the main class called "DataGen.StreamingJob" needs to be executed where upon a flink job called "TrasGen" will be instantiated that starts trajectory generation. TrasGen can be controlled by setting the parameters in the "spatialdatagen-conf.yml" file which are then read by the main class "DataGen.StreamingJob". The default values of the parameters are set in the mentioned file. A brief description for these parameters is given below.

Cluster

Properties for Apache Flink cluster

Name clusterMode
Value {True, False}
Description Set True to run TraSGen on a multi node Flink cluster. False for standalone mode.
Name parallelism
Value 1~number of parallel instances
Description The number of parallel instances in clusterMode is equal to the number of Task Slots available in the Flink cluster.

Output Parameters (output)

Output sink type and format for the generated trajectories.

Name option
Value {"kafka", "file"}
Description The type of output sink for the output of trajectory points. Select option 'kafka' for an Apache Kafka sink or file to save the data locally in a txt file.
Name outputformat
Value {"GeoJSON", "WKT"}
Description The output format for the trajectory points.
  • kafka: Define the properties of the kafka output sink.
Name outputTopicName
Value String
Description The name of the kafka topic designated as the output sink
Name bootStrapServers
Value IP address
Description If output "option" is selected as "kafka" then provide the IP address and port of kafka server here. For multi-node kafka server the IP addresses can be defined in comma delimited format. E.g "172.16.0.64:9092, 172.16.0.81:9092"
  • file: Specify the path of the output file.
Name outputDirName
Value File path
Description If output "option" is selected as "file" then provide the path of the file here.
  • data: Properties of the output trajectory stream.
Name dateFormat
Value "yyyy-MM-dd HH:mm:ss" and variations, "unix"
Description Output format for the timestamp of each trajectory data point. If "unix",timeStamps will be in Epoch/UNIX GMT time.
Name initialTimeStamp
Value Any timestamp provided in the format defined by "dateFormat",
"system"
Description Starting timestamp of the first point in the trajectories. If "system" then the current time is used for all timestamps.
Name timeStep
Value Integer
Description A discrete increment in milliseconds for each timestamp of subsequent points in a trajectory. Please note that this not the actual wait time between the generation of points but rather just an incremental value for the labelling of timeSteps for each trajectory point after the initial time step.
Name randomizeTimeInBatch
Value {True, False}
Description Set False to set the same timestamp for all parallel generated trajectory points. Set True for all parallel points generated in parallel to have a different timestamp from each other.
Name objIDRange
Value [1, N+1]
Description Total number of N trajectories to be generated by the generator
Name nRows
Value Integer
Description The combined total number of output points to be generated by all trajectories. Set to "-1" to generate unlimited number of points until all trajectories reach their end points.
Name consecutiveTrajTuplesIntervalMilliSec
Value Integer
Description The rate of data generation. Set 0 for max throughput. Any value greater than 0 will be the real-time delay in milliseconds in between generation of consecutive points in a trajectory

Query Parameters (query)

Control options for trajectory generation

  • mappedTrajectories: Control parameters for trajectory generation on network
Name mapFile
Value File path
Description Absolute file path for the GeoJSON road map file generate trajectory points. Supported trajectories are "LineString" and "MultiLineString"
Name shortestPathAlgorithm
Value { "dijkstra", "astar"}
Description The shortest path algorithm to be used to define the route of the trajectories
Name interWorkersDataSharing
Value {"none", "redis", "broadcast"}
Description The traffic congestion information exchange methodology to be used between the workers.
Name sync
Value {True, False}
Description If "interWorkersDataSharing" is set to be "broadcast then set "sync" to true to ensure the latest trajectory point is generated using the latest traffic information i.e waiting until the the road traffic update tables are fully updated. Set false to generate trajectory points without waiting for the road traffic update tables to complete their update.
Name syncPercentage
Value Double value between 0.0~100.0
Description If "interWorkersDataSharing" is set to be "broadcast and "sync" to true then select how much road traffic information should be shared among workers. 100.0 to wait until the table is updated using all of the traffic tuples generated by all trajectories. 0.0 to update the traffic table using the least amount of traffic tuples (one). The higher the "syncPercentage" the lower the trajectory generation rate or time.
Name trajStartEndSelectionApproach
Value {"random", "userdefined", "region"}
Description Define how to select start and end points of trajectories in a network. "random" uses a random generator to randomly select the start-end points from the entire road network. "userdefined" uses the start-end pairs provided by the user. "region" randomly selects start-end points from an area or areas selected by the user.
Name trajStartEndCoordinatePairs
Value

{[Pair A], [Pair B], …[Pair N]}

where [Pair n] is defined as

[start_longitude_n, start_latitude_n, 0.0, end_ longitude_n, end_ latitude_n, 0.0]

Description Define the start-end coordinate pairs manually selected by the user if “trajStartEndSelectionApproach” is set to “userdefined”. If the coordinates do not lie on the road network then the nearest point on the road network is selected.
Name trajStartPolygons
Value

{[Polygon A], [Polygon B] …[Polygon N]}

where coordinate points in [Polygon N] are defined as

[point1_longitude, point1_latitude, 0.0, point2_ longitude, point2_ latitude, 0.0…. pointX_longitude, pointX_latitude, 0.0]

Description If “trajStartEndSelectionApproach” is set to “region”. Define a set of region polygons in which the randomly selected starting points of the trajectories should lie within.
Name trajEndPolygons
Value

{[Polygon A], [Polygon B] …[Polygon N]}

where coordinate points in [Polygon N] are defined as

[point1_longitude, point1_latitude, 0.0, point2_ longitude, point2_ latitude, 0.0…. pointX_longitude, pointX_latitude, 0.0]

Description If “trajStartEndSelectionApproach” is set to “region”. Define the set of polygons in which the randomly selected ending points of the trajectories should lie within.
Name displacementMetersPerSecond
Value Double >= 1
Description The default speed in meters per second of the trajectories.

Redis Parameters (redis)

Define properties for Redis cluster. Set these values if "interWorkersDataSharing" is selected as "redis".

Name redisServerType
Value {"standalone", "cluster"}
Description Set "cluster" if you have a redis custer available otherwise use "standalone" if you have a local redis installation.
Name redisAddresses
Value IP address
Description

Provide the IP addresses of redis server in comma delimited format. For example a redis server composed of three nodes:

"redis://172.16.0.126:6379, redis://172.16.0.70:6379, redis://172.16.0.121:6379"

Valid Geometries (For Synthetic Trajecotry Generation)

Geometry Number of Points/Vertices Number of Holes Number of Geometries (for Multi Geometries only) Geometry/MultiGeometry Generation Algorithm
Point NA NA NA NA
LineString 3 ~ 360 NA NA 0 (Arc)
LineString 3 ~ 1000 NA NA 1 (Vertical)
LineString 3 ~ 1000 NA NA 2 (Horizontal)
Polygon 4 ~ 1000 0 ~ 4 NA 0 (Box)
Polygon 4 ~ 20 0 NA 1 (Arc)
Polygon 5 ~ 20 2 ~ 4 NA 1 (Arc)
MultiPoint 3 ~ 1000 NA 2,3,4,6,8,10 0 (Box)
MultiPoint 3 ~ 1000 NA 2 ~ 1000 1 (Horizontal)
MultiPoint 3 ~ 1000 NA 2 ~ 1000 2 (Vertical)
MultiLineString 3 ~ 1000 NA 2,3,4,6,8,10 0 (Box)
MultiLineString 3 ~ 1000 NA 2 ~ 1000 1 (Horizontal)
MultiLineString 3 ~ 1000 NA 2 ~ 1000 2 (Vertical)
MultiPolygon 4 ~ 1000 0 ~ 4 2,3,4,6,8,10 0 (Box)
MultiPolygon 4 ~ 20 0 2 ~ 1000 1 (Horizontal)
MultiPolygon 4 ~ 20 2 ~ 4 2 ~ 1000 2 (Vertical)

About

TraSGen: A Distributed, Scalable and Continuous Trajectory Stream Generator

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages