Querying SPADE
SPADE allows stored provenance to be queried with a command line client. Support for this can be activated in the SPADE Kernel using the controller:
-> add analyzer CommandLine
Adding analyzer CommandLine... done
SPADE supports several types of queries. They are invoked using the query client, which is started with the following command (from within the SPADE/bin directory):
spade query
The following will appear:
SPADE Query Client
->
Help for the query client can be printed using the command:
help [ all | control | constraint | graph | env ]
The SPADE storage that is to be queried can be specified using the command:
set storage <storage class name>
Alternatively, a default storage can be set in cfg/spade.client.CommandLine.config.
Currently, three types of storage can be queried: Quickstep, PostgreSQL, or Neo4j.
To query the configured Quickstep database:
set storage Quickstep
To query the configured PostgreSQL relational database:
set storage PostgreSQL
To query the configured Neo4j graph database:
set storage Neo4j
Querying can then commence using the commands described here.
Use exit to leave the query client.
Upon starting, the query client tries to execute the commands in cfg/spade.client.CommandLine.config. Upon exit, it saves the query history at that point to the same file. This allows query state (such as the current storage and environment variables) to be automatically restored in subsequent sessions.
Constraints are used to define the properties of vertices and edges that are retrieved during querying. Each constraint has the form:
<constraint_name> = [ not ] "<key>" <comparison_operator> '<value>' [ and|or [ not ] "<key>" <comparison_operator> '<value>' ]*
where <constraint_name> is a label that allows the constraint to be referenced in queries and must start with %; <key> specifies an annotation name; and <comparison_operator> defines the relationship that must hold with the specified <value>. The operators supported are ==, !=, >, <, >=, <=, and like (for pattern matching).
For example, this constraint will match graph elements that contain a pid annotation with a value of 1:
-> %init_process = "pid" == '1'
This constraint will match elements that contain the annotation name with a value that ends with fox:
-> %name_ends_with_fox = "name" like '%fox'
This constraint matches elements that have an annotation event id with a value in the numeric range that starts at 1000 and ends at 2000:
-> %events_1k_to_2k = "event id" >= '1000' and "event id" <= '2000'
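The three example constraints above can be modeled in a few lines of Python. This is only an illustrative sketch, not SPADE's implementation: the names compare and like, the dictionary used to hold annotations, and the test vertex are hypothetical. It also assumes that % in a like pattern matches any run of characters (as in SQL), that values are compared numerically when both sides parse as numbers, and that an element without the annotation never matches. The storage in use may evaluate constraints differently.

```python
import operator
import re

# Hypothetical model: the six comparison operators listed above.
OPS = {
    "==": operator.eq, "!=": operator.ne,
    ">": operator.gt, "<": operator.lt,
    ">=": operator.ge, "<=": operator.le,
}

def like(value, pattern):
    # Assumption: '%' matches any run of characters, as in SQL LIKE.
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None

def compare(annotations, key, op, value):
    """Evaluate one `"key" op 'value'` term against a dict of annotations."""
    if key not in annotations:
        return False  # assumption: a missing annotation never matches
    actual = annotations[key]
    if op == "like":
        return like(actual, value)
    try:
        # Assumption: compare as numbers when both sides are numeric.
        return OPS[op](float(actual), float(value))
    except ValueError:
        return OPS[op](actual, value)

vertex = {"name": "firefox", "pid": "1", "event id": "1500"}
print(compare(vertex, "pid", "==", "1"))          # True
print(compare(vertex, "name", "like", "%fox"))    # True
print(compare(vertex, "event id", ">=", "1000")
      and compare(vertex, "event id", "<=", "2000"))  # True
```

The numeric fallback matters for range constraints: compared as strings, '999' would sort after '1000'.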
Currently defined constraints can be viewed with:
-> list constraints
The expression that a constraint is currently bound to can be seen with:
-> dump <constraint_name>
For example, the constraint %init_process (defined above) can be seen with:
-> dump %init_process
(pid == 1)
As a convenience, certain query arguments do not need to be provided explicitly. When such an argument is omitted, its value is taken from a variable set in the environment. Currently, these variables can be set: maxDepth and limit.
All supported environment variables can be listed using the command:
list env
If the parameter maxDepth is not defined in a path or lineage query, its value is retrieved from the environment. Similarly, a limit query will use the default from the environment if an explicit value is not provided.
The environment variables can be set, unset, and printed using the commands:
-> env set maxDepth 10 # Sets the variable 'maxDepth' to value 10
OK
-> env set limit 20 # Sets the variable 'limit' to value 20
OK
-> env unset maxDepth # Removes the binding for variable 'maxDepth'
OK
-> env print limit # Prints the value of the variable 'limit'
20
-> env print maxDepth # Prints UNDEFINED if there is no current binding for the variable 'maxDepth'
UNDEFINED
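The fallback described above (an explicit argument wins, otherwise the environment supplies the value) can be sketched as follows. The function names and the dictionary are hypothetical and only mirror the env set, env unset, and env print commands. Raising an error when a value is neither given nor set is an assumption, since the behavior of the client in that case is not described here.

```python
# Hypothetical model of the query client's environment.
env = {}

def env_set(name, value):
    env[name] = value

def env_unset(name):
    env.pop(name, None)

def env_print(name):
    # Mirrors the client output shown above for an unbound variable.
    return env.get(name, "UNDEFINED")

def resolve(name, explicit=None):
    """Return the explicit query argument if given, else the environment value."""
    if explicit is not None:
        return explicit
    if name in env:
        return env[name]
    # Assumption: a missing value is treated as an error.
    raise LookupError(f"{name} is neither given in the query nor set in the environment")

env_set("maxDepth", 10)
env_set("limit", 20)
env_unset("maxDepth")
print(env_print("limit"))     # 20
print(env_print("maxDepth"))  # UNDEFINED
print(resolve("limit"))       # 20, taken from the environment
print(resolve("limit", 5))    # 5, the explicit argument wins
```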
The getVertex and getEdge functions find all the provenance vertices or edges, respectively, in a particular graph that have specific properties. The properties are framed as an expression that can be evaluated by the underlying storage(s). If a constraint is not specified, all vertices or edges in the graph will be returned.
For example, this will find vertices that have an annotation with key type and value Process:
-> %only_processes = "type" == 'Process'
-> $all_processes = $base.getVertex(%only_processes)
$base is a special variable that represents the entire graph.
To retrieve at most 10 such vertices, use:
-> $ten_processes = $all_processes.limit(10)
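As a rough mental model of the two steps above, getVertex behaves like a filter over the vertices of a graph and limit like a cap on the size of the result. The sketch below is hypothetical: the vertex identifiers, the dictionary layout, and the function names are illustrative, and which vertices limit keeps is an assumption (here, simply the first ones encountered).

```python
from itertools import islice

# Hypothetical graph: vertex id -> annotations.
vertices = {
    1: {"type": "Process", "name": "firefox"},
    2: {"type": "Artifact", "path": "/etc/passwd"},
    3: {"type": "Process", "name": "bash"},
}

def get_vertex(graph, predicate=None):
    """Return the vertices satisfying the predicate, or all of them if none is given."""
    if predicate is None:
        return dict(graph)
    return {vid: ann for vid, ann in graph.items() if predicate(ann)}

def limit(graph, count):
    """Keep at most `count` vertices (assumption: the first ones encountered)."""
    return dict(islice(graph.items(), count))

def only_processes(annotations):
    return annotations.get("type") == "Process"

all_processes = get_vertex(vertices, only_processes)
one_process = limit(all_processes, 1)
print(sorted(all_processes))  # [1, 3]
print(len(one_process))       # 1
```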
getNeighbor finds the immediate ancestors or descendants of a set of vertices in a given graph. It takes two arguments:
- a graph variable/expression that defines a set of initial vertices, and
- ancestors to find parents, or descendants to find children.
For example, to find all the processes (and threads) created by firefox in the global $base graph:
-> %firefox_constraint = "name" == 'firefox'
-> $firefox_vertices = $base.getVertex(%firefox_constraint)
-> $firefox_children = $base.getNeighbor($firefox_vertices, 'descendants')
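The one-step traversal above can be pictured with the following sketch. It assumes that each edge is stored as a (child, parent) pair, so that parents are ancestors and children are descendants. The function name, the edge list, and the choice to return only the neighboring vertex identifiers are illustrative assumptions rather than a description of what SPADE returns.

```python
def get_neighbor(edges, start, direction):
    """Return the vertices one edge away from `start` in the given direction.

    edges: iterable of (child, parent) pairs (assumed orientation).
    start: set of vertex identifiers.
    """
    if direction not in ("ancestors", "descendants"):
        raise ValueError(f"unknown direction: {direction!r}")
    found = set()
    for child, parent in edges:
        if direction == "ancestors" and child in start:
            found.add(parent)   # follow the edge toward the parent
        elif direction == "descendants" and parent in start:
            found.add(child)    # follow the edge back toward the child
    return found

# Hypothetical provenance: two children of firefox, which was started by bash.
edges = [("tab", "firefox"), ("plugin", "firefox"), ("firefox", "bash")]
print(sorted(get_neighbor(edges, {"firefox"}, "descendants")))  # ['plugin', 'tab']
print(sorted(get_neighbor(edges, {"firefox"}, "ancestors")))    # ['bash']
```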
getPath finds all paths between vertices. At minimum, it takes the following arguments:
- a graph variable/expression specifying the set of source vertices,
- a graph variable/expression specifying the set of destination vertices, and
- the maximum length of a path from the source to destination vertices.
Optionally, further pairs of another destination graph variable/expression and maximum path length can be specified.
For example, this finds all paths of length at most 7 from vertices with firefox as the value of their name key to vertices with /etc/passwd as the value of their path key:
-> %source_constraint = "name" == 'firefox'
-> $sources = $base.getVertex(%source_constraint)
-> %destination_constraint = "path" == '/etc/passwd'
-> $destinations = $base.getVertex(%destination_constraint)
-> $paths = $base.getPath($sources, $destinations, 7)
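To make the role of the maximum length concrete, here is a small sketch of bounded path search over a directed edge list. It is a model only: the function name and the sample edges are hypothetical, and it assumes that path length is counted in edges, that paths follow edge direction from source to destination, and that no vertex repeats within a path.

```python
def get_paths(edges, sources, destinations, max_length):
    """Return every simple path (list of vertices) with at most max_length edges."""
    adjacency = {}
    for tail, head in edges:
        adjacency.setdefault(tail, []).append(head)

    paths = []

    def walk(path):
        if len(path) > 1 and path[-1] in destinations:
            paths.append(list(path))
        if len(path) - 1 == max_length:
            return  # length budget used up
        for nxt in adjacency.get(path[-1], []):
            if nxt not in path:  # keep paths simple
                path.append(nxt)
                walk(path)
                path.pop()

    for source in sources:
        walk([source])
    return paths

# Hypothetical edges: firefox reaches /etc/passwd directly and via a helper.
edges = [
    ("firefox", "helper"),
    ("helper", "/etc/passwd"),
    ("firefox", "/etc/passwd"),
]
print(get_paths(edges, {"firefox"}, {"/etc/passwd"}, 1))
# [['firefox', '/etc/passwd']]
print(len(get_paths(edges, {"firefox"}, {"/etc/passwd"}, 7)))  # 2
```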
Here is another example illustrating the use of optional additional arguments:
-> $paths = $base.getPath($first, $second, 10, $third, 11, $fourth)
Above, any paths between the vertices in the $first and $fourth graphs that pass through the vertices in the $second and $third graphs are found. Specifically, this will find paths from $first to $second with maximum length 10, followed by paths from $second to $third with maximum length 11, and finally from $third to $fourth. Note that the maximum path length to $fourth was not specified, so its value is retrieved from the environment variable maxDepth.
getLineage finds the ancestors or descendants of given vertices. It takes three arguments:
- a graph variable/expression specifying the sources of the lineage,
- an optional natural number that specifies the maximum number of levels that should be retrieved (if this is not specified, the environment variable maxDepth is used), and
- (any prefix of) either ancestors, descendants, or both, which indicates the direction of traversal.
For example, this query will find 5 levels of ancestors, starting from the vertices with firefox as their name:
-> %firefox_constraint = "name" == 'firefox'
-> $initial_vertices = $base.getVertex(%firefox_constraint)
-> $lineage = $base.getLineage($initial_vertices, 5, 'ancestors')
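A depth-limited traversal of this kind can be sketched as a breadth-first search. The code below is an illustrative model, not SPADE's implementation: it assumes edges are (child, parent) pairs, that the starting vertices are included in the result, and that both means the union of the ancestor and descendant lineages. The function names and sample edges are hypothetical. Direction prefixes are resolved as described in the argument list above.

```python
DIRECTIONS = ("ancestors", "descendants", "both")

def resolve_direction(text):
    """Expand a prefix such as 'a' or 'desc' to the full direction name."""
    matches = [name for name in DIRECTIONS if text and name.startswith(text)]
    if len(matches) != 1:
        raise ValueError(f"unknown direction: {text!r}")
    return matches[0]

def traverse(edges, start, max_depth, toward_ancestors):
    """Breadth-first search that stops after max_depth levels."""
    seen, frontier = set(start), set(start)
    for _ in range(max_depth):
        step = set()
        for child, parent in edges:
            if toward_ancestors and child in frontier:
                step.add(parent)
            elif not toward_ancestors and parent in frontier:
                step.add(child)
        frontier = step - seen
        if not frontier:
            break
        seen |= frontier
    return seen

def get_lineage(edges, start, max_depth, direction):
    direction = resolve_direction(direction)
    result = set()
    if direction in ("ancestors", "both"):
        result |= traverse(edges, start, max_depth, True)
    if direction in ("descendants", "both"):
        result |= traverse(edges, start, max_depth, False)
    return result

# Hypothetical chain: tab -> firefox -> bash -> init (child -> parent).
edges = [("tab", "firefox"), ("firefox", "bash"), ("bash", "init")]
print(sorted(get_lineage(edges, {"firefox"}, 5, "ancestors")))  # ['bash', 'firefox', 'init']
print(sorted(get_lineage(edges, {"firefox"}, 1, "a")))          # ['bash', 'firefox']
```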
Each query response is displayed in the client. To export one to a file, run this command before issuing the query:
-> export > /tmp/will_write_result_of_next_command_here
A graph can also be exported in Graphviz DOT format. For example:
-> %test_constraint = "name" == 'galileo'
-> $test_vertices = $base.getVertex(%test_constraint)
-> export > /tmp/galileo.dot
-> dump all $test_vertices
Output exported to file /tmp/galileo.dot
For convenience, the SPADE query client supports loading queries from a file. This can be done using the command:
-> load /tmp/SPADE-queries
Each line in the file /tmp/SPADE-queries is treated as a complete SPADE query. If a query in the file fails, an error message is reported and the remaining queries are discarded. Comments can be added by starting a line with #.
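The line-by-line behavior described above can be sketched as a small loop. This is a hypothetical model: run_query_file and the execute callback are illustrative names, and skipping blank lines and ignoring leading whitespace before # are assumptions not stated here.

```python
def run_query_file(lines, execute):
    """Run each line as one query; stop at the first failure.

    Returns (queries that completed, error message or None).
    """
    completed = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # comment (or, by assumption, a blank line)
        try:
            execute(line)
        except Exception as error:
            # The remaining queries are discarded.
            return completed, f"line {number}: {error}"
        completed.append(line)
    return completed, None

def fake_execute(query):
    # Stand-in for the real client; rejects one deliberately bad line.
    if query == "bad query":
        raise ValueError("cannot parse")

lines = [
    "# find all processes",
    "%only_processes = \"type\" == 'Process'",
    "bad query",
    "$all_processes = $base.getVertex(%only_processes)",
]
completed, error = run_query_file(lines, fake_execute)
print(len(completed))  # 1
print(error)           # line 3: cannot parse
```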
This material is based upon work supported by the National Science Foundation under Grants OCI-0722068, IIS-1116414, and ACI-1547467. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.