Add types to the graph #82

Open
ChrisCummins opened this issue Aug 13, 2020 · 0 comments
Labels: Datasets (Graph and text datasets), Enhancement (New feature or request), Machine Learning (Anything relevant to //deeplearning/ml4pl/models)

ChrisCummins commented Aug 13, 2020

Tracking issue for a significant enhancement of the graph representation.

Current representation:
llvm2graph-4-cg

Proposal:
llvm2graph-5-types

ChrisCummins added the Datasets (Graph and text datasets), Enhancement (New feature or request), and Machine Learning (Anything relevant to //deeplearning/ml4pl/models) labels Aug 14, 2020
ChrisCummins added a commit that referenced this issue Aug 14, 2020
github.com//issues/82

Signed-off-by: format 2020.06.15 <github.com/ChrisCummins/format>
ChrisCummins added a commit that referenced this issue Aug 18, 2020
This adds a fourth node type and a fourth edge flow, both called
"type". The idea is to represent types as first-class elements in the
graph representation. This allows greater compositionality by breaking
up composite types into subcomponents, and decreases the vocabulary
size required to achieve a given coverage.

Background
----------

Currently, type information is stored in the "text" field of nodes for
constants and variables, e.g.:

    node {
      type: VARIABLE
      text: "i8"
    }

There are two issues with this:

 * Composite types end up with long textual representations,
   e.g. "struct foo { i32 a; i32 b; ... }". Since there is an
   unbounded number of possible structs, this prevents 100% vocabulary
   coverage on any IR with structs (or other composite types).

 * In the future, we will want to encode different information on data
   nodes, such as embedding literal values. Moving the type information
   out of the data node "frees up" space for something else.
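
To illustrate the first issue, here is a minimal Python sketch (not part of
the commit) that compares vocabulary coverage when each type is stored as one
whole string against coverage when composite types are decomposed into
primitive parts. The vocabulary and both token lists are made-up values chosen
for illustration only.

```python
def coverage(vocab, tokens):
    """Fraction of tokens that appear in the vocabulary."""
    return sum(t in vocab for t in tokens) / len(tokens)

# Hypothetical fixed vocabulary of primitive type tokens.
vocab = {"i8", "i32", "*", "struct", "[]"}

# Old scheme: one token per data node, holding the full type text.
monolithic = [
    "i8",
    "i32",
    "struct foo { i32 a; i32 b; }",
    "struct bar { i8 x; foo* y; }",
]

# New scheme: each composite type is split into primitive type nodes.
decomposed = ["i8", "i32", "struct", "i32", "i32", "struct", "i8", "*"]

monolithic_coverage = coverage(vocab, monolithic)
decomposed_coverage = coverage(vocab, decomposed)
```

Every new struct definition adds an unseen string under the old scheme, while
the decomposed tokens stay inside the same small vocabulary.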

Overview
--------

This changes the representation to represent types as first-class
elements in the graph. A "type" node represents a type using its
"text" field, and a new "type" edge connects this type to variables or
constants of that type, e.g.:

    node {
      type: VARIABLE
      text: "var"
    }
    node {
      type: TYPE
      text: "i8"
    }
    edge {
      flow: TYPE
      source: 1
    }
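
The edge above lists only a source. Assuming these snippets follow protocol
buffer text format conventions, where fields left at their default value are
omitted, the missing target reads as 0, i.e. the VARIABLE node. A small Python
sketch of that reading, using a hypothetical `edge` helper:

```python
def edge(flow, source=0, target=0, position=0):
    """Build an edge record; omitted fields fall back to 0 (an assumption)."""
    return {"flow": flow, "source": source, "target": target, "position": position}

# Equivalent of: edge { flow: TYPE  source: 1 }
type_edge = edge("TYPE", source=1)
```

Under this reading the edge runs from node 1 (the type) to node 0 (the
variable) at position 0.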

Composite types
---------------

Types may be composed by connecting multiple type nodes using type
edges. This allows you to break down complex types into a graph of
primitive parts. The meaning of composite types depends on the IR
being targeted; the remainder describes the process for LLVM-IR.

Pointer types
-------------

A pointer is a composite of two types:

    [pointer] <- [pointed-type]

For example:

    int32_t* instance;

Would be represented as:

    node {
      type: TYPE
      text: "i32"
    }
    node {
      type: TYPE
      text: "*"
    }
    node {
      type: VARIABLE
      text: "var"
    }
    edge {
      flow: TYPE
      target: 1
    }
    edge {
      flow: TYPE
      source: 1
      target: 2
    }

Where variables/constants of this type receive an incoming type edge
from the [pointer] node, which in turn receives an incoming type edge
from the [pointed-type] node.

One [pointer] node is generated for each unique pointer type. If a
graph contains multiple pointer types, there will be multiple
[pointer] nodes.

Struct types
------------

A struct is a composite type where each member is a type node which
points to the parent node. Variable/constant instances of a struct
receive an incoming type edge from the root struct node. Note that
the graph of type nodes representing a composite struct type may be
cyclical, since a struct can contain a pointer of the same type (think
of a binary tree implementation).

The type edges from member nodes to the parent struct are
positional. The position indicates the element number. E.g. for a
struct with three elements, the incoming type edges to the struct node
will have positions 0, 1, and 2.

This example struct:

    struct s {
      int8_t a;
      int8_t b;
      struct s* c;
    }

    struct s instance;

Would be represented as:

    node {
      type: TYPE
      text: "struct"
    }
    node {
      type: TYPE
      text: "i8"
    }
    node {
      type: TYPE
      text: "*"
    }
    node {
      type: VARIABLE
      text: "var"
    }
    edge {
      flow: TYPE
      source: 1
    }
    edge {
      flow: TYPE
      source: 1
      position: 1
    }
    edge {
      flow: TYPE
      source: 2
      position: 2
    }
    edge {
      flow: TYPE
      target: 2
    }
    edge {
      flow: TYPE
      target: 3
    }

Array Types
-----------

An array is a composite type [array] <- [element-type]. For example,
the array:

    int a[10];

Would be represented as:

    node {
      type: TYPE
      text: "i32"
    }
    node {
      type: TYPE
      text: "[]"
    }
    node {
      type: VARIABLE
      text: "var"
    }
    edge {
      flow: TYPE
      target: 1
    }
    edge {
      flow: TYPE
      source: 1
      target: 2
    }

github.com//issues/82

Signed-off-by: format 2020.06.15 <github.com/ChrisCummins/format>
ChrisCummins added a commit that referenced this issue Aug 20, 2020
github.com//issues/82

Signed-off-by: format 2020.06.15 <github.com/ChrisCummins/format>
ChrisCummins added a commit that referenced this issue Aug 21, 2020
This updates the llvm2graph plots to show a fifth "type graph" stage,
and updates the README to describe how types are added to the graph.

github.com//issues/82
ChrisCummins added a commit that referenced this issue Aug 21, 2020
This adds a fourth node type and a fourth edge flow, both called
"type". The idea is to represent types as first-class elements in the
graph representation. This allows greater compositionality by breaking
up composite types into subcomponents, and decreases the vocabulary
size required to achieve a given coverage.

Background
----------

Currently, type information is stored in the "text" field of nodes for
constants and variables, e.g.:

    node {
      type: VARIABLE
      text: "i8"
    }

There are two issues with this:

 * Composite types end up with long textual representations,
   e.g. "struct foo { i32 a; i32 b; ... }". Since there is an
   unbounded number of possible structs, this prevents 100% vocabulary
   coverage on any IR with structs (or other composite types).

 * In the future, we will want to encode different information on data
   nodes, such as embedding literal values. Moving the type information
   out of the data node "frees up" space for something else.

Overview
--------

This changes the representation to represent types as first-class
elements in the graph. A "type" node represents a type using its
"text" field, and a new "type" edge connects this type to variables or
constants of that type, e.g. a variable "int x" could be represented as:

    node {
      type: VARIABLE
      text: "var"
    }
    node {
      type: TYPE
      text: "i32"
    }
    edge {
      flow: TYPE
      source: 1
    }

Composite types
---------------

Types may be composed by connecting multiple type nodes using type
edges. This allows you to break down complex types into a graph of
primitive parts. The meaning of composite types depends on the IR
being targeted; the remainder describes the process for LLVM-IR.

Pointer types
-------------

A pointer is a composite of two types:

    [variable] <- [pointer] <- [pointed-type]

For example:

    int32_t* instance;

Would be represented as:

    node {
      type: TYPE
      text: "i32"
    }
    node {
      type: TYPE
      text: "*"
    }
    node {
      type: VARIABLE
      text: "var"
    }
    edge {
      flow: TYPE
      target: 1
    }
    edge {
      flow: TYPE
      source: 1
      target: 2
    }

Where variables/constants of this type receive an incoming type edge
from the [pointer] node, which in turn receives an incoming type edge
from the [pointed-type] node.

One [pointer] node is generated for each unique pointer type. If a
graph contains multiple pointer types, there will be multiple
[pointer] nodes, one for each pointed type.

Struct types
------------

A struct is a composite type where each member is a type node which
points to the parent node. Variable/constant instances of a struct
receive an incoming type edge from the root struct node. Note that
the graph of type nodes representing a composite struct type may be
cyclical, since a struct can contain a pointer of the same type (think
of a binary tree implementation).

The type edges from member nodes to the parent struct are
positional. The position indicates the element number. E.g. for a
struct with three elements, the incoming type edges to the struct node
will have positions 0, 1, and 2.

This example struct:

    struct s {
      int8_t a;
      int8_t b;
      struct s* c;
    }

    struct s instance;

Would be represented as:

    node {
      type: TYPE
      text: "struct"
    }
    node {
      type: TYPE
      text: "i8"
    }
    node {
      type: TYPE
      text: "*"
    }
    node {
      type: VARIABLE
      text: "var"
    }
    edge {
      flow: TYPE
      source: 1
    }
    edge {
      flow: TYPE
      source: 1
      position: 1
    }
    edge {
      flow: TYPE
      source: 2
      position: 2
    }
    edge {
      flow: TYPE
      target: 2
    }
    edge {
      flow: TYPE
      target: 3
    }

Array Types
-----------

An array is a composite type [variable] <- [array] <- [element-type].
For example, the array:

    int a[10];

Would be represented as:

    node {
      type: TYPE
      text: "i32"
    }
    node {
      type: TYPE
      text: "[]"
    }
    node {
      type: VARIABLE
      text: "var"
    }
    edge {
      flow: TYPE
      target: 1
    }
    edge {
      flow: TYPE
      source: 1
      target: 2
    }

github.com//issues/82
ChrisCummins added a commit that referenced this issue Aug 21, 2020
This adds a fourth node type and a fourth edge flow, both called
"type". The idea is to represent types as first-class elements in the
graph representation. This allows greater compositionality by breaking
up composite types into subcomponents, and decreases the vocabulary
size required to achieve a given coverage.

Background
----------

Currently, type information is stored in the "text" field of nodes for
constants and variables, e.g.:

    node {
      type: VARIABLE
      text: "i8"
    }

There are two issues with this:

 * Composite types end up with long textual representations,
   e.g. "struct foo { i32 a; i32 b; ... }". Since there is an
   unbounded number of possible structs, this prevents 100% vocabulary
   coverage on any IR with structs (or other composite types).

 * In the future, we will want to encode different information on data
   nodes, such as embedding literal values. Moving the type information
   out of the data node "frees up" space for something else.

Overview
--------

This changes the representation to represent types as first-class
elements in the graph. A "type" node represents a type using its
"text" field, and a new "type" edge connects this type to variables or
constants of that type, e.g. a variable "int x" could be represented as:

    node {
      type: VARIABLE
      text: "var"
    }
    node {
      type: TYPE
      text: "i32"
    }
    edge {
      flow: TYPE
      source: 1
    }

Composite types
---------------

Types may be composed by connecting multiple type nodes using type
edges. This allows you to break down complex types into a graph of
primitive parts. The meaning of composite types depends on the IR
being targeted; the remainder describes the process for LLVM-IR.

Pointer types
-------------

A pointer is a composite of two types:

    [variable] <- [pointer] <- [pointed-type]

For example:

    int32_t* instance;

Would be represented as:

    node {
      type: TYPE
      text: "i32"
    }
    node {
      type: TYPE
      text: "*"
    }
    node {
      type: VARIABLE
      text: "var"
    }
    edge {
      flow: TYPE
      target: 1
    }
    edge {
      flow: TYPE
      source: 1
      target: 2
    }

Where variables/constants of this type receive an incoming type edge
from the [pointer] node, which in turn receives an incoming type edge
from the [pointed-type] node.

One [pointer] node is generated for each unique pointer type. If a
graph contains multiple pointer types, there will be multiple
[pointer] nodes, one for each pointed type.

Struct types
------------

A struct is a composite type where each member is a type node which
points to the parent node. Variable/constant instances of a struct
receive an incoming type edge from the root struct node. Note that
the graph of type nodes representing a composite struct type may be
cyclical, since a struct can contain a pointer of the same type (think
of a binary tree implementation). For all other member types, a new
type node is produced. For example, a struct with two integer members
will produce two integer type nodes; they are not shared.

The type edges from member nodes to the parent struct are
positional. The position indicates the element number. E.g. for a
struct with three elements, the incoming type edges to the struct node
will have positions 0, 1, and 2.

This example struct:

    struct s {
      int8_t a;
      int8_t b;
      struct s* c;
    }

    struct s instance;

Would be represented as:

    node {
      type: TYPE
      text: "struct"
    }
    node {
      type: TYPE
      text: "i8"
    }
    node {
      type: TYPE
      text: "i8"
    }
    node {
      type: TYPE
      text: "*"
    }
    node {
      type: VARIABLE
      text: "var"
    }
    edge {
      flow: TYPE
      source: 1
    }
    edge {
      flow: TYPE
      source: 2
      position: 1
    }
    edge {
      flow: TYPE
      source: 3
      position: 2
    }
    edge {
      flow: TYPE
      target: 3
    }
    edge {
      flow: TYPE
      target: 4
    }

Array Types
-----------

An array is a composite type [variable] <- [array] <- [element-type].
For example, the array:

    int a[10];

Would be represented as:

    node {
      type: TYPE
      text: "i32"
    }
    node {
      type: TYPE
      text: "[]"
    }
    node {
      type: VARIABLE
      text: "var"
    }
    edge {
      flow: TYPE
      target: 1
    }
    edge {
      flow: TYPE
      source: 1
      target: 2
    }

github.com//issues/82
ChrisCummins added a commit that referenced this issue Aug 22, 2020
This adds a fourth node type and a fourth edge flow, both called
"type". The idea is to represent types as first-class elements in the
graph representation. This allows greater compositionality by breaking
up composite types into subcomponents, and decreases the vocabulary
size required to achieve a given coverage.

Background
----------

Currently, type information is stored in the "text" field of nodes for
constants and variables, e.g.:

    node {
      type: VARIABLE
      text: "i8"
    }

There are two issues with this:

 * Composite types end up with long textual representations,
   e.g. "struct foo { i32 a; i32 b; ... }". Since there is an
   unbounded number of possible structs, this prevents 100% vocabulary
   coverage on any IR with structs (or other composite types).

 * In the future, we will want to encode different information on data
   nodes, such as embedding literal values. Moving the type information
   out of the data node "frees up" space for something else.

Overview
--------

This changes the representation to represent types as first-class
elements in the graph. A "type" node represents a type using its
"text" field, and a new "type" edge connects this type to variables or
constants of that type, e.g. a variable "int x" could be represented as:

    node {
      type: VARIABLE
      text: "var"
    }
    node {
      type: TYPE
      text: "i32"
    }
    edge {
      flow: TYPE
      source: 1
    }

Composite types
---------------

Types may be composed by connecting multiple type nodes using type
edges. This allows you to break down complex types into a graph of
primitive parts. The meaning of composite types depends on the IR
being targeted; the remainder describes the process for LLVM-IR.

Pointer types
-------------

A pointer is a composite of two types:

    [variable] <- [pointer] <- [pointed-type]

For example:

    int32_t* instance;

Would be represented as:

    node {
      type: TYPE
      text: "i32"
    }
    node {
      type: TYPE
      text: "*"
    }
    node {
      type: VARIABLE
      text: "var"
    }
    edge {
      flow: TYPE
      target: 1
    }
    edge {
      flow: TYPE
      source: 1
      target: 2
    }

Where variables/constants of this type receive an incoming type edge
from the [pointer] node, which in turn receives an incoming type edge
from the [pointed-type] node.

One [pointer] node is generated for each unique pointer type. If a
graph contains multiple pointer types, there will be multiple
[pointer] nodes, one for each pointed type.

Struct types
------------

A struct is a composite type where each member is a type node which
points to the parent node. Variable/constant instances of a struct
receive an incoming type edge from the root struct node. Note that
the graph of type nodes representing a composite struct type may be
cyclical, since a struct can contain a pointer of the same type (think
of a binary tree implementation). For all other member types, a new
type node is produced. For example, a struct with two integer members
will produce two integer type nodes; they are not shared.

The type edges from member nodes to the parent struct are
positional. The position indicates the element number. E.g. for a
struct with three elements, the incoming type edges to the struct node
will have positions 0, 1, and 2.

This example struct:

    struct s {
      int8_t a;
      int8_t b;
      struct s* c;
    }

    struct s instance;

Would be represented as:

    node {
      type: TYPE
      text: "struct"
    }
    node {
      type: TYPE
      text: "i8"
    }
    node {
      type: TYPE
      text: "i8"
    }
    node {
      type: TYPE
      text: "*"
    }
    node {
      type: VARIABLE
      text: "var"
    }
    edge {
      flow: TYPE
      source: 1
    }
    edge {
      flow: TYPE
      source: 2
      position: 1
    }
    edge {
      flow: TYPE
      source: 3
      position: 2
    }
    edge {
      flow: TYPE
      target: 3
    }
    edge {
      flow: TYPE
      target: 4
    }

Array Types
-----------

An array is a composite type [variable] <- [array] <- [element-type].
For example, the array:

    int a[10];

Would be represented as:

    node {
      type: TYPE
      text: "i32"
    }
    node {
      type: TYPE
      text: "[]"
    }
    node {
      type: VARIABLE
      text: "var"
    }
    edge {
      flow: TYPE
      target: 1
    }
    edge {
      flow: TYPE
      source: 1
      target: 2
    }

Function Pointers
-----------------

A function pointer is represented by a type node that uniquely identifies the
*signature* of a function, i.e. its return type and parameter types. The caveat
of this is that pointers to different functions which have the same signature
will resolve to the same `fn` node. Additionally, there is no edge connecting
a function pointer type and the instructions which belong to this function.

github.com//issues/82
ChrisCummins added a commit that referenced this issue Aug 24, 2020
This adds a fourth node type and a fourth edge flow, both called
"type". The idea is to represent types as first-class elements in the
graph representation. This allows greater compositionality by breaking
up composite types into subcomponents, and decreases the vocabulary
size required to achieve a given coverage.

Background
----------

Currently, type information is stored in the "text" field of nodes for
constants and variables, e.g.:

    node {
      type: VARIABLE
      text: "i8"
    }

There are two issues with this:

 * Composite types end up with long textual representations,
   e.g. "struct foo { i32 a; i32 b; ... }". Since there is an
   unbounded number of possible structs, this prevents 100% vocabulary
   coverage on any IR with structs (or other composite types).

 * In the future, we will want to encode different information on data
   nodes, such as embedding literal values. Moving the type information
   out of the data node "frees up" space for something else.

Overview
--------

This changes the representation to represent types as first-class
elements in the graph. A "type" node represents a type using its
"text" field, and a new "type" edge connects this type to variables or
constants of that type, e.g. a variable "int x" could be represented as:

    node {
      type: VARIABLE
      text: "var"
    }
    node {
      type: TYPE
      text: "i32"
    }
    edge {
      flow: TYPE
      source: 1
    }
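
As a rough illustration of the representation described above, the following
Python sketch builds the same two-node graph in memory. The `Node`, `Edge`,
and `Graph` classes and their method names are hypothetical stand-ins, not the
project's actual API.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    type: str   # e.g. "VARIABLE" or "TYPE"
    text: str

@dataclass
class Edge:
    flow: str
    source: int = 0
    target: int = 0
    position: int = 0

@dataclass
class Graph:
    nodes: List[Node] = field(default_factory=list)
    edges: List[Edge] = field(default_factory=list)

    def add_node(self, type: str, text: str) -> int:
        """Append a node and return its index."""
        self.nodes.append(Node(type, text))
        return len(self.nodes) - 1

    def add_type_edge(self, source: int, target: int, position: int = 0) -> None:
        """Connect a type node to whatever it describes."""
        self.edges.append(Edge("TYPE", source, target, position))

# "int x": the variable no longer carries its type in its text field.
g = Graph()
var = g.add_node("VARIABLE", "var")
i32 = g.add_node("TYPE", "i32")
g.add_type_edge(source=i32, target=var)
```

The variable's text is now a generic "var", and the type is recovered by
following the incoming type edge.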

Composite types
---------------

Types may be composed by connecting multiple type nodes using type
edges. This allows you to break down complex types into a graph of
primitive parts. The meaning of composite types depends on the IR
being targeted; the remainder describes the process for LLVM-IR.

Pointer types
-------------

A pointer is a composite of two types:

    [variable] <- [pointer] <- [pointed-type]

For example:

    int32_t* instance;

Would be represented as:

    node {
      type: TYPE
      text: "i32"
    }
    node {
      type: TYPE
      text: "*"
    }
    node {
      type: VARIABLE
      text: "var"
    }
    edge {
      flow: TYPE
      target: 1
    }
    edge {
      flow: TYPE
      source: 1
      target: 2
    }

Where variables/constants of this type receive an incoming type edge
from the [pointer] node, which in turn receives an incoming type edge
from the [pointed-type] node.

One [pointer] node is generated for each unique pointer type. If a
graph contains multiple pointer types, there will be multiple
[pointer] nodes, one for each pointed type.
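
One way to obtain a single [pointer] node per pointed type is to cache pointer
nodes by the index of the node they point to. The sketch below is only an
illustration of that idea; `add_node`, `pointer_to`, and the tuple layout are
invented for this example.

```python
nodes = []        # (node_type, text) tuples
edges = []        # (source, target) type edges
pointer_for = {}  # pointed-type node index -> [pointer] node index

def add_node(node_type, text):
    """Append a node and return its index."""
    nodes.append((node_type, text))
    return len(nodes) - 1

def pointer_to(pointee):
    """Return the [pointer] node for `pointee`, creating it on first use."""
    if pointee not in pointer_for:
        ptr = add_node("TYPE", "*")
        edges.append((pointee, ptr))  # [pointer] <- [pointed-type]
        pointer_for[pointee] = ptr
    return pointer_for[pointee]

i32 = add_node("TYPE", "i32")
i8 = add_node("TYPE", "i8")
p1 = pointer_to(i32)  # creates the i32* node
p2 = pointer_to(i32)  # reuses it
p3 = pointer_to(i8)   # a different pointed type gets its own node
pointer_count = sum(1 for n in nodes if n == ("TYPE", "*"))
```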

Struct types
------------

A struct is a composite type where each member is a type node which
points to the parent node. Variable/constant instances of a struct
receive an incoming type edge from the root struct node. Note that
the graph of type nodes representing a composite struct type may be
cyclical, since a struct can contain a pointer of the same type (think
of a binary tree implementation). For all other member types, a new
type node is produced. For example, a struct with two integer members
will produce two integer type nodes; they are not shared.

The type edges from member nodes to the parent struct are
positional. The position indicates the element number. E.g. for a
struct with three elements, the incoming type edges to the struct node
will have positions 0, 1, and 2.

This example struct:

    struct s {
      int8_t a;
      int8_t b;
      struct s* c;
    }

    struct s instance;

Would be represented as:

    node {
      type: TYPE
      text: "struct"
    }
    node {
      type: TYPE
      text: "i8"
    }
    node {
      type: TYPE
      text: "i8"
    }
    node {
      type: TYPE
      text: "*"
    }
    node {
      type: VARIABLE
      text: "var"
    }
    edge {
      flow: TYPE
      source: 1
    }
    edge {
      flow: TYPE
      source: 2
      position: 1
    }
    edge {
      flow: TYPE
      source: 3
      position: 2
    }
    edge {
      flow: TYPE
      target: 3
    }
    edge {
      flow: TYPE
      target: 4
    }
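
The positional member edges and the cycle through the self-referential pointer
can be checked with a short Python sketch. The edge list below is written from
the prose description (member nodes point to the struct, the struct points to
its pointer and to the variable); the tuple layout and helper names are
assumptions made for this example.

```python
# Node indices follow the node list in the example above.
STRUCT, I8_A, I8_B, PTR, VAR = range(5)

# (source, target, position) type edges.
edges = [
    (I8_A, STRUCT, 0),  # member a
    (I8_B, STRUCT, 1),  # member b
    (PTR, STRUCT, 2),   # member c
    (STRUCT, PTR, 0),   # [pointer] <- [pointed-type]
    (STRUCT, VAR, 0),   # the variable's type
]

def members(struct):
    """Member type nodes of `struct`, ordered by edge position."""
    incoming = [(pos, src) for src, dst, pos in edges if dst == struct]
    return [src for pos, src in sorted(incoming)]

def has_cycle(start):
    """True if following type edges from `start` leads back to `start`."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        for src, dst, _ in edges:
            if src != node:
                continue
            if dst == start:
                return True
            if dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return False

struct_members = members(STRUCT)
struct_is_cyclic = has_cycle(STRUCT)
```

Any traversal of type nodes therefore needs a visited set, as `has_cycle` uses
here, to avoid looping on recursive structs.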

Array Types
-----------

An array is a composite type [variable] <- [array] <- [element-type].
For example, the array:

    int a[10];

Would be represented as:

    node {
      type: TYPE
      text: "i32"
    }
    node {
      type: TYPE
      text: "[]"
    }
    node {
      type: VARIABLE
      text: "var"
    }
    edge {
      flow: TYPE
      target: 1
    }
    edge {
      flow: TYPE
      source: 1
      target: 2
    }
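
Reading a variable's type back out of the graph amounts to walking incoming
type edges. The sketch below does this for the array example; `describe` is a
hypothetical helper that assumes a simple, acyclic chain with one incoming
type edge per node.

```python
nodes = ["i32", "[]", "var"]  # text of nodes 0, 1, 2 in the example
edges = [(0, 1), (1, 2)]      # (source, target) type edges

def describe(node):
    """List the type nodes feeding `node`, outermost first."""
    parts = []
    current = node
    while True:
        sources = [s for s, t in edges if t == current]
        if not sources:
            break
        current = sources[0]
        parts.append(nodes[current])
    return parts

var_type = describe(2)
```

Note that the snippet above, like the example it mirrors, carries no array
length: the `10` in `int a[10]` does not appear in any node text shown.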

Function Pointers
-----------------

A function pointer is represented by a type node that uniquely identifies the
*signature* of a function, i.e. its return type and parameter types. The caveat
of this is that pointers to different functions which have the same signature
will resolve to the same type node. Additionally, there is no edge connecting a
function pointer type and the instructions which belong to this function.
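
The caveat can be seen in a small Python sketch that keys function pointer
type nodes on the signature alone. The `fn_type_node` helper and the "fn" node
text are placeholders chosen for this illustration, not the project's code.

```python
nodes = []     # (node_type, text) tuples
fn_nodes = {}  # (return type, parameter types) -> node index

def fn_type_node(return_type, param_types):
    """Return the type node for a function signature, creating it if needed."""
    key = (return_type, tuple(param_types))
    if key not in fn_nodes:
        nodes.append(("TYPE", "fn"))
        fn_nodes[key] = len(nodes) - 1
    return fn_nodes[key]

# Pointers to two different functions with the signature i32 (i32, i32).
add_ptr = fn_type_node("i32", ["i32", "i32"])
sub_ptr = fn_type_node("i32", ["i32", "i32"])
# A different signature gets its own node.
neg_ptr = fn_type_node("i32", ["i32"])
```

Because only the signature is used as the key, nothing in the type graph
distinguishes `add_ptr` from `sub_ptr`.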

github.com//issues/82
ChrisCummins added this to the 2.0.0 milestone Aug 30, 2020
ChrisCummins added a commit that referenced this issue Aug 30, 2020
This updates the llvm2graph plots to show a fifth "type graph" stage,
and updates the README to describe how types are added to the graph.

github.com//issues/82
ChrisCummins added a commit that referenced this issue Aug 30, 2020
This adds a fourth node type and a fourth edge flow, both called
"type". The idea is to represent types as first-class elements in the
graph representation. This allows greater compositionality by breaking
up composite types into subcomponents, and decreases the vocabulary
size required to achieve a given coverage.

Background
----------

Currently, type information is stored in the "text" field of nodes for
constants and variables, e.g.:

    node {
      type: VARIABLE
      text: "i8"
    }

There are two issues with this:

 * Composite types end up with long textual representations,
   e.g. "struct foo { i32 a; i32 b; ... }". Since there is an
   unbounded number of possible structs, this prevents 100% vocabulary
   coverage on any IR with structs (or other composite types).

 * In the future, we will want to encode different information on data
   nodes, such as embedding literal values. Moving the type information
   out of the data node "frees up" space for something else.

Overview
--------

This changes the representation to represent types as first-class
elements in the graph. A "type" node represents a type using its
"text" field, and a new "type" edge connects this type to variables or
constants of that type, e.g. a variable "int x" could be represented as:

    node {
      type: VARIABLE
      text: "var"
    }
    node {
      type: TYPE
      text: "i32"
    }
    edge {
      flow: TYPE
      source: 1
    }

Composite types
---------------

Types may be composed by connecting multiple type nodes using type
edges. This allows you to break down complex types into a graph of
primitive parts. The meaning of composite types depends on the IR
being targeted; the remainder describes the process for LLVM-IR.

Pointer types
-------------

A pointer is a composite of two types:

    [variable] <- [pointer] <- [pointed-type]

For example:

    int32_t* instance;

Would be represented as:

    node {
      type: TYPE
      text: "i32"
    }
    node {
      type: TYPE
      text: "*"
    }
    node {
      type: VARIABLE
      text: "var"
    }
    edge {
      flow: TYPE
      target: 1
    }
    edge {
      flow: TYPE
      source: 1
      target: 2
    }

Variables and constants of this type receive an incoming type edge
from the [pointer] node, which in turn receives an incoming type edge
from the [pointed-type] node.

One [pointer] node is generated for each unique pointer type. If a
graph contains multiple pointer types, there will be multiple
[pointer] nodes, one for each pointed type.
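
One way to get exactly one [pointer] node per pointed type is to cache
pointer nodes by the index of the node they point to. The sketch below
assumes that approach; `add_type`, `pointer_to` and the `pointer_for`
cache are hypothetical names, and the actual graph builder may do this
differently.

```python
nodes = []        # (node_type, text) tuples; list index is the node id
edges = []        # (source, target) type edges
pointer_for = {}  # pointed-type node id -> pointer node id


def add_type(text):
    """Append a type node and return its id."""
    nodes.append(("TYPE", text))
    return len(nodes) - 1


def pointer_to(pointee):
    """Return the pointer node for `pointee`, creating it on first use."""
    if pointee not in pointer_for:
        ptr = add_type("*")
        edges.append((pointee, ptr))  # [pointer] <- [pointed-type]
        pointer_for[pointee] = ptr
    return pointer_for[pointee]


i32 = add_type("i32")
i8 = add_type("i8")
p1 = pointer_to(i32)
p2 = pointer_to(i32)  # reuses the node created for p1
p3 = pointer_to(i8)   # a different pointed type gets its own node
```

Both pointer nodes carry the same "*" text, so they share a vocabulary
entry and are distinguished only by their incoming type edge.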

Struct types
------------

A struct is a composite type where each member is a type node which
points to the parent struct node. Variable/constant instances of a
struct receive an incoming type edge from the root struct node. Note
that the graph of type nodes representing a composite struct type may
be cyclical, since a struct can contain a pointer of the same type
(think of a binary tree implementation). For all other member types, a
new type node is produced. For example, a struct with two integer
members will produce two integer type nodes; they are not shared.

The type edges from member nodes to the parent struct are
positional. The position indicates the element number. E.g. for a
struct with three elements, the incoming type edges to the struct node
will have positions 0, 1, and 2.
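
The positional edges and the possible cycle can be sketched as follows,
for a struct with two i8 members and a pointer back to the struct
itself. Edge direction follows the prose above (member -> struct); the
helper names and the tuple encoding of edges are assumptions for
illustration only.

```python
nodes = []  # node texts; list index is the node id
edges = []  # (source, target, position) type edges


def add_type(text):
    """Append a type node and return its id."""
    nodes.append(text)
    return len(nodes) - 1


def add_type_edge(source, target, position=0):
    edges.append((source, target, position))


struct = add_type("struct")
a = add_type("i8")
b = add_type("i8")  # a second i8 node; member nodes are not shared
c = add_type("*")   # pointer back to the struct itself

# The pointer receives its pointed type from the struct node.
add_type_edge(struct, c)

# Each member points at the parent struct; position is the element number.
for position, member in enumerate([a, b, c]):
    add_type_edge(member, struct, position)

incoming_positions = sorted(p for (_, tgt, p) in edges if tgt == struct)
has_cycle = (struct, c, 0) in edges and (c, struct, 2) in edges
```

The struct -> pointer -> struct pair of edges is the cycle mentioned
above, so any traversal of type edges needs to track visited nodes.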

This example struct:

    struct s {
      int8_t a;
      int8_t b;
      struct s* c;
    }

    struct s instance;

Would be represented as:

    node {
      type: TYPE
      text: "struct"
    }
    node {
      type: TYPE
      text: "i8"
    }
    node {
      type: TYPE
      text: "i8"
    }
    node {
      type: TYPE
      text: "*"
    }
    node {
      type: VARIABLE
      text: "var"
    }
    edge {
      flow: TYPE
      source: 1
    }
    edge {
      flow: TYPE
      source: 2
      position: 1
    }
    edge {
      flow: TYPE
      source: 3
      position: 2
    }
    edge {
      flow: TYPE
      target: 3
    }
    edge {
      flow: TYPE
      target: 4
    }

Array types
-----------

An array is a composite type [variable] <- [array] <- [element-type].
For example, the array:

    int a[10];

Would be represented as:

    node {
      type: TYPE
      text: "i32"
    }
    node {
      type: TYPE
      text: "[]"
    }
    node {
      type: VARIABLE
      text: "var"
    }
    edge {
      flow: TYPE
      target: 1
    }
    edge {
      flow: TYPE
      source: 1
      target: 2
    }

Function pointers
-----------------

A function pointer is represented by a type node that uniquely identifies the
*signature* of a function, i.e. its return type and parameter types. One caveat
is that pointers to different functions which have the same signature resolve
to the same type node. Additionally, there is no edge connecting a function
pointer type to the instructions of the function it points to.
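
The signature-sharing caveat can be illustrated by keying type nodes on
the (return type, parameter types) pair. This is a sketch under that
assumption: `function_pointer_type` and the text it assigns to the node
are hypothetical and do not reflect how the real builder names these
nodes.

```python
nodes = []            # node texts; list index is the node id
signature_nodes = {}  # (return_type, param_types) -> node id


def function_pointer_type(return_type, param_types):
    """Return the type node for a function signature, creating it once."""
    key = (return_type, tuple(param_types))
    if key not in signature_nodes:
        # The node text here is illustrative only.
        nodes.append(f"{return_type} ({', '.join(param_types)})*")
        signature_nodes[key] = len(nodes) - 1
    return signature_nodes[key]


# Pointers to two different functions with the same signature ...
add_ptr = function_pointer_type("i32", ["i32", "i32"])
sub_ptr = function_pointer_type("i32", ["i32", "i32"])
# ... and a pointer to a function with a different signature.
neg_ptr = function_pointer_type("i32", ["i32"])
```

Since the key holds only the signature, `add_ptr` and `sub_ptr` are the
same node, and nothing in the type graph records which function each
pointer actually refers to.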

github.com//issues/82
ChrisCummins added a commit that referenced this issue Jul 15, 2022
github.com//issues/82
ChrisCummins added a commit that referenced this issue Jul 15, 2022
github.com//issues/82
ChrisCummins added a commit that referenced this issue Jul 16, 2022
github.com//issues/82
ChrisCummins added a commit that referenced this issue Jul 16, 2022
github.com//issues/82
ChrisCummins added a commit that referenced this issue Jul 16, 2022
github.com//issues/82