Object Graph Relation Encoding Network

(OGRE-net) - version 2

Consider a domestic robot being asked to pick up ``the cup nearest to the plate''. Natural language is an intuitive way for humans to interact with robots. However, enabling robots to comprehend natural language spatial instructions such as this is challenging because objects in the scene must be identified by a combination of their type, cup, and their relation to other objects, 'nearest to the plate'.

Specifically, we define a task involving an image of a table setting and a phrase that contains a spatial relationship, such as ``the cup nearest to the plate''. The system should respond with a reference to the selected object in a object model built from the image.

This problem is a complex multi-modal task. It requires a Neural Network model to ground relational language jointly in text and image data. There are no cues such as texture and colour to help the model detect a spatial relationship like 'nearest'. Furthermore, in contrast to the target labels of object detection datasets, spatial relationships are abstract concepts that do not readily fit the paradigm of supervised training targets.

A challenge we address the in the model is that an image may contain an arbitrary number of objects. The model must therefore scale dynamically with no fixed upper bound on the number of objects.

The model is trained on scenes with a small number of items, however it is capable of performing inference on more complex scenes than those encountered during training. This is a key benefit in learning relationship features. They enable a complex structure, such as a scene of a table setting, to be processed as a composite of related components. Rather than select a fixed number of features to represent a scene model, the graph representation treats the scene as an arbitrary sized collection of binary object relationships.

Setting up OGRE-net

Python dependencies

pip install -r requirements.txt

InferSent

See https://github.com/phil-hawkins/InferSent for InferSent setup instructions and pretrained weights.

Generating synthetic training data

python datasetgen/kitchen/generate.py

Training OGER-net

python train.py

Running evaluations

python int_eval.py

Name		Name	Last commit message	Last commit date
Latest commit History 28 Commits
.vscode		.vscode
datasetgen/kitchen		datasetgen/kitchen
images		images
nlp		nlp
ogrenetv2		ogrenetv2
.gitignore		.gitignore
.gitmodules		.gitmodules
Int_eval.py		Int_eval.py
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt
train.py		train.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Object Graph Relation Encoding Network

(OGRE-net) - version 2

Setting up OGRE-net

Python dependencies

InferSent

Generating synthetic training data

Training OGER-net

Running evaluations

About

Releases

Packages

Languages

License

phil-hawkins/OGREnet-v2

Folders and files

Latest commit

History

Repository files navigation

Object Graph Relation Encoding Network

(OGRE-net) - version 2

Setting up OGRE-net

Python dependencies

InferSent

Generating synthetic training data

Training OGER-net

Running evaluations

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages