Merge branch 'master' into development
Showing 60 changed files with 4,211 additions and 1,919 deletions.
@@ -1,11 +1,12 @@
[submodule "ext/lmdb"]
	path = ext/lmdb
	url = https://github.com/LMDB/lmdb.git
	branch = stable
[submodule "ext/tpl"]
	path = ext/tpl
	url = https://github.com/troydhanson/tpl.git
	branch = stable
[submodule "ext/spookyhash"]
	path = ext/spookyhash
	url = https://github.com/centaurean/spookyhash.git
[submodule "ext/collections-c"]
	path = ext/collections-c
	url = https://github.com/srdja/Collections-C.git
@@ -0,0 +1,99 @@
Data Structure Internals
========================

**(Draft)**

Lakesuperior has its own methods for handling in-memory graphs. These methods
rely on C data structures and are therefore much faster than equivalent
Python/RDFLib objects.

The graph data model modules are in :py:mod:`lakesuperior.model.graph`.

The Graph Data Model
--------------------

Triples are stored in a C hash set. Each triple is represented by a pointer to
a ``BufferTriple`` structure stored in a temporary memory pool. This pool is
tied to the life cycle of the ``SimpleGraph`` object it belongs to.

A triple structure contains three pointers to ``Buffer`` structures, which
contain a serialized version of an RDF term. These structures are stored in the
``SimpleGraph`` memory pool as well.

Each ``SimpleGraph`` object has a ``_terms`` property and a ``_triples``
property. These are C hash sets holding the addresses of the unique terms and
triples inserted in the graph. If the same term is entered more than once, in
any position in any triple, only the first instance entered is kept, and every
triple containing that term points to it. This makes the graph data structure
very compact.

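The de-duplication rule can be modelled in a few lines of Python. This is a minimal sketch, assuming a ``dict`` in place of the C ``_terms`` hash set (a Python ``set`` cannot hand back the element it already holds); the ``ToyGraph`` class and its methods are hypothetical and are not part of the Lakesuperior API.

```python
class ToyGraph:
    """Hypothetical model of term de-duplication (not the Lakesuperior API)."""

    def __init__(self) -> None:
        # Maps each serialized term to the first instance stored for it.
        # A dict stands in for the C hash set because a Python set cannot
        # return the element it already contains.
        self._terms = {}
        # Unique triples, each a tuple of canonical term instances.
        self._triples = set()

    def _intern(self, term: bytes) -> bytes:
        # Stores the term only if it is new; otherwise returns the instance
        # that was entered first.
        return self._terms.setdefault(term, term)

    def add(self, s: bytes, p: bytes, o: bytes) -> None:
        self._triples.add((self._intern(s), self._intern(p), self._intern(o)))


g = ToyGraph()
s1 = bytes(bytearray(b"<urn:s:0>"))  # two equal but distinct objects
s2 = bytes(bytearray(b"<urn:s:0>"))
g.add(s1, b"<urn:p:0>", b"<urn:o:0>")
g.add(s2, b"<urn:p:1>", b"<urn:o:1>")
# Both triples reference s1, the instance entered first.
assert all(t[0] is s1 for t in g._triples)
```

Object identity (``is``) plays the role of the pointer here: however many times an equal term is inserted, every triple ends up referencing the same stored instance.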
In summary, the pointers can be represented this way::

    <serialized term data in mem pool (x3)>
          ^     ^     ^
          |     |     |
    <Term structures in mem pool (x3)>
          ^     ^     ^
          |     |     |
    <Term struct addresses in _terms set (x3)>
          ^     ^     ^
          |     |     |
    <Triple structure in mem pool>
                ^
                |
    <address of triple in _triples set>

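The pointer chain in the diagram can be reproduced with Python's ``ctypes`` module. This is a sketch under stated assumptions: the struct layouts (a ``Buffer`` holding an address and a size, a ``BufferTriple`` holding three ``Buffer`` pointers) and the field names ``addr``, ``sz``, ``s``, ``p`` and ``o`` are hypothetical, and ordinary ``ctypes`` buffers stand in for the memory pool.

```python
import ctypes


# Hypothetical layouts modelled on the description above; field names
# (addr, sz, s, p, o) are assumptions, not copied from Lakesuperior.
class Buffer(ctypes.Structure):
    _fields_ = [("addr", ctypes.c_void_p), ("sz", ctypes.c_size_t)]


class BufferTriple(ctypes.Structure):
    _fields_ = [
        ("s", ctypes.POINTER(Buffer)),
        ("p", ctypes.POINTER(Buffer)),
        ("o", ctypes.POINTER(Buffer)),
    ]


terms = [b"<urn:s:0>", b"<urn:p:0>", b"<urn:o:0>"]
# Bottom level: serialized term data.
raw = [ctypes.create_string_buffer(t) for t in terms]
# Buffer structures pointing at the raw data.
bufs = [Buffer(ctypes.addressof(r), len(t)) for r, t in zip(raw, terms)]
# The _terms set holds the addresses of the Buffer structures.
term_addrs = {ctypes.addressof(b) for b in bufs}
# The triple structure points at the three Buffer structures.
triple = BufferTriple(*[ctypes.pointer(b) for b in bufs])
# The _triples set holds the address of the triple structure.
triple_addrs = {ctypes.addressof(triple)}


def read_subject(triple_addr: int) -> bytes:
    """Follow the chain from a triple address down to the raw subject bytes."""
    t = BufferTriple.from_address(triple_addr)
    buf = t.s.contents
    return ctypes.string_at(buf.addr, buf.sz)
```

Calling ``read_subject`` on the single address in ``triple_addrs`` walks the diagram from bottom to top in reverse and returns ``b"<urn:s:0>"``.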
Let's say we insert the following triples in a ``SimpleGraph``::

    <urn:s:0> <urn:p:0> <urn:o:0>
    <urn:s:0> <urn:p:1> <urn:o:1>
    <urn:s:0> <urn:p:1> <urn:o:2>
    <urn:s:0> <urn:p:0> <urn:o:0>

Every term is appended to the memory pool as it is inserted, duplicates
included (note that the fourth triple is identical to the first). The pool then
contains the following byte arrays of raw data, listed with their relative
addresses (simplified to 8-bit addresses and fixed-length, 9-byte strings for
readability)::

    0x00 <urn:s:0>
    0x09 <urn:p:0>
    0x12 <urn:o:0>

    0x1b <urn:s:0>
    0x24 <urn:p:1>
    0x2d <urn:o:1>

    0x36 <urn:s:0>
    0x3f <urn:p:1>
    0x48 <urn:o:2>

    0x51 <urn:s:0>
    0x5a <urn:p:0>
    0x63 <urn:o:0>

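The layout above follows from a simple append-only allocation scheme. The sketch below assumes a ``bytearray`` as the pool and byte offsets as stand-ins for addresses; ``ToyPool``, ``alloc`` and ``read`` are hypothetical names, not Lakesuperior's memory pool interface.

```python
class ToyPool:
    """Hypothetical append-only pool; offsets stand in for addresses."""

    def __init__(self) -> None:
        self._data = bytearray()

    def alloc(self, payload: bytes) -> int:
        # Appending never shifts or overwrites earlier data, so every
        # returned offset stays valid for the lifetime of the pool.
        offset = len(self._data)
        self._data += payload
        return offset

    def read(self, offset: int, size: int) -> bytes:
        return bytes(self._data[offset:offset + size])


triples = [
    (b"<urn:s:0>", b"<urn:p:0>", b"<urn:o:0>"),
    (b"<urn:s:0>", b"<urn:p:1>", b"<urn:o:1>"),
    (b"<urn:s:0>", b"<urn:p:1>", b"<urn:o:2>"),
    (b"<urn:s:0>", b"<urn:p:0>", b"<urn:o:0>"),
]
pool = ToyPool()
# Every term is appended, duplicates included.
offsets = [pool.alloc(term) for triple in triples for term in triple]
print(" ".join(f"0x{o:02x}" for o in offsets))
```

Running it prints twelve offsets from ``0x00`` to ``0x63`` in 9-byte steps, matching the addresses listed above.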
However, the ``_terms`` set contains only the ``Buffer`` structures pointing to
unique addresses::

    0x00
    0x09
    0x12
    0x24
    0x2d
    0x48

The other terms are simply left unused. They are deallocated en masse when the
``SimpleGraph`` object is garbage collected.

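The "deallocated en masse" behaviour can be modelled with ``weakref.finalize`` from the Python standard library. This only mimics the life cycle described above and is not Lakesuperior's implementation; ``PooledGraph`` is a hypothetical name and a ``bytearray`` stands in for the C memory pool.

```python
import gc
import weakref


class PooledGraph:
    """Hypothetical graph that owns a pool for its whole lifetime."""

    def __init__(self) -> None:
        self.pool = bytearray()  # stand-in for the C memory pool
        # The callback must not reference self, or the graph would never
        # become unreachable; it receives only the pool.
        self._finalizer = weakref.finalize(self, self._release, self.pool)

    @staticmethod
    def _release(pool: bytearray) -> None:
        # Frees everything at once, used and unused terms alike.
        pool.clear()


g = PooledGraph()
for term in (b"<urn:s:0>", b"<urn:p:0>", b"<urn:o:0>", b"<urn:s:0>"):
    g.pool += term  # duplicates are appended too
pool, finalizer = g.pool, g._finalizer
del g
gc.collect()  # not strictly needed in CPython, where refcounting frees g
```

No per-term bookkeeping is needed: once the graph is unreachable, a single callback releases the whole pool.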
The ``_triples`` set then contains 3 unique entries pointing to the unique term
addresses; the fourth triple, being identical to the first, is not stored
again::

    0x00 0x09 0x12
    0x00 0x24 0x2d
    0x00 0x24 0x48

(In practice, these would be the addresses of the ``Buffer`` structures
pointing to the raw data, not of the raw data itself; the values above are only
illustrative.)

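Both sets can be derived from the pool layout with a short Python sketch. Pool offsets stand in for addresses, and plain ``dict`` and ``set`` objects stand in for the C hash sets; the variable names are hypothetical.

```python
# Terms in insertion order with their pool offsets, as listed above.
layout = [
    (0x00, b"<urn:s:0>"), (0x09, b"<urn:p:0>"), (0x12, b"<urn:o:0>"),
    (0x1b, b"<urn:s:0>"), (0x24, b"<urn:p:1>"), (0x2d, b"<urn:o:1>"),
    (0x36, b"<urn:s:0>"), (0x3f, b"<urn:p:1>"), (0x48, b"<urn:o:2>"),
    (0x51, b"<urn:s:0>"), (0x5a, b"<urn:p:0>"), (0x63, b"<urn:o:0>"),
]

first_seen = {}  # serialized term -> offset of its first occurrence
triples = set()  # unique triples as tuples of canonical offsets

for i in range(0, len(layout), 3):
    canonical = []
    for offset, term in layout[i:i + 3]:
        # Later duplicates resolve to the offset entered first.
        canonical.append(first_seen.setdefault(term, offset))
    triples.add(tuple(canonical))

terms = sorted(first_seen.values())
```

``terms`` ends up holding the six offsets of the ``_terms`` listing, and ``triples`` the three entries of the ``_triples`` listing; the fourth triple collapses into the first.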
The advantage of this approach is that the memory pool is contiguous and
append-only (until it is purged), so adding to it is cheap. The sets, on the
other hand, must maintain uniqueness and are the structures on which most
operations (lookup, adding, removing, slicing, copying, etc.) are performed;
since they contain much less data, those operations are faster.
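As an illustration of why the compact sets are cheap to work with, here is a hypothetical pattern lookup over the ``_triples`` set from the example: matching compares small integers (offsets) instead of serialized terms. The ``lookup`` function and the use of ``None`` as a wildcard are assumptions for this sketch, not Lakesuperior's interface.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[int, int, int]

# The _triples set from the example, as tuples of term offsets.
triples: Set[Triple] = {(0x00, 0x09, 0x12), (0x00, 0x24, 0x2d), (0x00, 0x24, 0x48)}


def lookup(
    triples: Set[Triple],
    s: Optional[int] = None,
    p: Optional[int] = None,
    o: Optional[int] = None,
) -> Set[Triple]:
    """Return the triples matching a pattern; None matches any term."""
    pattern = (s, p, o)
    return {
        t for t in triples
        # Each check is an integer comparison, independent of term length.
        if all(want is None or want == got for want, got in zip(pattern, t))
    }


# Everything with subject <urn:s:0> (0x00) and predicate <urn:p:1> (0x24).
print(sorted(lookup(triples, s=0x00, p=0x24)))
```

The cost of each comparison stays the same whether a term is nine bytes or nine kilobytes, because the raw data is only read back from the pool when a matching term is actually needed.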
Submodule collections-c added at 719fd8