Schema

Tuple representation

The SolRDF schema [1] is simple and intuitive. The underlying idea is that each Solr document represents a triple or a quad, with one field per member: s(ubject), p(redicate), o(bject) and, for quads only, c(ontext):

<field name="s" type="string" indexed="true" stored="true" required="true" />
<field name="p" type="string" indexed="true" stored="true" required="true" />
<field name="o" type="string" indexed="false" stored="true" required="true" />
<field name="c" type="string" indexed="true" stored="true" required="false" />

As you can see, the o(bject) is the only field that is not indexed. That is because it holds the "representation" of the object, copied verbatim and stored. In addition, at index time, the object member of the incoming tuple is also copied into one of the following fields, depending on its type (URI, blank node or literal) and, for literals, on its datatype:

<!-- URI, string or untyped literals -->  
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />
<field name="o_s" type="string" indexed="true" stored="false"/>

<!-- boolean literals -->
<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
<field name="o_b" type="boolean" indexed="true" stored="false" />

<!-- Date / datetime literals --> 
<fieldType name="datetime" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
<field name="o_d" type="datetime" indexed="true" stored="false" />

<!-- numeric literals -->
<fieldType name="numeric" class="solr.TrieDoubleField" precisionStep="8" positionIncrementGap="0"/>	
<field name="o_n" type="numeric" indexed="true" stored="false" />

Additionally, if (and only if) the object is a string literal with a language tag, the language is indexed in the following field:

<!-- (optional) language attribute for string literals -->
<field name="o_lang" type="string" indexed="true" stored="false" />
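The mapping described above can be sketched in a few lines of Python. This is only an illustration, not SolRDF's actual (Java) loader: the `RdfObject` class, the `to_solr_document` and `verbatim` helpers, the N-Triples-like format of the "o" field, the list of numeric datatypes and the choice of sending blank nodes to `o_s` are all assumptions made for the example. Only the field names come from the schema.

```python
from dataclasses import dataclass
from typing import Optional

XSD = "http://www.w3.org/2001/XMLSchema#"
# Assumed (non-exhaustive) datatype groups, chosen for the example.
NUMERIC_TYPES = {XSD + t for t in ("integer", "int", "long", "decimal", "double", "float")}
DATE_TYPES = {XSD + "date", XSD + "dateTime"}


@dataclass
class RdfObject:
    """Hypothetical stand-in for the object member of an incoming tuple."""
    kind: str                       # "uri", "bnode" or "literal"
    value: str                      # URI, blank node label or lexical form
    datatype: Optional[str] = None  # datatype URI (literals only)
    lang: Optional[str] = None      # language tag (string literals only)


def verbatim(o):
    """N-Triples-like rendering kept in the stored "o" field (assumed format, no escaping)."""
    if o.kind == "uri":
        return "<%s>" % o.value
    if o.kind == "bnode":
        return "_:%s" % o.value
    if o.lang:
        return '"%s"@%s' % (o.value, o.lang)
    if o.datatype:
        return '"%s"^^<%s>' % (o.value, o.datatype)
    return '"%s"' % o.value


def to_solr_document(s, p, o, c=None):
    """Build the field map for one triple (c is None) or quad."""
    doc = {"s": s, "p": p, "o": verbatim(o)}
    if c is not None:
        doc["c"] = c
    if o.kind == "literal" and o.datatype == XSD + "boolean":
        doc["o_b"] = o.value in ("true", "1")
    elif o.kind == "literal" and o.datatype in DATE_TYPES:
        doc["o_d"] = o.value
    elif o.kind == "literal" and o.datatype in NUMERIC_TYPES:
        doc["o_n"] = float(o.value)
    else:
        # URIs, string or untyped literals (and, by assumption, blank nodes).
        doc["o_s"] = o.value
        if o.kind == "literal" and o.lang:
            doc["o_lang"] = o.lang
    return doc
```

For instance, `to_solr_document("<http://example.org/s>", "<http://example.org/p>", RdfObject("literal", "ciao", lang="it"))` fills `o`, `o_s` and `o_lang`, while an `xsd:integer` literal fills `o` and `o_n` only.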

Identity

A tuple identity is a unique identifier assigned to each incoming tuple. It is computed at runtime by a SignatureUpdateProcessor [2] placed in the update chain:

<updateRequestProcessorChain name="dedupe"> 
   <processor class="solr.processor.SignatureUpdateProcessorFactory"> 
      <bool name="enabled">true</bool> 
      <str name="signatureField">id</str> 
      <bool name="overwriteDupes">false</bool> 
      <str name="fields">s,p,o,c</str>
      <str name="signatureClass">solr.processor.Lookup3Signature</str> 
   </processor> 
   <processor class="solr.LogUpdateProcessorFactory" /> 
   <processor class="solr.RunUpdateProcessorFactory" /> 
</updateRequestProcessorChain>

The identity field, unsurprisingly called "id", is computed from all the members of the tuple (s, p, o, c). This ensures that two tuples representing the same assertion get the same identifier and therefore do not result in a duplicate statement.
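The effect of a content-derived identifier can be illustrated with a small Python sketch. It does not reproduce Solr's Lookup3Signature: SHA-1 from the standard library is swapped in as a stand-in hash, and the `tuple_id` and `add` helpers, as well as the in-memory `index` dictionary that plays the role of a Solr index keyed by "id", are hypothetical names introduced for the example.

```python
import hashlib


def tuple_id(s, p, o, c=None):
    """Stand-in signature over (s, p, o, c); not Solr's Lookup3Signature."""
    h = hashlib.sha1()
    for member in (s, p, o, c or ""):
        data = member.encode("utf-8")
        # A length prefix keeps ("ab", "c") distinct from ("a", "bc").
        h.update(len(data).to_bytes(4, "big"))
        h.update(data)
    return h.hexdigest()


# Hypothetical in-memory index keyed by the identity field.
index = {}


def add(s, p, o, c=None):
    """Adding the same tuple twice rewrites the same entry instead of duplicating it."""
    index[tuple_id(s, p, o, c)] = {"s": s, "p": p, "o": o, "c": c}
```

Calling `add` twice with the same four members leaves a single entry in `index`, whereas changing any member (for example the context) produces a different identifier and therefore a second entry.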

[1] https://github.com/agazzarini/SolRDF/wiki/Schema
[2] https://wiki.apache.org/solr/Deduplication

