solrconfig.xml

Andrea Gazzarini edited this page May 9, 2015 · 10 revisions

The SolRDF solrconfig.xml [1] adds many extensions on top of the built-in Solr components and modules. Most of them are described in dedicated sections of this guide; this page gives an overview of how they are arranged in the main Solr configuration file and what role each one plays.

For any attribute, parameter or section that is not covered here and relates to Solr configuration in general, refer to the Solr Wiki [2].
Since this is a regular Solr configuration file, you can change, tune and adjust every aspect of Solr's search and indexing behaviour. However, keep in mind that some settings (explicitly described in this guide) cannot be changed; otherwise the "RDF" perspective will stop working.

HttpServletRequest

When Solr receives an incoming request, it creates a kind of context: a Map<Object,Object> that carries attributes collected along the execution chain.
By default the HTTP request representation (i.e. the javax.servlet.http.HttpServletRequest instance) is not put in that context. SolRDF needs it for several reasons (e.g. reading HTTP headers), which is why you'll find the following fragment in solrconfig.xml:

<requestParsers 
   enableRemoteStreaming="true" 
   multipartUploadLimitInKB="2048000" 
   formdataUploadLimitInKB="2048" 
   addHttpRequestToContext="true"/>

The addHttpRequestToContext attribute in the snippet above, which is absent (i.e. false) in a default Solr installation, causes the HttpServletRequest to be inserted in the Solr request context. Later, it can be retrieved using the "httpRequest" key:

// request is an instance of SolrQueryRequest
Map<Object,Object> ctx = request.getContext();
HttpServletRequest httpRequest = (HttpServletRequest) ctx.get("httpRequest");
String accept = httpRequest.getHeader(HttpHeaders.ACCEPT);

Standard SearchHandler

This is a solr.SearchHandler that enables standard (e.g. lucene, dismax, edismax) queries on Solr. The SolRDF components don't actually use it, but I found it useful for debugging purposes (see issue #25 [3]).

<requestHandler name="/solr-query" class="solr.SearchHandler">
   <lst name="defaults">
      <str name="echoParams">none</str>
      <int name="rows">10</int>
      <str name="wt">xml</str>
   </lst>
</requestHandler>

SPARQL 1.1 Query SearchHandler

/sparql-query is a solr.SearchHandler that accepts and executes SPARQL 1.1 queries [4]. A few things to note about this handler:

  • Most of the parameters are defined as invariants, which means SolRDF needs those settings and a request cannot override them. Specifically:
    • the sort parameter is needed because SolRDF internally uses a deep paging iterator over search results [5]
    • the defType parameter must be "sparql", as it selects the SparqlQParser (described later)
    • the "hybrid" value of the wt parameter instantiates a specific QueryResponseWriter that writes pure or hybrid SPARQL results
  • The "components" array lists a SearchComponent that is also called "sparql-query". Despite the shared name, it is a different thing from this search handler: it is the "query" SearchComponent, which actually executes the SPARQL query. Maybe one day I will give this component a different name ;)
  • The facet component operates only in hybrid mode (SPARQL + Solr features). It is not the default FacetComponent but a custom subclass.

<requestHandler name="/sparql-query" class="solr.SearchHandler">
   <lst name="invariants">
      <str name="echoParams">none</str>
      <str name="defType">sparql</str>
      <str name="sort">id asc</str>
      <str name="wt">hybrid</str>
   </lst>
   <lst name="defaults">
      <!-- Default query value for Hybrid mode. -->
      <str name="dfhq">SELECT * WHERE { ?s ?p ?o }</str>
   </lst>
   <arr name="components">
      <str>sparql-query</str>
      <str>facet</str>
   </arr>
</requestHandler>

As you can see, a new parameter called dfhq declares the default query used when SolRDF runs in hybrid mode. The standard parameter "q" (or "query") can be used instead, but keep in mind that it acts as the default query for both modes (RDF and Hybrid), and depending on the context that may not be what you want. For instance, imagine a default "q" value of

SELECT * WHERE {?s ?p ?o }

In Hybrid mode this causes no problem at all, as the query results are paginated. In plain RDF mode, instead, the same query produces a huge result set. While this could be the expected result, remember that this "select all" query will also be executed whenever a client simply forgets the query in the HTTP request.
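
The split between invariants and defaults in the /sparql-query handler follows the usual Solr precedence: an invariant always wins, a request parameter overrides a default, and a default applies only when the client sends nothing. The following is a minimal sketch of that rule in plain Java. It is not Solr or SolRDF code: the class and method names are hypothetical, and Solr's "appends" section is deliberately left out.

```java
import java.util.Map;

// Hypothetical illustration of Solr's parameter precedence; not Solr or SolRDF code.
class ParamPrecedenceSketch {

    // Returns the effective value of a parameter: invariants first, then the
    // client request, then the configured defaults (null if absent everywhere).
    static String resolve(String name,
                          Map<String, String> invariants,
                          Map<String, String> request,
                          Map<String, String> defaults) {
        if (invariants.containsKey(name)) {
            return invariants.get(name);
        }
        if (request.containsKey(name)) {
            return request.get(name);
        }
        return defaults.get(name);
    }

    public static void main(String[] args) {
        // A subset of the values from the /sparql-query handler configuration.
        Map<String, String> invariants = Map.of("defType", "sparql", "sort", "id asc", "wt", "hybrid");
        Map<String, String> defaults = Map.of("dfhq", "SELECT * WHERE { ?s ?p ?o }");
        // A client that tries to change the response writer and sends no query.
        Map<String, String> request = Map.of("wt", "json");

        System.out.println(resolve("wt", invariants, request, defaults));   // the invariant wins
        System.out.println(resolve("dfhq", invariants, request, defaults)); // the default applies
    }
}
```

In this sketch the client-supplied wt=json is ignored because wt is an invariant, while dfhq falls back to the configured default because the request carries no value for it.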

SPARQL 1.1 Update

/sparql-update is an UpdateRequestHandler subclass that accepts and executes SPARQL 1.1 update commands [6].
Note that the update chain has been customized with a dedupe UpdateRequestProcessor, as explained in the Schema [7] section.

<requestHandler name="/sparql-update" class="org...RdfUpdateRequestHandler">
  <lst name="defaults">
     <str name="update.chain">dedupe</str>
  </lst>
</requestHandler> 

SPARQL 1.1 Endpoint SearchHandler

/sparql is a custom SearchHandler that acts as a SPARQL 1.1 endpoint. It is a single-endpoint facade: it accepts SPARQL 1.1 requests and dispatches each one to one of the two handlers described above (/sparql-query and /sparql-update).

<requestHandler name="/sparql" class="org...Sparql11SearchHandler">
   <lst name="invariants">
      <str name="s">/sparql-query</str>
      <str name="u">/sparql-update</str>
   </lst>
</requestHandler>
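
To make the facade idea concrete, here is a minimal sketch of how a single endpoint could pick the target handler. It assumes the SPARQL 1.1 Protocol conventions: a query parameter or an application/sparql-query body for queries, an update parameter or an application/sparql-update body for updates. The class and method names are hypothetical, and this is not the actual Sparql11SearchHandler logic, which may apply different rules.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of a single-endpoint facade; not SolRDF's Sparql11SearchHandler.
class SparqlEndpointDispatchSketch {

    // Targets taken from the "s" and "u" invariants of the /sparql handler.
    static final String QUERY_HANDLER = "/sparql-query";
    static final String UPDATE_HANDLER = "/sparql-update";

    // Strips parameters such as "; charset=UTF-8" and normalises the case.
    static String mediaType(String contentType) {
        if (contentType == null) {
            return "";
        }
        int semicolon = contentType.indexOf(';');
        String type = semicolon < 0 ? contentType : contentType.substring(0, semicolon);
        return type.trim().toLowerCase(Locale.ROOT);
    }

    // Chooses the handler path from the request parameters and the Content-Type header.
    static String targetHandler(Map<String, String> params, String contentType) {
        String type = mediaType(contentType);
        if (params.containsKey("update") || type.equals("application/sparql-update")) {
            return UPDATE_HANDLER;
        }
        if (params.containsKey("query") || type.equals("application/sparql-query")) {
            return QUERY_HANDLER;
        }
        throw new IllegalArgumentException("Neither a SPARQL query nor a SPARQL update request");
    }

    public static void main(String[] args) {
        System.out.println(targetHandler(Map.of("query", "ASK { ?s ?p ?o }"), null));
        System.out.println(targetHandler(Map.of(), "application/sparql-update; charset=UTF-8"));
    }
}
```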

SPARQL SearchComponent

A subclass of SearchComponent that executes SPARQL 1.1 query (and update) requests. This component needs no special configuration, as it acts as a bridge / adapter between the Jena and Solr worlds.

<searchComponent name="sparql-query" class="org...SparqlSearchComponent"/>

Facet SearchComponent

In Hybrid mode, SolRDF mixes plain SPARQL results with Solr features, including pagination and faceted search. The standard FacetComponent has been subclassed to provide the special features described in the Faceted Search section [8].

<searchComponent name="facet" class="org...RDFacetComponent"/>

Query parser

The "sparql" query parser is registered through a custom QParserPlugin: a factory that creates a query parser able to understand SPARQL 1.1. This is the parser selected by the defType invariant of the /sparql-query handler. This component needs no special configuration, as it acts as a bridge / adapter between the Jena and Solr worlds.

<queryParser name="sparql" class="org...SparqlQParserPlugin"/>

Query response writer

This is probably the most complex component in SolRDF. It manages the output of a given query execution, both in:

  • RDF mode, where it outputs standard SPARQL results [e.g. 9, 10]
  • Hybrid mode, where Solr features like faceting and pagination are mixed with SPARQL results [11]

The configuration associates a list of content types with each SPARQL query type (SELECT, CONSTRUCT, DESCRIBE, ASK).

<queryResponseWriter name="hybrid" class="org.gazzax.labs.solrdf.response.HybridResponseWriter">
	<lst name="content-types">
		<!-- SELECT -->
		<str name="111">
                   application/sparql-results+xml,
                   application/sparql-results+json,
                   text/csv,
                   text/plain,
                   text/tab-separated-values
                </str>
		<!-- CONSTRUCT -->
		<str name="222">
                   application/rdf+xml,
                   application/n-triples,
                   text/turtle
                </str>
		<!-- DESCRIBE -->
		<str name="333">
                   application/rdf+xml,
                   application/n-triples, 
                   text/turtle</str>
		<!-- ASK -->
		<str name="444">
                   text/csv,
                   text/plain,
                   text/tab-separated-values,
                   application/sparql-results+xml,
                   application/sparql-results+json
                </str>
	</lst>		
</queryResponseWriter>
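
As an illustration of how such a configuration might be consumed, the sketch below parses one comma-separated value into a list of media types and picks the first entry of an Accept header that appears in it. It is a simplified assumption, not the HybridResponseWriter implementation: the class and method names are hypothetical, and q-values and partial wildcards such as text/* are ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical content negotiation sketch; not SolRDF's HybridResponseWriter.
class ContentTypeSketch {

    // Splits a comma-separated configuration value into trimmed, lower-cased media types.
    static List<String> parse(String configured) {
        List<String> types = new ArrayList<>();
        for (String token : configured.split(",")) {
            String type = token.trim().toLowerCase(Locale.ROOT);
            if (!type.isEmpty()) {
                types.add(type);
            }
        }
        return types;
    }

    // Returns the first media type of the Accept header that is supported.
    // A missing header or "*/*" selects the first configured type.
    static Optional<String> choose(String acceptHeader, List<String> supported) {
        if (acceptHeader == null || acceptHeader.trim().isEmpty()) {
            return supported.stream().findFirst();
        }
        for (String range : acceptHeader.split(",")) {
            // Drop parameters such as ";q=0.9".
            int semicolon = range.indexOf(';');
            String type = (semicolon < 0 ? range : range.substring(0, semicolon))
                    .trim().toLowerCase(Locale.ROOT);
            if (type.equals("*/*")) {
                return supported.stream().findFirst();
            }
            if (supported.contains(type)) {
                return Optional.of(type);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // A shortened version of the SELECT entry from the configuration.
        List<String> select = parse("application/sparql-results+xml,\n application/sparql-results+json,\n text/csv");
        System.out.println(choose("text/html, application/sparql-results+json;q=0.9", select));
        System.out.println(choose(null, select));
    }
}
```

With this approach, the order of the entries in each configured list doubles as the server-side preference whenever the client expresses none.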

[1] solrconfig.xml
[2] https://wiki.apache.org/solr/SolrConfigXml
[3] https://github.com/agazzarini/SolRDF/issues/25
[4] http://www.w3.org/TR/sparql11-query
[5] http://yonik.com/solr/paging-and-deep-paging
[6] http://www.w3.org/TR/sparql11-update
[7] https://github.com/agazzarini/SolRDF/wiki/Schema#identity
[8] https://github.com/agazzarini/SolRDF/wiki/Faceted%20search
[9] http://www.w3.org/TR/rdf-sparql-XMLres
[10] http://www.w3.org/TR/sparql11-results-json
[11] https://github.com/agazzarini/SolRDF/wiki/User%20Guide#hybrid-mode