Adding Multi Stemmer to multilang setups #131

Open: wants to merge 2 commits into base: develop
33 changes: 24 additions & 9 deletions README.md
@@ -27,7 +27,7 @@ Other than standard Grav requirements, this plugin does have some extra requirem
* **PHP pdo_sqlite** Driver
* **PHP pdo_mysql** Driver (only required because library references some MySQL constants, MySQL db is not used)

| PHP by default comes with **PDO** and the vast majority of Linux-based systems already come with SQLite.

### Installation of SQLite on Mac systems

@@ -47,7 +47,7 @@ $ brew install sqlite

### Installation of SQLite on Windows systems

Download the appropriate version of SQLite from the [SQLite Downloads Page](https://www.sqlite.org/download.html).

Extract the downloaded ZIP file and run the `sqlite3.exe` executable.

@@ -87,7 +87,7 @@ filter:
items:
- [email protected]
powered_by: true
search_object_type: Grav
```

The configuration options are as follows:
@@ -110,12 +110,16 @@ The configuration options are as follows:
* `no` - no stemmer
* `arabic` - Arabic language
* `croatian` - Croatian language
* `french` - French language
* `german` - German language
* `italian` - Italian language
* `polish` - Polish language
* `porter` - Porter stemmer for English language
* `portuguese` - Portuguese language
* `russian` - Russian language
* `ukrainian` - Ukrainian language
* a `language: stemmer` map, to set one stemmer per language (see Multi-Language Support)

* `display_route` - display the route in the search results
* `display_hits` - display the number of hits in the search results
* `display_time` - display the execution time in the search results
@@ -135,7 +139,7 @@ TNTSearch relies on your content being indexed into the SQLite index database be

### Indexing

The first step after installing the plugin is to index your content. There are several ways you can accomplish this.

#### CLI Indexing

@@ -167,7 +171,7 @@ This indicates a successful indexing of your content.

#### Admin Plugin Indexing

If you are using the admin plugin, you can index your content directly from the plugin. TNTSearch adds a new **quick-tray** icon that lets you create a new index or re-index all your content quickly and conveniently with a single click.

![](assets/tntsearch-quicktray.png)

@@ -188,12 +192,23 @@ tntsearch:

#### Multi-Language Support

With the new 3.0 version of TNTSearch, support has been added for multiple languages (Grav 1.6 required). Internally, this means that rather than storing the index as `user://data/tntsearch/grav.index`, one index is created per language configured in Grav. For example, if you have set the supported languages to `['en', 'fr', 'de']`, then when you perform an index, you will get three files: `en.index`, `fr.index`, and `de.index`. When querying, the **active language** determines which index is queried. For example, performing the search on a page called `/fr/search` will result in the `fr.index` database being used and French results being returned.
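
As a rough illustration of the naming scheme described above, the sketch below picks an index file name from the active language. The function `indexFileFor` is hypothetical and not part of the plugin, and the fallback to `grav.index` when no language is active is an assumption based on the single-index name mentioned above.

```php
<?php
/**
 * Hypothetical helper (not the plugin's code): maps the active language
 * to the name of the index file that would be queried.
 */
function indexFileFor(?string $activeLanguage): string
{
    // Assumption: without an active language the single grav.index is used.
    if ($activeLanguage === null || $activeLanguage === '') {
        return 'grav.index';
    }

    // One index per language: 'fr' maps to 'fr.index'.
    return $activeLanguage . '.index';
}

// List the index file for each supported language.
foreach (['en', 'fr', 'de'] as $lang) {
    echo indexFileFor($lang), "\n";
}
```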

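The per-language stemmer cannot be set from the admin front end, only in the YAML configuration. There, you can set a distinct stemmer for each language:
```
stemmer:
  de: german
  fr: french
  en: porter
  dk: 'no'   # quoted, since some YAML parsers read a bare no as boolean false
```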

Note: indexing takes longer the more languages you support, as TNTSearch has to index each page in each language.

> NOTE: While accented characters are supported in this release, there is currently no support in the underlying TNTSearch library to match non-accented characters to accented ones, so exact matches are required.



#### Scheduler Support

One of the great new features of Grav 1.6 is the built-in **Scheduler**, which allows plugin-provided functionality to be run periodically. TNTSearch is a great use case for this capability, as it allows an indexing job to be scheduled to run every few hours without the need to manually keep things in sync. There are a few options that allow you to configure this capability.
@@ -245,9 +260,9 @@ For example, say we have a homepage that is built from a few modular sub-pages w
{% endfor %}

{{ page.content|raw }}
```

As you can see, this simply ensures that the module pages defined in the page's collection are displayed, followed by the actual page content.

To instruct TNTSearch to index with this template rather than just using the Page content by itself, you just need to add an entry in the `home.md` frontmatter:

@@ -341,7 +356,7 @@ public function onTNTSearchQuery(Event $e)
$query = $e['query'];
$options = $e['options'];
$fields = $e['fields'];

$fields->results[] = $page->route();
$e->stopPropagation();
}
3 changes: 2 additions & 1 deletion blueprints.yaml
@@ -174,7 +174,7 @@ form:
0: Disabled
validate:
type: bool

stemmer:
type: select
size: small
@@ -186,6 +186,7 @@ form:
no: Disabled
arabic: Arabic
croatian: Croatian
french: French
porter: English
german: German
italian: Italian
24 changes: 18 additions & 6 deletions classes/GravTNTSearch.php
@@ -230,13 +230,25 @@ public function createIndex()
$this->tnt->setDatabaseHandle(new GravConnector);
$indexer = $this->tnt->createIndex($this->index);

// Disable stemmer for users with older configuration.
if ($this->options['stemmer'] == 'default') {
$indexer->setLanguage('no');
} else {
$indexer->setLanguage($this->options['stemmer']);
// Disable stemmer for users with older configuration.
$stemmer = 'no';
if (is_array($this->options['stemmer'])) {
    // New config: one stemmer per language for multi-language sites.
    // A language missing from the map keeps the 'no' fallback instead of
    // raising an undefined-index notice.
    $stemmer = $this->options['stemmer'][$this->language] ?? 'no';
    if ($stemmer === 'default') {
        // Still accept the old 'default' value, just in case.
        $stemmer = 'no';
    }
} elseif ($this->options['stemmer'] !== 'default') {
    // Old config: a single stemmer for every language.
    $stemmer = $this->options['stemmer'];
}
$indexer->setLanguage($stemmer);
// Print the stemmer on the CLI.
echo "\nStemmer : $stemmer";
echo "\n\n";
$indexer->run();
}
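
The stemmer resolution in this hunk can be exercised in isolation. Below is a minimal sketch, assuming a hypothetical free function `resolveStemmer` (not part of the plugin) that takes the configured `stemmer` value and the active language code. Unlike a direct array lookup, it falls back to `no` when the language is missing from the map or when the value is not a usable string, for example a boolean produced by an unquoted YAML `no`.

```php
<?php
/**
 * Hypothetical helper, not part of the plugin: resolves the stemmer name
 * from either the old single-value config or the new per-language map.
 *
 * @param mixed $configured Value of the `stemmer` option (string or array).
 */
function resolveStemmer($configured, string $language): string
{
    if (is_array($configured)) {
        // Per-language map; a language missing from the map gets no stemmer.
        $stemmer = $configured[$language] ?? 'no';
    } else {
        // Old style: one value for every language.
        $stemmer = $configured;
    }

    // Older configurations use 'default'; treat it, and any empty or
    // non-string value, as "no stemmer".
    if (!is_string($stemmer) || $stemmer === '' || $stemmer === 'default') {
        return 'no';
    }

    return $stemmer;
}

$config = ['de' => 'german', 'fr' => 'french', 'en' => 'porter', 'dk' => 'no'];
echo resolveStemmer($config, 'fr'), "\n";    // french
echo resolveStemmer($config, 'es'), "\n";    // no (language not in the map)
echo resolveStemmer('default', 'en'), "\n";  // no (old-style config)
```

Keeping the resolution in one small function would also make it possible to reuse the same rule at query time, so that indexing and searching agree on the stemmer for a given language.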
