This repository has been archived by the owner on May 27, 2020. It is now read-only.
Feature/build custom analyzer #328
Open: jpgilaberte wants to merge 40 commits into branch-3.0.13 from feature/build_custom_analyzer
Commits (40):
e42179e Added tokenizers
64aee42 Add lowercase, edgeNGram and thai tokenizers
1eec1d5 Reformat code
9966fe0 Add tokenizers in builder module
491987b Scala refactor in tokenizer feature
33b3011 Add license
e999d1f Add license in scala files
567c0df Add license in test files
563c76c Add license in custom analyzer
049d826 Refactor tokenizers
048a1df Add charFilters
1645cf0 Add tokenFilter
5ec11fd Add builder objects
759c248 Add plugin Test
c98e3fc Add testAt CustomAnalizer
3461943 Add JavaDoc in builder
598f314 Add ScalaDoc in plugin
a1d5f0f Add TokenFilter documentation
3532cab Fix RST format
af386ef Fix RST format
c63b553 Fix RST format
b16b9fc Fix package name format
a4cd8f3 Fix package name format
3822c65 Fix mandatory column size
5ed61a8 Fix mandatory column size
cfa3a20 Add more TokenFilters
b45479e Add new TokenFilter documentation
f80c83f Fix rst format
1c55d09 Fix rst format
29d2fa4 Fix persian charfilter
cf88e27 Fix documentation
a2f7085 Add char filter test
956de0a Add token filter test
4551f00 Add token filters in builder
fc10a93 Add field in HtmlStripCharFilter
e2dc196 Refactor test package
da3a8dd Fix documentation
d01f7ab Add token filter test
b7a5770 Refactor CustomAnalyzerIT
270f62b Add CustomAnalyzerIT removed
61 changes: 61 additions & 0 deletions
.../main/java/com/stratio/cassandra/lucene/builder/index/schema/analysis/CustomAnalyzer.java
```java
/*
 * Copyright (C) 2014 Stratio (http://stratio.com)
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package com.stratio.cassandra.lucene.builder.index.schema.analysis;

import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.stratio.cassandra.lucene.builder.index.schema.analysis.charFilter.CharFilter;
import com.stratio.cassandra.lucene.builder.index.schema.analysis.tokenFilter.TokenFilter;
import com.stratio.cassandra.lucene.builder.index.schema.analysis.tokenizer.Tokenizer;

/**
 * {@link Analyzer} built from a {@link Tokenizer}, an array of {@link CharFilter}s applied to the input text
 * before tokenization, and an array of {@link TokenFilter}s applied to the resulting tokens.
 *
 * @author Juan Pedro Gilaberte {@literal <[email protected]>}
 */
public class CustomAnalyzer extends Analyzer {

    /** The {@code TokenFilter} array. */
    @JsonProperty("token_filter")
    private final TokenFilter[] tokenFilter;

    /** The {@code CharFilter} array. */
    @JsonProperty("char_filter")
    private final CharFilter[] charFilter;

    /** The {@code Tokenizer} instance. */
    @JsonProperty("tokenizer")
    private final Tokenizer tokenizer;

    /**
     * Builds a new {@link CustomAnalyzer} using the specified tokenizer, char filters and token filters.
     *
     * @param tokenizer the {@link Tokenizer} to use
     * @param charFilter the {@link CharFilter}s to apply before tokenization
     * @param tokenFilter the {@link TokenFilter}s to apply after tokenization
     */
    @JsonCreator
    public CustomAnalyzer(@JsonProperty("tokenizer") Tokenizer tokenizer,
                          @JsonProperty("char_filter") CharFilter[] charFilter,
                          @JsonProperty("token_filter") TokenFilter[] tokenFilter) {
        this.tokenizer = tokenizer;
        this.charFilter = charFilter;
        this.tokenFilter = tokenFilter;
    }
}
```
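As with Lucene analyzers in general, the char filters presumably run over the raw text first, then the single tokenizer, then the token filters in array order. The sketch below models that chain with plain `java.util.function` types so it runs without Lucene, Jackson or the Stratio builder. `AnalysisChainSketch`, `analyze` and `demo` are hypothetical names, and the demo filters are simple stand-ins, not the classes added in this PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.function.UnaryOperator;

/** Hypothetical model of a char filter -> tokenizer -> token filter chain; not part of the Stratio builder. */
final class AnalysisChainSketch {

    private final List<UnaryOperator<String>> charFilters;
    private final Function<String, List<String>> tokenizer;
    private final List<UnaryOperator<List<String>>> tokenFilters;

    AnalysisChainSketch(List<UnaryOperator<String>> charFilters,
                        Function<String, List<String>> tokenizer,
                        List<UnaryOperator<List<String>>> tokenFilters) {
        this.charFilters = charFilters;
        this.tokenizer = tokenizer;
        this.tokenFilters = tokenFilters;
    }

    List<String> analyze(String text) {
        // 1. Char filters rewrite the raw text, in declaration order.
        String filtered = text;
        for (UnaryOperator<String> charFilter : charFilters) {
            filtered = charFilter.apply(filtered);
        }
        // 2. A single tokenizer splits the filtered text into tokens.
        List<String> tokens = tokenizer.apply(filtered);
        // 3. Token filters rewrite the token list, in declaration order.
        for (UnaryOperator<List<String>> tokenFilter : tokenFilters) {
            tokens = tokenFilter.apply(tokens);
        }
        return tokens;
    }

    /** Example chain: crude tag removal, whitespace tokenization, lowercasing. */
    static AnalysisChainSketch demo() {
        List<UnaryOperator<String>> charFilters = new ArrayList<>();
        charFilters.add(s -> s.replaceAll("<[^>]*>", " "));
        Function<String, List<String>> whitespace = s -> {
            String trimmed = s.trim();
            if (trimmed.isEmpty()) {
                return new ArrayList<String>();
            }
            return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
        };
        List<UnaryOperator<List<String>>> tokenFilters = new ArrayList<>();
        tokenFilters.add(tokens -> {
            List<String> lower = new ArrayList<>();
            for (String t : tokens) {
                lower.add(t.toLowerCase(Locale.ROOT));
            }
            return lower;
        });
        return new AnalysisChainSketch(charFilters, whitespace, tokenFilters);
    }

    public static void main(String[] args) {
        System.out.println(demo().analyze("<b>Hello</b> World")); // [hello, world]
    }
}
```

The point of the ordering is that char filters can change what the tokenizer sees (for example removing markup), while token filters only ever see tokens.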
32 changes: 32 additions & 0 deletions
...ava/com/stratio/cassandra/lucene/builder/index/schema/analysis/charFilter/CharFilter.java
```java
/*
 * Copyright (C) 2014 Stratio (http://stratio.com)
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package com.stratio.cassandra.lucene.builder.index.schema.analysis.charFilter;

import com.fasterxml.jackson.annotation.JsonSubTypes;
import com.fasterxml.jackson.annotation.JsonTypeInfo;
import com.stratio.cassandra.lucene.builder.JSONBuilder;

/**
 * Created by jpgilaberte on 25/05/17.
 */
@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, include = JsonTypeInfo.As.PROPERTY, property = "type")
@JsonSubTypes({@JsonSubTypes.Type(value = MappingCharFilter.class, name = "mapping"),
               @JsonSubTypes.Type(value = HtmlStripCharFilter.class, name = "htmlstrip"),
               @JsonSubTypes.Type(value = PatternCharFilter.class, name = "pattern"),
               @JsonSubTypes.Type(value = PersianCharFilter.class, name = "persian")})
public class CharFilter extends JSONBuilder {

}
```
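Because of `@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, include = JsonTypeInfo.As.PROPERTY, property = "type")` and the `@JsonSubTypes` list, Jackson writes each char filter as a JSON object carrying a `type` property (`mapping`, `htmlstrip`, `pattern` or `persian`) next to the subclass's own fields. To make that payload shape visible without Jackson on the classpath, the hypothetical `CharFilterJsonSketch` below builds the same shape by hand from a class-name-to-type-name table. It only handles string fields with minimal escaping and is not how Jackson works internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hand-rolled illustration of the {"type": ..., field: ...} shape; not Jackson. */
final class CharFilterJsonSketch {

    // Same class-to-name pairs as the @JsonSubTypes annotation on CharFilter.
    static final Map<String, String> TYPE_NAMES = new LinkedHashMap<>();

    static {
        TYPE_NAMES.put("MappingCharFilter", "mapping");
        TYPE_NAMES.put("HtmlStripCharFilter", "htmlstrip");
        TYPE_NAMES.put("PatternCharFilter", "pattern");
        TYPE_NAMES.put("PersianCharFilter", "persian");
    }

    /** Writes the type discriminator first, then each string field in insertion order. */
    static String toJson(String className, Map<String, String> fields) {
        String typeName = TYPE_NAMES.get(className);
        if (typeName == null) {
            throw new IllegalArgumentException("Unregistered char filter: " + className);
        }
        StringJoiner json = new StringJoiner(",", "{", "}");
        json.add(quote("type") + ":" + quote(typeName));
        for (Map.Entry<String, String> field : fields.entrySet()) {
            json.add(quote(field.getKey()) + ":" + quote(field.getValue()));
        }
        return json.toString();
    }

    // Escapes only backslashes and double quotes; control characters are not handled.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("pattern", "[0-9]+");
        fields.put("replacement", "#");
        System.out.println(toJson("PatternCharFilter", fields));
        // {"type":"pattern","pattern":"[0-9]+","replacement":"#"}
    }
}
```

Jackson's actual output for these classes may order or escape fields differently; the sketch only illustrates where the `type` discriminator sits.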
45 changes: 45 additions & 0 deletions
...tratio/cassandra/lucene/builder/index/schema/analysis/charFilter/HtmlStripCharFilter.java
```java
/*
 * Copyright (C) 2014 Stratio (http://stratio.com)
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package com.stratio.cassandra.lucene.builder.index.schema.analysis.charFilter;

import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;

import java.util.List;

/**
 * Created by jpgilaberte on 30/05/17.
 */
public class HtmlStripCharFilter extends CharFilter {

    /** Names of the tags to leave in the text instead of stripping them; may be {@code null}. */
    @JsonProperty("escapedtags")
    private List<String> escapedtags;

    /** Builds a new {@link HtmlStripCharFilter} without escaped tags. */
    public HtmlStripCharFilter() {
        this(null);
    }

    /**
     * Builds a new {@link HtmlStripCharFilter}. This is the only Jackson creator: two {@code @JsonCreator}
     * constructors would make the single-argument one a delegating creator, which cannot read a JSON object.
     *
     * @param escapedtags the names of the tags to leave in the text
     */
    @JsonCreator
    public HtmlStripCharFilter(@JsonProperty("escapedtags") List<String> escapedtags) {
        this.escapedtags = escapedtags;
    }

    public List<String> getEscapedtags() {
        return escapedtags;
    }

    public HtmlStripCharFilter setEscapedtags(List<String> escapedtags) {
        this.escapedtags = escapedtags;
        return this;
    }
}
```

Review comment on the "Created by jpgilaberte on 30/05/17." line: please put author label like this
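The `escapedtags` list appears to mirror the `escapedTags` set of Lucene's `HTMLStripCharFilter`: tag names that are left in the text instead of being stripped. The regex-based `HtmlStripSketch` below is a hypothetical simplification that shows only that idea. Lucene's filter also decodes character entities, handles comments and other markup, and corrects token offsets, none of which is modeled here.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Simplified, regex-based illustration of escaped tags; not Lucene's HTMLStripCharFilter. */
final class HtmlStripSketch {

    // Opening, closing or self-closing tag; group 1 captures the tag name.
    private static final Pattern TAG = Pattern.compile("</?\\s*([A-Za-z][A-Za-z0-9]*)[^>]*>");

    static String strip(String html, Set<String> escapedTags) {
        // Tag names are compared case-insensitively, so "B" and "b" count as the same tag.
        Set<String> keep = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        keep.addAll(escapedTags);
        Matcher m = TAG.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Escaped tags are copied through verbatim; every other tag is removed.
            String replacement = keep.contains(m.group(1)) ? m.group() : "";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("<p>Hello <b>bold</b> world</p>", Set.of("b"))); // Hello <b>bold</b> world
    }
}
```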
33 changes: 33 additions & 0 deletions
.../stratio/cassandra/lucene/builder/index/schema/analysis/charFilter/MappingCharFilter.java
```java
/*
 * Copyright (C) 2014 Stratio (http://stratio.com)
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package com.stratio.cassandra.lucene.builder.index.schema.analysis.charFilter;

import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;

/**
 * Created by jpgilaberte on 25/05/17.
 */
public class MappingCharFilter extends CharFilter {

    @JsonProperty("mapping")
    private final String mapping;

    @JsonCreator
    public MappingCharFilter(@JsonProperty("mapping") String mapping) {
        this.mapping = mapping;
    }
}
```
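The `mapping` field is a plain `String`, and the diff does not show how the plugin interprets it. For context, Lucene's `MappingCharFilter` applies the string-to-string rules of a `NormalizeCharMap` greedily, with the longest matching rule winning at each position. The hypothetical `MappingSketch` below reproduces only that longest-match replacement over an in-memory `Map`; it is not the plugin's parser and does not track offsets.

```java
import java.util.Map;

/** Greedy longest-match replacement in the spirit of Lucene's MappingCharFilter (offsets not tracked). */
final class MappingSketch {

    static String map(String input, Map<String, String> rules) {
        int longestKey = 0;
        for (String key : rules.keySet()) {
            longestKey = Math.max(longestKey, key.length());
        }
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            String match = null;
            // Try the longest candidate first, so "ph" wins over "p" at the same position.
            for (int len = Math.min(longestKey, input.length() - i); len > 0; len--) {
                String candidate = input.substring(i, i + len);
                if (rules.containsKey(candidate)) {
                    match = candidate;
                    break;
                }
            }
            if (match == null) {
                out.append(input.charAt(i)); // no rule applies here: copy one char
                i++;
            } else {
                out.append(rules.get(match)); // the replacement may be the empty string
                i += match.length();
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(map("phone pad", Map.of("ph", "f", "p", "P"))); // fone Pad
    }
}
```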
37 changes: 37 additions & 0 deletions
.../stratio/cassandra/lucene/builder/index/schema/analysis/charFilter/PatternCharFilter.java
```java
/*
 * Copyright (C) 2014 Stratio (http://stratio.com)
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package com.stratio.cassandra.lucene.builder.index.schema.analysis.charFilter;

import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;

/**
 * Created by jpgilaberte on 30/05/17.
 */
public class PatternCharFilter extends CharFilter {

    @JsonProperty("pattern")
    private final String pattern;

    @JsonProperty("replacement")
    private final String replacement;

    @JsonCreator
    public PatternCharFilter(@JsonProperty("pattern") String pattern,
                             @JsonProperty("replacement") String replacement) {
        this.pattern = pattern;
        this.replacement = replacement;
    }
}
```

Review comment on the "Created by jpgilaberte on 30/05/17." line: ==
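The `pattern` and `replacement` pair matches the configuration of Lucene's `PatternReplaceCharFilter`, which this builder class presumably targets: every match of the regular expression in the raw text is replaced before tokenization. Assuming `java.util.regex` semantics (so `$1`, `$2` refer to capture groups), the effect can be sketched as below; `PatternReplaceSketch` is a hypothetical name and offsets are not tracked.

```java
import java.util.regex.Pattern;

/** Regex rewrite of the raw text before tokenization; a sketch, not Lucene's implementation. */
final class PatternReplaceSketch {

    static String apply(String input, String pattern, String replacement) {
        // The replacement may use java.util.regex group references such as $1 and $2.
        return Pattern.compile(pattern).matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        // Swap the halves of hyphenated pairs: "aa-bb" becomes "bbaa".
        System.out.println(apply("aa-bb cc-dd", "(\\w+)-(\\w+)", "$2$1")); // bbaa ddcc
    }
}
```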
26 changes: 26 additions & 0 deletions
.../stratio/cassandra/lucene/builder/index/schema/analysis/charFilter/PersianCharFilter.java
```java
/*
 * Copyright (C) 2014 Stratio (http://stratio.com)
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package com.stratio.cassandra.lucene.builder.index.schema.analysis.charFilter;

import com.fasterxml.jackson.annotation.JsonCreator;

/**
 * Created by jpgilaberte on 30/05/17.
 */
public class PersianCharFilter extends CharFilter {

    @JsonCreator
    public PersianCharFilter() {}
}
```
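`PersianCharFilter` takes no parameters. Lucene's `PersianCharFilter`, which this builder presumably selects, is documented as replacing the zero-width non-joiner (U+200C) with an ordinary space, so that the parts of a word joined by a ZWNJ become separate tokens. A minimal sketch of that single transformation, under the hypothetical name `PersianCharFilterSketch`:

```java
/** Replaces ZERO WIDTH NON-JOINER (U+200C) with a space; a sketch, not Lucene's class. */
final class PersianCharFilterSketch {

    private static final char ZERO_WIDTH_NON_JOINER = '\u200C';

    static String apply(String input) {
        return input.replace(ZERO_WIDTH_NON_JOINER, ' ');
    }

    public static void main(String[] args) {
        // A Persian word whose first two letters are followed by a ZWNJ.
        String text = "\u0645\u06CC\u200C\u062E\u0648\u0627\u0647\u0645";
        System.out.println(apply(text).indexOf(' ')); // 2: the ZWNJ became a space
    }
}
```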
28 changes: 28 additions & 0 deletions
...tio/cassandra/lucene/builder/index/schema/analysis/tokenFilter/ApostropheTokenFilter.java
```java
/*
 * Copyright (C) 2014 Stratio (http://stratio.com)
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package com.stratio.cassandra.lucene.builder.index.schema.analysis.tokenFilter;

import com.fasterxml.jackson.annotation.JsonCreator;

/**
 * Created by jpgilaberte on 25/05/17.
 */
public class ApostropheTokenFilter extends TokenFilter {

    @JsonCreator
    public ApostropheTokenFilter() {}
}
```
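Lucene's `ApostropheFilter` (Turkish analysis package), which this builder presumably selects, is documented as stripping every character after an apostrophe, including the apostrophe itself, because Turkish uses the apostrophe to separate suffixes from proper names. The per-token behaviour can be sketched as below. `ApostropheSketch` is a hypothetical name, and treating both `'` and U+2019 as apostrophes is an assumption worth checking against the Lucene version in use.

```java
import java.util.ArrayList;
import java.util.List;

/** Per-token sketch of apostrophe stripping; not Lucene's ApostropheFilter itself. */
final class ApostropheSketch {

    static String strip(String token) {
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            // ASCII apostrophe or RIGHT SINGLE QUOTATION MARK (assumed to be treated alike).
            if (c == '\'' || c == '\u2019') {
                return token.substring(0, i);
            }
        }
        return token;
    }

    static List<String> apply(List<String> tokens) {
        List<String> out = new ArrayList<>(tokens.size());
        for (String token : tokens) {
            out.add(strip(token));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(apply(List.of("Ankara'da", "Istanbul\u2019un", "kitap"))); // [Ankara, Istanbul, kitap]
    }
}
```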
28 changes: 28 additions & 0 deletions
...ndra/lucene/builder/index/schema/analysis/tokenFilter/ArabicnormalizationTokenFilter.java
```java
/*
 * Copyright (C) 2014 Stratio (http://stratio.com)
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package com.stratio.cassandra.lucene.builder.index.schema.analysis.tokenFilter;

import com.fasterxml.jackson.annotation.JsonCreator;

/**
 * Created by jpgilaberte on 25/05/17.
 */
public class ArabicnormalizationTokenFilter extends TokenFilter {

    @JsonCreator
    public ArabicnormalizationTokenFilter() {}
}
```
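Lucene's `ArabicNormalizer`, used by `ArabicNormalizationFilter`, documents five orthographic normalizations: alef variants with hamza or madda become bare alef, teh marbuta becomes heh, dotless yeh (alef maksura) becomes yeh, and the harakat diacritics and tatweel are removed. Assuming this builder selects that filter, a per-token sketch of those rules (hypothetical `ArabicNormalizationSketch`) looks like this:

```java
/** Per-token sketch of Arabic orthographic normalization; not Lucene's ArabicNormalizer itself. */
final class ArabicNormalizationSketch {

    static String normalize(String token) {
        StringBuilder out = new StringBuilder(token.length());
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            switch (c) {
                case '\u0622': // alef with madda above
                case '\u0623': // alef with hamza above
                case '\u0625': // alef with hamza below
                    out.append('\u0627'); // bare alef
                    break;
                case '\u0649': // alef maksura (dotless yeh)
                    out.append('\u064A'); // yeh
                    break;
                case '\u0629': // teh marbuta
                    out.append('\u0647'); // heh
                    break;
                case '\u0640': // tatweel
                case '\u064B': case '\u064C': case '\u064D': // tanween (fathatan, dammatan, kasratan)
                case '\u064E': case '\u064F': case '\u0650': // fatha, damma, kasra
                case '\u0651': case '\u0652':               // shadda, sukun
                    break; // dropped from the token
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hamza-seated alef is normalized to bare alef.
        System.out.println(normalize("\u0623\u062D\u0645\u062F").equals("\u0627\u062D\u0645\u062F")); // true
    }
}
```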
28 changes: 28 additions & 0 deletions
...tio/cassandra/lucene/builder/index/schema/analysis/tokenFilter/ArabicstemTokenFilter.java
```java
/*
 * Copyright (C) 2014 Stratio (http://stratio.com)
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package com.stratio.cassandra.lucene.builder.index.schema.analysis.tokenFilter;

import com.fasterxml.jackson.annotation.JsonCreator;

/**
 * Created by jpgilaberte on 25/05/17.
 */
public class ArabicstemTokenFilter extends TokenFilter {

    @JsonCreator
    public ArabicstemTokenFilter() {}
}
```
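Lucene's `ArabicStemmer`, used by `ArabicStemFilter`, is a light stemmer: it removes an attached definite article, conjunction or preposition prefix and then strips common suffixes. The sketch below follows the prefix and suffix lists and minimum-length checks as recalled from Lucene's source, so treat the exact rules as an assumption and verify them before relying on the details. `ArabicLightStemSketch` is a hypothetical name; Lucene's own `ArabicAnalyzer` applies normalization before stemming, so input is expected to be normalized already.

```java
/** Light-stemming sketch modeled on Lucene's ArabicStemmer rules (assumed, not copied). */
final class ArabicLightStemSketch {

    // Definite article, conjunction and preposition prefixes, tried in this order.
    private static final String[] PREFIXES = {
        "\u0627\u0644",       // al-
        "\u0648\u0627\u0644", // wal-
        "\u0628\u0627\u0644", // bal-
        "\u0643\u0627\u0644", // kal-
        "\u0641\u0627\u0644", // fal-
        "\u0644\u0644",       // lil-
        "\u0648"              // wa-
    };

    // Common suffixes, tried in this order; several may be removed one after another.
    private static final String[] SUFFIXES = {
        "\u0647\u0627", "\u0627\u0646", "\u0627\u062A", "\u0648\u0646", "\u064A\u0646",
        "\u064A\u0647", "\u064A\u0629", "\u0647", "\u0629", "\u064A"
    };

    static String stem(String token) {
        String s = token;
        // At most one prefix is removed: the single-letter "wa-" needs a word of 4+ chars,
        // the others must leave at least 2 chars behind.
        for (String p : PREFIXES) {
            boolean longEnough = p.length() == 1 ? s.length() >= 4 : s.length() >= p.length() + 2;
            if (longEnough && s.startsWith(p)) {
                s = s.substring(p.length());
                break;
            }
        }
        // Each suffix is removed only if at least 2 chars remain afterwards.
        for (String suffix : SUFFIXES) {
            if (s.length() >= suffix.length() + 2 && s.endsWith(suffix)) {
                s = s.substring(0, s.length() - suffix.length());
            }
        }
        return s;
    }

    public static void main(String[] args) {
        // "al-mu'allimun" loses "al-" and "-un", leaving a 4-letter stem.
        System.out.println(stem("\u0627\u0644\u0645\u0639\u0644\u0645\u0648\u0646").length()); // 4
    }
}
```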
Review comment: just copy the comments before here