Elastic indexer #105

AvinashBukkittu · 2019-12-30T04:13:00Z

This PR:

- Adds evaluator support to the pipeline: an evaluator can now be added to the pipeline, and calling evaluate() on the pipeline evaluates it on a dataset.
- Adds ElasticIndexer, along with the ElasticSearchIndexProcessor processor, to index documents.
- Adds ElasticSearchProcessor for searching documents in an Elastic indexer.
- Creates a Passage Reranker example for the MS Marco dataset, which provides a baseline ranking model that uses only the Elastic indexer.
  - Adds an EvalReader to read the MS Marco eval dataset.
  - Adds the MS Marco eval script to the example.
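
For readers unfamiliar with the MS Marco passage ranking task: it is conventionally scored with MRR@10, the mean reciprocal rank of the first relevant passage within the top 10 results. The eval script added in this PR is not reproduced here, so the following is only a minimal sketch of that metric. The function name `mrr_at_k` and the dictionary layout are assumptions made for illustration, not the script's actual interface.

```python
from typing import Dict, List, Set


def mrr_at_k(ranked: Dict[str, List[str]],
             relevant: Dict[str, Set[str]],
             k: int = 10) -> float:
    """Mean reciprocal rank of the first relevant passage in the top k.

    `ranked` maps a query id to passage ids ordered best-first, and
    `relevant` maps a query id to its gold passage ids (hypothetical layout).
    """
    if not relevant:
        return 0.0
    total = 0.0
    for query_id, gold in relevant.items():
        top_k = ranked.get(query_id, [])[:k]
        for rank, doc_id in enumerate(top_k, start=1):
            if doc_id in gold:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(relevant)


# Toy data: q1 hits at rank 2, q2 hits at rank 1, q3 has no hit.
ranked = {"q1": ["d3", "d1", "d7"], "q2": ["d9", "d4"], "q3": ["d2"]}
relevant = {"q1": {"d1"}, "q2": {"d9"}, "q3": {"d5"}}
score = mrr_at_k(ranked, relevant)
print(score)
```

Queries with no relevant passage in the top k contribute zero, which is why the average is taken over all gold queries rather than only the ones that were answered.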

codecov · 2019-12-30T04:24:54Z

Codecov Report

Merging #105 into master will increase coverage by 0.6%.
The diff coverage is 76.67%.

@@            Coverage Diff            @@
##           master     #105     +/-   ##
=========================================
+ Coverage   61.18%   61.78%   +0.6%     
=========================================
  Files          94      100      +6     
  Lines        6425     6684    +259     
=========================================
+ Hits         3931     4130    +199     
- Misses       2494     2554     +60

| Impacted Files | Coverage Δ | |
|---|---|---|
| forte/data/readers/__init__.py | `100% <100%> (ø)` | ⬆️ |
| forte/indexers/tests/indexers_test.py | `100% <100%> (ø)` | ⬆️ |
| forte/processors/base/__init__.py | `100% <100%> (ø)` | ⬆️ |
| forte/processors/base/query_processor.py | `89.47% <100%> (+1.23%)` | ⬆️ |
| forte/data/readers/tests/conllu_ud_reader_test.py | `97.77% <100%> (ø)` | ⬆️ |
| forte/common/evaluation.py | `81.25% <100%> (+1.25%)` | ⬆️ |
| forte/processors/__init__.py | `100% <100%> (ø)` | ⬆️ |
| forte/processors/bert_based_query_creator.py | `84.9% <100%> (+0.29%)` | ⬆️ |
| forte/data/readers/tests/corpus_reader_test.py | `100% <100%> (ø)` | |
| forte/processors/elastic_search_processor.py | `42.85% <42.85%> (ø)` | |
| ... and 15 more | | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Last update 994ae19...9255a96.

hunterhector

I think in general the PR is OK. There are a few simple comments here, plus the comments in #103. Maybe we can plan to merge them today after these are fixed.

forte/processors/elastic_search_query_creator.py

forte/processors/elastic_search_processor.py

forte/processors/base/index_processor.py

examples/passage_reranker/ms_marco_evaluator.py

examples/passage_reranker/reader.py

forte/indexers/indexers.py

forte/processors/elastic_search_processor.py

AvinashBukkittu · 2019-12-31T04:27:12Z

Importing relevant comments from #103

Adding indexer+reranker inference pipeline; passage reranking bert model #103 (comment)

Let's have more typing here.

Added typing in MS Marco Evaluator

Adding indexer+reranker inference pipeline; passage reranking bert model #103 (comment)

It would be better to store only some necessary information from the pack, here we only need the doc_id?

Simplified the logic of MS Marco Evaluator in b7632b8

Adding indexer+reranker inference pipeline; passage reranking bert model #103 (comment)

passage is too specific as a name. How about rank_list?

Renamed from passages to results.

Adding indexer+reranker inference pipeline; passage reranking bert model #103 (comment)

Add some docstring here to teach users to extend this method in order to create more complex queries.

Done in b7632b8
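
To illustrate the kind of extension that docstring is meant to encourage, here is a minimal sketch rather than the code in this PR. The class names, the `build_query` method, and the `content` and `lang` field names are all hypothetical. Only the dictionary shapes (`match`, `bool`, `must`, `filter`, `term`, `size`) come from the standard Elasticsearch query DSL.

```python
from typing import Any, Dict


class SimpleQueryCreator:
    """Hypothetical base class that builds a plain full-text match query."""

    def __init__(self, field: str = "content", size: int = 10) -> None:
        self.field = field  # assumed name of the indexed text field
        self.size = size    # number of hits to request

    def build_query(self, text: str) -> Dict[str, Any]:
        return {"query": {"match": {self.field: text}}, "size": self.size}


class FilteredQueryCreator(SimpleQueryCreator):
    """Hypothetical subclass: the same match query plus an exact-term filter."""

    def __init__(self, lang: str, **kwargs: Any) -> None:
        super().__init__(**kwargs)
        self.lang = lang

    def build_query(self, text: str) -> Dict[str, Any]:
        base = super().build_query(text)
        return {
            "query": {
                "bool": {
                    "must": [base["query"]],
                    "filter": [{"term": {"lang": self.lang}}],
                }
            },
            "size": base["size"],
        }


simple = SimpleQueryCreator().build_query("what is forte")
filtered = FilteredQueryCreator(lang="en", size=5).build_query("what is forte")
print(filtered)
```

Overriding a single method keeps the search call itself unchanged, so a more complex query only needs a different dictionary.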

Adding indexer+reranker inference pipeline; passage reranking bert model #103 (comment)

Do we need to benchmark the speed of the indexer? Hopefully, our wrapper won't decrease the speed a lot.

Added a benchmark testcase in b7632b8
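
The benchmark test case itself is not shown in this thread. As a rough sketch of how wrapper overhead could be measured, the snippet below times direct calls against wrapped calls with `time.perf_counter`. `FakeBackend` and `WrappedIndexer` are hypothetical in-memory stand-ins, so the numbers say nothing about a real Elasticsearch server.

```python
import time
from typing import Callable, Dict, List

Doc = Dict[str, str]


class FakeBackend:
    """Hypothetical in-memory stand-in for the real index client."""

    def __init__(self) -> None:
        self.docs: List[Doc] = []

    def add(self, doc: Doc) -> None:
        self.docs.append(doc)


class WrappedIndexer:
    """Hypothetical thin wrapper, standing in for an indexer class."""

    def __init__(self, backend: FakeBackend) -> None:
        self.backend = backend

    def index(self, doc: Doc) -> None:
        # Copy the document before handing it over, as a wrapper might.
        self.backend.add(dict(doc))


def time_calls(fn: Callable[[Doc], None], docs: List[Doc]) -> float:
    """Return the wall-clock seconds spent calling fn once per document."""
    start = time.perf_counter()
    for doc in docs:
        fn(doc)
    return time.perf_counter() - start


docs = [{"doc_id": str(i), "content": f"passage {i}"} for i in range(1000)]
direct_backend, wrapped_backend = FakeBackend(), FakeBackend()
direct_secs = time_calls(direct_backend.add, docs)
wrapped_secs = time_calls(WrappedIndexer(wrapped_backend).index, docs)
print(f"direct: {direct_secs:.6f}s, wrapped: {wrapped_secs:.6f}s")
```

A single run like this is noisy. Repeating the measurement and comparing medians would give a steadier estimate of the overhead.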

Adding indexer+reranker inference pipeline; passage reranking bert model #103 (comment)

The design of _process_query seems a little difficult, especially returning input_pack.

I agree. Essentially, the following three lines

query = Query(pack=query_pack)
query.set_value(value=query_value)
query_pack.add_entry(query)

are common in QueryProcessor. If we want to abstract these details away in the _process method, I couldn't think of a better way than returning query_pack and query_value from _process_query.
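
The design described here is a template-method arrangement, sketched below under stated assumptions. `DataPack` and `Query` are minimal hypothetical stand-ins, not forte's real classes, and `EchoQueryCreator` exists only for the demo. The three shared lines are the ones quoted above.

```python
from typing import Any, List, Tuple


class DataPack:
    """Hypothetical minimal stand-in for a data pack."""

    def __init__(self) -> None:
        self.entries: List[Any] = []

    def add_entry(self, entry: Any) -> None:
        self.entries.append(entry)


class Query:
    """Hypothetical minimal stand-in for the Query entry."""

    def __init__(self, pack: DataPack) -> None:
        self.pack = pack
        self.value: Any = None

    def set_value(self, value: Any) -> None:
        self.value = value


class QueryProcessor:
    """Base class: subclasses only decide which pack and which value."""

    def _process_query(self, input_pack: DataPack) -> Tuple[DataPack, Any]:
        raise NotImplementedError

    def _process(self, input_pack: DataPack) -> None:
        query_pack, query_value = self._process_query(input_pack)
        # The three lines shared by every query processor live here.
        query = Query(pack=query_pack)
        query.set_value(value=query_value)
        query_pack.add_entry(query)


class EchoQueryCreator(QueryProcessor):
    """Demo subclass that attaches a match-all query to the input pack."""

    def _process_query(self, input_pack: DataPack) -> Tuple[DataPack, Any]:
        return input_pack, {"query": {"match_all": {}}}


pack = DataPack()
EchoQueryCreator()._process(pack)
print(pack.entries[0].value)
```

Returning the pack alongside the value lets a subclass attach the query to a different pack than the one it received, which is the part of the design the reviewer found awkward.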

Adding indexer+reranker inference pipeline; passage reranking bert model #103 (comment)

At least in bulk mode, we should add a couple more documents.

Increased the limit to 10,000
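
For context on bulk mode, here is a minimal sketch of buffered bulk indexing, including resetting the buffer after each batch (compare the commit "Set self.documents to empty after bulk addition"). `BufferedIndexer` and its `send_bulk` callback are hypothetical. In a real setup the callback could forward the batch to `elasticsearch.helpers.bulk(client, actions)`, which accepts action dictionaries with `_index` and `_source` keys. Here the batches are only collected in a list so the snippet runs without a server.

```python
from typing import Any, Callable, Dict, List

Action = Dict[str, Any]


class BufferedIndexer:
    """Hypothetical indexer that buffers documents and sends them in bulk."""

    def __init__(self, index_name: str,
                 send_bulk: Callable[[List[Action]], None],
                 batch_size: int = 10_000) -> None:
        self.index_name = index_name
        self.send_bulk = send_bulk  # assumed callback that ships one batch
        self.batch_size = batch_size
        self.documents: List[Action] = []

    def add(self, doc: Dict[str, Any]) -> None:
        self.documents.append({"_index": self.index_name, "_source": doc})
        if len(self.documents) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if self.documents:
            self.send_bulk(self.documents)
            # Rebind to a fresh list so the batch already sent is not mutated
            # and is never sent a second time.
            self.documents = []


batches: List[List[Action]] = []
indexer = BufferedIndexer("passages", batches.append, batch_size=3)
for i in range(7):
    indexer.add({"doc_id": str(i), "content": f"passage {i}"})
indexer.flush()  # send the final partial batch
print([len(batch) for batch in batches])
```

Without the reset in `flush`, every later batch would resend the earlier documents.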

hunterhector

Comments addressed in b7632b8

forte/processors/elastic_search_processor.py

AvinashBukkittu and others added 19 commits November 26, 2019 04:33

Add ElasticSearchIndexer along with test cases

aa56b7b

Add current directory to PYTHONPATH.

747166d

Add a simple python script to index documents

57f10b2

Merge branch 'chatbot' into reranker-demo

fadd827

Add Elasticsearch query creator and searcher

495600a

Add elasticsearch dependency in docs

edc11b5

Adding simple corpus reader

59a334a

Merge branch 'master' into ms-readers

5a3f5e8

Fixing mypy error

bf0acfc

Merge branch 'master' of https://github.com/asyml/forte into ms-readers

f37dab6

Merge branch 'ms-readers' of https://github.com/asyml/forte into ms-readers

646f7fe

Merge branch 'master' into elastic-indexer

e983048

Conflicts: setup.py

Merge branch 'reranker-demo' into elastic-indexer

d9a58b1

Merge branch 'ms-readers' into elastic-indexer

524ae68

Conflicts: setup.py

Add ElasticSearchIndexProcessor for indexing documents

4bdc386

Set self.documents to empty after bulk addition

8ee5324

Adding Evaluator and an EvalReader

8b0331e

Fix CI

edd6bbb

Merge branch 'master' into elastic-indexer

49a1716

Conflicts: docs/requirements.txt

hunterhector requested changes Dec 30, 2019

AvinashBukkittu added 3 commits December 30, 2019 16:43

Add missing copyright headers

2fc3e5f

Add mypy annotations and docstrings

4776d40

Add Elasticsearch test case. Add typing info

b7632b8

hunterhector previously approved these changes Jan 3, 2020

forte/processors/elastic_search_processor.py (two review threads, resolved)

Merge branch 'master' into elastic-indexer

9255a96

Conflicts: .travis.yml setup.py

AvinashBukkittu dismissed hunterhector’s stale review via 9255a96 January 5, 2020 00:57

hunterhector approved these changes Jan 6, 2020

hunterhector merged commit c3f5e01 into master Jan 6, 2020

mgupta1410 deleted the elastic-indexer branch February 28, 2020 15:47
