A synonym.txt for Solr Instances. Solr is a great search engine but it is even better with a bit of training. One of the most used ways to train Solr is to add a synonyms.txt file. Building a synonyms.txt file for a particular corpus of language is not an easy exercise. This repository is an attempt to build a synonyms.txt file for a legal corpus so that Solr can be used to search a corpus of documents of a legal nature.
The results of this effort rather than being strictly and traditionally versioned are contained in different synonyms.txt files.
The general idea of this repository is that any synonyms.txt file which is in the base directory is considered ready while synonyms.txt files which are kept in subdirectories are not yet ready to be used. Once a file is ready it shall be moved to this directory and denoted with the appropriate base from which it was derived.
synonyms.txt-freeLawProject
is drawn from the CourtListener repository and is licensed GPL according to the terms of the CourtListener License file.- Status: Ready
synonyms.txt-gaoOasis
is drawn from the 1987 Legal Synonyms Document published by the Government Accountability Office. Synonyms from that document have been OCR'ed, cleaned, and processed into a compatible synonyms.txt file.- Status: Ready
synonyms.txt-uscode-diceX.X
is built from the USCode with a Dice coefficient of X.X (a few have been outputed). Synonyms have been built using the script in the Build From Corpus directory of this repository.- Status: Semi-Ready
Save the synonyms.txt-...
file which is most suitable to your project to the correct location which has been referenced in your Solr config file. When saving the file, the name should be changed to synonyms.txt
or else you must update your Solr config files to reference the correct filename.
- Fork
- Hack
- Pull Request
All works within any directories labeled Research or the like are licensed and copyrighted to their authors and are hosted here only as relevant background material which has informed the work herein undertaken.
See LICENSE file for all other terms and conditions with respect to the remainder of the work in this repository.