-
Notifications
You must be signed in to change notification settings - Fork 7
NFDI4Chem TS #48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
NFDI4Chem TS #48
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,139 @@ | ||
# The NFDI4Chem Terminology Service | ||
|
||
## About NFDI4Chem | ||
|
||
[NFDI4Chem](https://www.nfdi4chem.de/) is the Chemistry Consortium in the National Research Data Infrastructure for Germany [NFDI](https://www.nfdi.de/?lang=en) | ||
(German: **N**ationale **F**orschungs**D**aten**I**nfrastruktur), a project launched in 2018 and funded by the German | ||
government to build a national research data infrastructure in Germany for a wide range of scientific disciplines. The | ||
vision of NFDI4Chem is to digitize all key steps in chemical research to support scientists in their efforts to collect, | ||
store, process, analyze, publish, and reuse research data. Actions to promote open science and research data management | ||
(RDM) in line with the [FAIR data principles](https://www.go-fair.org/fair-principles/) are fundamental objectives of NFDI4Chem to provide the chemistry | ||
community with a holistic approach to research data access. To this end, the overall goal is to develop and maintain | ||
innovative and user-friendly services and novel scientific approaches based on the reuse of research data. NFDI4Chem | ||
aims to represent all disciplines of chemistry in academia and therefore works closely with thematically related | ||
NFDI consortia. | ||
|
||
## NFDI4Chem Terminology Service | ||
|
||
Research data is not just a collection of numbers or images in a scientific journal article, experimental section, or | ||
supplementary information. To fully comprehend the deduction of results and enable the exploration of new data-driven, | ||
interdisciplinary research questions, access to the raw data and knowledge of how it was generated, processed, and | ||
analyzed is essential. Research data needs to be FAIR (**F**indable, **A**ccessible, **I**nteroperable, **R**eusable) | ||
for both humans and machines. Achieving machine-usable FAIR research data requires extensive metadata annotation, which | ||
describes the data and its context. To semantically describe research data, ontologies, taxonomies, terminologies, or | ||
vocabularies can be used, playing an important role in creating semantically rich, discipline-specific metadata. In | ||
addition, consensual definitions of entities are formed, ensuring conceptual alignment across domains, even if the | ||
nomenclature of the individual domains is different. Please visit the | ||
[NFDI4Chem Knowledge Base article about Ontologies](https://knowledgebase.nfdi4chem.de/knowledge_base/docs/ontology/?_highlight=terminol#sources-and-further-information) to learn more about this topic. | ||
|
||
Using terminologies for data annotation presents a significant challenge as it requires selecting the appropriate terms | ||
from the available terminologies. To make an informed decision, one must either possess prior knowledge of the suitable | ||
terminologies and terms for a specific scientific context and use case or acquire this knowledge by browsing the | ||
available terminologies. A terminology service is a tool for browsing terminologies that aims to make them | ||
understandable to humans. Therefore, it should provide detailed information about the terminologies and the terms | ||
they contain. This is particularly important when comparing multiple terminologies that cover the same or overlapping | ||
areas of knowledge. Terminology users must be able to recognize the semantic differences and similarities, as well | ||
as the interdependencies between terminologies, to make informed decisions. Although there are many sophisticated | ||
terminology services available, such as the [Basic Register of Thesauri, Ontologies & Classifications (BARTOC)](https://bartoc.org/) | ||
or the [Ontology Lookup Service (OLS)](https://www.ebi.ac.uk/ols4), they contain a large selection of different terminologies and ontologies | ||
that can overwhelm domain experts. | ||
|
||
The [NFDI4Chem Terminology Service (TS)](https://terminology.nfdi4chem.de/ts/) only indexes terminologies that are most relevant to chemistry using | ||
various parameters such as subject area relevance and license to facilitate domain experts' selection. Additionally, | ||
the NFDI4Chem TS provides functions that go beyond simple search and provision of ontologies, enabling collaborative | ||
curation and management. To minimize the risk of losing track, users can list or write GitHub issues directly from | ||
the TS. If an ontology is not maintained on GitHub or if additional contextual information is desired, users can | ||
write notes at the ontology or term level to make open issues, recommendations, or insights more visible to | ||
themselves and others. | ||
|
||
Using the example of "[mass concentration](https://terminology.nfdi4chem.de/ts/search?and=false&sorting=title&page=1&size=10&q=mass+concentration)", | ||
a search in the NFDI4Chem TS yields almost 6,000 results. However, selecting the "Exact Match" option reduces the | ||
number of results to six (see [Figure 1](images/nfdi4chem_ts_fig1.png)). | ||
|
||
 | ||
|
||
Of the six results, the option "Unit of Mass Concentration" is unsuitable as it only describes the unit. Therefore, | ||
only five options remain, all of which are in principle suitable for semantically describing the term | ||
"mass concentration". However, the term ["mass concentration" defined in the Chemical Methods Ontology (CHMO)](https://terminology.nfdi4chem.de/ts/ontologies/chmo/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FCHMO_0002821&obsoletes=false) | ||
appears to be the most appropriate, as it is already used by another ontology, as indicated by the "Also in" field. | ||
This example briefly illustrates how the NFDI4Chem TS assists chemists in identifying and selecting the most | ||
appropriate terminology for their specific use case. | ||
|
||
In addition to the previously described use case of using the NFDI4Chem TS through the Graphical User Interface (GUI), | ||
the TS can also be utilized by other services, such as [Electronic Lab Notebooks (ELNs)](https://knowledgebase.nfdi4chem.de/knowledge_base/docs/eln/), through the provided | ||
[Application Programming Interface (API)](https://terminology.nfdi4chem.de/ts/api). In addition to the search functionality, the Terminology Service API | ||
provides other functions for interacting with indexed ontologies. For example, the autocomplete or direct select | ||
function can be used to populate forms or dropdown menus. The interactive [Swagger documentation](https://service.tib.eu/ts4tib/swagger-ui.html#/) documents all | ||
available API functions and allows for live testing. | ||
|
||
To implement the search example above in Python using the NFDI4Chem TS API, follow these steps: | ||
|
||
``` | ||
import requests | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @stuchalk to review code There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @Smund27 I have just added a Python file with an update of the code in this contribution. The code would not execute without an update. The following updates have been included:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. replaced the code in the contribution with the code provided in the Python file by Stuart |
||
import urllib | ||
|
||
class ChemSearch: | ||
def __init__(self, query, exactMatch, page, size): | ||
self.query = urllib.parse.quote_plus(query) | ||
self.exactMatch = exactMatch | ||
self.rows = size | ||
self.start = page | ||
self.collection = "NFDI4CHEM" | ||
self.base_api_url = "https://service.tib.eu/ts4tib/api/ | ||
search?groupField=iri&&schema=collection" | ||
search_result = [] | ||
search_facet = [] | ||
|
||
|
||
def run(self): | ||
params = "&q={}&exact={}&start={}&rows={}&classification={}".format( | ||
self.query, | ||
self.exactMatch, | ||
self.start, | ||
self.rows, | ||
self.collection) | ||
searchUrl = self.base_api_url + params | ||
result = requests.get(searchUrl) | ||
if result.status_code == 200: | ||
result = result.json() | ||
self.search_result = result['response'] | ||
self.search_facet = result['facet_counts']['facet_fields'] | ||
return True | ||
|
||
return False | ||
|
||
|
||
def get_number_of_found_result(self): | ||
print(self.search_result['numFound']) | ||
|
||
return True | ||
|
||
|
||
def get_results(self): | ||
for res in self.search_result['docs']: | ||
print("Iri: " + res['iri']) | ||
print("short_form: " + res['short_form']) | ||
print("ontology_name: " + res['ontology_name']) | ||
print("Type: " + res['type']) | ||
|
||
return True | ||
|
||
|
||
def get_facet(self): | ||
for facet_name, facet_values in self.search_facet.items(): | ||
print("Facet Field Name: " + facet_name) | ||
for i in range(len(facet_values)): | ||
if i % 2 == 0: | ||
print("facet item: " + facet_values[i]) | ||
else: | ||
print("count: " + str(facet_values[i])) | ||
|
||
return True | ||
|
||
# running the above code | ||
search = ChemSearch("mass concentration", True, 0, 40) | ||
search.run() | ||
search.get_number_of_found_result() | ||
search.get_results() | ||
search.get_facet() | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,71 @@ | ||
""" Code to create a NFDI4Chem Terminology Service search class based on the service API """ | ||
# The code below has been formatted using the best practices for coding in Python | ||
# found at https://peps.python.org/pep-0008/ "PEP 8 – Style Guide for Python Code" | ||
|
||
# import the Python packages needed to run this code | ||
import requests | ||
import urllib.parse | ||
|
||
|
||
# | ||
class ChemSearch: | ||
def __init__(self, query, exactMatch, page, size): | ||
""" Initialize (configure) parameters for the API search """ | ||
self.query = urllib.parse.quote_plus(query) | ||
self.exactMatch = exactMatch | ||
self.rows = size | ||
self.start = page | ||
self.collection = "NFDI4CHEM" | ||
self.base_api_url = "https://service.tib.eu/ts4tib/api/search?groupField = iri & & schema = collection" | ||
self.search_result = [] | ||
self.search_facet = [] | ||
|
||
def run(self): | ||
""" Run the search on the terminology server and load results into the 'search_result' attribute """ | ||
params = "&q={}&exact={}&start={}&rows={}&classification={}".format( | ||
self.query, | ||
self.exactMatch, | ||
self.start, | ||
self.rows, | ||
self.collection) | ||
|
||
searchUrl = self.base_api_url + params | ||
result = requests.get(searchUrl) | ||
if result.status_code == 200: | ||
result = result.json() | ||
self.search_result = result['response'] | ||
self.search_facet = result['facet_counts']['facet_fields'] | ||
return True | ||
return False | ||
|
||
def get_number_of_found_result(self): | ||
""" Display the number results are found """ | ||
print(self.search_result['numFound']) | ||
return True | ||
|
||
def get_results(self): | ||
""" Display the search result data """ | ||
for res in self.search_result['docs']: | ||
print("Iri: " + res['iri']) | ||
print("short_form: " + res['short_form']) | ||
print("ontology_name: " + res['ontology_name']) | ||
print("Type: " + res['type']) | ||
return True | ||
|
||
""" Its not clear what this code does """ | ||
# def get_facet(self): | ||
# for facet_name, facet_values in self.search_facet.items(): | ||
# print("Facet Field Name: " + facet_name) | ||
# for i in range(len(facet_values)): | ||
# if i % 2 == 0: | ||
# print("facet item: " + facet_values[i]) | ||
# else: | ||
# print("count: " + str(facet_values[i])) | ||
# return True | ||
|
||
|
||
# This code uses the Class developed above to run a search for the term "mass concentration" | ||
search = ChemSearch("mass concentration", True, 0, 40) | ||
search.run() | ||
search.get_number_of_found_result() | ||
search.get_results() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'Overwhelm domain experts' seems like quite a sweeping statement
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. Sentence reformulated as follows:
"Despite the existence of sophisticated terminology services such as the Basic Register of Thesauri, Ontologies & Classifications (BARTOC) or the Ontology Lookup Service (OLS), the vast array of terminologies and ontologies they encompass makes it challenging for domain experts to identify the most relevant and suitable ones for their specific use cases."
Hope this fits better now.