Gregor Leban edited this page Mar 20, 2019 · 9 revisions

Event Registry imports news articles with an extensive set of metadata and extracts additional information, such as events and stories. All of this information can be returned as the result of the various web requests. The terms below are introduced in order, from the most basic to the more complex.

Article

An article is a news article that we collect and import into the Event Registry. Articles are collected using RSS feeds and contain properties such as title, body, date, time, source (news publisher), image, a list of concepts (see below), a list of categories (see below), a list of extracted dates (dates mentioned in the article) and others. Available information that Event Registry can provide for an article is described in the Article data model page.

Concept

A concept is an annotation that can be assigned to an article, story or event. Concepts can represent entities (people, locations, organizations) or non-entities/keywords (things such as table, personal computer, toy, robot, ...). Every concept has a unique ID called a URI. In Event Registry, we use Wikipedia URLs as concept URIs. For example, the concept URI for Barack Obama is https://en.wikipedia.org/wiki/Barack_Obama. Each concept can have various properties, such as labels in different languages, synonyms, an image and a description. Concepts are very important because they let you search for the thing itself rather than for a keyword. For example, if you want to find all articles that mention the White House, the keyword "White House" will not find articles in Spanish, where it is written as "Casa Blanca". If you use the concept URI for the White House (https://en.wikipedia.org/wiki/White_House), however, you can retrieve mentions of the White House in all languages.

As mentioned, concepts are assigned to articles, stories, and events. A concept is associated with an article if it is mentioned in it. Each mentioned concept is also assigned a score in the range from 1 (the concept is not very relevant for the article) to 5 (the concept is very relevant for the article). Similarly, a concept can be associated with a story or event. Stories and events are assigned the concepts that are common in the articles they contain. In this case, the score represents the importance of the concept for the story/event and is in the range from 0 to 100.
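Because the two score ranges differ, it can help to bring them onto a common scale before comparing them. The following is a minimal sketch, not part of the Event Registry SDK: the `uri` and `score` dictionary keys and the helper `normalize_score` are hypothetical names chosen for illustration, and only the ranges themselves (1 to 5 for articles, 0 to 100 for stories/events) come from the description above.

```python
# Score ranges as described above; the constant names are illustrative.
ARTICLE_SCORE_RANGE = (1, 5)
EVENT_SCORE_RANGE = (0, 100)


def normalize_score(score, score_range):
    """Map a concept score from its native range onto the interval [0, 1]."""
    low, high = score_range
    if not low <= score <= high:
        raise ValueError(f"score {score} is outside the range {low}..{high}")
    return (score - low) / (high - low)


# Hypothetical concept annotations (the field names are assumptions).
article_concepts = [
    {"uri": "https://en.wikipedia.org/wiki/White_House", "score": 5},
    {"uri": "https://en.wikipedia.org/wiki/Barack_Obama", "score": 2},
]
event_concepts = [
    {"uri": "https://en.wikipedia.org/wiki/White_House", "score": 75},
]

article_relevance = {
    c["uri"]: normalize_score(c["score"], ARTICLE_SCORE_RANGE)
    for c in article_concepts
}
event_relevance = {
    c["uri"]: normalize_score(c["score"], EVENT_SCORE_RANGE)
    for c in event_concepts
}
```

After normalization, an article score of 5 and an event score of 100 both map to 1.0, so the two kinds of score can be compared directly.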

Available information that can be provided about a concept is described in the Concept data model page.

To get a concept's URI for a given label, you can use the utility method EventRegistry.getConceptUri(). If the method returns None, no matching concept was found and you have to perform the search using the label as a keyword instead.
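The fallback from concept to keyword can be sketched as follows. This is an illustration under stated assumptions rather than SDK code: `build_search_term`, `fake_resolver` and the returned dictionary keys are hypothetical, and the resolver is passed in as a plain callable so that the example runs without an API key or network access.

```python
def build_search_term(label, resolve_concept_uri):
    """Return a concept-based term if the label resolves to a URI,
    otherwise fall back to a keyword term.

    `resolve_concept_uri` is any callable that returns a URI string or None.
    """
    uri = resolve_concept_uri(label)
    if uri is None:
        # No concept found: search for the label as a plain keyword.
        return {"keyword": label}
    return {"conceptUri": uri}


# Stand-in resolver used only for this example (hypothetical data).
KNOWN_CONCEPTS = {
    "White House": "https://en.wikipedia.org/wiki/White_House",
}


def fake_resolver(label):
    return KNOWN_CONCEPTS.get(label)


concept_term = build_search_term("White House", fake_resolver)
keyword_term = build_search_term("an unknown phrase", fake_resolver)
```

With a real client, the bound method `getConceptUri` of an `EventRegistry` instance could presumably be passed in place of `fake_resolver`.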

Category

Event Registry uses a taxonomy of categories from DMOZ. The DMOZ taxonomy has over 1 million categories organized into several levels. In Event Registry, we use only the top 3 levels of the taxonomy, which amounts to about 50,000 categories. Each category has a unique identifier (URI) that can be used when querying. Available information that can be provided about a category is described in the Category data model page.
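As a rough illustration of what "top 3 levels" means, the sketch below truncates a slash-separated category path to its first three levels. It assumes that category URIs look like `dmoz/Computers/Software`, with a `dmoz` prefix followed by one path segment per level. That format and the helper names `category_depth` and `truncate_category` are assumptions for illustration, not something this page specifies.

```python
def _split_category(category_uri, prefix):
    """Split a category URI into (has_prefix, list_of_levels)."""
    parts = [p for p in category_uri.split("/") if p]
    has_prefix = bool(parts) and parts[0] == prefix
    return has_prefix, (parts[1:] if has_prefix else parts)


def category_depth(category_uri, prefix="dmoz"):
    """Number of taxonomy levels in the URI, not counting the prefix."""
    _, levels = _split_category(category_uri, prefix)
    return len(levels)


def truncate_category(category_uri, max_levels=3, prefix="dmoz"):
    """Keep only the top `max_levels` levels of a category URI."""
    has_prefix, levels = _split_category(category_uri, prefix)
    kept = levels[:max_levels]
    return "/".join(([prefix] if has_prefix else []) + kept)


deep = "dmoz/Computers/Software/Databases/Relational"
shallow = "dmoz/Business/Accounting"

deep_truncated = truncate_category(deep)
shallow_truncated = truncate_category(shallow)
```

A category that is already within the top three levels is returned unchanged, while a deeper one is cut back to its third-level ancestor.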

Categories are assigned to articles, stories and events using machine learning models that are trained for each category separately. Categories don't represent a particular mention in the articles/stories/events, but instead represent what topic the content is about. Categories are currently assigned only to content in the English language.

Story

Articles in Event Registry are grouped into clusters based on their similarity. A story is such a cluster: a group of articles that should all be about the same event. Each story contains articles in a single language only. Various properties are extracted for each story: language, event URI (the event to which the story is assigned), title, summary, date, concepts, categories and others.

Event

An event is a collection of one or more stories that all report on the same world event. Stories in an event can be in different languages, but this is not a requirement: two or more stories in the same language can also be assigned to the same event. Each event can provide various properties, such as title and summary (in all available languages), date, location, list of stories, article count, a list of concepts, a list of categories, a list of frequently mentioned dates and others. Available information that can be provided about an event is described in the Event data model page.
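The relationship between articles, stories and events can be pictured as a simple containment hierarchy. The sketch below is only a mental model, not the actual data model returned by the API: the class and field names, the `add` method and the language codes `eng` and `spa` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Article:
    uri: str
    title: str
    lang: str


@dataclass
class Story:
    """A cluster of similar articles, all in one language."""
    uri: str
    lang: str
    articles: List[Article] = field(default_factory=list)

    def add(self, article: Article) -> None:
        # A story only contains articles in a single language.
        if article.lang != self.lang:
            raise ValueError("article language does not match story language")
        self.articles.append(article)


@dataclass
class Event:
    """One or more stories, possibly in different languages."""
    uri: str
    stories: List[Story] = field(default_factory=list)

    @property
    def article_count(self) -> int:
        return sum(len(story.articles) for story in self.stories)

    @property
    def languages(self) -> Set[str]:
        return {story.lang for story in self.stories}


english = Story(uri="story-eng-1", lang="eng")
english.add(Article("a1", "Summit opens in Geneva", "eng"))
english.add(Article("a2", "Leaders gather for summit", "eng"))

spanish = Story(uri="story-spa-1", lang="spa")
spanish.add(Article("a3", "Comienza la cumbre en Ginebra", "spa"))

event = Event(uri="event-1", stories=[english, spanish])
```

Here `event.article_count` sums the articles across both stories, and `event.languages` collects the language of each story in the event.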


Several other objects can also be requested or are included in the results. They are less critical to know about, but are likely to become relevant to you at some point. These include sources, locations, concept classes, and topic pages.

Source

Each article has a news source. A news source can have properties such as URI (hostname), title, description, location (geographical location) and importance. Available information that can be provided about a news source is described in the News source data model page.

Location

A location in Event Registry stands for a geographical location and can be associated with stories, events, news sources (location of the publisher) and certain concepts (those that represent locations). Information about geographical locations is obtained from GeoNames. Details that can be returned about a geographical location include the GeoNames ID, wiki URL (URL of the corresponding Wikipedia page), population, geolocation, area, continent and others.

Concept class

A concept class is a group of concepts that are semantically related. Event Registry has, for example, a concept class "Movie Actors" that contains the list of all known movie actors. Concept classes can be used in search queries.