Lexonomy development plan (December 2016)
Michal Boleslav Měchura, [email protected]
December 2016
This is an archived document which is no longer current.
For a more up-to-date version see Lexonomy development plan (November 2017).
Lexonomy's mission is to be a tool for writing and publishing dictionaries (and other dictionary-like datasets) where users find the right balance between power (= empowering users to do what they need to do) and ease of use (= not having a steep learning curve).
The current version of Lexonomy, which went public in May 2016, was meant as an experimental prototype. The experiment has been a success: it appears that Lexonomy does have the right balance between power and ease of use, and that people do want a tool like this. However, to develop Lexonomy further and in the long term, I will need to rewrite it from scratch.
I am planning to rewrite Lexonomy and present it at the next eLex conference in September 2017. This rewritten version of Lexonomy will have all the features the current version has plus some (= probably not all) of the new and/or improved features described in the first three sections of this document (Entry editing, Dictionary configuration, Publishing). The remaining sections (Housekeeping stuff, Sketch Engine integration) have a more long-term focus on how the Lexonomy project will unfold in the future.
While editing an entry, Lexonomy will have an undo button, like you'd find in a typical word processor.
Internally, Lexonomy keeps track of who saved what and when, and has a complete history for each entry. Based on this, the next version of Lexonomy will offer features for viewing an entry's history and for restoring previous versions.
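To make the idea concrete, here is a minimal sketch of an entry history in which restoring an old version never deletes anything: it simply saves the old XML again as the newest revision. The class name `EntryHistory` and its methods are hypothetical and kept in memory for illustration; they are not taken from Lexonomy's actual code or storage layout.

```javascript
// Hypothetical in-memory history for a single entry (illustrative names only).
class EntryHistory {
  constructor() {
    this.revisions = []; // oldest first
  }

  // Record who saved what and when.
  save(xml, user, when = new Date()) {
    const revision = { number: this.revisions.length + 1, xml, user, when };
    this.revisions.push(revision);
    return revision;
  }

  // The most recently saved revision, or null if nothing has been saved yet.
  current() {
    return this.revisions[this.revisions.length - 1] ?? null;
  }

  // Restoring keeps the full history: the old XML becomes a new revision.
  restore(number, user, when = new Date()) {
    const old = this.revisions.find((r) => r.number === number);
    if (!old) throw new Error(`No revision ${number}`);
    return this.save(old.xml, user, when);
  }
}

const history = new EntryHistory();
history.save("<entry><headword>cat</headword></entry>", "anna");
history.save("<entry><headword>cats</headword></entry>", "ben");
history.restore(1, "anna"); // revision 3 now has the same XML as revision 1
```

One reason to restore by appending rather than by rewinding is that the record of who saved what and when stays complete, including the restore itself.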
It will be possible to set up a number of entry templates in a dictionary. When the user starts a new entry, they will have the option of basing it on a template, as well as starting from a completely blank entry as they do now.
Motivation: Templates are often used in dictionary writing systems to encourage structural consistency across multiple entries that belong together, such as colour terms or country names.
Lexonomy will not use any form of entry locking (= blocking other users from opening/saving an entry if someone else currently has it opened). Instead, Lexonomy will go for a light-weight approach: it will inform the user if it looks like the entry they are editing is currently opened by another user or if the entry has been saved by another user in the meantime.
It will be possible to navigate around the XML editor using cursor arrow keys and other keyboard shortcuts, as an alternative to pointing and clicking with the mouse.
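One way such a light-weight approach could work is sketched below, assuming the server keeps a revision counter per entry and remembers who has the entry open. Nothing is ever blocked: opening and saving always succeed, and the caller only receives a warning to show to the user. The class `EntryTracker` and all of its fields are hypothetical names invented for this sketch.

```javascript
// Hypothetical tracker for one entry: no locking, only warnings.
class EntryTracker {
  constructor() {
    this.revision = 0;         // incremented on every save
    this.openedBy = new Set(); // users who currently have the entry open
  }

  // Opening always succeeds; the caller learns who else has the entry open.
  open(user) {
    const alsoOpenedBy = [...this.openedBy].filter((u) => u !== user);
    this.openedBy.add(user);
    return { revision: this.revision, alsoOpenedBy };
  }

  // Saving always succeeds too. baseRevision is the revision the user loaded;
  // if it is stale, somebody else has saved in the meantime.
  save(user, baseRevision) {
    const stale = baseRevision !== this.revision;
    this.revision += 1;
    return {
      revision: this.revision,
      warning: stale ? "This entry was saved by another user in the meantime." : null,
    };
  }

  close(user) {
    this.openedBy.delete(user);
  }
}

const tracker = new EntryTracker();
const anna = tracker.open("anna"); // nobody else has it open
const ben = tracker.open("ben");   // ben is told that anna has it open
tracker.save("anna", anna.revision); // no warning
tracker.save("ben", ben.revision);   // warning: anna saved in the meantime
```

What to do after a warning (overwrite, compare, or reload) is left open here; the sketch only covers detection.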
Note: This will actually be a feature of Xonomy, the XML editor used inside Lexonomy.
Users will be able to switch the XML editor between 'nerd' mode in which XML mark-up is visible as it would be in source code, and 'laic' mode which hides the XML syntax and displays a more user-friendly layout. A dictionary administrator will be able to set the initial editing mode for each user.
Note: This will make use of a feature which is already available in Xonomy, the XML editor used inside Lexonomy. In the current version of Lexonomy, the 'nerd' mode is what you see when editing an entry and the 'laic' mode is what you see when editing a dictionary's configuration.
Users will be able to upload images and other media files (sound files, videos) into a dictionary and link to them from XML elements and attributes in entries. When formatting entries for display Lexonomy will make sure the media files are presented appropriately (images are displayed, sound and videos are playable).
Users will be able to include external links (= internet URLs with an optional caption) in entries. Lexonomy will make sure they are clickable when the entry is formatted for display.
Users will be able to include cross-references from one entry to another entry or to a location in another entry (such as a specific sense inside a dictionary entry). Lexonomy will make sure the cross-references are clickable when the entry is formatted for display. Also, Lexonomy will keep track of what cross-references what, will make sure that cross-references are never broken, and (if the dictionary is so configured) will make sure that cross-references are reciprocated: if X cross-references Y, then Y must also cross-reference X.
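The two checks described above (no broken cross-references, and optionally reciprocity) can be sketched as follows. This assumes, purely for illustration, that the cross-references of a dictionary are available as a map from each entry ID to the list of entry IDs it points to; the function name `checkCrossReferences` and the `reciprocal` option are hypothetical.

```javascript
// entries: Map of entry ID -> array of entry IDs it cross-references.
// Returns a list of problems; an empty list means all checks passed.
function checkCrossReferences(entries, { reciprocal = false } = {}) {
  const problems = [];
  for (const [id, targets] of entries) {
    for (const target of targets) {
      if (!entries.has(target)) {
        // The target entry does not exist: a broken cross-reference.
        problems.push({ type: "broken", from: id, to: target });
      } else if (reciprocal && !entries.get(target).includes(id)) {
        // X points to Y but Y does not point back to X.
        problems.push({ type: "unreciprocated", from: id, to: target });
      }
    }
  }
  return problems;
}

const xrefs = new Map([
  ["cat", ["dog"]],
  ["dog", ["cat", "wolf"]], // "wolf" is not an entry in this dictionary
  ["fox", ["dog"]],         // "dog" does not point back to "fox"
]);
console.log(checkCrossReferences(xrefs, { reciprocal: true }));
```

Cross-references to a location inside an entry (such as a sense) would need IDs for those locations as well; the sketch only covers entry-to-entry references.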
A dictionary administrator will be able to configure a dictionary such that some parts of an entry will be 'shareable' between several entries. This will ensure that, for example, phraseological subentries are able to appear under more than one headword.
Note: This will implement a suggestion from my Raslan paper.
The next version of Lexonomy will redefine slightly what it means for a user to have read-only access to a dictionary. Currently, read-only users can see the formatted entries but cannot see their XML structure. In the future, read-only users will be able to see the XML structure but will not be allowed to change it.
It will be possible for a dictionary administrator to limit a user's access to only a subset of entries, or to only some locations in an entry. For example, some users may be allowed only to add translations to entries and not to change anything else.
When creating a new dictionary, Lexonomy gives you a choice of several templates (as well as a completely blank one). All these templates are very simple, which makes them suitable for learning Lexonomy but not for real work. I will develop a few more realistic templates suitable for industrial-strength applications, replicating (subsets of) schemas from well-known standards such as TEI and LMF.
In the current version of Lexonomy it is possible to have more than one type of entry in a single dictionary. It seems that nobody needs this feature and it only confuses people, so we're going to get rid of it. In the future, there will only be one type of entry (= one doctype) in each dictionary.
A dictionary administrator will be able to set up a mapping between two dictionaries. Based on this mapping, Lexonomy will guide users to make sure the dictionaries are synchronized.
Note: This will implement a suggestion from my Raslan paper.
The schema editor that Lexonomy currently has in its Configuration interface is too simple. It can be used to create simple schemas for learning and exploring, but it is not expressive enough for real-world applications. Its main limitation is that you cannot use the same element name as a child under more than one type of parent.
I will build and develop a more sophisticated schema editor which will overcome this and other limitations, but which will at the same time remain intuitively easy to use. The main idea is and will be that you can create an entry schema in Lexonomy without having to learn the syntax of DTDs or any other schema formalism. The goal is for Lexonomy's schema editor to be expressive enough for what you'd find in the DTD of a typical dictionary.
Dictionary administrators who do not want to use Lexonomy's schema editor will be able to upload a schema they have written in some other formalism, such as a Xonomy document specification or a DTD.
Note: To be able to upload DTDs or other kinds of schemas, it will first be necessary to create mappings to Xonomy document specifications (as Xonomy is the XML editor used in Lexonomy). This is a non-trivial task which would make a nice student project!
The current version of Lexonomy does not allow you to use XML attributes in entries. This is an unnecessary limitation which will be removed in the next version.
The entry formatting features Lexonomy currently gives you are very primitive: you can make selections from a few colours and font sizes but not much else. This is hugely inadequate for real-world applications. Also, entry formatting is mixed with schema editing, which is illogical and probably confusing.
The next version of Lexonomy will bring a more sophisticated stylesheet editor. The goal is for it to be about as expressive as a simple XSL/CSS stylesheet, but no hand-coding or knowledge of XSL/CSS syntax will be required.
Dictionary administrators who do not want to use Lexonomy's stylesheet editor will be able to upload their own XSL/CSS stylesheet.
In the current version, Lexonomy simply assumes that the first text node in an entry is its headword and title, and this is what it uses in the list on the left-hand side. The next version will give the dictionary administrator more flexibility in deciding what constitutes the entry's title.
In the current version, Lexonomy only allows you to search for entries based on their title (= headword). The next version will bring more sophistication into this. A dictionary administrator will be able to specify which elements and attributes are to be indexed for search purposes and how their values are to be interpreted (as strings, numbers, dates...). Based on this, Lexonomy will enable users to filter the list of entries based on arbitrary combinations of criteria.
This mechanism will allow Lexonomy, among other things, to replicate some features of sophisticated dictionary writing systems such as: keeping track of entry status ('new', 'in progress', 'finished' etc.) and filtering based on that status; grouping entries in batches and working on a single batch at a time; assigning individual entries to individual users; etc.
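The filtering mechanism can be illustrated with a small sketch. It assumes that the indexed values have already been extracted from each entry's XML as raw strings, and that the administrator's configuration says how each field is to be interpreted. The configuration object, the operator names and the function `filterEntries` are all hypothetical; the field names `status` and `frequency` are made-up examples.

```javascript
// Hypothetical index configuration: field name -> how its value is interpreted.
const indexConfig = {
  headword: "string",
  status: "string",
  frequency: "number",
  modified: "date",
};

// Convert a raw string value into something comparable, according to its type.
function interpret(type, raw) {
  if (type === "number") return Number(raw);
  if (type === "date") return Date.parse(raw); // milliseconds since the epoch
  return String(raw);
}

const operators = {
  eq: (a, b) => a === b,
  lt: (a, b) => a < b,
  gt: (a, b) => a > b,
};

// Keep only the entries that satisfy every criterion (a logical AND).
function filterEntries(entries, criteria, config = indexConfig) {
  return entries.filter((entry) =>
    criteria.every(({ field, op, value }) => {
      const type = config[field];
      const compare = operators[op];
      if (!compare) throw new Error(`Unknown operator: ${op}`);
      if (!type || !(field in entry)) return false; // field not indexed or absent
      return compare(interpret(type, entry[field]), interpret(type, value));
    })
  );
}

const indexed = [
  { headword: "cat", status: "finished", frequency: "1200", modified: "2016-11-02" },
  { headword: "dog", status: "in progress", frequency: "90", modified: "2016-12-01" },
  { headword: "cow", status: "in progress", frequency: "450", modified: "2016-10-15" },
];

// Entries still in progress whose frequency is above 100: only "cow".
const todo = filterEntries(indexed, [
  { field: "status", op: "eq", value: "in progress" },
  { field: "frequency", op: "gt", value: "100" },
]);
console.log(todo.map((e) => e.headword));
```

The type information matters: compared as strings, "90" would sort after "100", so "dog" would wrongly pass the frequency criterion.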
In the current version of Lexonomy, when you decide to publish your dictionary, you have no option but to publish all entries in the dictionary. In the next version you will optionally be able to select only a subset of entries for publishing, for example only entries that have a specific status.
When you publish a dictionary, the current version of Lexonomy doesn't have a proper search engine: it only has an alphabetical list of headwords and a textbox for filtering that list based on the first few characters.
The next version will have a more sophisticated search facility which will include lemmatized search, spelling error detection, matching on multi-word items, and matching on other entry elements than just headwords. Dictionary administrators will be able to configure these when publishing the dictionary.
When publishing a dictionary in Lexonomy, the current version allows you to supply a 'blurb': a short description which appears on the dictionary's homepage. The blurb is in plain text. The next version of Lexonomy will allow rich-text formatting in the blurb: bold, italics, paragraph breaks, clickable links, etc.
In addition to a blurb, the next version of Lexonomy will allow you to add one or more publicly visible 'about' pages to your published dictionary.
You will be able to upload your own logo and/or a background graphic when publishing a dictionary.
Dictionaries published in Lexonomy will automatically have machine-readable goodies such as an OpenSearch plugin and a sitemap, and will have good machine-readable metadata for SEO (Search Engine Optimization).
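As an illustration of one of these goodies, here is a minimal sketch of generating a sitemap in the standard sitemaps.org XML format, with one URL per published entry. The URL scheme (a dictionary base URL followed by the entry's ID) and the function names are assumptions made for this example, not a description of how Lexonomy's published URLs actually look.

```javascript
// Escape the characters that are special in XML text content.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build a sitemap listing one URL per entry.
// baseUrl is assumed to end with a slash, e.g. "https://example.org/mydict/".
function buildSitemap(baseUrl, entryIds) {
  const urls = entryIds.map(
    (id) => `  <url><loc>${escapeXml(baseUrl + encodeURIComponent(id))}</loc></url>`
  );
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    "</urlset>",
  ].join("\n");
}

console.log(buildSitemap("https://example.org/mydict/", ["cat", "ice cream"]));
```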
The current version of Lexonomy does not allow you to change the dictionary's URL once you've created it. The next version will allow that.
It will be possible to translate Lexonomy's user interface into other languages besides English.
The server-side bits of Lexonomy will be written in Node.js. For backend storage we'll use SQLite. The client-side bits will of course be written in the usual client-side web technologies (HTML + CSS + JavaScript).
Lexonomy's "home" installation at www.lexonomy.eu is where anyone and everyone will, as now, continue to be able to open an account and start writing and publishing dictionaries. This will continue to be a free service, with no payment required.
When using the home installation there will be some (light-weight) restrictions. You will only be allowed to publish a dictionary under an open-source license, and there will be some volume restrictions: how big your dictionary can be, how much traffic you can have each month, and so on. These restrictions will be generous enough to allow for real work: their motivation will be to protect the server from excessive load, not to recruit paying customers.
Note: these restrictions will not apply on your own local installation of Lexonomy, see the next section.
The home installation will be hosted and maintained by me and/or Lexical Computing.
Lexonomy will be developed as an open-source project. I will decide later which license exactly Lexonomy will have, but it will be one which allows unrestricted re-use for any purpose, including commercial purposes. I will decide later which code repository Lexonomy's source code will live in but it'll probably be GitHub.
It will be possible for anyone to download Lexonomy and install it on their own servers (both Linux and Windows). People will be able to download and install Lexonomy on their own (documentation will be available), or they will have the option of hiring someone who knows Lexonomy to set it up for them: myself, Lexical Computing, some combination of both, or someone else entirely.
Lexonomy will have a special relationship with the Sketch Engine corpus query system. Together, Sketch Engine and Lexonomy will provide support to lexicographers along the entire pipeline of producing a dictionary, from corpus to screen.
There are two ways of connecting corpus query tools to dictionary writing tools: (1) the push model in which the user of a corpus query tool, while extracting data from a corpus, "pushes" automatically generated entries into a dictionary writing tool for editing later, and (2) the pull model in which the user of a dictionary writing tool, while editing a dictionary, "pulls" data from a corpus, for example to populate an entry with automatically extracted example sentences. Both models will be implemented in Lexonomy and Sketch Engine.
Lexonomy will provide an API through which Sketch Engine users will be able to "push" automatically pre-generated dictionary entries into a dictionary in Lexonomy, either individually for a single entry (using Sketch Engine's tickbox lexicography method) or en masse for a list of headwords. Dictionary administrators will be able to configure this so that the entry structure in Lexonomy matches one of the export templates in Sketch Engine.
From the opposite perspective, Lexonomy will be able to talk to Sketch Engine's API to "pull" fragments of entries, such as example sentences, collocates, thesaurus items and translations, into existing entries. Dictionary administrators will be able to configure which entry elements this applies to.