diff --git a/README.md b/README.md
index 8dbbdd8..dce0200 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 # Quartz
 
-Quartz is a container app for visualizing corpus data from Sketch Engine servers. It's a portable alternate interface that focuses on graphing quantitative data for linguistic analysis. Set up access to your corpora, make API queries to a Sketch Engine or NoSketch Engine server, and view results with interactive graphs.
+Quartz is a container app template for visualizing corpus data from Sketch Engine servers. It's a portable alternate interface that focuses on graphing quantitative data for linguistic analysis. Set up access to your corpora, make API queries to a Sketch Engine or NoSketch Engine server, and view results with interactive graphs. Some default graphing features are included, but the repo is designed for adaptation to specific projects.
 
 Quartz is made with Python, the Dash framework and Docker. To use it you'll need API access to a Sketch Engine or NoSketch Engine server.
 
@@ -22,36 +22,32 @@ Related software:
 
 ## Getting started
 
-### Simply put
-
 1. Clone the repo
-2. Set up environment variables in `.env` (use `.env-example` to get started)
+2. Set up environment variables in `.env` (copy and rename `.env-example` to get started)
 3. Set up configuration files in `config/` and make a `data/` directory for storing data
-4. Option 1: run Quartz directly Flask app for testing or local usage without Docker
+4. Option 1: run Quartz directly as a Flask app (for testing or local usage without Docker, e.g., `set -a && source .env && set +a && python app.py`)
 5. Option 2: build and use the Docker image `docker-compose up`
-6. Visit the app at `http://127.0.0.1:8080/` and make corpus queries
-
-### Also consider
+6. Visit the app at `http://127.0.0.1:8080/` and make a query to make sure it works
 
-To make queries to the Sketch Engine server, get an [API key](https://www.sketchengine.eu/documentation/api-documentation/).
+To make queries to the Sketch Engine server, get an [API key](https://www.sketchengine.eu/documentation/api-documentation/) and review their [fair use policy](https://www.sketchengine.eu/fair-use-policy/).
 
-To work with your own server, check out NoSketch Engine.
+To work with your own server, check out NoSketch Engine. Accessing any corpus on any (No)SkE server should work.
 
-The example below uses the Susanne corpus on Sketch Engine, although any corpus on any (No)SkE server should work in principle. Review Sketch Engine's [fair use policy](https://www.sketchengine.eu/fair-use-policy/) before making calls.
-
->Warning: Quartz expects the (No)SkE server to be available when the app/container is first started and fails if otherwise. On startup it makes initial API calls to collect corpus information. Once those calls are cached, having server access isn't technically required.
+>Warning: Quartz expects the (No)SkE server to be available when the app/container is first started and fails if otherwise. On startup it makes initial API calls to collect corpus information. Once those calls are cached, having server access isn't technically required to view cached queries.
 
 ### Environment variables
 
+This example uses the Susanne corpus on Sketch Engine.
+
 Quartz expects a few environment variables to be available. Set these up by renaming `.env-example` to `.env` and adapt as needed.
 
-Key environment variables:
+Some important environment variables:
 
 1. A YAML configuration file is needed to define which corpora are available
     - `CORPORA_YML=config/corpora-ske.yml`
 2. A server to interact with
     - `SGEX_SERVER=ske` points to Sketch Engine's server
-    - `SGEX_SERVER=https://api.sketchengine.eu/bonito/run.cgi` or use a full URL to a server
+    - or use a full URL to a server`SGEX_SERVER=https://api.sketchengine.eu/bonito/run.cgi`
 3. A username and API key for the server, if required
     - `SGEX_API_KEY=""`
     - `SGEX_USERNAME=""`
@@ -61,23 +57,33 @@ Key environment variables:
 
 ### Corpora configuration file
 
-A YAML file is needed with details about each corpus to include in Quartz: see [config/corpora-ske.yml](/config/corpora-ske.yml) for an example. Several entries are needed to define how to access the corpus via API, interpret and label its attributes (text types), and make attributes comparable with those in other corpora (if applicable).
+A YAML file is needed with details about each corpus. This example includes the SkE Susanne corpus - create one or more config files to define sets of corpora to study together.
 
 ```yaml
+# settings for corpora
+# corpus name used by API
 preloaded/susanne:
   # name shown in graphs
   name: Susanne
+  # unique color code
+  color: "#636EFA"
   # corpus description file (optional)
   md_file: config/susanne.md
   # text types to exclude
   exclude:
     - doc.wordcount
     - font.type
-  # text type labels (for cleaner in-app display and mapping to comparable attributes)
+  # text type labels (required for every non-excluded text type/attribute)
   label:
     doc.file: file
     doc.n: "n"
     head.type: head
+  # text types considered comparable with other corpora
+  # comparable:
+  #   -
+  # text types to visualize w/ choropleth (requires ISO3 strings, case insensitive)
+  # choropleth:
+  #   -
 ```
 
 ### Trying out the app
@@ -96,7 +102,7 @@ API-based data collection requires understanding the [Sketch Grammar Explorer](h
 
 Quartz was developed with funding from the [Humanitarian Encyclopedia](https://humanitarianencyclopedia.org) and support from the University of Granada [LexiCon research group](http://lexicon.ugr.es). It's the upstream repository for the [Humanitarian Encyclopedia Dashboard](https://humanitarianencyclopedia.org/analysis) ([GitHub repo](https://github.com/Humanitarian-Encyclopedia/he-dashboard)). If you're interested in the Dashboard or studying humanitarian discourse, make a free account at the Encyclopedia to try it out.
 
-Quartz relies on APIs made available thanks to the work of [Lexical Computing](https://www.lexicalcomputing.com/) and [Sketch Engine contributors](https://www.sketchengine.eu/bibliography-of-sketch-engine/). The [Docker image](https://github.com/ELTE-DH/NoSketch-Engine-Docker) from Eötvös Loránd University Department of Digital Humanities has also been utilized.
+Quartz relies on APIs made available thanks to the work of [Lexical Computing](https://www.lexicalcomputing.com/) and [Sketch Engine contributors](https://www.sketchengine.eu/bibliography-of-sketch-engine/). The [Docker image](https://github.com/ELTE-DH/NoSketch-Engine-Docker) from Eötvös Loránd University Department of Digital Humanities is also quite helpful.
 
 This app includes [Dash Bootstrap Components](https://dash-bootstrap-components.opensource.faculty.ai/); also check out [Dash's community forum](https://community.plotly.com/) for tips on visualization techniques.
 