
frances is an advanced cloud-based text mining digital platform that leverages information extraction, knowledge graphs, natural language processing (NLP), deep learning, and parallel processing techniques. It has been specifically designed to unlock the full potential of historical digital textual collections.


frances-ai/frances-api


frances backend

This repository contains the backend of an improved version of frances.

The frances project consists of a Flask backend (this repository), a React frontend, and defoe.

Users can access frances here: frances


Local Development

Requirements

  • Python 3.9
  • Docker
  • PostgreSQL
  • Jena Fuseki
  • defoe

Get the source code repository, models, and knowledge graph files

For the source code, run:

git clone https://github.com/frances-ai/frances-api

Install Python 3.9

See instructions here: python3.9 install
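If Python 3.9 is not available on your system, one common option (an assumption, not the project's documented method) is to install it via pyenv:

```shell
# Install pyenv (assumes a Unix-like system with curl and a C toolchain)
curl https://pyenv.run | bash

# Install a Python 3.9 patch release and select it for this project
# (3.9.18 is one available release; any 3.9.x should work)
pyenv install 3.9.18
pyenv local 3.9.18

# Verify: should report Python 3.9.x
python --version
```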

Install dependencies

In the frances-api directory, run:

pip install -r web_app/requirements.txt

Run the defoe gRPC server

See instructions here: defoe

Install Docker, then run PostgreSQL and Fuseki using Docker

See instructions here: docker

The required knowledge graphs can be generated with the Knowledge Graph Generator; these knowledge graphs should then be uploaded to the Fuseki server (see stain/fuseki).
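As a sketch of the upload step, assuming Fuseki runs on its default port 3030 and the graphs are in Turtle format, a file can be loaded through Fuseki's HTTP interface (the host, dataset name, and file name below are placeholders — adjust them to your setup):

```shell
# Upload a Turtle file into the default graph of a Fuseki dataset
# via the SPARQL Graph Store Protocol endpoint (<dataset> is a placeholder)
curl -X POST \
  -H "Content-Type: text/turtle" \
  --data-binary @knowledge_graph.ttl \
  "http://localhost:3030/<dataset>/data?default"
```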

Start the PostgreSQL database and Fuseki server using Docker Compose. In the frances-api directory, run:

docker compose -f docker-compose.dev.yml up

Start the backend

In the frances-api/web_app directory, run:

gunicorn -b :5000 'query_app:create_app()'

Running defoe in a Dataproc cluster

In addition to running defoe locally using the defoe gRPC server, you can also connect frances to your Dataproc clusters to run defoe queries.

Set up defoe dataproc cluster

We have designed Dataproc initialization action scripts to automatically set up the defoe runtime environment. You can find these scripts in the directory: frances-api/web_app/google_cloud/cloud_storage_init_files/init.

Before creating the cluster, several files must be uploaded to a Google Cloud Storage bucket.

Upload files to Google Cloud Storage

  1. Create a bucket
  2. Create defoe.zip file from defoe. In the defoe_lib directory, run:
    zip -r defoe.zip defoe
  3. Update the init scripts (line 36):
    gcloud storage cp gs://<your bucket name>/defoe.zip /home/defoe.zip
  4. Upload the files to the bucket. The required files are:
    • the updated cluster init folder: frances-api/web_app/google_cloud/cloud_storage_init_files/init
    • the defoe.zip from step 2
    • run_query.py from defoe: defoe_lib/run_query.py
    • the precomputedResult folder: frances-api/web_app/query_app/precomputedResult
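The uploads in step 4 can be sketched with the gcloud CLI (the bucket name and destination paths are assumptions — arrange them however your init scripts expect):

```shell
BUCKET=<your bucket name>   # placeholder

# updated cluster init folder
gcloud storage cp -r frances-api/web_app/google_cloud/cloud_storage_init_files/init gs://$BUCKET/init

# defoe.zip built in step 2
gcloud storage cp defoe_lib/defoe.zip gs://$BUCKET/defoe.zip

# run_query.py from defoe
gcloud storage cp defoe_lib/run_query.py gs://$BUCKET/run_query.py

# precomputedResult folder
gcloud storage cp -r frances-api/web_app/query_app/precomputedResult gs://$BUCKET/precomputedResult
```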

Create the Dataproc cluster using the init scripts

See instructions here: dataproc initialization actions
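A minimal cluster-creation command might look like the following (the cluster name, region, bucket name, and init-script file name are placeholders — check the actual script names in the init folder):

```shell
gcloud dataproc clusters create <your cluster name> \
  --region=<your cluster region> \
  --initialization-actions=gs://<your bucket name>/init/<init-script>.sh
```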

Public Fuseki server

Since defoe requires a Fuseki server to query the knowledge graphs, the server must be accessible to the cloud cluster. You can follow the Fuseki installation instructions from the local development section above, but apply them on a cloud VM. Note that you should make Fuseki accessible to the cluster by opening its port in the firewall rule settings.
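As a sketch, assuming Fuseki listens on its default port 3030, a firewall rule could be created like this (the rule name and source range are placeholders):

```shell
gcloud compute firewall-rules create allow-fuseki \
  --allow=tcp:3030 \
  --source-ranges=<cluster IP range>
```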

Run frances locally with the defoe Dataproc cluster

We have built a dataproc_defoe_service that uses the defoe Dataproc cluster. All you need to do is update the configuration in frances-api/web_app/query_app/resolver.py:

# In line 25 of resolver.py
kg_base_url = "your fuseki server url"

# In line 30 of resolver.py
MODE = "gs"

# From line 68 - 76 of resolver.py
MAIN_PYTHON_FILE_URI = "gs://<your bucket name>/run_query.py"
PYTHON_FILE_URIS = ["file:///home/defoe.zip"]
PROJECT_ID = "<your project id>"
BUCKET_NAME = "<your bucket name>"
DEFAULT_CLUSTER = {
    "cluster_name": "<your cluster name>",
    "project_id": PROJECT_ID,
    "region": "<your cluster region>"
}

Since the backend uses the Google Cloud client libraries to access Dataproc and Cloud Storage, you need to set up Application Default Credentials in your local environment. See the instructions here: https://cloud.google.com/docs/authentication/provide-credentials-adc#local-dev
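On a local machine with the gcloud CLI installed, this typically amounts to:

```shell
gcloud auth application-default login
```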

After this, you can run the backend as in the local development process above, except that you don't need to run the defoe gRPC server or a local Fuseki server.


Cloud Deployment

Build docker image for backend:

To support multiple architectures, run the following command in the frances-api directory to build the image:

docker buildx build --platform <linux/arm/v7,linux/arm64/v8,>linux/amd64 --tag <docker username>/frances-api:latest --push .

You can choose which architectures (linux/arm/v7, linux/arm64/v8, linux/amd64) to support.
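Note that multi-platform builds require a buildx builder using the docker-container driver; if you don't have one yet, you can create it first (the builder name below is arbitrary):

```shell
docker buildx create --name multiarch --driver docker-container --use
docker buildx inspect --bootstrap
```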

Build docker image for frontend:

Set up the frontend

Run the following command in the frances-frontend directory to build the image:

docker buildx build --platform <linux/arm/v7,linux/arm64/v8,>linux/amd64 --tag <docker username>/frances-front:latest --push .

Set up Application Default Credentials on the cloud VM

See the instructions here: https://cloud.google.com/docs/authentication/provide-credentials-adc#local-dev

If your cloud VM has the gcloud CLI installed (pre-installed on all Google Cloud VMs), just run the following command:

gcloud auth application-default login

Run all the services using Docker Compose

  1. Update the docker-compose.prod.yml based on your cloud configuration.
  2. Upload the docker-compose.prod.yml file to the cloud VM.
  3. Run all services using the following command (in the same directory as the uploaded Docker Compose file):
    sudo docker compose -f docker-compose.prod.yml up
    
