This is the main folder of the Grevia Project,developed during the EIT FOOD-RIS-INSPIRE-SUMMER-SCHOOL-2024.It is a prototype web application designed to support farmers by leveraging Neo4j for knowledge graph management, advanced analytics, and semantic search capabilities. The project integrates sustainable solution providers in agrifood, organized in a graph DB, to suggest sustainable solutions based on user queries.
- Semantic search engines – Used by e-commerce sites to find products similar to a query.
- Recommender systems – Leverages embeddings and knowledge graphs to suggest similar entities or resources without generating new textual summaries.
- Domain-specific semantic retrieval tools – Matches user queries against a knowledge base, returning ranked documents or entities.
You can explore the location of the solution providers collected for the project, by clicking to the map image here.
(I will share them personally upon request, anyway the logic of the repo should work with your own data if you want to practice) Contains the RDF/Turtle files that represent the knowledge graphs (KGs) for the project:
providersKG_with_microcategories.ttl
– Final KG of providers as of 09/12/2024.providersKG_embed.ttl
– KG of solution providers enriched with OpenAI embeddings.cordisKG_embed.ttl
– KG of research projects enriched with OpenAI embeddings.
Holds configuration files:
.env
– Environment variables (e.g., OpenAI API keys).neo4j.conf
– Neo4j configuration settings.
Python scripts to manage and interact with the Neo4j database:
embedding_providers.py
– Embeds providers.part0_normalizeProvidersKG.py
– Normalizes the KG.part1_extract_ProvidersSpec_embeddings.py
– Extracts embeddings for provider specialties.part2_mkEmbed_anchors.py
– Embeds anchor terms.part3AssignSpecToMacroCat.py
– Assigns specialties to macro-categories.TestNoAnchor_part1_KmeansClustering.py
– Clusters data without anchor terms.Testpca_preparation.py
– Prepares data for PCA.
Scripts specifically for Neo4j management:
addEmbedToprovidersdb.py
– Adds embeddings to the Neo4j database.add_locationToProviders.py
– Adds location data to providers.backup_greviaKG.py
– Backs up the KG.loadGeoLocation_providers.py
– Loads provider geolocation data.load_cordisKG.py
– Loads the CORDIS knowledge graph.
Contains Jupyter notebooks for exploratory data analysis, testing queries, and prototyping algorithms.
Documentation for the project, including:
- Figures and visualizations (e.g., PCA results, clustering visuals).
readme.md
– Project documentation.
Logs generated during application operation for debugging and monitoring purposes.
Backup files for Neo4j to ensure data safety and recovery.
Contains the web application files:
app.py
– Main application file.appTest2.py
– Test version 2 of the app.appTest3.py
– Test version 3 of the app.
- Ensure Neo4j Desktop or AuraDB is properly set up.
- Place the Turtle files in the
import_data
folder. - Use the
load_data.py
script to populate the Neo4j database. - Run the Streamlit web application to explore the knowledge graph.
- Add the package to the dependencies list.
- Run:
conda env update --name grevia-app --file environment.yml --prune
- Verify:
conda list new-package-name
Run the following to create a constraint for unique URIs:
CREATE CONSTRAINT n10s_unique_uri FOR (r:Resource) REQUIRE r.uri IS UNIQUE;
- Visualize Entire Graph (nodes and relations):
MATCH (n)-[r]->(m) RETURN n, r, m LIMIT 100;
- Visualize Specific Labels or Types:
MATCH (s:Specialty)-[r]->(m)
RETURN s, r, m
LIMIT 100;
- Visualize Nodes Only:
MATCH (n)
RETURN n
LIMIT 100;
- Add Specialty to Existing Company:
MATCH (org:schema1__Organization {schema1__name: "Agriverts"}),
(vf:specialties__Specialty {rdf__value: "verticalfarming"}),
(pn:specialties__Specialty {rdf__value: "plant_nutrition"}),
(ai:specialties__Specialty {rdf__value: "ai"})
MERGE (org)-[:custom__hasSpecialty]->(vf)
MERGE (org)-[:custom__hasSpecialty]->(pn)
MERGE (org)-[:custom__hasSpecialty]->(ai)
WITH org, vf, pn, ai
OPTIONAL MATCH (vf)-[:specialties__belongsToMicrocategory]->(microVF)
OPTIONAL MATCH (pn)-[:specialties__belongsToMicrocategory]->(microPN)
OPTIONAL MATCH (ai)-[:specialties__belongsToMicrocategory]->(microAI)
RETURN org, vf, pn, ai, microVF, microPN, microAI;
git add .
git commit -m "Update project files"
git push origin main
.env
*.pyc
__pycache__/
logs/
config/.config
backups/
providers_location.json