This paper is motivated by the growing global importance of GIS as a discipline. GIS technology is widely applied in urban planning, environmental monitoring, disaster management, and other areas, and this interdisciplinary reach has attracted a growing number of students to related graduate programs. However, applicants currently face the following key issues:
- Information overload: Data on more than 600 GIS programs and more than 2,000 professors worldwide are scattered across different sources, making information retrieval difficult.
- Lack of personalized guidance: Existing tools cannot provide precise recommendations based on an applicant’s research interests or career goals.
- Difficulty understanding research dynamics: Students find it challenging to gain in-depth insights into professors' research interests and trends in the field, which affects their decision-making.
To address these challenges, this paper proposes the GISphere-KG platform. By integrating and organizing heterogeneous data through a knowledge graph (KG) and leveraging the natural language processing capabilities of large language models (LLMs), the platform offers intelligent search, matching, and recommendation functionalities for applicants. The core research questions include:
- How can applicants more efficiently and intuitively find programs that align with their interests?
- How can applicants discover professors with similar research directions?
- How can suitable GIS programs be recommended based on an applicant's interests?
- Collected information on more than 600 GIS programs and 2,000 professors across 97 countries and regions, including country, city, university, and professors' research interests.
- Cleaned and standardized the data to ensure accuracy and consistency.
- Designed a seven-category entity structure, including relationships such as "Professor - Research Interests - University - Geographic Location", and visualized it using Neo4j.
- Defined semantic relationships (e.g., similarity between professors' research interests) to support complex semantic queries.
- Used an embedding model (e.g., OpenAI's text-embedding-ada-002) to convert research interests into semantic vectors, and computed the similarity between interests as the cosine similarity of those vectors.
- Established "Professor Research Interest Similarity" relationships to support rapid discovery of related professors.
- Explicit graph search: Directly queries the entities and relationships within the graph database (e.g., professors' research interests or universities' geographic locations).
- Implicit graph search: Infers semantically similar programs and related professors from the research interests provided by the applicant.
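The similarity step and the implicit search described above can be illustrated with a minimal sketch. It is not the project's actual implementation: the three-dimensional vectors are toy stand-ins for real embedding vectors, and the names `interest_vectors`, `similar_pairs`, `rank_interests`, and the `0.8` threshold are all assumptions made for illustration.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors; a real embedding model returns far longer vectors.
interest_vectors = {
    "urban remote sensing": [0.9, 0.1, 0.0],
    "land cover classification": [0.8, 0.2, 0.1],
    "spatial epidemiology": [0.1, 0.9, 0.2],
}


def similar_pairs(vectors, threshold=0.8):
    """Return interest pairs whose similarity reaches the (assumed) threshold.

    Each pair is a candidate for a similarity relationship in the graph.
    """
    pairs = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(vectors.items(), 2):
        score = cosine_similarity(vec_a, vec_b)
        if score >= threshold:
            pairs.append((name_a, name_b, score))
    return pairs


def rank_interests(query_vector, vectors, top_k=2):
    """Rank stored interests by similarity to an applicant's query vector."""
    scored = [
        (name, cosine_similarity(query_vector, vec))
        for name, vec in vectors.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    for name_a, name_b, score in similar_pairs(interest_vectors):
        print(f"{name_a} <-> {name_b}: {score:.3f}")
    print(rank_interests([1.0, 0.0, 0.0], interest_vectors))
```

Precomputing the pairs above a threshold and storing them as relationships would let similar professors be found by a graph traversal at query time, while `rank_interests` mirrors the case where an applicant's free-text interest is embedded on the fly.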
Note
If you encounter any issues during usage, or have any suggestions for improvement, please feel free to submit an issue or pull request on GitHub to help us improve this project together.
Access the live application at: https://gispherekg.streamlit.app/
To run the application locally, follow these steps:
- Neo4j Setup:
  - Create a Neo4j instance (for free).
  - Navigate to 'Back up and restore' and select 'Restore from backup file'.
  - Upload one of the backup files located in the llm-chatbot-python/data/ folder:
    - neo4j-database_v1.backup: The original graph database used in the published paper.
    - neo4j-database_v2.backup: An updated version (as of March 2025) containing enriched GISphere data, including more universities and professors, particularly in mainland China. This version also features improved entity segmentation and enhanced linkage to research interests, powered by LLM-assisted processing.
- Environment Variables: Create a secrets.toml file in the llm-chatbot-python/.streamlit/ folder and configure the following environment variables:
  - Neo4j Database: NEO4J_URI, NEO4J_USERNAME, NEO4J_PASSWORD
  - OpenAI LLM: OPENAI_API_KEY, OPENAI_MODEL
- Install Dependencies: In the project directory, run:
    cd llm-chatbot-python
    pip install -r requirements.txt
- Run the Application: Launch the app on http://localhost:8501/ with:
    streamlit run bot.py
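If the app fails to start, a missing or empty entry in secrets.toml is a likely cause. The sketch below is a small, hypothetical helper (not part of the repository) that reports which of the five variables named in the Environment Variables step are absent. It assumes the variables are stored as top-level keys and that the configuration is available as a mapping; the placeholder values are illustrative only.

```python
from typing import List, Mapping

# Variable names taken from the Environment Variables step above.
REQUIRED_KEYS = (
    "NEO4J_URI",
    "NEO4J_USERNAME",
    "NEO4J_PASSWORD",
    "OPENAI_API_KEY",
    "OPENAI_MODEL",
)


def missing_secrets(secrets: Mapping[str, object]) -> List[str]:
    """Return the required keys that are absent or blank in `secrets`."""
    missing = []
    for key in REQUIRED_KEYS:
        if key not in secrets or not str(secrets[key]).strip():
            missing.append(key)
    return missing


# Placeholder configuration; replace each value with your own credentials.
example_secrets = {
    "NEO4J_URI": "<your-neo4j-uri>",
    "NEO4J_USERNAME": "<your-username>",
    "NEO4J_PASSWORD": "<your-password>",
    "OPENAI_API_KEY": "<your-api-key>",
    "OPENAI_MODEL": "",  # left blank on purpose to show the check
}

if __name__ == "__main__":
    print("Missing or empty:", missing_secrets(example_secrets))
```

Inside a Streamlit app, the same function could presumably be called with `st.secrets`, which behaves like a read-only mapping, before any database or LLM connection is opened.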
This project is built upon the Streamlit framework. For additional resources, please refer to the following:
- Streamlit Documentation
- GitHub Repo: Build a Neo4j-backed Chatbot using Python
- Tutorial: Build a Neo4j-backed Chatbot using Python
Gu, Z., Li, W., Zhou, B., Wang, Y., Chen, Y., Ye, S., Wang, K., Gu, H. and Kang, Y. (2025), GISphere Knowledge Graph for Geography Education: Recommending Graduate Geographic Information System/Science Programs. Transactions in GIS, 29: e13283. https://doi.org/10.1111/tgis.13283