Distinguishing Dementia Using Hebrew NLP and LLM Models

This project develops a machine learning pipeline to detect early-stage dementia in Hebrew-speaking individuals. Using Natural Language Processing (NLP) techniques and Large Language Models (LLMs), it analyzes speech patterns to identify linguistic markers associated with dementia. The approach draws on a varied range of speech characteristics collected from diverse individuals.

Key Objectives

Data Collection: Gather a diverse dataset of speech samples from Hebrew-speaking individuals, including those diagnosed with dementia and healthy controls.
Feature Extraction: Use NLP techniques to extract linguistic features such as syntax, semantics, fluency, and speech dynamics.
Model Development: Fine-tune Hebrew-compatible LLMs to classify dementia-related speech patterns.
Evaluation: Measure model accuracy and generate insights into linguistic differences between healthy individuals and those with dementia.

Repository Structure

Steps in the Process

Data Collection:
- Collect speech samples from clinical datasets (dementia patients) and healthy individuals.
- Convert audio files to text using Hebrew-compatible speech-to-text systems.
Data Preprocessing:
- Clean and normalize Hebrew text for NLP tasks.
- Tokenize, lemmatize, and tag parts of speech using Hebrew NLP tools.
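
The cleaning and tokenization steps above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual preprocessing code: the function names `normalize_hebrew` and `tokenize` are hypothetical, and the sketch assumes that stripping vowel points (niqqud) and keeping only runs of Hebrew letters is acceptable for the downstream task. Lemmatization and part-of-speech tagging need a dedicated Hebrew NLP tool and are not covered here.

```python
import re
import unicodedata

# Hebrew cantillation marks and vowel points (niqqud) live in U+0591..U+05C7.
HEBREW_MARKS = range(0x0591, 0x05C8)


def normalize_hebrew(text: str) -> str:
    """Strip niqqud/cantillation marks and collapse whitespace (hypothetical helper)."""
    # NFKC folds Hebrew presentation forms (U+FB1D..U+FB4F) into base letters plus marks.
    text = unicodedata.normalize("NFKC", text)
    # Drop only nonspacing marks (category "Mn"); the maqaf (U+05BE) is punctuation and is kept.
    text = "".join(
        ch for ch in text
        if not (ord(ch) in HEBREW_MARKS and unicodedata.category(ch) == "Mn")
    )
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> list[str]:
    """Return runs of Hebrew letters (alef..tav, including final forms)."""
    return re.findall(r"[\u05D0-\u05EA]+", normalize_hebrew(text))
```

Note that this simple tokenizer discards digits, Latin characters, and abbreviation marks such as geresh and gershayim; whether that is appropriate depends on the transcripts.
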
Feature Engineering:
- Extract linguistic features:
  - Lexical Features: Vocabulary richness, word frequency.
  - Syntactic Features: Grammar patterns and sentence structure.
  - Semantic Features: Contextual coherence and sentiment analysis.
  - Prosodic Features: Speech rate and intonation (if audio is used).
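
As an illustration of the lexical features listed above, the sketch below computes vocabulary richness (type-token ratio) and word frequencies from a list of tokens. The function name `lexical_features` and the choice of measures are assumptions made for this example; the project may use different or additional measures. The type-token ratio is sensitive to transcript length, so comparisons are more meaningful between samples of similar size.

```python
from collections import Counter


def lexical_features(tokens: list[str], top_k: int = 3) -> dict:
    """Compute simple vocabulary-richness measures (hypothetical helper)."""
    counts = Counter(tokens)
    n_tokens = len(tokens)
    n_types = len(counts)
    # Hapax legomena: words that occur exactly once in the sample.
    n_hapax = sum(1 for c in counts.values() if c == 1)
    return {
        "n_tokens": n_tokens,
        "n_types": n_types,
        # Type-token ratio: distinct words divided by total words.
        "type_token_ratio": n_types / n_tokens if n_tokens else 0.0,
        "hapax_ratio": n_hapax / n_tokens if n_tokens else 0.0,
        "top_words": counts.most_common(top_k),
    }


if __name__ == "__main__":
    sample = ["אני", "הלכתי", "אני", "לחנות"]
    print(lexical_features(sample))
```
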
Model Development:
- Fine-tune Hebrew-compatible LLMs:
  - AlephBERT, HeBERT, or multilingual models like mBERT and XLM-RoBERTa.
- Train models to classify dementia-related speech patterns.
Model Evaluation:
- Evaluate models using metrics like Accuracy, Precision, Recall, and F1-score.
- Use t-SNE or PCA to visualize differences between dementia and non-dementia speech.
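
For reference, the four metrics named above can be computed from predicted and true labels as in the sketch below. It is a plain-Python illustration with a hypothetical function name (`binary_metrics`) and assumes a binary task in which label 1 means dementia; in practice scikit-learn's `sklearn.metrics` module provides equivalent functions.

```python
def binary_metrics(y_true: list[int], y_pred: list[int], positive: int = 1) -> dict:
    """Accuracy, precision, recall, and F1 for a binary classifier (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    # Each ratio falls back to 0.0 when its denominator is zero.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

In a screening setting, recall on the dementia class is often the metric to watch, since a missed case is usually costlier than a false alarm.
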

Tools and Technologies

Programming: Python
NLP Libraries: transformers, spaCy, nltk
Speech-to-Text: Google Speech API, Wav2Vec2 for Hebrew
Machine Learning: TensorFlow, PyTorch, scikit-learn
Visualization: Matplotlib, Seaborn

How to Run the Project

Clone the repository:

git clone https://github.com/your-username/Dementia-NLP-Hebrew.git

Repository files:
- README.md
- שגב - כלי עיבוד שפה בעברית.xlsx (Segev - Hebrew language processing tools)

segevcoh7/Early-Dementia-Prediction-via-NLP-Model
