JPonsa/README.md

About me

Intellectually curious Health Data Scientist passionate about the data revolution in the healthcare sector. I am particularly interested in creating innovative AI/ML solutions that deliver high value in the pharmaceutical/healthcare sector. After 6+ years of consultancy experience delivering complex data products and assisting large client organisations, I wish to expand my Data Science skills and knowledge. My goal is to pivot my career to continue as a Data Scientist in the healthcare industry.

To learn more about me, feel free to contact me or visit my LinkedIn.

Project summary

Main Portfolio Project

As part of my Health Data Science MSc dissertation at UCL, I built a Knowledge Graph Retrieval-Augmented Generation (KG-RAG) system that leverages Large Language Models to efficiently interrogate and analyse a large collection of clinical trial protocols from ClinicalTrials.gov.

Key learnings:

  • Deploying open-source Large Language Models (LLMs), such as Llama 3 or Mixtral 8x7B, on High-Performance Computing (HPC) clusters using vLLM.
  • Processing semi-structured clinical trial protocols with a NoSQL database (MongoDB).
  • Creating and hosting a Knowledge Graph using BioCypher and Neo4j AuraDB.
  • Implementing a ReAct design with DSPy, with custom tools that an LLM can use to query Knowledge Graphs and SQL databases.
  • Using high-level frameworks such as LlamaIndex and LangChain for text-to-SQL and text-to-Cypher.
  • Evaluating Large Language Models.
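
For readers unfamiliar with the ReAct pattern mentioned above, here is a minimal sketch of the thought / action / observation loop. It is not the project's code and does not use DSPy: a hard-coded `scripted_policy` stands in for the LLM, and the `trials` table, its rows, and all function names are hypothetical, chosen only for illustration.

```python
import sqlite3
from typing import Callable


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory table (hypothetical data, not real trials)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trials (nct_id TEXT, phase TEXT)")
    conn.executemany(
        "INSERT INTO trials VALUES (?, ?)",
        [("NCT-A", "Phase 2"), ("NCT-B", "Phase 3"), ("NCT-C", "Phase 3")],
    )
    return conn


def make_sql_tool(conn: sqlite3.Connection) -> Callable[[str], str]:
    """Wrap the connection as a tool: takes a SQL string, returns rows as text."""
    def run_sql(query: str) -> str:
        return str(conn.execute(query).fetchall())
    return run_sql


def scripted_policy(question, history):
    """Stand-in for the LLM: returns (thought, action, action_input)."""
    if not history:
        return (
            "I need to count the Phase 3 trials.",
            "sql",
            "SELECT COUNT(*) FROM trials WHERE phase = 'Phase 3'",
        )
    # After one observation, finish with what the tool returned.
    return ("I have the answer.", "finish", history[-1][2])


def react_loop(question, policy, tools, max_steps=5):
    """Alternate reasoning and tool calls until the policy chooses 'finish'."""
    history = []  # list of (thought, action, observation) tuples
    for _ in range(max_steps):
        thought, action, action_input = policy(question, history)
        if action == "finish":
            return action_input
        observation = tools[action](action_input)
        history.append((thought, f"{action}({action_input})", observation))
    return "No answer within the step budget."


tools = {"sql": make_sql_tool(build_demo_db())}
answer = react_loop("How many Phase 3 trials are there?", scripted_policy, tools)
```

In a real system the policy would be an LLM call that reads the question and the history, and the tool set could include a Cypher tool for the Knowledge Graph alongside the SQL one.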

Do you want to know more about this project?

Full Portfolio

Below is a summary of a few projects showcasing my Data Science skills.

Skill \ Technology | UCI Heart Disease | Card Fraud | Disaster Tweets | Causal Impact
Business question | Diagnose which patients are suffering from heart disease | Detect likely fraudulent transactions | Identify disaster events mentioned in text/tweets | Quantify the effect of the COVID lockdown on stock prices
Language | Python | Python | Python | Python / R
ML type | Classifier | Classifier | NLP Classifier | Time Series Regression
Data Engineering pySpark
Feature Engineering Time Series Features Word Embedding
Over / Under sampling SMOTE
Traditional ML Sklearn Sklearn Causal Impact
Gradient Boosting XGBoost CatBoost
Deep Learning LSTM, GRU, DistilBert
Hyperparameter tuning Optuna
Explainable ML SHAP Values
User Interface Streamlit
ML Ops MLFlow MLFlow

Volunteering

I worked with the NHS-Pycom community on the development of nhspy-plotthedots, a package for Statistical Process Control (SPC) analysis and plotting. My main contribution was writing unit test scripts. This gave me an opportunity to (a) learn more about the package so that I can contribute in other areas in the future and (b) practise software development skills (e.g. unit testing, raising pull requests) that I have used in my professional career but that may not show up in my Data Science portfolio.
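
To give a flavour of what unit testing an SPC calculation can look like, here is a small sketch. It assumes an XmR (individuals) chart with the conventional limits of mean ± 2.66 × average moving range; the function `xmr_limits` and the tests are hypothetical illustrations and are not taken from the nhspy-plotthedots code base or its API.

```python
from statistics import mean


def xmr_limits(values):
    """Return the centre line and control limits of an XmR chart.

    Hypothetical helper: limits are mean +/- 2.66 * average moving range.
    """
    if len(values) < 2:
        raise ValueError("at least two data points are required")
    centre = mean(values)
    # Moving range: absolute difference between consecutive points.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    avg_mr = mean(moving_ranges)
    return {
        "mean": centre,
        "lcl": centre - 2.66 * avg_mr,
        "ucl": centre + 2.66 * avg_mr,
    }


def test_constant_series_has_collapsed_limits():
    # No variation means the limits sit on the centre line.
    limits = xmr_limits([5.0, 5.0, 5.0])
    assert limits == {"mean": 5.0, "lcl": 5.0, "ucl": 5.0}


def test_short_input_is_rejected():
    try:
        xmr_limits([1.0])
    except ValueError:
        return
    raise AssertionError("expected ValueError for a single data point")
```

Tests like these can be run with pytest; the second one is written with a plain try/except so the sketch has no third-party dependency.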

Pinned

  1. UCI_Heart_Disease Public

    Jupyter Notebook

  2. nhs-pycom/nhspy-plotthedots Public

    We are working with the NHS-R community to develop a python implementation of 'NHSRplotthedots' SPC package to support NHSE/I 'Making Data Count' programme

    Jupyter Notebook 12 4

  3. card_fraud_detection Public

    Jupyter Notebook

  4. causal_inference_ts Public

    HTML