This repository hosts a project on data analysis and the development of AI models that help classify medical images for cancer stage analysis.
Team: Augusto Pereira, Dionizio, João Victor, Milena Andreuzo, Thalyta Genaro.
Keywords: Data Analysis, Cancer Stage Analysis, AI Models and Machine Learning, Histological Images, Cancer Research, Cell Contamination Detection.
Objective: The primary objective is to categorize images as either "high-information/high-cancer" or "low-information/low-cancer," aiding in the identification of suitable images for the subsequent cancer stage analysis.
The images used in this project were obtained by scanning histological slide samples with a scanner that outputs CZI files, which were then converted to JPG. The dataset was generously provided by the Experimental Pathology Laboratory at the School of Medicine, PUC-PR.
- Exploratory data analysis.
- Training AI models for image classification.
- Visualization of classification results.
- Programming Language: Python
- Main Libraries: TensorFlow, OpenCV, Scikit-learn
- Virtual Environment: Conda, Google Colab and Jupyter Notebook
Histological images are divided into hundreds of fragments, and the percentage of useful images for determining the cancer stage in the sample is approximately 50%.
Before the introduction of our models, this selection was carried out manually, consuming a significant amount of doctors' time. The goal is to use data analysis and AI models to speed up this initial image selection process.
Image classification plays a crucial role in ensuring that only the most suitable images proceed to the subsequent stage, which involves identifying the cancer stage.
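As an illustration of how a scanned image can be divided into fragments, the sketch below computes the bounding boxes of a regular tile grid. It is a minimal example under stated assumptions, not the project's actual pipeline: the function name `tile_boxes`, the image size, and the 256-pixel tile size are all hypothetical.

```python
from typing import Iterator, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def tile_boxes(width: int, height: int, tile: int) -> Iterator[Box]:
    """Yield boxes that cover a width x height image in row-major order.

    Tiles on the right and bottom edges are clipped to the image bounds,
    so they may be smaller than `tile`.
    """
    if tile <= 0:
        raise ValueError("tile must be positive")
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            yield (left, top, min(left + tile, width), min(top + tile, height))


# Hypothetical 2048 x 1536 image split into 256-pixel tiles.
boxes = list(tile_boxes(width=2048, height=1536, tile=256))
print(len(boxes))  # 8 columns x 6 rows = 48 tiles
```

Each box can then be used to crop one fragment from the image, for example with array slicing, before the fragment is passed to a classifier.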
Fragments that are unsuitable for determining the cancer stage usually show one of these four problems:
- Blood vessels containing blood cells: red blood cells (RBCs) and cancer cells can look alike, which interferes with cell counting.
- Random artefacts over the histological image.
- Tissue folds.
- Little or no information, such as large areas of white space.
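The "little or no information" case can be illustrated with a simple heuristic that measures how much of a fragment is white background. This is only a sketch and not the method used by the models in this repository: the function names and the `threshold` and `max_white` values are assumptions chosen for illustration, and the tile is synthetic.

```python
import numpy as np


def white_fraction(gray: np.ndarray, threshold: int = 220) -> float:
    """Return the fraction of pixels at or above `threshold` in a grayscale tile."""
    if gray.ndim != 2:
        raise ValueError("expected a 2-D grayscale array")
    return float(np.mean(gray >= threshold))


def is_low_information(gray: np.ndarray, threshold: int = 220,
                       max_white: float = 0.8) -> bool:
    """Flag a tile whose white fraction exceeds `max_white` (an assumed cutoff)."""
    return white_fraction(gray, threshold) > max_white


# Synthetic tile: 90% white background, 10% darker "tissue".
tile = np.full((100, 100), 255, dtype=np.uint8)
tile[:10, :] = 120

print(white_fraction(tile))
print(is_low_information(tile))
```

For a real JPG fragment, the grayscale array could be obtained with OpenCV, for example via `cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)`. A fixed threshold like this would not detect blood vessels, artefacts, or tissue folds, which is why a trained classifier is used for the full task.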
Below are some examples of these problems:
Blood vessel with blood cells:
- Codes
- Contains the models and tests for building the AI.
- Example-Problems
- Examples of low- or high-quality images used in the model.
- Tables
- Auxiliary tables for the code.
- Notebooks (*.ipynb)
- Análise_do_Neoplasia_usando_YOLOv8
- Notebook demonstrating the use of YOLOv8 to solve the problem.
- Treinando_o_Modelo_YOLOv8
- Notebook for retraining the YOLOv8 models.
Contributions are welcome! If you want to contribute to this project, feel free!
- Name: João Victor
- Contact: [email protected]
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
For inquiries related to this project, contact me.
- We thank the Experimental Pathology Laboratory - School of Medicine, PUC-PR, for providing the data.
Please note that the images used in this project are sourced from real individuals and have been provided solely for the purpose of analysis and AI model creation. To respect privacy and ethical considerations, the actual images will not be displayed. Instead, only the vectorized information and relevant metadata will be presented and discussed in the documentation.
We are committed to handling sensitive data with the utmost care and following ethical guidelines to ensure the responsible and respectful use of this information throughout the project.