[data request] figshare brain tumor dataset #5395

Open
BirkhoffLee opened this issue May 3, 2024 · 3 comments
Labels
dataset request Request for a new dataset to be added

Comments

BirkhoffLee commented May 3, 2024

  • Name of dataset: Figshare brain tumor dataset
  • URL of dataset: https://doi.org/10.6084/m9.figshare.1512427.v5
  • License of dataset: CC BY 4.0
  • Short description of dataset and use case(s): 3064 T1-weighted contrast-enhanced images from 233 patients with three kinds of brain tumor: meningioma (708 slices), glioma (1426 slices), and pituitary tumor (930 slices).

This brain tumor dataset contains 3064 T1-weighted contrast-enhanced images
from 233 patients with three kinds of brain tumor: meningioma (708 slices),
glioma (1426 slices), and pituitary tumor (930 slices). Due to the file size
limit of the repository, we split the whole dataset into 4 subsets and archived
them in 4 .zip files, each containing 766 slices. The 5-fold
cross-validation indices are also provided.


This data is organized in MATLAB data format (.mat files). Each file stores a struct
containing the following fields for an image:

cjdata.label: 1 for meningioma, 2 for glioma, 3 for pituitary tumor
cjdata.PID: patient ID
cjdata.image: image data
cjdata.tumorBorder: a vector storing the coordinates of discrete points on the tumor border.
For example, [x1, y1, x2, y2,...], in which x1, y1 are planar coordinates on the tumor border.
It was generated by manually delineating the tumor border, so it can be used to generate
a binary image of the tumor mask.
cjdata.tumorMask: a binary image with 1s indicating tumor region
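
The relationship between cjdata.tumorBorder and a binary mask can be illustrated with a point-in-polygon test. The following is a minimal pure-Python sketch, not the authors' method: it assumes the border is the flat [x1, y1, x2, y2, ...] vector described above, that x indexes columns and y indexes rows (the description does not state the axis order), and that pixel centers sit at integer coordinates. The helper names `border_to_points`, `point_in_polygon`, and `polygon_mask` are hypothetical.

```python
def border_to_points(border):
    """Group a flat [x1, y1, x2, y2, ...] vector into (x, y) pairs."""
    if len(border) % 2 != 0:
        raise ValueError("border vector must hold an even number of values")
    return list(zip(border[0::2], border[1::2]))


def point_in_polygon(x, y, points):
    """Even-odd (ray casting) test: is (x, y) inside the closed polygon?"""
    inside = False
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_mask(border, height, width):
    """Rasterize the border into a height x width mask of 0s and 1s."""
    points = border_to_points(border)
    return [
        [1 if point_in_polygon(col, row, points) else 0 for col in range(width)]
        for row in range(height)
    ]


# Toy example (assumed data): a square border on a 6x6 grid.
toy_border = [1.5, 1.5, 4.5, 1.5, 4.5, 4.5, 1.5, 4.5]
toy_mask = polygon_mask(toy_border, height=6, width=6)
```

Since the dataset already ships cjdata.tumorMask, a sketch like this is mainly useful for checking the border field or regenerating masks at a different resolution.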


This data was used in the following papers:

  1. Cheng, Jun, et al. "Enhanced Performance of Brain Tumor Classification via Tumor Region Augmentation
    and Partition." PloS one 10.10 (2015).
  2. Cheng, Jun, et al. "Retrieval of Brain Tumors by Adaptive Spatial Pooling and Fisher Vector
    Representation." PloS one 11.6 (2016). MATLAB source code is available on GitHub:
    https://github.com/chengjun583/brainTumorRetrieval

Jun Cheng
School of Biomedical Engineering
Southern Medical University, Guangzhou, China
Email: [email protected]

Folks who would also like to see this dataset in tensorflow/datasets, please thumbs-up so the developers can know which requests to prioritize.

And if you'd like to contribute the dataset (thank you!), see our guide to adding a dataset.

@BirkhoffLee BirkhoffLee added the dataset request Request for a new dataset to be added label May 3, 2024
@BirkhoffLee
Author

Here's some Python code shared on Kaggle that converts the raw .mat files into NumPy arrays of brain tumor MRI images: https://www.kaggle.com/code/tasni18/brain-tumor-classification

@ccl-core
Collaborator

ccl-core commented May 7, 2024

Hello @BirkhoffLee and thank you for raising this issue!

Are you planning to add this dataset to TFDS yourself? If yes, you can follow this guide to adding a dataset.

As an example, you can refer to this recent commit that introduced the Databricks Dolly dataset.

@BirkhoffLee
Author

> Hello @BirkhoffLee and thank you for raising this issue!
>
> Are you planning to add this dataset to TFDS yourself? If yes, you can follow this guide to adding a dataset.
>
> As an example, you can refer to this recent commit that introduced the Databricks Dolly dataset.

I'd love to, but I have a few questions:

  1. Removal of some data. I currently use the dataset in an image classification research project. The original dataset was published in MATLAB format. I have extracted the images as .PNG files (i.e., dropping some of the fields in the original dataset). Can I keep it as-is in the TFDS repo? To be more specific, I would retain only cjdata.label and cjdata.image.
  2. Training split. The original dataset is not split into training and test sets. How am I supposed to handle that in this repo?
  3. Hosting. Does the TFDS / TensorFlow project offer any place to store the dataset files? I do not see other datasets hosted here.
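
For the training-split question, one possible approach (an assumption sketched here for discussion, not TFDS guidance) is to split by cjdata.PID, so that slices from the same patient never appear in both train and test. The sketch below hashes the patient ID into a stable bucket; the function names, the 20% test share, and the toy (PID, label) records are all hypothetical.

```python
import hashlib


def assign_split(patient_id, test_percent=20):
    """Map a patient ID to 'train' or 'test' deterministically."""
    digest = hashlib.sha256(str(patient_id).encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in 0..99
    return "test" if bucket < test_percent else "train"


def split_records(records, test_percent=20):
    """Group (patient_id, label) records into train/test lists by patient."""
    splits = {"train": [], "test": []}
    for record in records:
        splits[assign_split(record[0], test_percent)].append(record)
    return splits


# Toy records (assumed data): 50 patients with 4 slices each, labels in 1..3.
toy_records = [(pid, 1 + pid % 3) for pid in range(50) for _ in range(4)]
toy_splits = split_records(toy_records)
```

Because the assignment depends only on the patient ID, the split is reproducible across runs, although the test share is only approximately `test_percent`. The 5-fold cross-validation indices shipped with the dataset are another option.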

I have another dataset that I would like to add to this repo: https://www.kaggle.com/datasets/sartajbhuvaji/brain-tumor-classification-mri. Once the guidelines are clarified, I'd be able to add it as well.

I'm new to the field, so apologies for any naive questions above. I do want to contribute to this repo because it makes research a lot easier. Many thanks :-)


2 participants