The Brazilian Annotated X-ray (BRAX) dataset is an automatically annotated dataset of 24,959 chest X-ray studies from 19,351 patients at a large general hospital in Brazil. The dataset contains a total of 40,967 images.
Labels were automatically extracted from medical reports in Portuguese using NLP tools:
- No Finding: Set to 1 when no other label is present; Support Devices is ignored for this check.
- Enlarged Cardiomediastinum
- Cardiomegaly
- Lung Lesion
- Lung Opacity
- Edema
- Consolidation
- Pneumonia
- Atelectasis
- Pneumothorax
- Pleural Effusion
- Pleural Other
- Fracture
- Support Devices
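The No Finding rule in the list above can be sketched in Python as follows. This is only an illustration of the rule as stated, not the authors' labeling pipeline. The function name `derive_no_finding` and the dict-based row layout are hypothetical, and counting only a value of 1 as "present" (so that uncertain or missing labels do not block No Finding) is an assumption the description does not settle.

```python
# The 14 BRAX label names, as listed above.
LABELS = [
    "No Finding", "Enlarged Cardiomediastinum", "Cardiomegaly", "Lung Lesion",
    "Lung Opacity", "Edema", "Consolidation", "Pneumonia", "Atelectasis",
    "Pneumothorax", "Pleural Effusion", "Pleural Other", "Fracture",
    "Support Devices",
]

# Labels that never block "No Finding": the label itself and support devices.
IGNORED = {"No Finding", "Support Devices"}


def derive_no_finding(row):
    """Return 1 if no label other than Support Devices is positive, else 0.

    `row` maps label names to 1, 0, -1, or None (hypothetical layout).
    Assumption: only a value of 1 counts as "present".
    """
    findings = [name for name in LABELS if name not in IGNORED]
    return 0 if any(row.get(name) == 1 for name in findings) else 1
```

For example, a row with only Support Devices set to 1 yields 1, while a row with Edema set to 1 yields 0.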
Each row in the dataset includes the following information:
- PatientID: A unique identifier for the patient. As part of the de-identification procedure, patient IDs were randomly generated.
- PatientSex: The sex of the patient. Enumerated values:
  - "M" for male
  - "F" for female
  - "O" for other
- PatientAge: Patient's age is provided in 5-year age groups. Patients aged 85 or older are classified as "85 or more."
- AccessionNumber: A DICOM identifier for the study. This was randomly generated as part of the de-identification process.
- StudyDate: A fictitious date for the study.
- Labels: One column for each of the 14 labels. A value of "1" means positive, "0" means negative, and "-1" means uncertain.
- ViewPosition: The radiographic view associated with the patient's position. Defined terms:
  - AP: Anterior/Posterior
  - PA: Posterior/Anterior
  - LL: Left Lateral
  - RL: Right Lateral
  - RLD: Right Lateral Decubitus
  - LLD: Left Lateral Decubitus
  - RLO: Right Lateral Oblique
  - LLO: Left Lateral Oblique
- Rows: The size (number of pixels) along the vertical axis of the image matrix.
- Columns: The size (number of pixels) along the horizontal axis of the image matrix.
- Manufacturer: An index representing the manufacturer of the X-ray equipment. Manufacturer names are coded as integers to conceal their identities while still allowing future research on possible biases related to vendor or machine settings.
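A minimal sketch of reading such rows with Python's standard `csv` module is shown below. It assumes the metadata is distributed as a CSV file whose column names match the field names listed above, which should be checked against the actual release. The helper names (`parse_label`, `read_rows`), the three-label subset, and the sample row are made up for illustration, and treating an empty label cell as "not mentioned" is an assumption.

```python
import csv
import io

# Subset of label columns for brevity; extend to all 14 as needed.
LABEL_COLUMNS = ["Cardiomegaly", "Edema", "Pneumothorax"]


def parse_label(value):
    """Convert a raw label cell to 1, 0, -1, or None for an empty cell."""
    text = (value or "").strip()
    if text == "":
        return None  # assumption: empty cell means the label was not mentioned
    number = int(float(text))  # tolerates "1" as well as "1.0"
    if number not in (1, 0, -1):
        raise ValueError(f"unexpected label code: {value!r}")
    return number


def read_rows(handle):
    """Yield one dict per CSV row with label and size columns parsed."""
    for row in csv.DictReader(handle):
        for column in LABEL_COLUMNS:
            row[column] = parse_label(row.get(column, ""))
        row["Rows"] = int(row["Rows"])        # pixels along the vertical axis
        row["Columns"] = int(row["Columns"])  # pixels along the horizontal axis
        yield row


# Made-up sample row, only to show the shape of the data.
sample = io.StringIO(
    "PatientID,PatientSex,ViewPosition,Rows,Columns,Cardiomegaly,Edema,Pneumothorax\n"
    "p-0001,F,PA,1024,1082,1,-1,\n"
)
rows = list(read_rows(sample))
```

Keeping "-1" distinct from both "0" and an empty cell leaves the choice of how to handle uncertain labels (for example, mapping them to positive or negative during training) to the downstream user.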
Dimensions | Modality | Task Type | Anatomical Structures | Number of Categories | Data Volume | File Format |
2D | X-Ray | Unsupervised Representation Learning, Classification | Heart, chest and lungs | 14 | 24,959 cases, 19,351 patients, 40,967 images | PNG, DICOM |
Dataset Statistics | size |
min | 1024x1082 |
median | 1024x1082 |
max | 1024x1082 |
├── id_00082e3a-ec11c281-24a79518-35d3cc78-22432fb1
│ ├── Study_09342613.22970294.40563343.35634289.53163857
│ │ ├── Series_34523850.21768222.07508551.49190893.14603932
│ │ │ ├── image-48219538-15808688-10728535-52591088-74513595.dcm
│ │ ├── Series_46177599.95157937.50203011.63555832.78161828
│ │ │ ├── image-48219538-15808688-10728535-52591088-74513595.dcm
│ ├── Study_51027964.83117427.20948980.39828954.71003607
│ │ ├── Series_57104384.74837822.26263330.97688944.88328246
│ │ │ ├── image-48651870-23127024-63651831-17193122-94277772.dcm
│ │ ├── Series_72993604.79060724.14705971.37953714.05369399
│ │ │ ├── image-08788867-77959894-95405066-47915205-10581326.dcm
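The directory layout above nests each image under patient, study, and series folders. A small sketch of splitting such a relative path into its identifiers is given below. The function name `parse_image_path` is hypothetical, and the prefixes (`id_`, `Study_`, `Series_`, `image-`) are taken from the example tree only, so other parts of the dataset may differ.

```python
from pathlib import PurePosixPath


def parse_image_path(path):
    """Split a patient/study/series/image path into its four identifiers."""
    parts = PurePosixPath(path).parts
    if len(parts) < 4:
        raise ValueError(f"expected patient/study/series/image, got: {path!r}")
    patient, study, series, image = parts[-4:]
    if not (patient.startswith("id_") and study.startswith("Study_")
            and series.startswith("Series_") and image.startswith("image-")):
        raise ValueError(f"path does not follow the expected layout: {path!r}")
    return {
        "patient": patient[len("id_"):],
        "study": study[len("Study_"):],
        "series": series[len("Series_"):],
        # .stem drops the file extension (.dcm here) before removing the prefix
        "image": PurePosixPath(image).stem[len("image-"):],
    }


info = parse_image_path(
    "id_00082e3a-ec11c281-24a79518-35d3cc78-22432fb1/"
    "Study_09342613.22970294.40563343.35634289.53163857/"
    "Series_34523850.21768222.07508551.49190893.14603932/"
    "image-48219538-15808688-10728535-52591088-74513595.dcm"
)
```

Only the last four path components are inspected, so the same function works whether the path is relative to the dataset root or absolute.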
Eduardo Pontes Reis (Hospital Israelita Albert Einstein)
Official Website:
Download Link:
Article Address:
Publication Date: 2022-06
title={BRAX, Brazilian labeled chest x-ray dataset},
author={Reis, Eduardo P and De Paiva, Joselisa PQ and Da Silva, Maria CB and Ribeiro, Guilherme AS and Paiva, Victor F and Bulgarelli, Lucas and Lee, Henrique MH and Santos, Paulo V and Brito, Vanessa M and Amaral, Lucas TW and others},
journal={Scientific Data},
publisher={Nature Publishing Group UK London}