[Multi-Modal Image+Text] Explore available software packages to pre-process the text reports. #36

Open
4 of 8 tasks
sumedhasingla opened this issue Jul 16, 2018 · 10 comments

@sumedhasingla
Contributor

sumedhasingla commented Jul 16, 2018

Location: /pghbio/dbmi/batmanlab/Data/radiologyTextDataset2/singla/RAD-ALL.deid

Keyword or concept tagging

  • NOBLE Coder: a named entity recognition (NER) engine for biomedical text. A DBMI tool; can be used with TIES.

  • TIES built-in annotation tool

  • Work with Mike to get the annotations that TIES stores for each report.

  • Apache cTAKES

Negation identifier

Pre-processing for VQA

  • Converting the report to questions

Pre-processing for Image Captioning

  • Converting the report to a template
@sumedhasingla sumedhasingla changed the title [Multi modal Image+Text] Preprocess the text reports. [Multi-Modal Image+Text] Preprocess the text reports. Jul 16, 2018
@kayhan-batmanghelich
Collaborator

@pyadolla the template is explained in this issue (one of the papers there):
#35

@kayhan-batmanghelich
Collaborator

In this paper, they used a tool from NIH called Medical Text Indexer. Here is what they did:

[image]

This might be helpful for tagging. Please take a look.

@kayhan-batmanghelich
Collaborator

kayhan-batmanghelich commented Jul 26, 2018

@pyadolla would you please add the CliNER results here for the record.

@sumedhasingla
Contributor Author

@Sumedha
Run TIES, Medical Text Indexer (MTI), and CliNER on the Findings and Impression sections of the reports.
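A minimal sketch of isolating the Findings and Impression sections before handing text to any of these tools. It assumes, hypothetically, that each section starts on its own line with an upper-case header followed by a colon (e.g. `FINDINGS:`); the real RAD-ALL.deid headers may differ. `split_sections` and `findings_and_impression` are illustrative names, not part of TIES, MTI, or CliNER.

```python
import re

# Assumption: section headers are upper-case words at the start of a line,
# followed by a colon, e.g. "FINDINGS:" or "CLINICAL HISTORY:".
HEADER_RE = re.compile(r"^([A-Z][A-Z ]*[A-Z]):", re.MULTILINE)


def split_sections(report: str) -> dict[str, str]:
    """Map each upper-case section header to the text that follows it."""
    matches = list(HEADER_RE.finditer(report))
    sections = {}
    for i, m in enumerate(matches):
        # A section runs until the next header (or the end of the report).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(report)
        sections[m.group(1)] = report[m.end():end].strip()
    return sections


def findings_and_impression(report: str) -> str:
    """Concatenate the FINDINGS and IMPRESSION sections, if present."""
    sections = split_sections(report)
    parts = [sections[name] for name in ("FINDINGS", "IMPRESSION") if name in sections]
    return "\n".join(parts)
```

For example, on `"CLINICAL HISTORY: Cough.\nFINDINGS: No focal consolidation.\nIMPRESSION: No acute disease.\n"` this keeps only the last two sections; reports without recognizable headers yield an empty string and would need to be handled separately.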

@sumedhasingla sumedhasingla changed the title [Multi-Modal Image+Text] Preprocess the text reports. [Multi-Modal Image+Text] Explore available software packages to pre-process the text reports. Aug 6, 2018
@kayhan-batmanghelich
Collaborator

@sumedhasingla if you get some preliminary results from TIES, paste an example here.

@sumedhasingla
Contributor Author

sumedhasingla commented Sep 7, 2018

The NOBLE tool extensively tags the reports with concepts and their semantic types from a chosen thesaurus; I am using "NCI_Metathesaurus". The concepts found by NOBLE are used as input to pyConText to detect negations.
The NOBLE tagging results for about 8k reports are at: '/pghbio/dbmi/batmanlab/Data/radiologyTextDataset2/singla/RAD-ALL-NOBLE-ContextPY-ImageFileName.csv'
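To make the two-stage idea concrete (dictionary-based concept tagging, then NegEx/ConText-style negation over the tagged concepts), here is a toy sketch on made-up data. It is not the API or algorithm of NOBLE Coder or pyConText: the mini-thesaurus, trigger and terminator lists, window size, and all function names are hypothetical assumptions for illustration.

```python
import re

# Hypothetical mini-thesaurus: surface phrase -> (concept label, semantic type).
# NOBLE Coder would instead draw concepts from a real terminology such as NCI_Metathesaurus.
THESAURUS = {
    "pleural effusion": ("Pleural Effusion", "Disease or Syndrome"),
    "pneumothorax": ("Pneumothorax", "Disease or Syndrome"),
    "consolidation": ("Consolidation", "Finding"),
}

# Small illustrative subsets of NegEx/ConText-style triggers (assumptions).
NEGATION_TRIGGERS = ("no", "without", "negative for", "no evidence of")
TERMINATORS = {"but", "however"}  # words that end a negation's scope
WINDOW = 5  # look at most this many tokens back from the concept


def tag_concepts(sentence):
    """Return sorted (start, end, label, semantic_type) tuples for thesaurus hits."""
    text = sentence.lower()
    hits = []
    for phrase, (label, sem_type) in THESAURUS.items():
        for m in re.finditer(r"\b" + re.escape(phrase) + r"\b", text):
            hits.append((m.start(), m.end(), label, sem_type))
    return sorted(hits)


def is_negated(sentence, concept_start):
    """True if a negation trigger occurs shortly before the concept, within scope."""
    tokens = re.findall(r"[a-z]+", sentence[:concept_start].lower())
    # Drop everything up to and including the last scope terminator.
    for i in range(len(tokens) - 1, -1, -1):
        if tokens[i] in TERMINATORS:
            tokens = tokens[i + 1:]
            break
    window = " " + " ".join(tokens[-WINDOW:]) + " "
    return any(f" {trigger} " in window for trigger in NEGATION_TRIGGERS)


def annotate(sentence):
    """Pair each tagged concept with its negation status."""
    return [
        (label, sem_type, is_negated(sentence, start))
        for start, _end, label, sem_type in tag_concepts(sentence)
    ]


print(annotate("No pleural effusion, but small pneumothorax."))
# [('Pleural Effusion', 'Disease or Syndrome', True), ('Pneumothorax', 'Disease or Syndrome', False)]
```

The scope terminator ("but") is what keeps "pneumothorax" from inheriting the leading "No"; the real tools use much richer trigger lists and scope rules.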

@sumedhasingla
Contributor Author

The TIES annotation tool cannot run on the reports we have in RAD-ALL.deid, because these reports were extracted directly from MARS and there is no way to query or find them through the TIES interface.

To process reports and get tags with the TIES tool, we would have to extract the reports again from TIES (500k) and save the annotation information with each report. We ran this process on a small sample of about 5k reports. The results are at: /pghbio/dbmi/batmanlab/Data/radiologyTextDataset2/Reports/test-concepts

The problem with this approach is that TIES can handle these annotations for only 5k files at a time, so the process has to be re-run after every 5k reports. Also, when building a query in TIES to extract reports, each query must return at most 5k reports.
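With a cap of about 5k reports per run and per query, the ~500k-report corpus would need on the order of 500,000 / 5,000 = 100 separate runs. The hypothetical `batches` helper below only sketches how a list of report identifiers could be split into chunks that respect that cap; it is not a TIES feature, and how the IDs are obtained and fed to TIES is left open.

```python
from typing import Iterator, List, Sequence

MAX_BATCH = 5_000  # per-run / per-query limit observed with TIES


def batches(report_ids: Sequence[str], size: int = MAX_BATCH) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` report IDs."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(report_ids), size):
        yield list(report_ids[start:start + size])


ids = [f"report-{i}" for i in range(500_000)]  # stand-in for the ~500k TIES reports
print(sum(1 for _ in batches(ids)))  # 100
```

Keeping the chunk size as a parameter makes it easy to lower it if some TIES queries need a safety margin below 5k.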

Since TIES uses the NOBLE tool under the hood, maybe we can skip the TIES annotation.

@sumedhasingla
Contributor Author

sumedhasingla commented Sep 7, 2018

An analysis of the unique words in these 8k reports:
Vocabulary size: 7,495
[image]
Top-20 semantic types:
[image]
Top-20 concept words:
[image]
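Statistics of this kind can be computed with `collections.Counter`. This is a minimal sketch: the tokenization (lower-cased alphabetic runs) is an assumption, since the thread does not say how the 7,495-word vocabulary was counted, so it would not necessarily reproduce that number. The input to `top_semantic_types`, an iterable of `(concept, semantic_type)` pairs such as rows read from the NOBLE CSV, is likewise hypothetical because the CSV's columns are not described; the same pattern counting concepts instead of semantic types gives the top concept words.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case alphabetic tokens (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def vocabulary_stats(reports, top_k=20):
    """Return (vocabulary size, top_k most frequent words) across all reports."""
    counts = Counter()
    for report in reports:
        counts.update(tokenize(report))
    return len(counts), counts.most_common(top_k)


def top_semantic_types(annotations, top_k=20):
    """Count semantic types over (concept, semantic_type) pairs."""
    return Counter(sem_type for _concept, sem_type in annotations).most_common(top_k)
```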


3 participants