Specify more than one coordinates in the filter #230

NirutaDhimal · 2022-10-01T08:44:08Z

I am trying to remove clean pixels of 2 -3 coordinates of a dicom file. How do I specify it in the filter ?
[

This is my filter But I want to clean pixels on bounding box 4,462,23,102. How do I add this in filter?

vsoch · 2022-10-01T08:48:43Z

Here is an example of an entry with two coordinates: https://github.com/pydicom/deid/blob/master/deid/data/deid.dicom#L301-L308

NirutaDhimal · 2022-10-01T08:56:39Z

I tried that but It's not working. Only the First coordinates were removed.

vsoch · 2022-10-01T12:39:30Z

That's a bug then! Are you able to share some dummy data so I can use for a debuggig/test case?

vsoch · 2022-10-01T12:42:17Z

Actually I have a good test case already! I should be able to work on this and propose a fix today - stay tuned!

NirutaDhimal · 2022-10-01T14:13:13Z

I don't have any dummy data. Sorry for that. Thanks

vsoch · 2022-10-01T17:51:23Z

@NirutaDhimal I tested the syntax with multiple coordinates and it appears to work as I'd expect (both sets show up):

In [1]: results['results']
Out[1]: 
[{'reason': 'and Manufacturer contains Philips Modality contains US ImageType contains Cardiology',
  'group': 'graylist',
  'coordinates': [[0, '0,0,1024,23'], [0, '0,0,1024,23']]}] # notice there are two here

I think very likely the issue is that you either have a bug in the bounding box you are asking for, or a "contains" statement that doesn't apply to your image. It's hard to debug beyond that without seeing your data. However, the format for the coordinates that I showed above is correct.

NirutaDhimal · 2022-10-06T02:10:53Z

@vsoch I printed client.detect() result and both set shows up in the coordinates. But when I did client.save(), the saved image still contains the text in the 2nd coordinate.

vsoch · 2022-10-06T02:26:46Z

Did you do client.clean() ?

NirutaDhimal · 2022-10-06T03:13:51Z

Yes, I did that. First I did client.detect() and then client.clean() and at last I did client.save()

vsoch · 2022-10-06T03:16:42Z

I could help to debug but I’d need you to send me a dummy example (an image and recipe) that reproduces the error. Just to confirm - this is a 2D dicom?

perhaps you could take your image, make a copy that replaces identifiers, write all pixels as some single value (e.g the mean across the image) and then we would try to clean and write a different value?

NirutaDhimal · 2022-10-06T04:49:19Z

Yes its a 2D dicom file and it has 3 channels. I will send you the dummy data

NirutaDhimal · 2022-10-06T05:59:57Z

@vsoch This link has the dummy data and the deid.dicom file. Please find it. I have created the dummy file by the process you described above.
https://drive.google.com/drive/folders/1gwV6UwB44zq042pa6jmwlKZIANvKgTXV?usp=sharing

vsoch · 2022-10-06T06:27:54Z

Thank you @NirutaDhimal ! It's after midnight here, but I should be able to debug this (and get back to you) later tomorrow. Stay tuned!

vsoch · 2022-10-06T17:18:35Z

okay I think I see the cleaning is working? Are you saving? I am using your files in my PWD.

from deid.dicom import DicomCleaner
import os

client = DicomCleaner(deid='deid.dicom')
client = DicomCleaner(output_folder=os.getcwd())

client.detect('dummy.dcm')

{'flagged': True,
 'results': [{'reason': 'and Modality contains CT Manufacturer contains GE Rows equals 1024 BurnedInAnnotation contains YES',
   'group': 'graylist',
   'coordinates': [[0, '0,0,300,150'], [0, '724,0,300,150']]},
  {'reason': 'and Manufacturer contains GE Modality contains CT ManufacturerModelName contains GE|Stanford SeriesDescription contains Dose or ImageType contains SCREEN SAVE',
   'group': 'graylist',
   'coordinates': [[0, '0,0,512,121']]},
  {'reason': 'and Modality contains CT|MR ImageType contains DERIVED|SECONDARY|SCREEN|SAVE',
   'group': 'blacklist',
   'coordinates': []}]}

client.clean()
# This is needed to save the cleaned image - the original won't be touched
client.save_dicom()

And then the image I see there is a clear change. Here is the original:

And changed:

NirutaDhimal · 2022-10-08T02:53:30Z

@vsoch only one coordinates were removed from the dummy example when I tried it.
This is my code

and this is the result

vsoch · 2022-10-08T02:57:36Z

Something is different between what you and I are doing because my results lists has three entries, and yours has one. Make sure you are using the master branch of deid and copying what I did exactly, and with the same file.

NirutaDhimal · 2022-10-08T03:00:51Z

I installed deid using pip and I am working on the same file. I try to copy what you exactly did

vsoch · 2022-10-08T03:03:28Z

Could it be that I used the default deid.dicom config and it covers the locations? Or should it have covered the top left?

NirutaDhimal · 2022-10-08T03:11:01Z

Given your coordinates, it should have covered the top left and top right. Which deid.dicom file did you use? I haven't specified any group for this in the deid.dicom

{'reason': 'and Manufacturer contains GE Modality contains CT ManufacturerModelName contains GE|Stanford SeriesDescription contains Dose or ImageType contains SCREEN SAVE',
'group': 'graylist',
'coordinates': [[0, '0,0,512,121']]},
{'reason': 'and Modality contains CT|MR ImageType contains DERIVED|SECONDARY|SCREEN|SAVE',
'group': 'blacklist',
'coordinates': []}]}
Is this by default?

vsoch · 2022-10-08T03:13:47Z

Yes! So if the user doesn’t provide a custom recipe, we use this default, which is packaged with deid in the data folder. It seems that the headers in your dicom file triggered some of the default locations, but perhaps we are missing the top left? Should we try adding it? And where did you derive your metadata and coordinates from?

NirutaDhimal · 2022-10-08T03:18:58Z

I copied your code and this what I am getting.

But I gave the the custom recipe. I used dcmdump on my original dcm file and derived the metadata and for coordinates I used itksnap to get them as there were texts on these coordinates in the original file and I wanted to remove them

NirutaDhimal · 2022-10-08T03:19:18Z

It's supposed to remove on top left and top right

vsoch · 2022-10-08T03:33:25Z

Ah ok so our viewers are flipped!

What I'm seeing is that:

our viewers are flipped (I used the papaya viewer online, and those sections were on the right for me)
the default deid.dicom encompasses one of the sections you are interested in, and an additional one you didn't target.
the third section coodinates are still missing.

Now that I understand there should be another box on the other side, let me take a look again at the dummy data you sent and see if I can make sense of it. Back in a bit!

NirutaDhimal · 2022-10-08T03:38:46Z

ok thanks for helping me

NirutaDhimal · 2022-10-08T04:04:19Z

I used weasis viewer.

vsoch · 2022-10-08T04:06:47Z

Quick question - how did you derive the second coordinate? The image is 1024 by 1024, and the coordinate format is two mins followed by two maxes, so if you expect it to blank out the top right I would have expected the mins to be smaller than the maxes. What appears to be happening is that you are "selecting" a box of empty pixels.

vsoch · 2022-10-08T04:18:47Z

Okay so here is an example where I've adjusted your coordinates. There is probably a better way to describe it, but what I'm doing is shuffling around your coordinates so min values < max values, and extending the box to span your image (and remember my viewer is flipped). Trying to put this into human terms, I changed your recipe to:

FORMAT dicom

 %filter graylist

LABEL ANNOTATIONS
    contains Modality CT
    + contains ConversionType WSD
    + contains Manufacturer GE
    + contains ImageType MIP
    + contains ManufacturerModelName Revolution 
    + contains SoftwareVersions taix_bj.84.i386
    + contains Rows 1024
    + contains Columns 1024
    coordinates 1,12,227,195
    coordinates 271,9,1024,118 
                      # minr, minc, maxr, maxc
                      # indexed in the data like mask[minc:maxc, minr:maxr]
                      # those values are set to 0 to "blank" them
%header
ADD PatientIdentityRemoved YES
REPLACE PatientID var:id 
REPLACE SOPInstanceUID var:source_id

So from the above, if either of the max values is less than a min, that's an index that will silently error (giving you an empty array). So for example, if I adjust your values to the above I get:

So from what I can tell, the issue here is that the coordinate is wrong. The only reason we got two boxes on my first try is that I erroneously used the default deid.dicom recipe, and not the one you supplied. So if you derived these on your own, make sure they are in that format I've shown above, and that supplying the values actually produces boxes (the first one did not, the second box was an empty array).

NirutaDhimal · 2022-10-08T04:19:37Z

coordinates are in this format (x,y,w,h) right? My second ccordinate is (737, 9, 271, 118) which means the bottom right of the bounding box is has pixel (1108,127) both of which are less than 1024

vsoch · 2022-10-08T04:31:10Z

We follow the same convention as CTP: https://mircwiki.rsna.org/index.php?title=The_CTP_DICOM_Pixel_Anonymizer and you can see our original discussion here: https://gist.github.com/vsoch/df6957be12c34e62b21000603f1687e5 and documentation of it here:

deid/deid/dicom/pixels/detect.py

Lines 271 to 282 in 27d46bb

    
           # https://gist.github.com/vsoch/df6957be12c34e62b21000603f1687e5 
        
           # minr, minc, maxr, maxc = coordinate 
        
           # self.cleaned[minc:maxc, minr:maxr] = 0  # should fill with black 
        
           # self.cleaned[A:B, C:D] 
        
           # image[A:B,C:D] 
        
           # A: refers to ymin 
        
           # B: refers to ymax 
        
           # C: refers xmin 
        
           # D: refers to xmax 
        
           # self.cleaned[ymin:ymax, xmin:xmax] 
        
           # coordinate must be [xmin, ymin, xmax, ymax] 
        
           # x0,y0,x1,y1.

Based on this logic, you have an index 737:271 which doesn't make sense. So either we fudged the logic in the cleaner, or the coordinate needs to be adjusted.

Pinging @fimafurman and @wetzelj in case I'm missing something - I've been away from imaging a long time and don't trust myself.

NirutaDhimal · 2022-10-08T04:42:22Z

I edited the format in which you specified and it worked.

But I followed the CTP convention. It's (x, y, w, h) where w, h is the width and height of the bounding box.

Thanks for helping me and giving your time :)

vsoch · 2022-10-08T04:51:14Z

Perfect! Let's wait to hear from @wetzelj and @fimafurman before closing the issue! If we somehow used a modified format (and also CTP coordinates) we will likely want to tweak one or the other. For the time being, if this works correctly use the format we currently have. Thanks @NirutaDhimal !

wetzelj · 2022-10-13T16:15:20Z

In taking a look at the original discussion about this in deid, I think the coordinates are working as designed (xmin, ymin, xmax, ymax) - this has always been my understanding of how coordinates should be identified. While we're still on deid v0.2.28, the following recipe:

LABEL Testing
  contains Modality CR
  coordinates 0,0,50,50
  coordinates 50,50,100,100
  coordinates 100,100,150,150
  coordinates 150,150,200,200
  coordinates 200,200,250,250
  coordinates 250,250,300,300
  coordinates 300,300,350,350
  coordinates 350,350,400,400
  coordinates 400,400,450,450
  coordinates 450,450,500,500
  coordinates 500,500,550,550
  coordinates 550,550,600,600
  coordinates 600,600,650,650
  coordinates 650,650,700,700

Results in the the following image (pixel 0,0 is top left in my viewer):

This does not seem to match the definition prescribed by CTP. Following CTP rules, the recipe that should be used for the diagonal line in the image above would be:

LABEL Testing
  contains Modality CR
  coordinates 0,0,50,50
  coordinates 50,50,50,50
  coordinates 100,100,50,50
  coordinates 150,150,50,50
  coordinates 200,200,50,50
  coordinates 250,250,50,50
  coordinates 300,300,50,50
  coordinates 350,350,50,50
  coordinates 400,400,50,50
  coordinates 450,450,50,50
  coordinates 500,500,50,50
  coordinates 550,550,50,50
  coordinates 600,600,50,50
  coordinates 650,650,50,50

If the intent is that the coordinates definition should match CTP format, then we definitely seem to have a bug.

vsoch · 2022-10-13T17:05:06Z

Agree!

So what about if we allow both - we have a "ctp" type of coordinate (that gets read differently) and we maintain the default (to not break people's workflows?) Or do you think it's better to have one standard, and fix up everything we do now to conform to that?

wetzelj · 2022-10-13T17:11:57Z

Personally I think a new "ctp" type would be preferred over a "one standard". To flip from our current pattern to the "ctp" would introduce a subtle breaking change - that I could envision people could miss very easily since it's just a behavior change. If we did want to force a single standard, I would prefer to introduce ctp as a new type and then, if we really want it, at a later point in time deprecate the "coordinates" method.

vsoch · 2022-10-13T17:13:05Z

Gotcha! The trick then will be figuring out which coordinate come from CTP. I can take a first pass at this if y'all are good with this approach.

vsoch · 2022-10-13T23:14:40Z

okay please see #234

vsoch · 2022-11-22T00:53:21Z

We just merged the fix that should resolve this - thanks everyone!

vsoch mentioned this issue Oct 13, 2022

add support for translating ctp coordinates #234

Merged

3 tasks

vsoch closed this as completed Nov 22, 2022

Specify more than one coordinates in the filter #230

Specify more than one coordinates in the filter #230

Comments

NirutaDhimal commented Oct 1, 2022

vsoch commented Oct 1, 2022

NirutaDhimal commented Oct 1, 2022

vsoch commented Oct 1, 2022

vsoch commented Oct 1, 2022

NirutaDhimal commented Oct 1, 2022

vsoch commented Oct 1, 2022

NirutaDhimal commented Oct 6, 2022

vsoch commented Oct 6, 2022

NirutaDhimal commented Oct 6, 2022

vsoch commented Oct 6, 2022

NirutaDhimal commented Oct 6, 2022 • edited Loading

NirutaDhimal commented Oct 6, 2022

vsoch commented Oct 6, 2022

vsoch commented Oct 6, 2022

NirutaDhimal commented Oct 8, 2022

vsoch commented Oct 8, 2022

NirutaDhimal commented Oct 8, 2022

vsoch commented Oct 8, 2022

NirutaDhimal commented Oct 8, 2022

vsoch commented Oct 8, 2022 • edited Loading

NirutaDhimal commented Oct 8, 2022

NirutaDhimal commented Oct 8, 2022

vsoch commented Oct 8, 2022

NirutaDhimal commented Oct 8, 2022

NirutaDhimal commented Oct 8, 2022

vsoch commented Oct 8, 2022

vsoch commented Oct 8, 2022 • edited Loading

NirutaDhimal commented Oct 8, 2022

vsoch commented Oct 8, 2022 • edited Loading

NirutaDhimal commented Oct 8, 2022

vsoch commented Oct 8, 2022

wetzelj commented Oct 13, 2022

vsoch commented Oct 13, 2022

wetzelj commented Oct 13, 2022

vsoch commented Oct 13, 2022

vsoch commented Oct 13, 2022

vsoch commented Nov 22, 2022

NirutaDhimal commented Oct 6, 2022 •

edited

Loading

vsoch commented Oct 8, 2022 •

edited

Loading

vsoch commented Oct 8, 2022 •

edited

Loading

vsoch commented Oct 8, 2022 •

edited

Loading