Grids Ontology Improvement #827

hepengfe · 2022-06-07T01:28:24Z

This PR fixes #817.

Description of changes

change Grids -> Grid and move it to core.py as a data structure rather than an ontology
- dealing edge cases of residual grid cells(last grid cells per row/column). We use the image size as upper bound for these grid cells so they might be smaller than regular grid cells.
improve grid initialization in BoundingBox
add more properties and functions in Box based on three use cases.
- A standalone box (not associated with grid)
- Ground truth bounding box. (known absolute position and shape)
- Predicted bounding box. (associated with a grid cell and known height/width offsets)
add complete docstring with use cases in Box
Box condition attributes
- self._is_grid_associated
- self._is_box_center_set()

Possible influences of this PR.

Describe what are the possible side-effects of the code change.

Test Conducted

Describe what test cases are included for the PR.

…x the initialization parameter requirement)

…o grid_improve

hepengfe · 2022-06-07T23:41:36Z

The implementation design considerations is:

Regular shape CV objects should have a reference point. In our case, Box is an example and BoundingBox is even more special that its reference point needs to be center of grid cell.
Due to the limitation of ontology object __init__, we can not let Grid be the parameter to initialize BoundingBox and associate BoundingBox and Grid.
Based on the considerations above, I initialize both BoundingBox center and Box center (reference point) as the image center and record whether it's a default box center and whether it's associated with a grid. And it will give corresponding Warning/ValueError appropriately.
I provide methods to set box center(reference point) based on user input point location (cy, cx) or Grid.

Several questions about the design:
I think Grid should be a class that is somehow independent from DataPack and specific image payload. As I think Grid is a just an interface that able to provides grid cell location given an image shape. As images in a dataset is basically the same, we might shouldn't create a Grid for every image. But it seems like Grid is bound to DataPack and DataPack contains few images. If we store grid inside DataPack, we might have to duplicate grid for each image.

codecov · 2022-06-07T23:55:38Z

Codecov Report

Merging #827 (d48b693) into master (8e03abe) will increase coverage by 0.05%.
The diff coverage is 80.87%.

@@            Coverage Diff             @@
##           master     #827      +/-   ##
==========================================
+ Coverage   80.75%   80.81%   +0.05%     
==========================================
  Files         254      254              
  Lines       19364    19574     +210     
==========================================
+ Hits        15637    15818     +181     
- Misses       3727     3756      +29

Impacted Files	Coverage Δ
forte/data/data_store.py	`93.17% <ø> (ø)`
forte/data/entry_converter.py	`82.89% <ø> (-0.44%)`	⬇️
forte/data/ontology/top.py	`76.45% <68.96%> (+0.70%)`	⬆️
forte/data/ontology/core.py	`76.13% <72.09%> (-0.79%)`	⬇️
forte/data/data_pack.py	`85.49% <100.00%> (-0.03%)`	⬇️
tests/forte/grids_test.py	`100.00% <100.00%> (ø)`
tests/forte/image_annotation_test.py	`100.00% <100.00%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8e03abe...d48b693. Read the comment docs.

hepengfe · 2022-06-08T18:07:38Z

Usually, there are many regular ontologies(such as bounding box) and many images but only few grids. The number of grids actually needed only depends on the number of different image sizes and the number of grid configurations.

My proposal is that
Past design: Each datapack stores few images and store few grids. The number of grids in total is dependent the number of images. With a large datasets, there will be many duplicated grids distributed in many DataPack.

New design: we implement a Grid interface such that only a collection of grids (probably tens of them) is needed to query-related data given image and grid-related parameters/regular ontology such as BoundingBox. Grid itself just like other ontology doesn't store image data. In addition, it is not bound to any other regular ontology.

If this design is expected, we might not want it to inherit entry as it will create new ones for each DataPack.

hunterhector · 2022-06-08T21:29:56Z

Usually, there are many regular ontologies(such as bounding box) and many images but only few grids. The number of grids actually needed only depends on the number of different image sizes and the number of grid configurations.
My proposal is that Past design: Each datapack stores few images and store few grids. The number of grids in total is dependent the number of images. With a large datasets, there will be many duplicated grids distributed in many DataPack.
New design: we implement a Grid interface such that only a collection of grids (probably tens of them) is needed to query-related data given image

Good thoughts on limiting "number of grids", feels like a good idea, though I haven't considered everything about it.

and grid-related parameters/regular ontology such as BoundingBox. Grid itself just like other ontology doesn't store image data. In addition, it is not bound to any other regular ontology.

I have a few suggestions on this detail though. You can already see the problem of making Grid be like other ontology items. First, it cannot be stored as data along with each DataPack (since this will cause duplications). Second, it cannot inherit an Entry. I think this basically means we shouldn't consider Grid inside the ontology framework.

Instead, if you want to achieve this, we probably will make Grid a basic data structure used by the Forte DataPack system only. (i.e. Grid is more like a List, SortedList, TList or Adjacent Array, instead of an Entry). In DataPack, we can create some default Grids and users won't involve in this process, they are the underlying data structures that speed up certain computations only.

Having Grid as a new data structure does not mean that users won't be able to use it, for example, we can imagine supporting Grid as an attribute type for a certain ontology (i.e Grid is not used as an Entry type, but can be used as an attribute type, like TList). Consider the following specification:

{
  "entry_name": "User.Defined.Type",
  "entry_attributes":
   {
       "grid": "forte.data.core.grid",      
   }
}

We can allow users to invoke some functions like computing the position of a particular bounding box with any grids, even the one they defined. However, these should be advanced and rare use cases for now.

hunterhector

To continue this PR, let's make Grid a data structure, like Span.

… list

…o grid_improve

hunterhector

You can store Grid in the data store, but don't store them like other entries, because they are not.
I think you haven't handled (or at least haven't documented) the behavior of the edge cases well. How do we treat the pixels on the side where y > c_h * self._height. Let's make sure we have a common understanding of the behavior, test the edge cases and write the documentation for it

hunterhector · 2022-06-23T18:19:23Z

forte/data/data_store.py

@@ -681,7 +679,7 @@ def _new_grid(
        tid: int = self._new_tid() if tid is None else tid
        entry: List[Any]

-        entry = [image_payload_idx, None, tid, type_name]
+        entry = [None, None, tid, type_name]
        entry += self._default_attributes_for_type(type_name)


If a Grid is no longer an entry, why are we still storing them like entries?

hunterhector · 2022-06-23T18:19:56Z

forte/data/data_store.py

@@ -883,7 +881,7 @@ def num_entries(self, entry_type_name: str) -> int:

    def _add_entry_raw(
        self,
-        entry_type: Type[Entry],
+        entry_type: Union[Type[Entry], Type[Grid]],


don't do this here, we basically don't store them together like other entries anymore.

hunterhector · 2022-06-23T18:20:14Z

forte/data/data_store.py

@@ -1082,7 +1080,6 @@ def add_image_annotation_raw(
    def add_grid_raw(


shouldn't have add_grid_raw

hunterhector · 2022-06-23T18:21:51Z

forte/data/ontology/core.py

+        image_height: int,
+        image_width: int,
+    ):
+        if height <= 0 or width <= 0:


why don't we also check for image_height

hunterhector · 2022-06-23T18:23:28Z

forte/data/ontology/core.py

+        self.c_h, self.c_w = (
+            image_height // self._height,
+            image_width // self._width,
+        )  # compute the height and width of grid cells


We round up the division here, but what's the implication? Does it mean some grid cells are larger? Or do we have residual values? Which ones are larger and what's the behavior? We need full documentation of these

hunterhector · 2022-06-23T18:25:28Z

forte/data/ontology/core.py

+        ] = img_arr[
+            h_idx * self.c_h : (h_idx + 1) * self.c_h,
+            w_idx * self.c_w : (w_idx + 1) * self.c_w,
+        ]


What would the behavior related to the residual (incomplete) grids on the edge?

the residual grid cell (index range) will be smaller than other regular ones.

This needed to be explained in the docstring clearly.

forte/data/ontology/core.py

hunterhector · 2022-06-23T18:29:56Z

forte/data/ontology/top.py

@@ -1010,40 +876,89 @@ class Box(Region):
        pack: the container that this ``Box`` will be added to.
        image_payload_idx: the index of the image payload. If it's not set,
            it defaults to 0 which meaning it will load the first image payload.
+        height: the height of the box, the unit is one image array entry.


Users have no idea what "one image array entry" means

hunterhector · 2022-06-23T18:31:43Z

forte/data/ontology/top.py

        cy: the row index of the box center in the image array,
-            the unit is one image array entry.
+            the unit is one image array entry. If not set, the box center
+            will be set to center of image (half height of the image).


do we really want to provide default values here?

The idea I was thinking is that regular shape makes sense only when it has a reference point (box center here).
I am thinking two layers of conditions

Whether it has a non-default reference point (maybe we don't need to set them to an actual value here).

Whether it's associated with a grid. (It might or might not depending on the box use cases)

'

… with respector to Box.corners)

mylibrar

I guess we will need a comprehensive example to showcase the power of Grid (maybe in another PR).

forte/data/data_pack.py

forte/data/ontology/core.py

forte/data/ontology/top.py

mylibrar · 2022-06-30T23:25:53Z

forte/data/ontology/top.py

+        grid cell center.
+        3. When we predict a box, we will have the predicted box shape (height,
+        width) and the offset between the box center and the grid cell center,
+        then we can compute the box center.


Can we create a standalone Box also for ground truth box or predicted box? Do we assume that the input parameters to a predicted box will be the box shape and gird center offset?

Yes. In the first use case, only cy and cx are required to be set.

hunterhector

The main thing about this PR is the documentation. It is a bit hard for me to understand the purpose of classes and functions. We need to create a simple tutorial to explain how these things work (and also include some examples in the docstring itself).

@mylibrar I also haven't checked the implementation details, please check that part.

hunterhector · 2022-07-02T23:12:42Z

forte/data/data_pack.py

@@ -172,7 +170,9 @@ def __init__(self, pack_name: Optional[str] = None):
        self._data_store: DataStore = DataStore()
        self._entry_converter: EntryConverter = EntryConverter()
        self.image_annotations: List[ImageAnnotation] = []
-        self.grids: List[Grids] = []
+        self.payloads: List[np.ndarray] = []


Here is an unused variable. And more, this make a new assumption that we store a list of payload in the format as np.ndarray. Let's not do this now (at least not in this PR).

hunterhector · 2022-07-02T23:13:14Z

forte/data/data_pack.py

-        self.grids: List[Grids] = []
+        self.payloads: List[np.ndarray] = []
+
+        self.grids: List[Grid] = []


This is also not used in this file.

hunterhector · 2022-07-02T23:18:24Z

forte/data/ontology/core.py

+    """
+    Regular grid with a grid configuration dependent on the image size.
+    It is a data structure used to retrieve grid-related objects such as grid
+    cells from the image. Grid itself doesn't store any data.


I think the core functionality of a Grid is used to index the image area so that we can easily compute the overlap of any two areas. Let's don't forget this.

Btw, is there any use case you find that we need to retrieve the image inside the grid cell?

I think bounding box prediction is usually based on only image on inside a grid cell (anchor box) in YOLO algorithm.

Is it a convention or just a choice of a particular library (YOLO)?

you will need to explain these things to the users. before that, can you make sure you explain this to the team so people understand what you are doing?

hunterhector · 2022-07-02T23:19:29Z

forte/data/ontology/core.py

+    we compute the height and the width of grid cells.
+    For example, if the image size (image_height,image_width) is (640, 480)
+    and the grid shape (height, width) is (2, 3)
+    the size of grid cells (self.c_h, self.c_w) will be (320, 160).


I guess you might want to discuss how we handle the cases where the numbers are indivisible?

hunterhector · 2022-07-02T23:20:45Z

forte/data/ontology/core.py

+    The grid can be totally "free-form" that we don't initialize it with any
+    image size and pass the image size directly into the method/operation on
+    the fly. However, since the number of different image shapes are limited,
+    we can bound one grid to one image size, and we can also do the image/grid


This part is still confusing, try to explain this to someone who never works on this and see if they understand. For example, why does one need to bound the grid to the image size? What are "size check"?

hunterhector · 2022-07-02T23:39:53Z

forte/data/ontology/top.py

+    @property
+    def cy(self) -> int:
+        """
+        Compute and return row index of the box center in the image array.


hunterhector · 2022-07-02T23:40:39Z

forte/data/ontology/top.py

-        return (self._cy, self._cx)
+    def box_center(self) -> Tuple[int, int]:
+        """
+        Get the box center by using the property function cy() and cx().


don't write implementation details in function docstring, users often won't care (and might get confused).

hunterhector · 2022-07-02T23:41:03Z

forte/data/ontology/top.py

        """
-        Get corners of box.
+        Compute and return the corners of the box.


What is a corner? Which corners? Shape of the tuple?

hunterhector · 2022-07-02T23:42:13Z

forte/data/ontology/top.py


    @property
-    def area(self):
+    def area(self) -> int:


Something like area should also exist in the upper classes

hunterhector · 2022-07-02T23:42:59Z

forte/data/ontology/top.py

+            True if the offset is set, False otherwise.
+        """
+        return self._cy_offset is not None and self._cx_offset is not None
+

 class BoundingBox(Box):


what differences would a boundingbox have comparing to a box?

Normally I don't think a box need be associated with a grid but it can be associated with grid anyway. A bounding box are much more often associated with a grid due to the CV algorithm.

hepengfe · 2022-07-12T18:17:55Z

Closed as created a new PR #876
Also, Box is kept as it is in master branch(no changes made).

Grids -> Grid

a27d779

hepengfe added topic: ontology All issues related to ontologies in core, top and base_ontology topic:cv labels Jun 7, 2022

hepengfe self-assigned this Jun 7, 2022

hepengfe added 5 commits June 7, 2022 16:06

initialize box center as image center when no indices are given (rela…

a195f51

…x the initialization parameter requirement)

Merge branch 'master' into grid_improve

1e494c8

Merge branch 'grid_improve' of https://github.com/feipenghe/forte int…

2547fbb

…o grid_improve

fix variable name

b366978

self._check_default_box_center -> self._check_box_center_status

b901387

fix pylint errors

ab63ad8

hepengfe requested review from hunterhector and mylibrar June 7, 2022 23:49

Merge branch 'master' into grid_improve

db52114

hepengfe closed this Jun 8, 2022

hepengfe reopened this Jun 8, 2022

hunterhector reviewed Jun 20, 2022

View reviewed changes

hepengfe added 8 commits June 20, 2022 08:35

move grid to from top.py to core.py

854c271

Merge branch 'asyml:master' into grid_improve

d7dbbc8

move grid from top.py to core.py as a data structure and remove grids…

e40e396

… list

update grids test

e238385

Merge branch 'grid_improve' of https://github.com/feipenghe/forte int…

fb8cf49

…o grid_improve

adjust grid tests

3bcc8ab

grid integration in DataStore

e759466

pylint

210c52c

hepengfe marked this pull request as ready for review June 22, 2022 22:43

hunterhector reviewed Jun 24, 2022

View reviewed changes

hepengfe added 10 commits June 29, 2022 17:21

resolve the merge conflict

7276830

add test cases for box out of the image

0007f78

Grids -> Grid

9588ac7

rm Grids

c3b2ba8

rm Grids in data store

c0b3d8c

'

add offset setter, getter and checker

7203b23

change ImagePayload adding operation

5fd59ca

pylint fix

83083fc

fix mypy and check conditions only in most basic box property (Box.cx…

3dacf0d

… with respector to Box.corners)

pylint

deb2a30

mylibrar reviewed Jul 1, 2022

View reviewed changes

hepengfe added 3 commits July 1, 2022 13:43

remove unused DataPack attributes

bfed96f

fix docstring and add upper bound in get_grid_cell_center

4023891

Fix docstring

5e8c483

mylibrar requested a review from hunterhector July 1, 2022 21:12

hepengfe added 2 commits July 1, 2022 15:11

add more examples for box

618a974

pylint

e7e0cc4

hunterhector reviewed Jul 2, 2022

View reviewed changes

hepengfe added 9 commits July 6, 2022 17:46

add more docstring

7d7d04a

remove unsued variables

416cbbc

add docstring for grid

90612b7

add specifications for units

d2b10c0

pylint fix

d49a319

Merge branch 'master' into grid_improve

286428f

adjust image payload adding

8deb0c9

mypy error

884cb6f

add more units

d48b693

hepengfe mentioned this pull request Jul 8, 2022

Ocr showcase #866

Closed

hepengfe closed this Jul 12, 2022

		@@ -1082,7 +1080,6 @@ def add_image_annotation_raw(
		def add_grid_raw(

Grids Ontology Improvement #827

Grids Ontology Improvement #827

Uh oh!

Conversation

hepengfe commented Jun 7, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description of changes

Possible influences of this PR.

Test Conducted

Uh oh!

hepengfe commented Jun 7, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov bot commented Jun 7, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

hepengfe commented Jun 8, 2022

Uh oh!

hunterhector commented Jun 8, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hunterhector left a comment

Choose a reason for hiding this comment

Uh oh!

hunterhector left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mylibrar left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

hunterhector left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

hunterhector Jul 6, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

hepengfe commented Jun 7, 2022 •

edited

Loading

hepengfe commented Jun 7, 2022 •

edited

Loading

codecov bot commented Jun 7, 2022 •

edited

Loading

hunterhector commented Jun 8, 2022 •

edited

Loading

hunterhector Jul 6, 2022 •

edited

Loading