[WIP] Add Mosaic augmentation #1147

Open
wants to merge 3 commits into
base: main

Contributor

@i-aki-y i-aki-y commented Mar 18, 2022

About PR

In this PR, I implemented a mosaic augmentation used in YOLO[1, 2].

I appreciate any comments and suggestions.

[1]: "YOLOv4: Optimal Speed and Accuracy of Object Detection", https://arxiv.org/pdf/2004.10934.pdf
[2]: YOLOv5 https://github.com/ultralytics/yolov5

Demo

This is a reproducible example:

import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
import skimage
import albumentations as A

## define helper funcs
def add_bbox(ax, bbox, encoder):
    label = 0    
    if len(bbox) > 4:
        bbox, label = bbox[:4], bbox[4]
        label = encoder[label]
    bbox_color = plt.get_cmap("tab10").colors[label]
    x_min, y_min, x_max, y_max = bbox
    w, h = x_max - x_min, y_max - y_min
    pat = Rectangle(xy=(x_min, y_min), width=w, height=h, fill=False, lw=3, color=bbox_color)
    ax.add_patch(pat)

def plot_image_and_bboxes(image, bboxes, encoder, ax):
    ax.imshow(image)
    for i in range(len(bboxes)):
        add_bbox(ax, bboxes[i], encoder)


## data setup
encoder = {"face": 0, "rocket": 1, "other": 2}
image_list = [skimage.data.astronaut(), skimage.data.cat(), skimage.data.coffee(), skimage.data.rocket()]
bboxes_list = [
    [[170, 30, 280, 180, "face"], [350, 80, 460, 290, "rocket"], [140, 350, 200, 420, "other"]],
    [[50, 0, 350, 280, "face"]],
    [[160, 15, 420, 210, "other"]],
    [[300, 120, 340, 420, "rocket"]],
]

## define pipeline
bbox_format = 'pascal_voc'
transform = A.Compose([
    A.Mosaic(height=2*512, width=2*512, shift_limit_x=0.0, shift_limit_y=0.0, replace=False, p=1.0, fill_value=114, bboxes_format=bbox_format),
    A.RandomResizedCrop(height=512, width=512, scale=(0.4, 1.0)),
], bbox_params=A.BboxParams(format=bbox_format))


## show input images
fig, axes = plt.subplots(2, 2, figsize=(6, 6))
axes = axes.flatten()

for i in range(len(image_list)):
    ax = axes[i]
    ax.set_title(f"input{i}")
    image = image_list[i]
    bboxes = bboxes_list[i]
    plot_image_and_bboxes(image, bboxes, encoder, ax)

plt.show()   
#plt.savefig("mosaic_input.jpg", bbox_inches='tight')

fig, axes = plt.subplots(2, 2, figsize=(10, 10))
axes = axes.flatten()
for ax in axes:
    data = transform(image=image_list[0], image_cache=image_list[1:], bboxes=bboxes_list[0], bboxes_cache=bboxes_list[1:])
    image = data["image"]
    bboxes = data["bboxes"]    
    plot_image_and_bboxes(image, bboxes, encoder, ax)

plt.show()       
#plt.savefig("mosaic_output.jpg", bbox_inches='tight')

Input

mosaic_input

Some Results

mosaic_output

Notes

Since albumentations currently does not support multiple image sources, I introduced two helper targets, image_cache and bboxes_cache, as additional data sources. The user passes the additional images and bboxes through these helper targets, so it is up to the user to decide how to prepare and manage the extra image data. If memory is sufficient or the dataset is small, the user can put all images into image_cache. Otherwise, the user can load only a small number of images for each iteration.
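The "small number of images for each iteration" option could look like the sketch below. It assumes the mosaic needs three additional images; sample_cache_indices is a hypothetical helper for illustration, not part of this PR or of albumentations.

```python
import random


def sample_cache_indices(dataset_len, current_index, k=3, rng=random):
    """Pick k distinct dataset indices, excluding current_index, to fill image_cache."""
    if dataset_len - 1 < k:
        raise ValueError("dataset is too small to sample k extra images")
    candidates = [i for i in range(dataset_len) if i != current_index]
    return rng.sample(candidates, k)


# Usage: load only these images (and their bboxes) for the current iteration.
extra = sample_cache_indices(dataset_len=100, current_index=7, rng=random.Random(0))
```

The images and bboxes loaded for these indices would then be passed as image_cache and bboxes_cache.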

Note that this PR version does not support the labels_cache target. This means that the user should embed the label information inside the bounding boxes like [xmin, ymin, xmax, ymax, label] (when bboxes_format=pascal_voc).
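If the labels are stored separately from the boxes, they can be merged before calling the transform and split again afterwards. This is a minimal sketch, assuming pascal_voc boxes; attach_labels and detach_labels are hypothetical helper names, not part of this PR.

```python
def attach_labels(bboxes, labels):
    """Return boxes in the form [x_min, y_min, x_max, y_max, label]."""
    if len(bboxes) != len(labels):
        raise ValueError("bboxes and labels must have the same length")
    return [[*bbox[:4], label] for bbox, label in zip(bboxes, labels)]


def detach_labels(bboxes):
    """Split [x_min, y_min, x_max, y_max, label] boxes into boxes and labels."""
    boxes = [list(bbox[:4]) for bbox in bboxes]
    labels = [bbox[4] for bbox in bboxes]
    return boxes, labels


merged = attach_labels([[170, 30, 280, 180], [350, 80, 460, 290]], ["face", "rocket"])
boxes, labels = detach_labels(merged)
```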

Another limitation is that Mosaic should be the first transform in the pipeline, as in the example above. Any transform placed before Mosaic is applied only to the image passed as the image target; the additional images in image_cache are left untouched. The images in image and image_cache would therefore have different augmentation histories, which I think is not what users expect. For example, in the following pipeline, Normalize and RandomResizedCrop are applied only to image, not to any image in image_cache.

transform = A.Compose([
    A.Normalize(...),
    A.RandomResizedCrop(...),
    A.Mosaic(...),
    ...
])
data = transform(image=image, image_cache=image_cache)

I think this is not a serious limitation because users can prepare two pipelines and apply them separately if needed.

preprocess = A.Compose([    
    A.Normalize(...),
    A.RandomResizedCrop(...),
])

transform = A.Compose([
    A.Mosaic(...),
])

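# Preprocess every image (and its bboxes) with the same pipeline first.
batch = [preprocess(image=img, bboxes=boxes) for img, boxes in zip(image_batch, bboxes_batch)]
image_batch = [data["image"] for data in batch]
bboxes_batch = [data["bboxes"] for data in batch]
# Then pass the first sample as the main target and the rest as the cache.
data = transform(image=image_batch[0], image_cache=image_batch[1:], bboxes=bboxes_batch[0], bboxes_cache=bboxes_batch[1:])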

The same strategy can be applied to other multi-image augmentations such as MixUp.
For example, I think an augmentation similar to the one used in YOLOv5 could be written as follows.

mosaic_aug = A.Compose([
    A.Mosaic(...),
    A.Affine(...),
    A.RandomResizedCrop(...),
])
mixup_aug = A.Compose([
    A.MixUp(...),  # not included in this PR
])

mosaic1 = mosaic_aug(image=image1, image_cache=image_cache, bboxes=bboxes1, bboxes_cache=bboxes_cache)
mosaic2 = mosaic_aug(image=image2, image_cache=image_cache, bboxes=bboxes2, bboxes_cache=bboxes_cache)
mosaic_mixup = mixup_aug(image=mosaic1["image"], bboxes=mosaic1["bboxes"], image_cache=mosaic2["image"], bboxes_cache=mosaic2["bboxes"])

Implementation Notes

The target_dependence property is used

I used the target_dependence property, instead of get_params_dependent_on_targets, to pass the helper targets to the apply_xxx functions.

This is because the values returned by get_params and get_params_dependent_on_targets are serialized, which is the mechanism used for "replay". Since I think these helper targets are not appropriate for serialization, I used the target_dependence mechanism instead.

Mosaic center is fixed

The YOLOv5 implementation randomizes the position of the mosaic center.

https://github.com/ultralytics/yolov5/blob/7c6a33564a84a0e78ec19da66ea6016d51c32e0a/utils/datasets.py#L653

I excluded this feature from the PR version because the same effect can be obtained by applying RandomResizedCrop right after Mosaic, as in the demo example above.
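To illustrate what a fixed center means for the boxes, here is a minimal sketch of the coordinate shift for a 2x2 mosaic. It assumes a YOLOv5-like layout in which each image touches the canvas center with one corner; place_bbox is a hypothetical helper, and the actual functions in this PR may differ.

```python
def place_bbox(bbox, image_size, position, canvas_size):
    """Shift a pascal_voc bbox into a 2x2 mosaic canvas with a fixed center.

    position: 0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right.
    image_size and canvas_size are (width, height) tuples.
    Returns None when no part of the box remains inside the canvas.
    """
    img_w, img_h = image_size
    canvas_w, canvas_h = canvas_size
    cx, cy = canvas_w // 2, canvas_h // 2
    # Assumed layout: each image touches the canvas center with one corner.
    dx = cx - img_w if position in (0, 2) else cx
    dy = cy - img_h if position in (0, 1) else cy
    x_min, y_min, x_max, y_max = bbox[:4]
    # Translate, then clip to the canvas.
    x_min = min(max(x_min + dx, 0), canvas_w)
    x_max = min(max(x_max + dx, 0), canvas_w)
    y_min = min(max(y_min + dy, 0), canvas_h)
    y_max = min(max(y_max + dy, 0), canvas_h)
    if x_max <= x_min or y_max <= y_min:
        return None
    return [x_min, y_min, x_max, y_max, *bbox[4:]]


shifted = place_bbox([170, 30, 280, 180, "face"], (512, 512), 3, (1024, 1024))
```

With the center fixed at the middle of the canvas, a subsequent random crop is what varies where the seams fall in the final image.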

TODO

  • implement apply_to_keypoints
  • write tests
  • bboxes_cache preprocessing should be done in Compose if possible.

@Dipet Dipet added the WIP label Jun 11, 2022
@ternaus ternaus added the enhancement New feature or request label Jul 6, 2022
@mikel-brostrom

Any updates on this?

@mikel-brostrom

mikel-brostrom commented Feb 20, 2023

I tried this out together with some rotation augmentations and it seems to work @i-aki-y .

proc

However, from time to time this error arises:

ValueError: y_max is less than or equal to y_min for bbox

when using COCO. Any idea how to fix this @i-aki-y?

Moreover, the structure of the repo must have changed since the PR was created as some refactoring was needed.

@mikel-brostrom

mikel-brostrom commented Feb 20, 2023

Since current albumentations do not support multiple image sources, I introduced helper targets, image_cache, bboxes_cache, as additional data sources

Couldn't this be solved by using additional_targets like here?
This would allow the loaded images to be augmented in different ways before mosaic 🚀

@mikel-brostrom

Setting the width and height for Mosaic requires some basic knowledge of the dataset you are working with; otherwise most of the image could be left outside. Maybe this should be reflected in the docstrings as well. I guess that a good set of initial values could be the average width and height of the dataset you work with @i-aki-y ?
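A rough way to get such initial values is to average the image shapes of the dataset. This is only a sketch: mean_image_size is a hypothetical helper and the shapes below are illustrative. How the result maps to Mosaic's height and width depends on the implementation (the demo in the PR description uses twice the image size).

```python
def mean_image_size(shapes):
    """Return (mean_height, mean_width), rounded, for (height, width, ...) shapes."""
    shapes = list(shapes)
    if not shapes:
        raise ValueError("shapes must not be empty")
    mean_h = sum(shape[0] for shape in shapes) / len(shapes)
    mean_w = sum(shape[1] for shape in shapes) / len(shapes)
    return round(mean_h), round(mean_w)


# Illustrative shapes in NumPy (height, width, channels) order.
height, width = mean_image_size([(512, 512, 3), (300, 451, 3), (400, 600, 3), (427, 640, 3)])
```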

@i-aki-y
Contributor Author

i-aki-y commented Feb 22, 2023

@mikel-brostrom Sorry for the delay, and thank you for your feedback!

ValueError: y_max is less than or equal to y_min for bbox

This means that some bboxes have zero or negative heights.
Did you get this error only when you used mosaic transform?

Moreover, the structure of the repo must have changed since the PR was created as some refactoring was needed.

OK. I will check and update the PR.

Couldn't this be solved by using additional_targets like here?
This would allow the loaded images to be augmented in different ways before mosaic 🚀

I missed this feature! I will look into whether I can use it for this transform.

@mikel-brostrom

I have it working locally, so I could create a pull request with what I have @i-aki-y . I also have MixUp working as you suggested:

mosaic1 = mosaic_aug(image=image1, image_cache=image_cache, bboxes=bboxes1, bboxes_cache=bboxes_cache)
mosaic2 = mosaic_aug(image=image2, image_cache=image_cache, bboxes=bboxes2, bboxes_cache=bboxes_cache)
mosaic_mixup = mixup_aug(image=mosaic1["image"], bboxes=mosaic1["bboxes"], image_cache=mosaic2["image"], bboxes_cache=mosaic2["bboxes"])

Maybe this should go into a separate PR?

@mikel-brostrom

mikel-brostrom commented Feb 22, 2023

This means that some bboxes have zero or minus heights.
Did you get this error only when you used mosaic transform?

Yes. I had multiple augmentations in my augmentation stack, but the error always appeared during Mosaic. Whenever the error occurred, y_max was equal to y_min. This couldn't be fixed by setting min_area in A.BboxParams, btw...

@i-aki-y
Contributor Author

i-aki-y commented Feb 22, 2023

@mikel-brostrom

Yes. I had multiple augmentations in my augmentation stack, but the error always appeared during Mosaic. Whenever the error occurred, y_max was equal to y_min. This couldn't be fixed by setting min_area in A.BboxParams, btw...

Thanks, I will investigate it.

I have it working locally, so I could create a pull request with what I have @i-aki-y . I also have MixUp working as you suggested:

Great!

Maybe this should go into a separate PR?

Yes.

@mikel-brostrom

mikel-brostrom commented Feb 23, 2023

Feel free to check out my MixUp implementation here @i-aki-y . Any feedback is appreciated. It works nicely with this Mosaic implementation 😄. I am going for CutMix 🚀

@mikel-brostrom

Btw, I have implemented this in a slightly different manner.

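import random
from typing import Any, Dict, List, Tuple

from albumentations.core.transforms_interface import DualTransform

# mosaic4 and bbox_mosaic4 are helper functions that are not shown in this comment.

class Mosaic(DualTransform):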
    def __init__(
        self,
        height,
        width,
        replace=True,
        fill_value=0,
        bboxes_format="coco",
        always_apply=False,
        p=0.5,
    ):
        super().__init__(always_apply=always_apply, p=p)
        self.height = height
        self.width = width
        self.replace = replace
        self.fill_value = fill_value
        self.bboxes_format = bboxes_format
        self.images = []
        self.bboxes = []

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return ("height", "width", "replace", "fill_value", "bboxes_format")

    def apply(self, image, **params):
        return mosaic4(self.images, self.height, self.width, self.fill_value)

    def apply_to_keypoint(self, **params):
        pass  # TODO
    
    def apply_to_bbox(self, bbox, image_shape, position, height, width, **params):
        rows, cols = image_shape[:2]
        return bbox_mosaic4(bbox, rows, cols, position, height, width)
    
    def apply_to_bboxes(self, bboxes, **params):
        new_bboxes = []
        for i, (bbox, im) in enumerate(zip(self.bboxes, self.images)):
            im_shape = im.shape
            h, w, _ = im_shape
            for b in bbox:
                new_bbox = self.apply_to_bbox(b, im_shape, i, self.height, self.width)
                new_bboxes.append(new_bbox)
        return new_bboxes

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
        self.images = [params['image'], params['image1'], params['image2'], params['image3']]
        self.bboxes = [params['bboxes'], params['bboxes1'], params['bboxes2'], params['bboxes3']]
        images_bboxes = list(zip(self.images, self.bboxes))
        random.shuffle(images_bboxes)
        self.images, self.bboxes = zip(*images_bboxes)
        return {}
        
    @property
    def targets_as_params(self) -> List[str]:
        return [
            "image", "image1", "image2", "image3",
            "bboxes", "bboxes1", "bboxes2", "bboxes3"
        ]

I am trying to follow the recommended way of working with multiple images and bboxes. However, I see that apply and apply_to_bbox are called as many times as there are targets. Any ideas on how to circumvent this @i-aki-y ?

@i-aki-y
Contributor Author

i-aki-y commented Feb 27, 2023

@mikel-brostrom

ValueError: y_max is less than or equal to y_min for bbox

I found that some COCO annotations have bboxes with height == 0.0.
This is the cause of the error.

import json
import pathlib
coco_annot_path = pathlib.Path("coco/annotations/instances_train2017.json")
with open(coco_annot_path) as f:
    coco_annots = json.load(f)
for item in coco_annots["annotations"]:
    x, y, w, h = item["bbox"]
    if w == 0 or h == 0:
        print(item)

> {'segmentation': [[296.65, 388.33, 296.65, 388.33, 297.68, 388.33, 297.68, 388.33]], 'area': 0.0, 'iscrowd': 0, 'image_id': 200365, 'bbox': [296.65, 388.33, 1.03, 0.0], 'category_id': 58, 'id': 918}
> {'segmentation': [[9.98, 188.56, 15.52, 188.56, 15.52, 188.56, 11.09, 188.56]], 'area': 0.0, 'iscrowd': 0, 'image_id': 550395, 'bbox': [9.98, 188.56, 5.54, 0.0], 'category_id': 1, 'id': 2206849}
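To see why such a box triggers the message, convert it from COCO [x, y, w, h] to corner coordinates. This is a minimal sketch; coco_to_pascal_voc is a hypothetical helper, and the internal conversion in albumentations may differ in detail (for example, normalization).

```python
def coco_to_pascal_voc(bbox):
    """Convert a COCO [x, y, w, h] box to [x_min, y_min, x_max, y_max]."""
    x, y, w, h = bbox[:4]
    return [x, y, x + w, y + h]


# First degenerate annotation printed above (id 918).
x_min, y_min, x_max, y_max = coco_to_pascal_voc([296.65, 388.33, 1.03, 0.0])
print(y_max <= y_min)
```

With h == 0.0, y_max equals y_min, which is exactly the condition reported in the ValueError.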

@mikel-brostrom

mikel-brostrom commented Feb 27, 2023

But this should be avoided by bbox_params=A.BboxParams(format='coco', min_area=1), right @i-aki-y? I tried this, but it didn't work for me.

@i-aki-y
Contributor Author

i-aki-y commented Feb 27, 2023

@mikel-brostrom No, the filters are applied in post-processing, while the error occurs during pre-processing validation.

I think they have different purposes.
The filters are necessary because some transforms produce bboxes with zero or tiny areas by design.
But invalid data in the input suggests that something went wrong in an earlier step, which should be fixed.
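One way to fix the input is to drop the degenerate annotations before they reach the pipeline. This is a minimal sketch, assuming COCO-style annotation dicts; drop_degenerate_boxes is a hypothetical helper and the second annotation is made up for illustration.

```python
def drop_degenerate_boxes(annotations, min_size=0.0):
    """Keep COCO annotations whose bbox width and height both exceed min_size."""
    kept = []
    for item in annotations:
        _, _, w, h = item["bbox"]
        if w > min_size and h > min_size:
            kept.append(item)
    return kept


annotations = [
    {"id": 918, "bbox": [296.65, 388.33, 1.03, 0.0]},  # zero height, from COCO train2017
    {"id": 1, "bbox": [10.0, 20.0, 30.0, 40.0]},  # made-up valid box
]
clean = drop_degenerate_boxes(annotations)
```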

Labels
enhancement New feature or request WIP

4 participants