
Loss of "iscrowd" annotations when converting COCO dataset to YOLOv8 dataset #12671

Closed
astringfield opened this issue May 14, 2024 · 10 comments
Labels
question Further information is requested

Comments

@astringfield

Question

Hi,

I'm trying to convert a custom COCO-formatted dataset to YOLOv8 format so that I can train YOLOv8 using my custom data.

I've run the Ultralytics data conversion tool to convert from COCO to YOLOv8, but the output annotations don't contain any of the COCO annotations where iscrowd = True.
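For reference, I'm invoking the converter roughly like this (the paths are placeholders for my local setup):

from ultralytics.data.converter import convert_coco

# Convert COCO JSON annotations into YOLO-format label files,
# keeping segmentation polygons rather than just boxes
convert_coco(
    labels_dir="path/to/coco/annotations",  # folder containing instances_*.json
    use_segments=True,
)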

I checked out the ultralytics/data/converter.py code and indeed it does appear to skip any iscrowd annotations:

# @ line 287
for ann in anns:
    if ann.get("iscrowd", False):
        continue
    # The COCO box format is [top left x, top left y, width, height]
    box = np.array(ann["bbox"], dtype=np.float64)
    box[:2] += box[2:] / 2  # xy top-left corner to center
    ...
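To confirm this is what's dropping my annotations, I counted the crowd annotations in my JSON with a quick check (a minimal sketch; instances.json is a placeholder for my annotation file):

import json

with open("instances.json") as f:
    coco = json.load(f)

# Count how many annotations the converter will skip
crowd = sum(1 for ann in coco["annotations"] if ann.get("iscrowd", 0))
print(f"{crowd} of {len(coco['annotations'])} annotations have iscrowd set")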

I've searched the YOLOv8 docs regarding the COCO dataset and haven't found any information on the handling of iscrowd pixel-mask annotations.

Is there a way to include the pixel segmentation masks when converting to a YOLOv8 dataset?

Thanks!

Additional

No response

astringfield added the question label May 14, 2024

👋 Hello @astringfield, thank you for your interest in Ultralytics YOLOv8 🚀! We recommend a visit to the Docs for new users where you can find many Python and CLI usage examples and where many of the most common questions may already be answered.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Join the vibrant Ultralytics Discord 🎧 community for real-time conversations and collaborations. This platform offers a perfect space to inquire, showcase your work, and connect with fellow Ultralytics users.

Install

Pip install the ultralytics package including all requirements in a Python>=3.8 environment with PyTorch>=1.8.

pip install ultralytics
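Once installed, a quick sanity check (a minimal sketch; the pretrained weights download automatically on first use):

from ultralytics import YOLO

# Load a small pretrained model and run inference on a sample image
model = YOLO("yolov8n.pt")
results = model("https://ultralytics.com/images/bus.jpg")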

Environments

YOLOv8 may be run in any of the up-to-date verified environments listed in our docs (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled).

Status

If the Ultralytics CI badge is green, all Ultralytics CI tests are currently passing. CI tests verify correct operation of all YOLOv8 Modes and Tasks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

@glenn-jocher
Member

Hi there!

The current implementation of ultralytics/data/converter.py does skip iscrowd annotations on purpose: YOLO models typically don't use them, since crowd regions label groups of objects rather than individual instances. If you want to include pixel segmentation masks for crowd instances in training, you'll need an alternative approach.

For instance, you could manually preprocess the data and separate the iscrowd instances into a different dataset, or mark them distinctively and train a segmentation model on those specific instances. Alternatively, you could use the Segment Anything Model (SAM) if your focus is on the segmentation side of the models Ultralytics provides.
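As a rough illustration of the first option, here is a minimal sketch for splitting a COCO JSON into crowd and non-crowd annotation sets (the file names are placeholders):

import json

with open("instances.json") as f:
    coco = json.load(f)

# Partition annotations on the iscrowd flag
crowd = [a for a in coco["annotations"] if a.get("iscrowd", 0)]
individual = [a for a in coco["annotations"] if not a.get("iscrowd", 0)]

# Write each subset out with the original images and categories intact
for name, anns in [("instances_crowd.json", crowd), ("instances_individual.json", individual)]:
    with open(name, "w") as f:
        json.dump(dict(coco, annotations=anns), f)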

Let me know if you need further guidance! Happy coding! 😊🚀

@astringfield
Author

@glenn-jocher thank you! I appreciate your advice. I have a follow-up question if you wouldn't mind, please.

A few more details about my application:

  • Trying to train a model on a custom dataset for instance segmentation
  • In the dataset I'm trying to annotate things like people, vehicles, trees, etc.

Below is a sample image from my dataset. It's feasible to annotate some individual tree instances; however, some of the tree objects in the image are tightly clustered with no clear border and are difficult to annotate individually. I've outlined an example in orange in the image below.

[Screenshot from 2024-05-15: sample dataset image with a tightly clustered group of trees outlined in orange]

This is where I planned to utilise pixel segmentation masks to make annotating these grouped objects more efficient.

If I separate the grouped tree instances and individual tree instances into different datasets as you suggested, I'm concerned I might "confuse" model training by having images where not all tree objects are labelled, and end up with poor performance.

Do you have any recommendations on how these groups of trees could be annotated in a way that's compatible with the YOLOv8 format for instance segmentation?

I'd be grateful for any suggestions you have 🙂

@glenn-jocher
Member

Hi there! 😊

Thank you for sharing more about your project. When dealing with clustered objects like trees where the boundaries aren't clear, one approach could be to annotate the entire cluster as a single object. This can be treated as a special class, say tree_cluster, alongside individual tree annotations. This way, you can teach your model to recognize both individual trees and tree clusters.

For instance segmentation with YOLOv8, you can continue using pixel segmentation masks which are quite effective for such scenarios. Here’s a snippet on how you might define these classes in your dataset annotations:

names:
  - tree  # individual trees
  - tree_cluster  # groups of trees
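
In a full YOLOv8 dataset YAML these names sit alongside the split paths; a minimal sketch with placeholder paths:

path: datasets/trees  # dataset root
train: images/train  # training images, relative to path
val: images/val  # validation images
names:
  - tree  # individual trees
  - tree_cluster  # groups of trees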

This approach should help maintain consistency in your annotations without confusing the model during training. Let me know if this helps or if you need more information! 🌳

@astringfield
Author

astringfield commented May 16, 2024

@glenn-jocher thanks! The clustering idea is a fantastic suggestion and it'll apply to a few other classes in my application - I'll try it out.

I do have a couple of follow-up questions, please.

To first clarify the terminology I use:

  • Polygon segmentation mask - mask defined using polygon coordinates
  • Pixel segmentation mask - mask defined by "drawing" over pixels in an image, as is done often in semantic segmentation annotation

Q1.
You mentioned I could use pixel segmentation masks when working with YOLOv8 - is that still the case given the terminology above?

Q2.
Since the COCO format specifies instances as:

  • polygon segmentations for iscrowd: False annotations; and
  • pixel segmentations for iscrowd: True annotations, which are skipped by the YOLO converter,

how would I use pixel segmentation masks for objects in this dataset? Should I ensure all annotations are defined as polygons to make them compatible with the COCO->YOLO converter?

Q3.
Concretely, does YOLOv8 handle pixel segmentation masks (as I've defined them above), or only polygon segmentation masks?

Q4.
Regarding the YOLOv8 model pre-trained on the COCO dataset, I presume none of the "crowd" annotations were used in the training. Were COCO images discarded from training completely if they contained crowds, or were the individual instances from those images still used?

That's a lot of questions I know, I hope you don't mind!

Thanks again 🙂

@glenn-jocher
Member

@astringfield hi there! 😄

Great to hear that the clustering idea resonates with your project! Regarding your questions:

Q1 & Q3: YOLOv8 does not directly consume pixel segmentation masks. Detection models train on bounding boxes, and the YOLOv8 segmentation models train on polygon outlines stored as normalized point coordinates in the label files. Pixel-wise masks (as you've defined them) therefore need to be converted to polygons before training.
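For reference, each line of a YOLOv8 segmentation label file encodes one instance as <class-index> <x1> <y1> <x2> <y2> ... <xn> <yn>, with all coordinates normalized to [0, 1]. The values below are purely illustrative:

0 0.681 0.485 0.670 0.487 0.676 0.498 0.665 0.480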

Q2: For compatibility with the COCO->YOLO converter, you should indeed ensure all annotations are defined as polygons. This also aligns with YOLO's preference for coordinate-based annotations (boxes and polygons) over pixel-wise masks.
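If your annotation tool exports pixel masks, one way to turn them into polygons before running the converter is to decode the mask and trace its contours. This is a minimal sketch, not part of the official converter; it assumes pycocotools and OpenCV are installed, and that ann is a COCO annotation dict with an RLE segmentation:

import cv2
import numpy as np
from pycocotools import mask as mask_utils

def rle_to_polygons(ann, min_area=10.0):
    # Decode the COCO segmentation (compressed or uncompressed RLE) to a binary mask
    seg = ann["segmentation"]
    if isinstance(seg["counts"], list):  # uncompressed RLE, common for iscrowd=1
        seg = mask_utils.frPyObjects(seg, *seg["size"])
    binary = np.ascontiguousarray(mask_utils.decode(seg))  # HxW uint8 array

    # Trace the mask outlines; each external contour becomes one polygon
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    polygons = []
    for contour in contours:
        if cv2.contourArea(contour) < min_area:
            continue  # skip tiny speckle regions
        polygons.append(contour.reshape(-1).tolist())  # flat [x1, y1, x2, y2, ...]
    return polygons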

Q4: In the training of YOLOv8 models on the COCO dataset, iscrowd: True annotations are typically skipped. The images themselves are not discarded; only the crowd annotations are ignored, and individual objects within those images are still used if annotated properly.

I hope this helps clarify your concerns! If you have any more questions or need further assistance, feel free to ask. Happy training! 🌲☀️

@astringfield
Author

@glenn-jocher thank you for the detailed insights, the information you've provided is very helpful.

Regarding Q4 - with the iscrowd: True annotations skipped during training, I presume that some training/validation images were fed to the model with instances of particular classes (say, people) left unlabelled.

Were there any issues during model training/validation/testing due to some images containing unlabelled instances? For example, did the model ever get "confused" (or perform poorly in some other way) because some of the images contained unlabelled person, bench, car, etc. instances?

@glenn-jocher
Member

Hi there! 😊

I'm glad you found the information helpful! Regarding your question about the impact of skipping iscrowd: True annotations during training, it's a great point to consider.

In practice, skipping these annotations does mean that some objects, like people in crowded scenes, aren't labeled. However, YOLO models are generally robust to such scenarios. The training process is designed to handle a variety of data inconsistencies and still perform well. While the model might occasionally miss similar objects in dense areas, this is typically mitigated by the diversity and volume of the training data. The model learns to generalize well from the majority of the labeled data it sees.

If performance issues are noticed in specific scenarios, additional fine-tuning with more representative data of those scenarios can help improve the model's accuracy.

Let me know if you need more details or further assistance! Happy modeling! 🚀

@astringfield
Author

@glenn-jocher That makes sense - it's great to know that crowd data can be excluded for efficiency without losing model performance, given suitable mitigation.

I really appreciate your time answering my questions - thanks again 😃

@glenn-jocher
Member

@astringfield You're very welcome! I'm glad I could help clarify your queries. If you have any more questions down the line or need further assistance, feel free to reach out. Happy training! 😊🚀
