diff --git a/README.md b/README.md index fb48d4d..8cb958b 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,178 @@ # Daily retrieval of the latest YOLO-related papers from arXiv +## ASMA: An Adaptive Safety Margin Algorithm for Vision\-Language Drone Navigation via Scene\-Aware Control Barrier Functions + +**Publication date**: 2024-09-16 + +**Author**: Sourav Sanyal + +**Abstract**: In the rapidly evolving field of vision\-language navigation \(VLN\), ensuring +robust safety mechanisms remains an open challenge. Control barrier functions +\(CBFs\) are efficient tools which guarantee safety by solving an optimal control +problem. In this work, we consider the case of a teleoperated drone in a VLN +setting, and add safety features by formulating a novel scene\-aware CBF using +ego\-centric observations obtained through an RGB\-D sensor. As a baseline, we +implement a vision\-language understanding module which uses the contrastive +language image pretraining \(CLIP\) model to query about a user\-specified \(in +natural language\) landmark. Using the YOLO \(You Only Look Once\) object +detector, the CLIP model is queried for verifying the cropped landmark, +triggering downstream navigation. To improve navigation safety of the baseline, +we propose ASMA \-\- an Adaptive Safety Margin Algorithm \-\- that crops the +drone's depth map for tracking moving object\(s\) to perform scene\-aware CBF +evaluation on\-the\-fly. By identifying potential risky observations from the +scene, ASMA enables real\-time adaptation to unpredictable environmental +conditions, ensuring optimal safety bounds on a VLN\-powered drone's actions. +Using the robot operating system \(ROS\) middleware on a Parrot Bebop2 quadrotor +in the Gazebo environment, ASMA offers a 59.4% \- 61.8% increase in success rates +with insignificant 5.4% \- 8.2% increases in trajectory lengths compared to the +baseline CBF\-less VLN while recovering from unsafe situations. 
+ + +**Code link**: No code link found in the abstract. + +**Paper link**: [Read more](http://arxiv.org/abs/2409.10283v1) + +--- + + +## Self\-Updating Vehicle Monitoring Framework Employing Distributed Acoustic Sensing towards Real\-World Settings + +**Publication date**: 2024-09-16 + +**Author**: Xi Wang + +**Abstract**: The recent emergence of Distributed Acoustic Sensing \(DAS\) technology has +facilitated the effective capture of traffic\-induced seismic data. The +traffic\-induced seismic wave is a prominent contributor to urban vibrations and +contains crucial information to advance urban exploration and governance. +However, identifying vehicular movements within massive noisy data poses a +significant challenge. In this study, we introduce a real\-time semi\-supervised +vehicle monitoring framework tailored to urban settings. It requires only a +small fraction of manual labels for initial training and exploits unlabeled +data for model improvement. Additionally, the framework can autonomously adapt +to newly collected unlabeled data. Before DAS data undergo object detection as +two\-dimensional images to preserve spatial information, we leveraged +comprehensive one\-dimensional signal preprocessing to mitigate noise. +Furthermore, we propose a novel prior loss that incorporates the shapes of +vehicular traces to track a single vehicle with varying speeds. To evaluate our +model, we conducted experiments with seismic data from the Stanford 2 DAS +Array. The results showed that our model outperformed the baseline model +Efficient Teacher and its supervised counterpart, YOLO \(You Only Look Once\), in +both accuracy and robustness. With only 35 labeled images, our model surpassed +YOLO's mAP 0.5:0.95 criterion by 18% and showed a 7% increase over Efficient +Teacher. We conducted comparative experiments with multiple update strategies +for self\-updating and identified an optimal approach. This approach surpasses +the performance of non\-overfitting training conducted with all data in a single +pass. 
+ + +**Code link**: No code link found in the abstract. + +**Paper link**: [Read more](http://arxiv.org/abs/2409.10259v1) + +--- + + +## Tracking Virtual Meetings in the Wild: Re\-identification in Multi\-Participant Virtual Meetings + +**Publication date**: 2024-09-15 + +**Author**: Oriel Perl + +**Abstract**: In recent years, workplaces and educational institutes have widely adopted +virtual meeting platforms. This has led to a growing interest in analyzing and +extracting insights from these meetings, which requires effective detection and +tracking of unique individuals. In practice, there is no standardization in +video meetings recording layout, and how they are captured across the different +platforms and services. This, in turn, creates a challenge in acquiring this +data stream and analyzing it in a uniform fashion. Our approach provides a +solution to the most general form of video recording, usually consisting of a +grid of participants from a single video source with +no metadata on participant locations, while using the least amount of +constraints and assumptions as to how the data was acquired. Conventional +approaches often use YOLO models coupled with tracking algorithms, assuming +linear motion trajectories akin to that observed in CCTV footage. However, such +assumptions fall short in virtual meetings, where participant video feed windows +can abruptly change location across the grid. In an organic video meeting +setting, participants frequently join and leave, leading to sudden, non\-linear +movements on the video grid. This disrupts optical flow\-based tracking methods +that depend on linear motion. Consequently, standard object detection and +tracking methods might mistakenly assign multiple participants to the same +tracker. In this paper, we introduce a novel approach to track and re\-identify +participants in remote video meetings, by utilizing the spatio\-temporal priors +arising from the data in our domain. 
This, in turn, increases tracking +capabilities compared to the use of general object tracking. Our approach +reduces the error rate by 95% on average compared to YOLO\-based tracking +methods as a baseline. + + +**Code link**: No code link found in the abstract. + +**Paper link**: [Read more](http://arxiv.org/abs/2409.09841v1) + +--- + + +## Stutter\-Solver: End\-to\-end Multi\-lingual Dysfluency Detection + +**Publication date**: 2024-09-15 + +**Author**: Xuanru Zhou + +**Abstract**: Current de\-facto dysfluency modeling methods utilize template matching +algorithms which are not generalizable to out\-of\-domain real\-world dysfluencies +across languages, and are not scalable with increasing amounts of training +data. To handle these problems, we propose Stutter\-Solver: an end\-to\-end +framework that detects dysfluency with accurate type and time transcription, +inspired by the YOLO object detection algorithm. Stutter\-Solver can handle +co\-dysfluencies and is a natural multi\-lingual dysfluency detector. To leverage +scalability and boost performance, we also introduce three novel dysfluency +corpora: VCTK\-Pro, VCTK\-Art, and AISHELL3\-Pro, simulating natural spoken +dysfluencies including repetition, block, missing, replacement, and +prolongation through articulatory\-encodec and TTS\-based methods. Our approach +achieves state\-of\-the\-art performance on all available dysfluency corpora. Code +and datasets are open\-sourced at https://github.com/eureka235/Stutter\-Solver + + +**Code link**: https://github.com/eureka235/Stutter-Solver + +**Paper link**: [Read more](http://arxiv.org/abs/2409.09621v1) + +--- + + +## Self\-Prompting Polyp Segmentation in Colonoscopy using Hybrid Yolo\-SAM 2 Model + +**Publication date**: 2024-09-14 + +**Author**: Mobina Mansoori + +**Abstract**: Early diagnosis and treatment of polyps during colonoscopy are essential for +reducing the incidence and mortality of Colorectal Cancer \(CRC\). 
However, the +variability in polyp characteristics and the presence of artifacts in +colonoscopy images and videos pose significant challenges for accurate and +efficient polyp detection and segmentation. This paper presents a novel +approach to polyp segmentation by integrating the Segment Anything Model \(SAM +2\) with the YOLOv8 model. Our method leverages YOLOv8's bounding box +predictions to autonomously generate input prompts for SAM 2, thereby reducing +the need for manual annotations. We conducted exhaustive tests on five +benchmark colonoscopy image datasets and two colonoscopy video datasets, +demonstrating that our method exceeds state\-of\-the\-art models in both image and +video segmentation tasks. Notably, our approach achieves high segmentation +accuracy using only bounding box annotations, significantly reducing annotation +time and effort. This advancement holds promise for enhancing the efficiency +and scalability of polyp detection in clinical settings +https://github.com/sajjad\-sh33/YOLO\_SAM2. + + +**Code link**: https://github.com/sajjad-sh33/YOLO_SAM2. + +**Paper link**: [Read more](http://arxiv.org/abs/2409.09484v1) + +--- + + ## Breaking reCAPTCHAv2 **Publication date**: 2024-09-13 @@ -63,17 +235,17 @@ creation of guitar tabs from video recordings. **Abstract**: Open\-vocabulary detection \(OVD\) aims to detect objects beyond a predefined set of categories. As a pioneering model incorporating the YOLO series into -OVD, YOLO\-World is well\-suited for scenarios prioritizing speed and -efficiency.However, its performance is hindered by its neck feature fusion -mechanism, which causes the quadratic complexity and the limited guided -receptive fields.To address these limitations, we present Mamba\-YOLO\-World, a -novel YOLO\-based OVD model employing the proposed MambaFusion Path Aggregation -Network \(MambaFusion\-PAN\) as its neck architecture. 
Specifically, we introduce -an innovative State Space Model\-based feature fusion mechanism consisting of a +OVD, YOLO\-World is well\-suited for scenarios prioritizing speed and efficiency. +However, its performance is hindered by its neck feature fusion mechanism, +which causes the quadratic complexity and the limited guided receptive fields. +To address these limitations, we present Mamba\-YOLO\-World, a novel YOLO\-based +OVD model employing the proposed MambaFusion Path Aggregation Network +\(MambaFusion\-PAN\) as its neck architecture. Specifically, we introduce an +innovative State Space Model\-based feature fusion mechanism consisting of a Parallel\-Guided Selective Scan algorithm and a Serial\-Guided Selective Scan algorithm with linear complexity and globally guided receptive fields. It leverages multi\-modal input sequences and mamba hidden states to guide the -selective scanning process.Experiments demonstrate that our model outperforms +selective scanning process. Experiments demonstrate that our model outperforms the original YOLO\-World on the COCO and LVIS benchmarks in both zero\-shot and fine\-tuning settings while maintaining comparable parameters and FLOPs. Additionally, it surpasses existing state\-of\-the\-art OVD methods with fewer @@ -82,7 +254,7 @@ parameters and FLOPs. **Code link**: No code link found in the abstract. -**Paper link**: [Read more](http://arxiv.org/abs/2409.08513v1) +**Paper link**: [Read more](http://arxiv.org/abs/2409.08513v2) --- @@ -150,189 +322,3 @@ advanced sensor fusion for improved navigation and collision avoidance. --- - -## A Semantic Segmentation Approach on Sweet Orange Leaf Diseases Detection Utilizing YOLO - -**Publication date**: 2024-09-10 - -**Author**: Sabit Ahamed Preanto - -**Abstract**: This research introduces an advanced method for diagnosing diseases in sweet -orange leaves by utilising advanced artificial intelligence models like YOLOv8 -. 
Due to their significance as a vital agricultural product, sweet oranges -encounter significant threats from a variety of diseases that harmfully affect -both their yield and quality. Conventional methods for disease detection -primarily depend on manual inspection which is ineffective and frequently leads -to errors, resulting in delayed treatment and increased financial losses. In -response to this challenge, the research utilized YOLOv8, harnessing their -proficiencies in detecting objects and analyzing images. YOLOv8 is recognized -for its rapid and precise performance, while VIT is acknowledged for its -detailed feature extraction abilities. Impressively, during both the training -and validation stages, YOLOv8 exhibited an accuracy of 80.4%, while VIT -achieved an accuracy of 99.12%, showcasing their potential to transform disease -detection in agriculture. The study comprehensively examined the practical -challenges related to the implementation of AI technologies in agriculture, -encompassing the computational demands and user accessibility, and offering -viable solutions for broader usage. Moreover, it underscores the environmental -considerations, particularly the potential for reduced pesticide usage, thereby -promoting sustainable farming and environmental conservation. These findings -provide encouraging insights into the application of AI in agriculture, -suggesting a transition towards more effective, sustainable, and -technologically advanced farming methods. This research not only highlights the -efficacy of YOLOv8 within a specific agricultural domain but also lays the -foundation for further studies that encompass a broader application in crop -management and sustainable agricultural practices. 
- - -**Code link**: No code link found in the abstract. - -**Paper link**: [Read more](http://arxiv.org/abs/2409.06671v1) - ---- - - -## An Attribute\-Enriched Dataset and Auto\-Annotated Pipeline for Open Detection - -**Publication date**: 2024-09-10 - -**Author**: Pengfei Qi - -**Abstract**: Detecting objects of interest through language often presents challenges, -particularly with objects that are uncommon or complex to describe, due to -perceptual discrepancies between automated models and human annotators. These -challenges highlight the need for comprehensive datasets that go beyond -standard object labels by incorporating detailed attribute descriptions. To -address this need, we introduce the Objects365\-Attr dataset, an extension of -the existing Objects365 dataset, distinguished by its attribute annotations. -This dataset reduces inconsistencies in object detection by integrating a broad -spectrum of attributes, including color, material, state, texture and tone. It -contains an extensive collection of 5.6M object\-level attribute descriptions, -meticulously annotated across 1.4M bounding boxes. Additionally, to validate -the dataset's effectiveness, we conduct a rigorous evaluation of YOLO\-World at -different scales, measuring their detection performance and demonstrating the -dataset's contribution to advancing object detection. - - -**Code link**: No code link found in the abstract. - -**Paper link**: [Read more](http://arxiv.org/abs/2409.06300v1) - ---- - - -## ALSS\-YOLO: An Adaptive Lightweight Channel Split and Shuffling Network for TIR Wildlife Detection in UAV Imagery - -**Publication date**: 2024-09-10 - -**Author**: Ang He - -**Abstract**: Unmanned aerial vehicles \(UAVs\) equipped with thermal infrared \(TIR\) cameras -play a crucial role in combating nocturnal wildlife poaching. However, TIR -images often face challenges such as jitter and wildlife overlap, -necessitating UAVs to possess the capability to identify blurred and -overlapping small targets. Current traditional lightweight networks deployed on -UAVs struggle to extract features from blurry small targets. 
To address this -issue, we developed ALSS\-YOLO, an efficient and lightweight detector optimized -for TIR aerial images. Firstly, we propose a novel Adaptive Lightweight Channel -Split and Shuffling \(ALSS\) module. This module employs an adaptive channel -split strategy to optimize feature extraction and integrates a channel -shuffling mechanism to enhance information exchange between channels. This -improves the extraction of blurry features, crucial for handling jitter\-induced -blur and overlapping targets. Secondly, we developed a Lightweight Coordinate -Attention \(LCA\) module that employs adaptive pooling and grouped convolution to -integrate feature information across dimensions. This module ensures -lightweight operation while maintaining high detection precision and robustness -against jitter and target overlap. Additionally, we developed a single\-channel -focus module to aggregate the width and height information of each channel into -four\-dimensional channel fusion, which improves the feature representation -efficiency of infrared images. Finally, we modify the localization loss -function to emphasize the loss value associated with small objects to improve -localization accuracy. Extensive experiments on the BIRDSAI and ISOD TIR UAV -wildlife datasets show that ALSS\-YOLO achieves state\-of\-the\-art performance. -Our code is openly available at -https://github.com/helloworlder8/computer\_vision. - - -**Code link**: https://github.com/helloworlder8/computer_vision. - -**Paper link**: [Read more](http://arxiv.org/abs/2409.06259v2) - ---- - - -## BFA\-YOLO: Balanced multiscale object detection network for multi\-view building facade attachments detection - -**Publication date**: 2024-09-06 - -**Author**: Yangguang Chen - -**Abstract**: Detection of building facade attachments such as doors, windows, balconies, -air conditioner units, billboards, and glass curtain walls plays a pivotal role -in numerous applications. 
Building facade attachments detection aids in -building information modeling \(BIM\) construction and meeting Level of Detail 3 -\(LOD3\) standards. Yet, it faces challenges like uneven object distribution, -small object detection difficulty, and background interference. To counter -these, we propose BFA\-YOLO, a model for detecting facade attachments in -multi\-view images. BFA\-YOLO incorporates three novel innovations: the Feature -Balanced Spindle Module \(FBSM\) for addressing uneven distribution, the Target -Dynamic Alignment Task Detection Head \(TDATH\) aimed at improving small object -detection, and the Position Memory Enhanced Self\-Attention Mechanism \(PMESA\) to -combat background interference, with each component specifically designed to -solve its corresponding challenge. Detection efficacy of deep network models -deeply depends on the dataset's characteristics. Existing open source datasets -related to building facades are limited by their single perspective, small -image pool, and incomplete category coverage. We propose a novel method for -building facade attachments detection dataset construction and construct the -BFA\-3D dataset for facade attachments detection. The BFA\-3D dataset features -multi\-view, accurate labels, diverse categories, and detailed classification. -BFA\-YOLO surpasses YOLOv8 by 1.8% and 2.9% in mAP@0.5 on the multi\-view BFA\-3D -and street\-view Facade\-WHU datasets, respectively. These results underscore -BFA\-YOLO's superior performance in detecting facade attachments. - - -**Code link**: No code link found in the abstract. - -**Paper link**: [Read more](http://arxiv.org/abs/2409.04025v1) - ---- - - -## YOLO\-CL cluster detection in the Rubin/LSST DC2 simulation - -**Publication date**: 2024-09-05 - -**Author**: Kirill Grishin - -**Abstract**: LSST will provide galaxy cluster catalogs up to z$\\sim$1 that can be used to -constrain cosmological models once their selection function is well\-understood. 
-We have applied the deep convolutional network YOLO for CLuster detection -\(YOLO\-CL\) to LSST simulations from the Dark Energy Science Collaboration Data -Challenge 2 \(DC2\), and characterized the LSST YOLO\-CL cluster selection -function. We have trained and validated the network on images from a hybrid -sample of \(1\) clusters observed in the Sloan Digital Sky Survey and detected -with the red\-sequence Matched\-filter Probabilistic Percolation, and \(2\) -simulated DC2 dark matter haloes with masses $M\_\{200c\} > 10^\{14\} M\_\{\\odot\}$. We -quantify the completeness and purity of the YOLO\-CL cluster catalog with -respect to DC2 haloes with $M\_\{200c\} > 10^\{14\} M\_\{\\odot\}$. The YOLO\-CL cluster -catalog is 100% and 94% complete for halo mass $M\_\{200c\} > 10^\{14.6\} M\_\{\\odot\}$ -at $0.2 10^\{14\} M\_\{\\odot\}$ and redshift $z \\lesssim 1$, -respectively, with only 6% false positive detections. All the false positive -detections are dark matter haloes with $ 10^\{13.4\} M\_\{\\odot\} \\lesssim M\_\{200c\} -\\lesssim 10^\{14\} M\_\{\\odot\}$. The YOLO\-CL selection function is almost flat with -respect to the halo mass at $0.2 \\lesssim z \\lesssim 0.9$. The overall -performance of YOLO\-CL is comparable to or better than other cluster detection -methods used for current and future optical and infrared surveys. YOLO\-CL shows -better completeness for low mass clusters when compared to current detections -in surveys using the Sunyaev Zel'dovich effect, and detects clusters at higher -redshifts than X\-ray\-based catalogs. The strong advantage of YOLO\-CL over -traditional galaxy cluster detection techniques is that it works directly on -images and does not require photometric and photometric redshift catalogs, nor -does it need to mask stellar sources and artifacts. - - -**Code link**: No code link found in the abstract. - -**Paper link**: [Read more](http://arxiv.org/abs/2409.03333v1) - ---- -