# Latest YOLO-related papers from arXiv, fetched daily

## YoloTag: Vision-based Robust UAV Navigation with Fiducial Markers

**Publication date**: 2024-09-03

**Author**: Sourav Raxit

**Abstract**: By harnessing fiducial markers as visual landmarks in the environment,
Unmanned Aerial Vehicles (UAVs) can rapidly build precise maps and navigate
spaces safely and efficiently, unlocking their potential for fluent
collaboration and coexistence with humans. Existing fiducial marker methods
rely on handcrafted feature extraction, which sacrifices accuracy. On the other
hand, deep learning pipelines for marker detection fail to meet the real-time
runtime constraints crucial for navigation applications. In this work, we
propose YoloTag, a real-time fiducial marker-based localization
system. YoloTag uses a lightweight YOLOv8 object detector to accurately detect
fiducial markers in images while meeting the runtime constraints needed for
navigation. The detected markers are then used by an efficient
perspective-n-point algorithm to estimate UAV states. However, this
localization system introduces noise, causing instability in trajectory
tracking. To suppress noise, we design a higher-order Butterworth filter that
effectively eliminates noise through frequency-domain analysis. We evaluate our
algorithm through real-robot experiments in an indoor environment, comparing
the trajectory tracking performance of our method against other approaches in
terms of several distance metrics.

**Code link**: No code link found in the abstract.

**Paper link**: [Read more](http://arxiv.org/abs/2409.02334v1)
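
The abstract does not give the Butterworth filter's order, cutoff frequency, or sampling rate. As a generic illustration only, the sketch below builds a textbook even-order digital Butterworth low-pass filter as a cascade of second-order sections, using the bilinear transform with pre-warping (the standard Audio EQ Cookbook low-pass coefficients). The class names, the 4th order, the 5 Hz cutoff, the 100 Hz sample rate and the toy signal are assumptions, not values from the paper.

```python
import math
from typing import List, Sequence


class LowPassBiquad:
    """One second-order low-pass section, direct form I (hypothetical helper)."""

    def __init__(self, cutoff_hz: float, sample_rate_hz: float, q: float) -> None:
        w0 = 2.0 * math.pi * cutoff_hz / sample_rate_hz
        cos_w0 = math.cos(w0)
        alpha = math.sin(w0) / (2.0 * q)
        a0 = 1.0 + alpha
        # Coefficients are pre-divided by a0 so the recurrence needs no division.
        self.b0 = (1.0 - cos_w0) / 2.0 / a0
        self.b1 = (1.0 - cos_w0) / a0
        self.b2 = self.b0
        self.a1 = -2.0 * cos_w0 / a0
        self.a2 = (1.0 - alpha) / a0
        self.x1 = self.x2 = self.y1 = self.y2 = 0.0

    def step(self, x: float) -> float:
        y = (self.b0 * x + self.b1 * self.x1 + self.b2 * self.x2
             - self.a1 * self.y1 - self.a2 * self.y2)
        self.x2, self.x1 = self.x1, x
        self.y2, self.y1 = self.y1, y
        return y


class ButterworthLowPass:
    """Even-order Butterworth low-pass built from cascaded biquads (hypothetical helper)."""

    def __init__(self, order: int, cutoff_hz: float, sample_rate_hz: float) -> None:
        if order < 2 or order % 2 != 0:
            raise ValueError("order must be an even integer >= 2")
        if not 0.0 < cutoff_hz < sample_rate_hz / 2.0:
            raise ValueError("cutoff must lie between 0 and the Nyquist frequency")
        # Pole pair k of an order-N Butterworth prototype has Q = 1 / (2 cos((2k - 1) pi / (2N))).
        self.sections = [
            LowPassBiquad(cutoff_hz, sample_rate_hz,
                          1.0 / (2.0 * math.cos((2 * k - 1) * math.pi / (2 * order))))
            for k in range(1, order // 2 + 1)
        ]

    def step(self, x: float) -> float:
        # Feed the sample through every second-order section in turn.
        for section in self.sections:
            x = section.step(x)
        return x

    def filter(self, samples: Sequence[float]) -> List[float]:
        return [self.step(s) for s in samples]


# Usage: smooth a hypothetical noisy x-position stream sampled at 100 Hz.
smoother = ButterworthLowPass(order=4, cutoff_hz=5.0, sample_rate_hz=100.0)
noisy_x = [1.0 + 0.2 * (-1) ** n for n in range(300)]  # constant 1.0 plus Nyquist-rate jitter
smoothed_x = smoother.filter(noisy_x)
```

Each `step` call costs only a few multiply-adds per section, which is why recursive filters like this fit real-time control loops. The trade-off is phase lag, which grows as the order rises or the cutoff falls.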

---

## DS MYOLO: A Reliable Object Detector Based on SSMs for Driving Scenarios

**Publication date**: 2024-09-02

**Author**: Yang Li

**Abstract**: Accurate real-time object detection enhances the safety of advanced
driver-assistance systems, making it an essential component in driving
scenarios. With the rapid development of deep learning technology, CNN-based
YOLO real-time object detectors have gained significant attention. However, the
local focus of CNNs results in performance bottlenecks. To further enhance
detector performance, researchers have introduced Transformer-based
self-attention mechanisms to leverage global receptive fields, but their
quadratic complexity incurs substantial computational costs. Recently, Mamba,
with its linear complexity, has made significant progress through global
selective scanning. Inspired by Mamba's outstanding performance, we propose a
novel object detector: DS MYOLO. This detector captures global feature
information through a simplified selective scanning fusion block (SimVSS Block)
and effectively integrates the network's deep features. Additionally, we
introduce an efficient channel attention convolution (ECAConv) that enhances
cross-channel feature interaction while maintaining low computational
complexity. Extensive experiments on the CCTSDB 2021 and VLD-45 driving-scenario
datasets demonstrate that DS MYOLO exhibits significant potential and a
competitive advantage among similarly scaled YOLO-series real-time object
detectors.

**Code link**: No code link found in the abstract.

**Paper link**: [Read more](http://arxiv.org/abs/2409.01093v1)
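
The abstract names an efficient channel attention convolution (ECAConv) but does not define it. The name suggests it builds on the Efficient Channel Attention (ECA) block from ECA-Net, so the sketch below shows that generic block under this assumption. Each channel is global-average-pooled to one scalar, a 1-D convolution runs across neighbouring channels with an adaptively chosen odd kernel size, and a sigmoid gate rescales each channel. In a real network the kernel weights are learned. The fixed weights, the helper names and the nested-list "tensors" here are hypothetical.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # indexed [channel][row][col]


def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Adaptive kernel size |(log2 C + b) / gamma|, bumped to the next odd number."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1


def efficient_channel_attention(x: FeatureMap, weights: List[float]) -> FeatureMap:
    """Reweight the channels of x using a 1-D conv over pooled channel descriptors."""
    num_channels = len(x)
    pad = len(weights) // 2
    # 1) Squeeze: global average pooling turns each channel into one scalar.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    gates = []
    for i in range(num_channels):
        # 2) Local cross-channel interaction: zero-padded 1-D conv, no dimensionality reduction.
        s = sum(w * pooled[i + j - pad]
                for j, w in enumerate(weights)
                if 0 <= i + j - pad < num_channels)
        # 3) Sigmoid turns the response into a per-channel gate in (0, 1).
        gates.append(1.0 / (1.0 + math.exp(-s)))
    # 4) Excite: scale every value in channel i by its gate.
    return [[[v * gates[i] for v in row] for row in ch] for i, ch in enumerate(x)]


# Usage: a toy 2-channel, 1x2 feature map and a size-3 kernel that only looks at the channel itself.
toy = [[[1.0, 3.0]], [[-2.0, 2.0]]]
out = efficient_channel_attention(toy, weights=[0.0, 1.0, 0.0])
```

The only learnable parameters are the k kernel weights, and k grows only logarithmically with the channel count. This is what keeps this style of channel attention cheap compared with attention blocks that use fully connected layers.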

---

## A method for detecting dead fish on large water surfaces based on improved YOLOv10

**Publication date**: 2024-08-31

**Author**: Qingbin Tian

**Abstract**: Dead fish frequently appear on the water surface due to various factors. If
not promptly detected and removed, these dead fish can cause significant issues
such as water quality deterioration, ecosystem damage, and disease
transmission. Consequently, it is imperative to develop rapid and effective
detection methods to mitigate these challenges. Conventional methods for
detecting dead fish are often constrained by manpower and time limitations,
struggling to effectively manage the intricacies of aquatic environments. This
paper proposes an end-to-end detection model built upon an enhanced YOLOv10
framework, designed specifically to swiftly and precisely detect dead fish
across extensive water surfaces. Key enhancements include: (1) replacing
YOLOv10's backbone network with FasterNet to reduce model complexity while
maintaining high detection accuracy; (2) improving feature fusion in the neck
through enhanced connectivity methods and replacing the original C2f
module with CSPStage modules; (3) adding a compact target detection head to
enhance the detection performance of smaller objects. Experimental results
demonstrate significant improvements in precision (P), recall (R), and average
precision (AP) compared to the baseline model YOLOv10n. Furthermore, our model
outperforms other models in the YOLO series by significantly reducing model
size and parameter count, while sustaining high inference speed and achieving
the best AP performance. The model facilitates rapid and accurate detection of
dead fish in large-scale aquaculture systems. Finally, through ablation
experiments, we systematically analyze and assess the contribution of each
model component to the overall system performance.

**Code link**: No code link found in the abstract.

**Paper link**: [Read more](http://arxiv.org/abs/2409.00388v1)
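
The abstract reports gains in precision (P), recall (R) and AP over YOLOv10n. As a reference for what those metrics measure, here is a minimal sketch of one common AP definition: all-point interpolation, as used by PASCAL VOC from 2010 onward. It assumes every detection has already been matched to ground truth at some IoU threshold and flagged as a true or false positive. The function name and input format are hypothetical, and the paper may use a different AP variant, such as COCO-style AP averaged over several IoU thresholds.

```python
from typing import List, Tuple


def precision_recall_ap(
    detections: List[Tuple[float, bool]], num_ground_truth: int
) -> Tuple[float, float, float]:
    """Return (precision, recall, AP) for one class.

    detections: (confidence, is_true_positive) pairs, already IoU-matched.
    P and R are taken at the lowest confidence, i.e. using every detection.
    """
    if num_ground_truth <= 0:
        raise ValueError("need at least one ground-truth object")
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls: List[float] = []
    precisions: List[float] = []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Make precision non-increasing in recall (the interpolation envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # AP = area under the resulting step-shaped precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    precision = tp / (tp + fp) if ranked else 0.0
    return precision, tp / num_ground_truth, ap


# Usage: four hypothetical dead-fish detections against four annotated fish.
p, r, ap = precision_recall_ap([(0.9, True), (0.8, False), (0.7, True), (0.6, True)], 4)
```

With these inputs the function gives P = 0.75, R = 0.75 and AP = 0.625. The false positive at rank 2 lowers precision over part of the curve, and the fourth, undetected fish caps recall at 0.75.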

---

## FA-YOLO: Research On Efficient Feature Selection YOLO Improved Algorithm Based On FMDS and AGMF Modules

**Publication date**: 2024-08-29

@@ -226,109 +331,3 @@ and contrast.

---

## VFM-Det: Towards High-Performance Vehicle Detection via Large Foundation Models

**Publication date**: 2024-08-23

**Author**: Wentao Wu

**Abstract**: Existing vehicle detectors are usually obtained by training a typical
detector (e.g., YOLO, RCNN, DETR series) on vehicle images based on a
pre-trained backbone (e.g., ResNet, ViT). Some researchers also exploit and
enhance the detection performance using pre-trained large foundation models.
However, we think these detectors may only get sub-optimal results because the
large models they use are not specifically designed for vehicles. In addition,
their results rely heavily on visual features, and they seldom consider the
alignment between the vehicle's semantic information and visual
representations. In this work, we propose a new vehicle detection paradigm
based on a pre-trained foundation vehicle model (VehicleMAE) and a large
language model (T5), termed VFM-Det. It follows the region proposal-based
detection framework, and the features of each proposal can be enhanced using
VehicleMAE. More importantly, we propose a new VAtt2Vec module that predicts
the vehicle semantic attributes of these proposals and transforms them into
feature vectors to enhance the vision features via contrastive learning.
Extensive experiments on three vehicle detection benchmark datasets
demonstrate the effectiveness of our vehicle detector. Specifically, our model
improves on the baseline approach by +5.1% and +6.2% on the $AP_{0.5}$ and
$AP_{0.75}$ metrics, respectively, on the Cityscapes dataset. The source code of
this work will be released at https://github.com/Event-AHU/VFM-Det.

**Code link**: https://github.com/Event-AHU/VFM-Det

**Paper link**: [Read more](http://arxiv.org/abs/2408.13031v1)
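
In $AP_{0.5}$ and $AP_{0.75}$, the subscript is the intersection-over-union (IoU) threshold a predicted box must reach against a ground-truth box to count as a true positive, so $AP_{0.75}$ is the stricter localization metric. A minimal sketch of that test for axis-aligned boxes follows. The function names and the `(x_min, y_min, x_max, y_max)` box format are assumptions for illustration.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Overlap extents are clamped at zero so disjoint boxes give no intersection.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def counts_as_true_positive(pred: Box, gt: Box, iou_threshold: float) -> bool:
    """A prediction matches a ground-truth box if their IoU reaches the threshold."""
    return iou(pred, gt) >= iou_threshold


# Usage: a prediction covering 60% of a hypothetical 10x10 ground-truth box.
gt_box = (0.0, 0.0, 10.0, 10.0)
pred_box = (0.0, 0.0, 10.0, 6.0)
```

Here the IoU is 0.6, so the same prediction counts as a hit when computing $AP_{0.5}$ but as a miss when computing $AP_{0.75}$.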

---

## Enhanced Parking Perception by Multi-Task Fisheye Cross-view Transformers

**Publication date**: 2024-08-22

**Author**: Antonyo Musabini

**Abstract**: Current parking area perception algorithms primarily focus on detecting
vacant slots within a limited range, relying on error-prone homographic
projection for both labeling and inference. However, recent advancements in
Advanced Driver Assistance Systems (ADAS) require interaction with end-users
through comprehensive and intelligent Human-Machine Interfaces (HMIs). These
interfaces should present a complete perception of the parking area, from
distinguishing vacant slots' entry lines to the orientation of other parked
vehicles. This paper introduces Multi-Task Fisheye Cross View Transformers
(MT F-CVT), which leverages features from a four-camera fisheye Surround-view
Camera System (SVCS) with multi-head attention to create a detailed Bird's-Eye
View (BEV) grid feature map. Features are processed by both a segmentation
decoder and a Polygon-Yolo-based object detection decoder for parking slots and
vehicles. Trained on data labeled using LiDAR, MT F-CVT positions objects
within 25 m × 25 m real open-road scenes with an average error of only 20 cm.
Our larger model achieves an F-1 score of 0.89. Moreover, the smaller model
operates at 16 fps on an Nvidia Jetson Orin embedded board, with detection
results similar to those of the larger one. MT F-CVT demonstrates robust
generalization capability across different vehicles and camera rig
configurations. A demo video from an unseen vehicle and camera rig is available
at https://streamable.com/jjw54x.

**Code link**: No code link found in the abstract (https://streamable.com/jjw54x is a demo video).

**Paper link**: [Read more](http://arxiv.org/abs/2408.12575v1)
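
The abstract reports an F-1 score of 0.89 without the underlying precision and recall. For reference, $F_1$ is the harmonic mean of precision $P$ and recall $R$. Substituting $P = TP/(TP+FP)$ and $R = TP/(TP+FN)$ gives the count form on the right:

$$
F_1 = \frac{2PR}{P + R} = \frac{2\,TP}{2\,TP + FP + FN}
$$

For example, hypothetical counts of $TP = 89$, $FP = 11$ and $FN = 11$ give $P = R = 0.89$ and $F_1 = 178/200 = 0.89$. These numbers are illustrative only and are not taken from the paper.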

---

## OVA-DETR: Open Vocabulary Aerial Object Detection Using Image-Text Alignment and Fusion

**Publication date**: 2024-08-22

**Author**: Guoting Wei

**Abstract**: Aerial object detection has been a hot topic for many years due to its wide
range of applications. However, most existing approaches can only handle
predefined categories, which limits their applicability to open real-world
scenarios. In this paper, we extend aerial object detection to open
scenarios by exploiting the relationship between image and text, and propose
OVA-DETR, a high-efficiency open-vocabulary detector for aerial images.
Specifically, based on the idea of image-text alignment, we propose a region-text
contrastive loss to replace the category regression loss in the traditional
detection framework, which breaks the category limitation. Then, we propose
Bidirectional Vision-Language Fusion (Bi-VLF), which includes a dual-attention
fusion encoder and a multi-level text-guided Fusion Decoder. The dual-attention
fusion encoder enhances the feature extraction process in the encoder part. The
multi-level text-guided Fusion Decoder is designed to improve the detection
ability for small objects, which frequently appear in aerial object detection
scenarios. Experimental results on three widely used benchmark datasets show
that our proposed method significantly improves the mAP and recall, while
enjoying faster inference speed. For instance, in zero-shot detection
experiments on DIOR, the proposed OVA-DETR outperforms DescReg and YOLO-World
by 37.4% and 33.1%, respectively, while achieving 87 FPS inference speed, which
is 7.9x faster than DescReg and 3x faster than YOLO-World. The code is
available at https://github.com/GT-Wei/OVA-DETR.

**Code link**: https://github.com/GT-Wei/OVA-DETR

**Paper link**: [Read more](http://arxiv.org/abs/2408.12246v1)
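
The abstract replaces the category regression loss with a region-text contrastive loss but gives no formula. One common way to write such a loss is a softmax cross-entropy over cosine similarities between each region embedding and the text embeddings of all candidate class names, scaled by a temperature. This form is assumed here, not taken from the paper. Because classes are just text embeddings, adding a category only needs a new text prompt. The function names, the 0.07 temperature and the toy 2-D embeddings are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def region_text_contrastive_loss(
    regions: List[Vector], texts: List[Vector], targets: List[int], temperature: float = 0.07
) -> float:
    """Mean cross-entropy of each region against all text embeddings.

    regions[i] should be most similar to texts[targets[i]].
    """
    text_unit = [l2_normalize(t) for t in texts]
    total = 0.0
    for region, target in zip(regions, targets):
        r = l2_normalize(region)
        # Cosine similarity to every class-name embedding, sharpened by the temperature.
        logits = [sum(a * b for a, b in zip(r, t)) / temperature for t in text_unit]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[target]
    return total / len(regions)


# Usage: two toy class-text embeddings (e.g. encodings of "ship" and "vehicle", hypothetical).
texts = [[1.0, 0.0], [0.0, 1.0]]
regions = [[0.9, 0.1], [0.2, 0.8]]
matched = region_text_contrastive_loss(regions, texts, targets=[0, 1])
swapped = region_text_contrastive_loss(regions, texts, targets=[1, 0])
```

Here `matched` is close to zero while `swapped` is large. Training lowers the loss by pulling each region embedding toward its own class text and away from the others.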

---
