Commit cb7b316

Update README.md with latest arXiv papers
1 parent baf5e1d commit cb7b316

1 file changed (+105, -106 lines)

README.md

Lines changed: 105 additions & 106 deletions
@@ -1,6 +1,111 @@
 # Daily latest YOLO-related papers from arXiv
 
 
+## YoloTag: Vision-based Robust UAV Navigation with Fiducial Markers
+
+**Published**: 2024-09-03
+
+**Author**: Sourav Raxit
+
+**Abstract**: By harnessing fiducial markers as visual landmarks in the environment,
+Unmanned Aerial Vehicles (UAVs) can rapidly build precise maps and navigate
+spaces safely and efficiently, unlocking their potential for fluent
+collaboration and coexistence with humans. Existing fiducial marker methods
+rely on handcrafted feature extraction, which sacrifices accuracy. On the other
+hand, deep learning pipelines for marker detection fail to meet real-time
+runtime constraints crucial for navigation applications. In this work, we
+propose YoloTag, a real-time fiducial marker-based localization
+system. YoloTag uses a lightweight YOLO v8 object detector to accurately detect
+fiducial markers in images while meeting the runtime constraints needed for
+navigation. The detected markers are then used by an efficient
+perspective-n-point algorithm to estimate UAV states. However, this
+localization system introduces noise, causing instability in trajectory
+tracking. To suppress noise, we design a higher-order Butterworth filter that
+effectively eliminates noise through frequency domain analysis. We evaluate our
+algorithm through real-robot experiments in an indoor environment, comparing
+the trajectory tracking performance of our method against other approaches in
+terms of several distance metrics.
+
+
+**Code link**: No code link found in the abstract.
+
+**Paper link**: [Read more](http://arxiv.org/abs/2409.02334v1)
+
+---
+
+
+## DS MYOLO: A Reliable Object Detector Based on SSMs for Driving Scenarios
+
+**Published**: 2024-09-02
+
+**Author**: Yang Li
+
+**Abstract**: Accurate real-time object detection enhances the safety of advanced
+driver-assistance systems, making it an essential component in driving
+scenarios. With the rapid development of deep learning technology, CNN-based
+YOLO real-time object detectors have gained significant attention. However, the
+local focus of CNNs results in performance bottlenecks. To further enhance
+detector performance, researchers have introduced Transformer-based
+self-attention mechanisms to leverage global receptive fields, but their
+quadratic complexity incurs substantial computational costs. Recently, Mamba,
+with its linear complexity, has made significant progress through global
+selective scanning. Inspired by Mamba's outstanding performance, we propose a
+novel object detector: DS MYOLO. This detector captures global feature
+information through a simplified selective scanning fusion block (SimVSS Block)
+and effectively integrates the network's deep features. Additionally, we
+introduce an efficient channel attention convolution (ECAConv) that enhances
+cross-channel feature interaction while maintaining low computational
+complexity. Extensive experiments on the CCTSDB 2021 and VLD-45 driving
+scenarios datasets demonstrate that DS MYOLO exhibits significant potential and
+competitive advantage among similarly scaled YOLO series real-time object
+detectors.
+
+
+**Code link**: No code link found in the abstract.
+
+**Paper link**: [Read more](http://arxiv.org/abs/2409.01093v1)
+
+---
+
+
+## A method for detecting dead fish on large water surfaces based on improved YOLOv10
+
+**Published**: 2024-08-31
+
+**Author**: Qingbin Tian
+
+**Abstract**: Dead fish frequently appear on the water surface due to various factors. If
+not promptly detected and removed, these dead fish can cause significant issues
+such as water quality deterioration, ecosystem damage, and disease
+transmission. Consequently, it is imperative to develop rapid and effective
+detection methods to mitigate these challenges. Conventional methods for
+detecting dead fish are often constrained by manpower and time limitations,
+struggling to effectively manage the intricacies of aquatic environments. This
+paper proposes an end-to-end detection model built upon an enhanced YOLOv10
+framework, designed specifically to swiftly and precisely detect deceased fish
+across extensive water surfaces. Key enhancements include: (1) Replacing
+YOLOv10's backbone network with FasterNet to reduce model complexity while
+maintaining high detection accuracy; (2) Improving feature fusion in the Neck
+section through enhanced connectivity methods and replacing the original C2f
+module with CSPStage modules; (3) Adding a compact target detection head to
+enhance the detection performance of smaller objects. Experimental results
+demonstrate significant improvements in P (precision), R (recall), and AP (average
+precision) compared to the baseline model YOLOv10n. Furthermore, our model
+outperforms other models in the YOLO series by significantly reducing model
+size and parameter count, while sustaining high inference speed and achieving
+optimal AP performance. The model facilitates rapid and accurate detection of
+dead fish in large-scale aquaculture systems. Finally, through ablation
+experiments, we systematically analyze and assess the contribution of each
+model component to the overall system performance.
+
+
+**Code link**: No code link found in the abstract.
+
+**Paper link**: [Read more](http://arxiv.org/abs/2409.00388v1)
+
+---
+
+
 ## FA-YOLO: Research On Efficient Feature Selection YOLO Improved Algorithm Based On FMDS and AGMF Modules
 
 **Published**: 2024-08-29
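
The YoloTag abstract in the hunk above mentions a higher-order Butterworth filter for suppressing noise in the estimated UAV states, but it gives no design details. As a rough illustration of the idea only, the sketch below applies a second-order Butterworth low-pass filter (a bilinear-transform biquad) to a sequence of position samples. The 30 Hz sample rate, the 2 Hz cutoff, the filter order, and the function names are all assumptions made for this example, not values from the paper.

```python
import math


def butterworth_lowpass_coeffs(cutoff_hz, sample_hz):
    """Return (b, a) for a second-order Butterworth low-pass biquad.

    Coefficients come from the bilinear transform with Q = 1/sqrt(2),
    normalized so that the leading denominator coefficient is 1.
    """
    k = math.tan(math.pi * cutoff_hz / sample_hz)
    norm = 1.0 / (1.0 + math.sqrt(2.0) * k + k * k)
    b0 = k * k * norm
    b = (b0, 2.0 * b0, b0)
    a = (2.0 * (k * k - 1.0) * norm,
         (1.0 - math.sqrt(2.0) * k + k * k) * norm)
    return b, a


def lowpass(samples, cutoff_hz, sample_hz):
    """Filter a sequence of floats and return the smoothed list."""
    (b0, b1, b2), (a1, a2) = butterworth_lowpass_coeffs(cutoff_hz, sample_hz)
    x1 = x2 = y1 = y2 = 0.0  # previous two inputs and outputs
    out = []
    for x in samples:
        # Direct-form I difference equation.
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


if __name__ == "__main__":
    # Hypothetical data: a hover at x = 1.0 m with +/- 0.1 m alternating jitter.
    noisy_x = [1.0 + 0.1 * (-1.0) ** n for n in range(60)]
    smooth_x = lowpass(noisy_x, cutoff_hz=2.0, sample_hz=30.0)
    print(round(smooth_x[-1], 3))
```

A causal filter like this adds lag that grows with the order and shrinks with the cutoff, so in a tracking loop the cutoff would have to be tuned against the vehicle's dynamics. A higher-order design could be built by cascading biquad sections with the appropriate Butterworth pole placements.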
@@ -226,109 +331,3 @@ and contrast.
 
 ---
 
-
-## VFM-Det: Towards High-Performance Vehicle Detection via Large Foundation Models
-
-**Published**: 2024-08-23
-
-**Author**: Wentao Wu
-
-**Abstract**: Existing vehicle detectors are usually obtained by training a typical
-detector (e.g., YOLO, RCNN, DETR series) on vehicle images based on a
-pre-trained backbone (e.g., ResNet, ViT). Some researchers also exploit and
-enhance the detection performance using pre-trained large foundation models.
-However, we think these detectors may only get sub-optimal results because the
-large models they use are not specifically designed for vehicles. In addition,
-their results heavily rely on visual features, and they seldom consider the
-alignment between the vehicle's semantic information and visual
-representations. In this work, we propose a new vehicle detection paradigm
-based on a pre-trained foundation vehicle model (VehicleMAE) and a large
-language model (T5), termed VFM-Det. It follows the region proposal-based
-detection framework and the features of each proposal can be enhanced using
-VehicleMAE. More importantly, we propose a new VAtt2Vec module that predicts
-the vehicle semantic attributes of these proposals and transforms them into
-feature vectors to enhance the vision features via contrastive learning.
-Extensive experiments on three vehicle detection benchmark datasets thoroughly
-proved the effectiveness of our vehicle detector. Specifically, our model
-improves the baseline approach by $+5.1\%$, $+6.2\%$ on the $AP_{0.5}$,
-$AP_{0.75}$ metrics, respectively, on the Cityscapes dataset. The source code of
-this work will be released at https://github.com/Event-AHU/VFM-Det.
-
-
-**Code link**: https://github.com/Event-AHU/VFM-Det
-
-**Paper link**: [Read more](http://arxiv.org/abs/2408.13031v1)
-
----
-
-
-## Enhanced Parking Perception by Multi-Task Fisheye Cross-view Transformers
-
-**Published**: 2024-08-22
-
-**Author**: Antonyo Musabini
-
-**Abstract**: Current parking area perception algorithms primarily focus on detecting
-vacant slots within a limited range, relying on error-prone homographic
-projection for both labeling and inference. However, recent advancements in
-Advanced Driver Assistance System (ADAS) require interaction with end-users
-through comprehensive and intelligent Human-Machine Interfaces (HMIs). These
-interfaces should present a complete perception of the parking area going from
-distinguishing vacant slots' entry lines to the orientation of other parked
-vehicles. This paper introduces Multi-Task Fisheye Cross View Transformers (MT
-F-CVT), which leverages features from a four-camera fisheye Surround-view
-Camera System (SVCS) with multihead attentions to create a detailed Bird-Eye
-View (BEV) grid feature map. Features are processed by both a segmentation
-decoder and a Polygon-Yolo based object detection decoder for parking slots and
-vehicles. Trained on data labeled using LiDAR, MT F-CVT positions objects
-within 25 m x 25 m real open-road scenes with an average error of only 20 cm.
-Our larger model achieves an F-1 score of 0.89. Moreover, the smaller model
-operates at 16 fps on an Nvidia Jetson Orin embedded board, with similar
-detection results to the larger one. MT F-CVT demonstrates robust
-generalization capability across different vehicles and camera rig
-configurations. A demo video from an unseen vehicle and camera rig is available
-at: https://streamable.com/jjw54x.
-
-
-**Code link**: https://streamable.com/jjw54x
-
-**Paper link**: [Read more](http://arxiv.org/abs/2408.12575v1)
-
----
-
-
-## OVA-DETR: Open Vocabulary Aerial Object Detection Using Image-Text Alignment and Fusion
-
-**Published**: 2024-08-22
-
-**Author**: Guoting Wei
-
-**Abstract**: Aerial object detection has been a hot topic for many years due to its wide
-application requirements. However, most existing approaches can only handle
-predefined categories, which limits their applicability for the open scenarios
-in the real world. In this paper, we extend aerial object detection to open
-scenarios by exploiting the relationship between image and text, and propose
-OVA-DETR, a high-efficiency open-vocabulary detector for aerial images.
-Specifically, based on the idea of image-text alignment, we propose region-text
-contrastive loss to replace the category regression loss in the traditional
-detection framework, which breaks the category limitation. Then, we propose
-Bidirectional Vision-Language Fusion (Bi-VLF), which includes a dual-attention
-fusion encoder and a multi-level text-guided Fusion Decoder. The dual-attention
-fusion encoder enhances the feature extraction process in the encoder part. The
-multi-level text-guided Fusion Decoder is designed to improve the detection
-ability for small objects, which frequently appear in aerial object detection
-scenarios. Experimental results on three widely used benchmark datasets show
-that our proposed method significantly improves the mAP and recall, while
-enjoying faster inference speed. For instance, in zero-shot detection
-experiments on DIOR, the proposed OVA-DETR outperforms DescReg and YOLO-World
-by 37.4% and 33.1%, respectively, while achieving 87 FPS inference speed, which
-is 7.9x faster than DescReg and 3x faster than YOLO-World. The code is
-available at https://github.com/GT-Wei/OVA-DETR.
-
-
-**Code link**: https://github.com/GT-Wei/OVA-DETR
-
-**Paper link**: [Read more](http://arxiv.org/abs/2408.12246v1)
-
----
-
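
The OVA-DETR abstract in the removed hunk above reports 87 FPS together with relative speedups of 7.9x over DescReg and 3x over YOLO-World. Assuming that "Nx faster" means a ratio of frame rates (an interpretation, not something the abstract states), the implied baseline speeds can be back-computed as in the short sketch below. The helper names are hypothetical.

```python
def implied_baseline_fps(reported_fps, speedup):
    """Baseline frame rate implied by 'reported_fps is `speedup` times faster'."""
    return reported_fps / speedup


def latency_ms(fps):
    """Average per-frame latency in milliseconds for a given frame rate."""
    return 1000.0 / fps


OVA_DETR_FPS = 87.0  # figure quoted in the abstract

# Speedup factors quoted in the abstract; the baseline FPS values are derived.
descreg_fps = implied_baseline_fps(OVA_DETR_FPS, 7.9)
yolo_world_fps = implied_baseline_fps(OVA_DETR_FPS, 3.0)

if __name__ == "__main__":
    for name, fps in [("OVA-DETR", OVA_DETR_FPS),
                      ("DescReg (implied)", descreg_fps),
                      ("YOLO-World (implied)", yolo_world_fps)]:
        print(f"{name}: {fps:.1f} FPS, {latency_ms(fps):.1f} ms/frame")
```

Under that reading, the baselines come out at roughly 11 FPS and 29 FPS. These are derived estimates rather than numbers reported in the abstract.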
