This repository contains the code and resources accompanying the paper "Leveraging Vision-Language Models for Manufacturing Feature Recognition in CAD Designs". The study explores the application of vision-language models (VLMs) to automate the recognition of various manufacturing features in CAD designs without extensive training datasets or predefined rules.
Dataset Used: The dataset used in this study is available in the `Dataset` folder.
Authors: Chen Lequn, Muhammad Tayyab Khan
Automatic Feature Recognition (AFR) is crucial for converting design knowledge into actionable manufacturing information. Traditional AFR methods often rely on predefined geometric rules and large datasets, which can be time-consuming and may lack generalizability across different manufacturing features. This project investigates the use of VLMs, employing prompt engineering techniques such as multi-view query images, few-shot learning, sequential reasoning, and chain-of-thought, to recognize a wide range of manufacturing features in CAD designs.
Figure 1: Overview of VLM-based AFR approach
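To make the prompt-engineering ideas above concrete, here is a minimal sketch of assembling a few-shot, chain-of-thought query over multi-view images. The message wording, example data, and file names are illustrative assumptions, not the paper's actual prompts.

```python
# Illustrative sketch of few-shot + chain-of-thought prompting over
# multi-view images. All example content below is hypothetical.

FEW_SHOT_EXAMPLES = [
    {"views": ["ex1_view1.png", "ex1_view2.png", "ex1_view3.png"],
     "answer": "2 through holes, 1 rectangular pocket"},
]

def build_prompt(query_views):
    """Assemble a chain-of-thought query string over multi-view images."""
    lines = ["You are an expert in CAD manufacturing feature recognition."]
    # Few-shot: show annotated examples before the query part.
    for i, ex in enumerate(FEW_SHOT_EXAMPLES, 1):
        lines.append(f"Example {i}: given views {', '.join(ex['views'])}, "
                     f"the features are: {ex['answer']}.")
    # Multi-view query with an explicit chain-of-thought instruction.
    lines.append(f"Now examine these three isometric views: "
                 f"{', '.join(query_views)}.")
    lines.append("Reason step by step about each visible feature, "
                 "then list every manufacturing feature and its count.")
    return "\n".join(lines)

prompt = build_prompt(["part_view1.png", "part_view2.png", "part_view3.png"])
print(prompt)
```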
The study utilizes the MFCAD-VLM dataset, a comprehensive collection designed to advance research in CAD and AFR. The dataset includes:
- STEP Files: CAD models in STEP format, representing various parts with distinct manufacturing features, categorized by complexity levels (easy, medium, and hard).
- Ground Truth JSON Files: Expert-annotated JSON files corresponding to each STEP file, detailing manufacturing feature types, quantities, and specifics essential for accurate AFR assessment.
- Multi-View Isometric Images: Three isometric-view snapshots per CAD model, generated via Python, capturing different viewing angles to aid feature recognition tasks.
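As a sketch of how a ground-truth annotation might be consumed, the snippet below parses a small JSON record and tallies its features. The schema (`part_id`, `complexity`, `features`) is a hypothetical illustration; consult the files in the `Dataset` folder for the actual field names.

```python
# Sketch of reading a ground-truth annotation. The JSON schema here is
# an assumption for illustration, not the dataset's documented format.
import json

sample = """
{
  "part_id": "easy_001",
  "complexity": "easy",
  "features": [
    {"type": "through_hole", "quantity": 2},
    {"type": "rectangular_pocket", "quantity": 1}
  ]
}
"""

annotation = json.loads(sample)
total = sum(f["quantity"] for f in annotation["features"])
print(f"{annotation['part_id']}: {total} features "
      f"({annotation['complexity']} complexity)")
```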
- Python >= 3.9
- PyTorch >= 1.8.0
- CUDA-compatible GPU (recommended) with appropriate CUDA drivers installed
To set up the environment, follow these steps:

- Clone the repository:

  ```bash
  git clone https://github.com/Davidlequnchen/VLM-CADFeatureRecognition.git
  cd VLM-CADFeatureRecognition
  ```

- Create and activate a conda environment:

  ```bash
  conda create -n vlm_afr python=3.9
  conda activate vlm_afr
  ```

- Install pythonocc-core using conda:

  ```bash
  conda install -c conda-forge pythonocc-core=7.8.1
  ```
- Install PyTorch:

  IMPORTANT: The correct PyTorch build depends on your CUDA version. Check it with:

  ```bash
  nvidia-smi
  ```

  Then choose the appropriate command.

  For CUDA 12.1:

  ```bash
  pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
  ```

  For CUDA 11.8:

  ```bash
  pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
  ```

  For CPU only:

  ```bash
  pip install torch torchvision torchaudio
  ```
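The choice between these commands can also be expressed as a small helper that maps a CUDA version string to the matching pip index URL. The version cut-offs mirror the commands above; the CPU wheel index and the exact thresholds are assumptions you may need to adjust for other CUDA releases.

```python
# Helper mapping a CUDA version (as reported by nvidia-smi) to the
# PyTorch pip --index-url. Cut-offs mirror the install commands above.
from typing import Optional

def torch_index_url(cuda_version: Optional[str]) -> str:
    """Return the pip index URL for the matching PyTorch wheel build."""
    base = "https://download.pytorch.org/whl/"
    if cuda_version is None:
        return base + "cpu"  # no GPU: CPU-only wheels
    major, minor = (int(x) for x in cuda_version.split(".")[:2])
    if (major, minor) >= (12, 1):
        return base + "cu121"
    if (major, minor) >= (11, 8):
        return base + "cu118"
    return base + "cpu"  # older CUDA: fall back to CPU wheels

print(torch_index_url("12.1"))  # → https://download.pytorch.org/whl/cu121
```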
- Install the remaining requirements:

  ```bash
  pip install -r requirements.txt
  ```

- Download the MFCAD-VLM dataset:

  Access and download the dataset from Zenodo, then extract the contents into the `Dataset` directory.

- Configure the dataset path:

  Set the path to the directory where the MFCAD-VLM dataset is located.
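One way to sketch this configuration step in Python: resolve the dataset root from an environment variable with a fallback to the local `Dataset` directory. The variable name `MFCAD_VLM_DATASET` is an assumption for illustration; the provided scripts may instead take a path argument or a config entry.

```python
# Sketch of resolving the dataset location. The env-var name
# MFCAD_VLM_DATASET is a hypothetical convention, not the repo's own.
import os
from pathlib import Path

def resolve_dataset_path(default: str = "Dataset") -> Path:
    """Prefer the MFCAD_VLM_DATASET env var, else fall back to ./Dataset."""
    root = Path(os.environ.get("MFCAD_VLM_DATASET", default)).expanduser()
    if not root.is_dir():
        raise FileNotFoundError(f"MFCAD-VLM dataset not found at {root}")
    return root

# Example: dataset_root = resolve_dataset_path()
```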
- Run experiments:

  Execute the provided scripts to perform the feature recognition tasks described in the paper.
If you find this repository or the MFCAD-VLM dataset useful in your research, please cite the following paper:
@article{khan2024leveraging,
title={Leveraging Vision-Language Models for Manufacturing Feature Recognition in CAD Designs},
author={Khan, Muhammad Tayyab and Chen, Lequn and Ng, Ye Han and Feng, Wenhe and Tan, Nicholas Yew Jin and Moon, Seung Ki},
journal={arXiv preprint arXiv:2411.02810},
year={2024}
}
This project is licensed under the MIT License. See the LICENSE file for details.
We acknowledge the support from the Singapore Institute of Manufacturing Technology (SIMTech), the Advanced Remanufacturing and Technology Centre (ARTC), and Nanyang Technological University (NTU).