Skip to content

​This repository provides code and resources for automating manufacturing feature recognition in CAD designs using vision-language models.

License

Notifications You must be signed in to change notification settings

Davidlequnchen/VLM-CADFeatureRecognition

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

18 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

VLM-CADFeatureRecognition

Dataset

This repository contains the code and resources accompanying the paper "Leveraging Vision-Language Models for Manufacturing Feature Recognition in CAD Designs". The study explores the application of vision-language models (VLMs) to automate the recognition of various manufacturing features in CAD designs without extensive training datasets or predefined rules.

Dataset Used: The dataset used in this study is available in the Dataset folder.

Authors: Chen Lequn, Muhammad Tayyab Khan

Overview

Automatic Feature Recognition (AFR) is crucial for converting design knowledge into actionable manufacturing information. Traditional AFR methods often rely on predefined geometric rules and large datasets, which can be time-consuming and may lack generalizability across different manufacturing features. This project investigates the use of VLMs, employing prompt engineering techniques such as multi-view query images, few-shot learning, sequential reasoning, and chain-of-thought, to recognize a wide range of manufacturing features in CAD designs.

Methodology Overview
Figure 1: Overview of VLM-based AFR approach

MFCAD-VLM Dataset

The study utilizes the MFCAD-VLM dataset, a comprehensive collection designed to advance research in CAD and AFR. The dataset includes:

  • STEP Files: CAD models in STEP format, representing various parts with distinct manufacturing features, categorized by complexity levels (easy, medium, and hard).
  • Ground Truth JSON Files: Expert-annotated JSON files corresponding to each STEP file, detailing manufacturing feature types, quantities, and specifics essential for accurate AFR assessment.
  • Multi-View Isometric Images: Three isometric-view snapshots per CAD model, generated via Python, capturing different viewing angles to aid feature recognition tasks.

Installation

Requirements

  • Python >= 3.9
  • PyTorch >= 1.8.0
  • CUDA-compatible GPU (recommended) with appropriate CUDA drivers installed

To set up the environment, follow these steps:

  1. Clone the repository:

    git clone https://github.com/Davidlequnchen/VLM-CADFeatureRecognition.git
    cd VLM-CADFeatureRecognition
  2. Create and activate a conda environment:

    conda create -n vlm_afr python=3.9
    conda activate vlm_afr
  3. Install pyocc using conda:

    conda install -c conda-forge pythonocc-core=7.8.1
  4. Install PyTorch:

    To check your CUDA version:

    nvidia-smi

    IMPORTANT: PyTorch installation depends on your CUDA version. Choose the appropriate command:

    For CUDA 12.1:

    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

    For CUDA 11.8:

    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

    For CPU only:

    pip install torch torchvision torchaudio
  5. Install other requirements:

    pip install -r requirements.txt

Usage

  1. Download the MFCAD-VLM Dataset:

    Access and download the dataset from Zenodo. Extract the contents to "Dataset" directory.

  2. Configure the Dataset Path:

    Set the path to the directory where the MFCAD-VLM dataset is located.

  3. Run Experiments:

    Execute the provided scripts to perform feature recognition tasks as described in the paper.

Citation

If you find this repository or the MFCAD-VLM dataset useful in your research, please cite the following paper:

@article{
  title={Leveraging Vision-Language Models for Manufacturing Feature Recognition in CAD Designs},
  author={Khan, Muhammad Tayyab and Chen, Lequn and Ng, Ye Han and Feng, Wenhe and Tan, Nicholas Yew Jin and Moon, Seung Ki},
  journal={arXiv preprint arXiv:2411.02810},
  year={2024}
}

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgements

We acknowledge the support from the Singapore Institute of Manufacturing Technology (SIMTech), the Advanced Remanufacturing and Technology Centre (ARTC), and Nanyang Technological University (NTU).

About

​This repository provides code and resources for automating manufacturing feature recognition in CAD designs using vision-language models.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •