angusleung100/Solidity-Contract-Vulnerability

Solidity Smart Contract Vulnerability Fine-tuning

About Project

This project fine-tunes small language models on Ethereum smart contract vulnerabilities to assess how well they comprehend Solidity syntax.

We compare several small code models to determine which ones understand Solidity syntax well enough to detect vulnerabilities.

The project set out to answer two questions:

  • Does a given piece of smart contract code contain any vulnerabilities?
  • Which lightweight AI models best understand Solidity syntax and code? This is answered by benchmarking the competing models in the shortlist.
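
As a rough illustration of how such a benchmark could be scored, the sketch below treats detection as binary classification (1 = vulnerable, 0 = not vulnerable) and ranks models by F1. The choice of F1 as the ranking metric, the names `model_a` and `model_b`, and all labels and predictions are assumptions made for illustration; they are toy placeholders, not results from this project.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Guard every division so empty or one-sided inputs do not crash.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def rank_models(y_true, predictions_by_model):
    """Return (model_name, metrics) pairs sorted by F1, best first."""
    scores = {name: binary_metrics(y_true, preds)
              for name, preds in predictions_by_model.items()}
    return sorted(scores.items(), key=lambda item: item[1]["f1"], reverse=True)


# Toy placeholder data (hypothetical, not measured results).
y_true = [1, 0, 1, 1, 0, 0]
predictions_by_model = {
    "model_a": [1, 0, 1, 0, 0, 0],
    "model_b": [1, 1, 1, 1, 0, 1],
}

ranking = rank_models(y_true, predictions_by_model)
for name, metrics in ranking:
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

F1 is used here because it balances missed vulnerabilities against false alarms; a different metric may suit the project better.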

Shortlisted Models

Model selection requirements:

  • Low GPU usage
  • Small size: fewer than 500 million parameters

Model Name       Size
GraphCodeBERT    125M
CodeBERT         125M
CodeT5           60M-220M
PolyCoder        160M

Models Needed For Project

  • Smart Contract Code Vulnerability Detection

Datasets Used

Solidity Vulnerability Detection Model


The 'small-plain-text' subset was used. It was filtered from 14k+ rows to 5k+ rows by merging it with the address-license subset of the Verified Smart Contracts dataset and keeping only source code released under the Unlicense. In total, ~5k rows were used for fine-tuning.
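
The filtering step can be pictured as an inner join on contract address followed by a license filter. The sketch below is a minimal standard-library version under stated assumptions: the field names `address`, `source_code`, `label`, and `license`, the helper `filter_by_license`, and the sample rows are all hypothetical and may not match the real dataset schemas.

```python
def filter_by_license(contracts, licenses, wanted="Unlicense"):
    """Join contract rows to license rows on address; keep one license type.

    Rows without a matching address in `licenses` are dropped (inner join).
    """
    license_by_address = {row["address"]: row["license"] for row in licenses}
    return [
        {**row, "license": license_by_address[row["address"]]}
        for row in contracts
        if license_by_address.get(row["address"]) == wanted
    ]


# Hypothetical sample rows; real column names and values may differ.
contracts = [
    {"address": "0xaaa", "source_code": "contract A {}", "label": 1},
    {"address": "0xbbb", "source_code": "contract B {}", "label": 0},
    {"address": "0xccc", "source_code": "contract C {}", "label": 0},
    {"address": "0xddd", "source_code": "contract D {}", "label": 1},
]
licenses = [
    {"address": "0xaaa", "license": "Unlicense"},
    {"address": "0xbbb", "license": "MIT"},
    {"address": "0xccc", "license": "Unlicense"},
    # 0xddd has no license row, so the join drops it.
]

filtered = filter_by_license(contracts, licenses)
print(len(contracts), "->", len(filtered), "rows")
```

A lookup dictionary keeps the join linear in the number of rows; with a dataframe library the same step would typically be a merge on the address column followed by a row filter.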
