forked from yegor256/cam

Classes and Metrics (CaM): a dataset of Java classes from public open-source GitHub repositories


This is a dataset of open-source Java classes and some metrics on them. Every now and then I make a new version of it using the scripts in this repository. You are welcome to use it in your research. Each release has a fixed version; by referring to it in your research you avoid ambiguity and guarantee the repeatability of your experiments.

The latest ZIP archive with the dataset is here: cam-2021-07-08.zip (387 MB). It is the result of analyzing 1000 Java classes against ten metrics: lines of code, lines of comments, blank lines, NCSS, cyclomatic complexity, number of attributes, number of static attributes, number of constructors, number of methods, and number of static methods.
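To make the first three metrics concrete, here is a minimal Python sketch of how lines of code, lines of comments, and blank lines could be counted for a single Java source file. It is an illustration only, not the code this repository uses to build the dataset. The function name `count_lines` and its classification rules are assumptions: a line counts as a comment if it starts with `//` or lies inside a `/* ... */` block, and trailing comments on code lines are counted as code.

```python
def count_lines(source: str) -> dict:
    """Classify each line of Java source as code, comment, or blank.

    Hypothetical helper for illustration; the real dataset scripts
    may define these metrics differently.
    """
    counts = {"code": 0, "comments": 0, "blank": 0}
    in_block = False  # True while inside a /* ... */ comment
    for raw in source.splitlines():
        line = raw.strip()
        if in_block:
            # Every line inside a block comment counts as a comment,
            # including empty ones.
            counts["comments"] += 1
            if "*/" in line:
                in_block = False
        elif not line:
            counts["blank"] += 1
        elif line.startswith("//"):
            counts["comments"] += 1
        elif line.startswith("/*"):
            counts["comments"] += 1
            # Stay in block mode unless the comment closes on this line.
            if "*/" not in line[2:]:
                in_block = True
        else:
            counts["code"] += 1
    return counts


SAMPLE = """\
/**
 * A tiny example class.
 */
public class Foo {

    // a counter
    private int x;
}
"""

if __name__ == "__main__":
    print(count_lines(SAMPLE))
```

For the sample class, the sketch reports three lines of code, four comment lines, and one blank line. Metrics such as NCSS and cyclomatic complexity need a real Java parser and are out of scope for this sketch.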

To create a new dataset, run the following command (you need to have Docker installed) and the entire dataset will be built. Here 1000 is the number of repositories to fetch from GitHub:

$ docker run --rm -v "$(pwd):/w" -e "TOTAL=1000" -e "TARGET=/w/dataset" yegor256/cam

The dataset will be created in the ./dataset directory, along with a .zip archive of it. This may take some time, possibly a few days.

You can also run it without Docker:

$ make TOTAL=100

This should work too.
