Reproducibility -- include datasets from paper in repo #20

ulupo · 2021-08-06T08:30:55Z

I think it would be good to include the datasets we used in the benchmarks and reported on in the paper. Perhaps we could create an extra `benchmarks` folder containing scripts that people can use to test performance, along with at least the datasets we used in the paper? I think that would be a good service for people interested in the library, and possibly for finding any remaining bottlenecks.

MonkeyBreaker · 2021-08-06T08:43:50Z

I think it is a good idea, but I would not put it directly in main branch.
Maybe we could create a benchmark branch and put all the datasets we want there. What do you think?
The reason is that I think the data should not be directly present in the package or at least in the main branch of the package.

ulupo · 2021-08-06T08:48:12Z

> I think it is a good idea, but I would not put it directly in main branch.

Hmm, I'm not sure I agree. Though I think I see why you say this ("only code in main"), including benchmark scripts and data is the approach taken in scikit-learn, for example: https://github.com/scikit-learn/scikit-learn (see the benchmarks folder with scripts, and the datasets subpackage with the data itself). Additionally, the data would be saved as text files, not as binary.
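To make the proposal more concrete, here is a rough sketch of what a script in such a benchmarks folder could look like, assuming the datasets are stored as whitespace-separated text files with one point per line. Everything here is illustrative: `load_point_cloud`, `time_call` and `placeholder_computation` are hypothetical names, and the placeholder stands in for whatever library call would actually be benchmarked.

```python
import time
from pathlib import Path


def load_point_cloud(path):
    """Read a whitespace-separated text file into a list of float rows."""
    rows = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        # Skip blank lines and '#' comment lines (an assumed file convention).
        if line and not line.startswith("#"):
            rows.append([float(x) for x in line.split()])
    return rows


def time_call(func, data, repeats=3):
    """Return the best wall-clock time in seconds of func(data) over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(data)
        best = min(best, time.perf_counter() - start)
    return best


def placeholder_computation(points):
    # Hypothetical stand-in for the library call being benchmarked.
    return sum(sum(row) for row in points)
```

A script would then call something like `time_call(placeholder_computation, load_point_cloud("benchmarks/data/points.txt"))`, where the path is again only an assumed layout. Taking the best of several runs reduces noise from other processes on the machine.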

MonkeyBreaker · 2021-08-06T09:26:30Z

I was not aware of scikit-learn's practices, and since we try to follow them in giotto-tda, we should follow them here as well.

> Though I think I see why you say this ("only code in main")

Exactly, but it is more a matter of taste (mine, in this case) 😛
