A collection of dummy and public datasets for apps to use.
- atherosclerosis.db: A 20-year longitudinal primary-prevention study of atherosclerosis in middle-aged men. Source: CTU.
- card_transactions.csv: 10,000 rows of fake card transaction data. Generated by Straive.
- craftbeer.db: Craft beers labeled by style and composition, plus a separate dataset listing breweries by state. Source: Kaggle.
- data_scientist_jobstreet_scraped_v2.csv: Data scientist job listings scraped from JobStreet. Source: Data Scientist Job Dataset.
- ehr.csv: 28 rows of electronic health records covering patient demographics, co-morbidities, adherence, and clinical notes. Useful for exploring patient preference for injectables vs. pills. Generated by Straive.
- employee_data.csv: 2,000 rows of fake employee data. Generated by employee_data.py.
- got_book1.csv: Source: game_of_thrones_dataset.
- nba.db: Basketball matches from the National Basketball Association (NBA), with players, teams, and matches, including per-player action counts. Source: CTU.
- supply_chain.csv: 10,000 rows of fake supply chain data. Generated by Straive.
- tourists.csv: 100 rows of supply chain data from a fashion and beauty startup selling makeup products. Source: Kaggle.
- world.db: A database of 239 countries and their cities. Source: CTU.
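The list above mixes `.db` and `.csv` files. As a minimal sketch, assuming the `.db` files are SQLite databases (the extension suggests this, but the list does not state it) and each CSV has a header row, the Python standard library is enough to inspect them. `list_sqlite_tables` and `read_csv_rows` are hypothetical helpers for illustration, not part of this repo.

```python
import csv
import sqlite3
from contextlib import closing
from pathlib import Path


def list_sqlite_tables(db_path):
    """Return user table names in a SQLite file (assumes .db files are SQLite)."""
    # Open read-only via a URI so a shared dataset is never created or modified.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    # closing() guarantees the connection is closed; sqlite3's own context
    # manager only commits or rolls back.
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


def read_csv_rows(csv_path):
    """Read a CSV with a header row into a list of dicts (one per data row)."""
    # utf-8-sig also strips a leading byte-order mark if the file has one.
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))
```

For example, `list_sqlite_tables("world.db")` shows which tables a database contains before you write queries against it, and `read_csv_rows("employee_data.csv")` returns one dict per row keyed by the header. Opening databases read-only is a deliberate choice here: these files are shared by multiple apps, so a reader should not be able to alter them by accident.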
This repo hosts:
- Fake data. (Real data usually belongs in its own dedicated repo.)
- Medium-sized data. (GitHub isn't ideal for large data, and small data is too trivial to be worth hosting here.)
- Data used by multiple apps. (Single-app datasets belong in that app's repo.)