A template for science projects. 🧑‍🔬📊

tgoelles/cookiecutter_science

Cookiecutter for Science Projects

A cookiecutter template for science and data science projects that include data, code, and dissemination.

  • Optimized for data-based publications
  • Optimized for use with VS Code
  • Docker-based, version-controlled environment using VS Code Dev Containers
  • uv-based Python environment inside the Dev Container
  • To add a package, follow the uv workflow: open the VS Code terminal, go to the code folder, and run: uv add pandas
  • Uses Dev Container Features with Python and LaTeX pre-installed
  • Set up for Python, but can also be adapted for Julia and R
  • Make commands for collecting data, generating figures, typesetting LaTeX, cleaning temporary files, and cleaning demo files
  • VS Code tasks to trigger data collection, plotting, and paper compilation
  • LaTeX-based paper
  • Path definitions in the project_package Python module
  • Kedro-inspired data folder structure
  • Filled with a demo, which can be removed with "make delete_demo"
  • Used in at least 5 papers
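As an illustration of what path definitions in a project package can look like, here is a minimal sketch. The function names `find_project_root` and `data_paths`, and the idea of locating the project root by searching for the Makefile, are assumptions made for this example and are not taken from the template's actual code. Only the folder names come from the generated project layout.

```python
from pathlib import Path


def find_project_root(start: Path, marker: str = "Makefile") -> Path:
    """Walk upwards from `start` until a folder containing `marker` is found.

    Hypothetical helper: the real template may locate the root differently.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"No {marker} found in or above {start}")


def data_paths(root: Path) -> dict[str, Path]:
    """Map short names to project folders (names chosen for this sketch)."""
    return {
        "raw": root / "data" / "01_raw",
        "intermediate": root / "data" / "02_intermediate",
        "primary": root / "data" / "03_primary",
        "figures": root / "dissemination" / "figures",
    }
```

Defining these paths once means notebooks and scripts can refer to `data_paths(root)["raw"]` instead of hard-coding relative paths that break when a file is moved.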

For more detailed information, please see the README of the resulting project.

Quick Start

cookiecutter https://github.com/tgoelles/cookiecutter_science

File Structure

├── Makefile                               #  Automation script for common tasks
├── README.md                              #  Project overview and instructions
├── code                                   #  Python source code and notebooks
│   ├── notebooks                          #  Jupyter notebooks for analysis
│   │   └── exploratory                    #  Exploratory data analysis
│   │       └── 1.0-tg-example.ipynb       #  Example exploratory notebook
│   └── project_package                    #  The project package where refined code goes
│       ├── pyproject.toml                 #  project_package dependencies and configuration
│       └── src                            #  Source code directory
│           └── project_package            #
│               ├── __init__.py            #
│               ├── data                   #  Data processing module and scripts
│               │   ├── __init__.py        #
│               │   ├── config.py          #  Configuration settings
│               │   ├── example.py         #  Example script
│               │   ├── import_data.py     #  Data import functions
│               │   └── make_dataset.py    #  Dataset creation script, used by make data
│               ├── tools                  #  Utility scripts
│               │   ├── __init__.py        #
│               │   └── convert_latex.py   #  LaTeX conversion script
│               └── visualization          #  Visualization module and scripts
│                   ├── __init__.py        #
│                   ├── make_plots.py      #  Plot generation functions
│                   └── visualize.py       #  Data visualization utilities
├── data                                   #
│   ├── 01_raw                             #  Raw data; do not change the data in here
│   │   └── demo.csv                       #  Example raw data file
│   ├── 02_intermediate                    #  Processed but unrefined data
│   │   └── demo_clean.csv                 #  Example cleaned data file
│   ├── 03_primary                         #  Primary processed datasets
│   ├── 04_feature                         #  Feature-engineered datasets
│   ├── 05_model_input                     #  Data ready for modeling
│   ├── 06_models                          #  Trained models
│   ├── 07_model_output                    #  Model predictions/results
│   └── 08_reporting                       #  Reports and summaries
├── dissemination                          #  Outputs for publication/presentation
│   ├── figures                            #  Figures and plots go in here
│   │   └── demo.png                       #  Example figure
│   ├── papers                             #  LaTeX sources for a paper or thesis
│   │   ├── paper.pdf                      #  Final paper output
│   │   └── paper.tex                      #  LaTeX source for the paper
│   └── presentations                      #  Presentation slides and materials
├── literature                             #  References and related work
│   └── references.bib                     #  Bibliography file
├── pyproject.toml                         #  All project dependencies and tool settings, managed by uv
└── uv.lock                                #  Dependency lock file for reproducibility
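To make the numbered data folders more concrete, the following sketch shows one way a processing step could read from 01_raw and write its result to 02_intermediate without touching the raw file. The function name `clean_csv` and the cleaning rule (dropping rows with empty fields) are assumptions for illustration; the template's own `make_dataset.py` may work differently, and the columns of `demo.csv` are not assumed here.

```python
import csv
from pathlib import Path


def clean_csv(raw_file: Path, out_file: Path) -> int:
    """Copy rows from raw_file to out_file, skipping rows with empty fields.

    The raw file is only read, never modified.
    Returns the number of data rows written (the header is not counted).
    """
    out_file.parent.mkdir(parents=True, exist_ok=True)
    written = 0
    with raw_file.open(newline="") as src, out_file.open("w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:  # empty input file: nothing to copy
            return 0
        writer.writerow(header)
        for row in reader:
            # Keep only rows in which every field has non-blank content.
            if row and all(field.strip() for field in row):
                writer.writerow(row)
                written += 1
    return written
```

A call such as `clean_csv(root / "data/01_raw/demo.csv", root / "data/02_intermediate/demo_clean.csv")` would then follow the raw-to-intermediate convention of the folder layout, where `root` stands for the project folder.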

Tasks

VS Code tasks trigger data collection, plotting, and paper compilation.

[Screenshot: VS Code tasks]
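The Make targets can also be called from a script. The sketch below is a thin Python wrapper around `make`, limited to the two targets named in this README (`data` and `delete_demo`). The helper names `make_command` and `run_make` are hypothetical and not part of the template.

```python
import subprocess
from pathlib import Path

# Only the targets explicitly named in this README; others exist in the
# generated Makefile, but their names are not assumed here.
KNOWN_TARGETS = ("data", "delete_demo")


def make_command(target: str) -> list[str]:
    """Build the argument list for a known make target."""
    if target not in KNOWN_TARGETS:
        raise ValueError(f"Unknown target {target!r}; expected one of {KNOWN_TARGETS}")
    return ["make", target]


def run_make(target: str, project_root: Path) -> None:
    """Run `make <target>` in the project root and raise if it fails."""
    subprocess.run(make_command(target), cwd=project_root, check=True)
```

For example, `run_make("data", Path("."))` would run `make data` in the current folder, assuming it is the root of a generated project.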

Requirements

  • Git: usually included with your OS; otherwise install it
  • GitHub account
  • GitHub CLI
  • Docker Desktop
  • VS Code
  • VS Code extension: Remote Development
  • Cookiecutter Python package, installed with pip:
pip install cookiecutter

For Mac users:

brew install cookiecutter
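Before generating a project, it can help to check that the command-line tools are available. The following sketch looks up each executable on the PATH with `shutil.which`. The helper name `missing_tools` is made up for this example. Note that the `code` launcher is only found if VS Code's shell command is installed, and that this check cannot verify the GitHub account or the Remote Development extension.

```python
import shutil

# Label -> executable name looked up on the PATH.
REQUIRED_TOOLS = {
    "Git": "git",
    "GitHub CLI": "gh",
    "Docker": "docker",
    "VS Code": "code",
    "Cookiecutter": "cookiecutter",
}


def missing_tools(tools: dict[str, str] = REQUIRED_TOOLS) -> list[str]:
    """Return the labels of all tools whose executable is not on the PATH."""
    return [label for label, exe in tools.items() if shutil.which(exe) is None]
```

Calling `missing_tools()` returns an empty list when everything is found, or the labels of the tools that still need to be installed.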

Getting Started

  1. Navigate to the folder where you want to create the project (on your local drive) and run:

    cookiecutter https://github.com/tgoelles/cookiecutter_science

  2. Answer the questions prompted by cookiecutter.

  3. A new VS Code window will open automatically.

  4. Click "OK" to reopen the folder in a container (only asked the first time).

  5. Read the README.md in the generated project folder.

Git and GitHub

Cookiecutter can generate a GitHub repository for you. This initializes the git repo and pushes it to GitHub. You can then invite your team members to join the project.

  • Each team member works on their local version of the project, regularly committing and pushing changes.
  • Avoid working on the same folder over a network.

Note for Windows Users

If you want to use git inside the container (recommended), you need to clone the repo from WSL, as Windows may mess up the .git folder. Git inside the container uses the same .gitconfig as Windows, which is copied into the container.

Ensure user.email and user.name are set (in PowerShell):

git config --global user.name "your_name"
git config --global user.email "your_email@example.com"
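To confirm that both values are set, a small check like the one below can be run wherever Git is used. It is a sketch: the helper names `git_global_value` and `missing_identity` are made up for this example, and it relies only on the standard `git config --global --get <key>` command.

```python
import subprocess
from typing import Callable, Optional


def git_global_value(key: str) -> Optional[str]:
    """Return the global git config value for `key`, or None if it is unset
    or git is not installed."""
    try:
        result = subprocess.run(
            ["git", "config", "--global", "--get", key],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:  # git executable not found
        return None
    value = result.stdout.strip()
    return value if result.returncode == 0 and value else None


def missing_identity(
    lookup: Callable[[str], Optional[str]] = git_global_value,
) -> list[str]:
    """Return the identity keys (user.name, user.email) that are not set."""
    return [key for key in ("user.name", "user.email") if lookup(key) is None]
```

An empty list from `missing_identity()` means both settings are present; otherwise the list names the keys still to be configured.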