GitHub - AgrDataSci/template-repo-data-analysis: Reusable, project-oriented data analysis template designed to align with the FAIR principles. The template offers a structured and scalable approach for managing scientific data and code, particularly suited for collaborative and open science environments.

A FAIR and project-oriented template for open science data workflows

Kauê de Sousa, Marie-Angélique Laporte

Here we introduce a reusable, project-oriented data analysis template designed to align with the FAIR principles (Findable, Accessible, Interoperable, and Reusable). The template offers a structured and scalable approach for managing scientific data and code, particularly suited for collaborative and open science environments. It integrates good practices from R-based workflows, GitHub–Zenodo integration, and metadata standards including DataCite and CFF.

Objectives

The template aims to:

Improve consistency and reproducibility across research projects.
Ensure research outputs are FAIR and ready for long-term archiving.
Provide a ready-to-use metadata structure for data and code publication.
Support automated workflows and interoperability with platforms like Zenodo, GitHub, and institutional repositories.

Repository structure

The project follows a project-oriented layout inspired by best practices in the R community.

template-repo-data-analysis/
├── data/                                       # Anonimized raw and cleaned datasets
├── docs/                                       # Reports or additional documentation
├── metadata/                                   # Metadata files and templates
│   ├── project-metadata.xlsx                   # Excel file with project metadata
│   ├── project-metadata.json                   # JSON file with project metadata for DataCite
│   ├── example-metadata-data-mip-uganda.csv    # Description of the dataset used as example
│   └── README.md    
├── output/                                     # Model results, figures, tables 
├── script/                                     # Scripts for validation, metadata generation, etc.
├── .gitignore                                  # Indicates which files or folders to exclude from version control
├── LICENSE                                     # A valid license file stablishing the rights to use the data
├── CITATION.cff                                # Used by GitHub and Zenodo to generate citation metadata
├── template-repo-data-analysis.Rproj           # RStudio file to set up the environment (must be renamed)
└── README.md                                   # Project overview

Best practices incorporated

Project-oriented structure.
FAIR-compliant metadata with Zenodo and DataCite standards.
Automation via R scripts for JSON conversion and validation.
Citation support with CITATION.cff file.
Clear separation of raw data, outputs, scripts, and documentation.
GitHub–Zenodo integration for DOI minting.

Metadata management

The metadata/ folder contains:

metadata.xlsx: Main spreadsheet with sheets for general metadata, authors, funders, dates, and communities.
metadata.json: DataCite-compliant metadata for institutional or repository submission.
CITATION.cff: Used by GitHub and Zenodo to generate citation metadata (it must be placed on the main root).
README.md: Guide to using the metadata tools.

How to use this template

Clone or fork the repository and open it in RStudio.
Complete metadata.xlsx using the provided structure.
Add your data
Run your analysis
Run the R scripts from the script/ folder to:
- Validate metadata
- Convert to JSON
Publish to Zenodo:
- Enable the GitHub repo in your Zenodo GitHub settings.
- Create a GitHub release. Zenodo will use the CITATION.cff to generate the DOI metadata.

Reusability and extensions

This template is suitable for:

Research projects requiring data publication.
Collaborative projects across CGIAR and university partners.
Students and researchers new to reproducible data workflows.

The structure is extensible and can support:

Additional metadata schemas and controled vocabularies (e.g., Dublin Core, DCAT)
Workflow automation using GitHub Actions or R scripts
Integration with institutional data catalogs

Contribute

This template is open for improvement! You can:

Suggest edits via GitHub Issues
Fork the repository for your own project
Contribute back improvements via pull request

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

A FAIR and project-oriented template for open science data workflows

Objectives

Repository structure

Best practices incorporated

Metadata management

How to use this template

Reusability and extensions

Contribute

References

About

Uh oh!

Releases 2

Packages

Contributors 2

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 49 Commits
data		data
docs		docs
metadata		metadata
output		output
script		script
.gitignore		.gitignore
CITATION.cff		CITATION.cff
LICENSE		LICENSE
README.md		README.md
template-repo-data-analysis.Rproj		template-repo-data-analysis.Rproj

License

AgrDataSci/template-repo-data-analysis

Folders and files

Latest commit

History

Repository files navigation

A FAIR and project-oriented template for open science data workflows

Objectives

Repository structure

Best practices incorporated

Metadata management

How to use this template

Reusability and extensions

Contribute

References

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Contributors 2

Uh oh!

Languages

Packages