
OxfordIHTM/seystats


seystats: Curating Seychelles data and statistics from publicly-available sources

Project Status: WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

This repository is a Docker-containerised, {targets}-based, {renv}-enabled R workflow for the seystats project on curating Seychelles data and statistics from various publicly-available sources.

About the project

In 2022, the Seychelles-Oxford Partnership on research capacity building leveraged the vast research experience and skills of University of Oxford research partners for training and upskilling data analysts from the Ministry of Health (MOH) Seychelles.

From this process, the partnership identified that while the Seychelles is one of the few sub-Saharan African countries with efficient, accurate, and comprehensive data collection and disaggregation across a variety of sectors and metrics, most of these data are not in a format, shape, and structure that is ready for analysis. Much of these rich data remain on paper or in ledgers. Data that are electronic are either stored or distributed in formats that machines cannot readily read for analysis (e.g., Portable Document Format or PDF) or held in a proprietary spreadsheet format (i.e., Microsoft Excel), structured as presentational tables meant for reports rather than for actual analysis.

It is from this context that the initial ideas and motivation for this project arose. The partnership mostly involves individuals whose roles and responsibilities relate to health, so the ad hoc plans prioritised health-related data. During this time, rough and informal plans were drawn up on how the various steps would be implemented and how the required technologies would be resourced. Alongside these, capacity-building on research-related data management and analysis continued within the partnership.

By 2025, three years on from the start of the partnership, very little of these informal, ad hoc plans had progressed or been implemented, while the partnership moved on to further research capacity-building focused on other research skills (e.g., qualitative research), to student placement projects for University of Oxford Masters students, and to other research efforts (e.g., cancer screening, cancer awareness, and cancer quality of care). Throughout this period and across all these activities, the same data challenges identified in 2022 kept cropping up.

It is against this background that the seystats project is being (re-)launched. The current motivation is to move the ideas generated in 2022 in a more productive direction and to demonstrate the advantages of accessible, persistent, and machine-readable/machine-actionable data for catalysing research efforts in Seychelles.

Current sources of data

The project uses only officially-released, publicly-available sources of data on Seychelles. Such data are primarily sourced from Seychelles government websites, either as file downloads or as data embedded in the webpages themselves. Other sources include official government publications that are not released online.

Currently, all datasets available from seystats come from the Seychelles National Bureau of Statistics (NBS), which provides downloads of various official statistics about Seychelles. This release includes data from the NBS Statistical Bulletin on Population and Vital Statistics.

Current available datasets

The currently available datasets from the seystats project are listed and described in the table below.

| Description | Time interval | Filename | Data URL |
|---|---|---|---|
| Registered births by age of mother and birth order | Yearly | births_by_age_birth_order.csv | file |
| Registered births by age of mother and mother’s district of residence | Yearly | births_by_age_district.csv | file |
| Registered births by age of mother | Yearly | births_by_age.csv | file |
| Registered births by birth order | Yearly | births_by_birth_order.csv | file |
| Registered births by mother’s district of residence | Yearly | births_by_district.csv | file |
| Registered births by month of birth registration | Monthly | births_by_month.csv | file |
| Registered births by sex of child | Yearly | births_by_sex.csv | file |
| Registered births total | Yearly | births_total.csv | file |
| Registered deaths by age and sex | Yearly | deaths_by_age_sex.csv | file |
| Registered deaths of infants total | Yearly | deaths_infant_total.csv | file |
| Registered deaths total | Yearly | deaths_total.csv | file |
| Population midyear by age and sex | Yearly | population_midyear_by_age_sex.csv | file |
| Population midyear by age | Yearly | population_midyear_by_age.csv | file |
| Population midyear by district of residence | Yearly | population_midyear_by_district.csv | file |
| Population midyear total | Yearly | population_midyear_total.csv | file |

All available datasets can be found in the data folder of this repository. Other modes of distribution (e.g., a DoltHub SQL database, Zenodo, Figshare) are currently in development and will be available soon.

Accessing the datasets

The datasets curated by seystats can be accessed through the following methods:

Forking and then cloning the project repository

Fork the project repository into your own GitHub account, then clone your fork onto your local machine. This requires a GitHub account and familiarity with git. This approach gives you a copy of the entire repository, whose data directory contains all the datasets listed above. It is ideal for those who want to access the datasets and may also want to contribute to the source code used to curate them.
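If you prefer the command line, the cloning step can be sketched as below. This is a minimal sketch, not part of the seystats tooling: the function name `clone_seystats_fork`, the `upstream` remote name, and the `GITHUB_BASE` override are illustrative assumptions, and it expects that you have already created your fork on GitHub.

```shell
#!/bin/sh
# Sketch: clone your GitHub fork of seystats and register the original
# repository as an "upstream" remote for pulling in future changes.
# GITHUB_BASE is a hypothetical override (e.g. to test against local repositories).
GITHUB_BASE="${GITHUB_BASE:-https://github.com}"

clone_seystats_fork() {
  local username="$1"          # your GitHub username (owner of the fork)
  local dest="${2:-seystats}"  # local directory to clone into
  git clone "${GITHUB_BASE}/${username}/seystats.git" "$dest" &&
    git -C "$dest" remote add upstream "${GITHUB_BASE}/OxfordIHTM/seystats.git"
}
```

For example, `clone_seystats_fork your-username` clones `your-username/seystats` into `./seystats`; afterwards, `git -C seystats fetch upstream` picks up new changes from the original repository. If you use the GitHub CLI, `gh repo fork OxfordIHTM/seystats --clone` performs the fork and the clone in one step.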

Manually download from GitHub

Go to the project repository, open the data directory, and click the dataset CSV file you want to download. In the upper right-hand corner, you will see a downward-pointing arrow icon. Click this icon to download the selected CSV file.

Programmatically download from GitHub

Using the data URLs in the table above, you can programmatically download any dataset of interest with your programming tool of choice.

In Terminal:

curl -OL https://raw.githubusercontent.com/OxfordIHTM/seystats/refs/heads/main/data/births_by_age.csv

In R:

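# Create the destination folder if it does not already exist
dir.create("data", showWarnings = FALSE)

download.file(
  url = "https://raw.githubusercontent.com/OxfordIHTM/seystats/refs/heads/main/data/births_by_age.csv",
  destfile = "data/births_by_age.csv"
)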
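To fetch every dataset in the table at once, the single-file commands above can be wrapped in a loop. The sketch below assumes that all files sit under the same `data/` path on the `main` branch, as in the example URL above; the helper names `seystats_url` and `seystats_download_all` are hypothetical.

```shell
#!/bin/sh
# Sketch: download every seystats dataset CSV from the main branch on GitHub.
# Assumes all files share the base path used in the single-file example.
SEYSTATS_RAW_BASE="https://raw.githubusercontent.com/OxfordIHTM/seystats/refs/heads/main/data"

# Filenames as listed in the table of currently available datasets.
SEYSTATS_FILES="
births_by_age_birth_order.csv
births_by_age_district.csv
births_by_age.csv
births_by_birth_order.csv
births_by_district.csv
births_by_month.csv
births_by_sex.csv
births_total.csv
deaths_by_age_sex.csv
deaths_infant_total.csv
deaths_total.csv
population_midyear_by_age_sex.csv
population_midyear_by_age.csv
population_midyear_by_district.csv
population_midyear_total.csv
"

# Print the raw-download URL for one dataset file.
seystats_url() {
  printf '%s/%s\n' "$SEYSTATS_RAW_BASE" "$1"
}

# Download all datasets into a directory (default: ./data).
# curl -f turns HTTP errors (e.g. 404) into failures instead of saving an error page.
seystats_download_all() {
  local dest="${1:-data}" f
  mkdir -p "$dest"
  for f in $SEYSTATS_FILES; do
    curl -fsSL -o "${dest}/${f}" "$(seystats_url "$f")" || return 1
  done
}
```

Running `seystats_download_all data` saves the 15 CSV files into `./data`, stopping at the first failed download.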

In the release version of seystats, we will be distributing the datasets in repositories and archives that have more straightforward user interfaces for downloading the datasets.

Repository Structure

The project repository is structured as follows:

seystats
    |-- .git-crypt/
    |-- .github/
    |-- auth/
    |-- data/
    |-- maps/
    |-- outputs/
    |-- pdf/
    |-- R/
    |-- renv/
    |-- reports/
    |-- schema/
    |-- _targets.R
    |-- .env
    |-- .gitattributes
    |-- .Rprofile
    |-- packages.R
    |-- renv.lock
  • .git-crypt/ contains git-crypt-specific files used to manage the encryption of selected files and folders in the repository.

  • .github/ contains GitHub Actions workflows for project testing and for automated deployment of outputs through continuous integration and continuous deployment (CI/CD).

  • auth/ contains encrypted authentication keys used in this workflow.

  • data/ contains comma-separated value (CSV) files of the various datasets curated by the project.

  • maps/ contains Seychelles map data files downloaded by the workflow.

  • outputs/ contains compiled reports and figures produced by the workflow.

  • pdf/ contains PDF files downloaded by the workflow for data extraction.

  • R/ contains R functions developed/created specifically for use in this project.

  • renv/ contains renv-specific files and directories used by the package to maintain R package dependencies within the project. The renv/library directory is a library containing all packages currently used by the project. This directory, and all files and sub-directories within it, are generated and managed by the renv package. Users should not change or edit these manually.

  • reports/ contains literate code for R Markdown and/or Quarto reports rendered in the workflow.

  • schema/ contains .sql code used for creating and deploying the project SQL database in DoltHub.

  • _targets.R file defines the steps in the workflow’s data ingest, data processing, data outputs, and reporting pipeline.

  • .env is an encrypted file that contains environment variables used in this project.

  • .gitattributes file contains information used by git-crypt to determine which files and/or folders in the repository to encrypt.

  • .Rprofile file is a project R profile generated when initiating renv for the first time. This file is run automatically every time R is run within this project, and renv uses it to configure the R session to use the renv project library.

  • packages.R file lists out and loads all R package dependencies required by the workflow.

  • renv.lock file is the renv lockfile which records enough metadata about every package used in this project that it can be re-installed on a new machine. This file is generated by the renv package and should not be changed/edited manually.
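To check which paths git-crypt will encrypt, you can ask git which `filter` attribute `.gitattributes` assigns to a path. In the sketch below, the example rules for `.env` and `auth/` follow git-crypt's usual `filter=git-crypt diff=git-crypt` convention and are assumptions based on the descriptions above, not a copy of the project's actual `.gitattributes`; the helper `is_git_crypt_path` is hypothetical.

```shell
#!/bin/sh
# Sketch: report whether git-crypt would encrypt a path, based on the
# "filter" attribute that .gitattributes assigns to it.
# Illustrative git-crypt rules in .gitattributes (assumed, not the project's file):
#   .env filter=git-crypt diff=git-crypt
#   auth/** filter=git-crypt diff=git-crypt

# Succeed if the path in $1 has filter=git-crypt. Run inside the repository.
is_git_crypt_path() {
  local value
  # `git check-attr filter -- PATH` prints "PATH: filter: VALUE",
  # where VALUE is "unspecified" when no rule matches.
  value=$(git check-attr filter -- "$1" | awk -F': ' '{ print $NF }')
  [ "$value" = "git-crypt" ]
}
```

Run it from inside the repository, e.g. `is_git_crypt_path .env && echo encrypted`. For a repository-wide view, `git-crypt status` lists which files are encrypted.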

Reproducibility

R version

This project is built using R 4.5.1. To manage R versions, we recommend rig, an R installation manager that can install multiple versions of R and switch between them as needed.
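Before restoring packages, it can help to confirm that the R on your `PATH` is the version the project was built with. The sketch below hard-codes 4.5.1 from the text above; the function names are hypothetical, and the `rig add`/`rig default` suggestion assumes rig is installed.

```shell
#!/bin/sh
# Sketch: compare the R on your PATH with the version this project uses.
REQUIRED_R_VERSION="4.5.1"  # version stated in the Reproducibility section

# Print the version of the R found on PATH, e.g. "4.5.1".
installed_r_version() {
  Rscript -e 'cat(as.character(getRversion()))'
}

# Succeed if $1 equals the required version; otherwise explain and fail.
check_r_version() {
  if [ "$1" = "$REQUIRED_R_VERSION" ]; then
    echo "R $1 matches the project version."
  else
    echo "Found R $1, but this project uses R $REQUIRED_R_VERSION." >&2
    echo "With rig: rig add $REQUIRED_R_VERSION && rig default $REQUIRED_R_VERSION" >&2
    return 1
  fi
}
```

Typical usage is `check_r_version "$(installed_r_version)"`. An exact-match check is deliberate here, since the lockfile was recorded against that specific R version.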

R package dependencies

This project uses the {renv} framework to record R package dependencies and versions. Packages and their versions are recorded in renv.lock, and the code used to manage dependencies is in the renv/ directory and in other files in the project root directory.

On starting an R session in the working directory of this repository, first run

renv::restore()

to install R package dependencies. This only needs to be done once, when a user sets up the project for the first time.

Encryption

This project uses encrypted environment variables and authentication keys for data retrieval, managed using git-crypt. Collaborators need to install git-crypt and then provide their GPG key to the authors so that it can be added as an authorised user of the repository. To get a GPG key, download and install GPG, generate a GPG key pair, and then provide your GPG key ID to the authors.
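The key-ID step can be scripted as follows. This is a sketch that assumes GnuPG 2.x (where `gpg --full-generate-key` is the interactive key-generation command) and parses GnuPG's machine-readable `--with-colons` listing; the helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: find the long key ID(s) of your GPG secret key(s) to send to the
# authors. Generate a key pair first (interactive prompts) with:
#   gpg --full-generate-key

# Extract key IDs from `gpg --with-colons` output read on stdin.
# "sec" records describe secret primary keys; field 5 holds the long key ID.
parse_secret_key_ids() {
  awk -F: '$1 == "sec" { print $5 }'
}

# List the long key IDs of all secret keys in your local keyring.
my_gpg_key_ids() {
  gpg --list-secret-keys --with-colons | parse_secret_key_ids
}
```

Run `my_gpg_key_ids` and send the resulting ID to the authors. On the repository side, git-crypt's `add-gpg-user` command (`git-crypt add-gpg-user KEY_ID`) is the usual way a collaborator's key is registered.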

Once you have been given access to the project and your GPG key ID has been added to the repository, update your local copy of the repository with git pull, and then unlock the encrypted files and folders by running the following command in Terminal from within the project directory:

git-crypt unlock

The encrypted components of the repository will now be decrypted and accessible for running the workflow (described below).

The workflow

The current workflow has the following steps:

graph LR
  style Graph fill:#FFFFFF00,stroke:#000000;
  subgraph Graph
    direction LR
    x39c53f3806f354bf["births_by_district_pages"]:::skipped --> xb5d471b223f71093["births_by_age"]:::skipped
    x068156701a18b444["population_endyear_bulletin_text"]:::skipped --> xb5d471b223f71093["births_by_age"]:::skipped
    x98cd2c1d9abf872a["births_by_birth_order_pages"]:::skipped --> xfccd4550cd074700["births_by_age_birth_order"]:::skipped
    x068156701a18b444["population_endyear_bulletin_text"]:::skipped --> xfccd4550cd074700["births_by_age_birth_order"]:::skipped
    xfccd4550cd074700["births_by_age_birth_order"]:::skipped --> x0c11dddedf200cf6(["births_by_age_birth_order_csv"]):::skipped
    xb5d471b223f71093["births_by_age"]:::skipped --> xbbabd51f8df64492(["births_by_age_csv"]):::skipped
    x39c53f3806f354bf["births_by_district_pages"]:::skipped --> x4094d4f6d0f8f35a["births_by_age_district"]:::skipped
    x068156701a18b444["population_endyear_bulletin_text"]:::skipped --> x4094d4f6d0f8f35a["births_by_age_district"]:::skipped
    x4094d4f6d0f8f35a["births_by_age_district"]:::skipped --> x11faadcda3280dc2(["births_by_age_district_csv"]):::skipped
    x98cd2c1d9abf872a["births_by_birth_order_pages"]:::skipped --> x8f0b9b44ecb1c0ea["births_by_birth_order"]:::skipped
    x068156701a18b444["population_endyear_bulletin_text"]:::skipped --> x8f0b9b44ecb1c0ea["births_by_birth_order"]:::skipped
    x8f0b9b44ecb1c0ea["births_by_birth_order"]:::skipped --> x516b8a2e1c1f1ca7(["births_by_birth_order_csv"]):::skipped
    x068156701a18b444["population_endyear_bulletin_text"]:::skipped --> x98cd2c1d9abf872a["births_by_birth_order_pages"]:::skipped
    x068156701a18b444["population_endyear_bulletin_text"]:::skipped --> x0c2c53d9eeb365ed["births_by_district"]:::skipped
    x39c53f3806f354bf["births_by_district_pages"]:::skipped --> x0c2c53d9eeb365ed["births_by_district"]:::skipped
    x0c2c53d9eeb365ed["births_by_district"]:::skipped --> x45f5b18e27a4d0fe(["births_by_district_csv"]):::skipped
    x068156701a18b444["population_endyear_bulletin_text"]:::skipped --> x39c53f3806f354bf["births_by_district_pages"]:::skipped
    x068156701a18b444["population_endyear_bulletin_text"]:::skipped --> x2c584c9caafc1be8["births_by_month"]:::skipped
    x5c7646da106bc2f6["births_endyear_monthly_pages"]:::skipped --> x2c584c9caafc1be8["births_by_month"]:::skipped
    x2c584c9caafc1be8["births_by_month"]:::skipped --> xa87bb9563f27e00c(["births_by_month_csv"]):::skipped
    x5c7646da106bc2f6["births_endyear_monthly_pages"]:::skipped --> x1e485a7f2384826f["births_by_sex"]:::skipped
    x068156701a18b444["population_endyear_bulletin_text"]:::skipped --> x1e485a7f2384826f["births_by_sex"]:::skipped
    x1e485a7f2384826f["births_by_sex"]:::skipped --> xa6b75ce42a7aa79e(["births_by_sex_csv"]):::skipped
    x068156701a18b444["population_endyear_bulletin_text"]:::skipped --> x5c7646da106bc2f6["births_endyear_monthly_pages"]:::skipped
    x068156701a18b444["population_endyear_bulletin_text"]:::skipped --> xe648a7801cd2da2c["births_endyear_pages"]:::skipped
    xe648a7801cd2da2c["births_endyear_pages"]:::skipped --> x2f4500c2756065a9["births_total"]:::skipped
    x068156701a18b444["population_endyear_bulletin_text"]:::skipped --> x2f4500c2756065a9["births_total"]:::skipped
    x2f4500c2756065a9["births_total"]:::skipped --> xd95928afea598d8e(["births_total_csv"]):::skipped
    x344a2780ffaeb7bd["deaths_endyear_pages"]:::skipped --> xae333981c466810a["deaths_by_age_sex"]:::skipped
    x068156701a18b444["population_endyear_bulletin_text"]:::skipped --> xae333981c466810a["deaths_by_age_sex"]:::skipped
    xae333981c466810a["deaths_by_age_sex"]:::skipped --> xaf8165dda7ea936b(["deaths_by_age_sex_csv"]):::skipped
    x068156701a18b444["population_endyear_bulletin_text"]:::skipped --> x344a2780ffaeb7bd["deaths_endyear_pages"]:::skipped
    x068156701a18b444["population_endyear_bulletin_text"]:::skipped --> x1d5c00f1e4e41ab7["deaths_infant_total"]:::skipped
    xe648a7801cd2da2c["births_endyear_pages"]:::skipped --> x1d5c00f1e4e41ab7["deaths_infant_total"]:::skipped
    x1d5c00f1e4e41ab7["deaths_infant_total"]:::skipped --> x548dbfa0844427dc(["deaths_infant_total_csv"]):::skipped
    xe648a7801cd2da2c["births_endyear_pages"]:::skipped --> xfd0ff0d9529d4bbd["deaths_total"]:::skipped
    x068156701a18b444["population_endyear_bulletin_text"]:::skipped --> xfd0ff0d9529d4bbd["deaths_total"]:::skipped
    xfd0ff0d9529d4bbd["deaths_total"]:::skipped --> xe42488d3267f69ff(["deaths_total_csv"]):::skipped
    x8e509dc7997a12f8(["map_download_files"]):::skipped --> x4851b2941f7c62fc(["map_adm0"]):::skipped
    x8e509dc7997a12f8(["map_download_files"]):::skipped --> xccb26dd891c9a035(["map_adm1"]):::skipped
    x8e509dc7997a12f8(["map_download_files"]):::skipped --> x30d02f8bff8e7f8d(["map_adm2"]):::skipped
    x8e509dc7997a12f8(["map_download_files"]):::skipped --> x7b26bed1fc581742(["map_adm3"]):::skipped
    x90a781ac8daf46c6(["population_bulletin_download_links"]):::skipped --> x303dcb35f327bc97(["population_bulletin_download_files"]):::skipped
    xa7b0e3bf1c25b597(["categories_download_links"]):::skipped --> x90a781ac8daf46c6(["population_bulletin_download_links"]):::skipped
    x303dcb35f327bc97(["population_bulletin_download_files"]):::skipped --> xeea0ec396e5de5da(["population_endyear_bulletin_files"]):::skipped
    xeea0ec396e5de5da(["population_endyear_bulletin_files"]):::skipped --> x068156701a18b444["population_endyear_bulletin_text"]:::skipped
    x0d01604fe60472ef["population_midyear_bulletin_text"]:::skipped --> x9f6b6d2ed74a37b4["population_midyear_bulletin_district_pages"]:::skipped
    x303dcb35f327bc97(["population_bulletin_download_files"]):::skipped --> x52e965cb7c1cd1fc(["population_midyear_bulletin_files"]):::skipped
    x0d01604fe60472ef["population_midyear_bulletin_text"]:::skipped --> xefa1a60843915f9f["population_midyear_bulletin_pages"]:::skipped
    x52e965cb7c1cd1fc(["population_midyear_bulletin_files"]):::skipped --> x0d01604fe60472ef["population_midyear_bulletin_text"]:::skipped
    xefa1a60843915f9f["population_midyear_bulletin_pages"]:::skipped --> x946771421caaa150["population_midyear_by_age"]:::skipped
    x0d01604fe60472ef["population_midyear_bulletin_text"]:::skipped --> x946771421caaa150["population_midyear_by_age"]:::skipped
    x946771421caaa150["population_midyear_by_age"]:::skipped --> xa1686cc888963231(["population_midyear_by_age_csv"]):::skipped
    xefa1a60843915f9f["population_midyear_bulletin_pages"]:::skipped --> x426614807f974316["population_midyear_by_age_sex"]:::skipped
    x0d01604fe60472ef["population_midyear_bulletin_text"]:::skipped --> x426614807f974316["population_midyear_by_age_sex"]:::skipped
    x426614807f974316["population_midyear_by_age_sex"]:::skipped --> xf3039e66b37a1219(["population_midyear_by_age_sex_csv"]):::skipped
    x7b26bed1fc581742(["map_adm3"]):::skipped --> xd0c8e8b884ab8581["population_midyear_by_district"]:::skipped
    x9f6b6d2ed74a37b4["population_midyear_bulletin_district_pages"]:::skipped --> xd0c8e8b884ab8581["population_midyear_by_district"]:::skipped
    x0d01604fe60472ef["population_midyear_bulletin_text"]:::skipped --> xd0c8e8b884ab8581["population_midyear_by_district"]:::skipped
    xd0c8e8b884ab8581["population_midyear_by_district"]:::skipped --> xf94a6bce4cea6c14(["population_midyear_by_district_csv"]):::skipped
    x068156701a18b444["population_endyear_bulletin_text"]:::skipped --> x52d6a6119be69714["population_midyear_total"]:::skipped
    xe648a7801cd2da2c["births_endyear_pages"]:::skipped --> x52d6a6119be69714["population_midyear_total"]:::skipped
    x52d6a6119be69714["population_midyear_total"]:::skipped --> x37b834e41770c27f(["population_midyear_total_csv"]):::skipped
    
  end

To run the workflow, issue the following command in R from within the project directory:

targets::tar_make()

or issue the following command in Terminal from within the project directory:

Rscript -e "targets::tar_make()"
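During development you may only want to rebuild part of the pipeline. `targets::tar_make()` accepts a `names` argument (a tidyselect expression) for this; the shell sketch below wraps it for use from Terminal. The helper names are hypothetical, and the target names in the example are taken from the workflow graph above.

```shell
#!/bin/sh
# Sketch: build only selected targets instead of the whole pipeline.

# Build the R expression passed to Rscript, e.g.
#   targets::tar_make(names = c(births_by_age_csv, births_total_csv))
tar_make_expr() {
  local names
  names=$(printf '%s, ' "$@")
  names=${names%, }  # drop the trailing ", "
  printf 'targets::tar_make(names = c(%s))\n' "$names"
}

# Run the selected targets; call this from the project directory.
tar_make_only() {
  Rscript -e "$(tar_make_expr "$@")"
}
```

For example, `tar_make_only births_by_age_csv` rebuilds `births_by_age_csv` along with any outdated targets upstream of it, since `tar_make()` checks upstream dependencies of the named targets by default.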

Authors and contributors

Authors

  • Prof. Proochista Ariana
  • Dr. Aronrag Meeyai
  • Dr. Sylvie Pool
  • Dr. Sanjeev Pugazhendhi
  • Ituen Williams-Umanah
  • Ned Rosalie
  • Keddy Woodcock

The Seychelles-Oxford Partnership, from which this project came about, was made possible through the efforts of Prof. Proochista Ariana, Dr. Aronrag Meeyai, and Dr. Sylvie Pool. The original codebases on which this project was built were written by Dr. Sanjeev Pugazhendhi, Ituen Williams-Umanah, Ned Rosalie, and Keddy Woodcock.

Contributors

  • Dr. Johanna Rapanarilala
  • Dr. Carine Asnong
  • Prof. Bruno Holthof
  • Dr. Giri Rajahram
  • Dr. Bushra Naz
  • Dr. Yih Seong Wong
  • Anita Makori
  • Dr. Jillian Francise Lee
  • Neira Budiono
  • Dr. Nyasha Manyeruke
  • Dr. Ibrahim Ajami

This project would also not have been possible without the contributions of Dr. Johanna Rapanarilala, who supervised and mentored several University of Oxford MSc in International Health and Tropical Medicine students who came to Seychelles for their MSc study placement, and of Dr. Carine Asnong and Prof. Bruno Holthof, who contributed to teaching research skills, data analysis, and healthcare management and quality of care to members of the Seychelles Ministry of Health. Finally, the feedback, insights, and learning gained from Dr. Giri Rajahram, Dr. Bushra Naz, Dr. Yih Seong Wong, Anita Makori, Dr. Jillian Francise Lee, Neira Budiono, Dr. Nyasha Manyeruke, and Dr. Ibrahim Ajami (students who spent their study placement in Seychelles) helped identify critical and priority datasets to include in this project.

The project is currently maintained by Ernest Guevarra.

Disclaimer

This project is an independent effort by members of the Seychelles-Oxford Partnership in support of group and individual data needs and goals. This project is not recognised by the Seychelles government or by the ministries/agencies with which our Seychellois colleagues are affiliated. Any issues or problems arising from the seystats datasets, or from participating or contributing in the development of this project, are the responsibility of the authors and maintainers of this project and should be addressed to them, not to the ministries/agencies/organisations that made the data available.

This effort aims to serve as an example of how data can be curated and managed in a manner that is effective, efficient, machine-actionable, and analysis-ready using widely-available, open source tools. Given the open source nature of this project, we hope that it can either be handed over to relevant official entities within Seychelles for continued upkeep and maintenance, or become a basis for official efforts within Seychelles to streamline its data curation and management processes and make them more effective and efficient.

License

All code in this project is released under a GPL-3.0 license. All text in this project is released under a CC-BY-4.0 license. All data is released under a CC0 license.

Citation

If you use the data provided through seystats in your work or research, please cite seystats along with all the data sources used to curate the data provided here. The suggested citation metadata is provided in CITATION.cff.

Community guidelines

Feedback, bug reports and feature requests are welcome; file issues or seek support here. If you would like to contribute to the project, please see our contributing guidelines.

This project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
