This repository is a
docker-containerised,
{targets}-based,
{renv}-enabled
R workflow for the seystats project
on curating Seychelles data and statistics from various
publicly-available sources.
In 2022, the Seychelles-Oxford Partnership on research capacity building leveraged the vast research experience and skills of University of Oxford research partners for training and upskilling data analysts from the Ministry of Health (MOH) Seychelles.
From this process, the partnership identified that while Seychelles is one of the few sub-Saharan African countries with efficient, accurate, and comprehensive data collection and disaggregation across a variety of sectors and metrics, most of these data are not in a format, shape, and structure that is ready for analysis. A good amount of these rich data are still on paper or in ledgers. Data that are electronic are either stored/distributed in formats that are not readily machine-readable for analysis (e.g., portable document format or PDF) or are in a proprietary spreadsheet format (i.e. Microsoft Excel) structured into presentational tables meant for reports rather than for actual analysis.
It is from this context that the initial ideas and motivation around this project began. The partnership involved mostly individuals whose roles and responsibilities were related to health. As such, ad hoc plans prioritised health-related data. During this time, rough and informal plans were drawn up as to how the various steps would be implemented and how the different technologies required would be resourced. Alongside these, ongoing capacity-building on data management and analysis related to research continued within the partnership.
By 2025, three years on from the start of the partnership, very little of these informal, ad hoc plans had progressed or been implemented, whilst the partnership continued with more research capacity-building focusing on other types of research skills (e.g. qualitative research), on student placement projects for University of Oxford Masters students, and on other research efforts (e.g. cancer screening, cancer awareness, cancer quality-of-care). During this period and in all these activities, the same data-related challenges and issues identified in 2022 kept cropping up.
It is within this background that the seystats project is being
(re-)launched. The current motivation is to try to get moving in a more
productive direction on the ideas generated in 2022 and to be able to
demonstrate the stated advantages of data that are accessible,
persistent, and machine-readable/machine-actionable in catalysing
research efforts in Seychelles.
The project uses only officially-released publicly-available sources of data on Seychelles. Such data are primarily sourced from Seychelles government websites either as file downloads or data embedded onto the webpages themselves. Other sources are official government publications not released online.
Currently, the available datasets from seystats are from the
Seychelles National Bureau of Statistics
(NBS) which provides
downloads of various official
statistics for/about Seychelles. This current release includes data from
the NBS Statistical Bulletin on Population and Vital
Statistics.
The currently available datasets from the seystats project are listed
and described in the table below.
| Description | Time Interval | Filename | Data URL |
|---|---|---|---|
| Registered births by age of mother and birth order | Yearly | births_by_age_birth_order.csv | file |
| Registered births by age of mother and mother’s district of residence | Yearly | births_by_age_district.csv | file |
| Registered births by age of mother | Yearly | births_by_age.csv | file |
| Registered births by birth order | Yearly | births_by_birth_order.csv | file |
| Registered births by mother’s district of residence | Yearly | births_by_district.csv | file |
| Registered births by month of birth registration | Monthly | births_by_month.csv | file |
| Registered births by sex of child | Yearly | births_by_sex.csv | file |
| Registered births total | Yearly | births_total.csv | file |
| Registered deaths by age and sex | Yearly | deaths_by_age_sex.csv | file |
| Registered deaths of infants total | Yearly | deaths_infant_total.csv | file |
| Registered deaths total | Yearly | deaths_total.csv | file |
| Population midyear by age and sex | Yearly | population_midyear_by_age_sex.csv | file |
| Population midyear by age | Yearly | population_midyear_by_age.csv | file |
| Population midyear by district of residence | Yearly | population_midyear_by_district.csv | file |
| Population midyear total | Yearly | population_midyear_total.csv | file |
All available datasets can be found in the data folder of this
repository. Other modes of distribution (e.g. DoltHub SQL database,
Zenodo, Figshare, etc.) are currently in development and will be
available soon.
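For those working from a local copy of the repository, a minimal sketch of reading every CSV file in the data folder into a named list in R might look like the following. The function name `read_seystats_data()` and its `data_dir` argument are illustrative assumptions and not part of the project's code, and no assumptions are made about the columns of each dataset.

```r
# Hypothetical helper (not part of seystats): read all CSV files found in a
# directory into a named list of data frames, one element per file.
read_seystats_data <- function(data_dir = "data") {
  csv_files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)

  datasets <- lapply(csv_files, read.csv, stringsAsFactors = FALSE)

  # Name each element after its file, dropping the .csv extension
  names(datasets) <- tools::file_path_sans_ext(basename(csv_files))

  datasets
}

# Usage, from the root of a local copy of the repository:
# seystats <- read_seystats_data("data")
# str(seystats$births_by_age)
```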
The datasets curated by seystats can be accessed through the following
methods:
Fork
a copy of the project repository into your own GitHub account then
clone
your copy of the project repository into your local machine. This
requires a GitHub account and knowledge of git processes. This approach
will give you a copy of the entire repository of which the data
directory contains all the datasets listed above. This approach would be
ideal for those who would like to access the datasets but also would
like to potentially contribute to the source code for the curation of
the datasets.
Go to the project repository, then to the data directory and then
select and click the dataset CSV file you want to download. On the upper
right-hand corner you will see a downward-pointing arrow icon. Click on
this icon to download the selected dataset CSV file.
Using the data URL indicated in the table above, you can programmatically download the dataset of interest using your choice of programming tool.
In Terminal:

    curl -OL https://raw.githubusercontent.com/OxfordIHTM/seystats/refs/heads/main/data/births_by_age.csv

In R:

    download.file(
      url = "https://raw.githubusercontent.com/OxfordIHTM/seystats/refs/heads/main/data/births_by_age.csv",
      destfile = "data/births_by_age.csv"
    )

In the release version of seystats, we will be distributing the
datasets in repositories and archives that have more straightforward
user interfaces for downloading the datasets.
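The R download above can be generalised with a small helper that builds the download URL from a dataset filename. This is only a sketch: `seystats_url()` and `read_seystats_csv()` are hypothetical names, and it assumes that every dataset in the table follows the same URL pattern as the `births_by_age.csv` example.

```r
# Base URL taken from the births_by_age.csv example; it is assumed here that
# the other datasets listed in the table follow the same pattern.
seystats_base_url <- paste0(
  "https://raw.githubusercontent.com/OxfordIHTM/seystats/",
  "refs/heads/main/data"
)

# Hypothetical helper: build the raw download URL for a dataset filename
seystats_url <- function(filename, base_url = seystats_base_url) {
  paste(base_url, filename, sep = "/")
}

# Hypothetical helper: read a dataset straight from its URL into a data frame
read_seystats_csv <- function(filename) {
  read.csv(seystats_url(filename), stringsAsFactors = FALSE)
}

# Usage (requires an internet connection):
# births_by_age <- read_seystats_csv("births_by_age.csv")
# head(births_by_age)
```

Reading directly from the URL avoids keeping a local file, while `download.file()` is the better fit when a persistent copy is wanted.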
The project repository is structured as follows:
seystats
|-- .git-crypt/
|-- .github/
|-- auth/
|-- data/
|-- maps/
|-- outputs/
|-- pdf/
|-- R/
|-- renv/
|-- reports/
|-- schema/
|-- _targets.R
|-- .env
|-- .gitattributes
|-- .Rprofile
|-- packages.R
|-- renv.lock
- `.git-crypt/` contains `git-crypt` software-specific files to manage encryption of specific files and folders in the repository.
- `.github/` contains workflows for project testing and automated deployment of outputs via continuous integration and continuous deployment (CI/CD) using GitHub Actions.
- `auth/` contains encrypted authentication keys used in this workflow.
- `data/` contains comma-separated value (CSV) files of the various datasets curated by the project.
- `maps/` contains Seychelles map data files downloaded by the workflow.
- `outputs/` contains compiled reports and figures produced by the workflow.
- `pdf/` contains PDF files downloaded by the workflow for data extraction.
- `R/` contains R functions developed/created specifically for use in this project.
- `renv/` contains `renv` package-specific files and directories used by the package for maintaining R package dependencies within the project. The directory `renv/library` is a library that contains all packages currently used by the project. This directory, and all files and sub-directories within it, are generated and managed by the `renv` package. Users should not change/edit these manually.
- `reports/` contains literate code for R Markdown and/or Quarto reports rendered in the workflow.
- `schema/` contains `.sql` code used for creating and deploying the project SQL database in DoltHub.
- `_targets.R` file defines the steps in the workflow’s data ingest, data processing, data outputs, and reporting pipeline.
- `.env` is an encrypted file that contains environment variables used in this project.
- `.gitattributes` file contains information used by `git-crypt` to determine which files and/or folders in the repository to encrypt.
- `.Rprofile` file is a project R profile generated when initiating `renv` for the first time. This file is run automatically every time R is run within this project, and `renv` uses it to configure the R session to use the `renv` project library.
- `packages.R` file lists and loads all R package dependencies required by the workflow.
- `renv.lock` file is the `renv` lockfile, which records enough metadata about every package used in this project that it can be re-installed on a new machine. This file is generated by the `renv` package and should not be changed/edited manually.
This project is built using R 4.5.1. To manage R versions, it is
recommended to use rig - an R
installation manager - to be able to install multiple versions of R and
switch between them as needed.
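As a quick check before running anything, the R version of the current session can be compared with the version stated above. The helper below is a hypothetical sketch and not part of the project; it only reports a mismatch and does not change the installed R version.

```r
# Hypothetical helper: compare the running R version with the version this
# project states it was built with (4.5.1).
check_r_version <- function(expected = "4.5.1", current = getRversion()) {
  matches <- package_version(as.character(current)) == package_version(expected)

  if (!matches) {
    message(
      "This project was built with R ", expected,
      " but the current session is R ", as.character(current), "."
    )
  }

  # Return the result invisibly so it can be used in scripts
  invisible(matches)
}

check_r_version()
```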
This project uses the {renv} framework to record R package
dependencies and versions. Packages and versions used are recorded in
renv.lock and code used to manage dependencies is in the renv
directory and other files in the root project directory.
On starting an R session in the working directory of this repository, first run `renv::restore()` to install R package dependencies. This only needs to be done once, when a user sets up the project for the first time.
This project uses encrypted environment variables and authentication
keys for data retrieval managed using
git-crypt. Collaborators will
need to install git-crypt and
then provide their GPG key to the authors to be added as an authorised
user within the repository. To get a GPG key, download and install
GPG and then generate your GPG key
pair. Then provide your
GPG key id to the authors.
Once you have been given access to the project and your GPG key id has
been added to the repository, update your local version of the repository
with a git pull and then unlock the encrypted files/folders of the repository by
running the following command in Terminal from within the project
directory:

    git-crypt unlock

The encrypted components of the repository will now be decrypted and accessible for running the workflow (described below).
The current workflow has the following steps:
graph LR
style Graph fill:#FFFFFF00,stroke:#000000;
subgraph Graph
direction LR
x39c53f3806f354bf["births_by_district_pages"]:::skipped --> xb5d471b223f71093["births_by_age"]:::skipped
x068156701a18b444["population_endyear_bulletin_text"]:::skipped --> xb5d471b223f71093["births_by_age"]:::skipped
x98cd2c1d9abf872a["births_by_birth_order_pages"]:::skipped --> xfccd4550cd074700["births_by_age_birth_order"]:::skipped
x068156701a18b444["population_endyear_bulletin_text"]:::skipped --> xfccd4550cd074700["births_by_age_birth_order"]:::skipped
xfccd4550cd074700["births_by_age_birth_order"]:::skipped --> x0c11dddedf200cf6(["births_by_age_birth_order_csv"]):::skipped
xb5d471b223f71093["births_by_age"]:::skipped --> xbbabd51f8df64492(["births_by_age_csv"]):::skipped
x39c53f3806f354bf["births_by_district_pages"]:::skipped --> x4094d4f6d0f8f35a["births_by_age_district"]:::skipped
x068156701a18b444["population_endyear_bulletin_text"]:::skipped --> x4094d4f6d0f8f35a["births_by_age_district"]:::skipped
x4094d4f6d0f8f35a["births_by_age_district"]:::skipped --> x11faadcda3280dc2(["births_by_age_district_csv"]):::skipped
x98cd2c1d9abf872a["births_by_birth_order_pages"]:::skipped --> x8f0b9b44ecb1c0ea["births_by_birth_order"]:::skipped
x068156701a18b444["population_endyear_bulletin_text"]:::skipped --> x8f0b9b44ecb1c0ea["births_by_birth_order"]:::skipped
x8f0b9b44ecb1c0ea["births_by_birth_order"]:::skipped --> x516b8a2e1c1f1ca7(["births_by_birth_order_csv"]):::skipped
x068156701a18b444["population_endyear_bulletin_text"]:::skipped --> x98cd2c1d9abf872a["births_by_birth_order_pages"]:::skipped
x068156701a18b444["population_endyear_bulletin_text"]:::skipped --> x0c2c53d9eeb365ed["births_by_district"]:::skipped
x39c53f3806f354bf["births_by_district_pages"]:::skipped --> x0c2c53d9eeb365ed["births_by_district"]:::skipped
x0c2c53d9eeb365ed["births_by_district"]:::skipped --> x45f5b18e27a4d0fe(["births_by_district_csv"]):::skipped
x068156701a18b444["population_endyear_bulletin_text"]:::skipped --> x39c53f3806f354bf["births_by_district_pages"]:::skipped
x068156701a18b444["population_endyear_bulletin_text"]:::skipped --> x2c584c9caafc1be8["births_by_month"]:::skipped
x5c7646da106bc2f6["births_endyear_monthly_pages"]:::skipped --> x2c584c9caafc1be8["births_by_month"]:::skipped
x2c584c9caafc1be8["births_by_month"]:::skipped --> xa87bb9563f27e00c(["births_by_month_csv"]):::skipped
x5c7646da106bc2f6["births_endyear_monthly_pages"]:::skipped --> x1e485a7f2384826f["births_by_sex"]:::skipped
x068156701a18b444["population_endyear_bulletin_text"]:::skipped --> x1e485a7f2384826f["births_by_sex"]:::skipped
x1e485a7f2384826f["births_by_sex"]:::skipped --> xa6b75ce42a7aa79e(["births_by_sex_csv"]):::skipped
x068156701a18b444["population_endyear_bulletin_text"]:::skipped --> x5c7646da106bc2f6["births_endyear_monthly_pages"]:::skipped
x068156701a18b444["population_endyear_bulletin_text"]:::skipped --> xe648a7801cd2da2c["births_endyear_pages"]:::skipped
xe648a7801cd2da2c["births_endyear_pages"]:::skipped --> x2f4500c2756065a9["births_total"]:::skipped
x068156701a18b444["population_endyear_bulletin_text"]:::skipped --> x2f4500c2756065a9["births_total"]:::skipped
x2f4500c2756065a9["births_total"]:::skipped --> xd95928afea598d8e(["births_total_csv"]):::skipped
x344a2780ffaeb7bd["deaths_endyear_pages"]:::skipped --> xae333981c466810a["deaths_by_age_sex"]:::skipped
x068156701a18b444["population_endyear_bulletin_text"]:::skipped --> xae333981c466810a["deaths_by_age_sex"]:::skipped
xae333981c466810a["deaths_by_age_sex"]:::skipped --> xaf8165dda7ea936b(["deaths_by_age_sex_csv"]):::skipped
x068156701a18b444["population_endyear_bulletin_text"]:::skipped --> x344a2780ffaeb7bd["deaths_endyear_pages"]:::skipped
x068156701a18b444["population_endyear_bulletin_text"]:::skipped --> x1d5c00f1e4e41ab7["deaths_infant_total"]:::skipped
xe648a7801cd2da2c["births_endyear_pages"]:::skipped --> x1d5c00f1e4e41ab7["deaths_infant_total"]:::skipped
x1d5c00f1e4e41ab7["deaths_infant_total"]:::skipped --> x548dbfa0844427dc(["deaths_infant_total_csv"]):::skipped
xe648a7801cd2da2c["births_endyear_pages"]:::skipped --> xfd0ff0d9529d4bbd["deaths_total"]:::skipped
x068156701a18b444["population_endyear_bulletin_text"]:::skipped --> xfd0ff0d9529d4bbd["deaths_total"]:::skipped
xfd0ff0d9529d4bbd["deaths_total"]:::skipped --> xe42488d3267f69ff(["deaths_total_csv"]):::skipped
x8e509dc7997a12f8(["map_download_files"]):::skipped --> x4851b2941f7c62fc(["map_adm0"]):::skipped
x8e509dc7997a12f8(["map_download_files"]):::skipped --> xccb26dd891c9a035(["map_adm1"]):::skipped
x8e509dc7997a12f8(["map_download_files"]):::skipped --> x30d02f8bff8e7f8d(["map_adm2"]):::skipped
x8e509dc7997a12f8(["map_download_files"]):::skipped --> x7b26bed1fc581742(["map_adm3"]):::skipped
x90a781ac8daf46c6(["population_bulletin_download_links"]):::skipped --> x303dcb35f327bc97(["population_bulletin_download_files"]):::skipped
xa7b0e3bf1c25b597(["categories_download_links"]):::skipped --> x90a781ac8daf46c6(["population_bulletin_download_links"]):::skipped
x303dcb35f327bc97(["population_bulletin_download_files"]):::skipped --> xeea0ec396e5de5da(["population_endyear_bulletin_files"]):::skipped
xeea0ec396e5de5da(["population_endyear_bulletin_files"]):::skipped --> x068156701a18b444["population_endyear_bulletin_text"]:::skipped
x0d01604fe60472ef["population_midyear_bulletin_text"]:::skipped --> x9f6b6d2ed74a37b4["population_midyear_bulletin_district_pages"]:::skipped
x303dcb35f327bc97(["population_bulletin_download_files"]):::skipped --> x52e965cb7c1cd1fc(["population_midyear_bulletin_files"]):::skipped
x0d01604fe60472ef["population_midyear_bulletin_text"]:::skipped --> xefa1a60843915f9f["population_midyear_bulletin_pages"]:::skipped
x52e965cb7c1cd1fc(["population_midyear_bulletin_files"]):::skipped --> x0d01604fe60472ef["population_midyear_bulletin_text"]:::skipped
xefa1a60843915f9f["population_midyear_bulletin_pages"]:::skipped --> x946771421caaa150["population_midyear_by_age"]:::skipped
x0d01604fe60472ef["population_midyear_bulletin_text"]:::skipped --> x946771421caaa150["population_midyear_by_age"]:::skipped
x946771421caaa150["population_midyear_by_age"]:::skipped --> xa1686cc888963231(["population_midyear_by_age_csv"]):::skipped
xefa1a60843915f9f["population_midyear_bulletin_pages"]:::skipped --> x426614807f974316["population_midyear_by_age_sex"]:::skipped
x0d01604fe60472ef["population_midyear_bulletin_text"]:::skipped --> x426614807f974316["population_midyear_by_age_sex"]:::skipped
x426614807f974316["population_midyear_by_age_sex"]:::skipped --> xf3039e66b37a1219(["population_midyear_by_age_sex_csv"]):::skipped
x7b26bed1fc581742(["map_adm3"]):::skipped --> xd0c8e8b884ab8581["population_midyear_by_district"]:::skipped
x9f6b6d2ed74a37b4["population_midyear_bulletin_district_pages"]:::skipped --> xd0c8e8b884ab8581["population_midyear_by_district"]:::skipped
x0d01604fe60472ef["population_midyear_bulletin_text"]:::skipped --> xd0c8e8b884ab8581["population_midyear_by_district"]:::skipped
xd0c8e8b884ab8581["population_midyear_by_district"]:::skipped --> xf94a6bce4cea6c14(["population_midyear_by_district_csv"]):::skipped
x068156701a18b444["population_endyear_bulletin_text"]:::skipped --> x52d6a6119be69714["population_midyear_total"]:::skipped
xe648a7801cd2da2c["births_endyear_pages"]:::skipped --> x52d6a6119be69714["population_midyear_total"]:::skipped
x52d6a6119be69714["population_midyear_total"]:::skipped --> x37b834e41770c27f(["population_midyear_total_csv"]):::skipped
end
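The graph above appears to be in the Mermaid format generated by the {targets} package. As a hedged sketch, assuming the R package dependencies are installed and the encrypted files have been unlocked, the pipeline can be inspected and partially built from R as follows. The target names `births_by_age` and `births_by_age_csv` are taken from the graph; `pipeline_available()` is a hypothetical guard, and what each target returns is not described here, so the result is only inspected with `str()`.

```r
# Hypothetical guard: TRUE only when a _targets.R file is present in the
# given directory and the {targets} package is installed
pipeline_available <- function(project_dir = ".") {
  file.exists(file.path(project_dir, "_targets.R")) &&
    requireNamespace("targets", quietly = TRUE)
}

if (pipeline_available()) {
  library(targets)

  # Print the dependency graph as Mermaid text
  writeLines(tar_mermaid())

  # List the targets that are out of date and would be rebuilt
  print(tar_outdated())

  # Build a single CSV output together with its upstream targets
  tar_make(names = any_of("births_by_age_csv"))

  # Load the corresponding target from the local data store
  str(tar_read(births_by_age))
}
```

Building one output at a time can be useful while developing, since upstream steps such as downloading bulletin files may take a while on a first run.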
To run the workflow, issue the following command in R from within the project directory:

    targets::tar_make()

or issue the following command in Terminal from within the project directory:

    Rscript -e "targets::tar_make()"

- Prof. Proochista Ariana
- Dr. Aronrag Meeyai
- Dr. Sylvie Pool
- Dr. Sanjeev Pugazhendhi
- Ituen Williams-Umanah
- Ned Rosalie
- Keddy Woodcock
The Seychelles-Oxford Partnership, from which this project came about, was made possible through the efforts of Prof. Proochista Ariana, Dr. Aronrag Meeyai, and Dr. Sylvie Pool. The original codebases on which this project was built were written by Dr. Sanjeev Pugazhendhi, Ituen Williams-Umanah, Ned Rosalie, and Keddy Woodcock.
- Dr. Johanna Rapanarilala
- Dr. Carine Asnong
- Prof. Bruno Holthof
- Dr. Giri Rajahram
- Dr. Bushra Naz
- Dr. Yih Seong Wong
- Anita Makori
- Dr. Jillian Francise Lee
- Neira Budiono
- Dr. Nyasha Manyeruke
- Dr. Ibrahim Ajami
This project would also not be possible without the contributions of Dr. Johanna Rapanarilala, who supervised and mentored several of the University of Oxford MSc in International Health and Tropical Medicine students who came to Seychelles for their study placement, and of Dr. Carine Asnong and Prof. Bruno Holthof, who have contributed to teaching on research skills, data analysis, and healthcare management and quality of care for members of the Seychelles Ministry of Health. Finally, the feedback, insights, and learning gained from Dr. Giri Rajahram, Dr. Bushra Naz, Dr. Yih Seong Wong, Anita Makori, Dr. Jillian Francise Lee, Neira Budiono, Dr. Nyasha Manyeruke, and Dr. Ibrahim Ajami, students who spent their study placement in Seychelles, contributed to identifying critical and priority datasets to include in this project.
The project is currently maintained by Ernest Guevarra.
This project is an independent effort by members of the
Seychelles-Oxford Partnership in support of group and individual
data needs and goals. This project is not recognised by the
Seychelles government or by the ministries/agencies with which our
Seychellois colleagues are affiliated. Any issues or problems
arising from the seystats datasets or from participating in or
contributing to the development of this project are the responsibility
of the authors and maintainers of this project and should be addressed
to them accordingly and not to the ministries/agencies/organisations
from which the data have been made available.
This effort aims to serve as an example of how data can be curated and managed in a manner that is effective, efficient, machine-actionable, and analysis-ready using widely-available and open source tools. Given the open source nature of this project, it is our hope that the project can either be handed over to relevant and official entities within Seychelles to continue its upkeep and maintenance or that it becomes a basis for official efforts within Seychelles to streamline its data curation and management processes and make them more effective and efficient.
All code in this project is released under a GPL-3.0 license. All text in this project is released under a CC-BY-4.0 license. All data is released under a CC0 license.
If you use the data provided through seystats in your work/research,
please cite seystats along with all the sources of data that were used
for curating the data available herewith. The suggested appropriate
citation metadata is provided in
CITATION.cff.
Feedback, bug reports and feature requests are welcome; file issues or seek support here. If you would like to contribute to the project, please see our contributing guidelines.
This project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
