Skip to content

System to facilitate translation processes of survey questionnaires.

License

Unknown, MIT licenses found

Licenses found

Unknown
LICENSE
MIT
LICENSE.md
Notifications You must be signed in to change notification settings

RowSquared/survey.tm

Repository files navigation

survey.tm

survey.tm streamlines the process of managing and updating translations for multilingual CAPI surveys, significantly easing the task for users to maintain accuracy and consistency across multiple languages, especially when questionnaires undergo changes and working with multiple questionnaires.

Key features include:

  • Automated Synchronization: Automatically extracts and consolidates survey text items from the Survey Solutions Designer, ensuring translations are always up-to-date with the latest questionnaire versions.
  • Centralized Translation Management: Utilizes a Google Sheets-based ‘Translation Database’ for easy and intuitive management of translations by multiple translators. See example here.
  • Efficiency and Consistency: Minimizes manual work, reduces errors, and enhances consistency across translations, saving hours in translation management for both survey manager and Translator(s).
  • Quick Setup: Designed for a setup time of just 5 minutes, it’s scalable across multiple languages and questionnaires.

Initially tailored for Survey Solutions with future plans to support ODK-based systems, survey.tm is an essential tool for anyone conducting multilingual surveys seeking to improve their translation workflow.

Prerequisites

  • R version 4.0.0 or higher must be installed on your machine. If you do not have it, you can download and install the latest version of R from CRAN.
  • A valid Google account for accessing Google Sheets.
  • A CAPI questionnaire within Survey Solutions.

Once you have the prerequisites ready, proceed with the installation of survey.tm as outlined below.

Installation

You can install the development version of survey.tm through

# If you don't have `devtools` installed, uncomment the line below to install it:
# install.packages("devtools")
devtools::install_github("RowSquared/survey.tm", ref = "main")

Setup and Quick Start Guide

Set up

To start using survey.tm for a data collection project, set up a “Translation Database Sheet” on Google Sheets. A single sheet is recommended per project, regardless of the number of questionnaires or languages involved. There are two setup options:

Option 1: Use setup_tdb() Function

Run setup_tdb() once(!) to create a “Translation Database Sheet”:

# First-time `googlesheets4` users need to authenticate. Follow the on-screen instructions.
# googlesheets4::gs4_auth(email = "[email protected]")

#library(survey.tm)

# Create the translation database sheet. Run once, then comment out or remove.
# The function will print the URL of the new sheet into the console
# setup_tdb(
#   ssheet_name = "Project X: Translation Sheet", # Name to assign to new google worksheet
#   lang_names = c("German", "French") # Translations/Languages needed. Needs to be ISO 639-1 language name.
# )

Option 2: Use a Template Sheet

  1. Create a copy of this template sheet
  2. Rename the sheet for your target language (e.g., “German”).
  3. Add more sheets for additional languages as needed, naming each one appropriately.

After setting up your “Translation Database Sheet” using either option, ensure you copy and save the Google Sheet ID for future use in the workflow.

Quick Start

Having established your “Translation Database Sheet”, you can jump straight into using survey.tm for your project. Below is a quick start code snippet that encapsulates the essential steps of the translation management workflow. Simply copy and paste the following code into your R environment to begin. It’s recommended to examine each object after each step to gain insight into the process and understand the changes taking place.

library(survey.tm)

# Optional: Authenticate with Google Sheets (required once per session)
# googlesheets4::gs4_auth(email = "[email protected]")

# Define your project's Translation Database Sheet ID
translation_db_id = "your_google_sheet_id"

# Specify the Survey Solutions questionnaire IDs you're working with
# Locate your questionnaire ID in Survey Solutions by navigating to your 
# questionnaire in the Designer. The ID appears in the URL as
#https://designer.mysurvey.solutions/questionnaire/details/<questionnaire_id>
questionnaire_ids = c("questionnaire1_id", "questionnaire2_id")


#Step 1: Pull (& compare) all data sources ---- 

#Step 1.1: Retrieve the 'Source Questionnaire' template files from Survey Solutions
suso_trans_templates <- get_suso_tfiles(
  questionnaires = questionnaire_ids,
  user = "[email protected]", # Registered with Survey Solutions Designer
  password = "your_password"
  )

#Step 1.2: Aggregate and consolidate text items from multiple questionnaires and sheets into one data.table
source_titems <- parse_suso_titems(
  tmpl_list = suso_trans_templates, #Nested list as returned by `get_suso_tfiles()`
  collapse = TRUE # Keep only unique text items across questionnaires
)

#Step 1.3: Pull 'Translation Database' Google Sheet data
#Pull all columns and rows found in all sheets in 'Translation Database' Google Sheet
tdb_data <- get_tdb_data(ss = translation_db_id)
#Please note: At first run, all elements of the list returned will be empty data.table's
#(as there is not data in the Google Sheet..)

#Step 1.4: Update the 'Translation Database' object based on the text items in current version of 'Source Questionnaire(s)'
# It changes the statuses of any text item in the translation database object that no longer is part of the source questionnaire(s). 
# And adds any new text item from source questionnaire(s) not yet found in the database
new_tdb <- update_tdb(
  tdb = tdb_data,
  source_titems = source_titems
)

#Optional Step 1.5: Query API Translation Services for items without any translation
#Currently DeepL and Google API supported
# batchTranslate_Deepl2(new_tdb,
#   API_key = my_api_key
# )

#Optional Step 1.6: Check translation for software-related issues
new_tdb <- syntax_check(
  tdb=new_tdb
  )

#Step 2: Update the Translation Database ---- 
# Write one particular (updated) language to the 'Translation Database' Google Sheet
write_tdb_data(new_tdb[["German"]],
    ss = translation_db_id,
    sheet = "German"
  )
#Or write all languages found in 'Translation Database' object to the Google Sheet
# purrr::walk(
#   .x = names(new_tdb),
#   .f = ~ write_tdb_data(new_tdb[[.x]],
#                         ss = translation_db_id,
#                         sheet = .x
#   )
# )

#Step 3: Generate outputs for CAPI program ---- 

# Take the 'German' Translation from database and merge into the source questionnaire as returned by `get_suso_tfiles()`
# Consider only text items of particular statuses defined by Translator
create_suso_file(
  tdb.language = new_tdb[["German"]],
  source_questionnaire = suso_trans_templates[["NAME-OF-QUESTIONNAIRE"]],
  statuses = c("machine", "reviewed", "translated"),
  path = "your-path/German_NAME-OF-QUESTIONNAIRE.xlsx"
)

Workflow Details

Step 1: Pull all data sources

Pull the Questionnaire Template(s) and current ‘Translation Database’, compare and produce an updated translation database object.

1.1 Pull the Questionnaire Template(s)

Click to see detailed description

To retrieve the current version and text items of the questionnaire(s) you’re working with, you can use the get_suso_tfiles() function. This function downloads the questionnaire templates in Excel format from the Survey Solutions Designer and stores them as nested lists in your environment. You can specify the questionnaires by using their questionnaire ID, which is a 32-character alphanumeric identifier.

You can find this ID by logging in to the Survey Solutions Designer and accessing your questionnaire. The URL in your browser should look like https://designer.mysurvey.solutions/questionnaire/details/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, where the combination of x’s after /details/ represents your questionnaire ID.

#One needs to pull the Questionnaire Template(s) ('Translation File') from the Survey Solutions Designer

# Define the questionnaire IDs you want to retrieve translations for.
questionnaire_ids <- c("9e37f1c59e1c47039167928481cf42d2", "702bef443dfb4fc8bd8c754612590875")

# Retrieve the 'Source Questionnaire' template files
suso_trans_templates <- get_suso_tfiles(
  questionnaires = questionnaire_ids,
  sheets=NULL, #If you want to read only specific sheets (like sheets 'Translations' without any reusable categories which is stored on seperate sheet)
  user = "[email protected]", # Registered with Survey Solutions Designer
  password = "your_password"
  )

1.2 Parse Questionnaire Templates

Click to see detailed description

Text items to be translated are currently distributed across questionnaires and sheets in the object returned by get_suso_tfiles(). To aggregate all these items into a single consolidated data.table, use the parse_suso_titems() function.

#Aggregate and consolidate text items from multiple questionnaires and sheets into a single data.table using parse_suso_titems().
source_titems <- parse_suso_titems(
  tmpl_list = suso_trans_templates, #Nested list as returned by `get_suso_tfiles()`
  collapse = TRUE, # Keep only unique text items across questionnaires
  qcode_pattern = "^Q\\d+.\\s" #Remove question coding (e.g. Q1.). Needs to be added later in `create_suso_file`
)

1.3 Pull ‘Translation Database’ Google Sheet data

Click to see detailed description

In order to add translations to the ‘Source Questionnaires’, it’s necessary to pull any existing translations (if available) specified by the Translator from the ‘Translation Database’ Google Sheet.

get_tdb_data() is a simple wrapper for googlesheets4::read_sheet() and returns a list where each element is a sheet in the Google Sheet, representing translation data for a specific language.

Please note: At first run after setup_tdb(), all elements of the list returned will be empty data.table’s.

# googlesheets4::gs4_auth(email = "[email protected]")

#Pull all columns and rows found in all sheets in 'Translation Database' Google Sheet
tdb_data <- get_tdb_data(ss = "GOOGLE-SHEET-IDENTIFIER")

1.4 Update ‘Translation Database’ object

Click to see detailed description

Compares ‘Source Questionnaire’ text items (object returned by parse_suso_titems()) against the 'Translation Database' list returned byget_tdb_data()`.

Adds any new text item from source questionnaire(s) not yet found in the database. Text items in any sheet in the translation database object that no longer is part of the source questionnaire(s) does remain in the object but status is changed to outdated.

List returned will be used for subsequent processes, including creating ‘Translated Questionnaires’ for upload to CAPI system and/or pushed to the ‘Translation Database’

#Update the 'Translation Database' object based on the text items in current version of 'Source Questionnaire(s)'
new_tdb <- update_tdb(
  tdb = tdb_data,
  source_titems = source_titems
)

Step 2: Update the Translation Database

Click to see detailed description

Object new_tdb contains the most recent text items from the source questionnaire along with the previously added translations (if any) for all languages in the ‘Translation Database’. As a next step, one should update the Google Worksheet ‘Database’ so that Translators can continue working on the most recent version of the questionnaire(s). To this end, use write_tdb_data() which writes one language from new_tdb object to one particular sheet in the Google Worksheet.

Please note: It is advisable to coordinate with the Translator(s) to not overwrite translations that she is adding to the Database sheet while you are about to push new ones.

#Write one particular (updated) language to the 'Translation Database' Google Sheet
write_tdb_data(new_tdb[["German"]],
    ss = "GOOGLE-SHEET-IDENTIFIER",
    sheet = "German"
  )

#Or write all languages found in 'Translation Database' object to the Google Sheet
# purrr::walk(
#   .x = names(new_tdb),
#   .f = ~ write_tdb_data(new_tdb[[.x]],
#                         ss = ss,
#                         sheet = .x
#   )
# )

Step 3: Generate outputs for CAPI program

Click to see detailed description

Using object new_tdb, one can now add the most recent translation to a questionnaire source template (as returned by get_suso_tfiles()) of your choice.

To this end, use create_suso_file() which will write a .xlsx file that one can use to upload to the Survey Solutions Designer.

# Take the 'German' Translation from database and merge into the source questionnaire as returned by `get_suso_tfiles()`}
# Consider only text items of particular statuses defined by Translator
#Attention: If you used parameter qcode_pattern at `parse_suso_titems()`, you should specify here again to ensure strings are merged correctly
create_suso_file(
  tdb.language = new_tdb[["German"]],
  source_questionnaire = suso_trans_templates[["NAME-OF-QUESTIONNAIRE"]],
  statuses = c("machine", "reviewed", "translated"),
  qcode_pattern = "^Q\\d+.\\s",
  path = "your-path/German_NAME-OF-QUESTIONNAIRE.xlsx"
)

# Or loop through all source questionnaires pulled via `get_suso_tfiles()`and create a file for each language sheet on 'Translation Database'
#languages <- names(new_tdb)
#questionnaires <- names(suso_trans_templates)
#purrr::walk(languages, ~{
#  lang <- .x
#  purrr::walk(questionnaires, ~create_suso_file(
#    tdb.language = new_tdb[[lang]],
#    source_questionnaire = suso_trans_templates[[.x]],
#    statuses = c("machine", "reviewed", "translated"),
#    path = paste0("your-path/", lang, "_", .x, ".xlsx")
#  ))
#})

Utility Functions

In addition to the basic workflow functions, there are a number of additional wrappers that help in translation as well as validating the translation.

Identifying Software-Related Mismatches in Translations

The syntax_check function serves to identify software-related mismatches between the ‘Text_Item’ and ‘Translation’ columns of a given translation data table. These mismatches could arise due to:

  1. HTML-tag issues: Mismatches in HTML tags or their count between the ‘Text_Item’ and ‘Translation’ columns.
  2. Text Substitution issues: Discrepancies in text substitutions between the ‘Text_Item’ and ‘Translation’ columns. For instance, if a text substitution like %rostertitle% is present in the ‘Text_Item’ but missing in the ‘Translation’.
new_tdb <- syntax_check(
  tdb=new_tdb, # Your list of translations (usually created by `get_tdb_data()` or `update_tdb()`).
  pattern = "%[a-zA-Z0-9_]+%" # Regular expression that identifies text items in 'Text_Item' and 'Translation'. The default pattern is `%[a-zA-Z0-9_]+%`, which matches substitutions like `%rostertitle%`.
)

Query Translations

Quickly translate missing ‘Translation’ items using API providers:

Google Translate API

Simple wrapper for the Google Translate API, utilizing the googleLanguageR package. For more details on setting up Google Translate API and obtaining authentication details, visit Google Cloud Translation.

# Query Google API
batchTranslate_GApi(new_tdb,
                    auth="my_path/my_service_account_key.json")

DeepL

Simple wrapper for the DeepL API, utilizing the deeplr package. For more details on setting up the API, visit DeepL API.

batchTranslate_Deepl2(new_tdb,
  API_key = my_api_key
)

About

System to facilitate translation processes of survey questionnaires.

Resources

License

Unknown, MIT licenses found

Licenses found

Unknown
LICENSE
MIT
LICENSE.md

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published