-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
412 lines (276 loc) · 16.8 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
---
output:
github_document:
toc: true
toc_depth: 2
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# survey.tm
`survey.tm` streamlines the process of managing and updating translations for multilingual CAPI surveys, significantly easing the task for users to maintain accuracy and consistency across multiple languages, especially when questionnaires undergo changes and working with multiple questionnaires.
Key features include:
- **Automated Synchronization**: Automatically extracts and consolidates survey text items from the [Survey Solutions Designer](https://designer.mysurvey.solutions), ensuring translations are always up-to-date with the latest questionnaire versions.
- **Centralized Translation Management**: Utilizes a Google Sheets-based 'Translation Database' for easy and intuitive management of translations by multiple translators. See example [here](https://docs.google.com/spreadsheets/d/1qnLflfVEpCm2sRIIv2NXrBRZikkiUp7JRW0zujChq5E/edit?usp=sharing).
- **Efficiency and Consistency**: Minimizes manual work, reduces errors, and enhances consistency across translations, saving hours in translation management for both survey manager and Translator(s).
- **Quick Setup**: Designed for a setup time of just 5 minutes, it's scalable across multiple languages and questionnaires.
Initially tailored for [Survey Solutions](https://mysurvey.solutions/en/) with future plans to support ODK-based systems, `survey.tm` is an essential tool for anyone conducting multilingual surveys seeking to improve their translation workflow.
## Prerequisites
- R version 4.0.0 or higher must be installed on your machine. If you do not have it, you can download and install the latest version of R from [CRAN](https://cran.r-project.org/).
- A valid Google account for accessing [Google Sheets](https://www.google.com/sheets/about/).
- A CAPI questionnaire within [Survey Solutions](https://designer.mysurvey.solutions/questionnaire/my).
Once you have the prerequisites ready, proceed with the installation of `survey.tm` as outlined below.
## Installation
You can install the development version of survey.tm through
``` r
# If you don't have `devtools` installed, uncomment the line below to install it:
# install.packages("devtools")
devtools::install_github("RowSquared/survey.tm", ref = "main")
```
## Setup and Quick Start Guide
### Set up
To start using `survey.tm` for a data collection project, set up a "Translation Database Sheet" on Google Sheets. A single sheet is recommended per project, regardless of the number of questionnaires or languages involved. There are two setup options:
#### Option 1: Use `setup_tdb()` Function
Run setup_tdb() **once**(!) to create a "Translation Database Sheet":
``` {r setup, eval=FALSE}
# First-time `googlesheets4` users need to authenticate. Follow the on-screen instructions.
# googlesheets4::gs4_auth(email = "[email protected]")
#library(survey.tm)
# Create the translation database sheet. Run once, then comment out or remove.
# The function will print the URL of the new sheet into the console
# setup_tdb(
# ssheet_name = "Project X: Translation Sheet", # Name to assign to new google worksheet
# lang_names = c("German", "French") # Translations/Languages needed. Needs to be ISO 639-1 language name.
# )
```
#### Option 2: Use a Template Sheet
1. [Create a copy of this template sheet](https://docs.google.com/spreadsheets/d/1xLHEDm5bgtv4IHHCbfyMf_sdNsdzz_AUcRrnpb2fXTw)
2. **Rename** the sheet for your target language (e.g., "German").
3. **Add more sheets** for additional languages as needed, naming each one appropriately.
After setting up your "Translation Database Sheet" using either option, ensure you copy and save the Google Sheet ID for future use in the workflow.
### Quick Start
Having established your "Translation Database Sheet", you can jump straight into using survey.tm for your project. Below is a quick start code snippet that encapsulates the essential steps of the translation management workflow. Simply copy and paste the following code into your R environment to begin. It's recommended to examine each object after each step to gain insight into the process and understand the changes taking place.
```{r quick_start_ov, eval=FALSE}
library(survey.tm)
# Optional: Authenticate with Google Sheets (required once per session)
# googlesheets4::gs4_auth(email = "[email protected]")
# Define your project's Translation Database Sheet ID
translation_db_id = "your_google_sheet_id"
# Specify the Survey Solutions questionnaire IDs you're working with
# Locate your questionnaire ID in Survey Solutions by navigating to your
# questionnaire in the Designer. The ID appears in the URL as
#https://designer.mysurvey.solutions/questionnaire/details/<questionnaire_id>
questionnaire_ids = c("questionnaire1_id", "questionnaire2_id")
#Step 1: Pull (& compare) all data sources ----
#Step 1.1: Retrieve the 'Source Questionnaire' template files from Survey Solutions
suso_trans_templates <- get_suso_tfiles(
questionnaires = questionnaire_ids,
user = "[email protected]", # Registered with Survey Solutions Designer
password = "your_password"
)
#Step 1.2: Aggregate and consolidate text items from multiple questionnaires and sheets into one data.table
source_titems <- parse_suso_titems(
tmpl_list = suso_trans_templates, #Nested list as returned by `get_suso_tfiles()`
collapse = TRUE # Keep only unique text items across questionnaires
)
#Step 1.3: Pull 'Translation Database' Google Sheet data
#Pull all columns and rows found in all sheets in 'Translation Database' Google Sheet
tdb_data <- get_tdb_data(ss = translation_db_id)
#Please note: At first run, all elements of the list returned will be empty data.table's
#(as there is not data in the Google Sheet..)
#Step 1.4: Update the 'Translation Database' object based on the text items in current version of 'Source Questionnaire(s)'
# It changes the statuses of any text item in the translation database object that no longer is part of the source questionnaire(s).
# And adds any new text item from source questionnaire(s) not yet found in the database
new_tdb <- update_tdb(
tdb = tdb_data,
source_titems = source_titems
)
#Optional Step 1.5: Query API Translation Services for items without any translation
#Currently DeepL and Google API supported
# batchTranslate_Deepl2(new_tdb,
# API_key = my_api_key
# )
#Optional Step 1.6: Check translation for software-related issues
new_tdb <- syntax_check(
tdb=new_tdb
)
#Step 2: Update the Translation Database ----
# Write one particular (updated) language to the 'Translation Database' Google Sheet
write_tdb_data(new_tdb[["German"]],
ss = translation_db_id,
sheet = "German"
)
#Or write all languages found in 'Translation Database' object to the Google Sheet
# purrr::walk(
# .x = names(new_tdb),
# .f = ~ write_tdb_data(new_tdb[[.x]],
# ss = translation_db_id,
# sheet = .x
# )
# )
#Step 3: Generate outputs for CAPI program ----
# Take the 'German' Translation from database and merge into the source questionnaire as returned by `get_suso_tfiles()`
# Consider only text items of particular statuses defined by Translator
create_suso_file(
tdb.language = new_tdb[["German"]],
source_questionnaire = suso_trans_templates[["NAME-OF-QUESTIONNAIRE"]],
statuses = c("machine", "reviewed", "translated"),
path = "your-path/German_NAME-OF-QUESTIONNAIRE.xlsx"
)
```
## Workflow Details
### Step 1: <span style="font-weight: normal"> Pull all data sources</span>
Pull the Questionnaire Template(s) and current 'Translation Database', compare and produce an updated translation database object.
#### 1.1 Pull the Questionnaire Template(s)
<details>
<summary>
Click to see detailed description
</summary>
To retrieve the current version and text items of the questionnaire(s) you're working with, you can use the `get_suso_tfiles()` function. This function downloads the [questionnaire templates](https://docs.mysurvey.solutions/questionnaire-designer/toolbar/multilingual-questionnaires/) in Excel format from the Survey Solutions Designer and stores them as nested lists in your environment. You can specify the questionnaires by using their questionnaire ID, which is a 32-character alphanumeric identifier.
You can find this ID by logging in to the Survey Solutions Designer and accessing your questionnaire. The URL in your browser should look like `https://designer.mysurvey.solutions/questionnaire/details/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx`, where the combination of x's after `/details/` represents your questionnaire ID.
</details>
``` {r get_suso_tfiles, eval=FALSE}
#One needs to pull the Questionnaire Template(s) ('Translation File') from the Survey Solutions Designer
# Define the questionnaire IDs you want to retrieve translations for.
questionnaire_ids <- c("9e37f1c59e1c47039167928481cf42d2", "702bef443dfb4fc8bd8c754612590875")
# Retrieve the 'Source Questionnaire' template files
suso_trans_templates <- get_suso_tfiles(
questionnaires = questionnaire_ids,
sheets=NULL, #If you want to read only specific sheets (like sheets 'Translations' without any reusable categories which is stored on seperate sheet)
user = "[email protected]", # Registered with Survey Solutions Designer
password = "your_password"
)
```
#### 1.2 Parse Questionnaire Templates
<details>
<summary>
Click to see detailed description
</summary>
Text items to be translated are currently distributed across questionnaires and sheets in the object returned by `get_suso_tfiles()`. To aggregate all these items into a single consolidated data.table, use the `parse_suso_titems()` function.
</details>
``` {r parse_suso_titems, eval=FALSE}
#Aggregate and consolidate text items from multiple questionnaires and sheets into a single data.table using parse_suso_titems().
source_titems <- parse_suso_titems(
tmpl_list = suso_trans_templates, #Nested list as returned by `get_suso_tfiles()`
collapse = TRUE, # Keep only unique text items across questionnaires
qcode_pattern = "^Q\\d+.\\s" #Remove question coding (e.g. Q1.). Needs to be added later in `create_suso_file`
)
```
#### 1.3 Pull 'Translation Database' Google Sheet data
<details>
<summary>
Click to see detailed description
</summary>
In order to add translations to the 'Source Questionnaires', it's necessary to pull any existing translations (if available) specified by the Translator from the 'Translation Database' Google Sheet.
`get_tdb_data()` is a simple wrapper for `googlesheets4::read_sheet()` and returns a list where each element is a sheet in the Google Sheet, representing translation data for a specific language.
Please note: At first run after `setup_tdb()`, all elements of the list returned will be empty data.table's.
</details>
``` {r get_tdb_data, eval=FALSE}
# googlesheets4::gs4_auth(email = "[email protected]")
#Pull all columns and rows found in all sheets in 'Translation Database' Google Sheet
tdb_data <- get_tdb_data(ss = "GOOGLE-SHEET-IDENTIFIER")
```
#### 1.4 Update 'Translation Database' object
<details>
<summary>
Click to see detailed description
</summary>
Compares 'Source Questionnaire' text items (object returned by parse_suso_titems()`) against the 'Translation Database' list returned by `get_tdb_data()`.
Adds any new text item from source questionnaire(s) not yet found in the database. Text items in any sheet in the translation database object that no longer is part of the source questionnaire(s) does remain in the object but status is changed to `outdated`.
List returned will be used for subsequent processes, including creating 'Translated Questionnaires' for upload to CAPI system and/or pushed to the 'Translation Database'
</details>
``` {r update_tdb, eval=FALSE}
#Update the 'Translation Database' object based on the text items in current version of 'Source Questionnaire(s)'
new_tdb <- update_tdb(
tdb = tdb_data,
source_titems = source_titems
)
```
### Step 2: <span style="font-weight: normal"> Update the Translation Database</span>
<details>
<summary>
Click to see detailed description
</summary>
Object `new_tdb` contains the most recent text items from the source questionnaire along with the previously added translations (if any) for all languages in the 'Translation Database'. As a next step, one should update the Google Worksheet 'Database' so that Translators can continue working on the most recent version of the questionnaire(s). To this end, use `write_tdb_data()` which writes one language from `new_tdb` object to one particular sheet in the Google Worksheet.
Please note: It is advisable to coordinate with the Translator(s) to not overwrite translations that she is adding to the Database sheet while you are about to push new ones.
</details>
``` {r write_tdb_data, eval=FALSE}
#Write one particular (updated) language to the 'Translation Database' Google Sheet
write_tdb_data(new_tdb[["German"]],
ss = "GOOGLE-SHEET-IDENTIFIER",
sheet = "German"
)
#Or write all languages found in 'Translation Database' object to the Google Sheet
# purrr::walk(
# .x = names(new_tdb),
# .f = ~ write_tdb_data(new_tdb[[.x]],
# ss = ss,
# sheet = .x
# )
# )
```
### Step 3: <span style="font-weight: normal"> Generate outputs for CAPI program</span>
<details>
<summary>
Click to see detailed description
</summary>
Using object `new_tdb`, one can now add the most recent translation to a questionnaire source template (as returned by `get_suso_tfiles()`) of your choice.
To this end, use `create_suso_file()` which will write a .xlsx file that one can use to upload to the Survey Solutions Designer.
</details>
``` {r create_suso_file, eval=FALSE}
# Take the 'German' Translation from database and merge into the source questionnaire as returned by `get_suso_tfiles()`}
# Consider only text items of particular statuses defined by Translator
#Attention: If you used parameter qcode_pattern at `parse_suso_titems()`, you should specify here again to ensure strings are merged correctly
create_suso_file(
tdb.language = new_tdb[["German"]],
source_questionnaire = suso_trans_templates[["NAME-OF-QUESTIONNAIRE"]],
statuses = c("machine", "reviewed", "translated"),
qcode_pattern = "^Q\\d+.\\s",
path = "your-path/German_NAME-OF-QUESTIONNAIRE.xlsx"
)
# Or loop through all source questionnaires pulled via `get_suso_tfiles()`and create a file for each language sheet on 'Translation Database'
#languages <- names(new_tdb)
#questionnaires <- names(suso_trans_templates)
#purrr::walk(languages, ~{
# lang <- .x
# purrr::walk(questionnaires, ~create_suso_file(
# tdb.language = new_tdb[[lang]],
# source_questionnaire = suso_trans_templates[[.x]],
# statuses = c("machine", "reviewed", "translated"),
# path = paste0("your-path/", lang, "_", .x, ".xlsx")
# ))
#})
```
## Utility Functions
In addition to the basic workflow functions, there are a number of additional wrappers that help in translation as well as validating the translation.
### Identifying Software-Related Mismatches in Translations
The `syntax_check` function serves to identify software-related mismatches between the 'Text_Item' and 'Translation' columns of a given translation data table. These mismatches could arise due to:
1. **HTML-tag issues**: Mismatches in HTML tags or their count between the 'Text_Item' and 'Translation' columns.
2. **Text Substitution issues**: Discrepancies in text substitutions between the 'Text_Item' and 'Translation' columns. For instance, if a text substitution like `%rostertitle%` is present in the 'Text_Item' but missing in the 'Translation'.
``` {r syntax_check, eval=FALSE}
new_tdb <- syntax_check(
tdb=new_tdb, # Your list of translations (usually created by `get_tdb_data()` or `update_tdb()`).
pattern = "%[a-zA-Z0-9_]+%" # Regular expression that identifies text items in 'Text_Item' and 'Translation'. The default pattern is `%[a-zA-Z0-9_]+%`, which matches substitutions like `%rostertitle%`.
)
```
### Query Translations
Quickly translate missing 'Translation' items using API providers:
#### Google Translate API
Simple wrapper for the Google Translate API, utilizing the [`googleLanguageR`](https://CRAN.R-project.org/package=googleLanguageR) package. For more details on setting up Google Translate API and obtaining authentication details, visit [Google Cloud Translation](https://cloud.google.com/translate/docs/setup).
``` {r query_gl_api, eval=FALSE}
# Query Google API
batchTranslate_GApi(new_tdb,
auth="my_path/my_service_account_key.json")
```
#### DeepL
Simple wrapper for the DeepL API, utilizing the [`deeplr`](https://github.com/zumbov2/deeplr#readme) package. For more details on setting up the API, visit [DeepL API](https://www.deepl.com/pro/change-plan#developer).
``` {r query_deepl_api, eval=FALSE}
batchTranslate_Deepl2(new_tdb,
API_key = my_api_key
)
```