Questionnaire summary: Decide process for handling free text (answers from a definable set) #12

edwardchalstrey1 · 2023-05-15T10:03:36Z (edited by harisood)

Handling free-text questions where answers come from a definable set

- 7.a Which non-desktop interfaces are important to you?
- 8.a Which programming languages are important to you?
- 9.a Which repositories are important to you?
- 10.a Which commercially licensed software is important to you?
- 23.a Are there sensitivity systems that you think are important or use?

Notes

Categorise responses by percentage appearance


JimMadge · 2023-05-16T12:47:14Z (edited)

For free text where we expect (lists of) answers drawn from a set, it feels like we should count the responses.

Something like

responses -> join -> split -> flatten -> lowercase -> count -> sort

followed by a human inspection to pick out, for instance, programming languages from other words (and combine synonyms).

>>> from collections import Counter
>>> from itertools import chain
>>> from re import split
>>> responses = ["Python,R,Julia", "R", "Python", "Python and R"]
>>> Counter(chain.from_iterable(split(',| ', item) for item in responses))
Counter({'Python': 3, 'R': 3, 'Julia': 1, 'and': 1})
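A minimal sketch of the full join -> split -> flatten -> lowercase -> count -> sort pipeline as a reusable function. The delimiter set (commas, semicolons, slashes and whitespace) and the name `count_terms` are assumptions for illustration. Splitting each response and flattening the pieces yields the same tokens as joining all responses with a comma and splitting once.

```python
import re
from collections import Counter
from itertools import chain

# Assumed delimiters: commas, semicolons, slashes and any whitespace.
DELIMITERS = r"[,;/\s]+"


def count_terms(responses: list[str]) -> list[tuple[str, int]]:
    """Count terms across free-text responses, most common first."""
    # split: break each response into tokens; flatten: chain them into one stream
    tokens = chain.from_iterable(re.split(DELIMITERS, r) for r in responses)
    # lowercase, dropping empty strings left by leading/trailing delimiters
    cleaned = (t.lower() for t in tokens if t)
    # count, then sort by frequency (most_common() returns highest counts first)
    return Counter(cleaned).most_common()


responses = ["Python,R,Julia", "R", "python", "Python and R"]
for term, n in count_terms(responses):
    print(term, n)
```

Because whitespace is treated as a delimiter, multi-word names such as "Python Package Index" are split apart and filler words like "and" are counted. This is why the human inspection step is still needed.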

That ranking will give us an impression of what is most popular.

We could then look at votes/total responses or similar.
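One reading of "votes/total responses" is the share of respondents who mention a term at least once. The sketch below assumes that denominator (every response, blank ones included) and the same simple delimiters. `percent_of_respondents` is a hypothetical helper. Dividing by the total number of mentions instead would give different percentages, so whichever denominator is used should be stated next to the results.

```python
import re
from collections import Counter


def percent_of_respondents(responses: list[str]) -> dict[str, float]:
    """Percentage of responses that mention each term at least once."""
    total = len(responses)  # assumed denominator: all responses, blanks included
    if total == 0:
        return {}
    mentions = Counter()
    for response in responses:
        # a set, so a term repeated within one response counts as a single vote
        terms = {t.lower() for t in re.split(r"[,;/\s]+", response) if t}
        mentions.update(terms)
    # largest share first
    return {term: 100 * n / total for term, n in mentions.most_common()}
```

Because one respondent can name several terms, these percentages can add up to more than 100%.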

I think this should help with your categories 2 and 3.

I think word clouds will be too qualitative to make decisions on (but would be useful for sharing out results).

(What if someone says "not Python"?)
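Counting alone cannot distinguish "Python" from "not Python". One cheap safeguard, sketched here on the assumption that a short list of cue words is good enough, is to send any response containing a negation cue to a human reviewer and keep it out of the automatic count. The cue list and the `split_for_review` name are hypothetical. The list will miss some phrasings and flag some harmless ones.

```python
import re

# Hypothetical cue words suggesting a response negates or excludes something.
NEGATION_CUES = re.compile(r"\b(not|no|never|except|without|avoid)\b", re.IGNORECASE)


def split_for_review(responses: list[str]) -> tuple[list[str], list[str]]:
    """Return (responses safe to count automatically, responses for a human)."""
    auto: list[str] = []
    review: list[str] = []
    for response in responses:
        if NEGATION_CUES.search(response):
            review.append(response)
        else:
            auto.append(response)
    return auto, review
```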

drchriscole · 2023-05-16T13:34:00Z

There will be some questions (open-ended comments) that will need a human to interpret them; otherwise, an attempt to categorise responses makes sense.

edwardchalstrey1 · 2023-05-16T13:43:45Z

@JimMadge your idea is roughly what I've been doing for languages. I've also found that I need to come up with some bespoke rules after eyeballing the data.

So the process is something like:

1. some combination of join -> split -> flatten -> lowercase -> count -> sort
2. extra logic specific to edge cases, rather than attempting a fully automated, tested pipeline
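One way to keep step 2 visible, and easy to share, is to write the bespoke rules as data next to the generic pipeline. This sketch assumes two kinds of hand-written rule: an ignore list of filler words and a set of punctuation to trim from token edges. Both are hypothetical examples, not rules taken from the survey data.

```python
import re
from collections import Counter

# Step 2: hypothetical, dataset-specific rules written after eyeballing responses.
IGNORE = {"and", "or", "etc", "none"}  # filler words not worth counting
STRIP_CHARS = ".()\"'"  # punctuation to trim from token edges


def count_with_rules(responses: list[str]) -> list[tuple[str, int]]:
    """Step 1 (split, lowercase, count, sort) plus the step 2 rules above."""
    counts = Counter()
    for response in responses:
        for token in re.split(r"[,;/\s]+", response.lower()):
            token = token.strip(STRIP_CHARS)
            if token and token not in IGNORE:
                counts[token] += 1
    return counts.most_common()
```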

Step 2 is dataset-specific, and we wouldn't want to do this if we wanted it to be reproducible with new survey data, but this is just being done once, so we don't care.

JimMadge · 2023-05-16T13:50:42Z

> Step 2 is dataset-specific, and we wouldn't want to do this if we wanted it to be reproducible with new survey data, but this is just being done once, so we don't care.

I think not entirely; it would still be good to share what those extra steps are.

We could, for example, have lists of synonyms (e.g. "PyPI == Python Package Index") and record how they are applied. It wouldn't be perfect, but if we get a few more responses it would probably still work and would be a good starting point for further work.
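A sketch of how a shared synonym list could be applied. It assumes synonyms are stored as a mapping from each lowercase variant to one canonical name. Multi-word variants are replaced before splitting, so names such as "Python Package Index" are not broken into separate words. The CRAN entry and the `canonicalise` name are illustrative assumptions beyond the PyPI example above.

```python
import re
from collections import Counter

# Hypothetical synonym table: lowercase variant -> canonical name.
SYNONYMS = {
    "python package index": "pypi",
    "comprehensive r archive network": "cran",
}


def canonicalise(response: str, synonyms: dict[str, str] = SYNONYMS) -> list[str]:
    """Lowercase a response, map known variants to canonical names, then split."""
    text = response.lower()
    # longest variants first, so a short variant cannot clobber part of a longer one
    for variant in sorted(synonyms, key=len, reverse=True):
        text = re.sub(rf"\b{re.escape(variant)}\b", synonyms[variant], text)
    return [t for t in re.split(r"[,;/\s]+", text) if t]


print(Counter(canonicalise("PyPI, Python Package Index; CRAN")))
```

Keeping the table as plain data means it can be published with the results and extended as new responses arrive.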

edwardchalstrey1 · 2023-05-17T08:51:56Z (edited by harisood)

from @harisood

Example for survey responses (free text, easily categorisable). This isn't actual analysis, just an example of what the results could look like:

Programming language support

Summary

The percentage splits of responses to the SATRE survey question of what programming languages should be supported in a TRE.

Detail

Survey Results

Sorted from largest to smallest percentage:

Option	% Responses
Python	x%
R	y%
C#	z%
...	...

Summary blurb, e.g.:
The community strongly favoured support for Python and R, with a variety of other languages called for less often.
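The "Option / % Responses" table could be generated rather than typed by hand. This sketch assumes the input is a mapping from each (already canonicalised) option to the number of respondents who mentioned it, plus the total number of respondents. The helper name and the tab-separated layout are assumptions. Ties are broken alphabetically so the output order is stable.

```python
def results_table(mentions: dict[str, int], total: int) -> str:
    """Render 'Option<TAB>% Responses' rows, largest percentage first."""
    if total <= 0:
        raise ValueError("total must be a positive number of respondents")
    # sort by count (descending), then alphabetically to break ties
    rows = sorted(mentions.items(), key=lambda item: (-item[1], item[0]))
    lines = ["Option\t% Responses"]
    for option, n in rows:
        lines.append(f"{option}\t{100 * n / total:.0f}%")
    return "\n".join(lines)
```

Since respondents can name several options, the percentages in the table need not sum to 100%.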

Where

Specification features

Proposal

Include Python and R support in 'required' features for the specification
Provide a list of languages to support in 'optional' features for the specification

This was referenced May 15, 2023

Process SATRE feature questionnaire #9 (open)

Questionnaire summary: About you, institution #14 (open)

harisood changed the title from "Questionnaire summary: Decide a common approach on how to handle free text" to "Questionnaire summary: Decide a common approach on how to handle free text (answers from a definable set)" on May 17, 2023

harisood changed the title from "Questionnaire summary: Decide a common approach on how to handle free text (answers from a definable set)" to "Questionnaire summary: Free text (answers from a definable set)" on May 17, 2023

harisood mentioned this issue May 17, 2023

Questionnaire summary: Decide process to handle free text (answers from a non-definable set) #16 (open)

harisood changed the title from "Questionnaire summary: Free text (answers from a definable set)" to "Questionnaire summary: Decide process for handling free text (answers from a definable set)" on May 17, 2023

This was referenced May 17, 2023

Questionnaire summary: Results from free text (answers from a definable set) #18 (open)

Decide on process of how survey results get incorporated into specification doc #13 (open)

harisood assigned craddm and JimMadge May 17, 2023

harisood added the WP1 Work package 1 work label May 17, 2023

harisood added this to SATRE backlog (public) May 17, 2023
