Questionnaire summary: Decide process for handling free text (answers from a definable set) #12

Open
Tracked by #9
edwardchalstrey1 opened this issue May 15, 2023 · 5 comments

Comments

@edwardchalstrey1

edwardchalstrey1 commented May 15, 2023

Handling free-text questions where answers come from a definable set

- 7.a Which non-desktop interfaces are important to you?
- 8.a Which programming languages are important to you?
- 9.a Which repositories are important to you?
- 10.a Which commercially licensed software is important to you?
- 23.a Are there sensitivity systems that you think are important or use?

Notes

  • Categorise responses and report how often each category appears, as a percentage of responses
@JimMadge
Member

JimMadge commented May 16, 2023

For free text where we expect (lists of) answers drawn from a set, it feels like we should count the responses.

Something like

responses -> join -> split -> flatten -> lowercase -> count -> sort

followed by a human inspection to pick out, for instance, programming languages from other words (and combine synonyms).

>>> from collections import Counter
>>> from itertools import chain
>>> from re import split
>>> responses = ["Python,R,Julia", "R", "Python", "Python and R"]
>>> Counter(chain.from_iterable(split(',| ', item) for item in responses))
Counter({'Python': 3, 'R': 3, 'Julia': 1, 'and': 1})

That ranking will give us an impression of what is most popular.

We could then look at votes/total responses or similar.
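
A minimal sketch of the "votes/total responses" idea, assuming each response should vote at most once per term. The function name `mention_shares` is hypothetical, and the responses are the illustrative ones from the snippet above, not real survey data.

```python
from collections import Counter
from re import split

# Illustrative responses reused from the example above (not real survey data).
responses = ["Python,R,Julia", "R", "Python", "Python and R"]

def mention_shares(responses):
    """Return (term, fraction of responses mentioning it), most common first."""
    counts = Counter()
    for response in responses:
        # A set means a response votes at most once per term.
        terms = {term.lower() for term in split(r"[,\s]+", response) if term}
        counts.update(terms)
    total = len(responses)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(term, count / total) for term, count in ordered]

for term, share in mention_shares(responses):
    print(f"{term}: {share:.0%}")
```

Filler words such as "and" still show up here, so the human inspection step is still needed before reading anything into the percentages.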

I think this should help for your categories 2 and 3.

I think word clouds will be too qualitative to make decisions on (but would be useful for sharing out results).

(What if someone says "not Python"?)
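
One possible way to handle this, sketched under the assumption that negated answers are rare enough to check by hand: flag any response containing a negation cue and leave it out of the automatic count until someone has read it. The cue list and the name `needs_review` are hypothetical.

```python
import re

# Hypothetical list of negation cues; extend it after reading the real responses.
NEGATION_CUES = re.compile(r"\b(?:not|no|never|except|without)\b|n't", re.IGNORECASE)

def needs_review(response):
    """Return True if the response contains a negation cue."""
    return bool(NEGATION_CUES.search(response))

# Made-up responses for illustration only.
responses = ["Python,R,Julia", "not Python", "R but never Julia"]
flagged = [response for response in responses if needs_review(response)]
print(flagged)
```

This only detects that a negation is present; deciding what the respondent meant is still left to a human.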

@drchriscole

Some questions (open-ended comments) will need a human interpreter; otherwise, an attempt to categorise responses makes sense.

@edwardchalstrey1
Author

@JimMadge your idea is roughly what I've been doing for languages. I've also found that I need to come up with some bespoke rules after eyeballing the data.

So the process is like

  1. some combo of join -> split -> flatten -> lowercase -> count -> sort
  2. extra logic specific to edge cases, rather than attempting a fully automated tested pipeline

Step 2 is dataset specific, and we wouldn't want to do this if we wanted it to be reproducible with new survey data, but this is just being done once, so we don't care
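
As a rough illustration of the two steps, here is a sketch that keeps the generic tokenising apart from the dataset-specific rules. The stop-word list and the function names `tokenise` and `apply_rules` are hypothetical stand-ins for whatever rules come out of eyeballing the real data.

```python
from collections import Counter
from re import split

# Hypothetical rule from eyeballing the data: filler words to drop.
STOP_WORDS = {"and", "or", "the", "i", "use"}

def tokenise(responses):
    """Step 1: split on commas/whitespace, flatten, and lowercase."""
    for response in responses:
        for token in split(r"[,\s]+", response):
            if token:
                yield token.lower()

def apply_rules(tokens):
    """Step 2: dataset-specific clean-up, kept in one place."""
    return (token for token in tokens if token not in STOP_WORDS)

# Made-up responses for illustration only.
responses = ["Python,R,Julia", "R", "python", "Python and R"]
ranking = Counter(apply_rules(tokenise(responses))).most_common()
print(ranking)
```

Keeping step 2 in its own function means the bespoke rules live in one visible place, even if they are only ever run once.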

@JimMadge
Member

> Step 2 is dataset specific, and we wouldn't want to do this if we wanted it to be reproducible with new survey data, but this is just being done once, so we don't care

I don't entirely agree: it would still be good to share what those extra steps are.

We could, for example, have lists of synonyms (e.g. "PyPI == Python Package Index") and document how those are applied. It wouldn't be perfect, but if we get a few more responses it would probably still work and be a good starting point for further work.
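
A sketch of how such a synonym list might be applied, assuming answers have already been split into individual items. The table contents, the sample answers, and the name `canonicalise` are all hypothetical.

```python
from collections import Counter

# Hypothetical synonym table: lowercase variant -> canonical name.
SYNONYMS = {
    "pypi": "Python Package Index",
    "python package index": "Python Package Index",
}

def canonicalise(answer):
    """Map an answer to its canonical name, or return it tidied but unchanged."""
    tidy = answer.strip()
    return SYNONYMS.get(tidy.lower(), tidy)

# Made-up answers for illustration only.
answers = ["PyPI", "Python Package Index", "pypi ", "Conda"]
counts = Counter(canonicalise(answer) for answer in answers)
print(counts.most_common())
```

Answers missing from the table pass through untouched, so new survey data would still be counted and any unmapped variants would show up in the ranking for someone to add.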

@edwardchalstrey1
Author

edwardchalstrey1 commented May 17, 2023

from @harisood

Survey responses (free text, easily categorisable) example - this isn't actual analysis, just an example of what results could look like:

Programming language support

Summary

The percentage split of responses to the SATRE survey question on which programming languages should be supported in a TRE.

Detail

Survey Results

Sorted from largest to smallest percentage:

| Option | % Responses |
| ------ | ----------- |
| Python | x%          |
| R      | y%          |
| C#     | z%          |
| ...    | ...         |

Summary blurb, e.g.:
The community strongly favoured support for Python and R, with a variety of other languages requested less often.

Where

Specification features

Proposal

  • Include Python and R support in 'required' features for the specification
  • Provide a list of languages to support in 'Optional' features for the specification

@harisood harisood changed the title Questionnaire summary: Decide a common approach on how to handle free text Questionnaire summary: Decide a common approach on how to handle free text (answers from a definable set) May 17, 2023
@harisood harisood changed the title Questionnaire summary: Decide a common approach on how to handle free text (answers from a definable set) Questionnaire summary: Free text (answers from a definable set) May 17, 2023
@harisood harisood changed the title Questionnaire summary: Free text (answers from a definable set) Questionnaire summary: Decide process for handling free text (answers from a definable set) May 17, 2023
@harisood harisood added the WP1 Work package 1 work label May 17, 2023

5 participants