Questionnaire summary: Decide process for handling free text (answers from a definable set) #12
Comments
For free text where we expect (lists of) answers drawn from a set, it feels like we should count the responses. Something like
>>> from collections import Counter
>>> from itertools import chain
>>> from re import split
>>> responses = ["Python,R,Julia", "R", "Python", "Python and R"]
>>> Counter(chain.from_iterable(split(',| ', item) for item in responses))
Counter({'Python': 3, 'R': 3, 'Julia': 1, 'and': 1})
followed by a human inspection to pick out, for instance, programming languages from other words (and combine synonyms).
That ranking will give us an impression of what is most popular. We could then look at
I think this should help for your category 2 and 3. I think word clouds will be too qualitative to make decisions on (but would be useful for sharing out results). (What if someone says "not Python"?)
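One way the "not Python" case could be handled is to set aside any response containing a negation for human review instead of counting it. The sketch below is only an illustration: the `STOPWORDS` and `NEGATIONS` sets and the `tally` helper are hypothetical names, and the word lists are assumptions that would need tuning against the real survey data.

```python
from collections import Counter
import re

# Assumed word lists; both would need adjusting after eyeballing real responses.
STOPWORDS = {"and", "or"}
NEGATIONS = {"not", "no", "never"}


def tally(responses):
    """Count terms, setting aside responses that contain a negation."""
    counts = Counter()
    needs_review = []
    for response in responses:
        # Split on commas or runs of whitespace and drop empty tokens.
        tokens = [t for t in re.split(r",|\s+", response) if t]
        if any(t.lower() in NEGATIONS for t in tokens):
            # Do not guess the meaning; leave it for a human interpreter.
            needs_review.append(response)
            continue
        counts.update(t for t in tokens if t.lower() not in STOPWORDS)
    return counts, needs_review


responses = ["Python,R,Julia", "R", "Python and R", "not Python"]
counts, needs_review = tally(responses)
```

Here `needs_review` holds the responses a person still has to read, so the automated count never silently treats "not Python" as a vote for Python.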
There will be some questions (open-ended comments) which will need a human interpreter; otherwise, an attempt to categorise responses makes sense.
@JimMadge your idea is roughly what I've been doing for languages. I've also found that some bespoke rules, devised by eyeballing the data, are needed. So the process is like
Step 2 is dataset specific, and we wouldn't want to do this if we wanted it to be reproducible with new survey data, but this is just being done once, so we don't care.
I think not entirely, and it would still be good to share what those extra steps are. We could, for example, have lists of synonyms, e.g. "PyPI == Python Package Index", and record how those are applied. It wouldn't be perfect, but if we have a few more responses it would probably still work and be a good starting point for further work.
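A shareable synonym list could be as simple as a dictionary applied before counting. This is a minimal sketch, not the process actually used: the `SYNONYMS` table, `normalise` and `count_terms` are hypothetical names, and splitting only on commas and semicolons (so multi-word answers survive) is an assumption.

```python
from collections import Counter
import re

# Hypothetical synonym table: lower-cased variant -> canonical label.
SYNONYMS = {
    "pypi": "Python Package Index",
    "python package index": "Python Package Index",
}


def normalise(term, synonyms=SYNONYMS):
    """Map a raw term to its canonical label, if one is listed."""
    cleaned = term.strip()
    return synonyms.get(cleaned.lower(), cleaned)


def count_terms(responses, synonyms=SYNONYMS):
    """Split each response on commas/semicolons and count canonical terms."""
    counts = Counter()
    for response in responses:
        for term in re.split(r",|;", response):
            if term.strip():
                counts[normalise(term, synonyms)] += 1
    return counts


responses = ["PyPI, CRAN", "Python Package Index", "pypi;CRAN"]
counts = count_terms(responses)
```

Keeping the table in a file alongside the analysis would document the dataset-specific rules, even if they need extending for new survey data.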
from @harisood
Survey responses (free text, easily categorisable) example - this isn't actual analysis, just an example of what results could look like:
Programming language support
Summary
The percentage splits of responses to the SATRE survey question of what programming languages should be supported in a TRE.
Detail
Survey Results
Summary blurb, e.g.:
Where
Specification features
Proposal
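The percentage splits for such a summary could be derived from the counts along these lines. This is a sketch under assumptions: `percentage_splits` is a hypothetical helper, the counts are the toy numbers from the example earlier in the thread (with the stray "and" removed) rather than survey results, and percentages are taken over the number of responses, so they can sum to more than 100 when one response names several languages.

```python
from collections import Counter


def percentage_splits(counts, n_responses):
    """Percentage of responses mentioning each term, most common first."""
    return {
        term: round(100 * n / n_responses, 1)
        for term, n in counts.most_common()
    }


# Toy counts from the earlier example, not real survey data.
counts = Counter({"Python": 3, "R": 3, "Julia": 1})
splits = percentage_splits(counts, n_responses=4)
```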
Handling free-text questions where answers come from a definable set
Notes