2021 Dask survey results #193

Merged: 8 commits, Sep 16, 2021

@GenevieveBuckley GenevieveBuckley commented Sep 6, 2021

This PR adds the Dask survey results and report from 2021.

The rendered notebook can be seen here: https://github.com/GenevieveBuckley/dask-examples/blob/survey2021/surveys/2021.ipynb

Cross reference: dask/dask-blog#109


@GenevieveBuckley
Contributor Author

Couple of points:

  • For some reason the separator characters are inconsistent. I don't mean the column separators; I mean the separators used where respondents can select multiple options as answers to a single question. In previous years' datasets those options are separated by semicolons, but in the spreadsheet I downloaded they are separated by comma-space. Some of the answer options also contain commas, so this was a bit inconvenient. I worked around it because that was easier, but if we need to be stricter about what data gets uploaded to this repository, please let me know.
  • I included an XKCD comic (this one) because it felt relevant, but if that makes the tone too unprofessional/casual, let me know there too.
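One way to work around the comma-space separator described in the first point is to match against the known answer options instead of splitting blindly. This is only a minimal sketch, not the code used in the notebook: the function name `split_multiselect` and the example options are hypothetical, and it assumes the full list of answer options for a question is known in advance.

```python
def split_multiselect(response, options, sep=", "):
    """Split a multi-select answer on `sep` without breaking options that contain commas.

    `options` is the list of known answer options (hypothetical here).
    """
    remaining = response.strip()
    found = []
    # Try longer options first, so an option containing a comma wins over a shorter prefix.
    by_length = sorted(options, key=len, reverse=True)
    while remaining:
        for option in by_length:
            if remaining == option or remaining.startswith(option + sep):
                found.append(option)
                remaining = remaining[len(option) + len(sep):]
                break
        else:
            # Unknown text (for example a free-text "Other" answer):
            # fall back to taking everything up to the next separator.
            head, _, remaining = remaining.partition(sep)
            found.append(head)
    return found


# Hypothetical answer options, one of which contains a comma.
options = ["HPC, on-premise cluster", "Cloud", "Laptop"]
print(split_multiselect("Cloud, HPC, on-premise cluster, Laptop", options))
```

A free-text answer that itself contains a comma would still be split by the fallback branch, so this only helps for the fixed options.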

@GenevieveBuckley
Contributor Author

Also, one person complained about there being too many questions. They must have been pretty motivated to do that, since we didn't really give a good opening for that kind of feedback. They might have a point: there were 43 questions this year, which made the survey roughly 50% longer than last year's.

@GenevieveBuckley
Contributor Author

@TomAugspurger you might be interested, since you did the survey analysis the last two years

Member

@jacobtomlinson jacobtomlinson left a comment


This looks really great. Just spotted a couple of typos.

Member

@TomAugspurger TomAugspurger left a comment


Looks great. Thanks for working on this @GenevieveBuckley!

@GenevieveBuckley
Contributor Author

Thanks Tom!

@GenevieveBuckley
Contributor Author

Suggestion from Matt on the blog post PR: add another graph showing that more people now say that many people throughout their organization use Dask.

(I'll also need to rerun the notebook, to remove the FutureWarning Tom's change has just fixed)

@GenevieveBuckley
Contributor Author

GenevieveBuckley commented Sep 11, 2021

Extra note: I saw some good quotes from people in some of the free text responses, it might be nice to add some of those to this post too.

@GenevieveBuckley
Contributor Author

GenevieveBuckley commented Sep 13, 2021

In the last year, there has been an increase in the number of people who say that many people throughout their institution use Dask (32 people said this in 2021, compared to 19 in 2020).
Thoughts on trying to convey this visually? This seems like it might tell a nice story.

Interesting points here:

  1. From 2020 to 2021, there is a relatively substantial increase in people who say that beyond their group, many people throughout their institution use Dask. (I call it relatively substantial because the numbers were so low to begin with: the share jumped from 7.5 or 8 percent up to 13.5 percent.)
  2. From 2019 to 2020, there was a relatively substantial drop in people who say their immediate team uses Dask. (Drop is about 10%)
  3. The number of people using Dask mostly on their own looks pretty stable from year to year. The fluctuations appear to be explained by slightly different numbers of survey responses from year to year, not a change in the underlying data.
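Since the number of responses differs from year to year, comparing shares rather than raw counts makes points like these easier to check. The following is a minimal sketch using only the standard library; the function name `answer_share_by_year`, the shortened answer labels, and the toy rows are hypothetical and are not the survey data.

```python
from collections import Counter, defaultdict


def answer_share_by_year(rows):
    """Return {year: {answer: percent}} for an iterable of (year, answer) pairs."""
    counts = defaultdict(Counter)
    for year, answer in rows:
        counts[year][answer] += 1
    shares = {}
    for year, counter in counts.items():
        total = sum(counter.values())  # responses to this question in that year
        shares[year] = {answer: 100 * n / total for answer, n in counter.items()}
    return shares


# Toy data for illustration only (not real survey responses).
rows = [
    (2020, "on my own"), (2020, "on my own"), (2020, "on my own"), (2020, "widespread"),
    (2021, "on my own"), (2021, "on my own"), (2021, "widespread"), (2021, "widespread"),
]
print(answer_share_by_year(rows))
```

With the survey DataFrame, something like `pd.crosstab(df["Year"], df[q], normalize="index")` should produce the same kind of table, assuming `Year` is available as a column.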

Some graphs

image

This graph was generated with...

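import seaborn as sns

# df: combined survey responses; reset_index() is assumed to expose "Year" as a column
q = 'Do you use Dask as part of a larger group?'
ax = sns.countplot(y=q, hue="Year", data=df.reset_index())
ax.set(ylabel="", title=q)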

image

This graph was generated with...

q = 'Do you use Dask as part of a larger group?'
ans = 'Beyond my group, many people throughout my institution use Dask'
# Passing order=[ans] restricts the plot to that single answer
ax = sns.countplot(x=q, hue="Year", data=df.reset_index(), order=[ans])

Why did we see these changes?

To be honest, it's not exactly clear why we saw these changes.

Re 1: the jump in people who say Dask use is widespread across an organisation

  • It doesn't seem as though there is movement from "my immediate team uses Dask" into "Dask is widespread in my organisation"
  • The slight drop in "I use Dask mostly on my own" is roughly equal in size to the increase in people who say Dask is widespread among their organisation, but it seems like an unlikely chain of events if we considered that the direct cause.
  • Perhaps more people from orgs where Dask is widespread answered our survey this year? That could easily cause over-representation if, say, more people heard about the survey from their colleagues. But I can't test this theory because not many people gave us information about their workplace (which is reasonable, since that's pretty identifying information).
  • I wondered if the existence of Coiled might be a contributing factor, but that theory does not look likely based on the data. Three people mentioned deploying with Coiled, but only one says Dask use is widespread in their org (the other two use Dask mostly on their own). I'm sure Coiled gets used for other things besides straight up deployment, but there's no mention of that in the survey results.
  • For the people who did give us workplace information, the workplaces where people say Dask use is widespread mostly include universities, national labs, and Capital One. Perhaps the people at unis & research institutes are seeing more uptake of Dask by other teams?

Re 2: the drop in people who say Dask is used by their immediate team

@GenevieveBuckley
Contributor Author

Ok, waiting on final approval here.

The last remaining question: do we need this graph too? (That is, is the other one misleading without this context, or am I starting to get a bit in the weeds with details here?) If yes, it would be good to talk about why these changes happened, but we really don't know why, so ¯\_(ツ)_/¯
graph

@GenevieveBuckley
Contributor Author

Update: I spoke to Matt and we're going with this graph that shows information from all years

png

Will publish it shortly.

@GenevieveBuckley
Contributor Author

Please merge @dask/maintenance - thank you!

I've scheduled a tweet tomorrow to promote the blogpost, so it'd be really nice if the binder button works by then (it just needs merging here, then it will work).

@jacobtomlinson jacobtomlinson merged commit 9d3c941 into dask:main Sep 16, 2021
@jacobtomlinson
Member

Thanks so much @GenevieveBuckley this is a great post!

@GenevieveBuckley GenevieveBuckley deleted the survey2021 branch September 16, 2021 23:16
@GenevieveBuckley
Contributor Author

GenevieveBuckley commented Sep 16, 2021

Hmm, the binder badge links for all the survey notebooks are getting 404 errors - does anyone know how I can fix this?
More details are in issue #196.

@GenevieveBuckley
Contributor Author

Update: have worked out a fix for the binder links.

@jsignell
Member

Super interesting! Thanks for putting this together @GenevieveBuckley
