Add a Telemetry module to (anonymously) collect useful data #285

Closed
3 tasks done
klonos opened this issue Jul 29, 2014 · 84 comments · Fixed by backdrop/backdrop#3704

Comments

@klonos
Member

klonos commented Jul 29, 2014

See new Issue Summary here: https://github.com/backdrop-ops/backdropcms.org/wiki/Telemetry-Initiative


Telemetry: (anonymously) collect useful data so that we can make better-informed decisions about what should go into (or be removed from) backdrop core.

I remember the endless debates over whether a certain setting/module/feature should be on or off by default, leading to issues 300+ comments long on d.o. Here are some related d.o issues:

Metrics collected in the initial implementation:

Other related d.o issues:

Recent d.org Telemetry initiative: https://www.drupal.org/project/ideas/issues/2940737

Gathering some key data from end-users about how Drupal is used can give the community and the Drupal Association insights that will help us improve the product roadmap, community programs and outreach efforts by the association, and more. Right now, the only data we receive is a very limited amount from the sites that call home to Drupal.org for update information.

Telemetry initiative: Gathering data about Drupal usage

The goal is to gather data about who uses Drupal, what modules they use, what modules they don't use, maybe basic traffic/load information, what php/db versions are in common use, etc... All of this information could be tremendously helpful in setting direction for the project.

We would want to build a modular telemetry system so that we can gather different kinds of data with each major release, if we want to focus on certain areas of the project for improvement.
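
A minimal sketch of what "modular" could mean here, written in Python purely for illustration. It is not how Drupal or Backdrop implement telemetry; `register_metric`, `build_report` and the metric names are hypothetical. Each metric is a small named collector, so a release can add or drop metrics without touching the code that assembles the report.

```python
import platform
import sys
from typing import Callable, Dict

# Hypothetical registry: metric name -> function that returns its value.
_COLLECTORS: Dict[str, Callable[[], str]] = {}


def register_metric(name: str):
    """Decorator that adds a collector to the registry under `name`."""
    def decorator(func: Callable[[], str]) -> Callable[[], str]:
        _COLLECTORS[name] = func
        return func
    return decorator


@register_metric("runtime_version")
def runtime_version() -> str:
    # Stand-in for a "PHP version" metric; this reports the Python version.
    return "{}.{}".format(sys.version_info.major, sys.version_info.minor)


@register_metric("server_os")
def server_os() -> str:
    return platform.system()


def build_report(enabled: bool = True) -> Dict[str, str]:
    """Run every registered collector; collect nothing if telemetry is off."""
    if not enabled:
        return {}
    return {name: collect() for name, collect in _COLLECTORS.items()}
```

The report holds only the metric values and no site identifier, and turning telemetry off means no collector runs at all.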


Use cases we are targeting to improve:

  • Providing aggregate, anonymized data to core developers to help them understand real-world usage patterns with Drupal.
  • Providing the Drupal Association with data about the scope of the Drupal community outside of Drupal.org - how many sites are there really? Who are they?
  • Making all of this telemetry sending opt-out.
  • Making sure that headless sites can be identified as Drupal sites underneath by crawlers, and by the d.o metrics system

PR by @docwilmot (based on @quicksketch's work): backdrop/backdrop#3704

@klonos
Member Author

klonos commented Jul 29, 2014

This would help us make more educated decisions in issues like #278, #279 etc. instead of having to estimate 80/20 cases (and have people disputing over the percentages).

@mikemccaffrey

mikemccaffrey commented Jul 1, 2016

What is the status of this initiative? Has @quicksketch done any work to track configuration statistics for core yet?

I'd like to track the "Users may log in" setting that was introduced in issue #277. It seems like 95%+ of sites would be fine with users logging in with either their username or their email address, and have no need to restrict it to one or the other. Since it may create some confusion (see #1994), we may want to remove that setting in a future version if no one is using it.
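
To make the "is anyone using it?" question concrete, here is a rough sketch (in Python, for illustration only) of how collected reports could be turned into a percentage per setting value. The metric name `login_method`, its values and the sample reports are all made up; no real data is shown.

```python
from collections import Counter
from typing import Dict, Iterable


def value_shares(reports: Iterable[dict], metric: str) -> Dict[str, float]:
    """Return the percentage of reporting sites per value of `metric`."""
    counts = Counter(r[metric] for r in reports if metric in r)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: 100.0 * n / total for value, n in counts.items()}


# Made-up sample reports, for illustration only.
sample = [
    {"login_method": "username_or_email"},
    {"login_method": "username_or_email"},
    {"login_method": "username_or_email"},
    {"login_method": "username_only"},
]
shares = value_shares(sample, "login_method")
# shares == {"username_or_email": 75.0, "username_only": 25.0}
```

Sites that did not report the metric are left out of the total, so the shares describe reporting sites only.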

@klonos
Member Author

klonos commented Jul 2, 2016

@mikemccaffrey that use case, and the need to make educated decisions (instead of doing guesswork), is precisely why this issue was filed. It was a thing that greatly bothered me in d.org, where decisions were made based on what a group of people thought "most people need/use".

@mikemccaffrey

I think that the first thing that we need to determine is what we are going to call this thing that we are building. It seems like when we are describing this functionality, you could use any combination of "statistics", "feedback", "analytics", "logging", or "reporting".

Maybe it would help if we thought about how we are going to present the feature to the end users. What should we ask next to the checkbox to turn it on and off? "Would you like to send anonymous data to backdropcms.org to help inform future product development?"

What do others think? Is there anything in the project module already that does reporting? Should we look there to see what it is called?

@klonos
Member Author

klonos commented May 4, 2017

Well, if we get technical about it, then we are not "logging" anything. Not on the actual system where the data gathering is to be performed anyways. The logging part will be made on the b.org side, and even then it's not logging, but rather data storing.

Also, "feedback" to me implies user interaction and not something that is done automatically in the background.

The term "heuristics" was suggested over Gitter. (Ancient Greek: εὑρίσκω, "find" or "discover")

...any approach to problem solving, learning, or discovery that employs a practical method not guaranteed to be optimal or perfect, but sufficient for the immediate goals. Where finding an optimal solution is impossible or impractical, heuristic methods can be used to speed up the process of finding a satisfactory solution. Heuristics can be mental shortcuts that ease the cognitive load of making a decision. Examples of this method include using a rule of thumb, an educated guess, an intuitive judgment, stereotyping, profiling, or common sense.

...although it makes perfect sense etymologically, I'm not sure if most people are familiar with the word or what it means.

"statistics", "analytics" and "reporting" make more sense to me, but these words alone do not provide enough context. Something like "Feature Analytics" perhaps?

@klonos
Member Author

klonos commented May 4, 2017

"Would you like to send anonymous data to backdropcms.org to help inform future product development?"

This sounds really good 👍. Perhaps lose the word "data" because people will start wondering what sorts of data. Better to say "statistics" instead I think.

"future product development" is very accurate, but people care more about "features" rather than the general product development, so how about adding that word into play in order to make it more "luring" to keep that checkbox ticked.

Also, change the order of the purpose and what we are asking, because when people reach half-way through that sentence and all they have read is "send data", they might skip reading the rest of it.

Something like this perhaps:

"Would you like to help us make better-informed decisions when adding new product features to Backdrop by sending anonymous statistics to backdropcms.org?"

Note that we are not telling them that we will also be using that information to remove certain features 😈 [evil laugh]

@klonos
Member Author

klonos commented May 4, 2017

...it would also be a great idea to have a "more about this" link that explains what data is being transmitted, the fact that we do not share this information with third parties and, more importantly, our privacy policy, which ensures that the information collected is anonymous and cannot be traced back to the person/site that provides it.

@olafgrabienski

not telling them that we will also (...) removing certain features

I guess that's really a problem if we suggest it's (only) about "adding new product features".

@jenlampton
Member

jenlampton commented May 25, 2017

"Would you like to send anonymous data to backdropcms.org to help inform future product development?"

I love this language. Product development doesn't limit us to adding features, but could include removing some, too.

Can we add a link from this issue to the one where we itemized the things we want to be tracking? (Maybe that one was in Project module?)

@jenlampton
Member

We're two weeks away from code freeze for 1.8, and with no code here yet to review or revise, it's not likely this feature will get done in time. Bumping to 1.9.

@jenlampton jenlampton modified the milestones: 1.9.0, 1.8.0 Aug 17, 2017
@jenlampton jenlampton removed this from the 1.9.0 milestone Jan 15, 2018
@ghost

ghost commented Apr 20, 2018

This is something I noticed (in the recent CMS installation comparison video) that Joomla does. Not being at all familiar with Joomla, here's some information I've found that may help in deciding if/how we do this in Backdrop:

My personal opinion is that this would be a good idea, as long as it's done anonymously, and with the user's consent (maybe disabled by default?). I also support the idea of linking to a page on BDcms.org specifically discussing this, why we do it, why you can trust us, etc. Maybe even link to the code on GitHub showing what data we collect?

There's the potential to collect lots of useful information - not just PHP version, Backdrop version, etc., but things like if content revisions are enabled, the site timezone, how often cron runs, etc. (or is that getting too personal?). Also, I like how Joomla provides an API for developers to use that information, giving it back to the community as it were.

@jenlampton jenlampton added this to the 1.11.0 milestone May 24, 2018
@jenlampton jenlampton changed the title (anonymously) Retrieve configuration/settings data so we can make better decisions... [META] (anonymously) Retrieve configuration/settings/data so we can make better decisions... May 24, 2018
@klonos
Member Author

klonos commented Jul 1, 2018

Here's what Joomla does:

[Screenshot, 2018-07-02]

Stats Collection in Joomla

Since version 3.5.0

Since Joomla! 3.5 a statistics plugin will submit anonymous data to the Joomla Project. This will only submit the Joomla version, PHP version, database engine and version, and server operating system.

This data is collected to ensure that future versions of Joomla can take advantage of the latest database and PHP features without affecting significant numbers of users. The need for this became clear when a minimum of PHP 5.3.10 was required when Joomla! 3.3 implemented the more secure Bcrypt passwords.

In the interest of full transparency and to help developers this data is publicly available. An API and graphs will show the Joomla version, PHP versions and database engines in use.

If you do not wish to provide the Joomla Project with this information you can disable the plugin called System - Joomla Statistics.

@stpaultim
Member

This has been a long time coming...

Nice work everyone.

@quicksketch
Member

I merged the core PR at backdrop/backdrop#3704 into 1.x for 1.20.0!

After a few minor hiccups with Project module, I also released Project module 2.2.2, which includes the new project_telemetry sub-module. It is now deployed and visible on BackdropCMS.org here: https://backdropcms.org/project/backdrop/telemetry

@klonos I went ahead and made the minor changes you suggested above as well. Here's the Telemetry Reports page as it exists now:
[Image: Telemetry Reports page]

@klonos
Member Author

klonos commented Aug 29, 2021

Thanks everyone for their efforts on getting this implemented! 🎉

Here's a list of follow-up tasks that I could think of that we should ideally get done before the final 1.20 release:

  • Create a change record about this
  • Write a blog post about it
  • Update https://backdropcms.org/privacy to include info about what data we gather via Telemetry (or create an entirely separate page for it?)
  • Consider gathering more metrics (see list of relevant issues in the issue summary, or any issues tagged with the needs - usage metrics label)
  • Expose a checkbox for enabling Telemetry in the installer (Separate checking for updates from collecting anonymous data #3168) - thank people for helping us improve Backdrop if they choose to enable it.
  • Consider throwing a message when people disable the Telemetry module, with a link to provide feedback. What made them change their mind?
  • Consider "decoupling" the definition of metrics gathered from core (allow adding/removing metrics without having to wait for another core release). Decoupling may be easy for things that are config values, but harder if PHP code is needed to gather the metrics (and security/risk of this needs to be considered).
  • Establish a procedure and policy around nominating/approving data to be gathered, and figure out a way to let site owners know about any changes (newsletter? notification in the Dashboard?)
  • Work on a page in backdropcms.org that presents gathered data in a useful manner (graphs/charts)
  • Decide whether all hook_telemetry_info() and hook_telemetry_data() hooks in core should be added to the telemetry module itself (the benefit being that if telemetry is disabled, none of that code will ever run), or if these hooks should be implemented in the respective modules that provide/gather these metrics.
  • ...??? (let's discuss during our next weekly dev meeting)

@klonos
Member Author

klonos commented Aug 29, 2021

Should we rename the "MySQL version" metric to "Database version" or "Database information" instead? It seems odd to say "MySQL" when the db is MariaDB (although it derives from MySQL, it is a different product), and we should perhaps consider https://www.silkscreencms.org which adds support for other databases.
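
If the metric were renamed, the label could be derived from the value itself. A small sketch in Python (illustration only; `database_label` is a hypothetical helper), assuming the input is the version string the server reports, for example via `SELECT VERSION()`, and assuming only MySQL-compatible servers are in play. MariaDB includes its name in that string, while MySQL does not.

```python
def database_label(version_string: str) -> str:
    """Guess the product name from a MySQL-compatible server version string."""
    # Assumption: anything that does not identify itself as MariaDB is MySQL.
    if "mariadb" in version_string.lower():
        return "MariaDB"
    return "MySQL"


label_a = database_label("10.5.12-MariaDB-log")  # "MariaDB"
label_b = database_label("8.0.26")               # "MySQL"
```

Supporting other database engines would need more than this string check.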

@jenlampton jenlampton changed the title [META] Telemetry: (anonymously) collect useful data Add a Telemetry module to (anonymously) collect useful data Sep 15, 2021
@jenlampton
Member

@klonos Adding something new doesn't usually get a change record. I think a blog post might be more suitable :)
