KubeCon 2022 --> KubeCon 2023. Basic reports.
Signed-off-by: Matt Young <[email protected]>
1 parent c4423fd, commit 2f302a4
Showing 30 changed files with 110,112 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,69 @@ | ||
==================================== | ||
GZ File Consolidation Script | ||
==================================== | ||
|
||
Source: /Users/matt/gharchive-cncf/cncf.all | ||
Target: /Users/matt/gharchive-cncf/cncf.byrepo | ||
Dry Run: 0 | ||
Verbose: 1 | ||
Processing directory: /Users/matt/gharchive-cncf/cncf.all/CommitCommentEvent | ||
dirName: CommitCommentEvent | ||
outputFile: /Users/matt/gharchive-cncf/cncf.byrepo/CommitCommentEvent-consolidated.gz | ||
Concatenating files from /Users/matt/gharchive-cncf/cncf.all/CommitCommentEvent into /Users/matt/gharchive-cncf/cncf.byrepo/CommitCommentEvent-consolidated.gz... | ||
Processing directory: /Users/matt/gharchive-cncf/cncf.all/CreateEvent | ||
dirName: CreateEvent | ||
outputFile: /Users/matt/gharchive-cncf/cncf.byrepo/CreateEvent-consolidated.gz | ||
Concatenating files from /Users/matt/gharchive-cncf/cncf.all/CreateEvent into /Users/matt/gharchive-cncf/cncf.byrepo/CreateEvent-consolidated.gz... | ||
Processing directory: /Users/matt/gharchive-cncf/cncf.all/DeleteEvent | ||
dirName: DeleteEvent | ||
outputFile: /Users/matt/gharchive-cncf/cncf.byrepo/DeleteEvent-consolidated.gz | ||
Concatenating files from /Users/matt/gharchive-cncf/cncf.all/DeleteEvent into /Users/matt/gharchive-cncf/cncf.byrepo/DeleteEvent-consolidated.gz... | ||
Processing directory: /Users/matt/gharchive-cncf/cncf.all/ForkEvent | ||
dirName: ForkEvent | ||
outputFile: /Users/matt/gharchive-cncf/cncf.byrepo/ForkEvent-consolidated.gz | ||
Concatenating files from /Users/matt/gharchive-cncf/cncf.all/ForkEvent into /Users/matt/gharchive-cncf/cncf.byrepo/ForkEvent-consolidated.gz... | ||
Processing directory: /Users/matt/gharchive-cncf/cncf.all/GollumEvent | ||
dirName: GollumEvent | ||
outputFile: /Users/matt/gharchive-cncf/cncf.byrepo/GollumEvent-consolidated.gz | ||
Concatenating files from /Users/matt/gharchive-cncf/cncf.all/GollumEvent into /Users/matt/gharchive-cncf/cncf.byrepo/GollumEvent-consolidated.gz... | ||
Processing directory: /Users/matt/gharchive-cncf/cncf.all/IssueCommentEvent | ||
dirName: IssueCommentEvent | ||
outputFile: /Users/matt/gharchive-cncf/cncf.byrepo/IssueCommentEvent-consolidated.gz | ||
Concatenating files from /Users/matt/gharchive-cncf/cncf.all/IssueCommentEvent into /Users/matt/gharchive-cncf/cncf.byrepo/IssueCommentEvent-consolidated.gz... | ||
Processing directory: /Users/matt/gharchive-cncf/cncf.all/IssuesEvent | ||
dirName: IssuesEvent | ||
outputFile: /Users/matt/gharchive-cncf/cncf.byrepo/IssuesEvent-consolidated.gz | ||
Concatenating files from /Users/matt/gharchive-cncf/cncf.all/IssuesEvent into /Users/matt/gharchive-cncf/cncf.byrepo/IssuesEvent-consolidated.gz... | ||
Processing directory: /Users/matt/gharchive-cncf/cncf.all/MemberEvent | ||
dirName: MemberEvent | ||
outputFile: /Users/matt/gharchive-cncf/cncf.byrepo/MemberEvent-consolidated.gz | ||
Concatenating files from /Users/matt/gharchive-cncf/cncf.all/MemberEvent into /Users/matt/gharchive-cncf/cncf.byrepo/MemberEvent-consolidated.gz... | ||
Processing directory: /Users/matt/gharchive-cncf/cncf.all/PublicEvent | ||
dirName: PublicEvent | ||
outputFile: /Users/matt/gharchive-cncf/cncf.byrepo/PublicEvent-consolidated.gz | ||
Concatenating files from /Users/matt/gharchive-cncf/cncf.all/PublicEvent into /Users/matt/gharchive-cncf/cncf.byrepo/PublicEvent-consolidated.gz... | ||
Processing directory: /Users/matt/gharchive-cncf/cncf.all/PullRequestEvent | ||
dirName: PullRequestEvent | ||
outputFile: /Users/matt/gharchive-cncf/cncf.byrepo/PullRequestEvent-consolidated.gz | ||
Concatenating files from /Users/matt/gharchive-cncf/cncf.all/PullRequestEvent into /Users/matt/gharchive-cncf/cncf.byrepo/PullRequestEvent-consolidated.gz... | ||
Processing directory: /Users/matt/gharchive-cncf/cncf.all/PullRequestReviewCommentEvent | ||
dirName: PullRequestReviewCommentEvent | ||
outputFile: /Users/matt/gharchive-cncf/cncf.byrepo/PullRequestReviewCommentEvent-consolidated.gz | ||
Concatenating files from /Users/matt/gharchive-cncf/cncf.all/PullRequestReviewCommentEvent into /Users/matt/gharchive-cncf/cncf.byrepo/PullRequestReviewCommentEvent-consolidated.gz... | ||
Processing directory: /Users/matt/gharchive-cncf/cncf.all/PullRequestReviewEvent | ||
dirName: PullRequestReviewEvent | ||
outputFile: /Users/matt/gharchive-cncf/cncf.byrepo/PullRequestReviewEvent-consolidated.gz | ||
Concatenating files from /Users/matt/gharchive-cncf/cncf.all/PullRequestReviewEvent into /Users/matt/gharchive-cncf/cncf.byrepo/PullRequestReviewEvent-consolidated.gz... | ||
Processing directory: /Users/matt/gharchive-cncf/cncf.all/PushEvent | ||
dirName: PushEvent | ||
outputFile: /Users/matt/gharchive-cncf/cncf.byrepo/PushEvent-consolidated.gz | ||
Concatenating files from /Users/matt/gharchive-cncf/cncf.all/PushEvent into /Users/matt/gharchive-cncf/cncf.byrepo/PushEvent-consolidated.gz... | ||
Processing directory: /Users/matt/gharchive-cncf/cncf.all/ReleaseEvent | ||
dirName: ReleaseEvent | ||
outputFile: /Users/matt/gharchive-cncf/cncf.byrepo/ReleaseEvent-consolidated.gz | ||
Concatenating files from /Users/matt/gharchive-cncf/cncf.all/ReleaseEvent into /Users/matt/gharchive-cncf/cncf.byrepo/ReleaseEvent-consolidated.gz... | ||
Processing directory: /Users/matt/gharchive-cncf/cncf.all/WatchEvent | ||
dirName: WatchEvent | ||
outputFile: /Users/matt/gharchive-cncf/cncf.byrepo/WatchEvent-consolidated.gz | ||
Concatenating files from /Users/matt/gharchive-cncf/cncf.all/WatchEvent into /Users/matt/gharchive-cncf/cncf.byrepo/WatchEvent-consolidated.gz... | ||
Concatenation complete. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
#!/bin/bash | ||
|
||
handle_sigint() { | ||
echo "Caught Ctrl+C, stopping..." | ||
# Perform any necessary cleanup here | ||
exit 1 | ||
} | ||
|
||
# Trap SIGINT and call handle_sigint when it's received | ||
trap 'handle_sigint' SIGINT | ||
|
||
set -euox pipefail | ||
|
||
# ᐅ ./gharchive-concat-daily.sh --help | ||
# Usage: ./gharchive-concat-daily.sh [options] | ||
|
||
# Options: | ||
# -s, --source <dir> Source directory (required) | ||
# -t, --target <dir> Target directory (required) | ||
# -d, --dry-run Perform a dry run without creating files | ||
# -v, --verbose Enable verbose output | ||
# -f, --fast-mode Use faster but less resilient to mix-match compression, concatenation (cat) method | ||
# -p, --use-pigz Use pigz instead of gzip for compression | ||
# -r, --report Generate a report with line counts | ||
# -h, --help Display this help text | ||
|
||
|
||
# ./gharchive-concat-daily.sh --source ~/gharchive-cncf/debug.cncf.all \ | ||
# --target ~/gharchive-cncf/debug.cncf.byrepo \ | ||
# --verbose \ | ||
# --fast-mode > gharchive-concat-daily.log | ||
|
||
./gharchive-concat-daily.sh --source ~/gharchive-cncf/debug.cncf.all \ | ||
--target ~/gharchive-cncf/debug.cncf.byrepo \ | ||
--verbose \ | ||
--fast-mode |
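The `--fast-mode` option described in the usage text refers to a plain `cat` concatenation. The following is a minimal sketch of that idea, not the contents of `gharchive-concat-daily.sh` (which this commit does not show). It relies on a property of the gzip format: a file made of several gzip members placed back to back decompresses to the concatenation of the original data. The function name `concat_gz_fast` and the `*.gz` glob are assumptions made for illustration.

```shell
# Hypothetical helper: byte-concatenate every .gz file in a directory
# into a single archive without decompressing anything.
# Assumes every input is really gzip; inputs in other formats would
# produce an archive that gzip cannot read back in full.
concat_gz_fast() {
  local source_dir="$1" output_file="$2"
  # Glob expansion is sorted, so members are appended in name order.
  cat "$source_dir"/*.gz > "$output_file"
}

# Example (paths are placeholders):
#   concat_gz_fast ./cncf.all/WatchEvent ./cncf.byrepo/WatchEvent-consolidated.gz
#   gzip -dc ./cncf.byrepo/WatchEvent-consolidated.gz | wc -l
```

The trade-off matches the usage text: nothing is validated or recompressed, so this is quick, but a single non-gzip input can leave the consolidated file only partly readable.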
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
==================================== | ||
GZ File Consolidation Script | ||
==================================== | ||
|
||
==================================== | ||
GZ File Consolidation Script | ||
==================================== | ||
|
||
Source: /Users/matt/gharchive-cncf/debug.cncf.all | ||
Target: /Users/matt/gharchive-cncf/debug.cncf.byrepo | ||
Dry Run: 0 | ||
Verbose: 1 | ||
Processing directory: /Users/matt/gharchive-cncf/debug.cncf.all/CommitCommentEvent | ||
dirName: CommitCommentEvent | ||
outputFile: /Users/matt/gharchive-cncf/debug.cncf.byrepo/CommitCommentEvent-consolidated.gz | ||
Concatenating files from /Users/matt/gharchive-cncf/debug.cncf.all/CommitCommentEvent into /Users/matt/gharchive-cncf/debug.cncf.byrepo/CommitCommentEvent-consolidated.gz... | ||
Concatenation complete. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
============================================================= | ||
GitHub Archive: combine daily archives into per repo archives | ||
============================================================= | ||
|
||
Creating target directory: /Users/matt/gharchive-cncf/cncf.byrepo |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,122 @@ | ||
# On Classifying Projects and Communities | ||
|
||
<!-- TOC tocDepth:2..3 chapterDepth:2..6 --> | ||
|
||
- [Types of Projects, Contributors, and Communities](#types-of-projects-contributors-and-communities) | ||
- [Project Types](#project-types) | ||
- [Project member types](#project-member-types) | ||
- [Contributor Cohorts (Segmentation)](#contributor-cohorts-segmentation) | ||
- [Project Metrics, Measures, and Attributes](#project-metrics-measures-and-attributes) | ||
|
||
<!-- /TOC --> | ||
|
||
_Apart from the diagram, what's below is reproduced from "Working in Public," by Nadia Eghbal ([https://press.stripe.com/working-in-public](https://press.stripe.com/working-in-public))._ | ||
|
||
## Types of Projects, Contributors, and Communities | ||
|
||
### Project Types | ||
|
||
The upper-right quadrant (Federations) has the **highest user and contributor growth**, while the lower-left quadrant (Toys) has the lowest of both measures. | ||
|
||
![image](./project-types-quadrants.jpg) | ||
|
||
#### Federations | ||
|
||
- rare, impactful, ubiquitous | ||
- fewer than ~3% of OSS projects | ||
- outsized impact and adoption | ||
- growth pattern: shard | ||
- complex governance, large scale | ||
|
||
#### Stadiums | ||
|
||
- Very low maintainer-to-user ratio. | ||
- Unlike Federations and Clubs, which exhibit *decentralized communities*, Stadiums typically have a *centralized community topology*. | ||
|
||
- Often enjoy large, sometimes federated user communities and groups, often replicated and segmented by geography. | ||
|
||
#### Clubs | ||
|
||
- Contributors are often users. | ||
- Niche languages, frameworks, libraries | ||
- Domain-specific solutions | ||
- Analogous to meetups or hobby groups: self-selected users, often aligned around a single axis/dimension of common needs or interests. | ||
- Passionate, dedicated cadre of contributors. High Net Promoter Score (NPS). | ||
|
||
#### Toys | ||
|
||
- Side Projects, Hackathon outcomes, experiments, personal growth/learning projects. | ||
|
||
### Project member types | ||
|
||
#### Maintainers | ||
|
||
Maintainers are those who are responsible for the future of a project's repository (or repositories), whose decisions affect the project laterally. Maintainers can be thought of as "trustees" or stewards of the project. | ||
|
||
#### Contributors | ||
|
||
Contributors are those who make contributions to a project's repository, ranging from casual to significant, but who aren't responsible for its overall success. | ||
|
||
##### Active Contributors | ||
|
||
Active contributors (aka "regular" or "long-term" contributors) are considered members of the project, based on their reputation, the consistency of their contributions, or in many cases by explicit declaration from the project's governance mechanism(s) or via fiat. | ||
|
||
##### Casual Contributors | ||
|
||
Also known as drive-by, reactive, or passive contributors. Often motivated by interests of self or employer, commonly presenting with a transactional engagement style. | ||
|
||
#### Users | ||
|
||
Users are those whose primary relationship to a project's repository is to consume or use its code [and/or artifacts]. | ||
|
||
##### Active Users | ||
|
||
Frequently self-identified in ADOPTERS.md or via other declarative mechanisms, and captured in case studies and whitepapers as part of project collateral. Historically (and more generally), a project's maintainers have had no way to identify users, and users expect as much. | ||
|
||
##### Passive Users | ||
|
||
#### On project member type mobility and fluidity | ||
|
||
TODO: Contributor Ladder, and its utility as a signal type. | ||
|
||
TODO: Reference and/or link to tag-contributor-strategy docs | ||
|
||
### Contributor Cohorts (Segmentation) | ||
|
||
#### What's "Cohort Analysis"? | ||
|
||
> **Cohort analysis** is a kind of [behavioral analytics](https://en.wikipedia.org/wiki/Behavioral_analytics) that breaks the data in a [data set](https://en.wikipedia.org/wiki/Data_set) into related groups before analysis. These groups, or [cohorts](https://en.wikipedia.org/wiki/Cohort_(statistics)), usually share common characteristics or experiences within a defined time-span. Cohort analysis allows a company to "see patterns clearly across the life-cycle of a customer (or user), rather than slicing across all customers blindly without accounting for the natural cycle that a customer undergoes." By seeing these patterns of time, a company can adapt and tailor its service to those specific cohorts. While cohort analysis is sometimes associated with a [cohort study](https://en.wikipedia.org/wiki/Cohort_study), they are different and should not be viewed as one and the same. Cohort analysis is specifically the analysis of cohorts in regards to [big data](https://en.wikipedia.org/wiki/Big_data) and [business analytics](https://en.wikipedia.org/wiki/Business_analytics), while in cohort study, data is broken down into similar groups. | ||
|
||
_(source: [https://en.wikipedia.org/wiki/Cohort_analysis](https://en.wikipedia.org/wiki/Cohort_analysis))_ | ||
|
||
#### n-th Time Contributors | ||
|
||
- First Time Contributors | ||
- Second Time Contributors | ||
- Third Time Contributors | ||
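One way to size these cohorts is sketched below. This is illustrative and not part of the project's tooling. The input format is an assumption: one contributor login per line, one line per contribution (for example, extracted from pull request events). The function name `nth_time_cohorts` is hypothetical. Only aggregate counts are printed, never individual logins.

```shell
# Hypothetical helper: read one login per line (one line per contribution)
# from stdin and print how many contributors have made exactly 1, 2, or 3
# contributions, plus a "4+" bucket. Output is "bucket<TAB>contributors".
nth_time_cohorts() {
  awk '
    NF { count[$1]++ }                      # contributions per login
    END {
      for (login in count) {
        n = count[login]
        bucket = (n >= 4) ? "4+" : n        # collapse the long tail
        size[bucket]++
      }
      printf "1\t%d\n2\t%d\n3\t%d\n4+\t%d\n", size[1], size[2], size[3], size["4+"]
    }'
}

# Example:
#   printf 'alice\nbob\nalice\n' | nth_time_cohorts
```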
|
||
#### Reputation Index | ||
|
||
This is problematic if not done transparently. We might consider generating a number of indices and considering their utility in practice. | ||
|
||
### Project Metrics, Measures, and Attributes | ||
|
||
These form part of a picture, but taken alone, in isolation, or without local, project-specific context, they are often misunderstood in practice. | ||
|
||
For all given points in time, aggregated by cohort(s) or other dimensions (never by individual): | ||
|
||
- OSS Scorecard Metrics | ||
- Active vs Passive Contributors | ||
- Active vs Passive Users | ||
- Number of open Issues | ||
- Number of open Pull Requests (PR) | ||
- Average time to close an Issue | ||
- Average time to close a PR | ||
- Average time to First Response to Issue | ||
- Average time to First Response to PR | ||
- Granularity of code/issue/pr churn (index over time) | ||
- Patterns of Project Activity over time | ||
- Bus Factor (low number of contributors working on the same areas of code/project over time). | ||
- Popularity (Stars, @mentions, #hashtags) | ||
- Depended Upon (aka PageRank) by other OSS projects | ||
- Depended Upon by Apple Services as correlated via SBOM data. |
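As a sketch of how one of the measures above ("Average time to close an Issue") could be aggregated by cohort rather than by individual, consider the helper below. The input layout is an assumption made for illustration: tab-separated lines of cohort label, opened time, and closed time, with times as Unix epoch seconds and an empty third field for issues that are still open. The function name `avg_time_to_close` is hypothetical.

```shell
# Hypothetical helper: read "cohort<TAB>opened_epoch<TAB>closed_epoch" lines
# from stdin and print the average time to close, in hours, per cohort.
# Open issues (empty closed field) and malformed rows are skipped.
avg_time_to_close() {
  LC_ALL=C awk -F '\t' '
    NF == 3 && $3 != "" && $3 >= $2 {
      total[$1] += $3 - $2                  # seconds this issue stayed open
      n[$1]++
    }
    END {
      for (c in n) printf "%s\t%.1f\n", c, total[c] / n[c] / 3600
    }' | LC_ALL=C sort
}

# Example:
#   printf 'first-time\t0\t7200\nrepeat\t1000\t4600\n' | avg_time_to_close
```

Reporting only per-cohort averages keeps individual contributors out of the output, in line with the "never by individual" constraint stated above.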
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,80 @@ | ||
<!-- TOC tocDepth:2..3 chapterDepth:2..6 --> | ||
|
||
- [Values: How we use data](#values-how-we-use-data) | ||
- [Sharing Data Creates Transparency, Public Participation, and Collaboration](#sharing-data-creates-transparency-public-participation-and-collaboration) | ||
- [It's how we work with data that really matters](#its-how-we-work-with-data-that-really-matters) | ||
- [How to responsibly use data](#how-to-responsibly-use-data) | ||
- [Axioms](#axioms) | ||
|
||
<!-- /TOC --> | ||
|
||
# On Contributor Privacy | ||
|
||
Open source projects are created, managed, and sustained by communities of contributors, maintainers, users, and vendors. As we seek to better understand the size, composition, topology, and shape of open source communities, we must exercise care and caution. The evolution and prevalence of open source software have made it increasingly easy to inadvertently or unknowingly violate the privacy of contributors, causing harm and potentially putting people at risk and, in some cases, in danger. | ||
|
||
We have guidelines and rubrics for understanding how to build secure systems, and defined controls to ensure that, once built, our systems remain secure. Analogous guidelines and controls for privacy for the project are informed by some of the following ideas, which are presented in [Data Action: Using data for public good](https://direct.mit.edu/books/book/4983/Data-ActionUsing-Data-for-Public-Good), _"How to use data as a tool for empowerment rather than oppression"_. | ||
|
||
![data-action-cover](./data-action-cover.jpeg) | ||
_<https://direct.mit.edu/books/book/4983/Data-ActionUsing-Data-for-Public-Good>_ | ||
|
||
|
||
#### Additional Resources | ||
|
||
* Sophia Vargas's insightful talk, "**Design Metric Programs to Respect Contributor Expectations and Promote Safety**" ([video](https://www.youtube.com/watch?v=b3KuTUc_mw0), [sched](https://sched.co/1R2qL)). | ||
|
||
## Values: How we use data | ||
|
||
*Excerpts and quotes taken from “Data Action: Using Data For Public Good” unless otherwise noted.* | ||
|
||
<https://mitpress.mit.edu/9780262545310/data-action> | ||
|
||
### Sharing Data Creates Transparency, Public Participation, and Collaboration | ||
|
||
"Sharing Data does so much more than provide access to information. It **creates trusting relationships, changes power dynamics, teaches us about policies, fosters debate, and helps to generate collaborative knowledge sharing,** all of which are essential to building strong, deliberative communities." *S. Williams, Data Action: Using data for public good, p. 137* | ||
|
||
"Data visualizations help create a narrative around an idea, and **it's the narrative that ultimately has the ability to change people's hearts and minds**. When using data for action, **we must focus on the story we want to tell** with the data." *S. Williams, Data Action: Using data for public good, p. 141* | ||
|
||
### It's how we work with data that really matters | ||
|
||
…big data in its raw form cannot perform on its own; rather how data is transformed and operationalized can change the way we see the world. More specifically, **data can be used for civic action and policy change by communicating with the data clearly and responsibly to expose the hidden patterns and ideologies** to audiences inside and outside the policy arena. Communicating with data in this way requires the ability to **ask the right questions,** find or **collect the appropriate data, analyze and interpret that data, and visualize the results in a way that can be understood by broad audiences.** | ||
|
||
**Combining these methods transforms data** from a simple point on a map **to a narrative that has meaning.** Data is not often processed in this way because data analysts are often not familiar with the techniques that can be used to tell stories with the data ethically and responsibly. | ||
|
||
## How to responsibly use data | ||
|
||
1. We must interrogate the reasons we want to use data and determine the potential for our work to do more harm than good. | ||
|
||
2. Building teams to create narratives around data for action is essential for communicating the results effectively, but team collaboration also helps to make sure no harm is done to the people represented in the data itself. | ||
|
||
3. Building data helps change the power dynamics inherent in controlling and using data, while also having numerous side benefits, such as teaching data literacy. | ||
|
||
4. Coming up with unique ways to acquire, quantify, and model data can expose messages previously hidden from the public eye; however, we must expose ideas ethically, going back to the first principle above. | ||
|
||
5. We must validate the work we do with data by literally observing the phenomenon on the ground and asking those it [affects] to interpret the results. | ||
|
||
6. Sharing data is essential for communicating the need for policy change and generating a debate essential for that work. Data visualizations are effective at doing that. | ||
|
||
7. We must remember that data are people, and we must do them no harm. Regulations help provide standards of practice for the use of data, but they often are not developed in line with technological change; therefore, we must seek to develop our own standards and call upon others to do the same. | ||
|
||
### Axioms | ||
|
||
#### The Purpose for Using Data Analytics Must Be Interrogated | ||
|
||
…analysts must begin by asking policy questions of people with on-the-ground expertise, those who know the issue the best - and by believing ultimately that this collaboration will create smarter models. *p.215* | ||
|
||
#### Building Expert Teams is Essential to Making Data Work for Policy Change | ||
|
||
…working collaboratively with policy experts, communities, and designers is essential to reduce the potential for analytics to guide us toward misleading, unethical, or inaccurate conclusions. But more importantly, building expert teams helps *communicate* the work. | ||
|
||
#### Building Data Changes Power Dynamics and Shapes Communities | ||
|
||
…building data has other benefits: it teaches data literacy, builds communities around shared ideas, and creates media buzz around topics by placing them on the policy agenda. | ||
|
||
#### Quantify Ingeniously, but Remember Data Is Biased by Its Creator | ||
|
||
#### Data Brings Insights to the Public in Dynamic Ways | ||
|
||
#### Data Are People, and We Must Do Them No Harm | ||
|