- What is a data science working group? How has Code for San Francisco positioned its data science working group, and in general, what has worked and what hasn't?
- Successful data science projects start with good and close relationships with relevant stakeholders. In our case, this tends to be local government and/or non-profit agencies that we partner with.
- "Data science and analytics" is only 5% of the work. The other 95% is data engineering, building relationships with stakeholders, good UX/UI design, project management, and much more.
- Lessons from SF projects: what has helped us grow beyond doing data science work at hackathons to a more sustainable working group model is focusing on processes and infrastructure. Things that have helped us:
  - Reducing "key-person" risk: having multiple leaders, well-documented repositories, and a well-managed, up-to-date task management system.
  - Focusing on building the relationship with government partners.
  - From an infrastructure/data engineering perspective, putting a lot of effort into ETL processes and storing data in an accessible manner, for example in a centralized database and/or with centralized documentation.
- How can we interact more with government? How can we convince government of the value of data science?
- It's a long relationship, and part of it is continuously interacting with government officials and staff to keep building relationships. One example discussed was a project with United Way that created data visualizations and models that were given pro bono to NGOs. Don't underestimate the power of getting coffee or sending an email from time to time. Remember, government officials and staff are people too, with things they care about. Anything we can do to make their lives easier, so that they don't view us as a burden, is great.
- How many of us live in cities with a "Chief Data Officer"? The challenges of a small city vs a big city can be very different.
- There should be a focus on educating government about what data science can do. We should understand that there is a huge amount of risk aversion in government processes. Be an advocate and give them tangible examples of how data science can improve outcomes.
- We should encourage ourselves to take on projects in a more sustainable manner. If we organize as non-profit organizations, we shouldn't be afraid to step up, respond to RFPs, and deliver great products. Maybe use hacknights as an opportunity to work on RFP responses!
- Where can data science go wrong and how do we guard against it?
- Black-box models are dangerous. Especially when using data science for inference, and when it touches people's lives, choose your model carefully and make sure you can explain which factors are important in it.
- Documentation is important. Any good project should be able to explain what it is doing.
- Check out "Weapons of Math Destruction" by Cathy O'Neil, which discusses some of the pitfalls of using big data in the wrong way (https://www.amazon.com/Weapons-Math-Destruction-Increases-Inequality/dp/0553418815).
- Other topics that came up but that we only briefly touched on or never got to:
- What should we do with new members who are interested in data science when there aren't enough projects to go around? Relatedly, how do we manage a team with many different skill levels?
- How to engage other people who aren't data scientists but can very much add value to a data science project?
- How to get more data out of government without making expensive FOIA requests?
- Are there opportunities for a data science "learning" group, to upskill new members?
- 'Data science' has not traditionally been used for civic purposes
  - Not just for internal analytics
  - Not just for government efficiency
  - To empower the public
- Exploratory Data Analysis (EDA)
- Visualization as a tool for data science insights
- Visualization as end-product for user (government or citizen)
- Collaborations with news organizations
- Combining human stories with data analysis
- Building a scalable model for cities for training
- Data won't be opened without policy, so framing the push as 'transparency' can create an antagonistic relationship with municipalities.
- Survey I'm conducting for civic technologists: http://cvl.io/civj
- ProPublica series on algorithmic bias
- Is it statistics?
- Is it EDA?
- Is it B.I.?
- Is it machine-learning?
- Is it visualization?
- Is it insight?
- The ability to build
- The ability to read charts, draw visual insights from them, and spot charts that mislead.
- Spreadsheet use
- Mean, median, mode
- P-values, z-values, regression
- Data Scientist for LA --