effective-engineer.md

Part 1: Adopt the Right Mindsets

Working more hours isn't the most effective way to increase output. In fact, working too many hours leads to decreased productivity and burnout. Output may even turn out to be negative when it's necessary to repair the mistakes made by overworked and fatigued engineers. Instead, you need to be able to identify which activities produce more impact with smaller time investments.

What makes an effective engineer? Intuitively, we have some notion of which engineers we consider to be effective. They're the people who get things done. They're the ones who ship products that users love, launch features that customers pay for, build tools that boost team productivity, and deploy systems that help companies scale. Effective engineers produce results.

But if they took too long to accomplish these tasks, then we might hesitate to call them effective. They might be hard-working, but we would consider someone who produced the same results in less time and with fewer resources to be more effective. Effective engineers, therefore, also get things done efficiently.

Efficiency alone doesn't guarantee effectiveness, however. An engineer who efficiently builds infrastructure that can scale to millions of requests for an internal tool that would be used by at most a hundred people isn't effective. Nor is someone who builds a feature that only 0.1% of users adopt, when other features could reach 10% adoption—unless that 0.1% generates disproportionately more business value. Effective engineers focus on value and impact—they know how to choose which results to deliver.

Leverage is defined by a simple equation. It's the value, or impact, produced per time invested. Put another way, leverage is the return on investment (ROI) for the effort that's put in. Effective engineers aren't the ones trying to get more things done by working more hours. They're the ones who get things done efficiently—and who focus their limited time on the tasks that produce the most value. They try to increase the numerator in that equation while keeping the denominator small. Leverage is critical because time is your most limited resource. Unlike other resources, time cannot be stored, extended, or replaced. The limitations of time are inescapable, regardless of your goals.

Leverage = Impact Produced / Time Invested

Greek mathematician and engineer Archimedes once declared, “Give me a place to stand, and a lever long enough, and I shall move the world.” It can be hard to move a huge boulder by yourself, but with a powerful enough lever, you can move almost anything. High-leverage activities behave similarly, letting you amplify your limited time and effort to produce much more impact.

Your overall leverage—the amount of value that you produce per unit time—can only be increased in three ways:

  1. By reducing the time it takes to complete a certain activity.
  2. By increasing the output of a particular activity.
  3. By shifting to higher-leverage activities.

These three ways naturally translate into three questions we can ask ourselves about any activity we're working on:

  1. How can I complete this activity in a shorter amount of time?
  2. How can I increase the value produced by this activity?
  3. Is there something else that I could spend my time on that would produce more value?
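As a rough illustration of the leverage equation, the sketch below computes impact divided by time for a few activities and ranks them. The activity names, impact scores, and hours are hypothetical, and impact is in arbitrary units; the point is only that the same ratio can compare very different kinds of work.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    name: str
    impact: float  # value produced, in arbitrary (hypothetical) units
    hours: float   # time invested

    @property
    def leverage(self) -> float:
        # Leverage = impact produced / time invested
        return self.impact / self.hours


def rank_by_leverage(activities):
    """Return activities sorted from highest to lowest leverage."""
    return sorted(activities, key=lambda a: a.leverage, reverse=True)


# Hypothetical numbers, for illustration only.
activities = [
    Activity("one-hour status meeting", impact=2, hours=1.0),
    Activity("half-hour meeting with agenda", impact=2, hours=0.5),
    Activity("automate manual test step", impact=20, hours=4.0),
]

for a in rank_by_leverage(activities):
    print(f"{a.name}: leverage = {a.leverage:.1f}")
```

Halving the time of the meeting doubles its leverage, while the automation task ranks highest despite taking the most hours, because its assumed impact is much larger.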

Figure 1: Leverage of different activities during a typical workday.

To increase the leverage of each activity, ask yourself the previous three questions, each of which leads to a different avenue of potential improvements. For example, you might have a one-hour meeting that you've scheduled with your team to review their progress on a project. You can increase the meeting's leverage by:

  1. Defaulting to a half-hour meeting instead of a one-hour meeting to get the same amount done in less time.
  2. Preparing an agenda and a set of goals for the meeting and circulating them to attendees beforehand so that the meeting is focused and more productive.
  3. If an in-person discussion isn't actually necessary, replacing the meeting with an email discussion and spending the time building an important feature instead.

Or perhaps you're a product engineer, ready to start working on a new customer-facing feature for your company's flagship product. You might increase the leverage of your development time by:

  1. Automating parts of the development or testing process that have thus far been done manually, so that you can iterate more quickly.
  2. Prioritizing tasks based on how critical they are for launch so that you maximize the value of what you finally ship.
  3. Talking with the customer support team to gain insights into the customers' biggest pain points, and using that knowledge to understand whether there's another feature you could be working on that would produce even more value with less development effort.

Facebook has a strong hiring culture. Employees view themselves as the guardians of high standards, and hiring is a top priority for both managers and engineers.

Focus on high-leverage activities.

Key Takeaways

  • Use leverage to measure your engineering effectiveness. Focus on what generates the highest return on investment for your time spent.
  • Systematically increase the leverage of your time. Find ways to get an activity done more quickly, to increase the impact of an activity, or to shift to activities with higher leverage.
  • Focus your effort on leverage points. Time is your most limited asset. Identify the habits that produce disproportionately high impact for the time you invest.

Optimizing for learning

Optimizing for learning is a high-leverage activity for the effective engineer.

Key Takeaways

  • Own your story. - Focus on changes that are within your sphere of influence rather than wasting energy on blaming the parts that you can't control.
  • Don't shortchange your learning rate. - Learning compounds like interest. The more you learn, the easier it is to apply prior insights and lessons to learn new things.
  • Find work environments that can sustain your growth.
  • Capitalize on opportunities at work to improve your technical skills. - Learn from your best co-workers. Dive into any available educational material provided by your company.
  • Locate learning opportunities outside of the workplace. - Challenge yourself to become better by just 1% a day.

Adopt a Growth Mindset

People adopt one of two mindsets, which in turn affects how they view effort and failure. People with a fixed mindset believe that “human qualities are carved in stone” and that they're born with a predetermined amount of intelligence—either they're smart or they're not. Failure indicates they're not, so they stick with the things they do well—things that validate their intelligence. They tend to give up early and easily, which enables them to point to a lack of effort rather than a lack of ability as causing failure. On the other hand, those with a growth mindset believe that they can cultivate and grow their intelligence and skills through effort. They may initially lack aptitude in certain areas, but they view challenges and failures as opportunities to learn. As a result, they're much less likely to give up on their paths to success.

Prior to joining Box's 30-person engineering team in 2011, Bercovici hadn't even done any full-time web development. She came from a theoretical and math-heavy background at an Israeli university. Engineering interviewers assumed that she didn't enjoy coding, that her PhD provided few practical advantages, and that she didn't know enough about engineering to ramp up quickly. “It's not about apologizing for where your resume doesn't line up but rather telling your story—who you are, what skills you've built, what you're excited about doing next and why,” Bercovici explained.

Adopting a growth mindset means accepting responsibility for each aspect of a situation that you can change, anything from improving your conversational skills to mastering a new engineering focus, rather than blaming failures and shortcomings on things outside your control. It means taking control of your own story. It means optimizing for experiences where you learn rather than for experiences where you effortlessly succeed. And it means investing in your rate of learning.

Invest in Your Rate of Learning

Once interest gets added to the principal of a deposit, that interest gets put to work generating future interest, which in turn generates even more future interest. There are three important takeaways from that simple lesson:

  1. Compounding leads to an exponential growth curve. An exponential growth curve looks like a hockey stick. It grows slowly at first, looking flat and almost linear; but then suddenly it transitions to rapid growth.
  2. The earlier compounding starts, the sooner you hit the region of rapid growth and the faster you can reap its benefits.
  3. Even small deltas in the interest rate can make massive differences in the long run.

Learning, like interest, also compounds. Therefore, the same three takeaways apply:

  1. Learning follows an exponential growth curve. Knowledge gives you a foundation, enabling you to gain more knowledge even faster. For example, an understanding of recursion provides the basis for many other concepts, like trees and graph searches, which in turn are necessary for understanding compilers and network topologies.
  2. The earlier that you optimize for learning, the more time your learning has to compound. A good first job, for example, makes it easier to get a better second job, which then affects future career opportunities.
  3. Due to compounding, even small deltas in your own learning rate make a big difference over the long run. This last point about the compounding returns of intelligence is the least intuitive: we tend to drastically underestimate the impact of small changes on our growth rate.

What will you learn today to improve yourself by 1%? That 1% is a high-leverage investment in the skills and knowledge you'll need to make use of future opportunities. Invest your time in activities with the highest learning rate.
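To make the compounding argument concrete, here is a minimal sketch that applies the standard compound-growth formula to a daily improvement rate. The 365-day horizon and the 0.5% comparison rate are assumptions chosen for illustration; at 1% a day, the result is roughly a 37.8x improvement over a year.

```python
def compound(rate: float, periods: int, start: float = 1.0) -> float:
    """Value of `start` after growing by `rate` each period, compounded."""
    return start * (1.0 + rate) ** periods


# Improving by 1% a day for a year versus 0.5% a day (assumed rates).
one_percent = compound(0.01, 365)
half_percent = compound(0.005, 365)

print(f"1% daily for 365 days:   {one_percent:.1f}x")
print(f"0.5% daily for 365 days: {half_percent:.1f}x")
```

Halving the daily rate cuts the one-year result by far more than half, which is the sense in which small deltas in the learning rate matter over the long run.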

Seek Work Environments Conducive to Learning

Because we spend so much of our time at work, one of the most powerful leverage points for increasing our learning rate is our choice of work environment. Some work environments are more conducive than others for supporting a high personal and professional growth rate. Here are six major factors to consider when choosing a new job or team and the questions you should be asking for each of them:

  1. Fast growth. At fast-growing teams and companies, the number of problems to solve exceeds available resources, providing ample opportunities to make a big impact and to increase your responsibilities. The growth also makes it easier to attract strong talent and build a strong team, which feeds back to generate even more growth. A lack of growth, on the other hand, leads to stagnation and politics. Employees might squabble over limited opportunities, and it becomes harder to find and retain talent.

Questions to consider

  • What are the weekly or monthly growth rates of core business metrics (e.g., active users, annual recurring revenue, products sold, etc.)?
  • Are the particular initiatives that you'd be working on high priorities, with sufficient support and resources from the company to grow?
  • How aggressively has the company or team been hiring in the past year?
  • How quickly have the strongest team members grown into positions of leadership?

  2. Training. Strong onboarding programs demonstrate that the organization prioritizes training new employees. Any team that understands the value of ramping up new hires as quickly as possible will invest in creating tools, focus areas or initial hands-on development programs. Similarly, a solid mentorship program also indicates that the team prioritizes professional growth.

Questions to consider

  • Is each new person expected to figure things out on his or her own, or is there a more formalized way of onboarding new engineers?
  • Is there formal or informal mentorship?
  • What steps has the company taken to ensure that team members continue to learn and grow?
  • What new things have team members learned recently?

  3. Openness. A growing organization isn't going to figure out the most effective product idea, engineering design, or organizational process on its first attempt. If it can continuously learn and adapt from past mistakes, however, then it stands a much better chance of success. That's more likely to happen if employees challenge each other's decisions and incorporate feedback into future iterations. Look for a culture of curiosity, where everyone is encouraged to ask questions, coupled with a culture of openness, where feedback and information are shared proactively. Reflecting on failed projects, understanding what caused production outages, and reviewing the returns on different product investments all help the right lessons get internalized.

Questions to consider

  • Do employees know what priorities different teams are working on?
  • Do teams meet to reflect on whether product changes and feature launches were worth the effort? Do they conduct post-mortems after outages?
  • How is knowledge documented and shared across the company?
  • What are examples of lessons that the team has learned?

  4. Pace. A work environment that iterates quickly provides a faster feedback cycle and enables you to learn at a faster rate. Automation tools, lightweight approval processes, and a willingness to experiment accelerate progress. Smaller teams and companies tend to have fewer bureaucratic barriers to getting things done than larger ones. Do push yourself, but also find a pace that's sustainable for you in the long run.

Questions to consider

  • Is moving quickly reflected in the company or engineering values?
  • What tools does the team use to increase iteration speed?
  • How long does it take to go from an idea's conception to launch approval?
  • What percentage of time is spent on maintenance versus developing new products and features?

  5. People. Surrounding yourself with people who are smarter, more talented, and more creative than you means surrounding yourself with potential teachers and mentors. Who you work with can matter more than what you actually do, in terms of your career growth and work happiness.

Questions to consider

  • Do the people who interviewed you seem smarter than you?
  • Are there skills they can teach you?
  • Were your interviews rigorous and comprehensive? Would you want to work with the types of people who would do well on them?
  • Do people tend to work on one-person projects, or are teamwork and cooperation common themes?

  6. Autonomy. The freedom to choose what to work on and how to do it drives our ability to learn, as long as we have the support that we need to use that freedom effectively. At smaller companies, you'll end up wielding significantly more autonomy over the total surface area of product features and responsibilities, but you'll also need to take more ownership of your own learning and growth.

Questions to consider

  • Do people have the autonomy to choose what projects they work on and how they do them?
  • How often do individuals switch teams or projects?
  • What breadth of the codebase can an individual expect to work on over the course of a year?
  • Do engineers participate in discussions on product design and influence product direction?

Dedicate Time on the Job to Develop New Skills

To invest in your own growth, you should carve out your own 20% time. It's more effective to take it in one- or two-hour chunks each day rather than in one full day each week, because you can then make a daily habit out of improving your skills. Your productivity may decrease at first (or it might not change much if you're taking time away from web surfing or other distractions), but the goal is to make investments that will make you more effective in the long run.

What should you do with that 20% time?

  • Develop a deeper understanding of areas you're already working on and tools that you already use
  • Gain experience in “adjacent disciplines” that relate to your core role and where increased familiarity can make you more self-sufficient and effective, such as:
    • product engineer - product management, user research, or even backend engineering
    • infrastructure engineer - machine learning, database internals, or web development
    • growth engineer - data science, marketing, or behavioral psychology

10 suggestions to take advantage of the resources available to you at work:

  • Study code for core abstractions written by the best engineers at your company. ✅
  • Write more code. ✅
  • Go through any technical, educational material available internally. ✅
  • Master the programming languages that you use. ✅
  • Send your code reviews to the harshest critics.
  • Enroll in classes on areas where you want to improve.
  • Participate in design discussions of projects you're interested in.
  • Work on a diversity of projects. ✅
  • Make sure you're on a team with at least a few senior engineers whom you can learn from. ✅
  • Jump fearlessly into code you don't know. ✅

Always Be Learning

Adopting a growth mindset isn't limited to the workplace. Embrace a mindset in which you're motivated to learn about the things that excite you.

10 starting points to help inspire a habit of learning outside of the workplace:

  • Learn new programming languages and frameworks. ✅
  • Invest in skills that are in high demand. ✅
  • Read books. ✅
  • Join a discussion group.
  • Attend talks, conferences, and meetups.
  • Build and maintain a strong network of relationships.
  • Follow bloggers who teach.
  • Write to teach.
  • Tinker on side projects. ✅
  • Pursue what you love. ✅

Prioritize Regularly

Regular prioritization is a high-leverage activity, because it determines the leverage of the rest of your time. And working on one task means not working on another. The strategies used to prioritize effectively:

  • Track all our to-dos in a single and easily accessible list.
  • Make pairwise comparisons between what we're doing and what we could be doing instead.
  • Identify what's high-leverage by: focusing on what directly produces value, and focusing on the important and non-urgent.
  • Protect your maker's schedule, and limit the amount of work you have in progress.
  • Fight procrastination with if-then plans.

Key Takeaways

  • Write down and review to-dos.
  • Work on what directly leads to value.
  • Work on the important and non-urgent.
  • Reduce context switches.
  • Make if-then plans to combat procrastination.
  • Make prioritization a habit.

Track To-Dos in a Single, Easily Accessible List

Ask yourself on a recurring basis: Is there something else I could be doing that's higher-leverage? The goal is to continuously shift your top priorities toward the ones with the highest leverage, given the information you have.

Focus on What Directly Produces Value

Value is measured in terms of products shipped, users acquired, business metrics moved, or sales made, rather than in terms of hours worked, tasks completed, lines of code written, or meetings attended. Prioritize the tasks that produce the most value with the least amount of effort. Don't try to get everything done. Focus on what matters, and what matters is what produces value.

Focus on the Important and Non-Urgent

Find which of your to-dos fall within Quadrant 2 (important but not urgent), and de-prioritize the Quadrant 3 (urgent but not important) and Quadrant 4 (neither urgent nor important) activities.

Protect Your Maker's Schedule

Engineers need longer and more contiguous blocks of time to be productive than many other professionals. We're most productive when we can maintain what psychologist Mihály Csíkszentmihályi calls flow, described by people who experience it as “a state of effortless concentration so deep that they lose their sense of time, of themselves, of their problems.”

Preserve larger blocks of focused time in your schedule. Schedule necessary meetings back-to-back or at the beginning or end of your work day, rather than scattering them throughout the day. When people ask you for help while you're in the middle of a focused activity, tell them that you'd be happy to do it before or after your breaks or during smaller chunks of your free time. Learn to say no to unimportant activities, such as meetings that don't require your attendance and other low-priority commitments that might fragment your schedule. Protect your time and your maker's schedule.

Limit the Amount of Work in Progress

Be deliberate about limiting work in progress. Prioritize and serialize different projects to maintain strong momentum.

The number of projects that you can work on simultaneously varies from person to person.

Fight Procrastination with If-Then Plans

Identify ahead of time a situation where we plan to do a certain task, such as:

  • if it's after my 3 pm meeting, then I'll investigate this long-standing bug,
  • if it's right after dinner, then I'll watch a lecture on Android development

If-then “planning creates a link between the situation or cue (the if) and the behavior that you should follow (the then).” When the cue triggers, the then behavior “follows automatically without any conscious intent.”

Make a Routine of Prioritization

Once we're knee-deep working on those tasks, a common pitfall for many engineers is neglecting to revisit those priorities.

Develop your own routine to manage and execute on your own priorities. Iteratively adapt your routine and your task management software or systems until you find something that works well for you.

An effective way to ensure that prioritization happens every morning is to make it part of your daily routine.

Prioritizing is difficult. It consumes time and energy, and sometimes it doesn't feel productive because you're not creating anything. You don't have to always be prioritizing. But when you have certain personal or professional goals that you want to achieve, you'll find that prioritization has very high leverage.

Part 2: Execute, Execute, Execute

Invest in Iteration Speed

Teams that incorporate continuous deployment (or a variant called continuous delivery, where engineers selectively determine which versions to deploy) into their workflows can quickly deliver small bug fixes, improvements, and features to production, so they build more and learn faster.

Move Fast to Learn Fast

MOVE FAST AND BREAK THINGS - iterating quickly and focusing on impact rather than being conservative and minimizing mistakes. The faster you can iterate, the more you can learn about what works and what doesn't work, build more things and try out more ideas. Moving fast doesn't necessarily mean moving recklessly.

Continuous deployment is but one of many powerful tools for increasing iteration speed. Other options include investing in time-saving tools, improving your debugging loops, mastering your programming workflows, and, more generally, removing any bottlenecks that you identify.

Invest in Time-Saving Tools

If you have to do something manually more than twice, then write a tool for the third time.

  • Continuous deployment
  • Code compilation speed
  • Switching to languages with interactive programming environments
  • Hot code reloads
  • Continuous integration
  • IDE plugins
  • Build your own tool that saves time -> earns you leeway with your manager and your peers to explore more ideas in the future.

Shorten Your Debugging and Validation Loops

A minimal, reproducible test case removes all unnecessary distractions so that more time and energy can be spent on the core issue, and it creates a tight feedback loop so that we can iterate quickly.

Shortcut around normal system behaviors and user interactions when we're testing our products. For example:

  • When fixing a bug in the flow for sending an invite to a friend, you could navigate through the same three interactions that every normal user goes through: switching to the friends tab, choosing someone from your contacts, and then crafting an invite message.
  • Or you could create a much shorter workflow by spending a few minutes wiring up the application so that you're dropped into the buggy part of the invitation flow every time the application launches.
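One way to wire up such a shortcut is a debug-only switch that tells the application where to start. The sketch below is a toy model, not any real framework's API: the screen names and the `DEBUG_START_SCREEN` environment variable are hypothetical and exist only to show the idea.

```python
import os

# Hypothetical screens in the invite flow; names are illustrative only.
NORMAL_FLOW = ["home", "friends_tab", "choose_contact", "compose_invite"]


def screens_to_show(env=os.environ):
    """Return the screens the app walks through on launch.

    If the (hypothetical) DEBUG_START_SCREEN variable names a screen in the
    flow, skip straight to it instead of replaying every earlier step.
    """
    start = env.get("DEBUG_START_SCREEN")
    if start in NORMAL_FLOW:
        return NORMAL_FLOW[NORMAL_FLOW.index(start):]
    return NORMAL_FLOW


print(screens_to_show({}))
print(screens_to_show({"DEBUG_START_SCREEN": "compose_invite"}))
```

An unknown or missing value falls back to the normal flow, so the shortcut cannot change behavior for regular users by accident.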

Master Your Programming Environment

Common tasks that can take a wide range of times for different people to complete:

  • Tracking changes in version control
  • Compiling or building code
  • Running a unit test or program
  • Reloading a web page on a development server with new changes
  • Testing out the behavior of an expression
  • Looking up the documentation for a certain function
  • Jumping to a function definition
  • Reformatting code or data in a text editor
  • Finding the callers of a function
  • Rearranging desktop windows
  • Navigating to a specific place within a file

Solutions to mastering your programming fundamentals:

  • Get proficient with your favorite text editor or IDE -> Learn some text editor keyboard shortcuts that let you navigate faster.
  • Learn at least one productive, high-level programming language.
  • Get familiar with UNIX (or Windows) shell commands.
  • Prefer the keyboard over the mouse.
  • Automate your manual workflows -> Once I've manually performed a task three or more times, I start thinking about whether it would be worthwhile to automate it.
  • Test out ideas on an interactive interpreter.
  • Make it fast and easy to run just the unit tests associated with your current changes.

Don't Ignore Your Non-Engineering Bottlenecks

Non-engineering constraints may also hinder your iteration speed. For example: Customer support might be slow at collecting the details for a bug report.

Common types of bottlenecks:

Dependency on other people -> Communication is critical for making progress on people-related bottlenecks:

  • Ask for updates and commitments from team members at meetings or daily stand-ups.
  • Periodically check in with the product manager to make sure what you need hasn't gotten dropped.
  • Follow up with written communication (email or meeting notes) on key action items and dates that were decided in person.

Obtaining approval from a key decision maker -> Don't wait until after you've invested massive amounts of engineering time to seek final project approval.

  • Prioritize building prototypes, collecting early data, conducting user studies, or whatever else is necessary to get preliminary project approval.
  • Explicitly ask the decision makers what they care about the most, so that you can make sure to get those details right.
  • Talk with the product managers, designers, or other leaders who have worked closely with them and who might be able to provide insight into their thought processes.

Review processes -> Plan ahead; again, communication is key.

  • Expend slightly more effort in coordination; it could significantly improve your iteration speed.
  • Get the ball rolling on the requirements in your launch checklist, and don't wait until the last minute to schedule necessary reviews.

"Premature optimization is the root of all evil" -> Finding the biggest bottlenecks and solving them is far more important. For example, building continuous deployment won't speed up delivery if the real bottleneck is the UI review process.

Key Takeaways

  • The faster you can iterate, the more you can learn.
  • Invest in tooling.
  • Optimize your debugging workflow.
  • Master the fundamentals of your craft.
  • Take a holistic view of your iteration loop.

Measure What You Want to Improve

Google Search's use of long clicks and short clicks to evaluate user happiness with search results demonstrates the power of a well-chosen metric: it can be applied to a wide range of problems, not only to measure progress but also to drive it.

Use Metrics to Drive Progress

Metrics goals:

  • Help you focus on the right things.
  • Help guard against future regressions.
  • Can drive forward progress. -> For example, performance ratcheting applies downward pressure on performance metrics: a change that would push a metric past the current threshold is held back until it's optimized.
  • Measure your effectiveness over time -> Compare the leverage of what you're doing against other activities you could be doing instead.
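One possible reading of performance ratcheting is sketched below: a threshold that only ever tightens, plus a check that blocks changes that regress past it. The class name, the 5% noise tolerance, and the latency figures are all assumptions for illustration, not a description of any particular team's system.

```python
class Ratchet:
    """Tracks the best (lowest) value seen for a metric such as latency."""

    def __init__(self, threshold_ms: float, tolerance: float = 0.05):
        self.threshold_ms = threshold_ms
        self.tolerance = tolerance  # allowed slack for noise (assumed 5%)

    def check(self, measured_ms: float) -> bool:
        """Return True if the change may ship; tighten the ratchet on improvement."""
        if measured_ms > self.threshold_ms * (1 + self.tolerance):
            return False  # regression: block until optimized
        # Improvements lower the threshold; it never moves back up.
        self.threshold_ms = min(self.threshold_ms, measured_ms)
        return True


ratchet = Ratchet(threshold_ms=200.0)
print(ratchet.check(180.0), ratchet.threshold_ms)  # improvement tightens it
print(ratchet.check(195.0), ratchet.threshold_ms)  # regression is blocked
print(ratchet.check(185.0), ratchet.threshold_ms)  # within tolerance, allowed
```

Because the threshold follows the best result so far, every improvement becomes the new baseline that later changes are measured against.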

Questions to ask:

  • Is there some way to measure the progress of what I'm doing?
  • If a task I'm working on doesn't move a core metric, is it worth doing? Or is there a missing key metric?

Pick the Right Metric to Incentivize the Behavior You Want

The right metric functions as a North Star, aligning team efforts toward a common goal.

  • Hours worked per week vs. productivity per week. -> Ultimately, attempting to increase output by increasing hours worked per week is unsustainable.
  • Click-through rates vs. long click-through rates.
  • Average response times vs. 95th or 99th percentile response times. -> For decreasing the average, you'll focus more on general infrastructure improvements that can shave off milliseconds from all requests. To decrease the 95th or 99th percentile, however, you'll need to hunt down the worst-case behaviors in your system.
  • Bugs fixed vs. bugs outstanding. -> Tracking the number of outstanding bugs is better: developers who focus only on the number of bugs fixed have less incentive to test rigorously when building new features.
  • Registered users vs. weekly growth rate of registered users. -> The weekly growth rate of registered users shows whether growth is slowing down.
  • Weekly active users vs. weekly active rate by age of cohort. -> The weekly active rate by age of cohort measures the fraction of users who are still weekly actives the nth week after signing up, and tracks how that number changes over time.
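The difference between average and percentile response times in the list above can be seen with a small calculation. The sketch uses made-up latencies and a nearest-rank percentile, which is one of several common percentile definitions.

```python
import math
import statistics


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]


# Hypothetical response times in ms: 90 fast requests and 10 very slow ones.
latencies = [100] * 90 + [3000] * 10

print("median:", statistics.median(latencies))
print("mean:  ", statistics.mean(latencies))
print("p95:   ", percentile(latencies, 95))
print("p99:   ", percentile(latencies, 99))
```

With these numbers the mean is 390 ms, which describes neither the typical request (100 ms) nor the slow tail (3000 ms); the 95th and 99th percentiles point directly at the worst-case behavior.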

What you don't measure is important as well.

When deciding which metrics to use, choose ones that:

  1. maximize impact. -> Look for the economic denominator, which helps answer: “If you could pick one and only one ratio—profit per x ...—to systematically increase over time, what x would have the greatest and most sustainable impact on your economic engine?”
  2. are actionable. -> An actionable metric is one whose movements can be causally explained by the team's efforts. In contrast, vanity metrics track gross numbers like page views per month, total registered users, or total paying customers.
  3. are responsive yet robust. -> A responsive metric updates quickly enough to give feedback about whether a given change was positive or negative, so that your team can learn where to apply future efforts. However, a metric also needs to be robust enough that external factors outside of the team's control don't lead to significant noise.

Instrument Everything to Understand What's Going On

Example: The goal of airline pilots is to fly their passengers from point A to point B, as measured by distance to their destination, but they do not fly blind: they have sets of instruments to understand and monitor the state of their aircraft.

We often don't know everything that we want to measure ahead of time. Therefore, we need to build flexible tools and abstractions that make it easy to track additional metrics.

Etsy, a company that sells handmade crafts online, does this exceptionally well. The engineering team instruments their web application according to their philosophy of “measure anything, measure everything.” To do this effectively, they use a system called Graphite that supports flexible, real-time graphing, and a library called StatsD for aggregating metrics.
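To give a feel for why this style of instrumentation is cheap to add, here is a minimal sketch of the plain-text line format that StatsD accepts over UDP (`<name>:<value>|<type>`). The metric names are hypothetical, and the helper functions are illustrative rather than part of any official client library; production code would normally use an existing client.

```python
import socket


def format_metric(name: str, value, metric_type: str) -> bytes:
    """Build a StatsD line: <name>:<value>|<type> (c=counter, ms=timer, g=gauge)."""
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_metric(payload: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP send; 8125 is StatsD's conventional default port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


# Hypothetical metric names, for illustration only.
counter = format_metric("signup.completed", 1, "c")
timer = format_metric("search.latency", 320, "ms")
print(counter, timer)
# Calling send_metric(counter) would transmit the line to a local StatsD daemon.
```

Because each metric is a single short datagram, adding a new measurement is a one-line change at the call site.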

At Google, site reliability engineers use a monitoring system called Borgmon to collect, aggregate, and graph metrics and to send alerts when it detects anomalies.

Twitter built a distributed platform called Observability to collect, store, and present a volume of 170 million individual metrics per minute.

LinkedIn developed a graphing and analytics system called inGraphs that lets engineers view site dashboards, compare metrics over time, and set up threshold-based alerts, all with a few lines of configuration.

Open-source tools like Graphite, StatsD, InfluxDB, Ganglia, Nagios, and Munin make it easy to monitor systems in near real-time. Teams who want a managed, enterprise solution have options like New Relic or AppDynamics that can quickly provide code-level performance visibility into many standard platforms.
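As a rough illustration of how lightweight this kind of instrumentation can be, the sketch below formats counters and timers in the plain-text line format that StatsD accepts (`name:value|type`) and sends them over UDP. The helper names, the metric names, and the `localhost:8125` address are assumptions made for this example. It is not Etsy's code.

```python
import socket
import time
from contextlib import contextmanager

# Assumed address for illustration; 8125 is StatsD's default UDP port.
STATSD_ADDR = ("localhost", 8125)


def format_metric(name: str, value: float, metric_type: str) -> str:
    """Build one StatsD line: 'c' = counter, 'ms' = timer, 'g' = gauge."""
    if metric_type not in {"c", "ms", "g"}:
        raise ValueError(f"unsupported metric type: {metric_type}")
    return f"{name}:{value}|{metric_type}"


def send_metric(line: str, addr=STATSD_ADDR) -> None:
    """Fire-and-forget over UDP so instrumentation does not wait for a reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), addr)


@contextmanager
def timed(name: str, send=send_metric):
    """Time a block of code and report the elapsed milliseconds as a timer."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = round((time.perf_counter() - start) * 1000)
        send(format_metric(name, elapsed_ms, "ms"))
```

With helpers like these, tracking a new metric is a single line at the call site, for example `send_metric(format_metric("checkout.completed", 1, "c"))`, or wrapping a code path in `with timed("checkout.render_time"):`.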

Internalize Useful Numbers

Measuring the goals you want to achieve and instrumenting the systems that you want to understand are high-leverage activities. Ensuring you have access to a few useful numbers to approximate your progress and benchmark your performance is a high-leverage investment: they provide the benefits of metrics at a much lower cost.

List of 13 numbers that every engineer ought to know:

Access Type                             Latency
L1 cache reference                      0.5 ns
Branch mispredict                       5 ns
L2 cache reference                      7 ns
Mutex lock/unlock                       100 ns
Main memory reference                   100 ns
Compress 1K bytes with Snappy           10,000 ns = 10 μs
Send 2K bytes over 1 Gbps network       20,000 ns = 20 μs
Read 1 MB sequentially from memory      250,000 ns = 250 μs
Round trip within same datacenter       500,000 ns = 500 μs
Disk seek                               10,000,000 ns = 10 ms
Read 1 MB sequentially from network     10,000,000 ns = 10 ms
Read 1 MB sequentially from disk        30,000,000 ns = 30 ms
Send packet CA → Netherlands → CA       150,000,000 ns = 150 ms

Knowing useful numbers like these enables you, with a few back-of-the-envelope calculations, to quickly estimate the performance properties of a design without actually having to build it.
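A small sketch of such a back-of-the-envelope calculation, using only figures from the table above. The function names and the 1 GB (taken as 1,000 MB) and 10,000-read scenarios are made up for illustration.

```python
# Latencies from the table above, in nanoseconds.
LATENCY_NS = {
    "read_1mb_memory": 250_000,
    "read_1mb_network": 10_000_000,
    "read_1mb_disk": 30_000_000,
    "disk_seek": 10_000_000,
}


def sequential_read_seconds(megabytes: int, source: str) -> float:
    """Estimate the time to read `megabytes` MB sequentially from a source."""
    return megabytes * LATENCY_NS[f"read_1mb_{source}"] / 1e9


def random_reads_seconds(count: int) -> float:
    """Estimate the time for `count` random disk reads, one seek each."""
    return count * LATENCY_NS["disk_seek"] / 1e9


if __name__ == "__main__":
    for source in ("memory", "network", "disk"):
        seconds = sequential_read_seconds(1000, source)
        print(f"1 GB from {source}: {seconds:.2f} s")
    print(f"10,000 random disk reads: {random_reads_seconds(10_000):.0f} s")
```

By these numbers, scanning 1 GB takes roughly 0.25 s from memory, 10 s over the network, and 30 s from disk, while 10,000 random disk reads cost about 100 s in seeks alone. That is enough to rule a design in or out before building it.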

Internalizing useful numbers can also help you spot anomalies in data measurements.

Knowledge of useful numbers can clarify both the areas and scope for improvement.

Other numbers that might be useful to internalize or at least have readily at hand include:

  • the number of registered users, weekly active users, and monthly active users
  • the number of requests per second
  • the amount and total capacity of data stored
  • the amount of data written and accessed daily
  • the number of servers needed to support a given service
  • the throughput of different services or endpoints
  • the growth rate of traffic
  • the average page load time
  • the distribution of traffic across different parts of a product
  • the distribution of traffic across web browsers, mobile devices, and operating system versions

Be Skeptical about Data Integrity

Using data to support your arguments is powerful. The right metric can slice through office politics, philosophical biases, and product arguments, quickly resolving discussions.

Unfortunately, the wrong metric can do the same thing, with disastrous results. -> For example, we might see users spending more time on a newly redesigned feature and optimistically attribute it to increased engagement when, in reality, they are struggling to understand a confusing interface.

Investing the effort to ensure that your data is accurate is high-leverage. Here are some strategies that you can use to increase confidence in your data integrity:

  • Log data liberally, in case it turns out to be useful later on.
  • Build tools to iterate on data accuracy sooner.
  • Write end-to-end integration tests to validate your entire analytics pipeline.
  • Examine collected data sooner.
  • Cross-validate data accuracy by computing the same metric in multiple ways.
  • When a number does look off, dig into it early.
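One way to cross-validate a metric is to compute it from two independent sources and flag any disagreement. The sketch below compares a count derived from a raw event log with the same count derived from a daily summary. The data, the function names, and the 1% tolerance are hypothetical.

```python
# Hypothetical raw event log: (date, event_name) pairs.
RAW_EVENTS = [
    ("2024-01-01", "purchase"),
    ("2024-01-01", "page_view"),
    ("2024-01-01", "purchase"),
    ("2024-01-02", "purchase"),
]

# Hypothetical daily rollup produced by a separate pipeline.
DAILY_SUMMARY = {"2024-01-01": 2, "2024-01-02": 1}


def purchases_from_raw(events) -> int:
    """Method 1: count purchase events directly from the raw log."""
    return sum(1 for _, name in events if name == "purchase")


def purchases_from_summary(summary) -> int:
    """Method 2: add up the pre-aggregated daily totals."""
    return sum(summary.values())


def relative_difference(a: float, b: float) -> float:
    """Difference between two values as a fraction of the larger one."""
    if a == b:
        return 0.0
    return abs(a - b) / max(abs(a), abs(b))


def is_consistent(a: float, b: float, tolerance: float = 0.01) -> bool:
    """True when the two computations agree within `tolerance`."""
    return relative_difference(a, b) <= tolerance
```

A check like `is_consistent(purchases_from_raw(RAW_EVENTS), purchases_from_summary(DAILY_SUMMARY))` can run on a schedule, so that a divergence between the two pipelines is noticed early instead of after a decision has been made on bad data.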

Key Takeaways

  • Measure your progress.
  • Carefully choose your top-level metric.
  • Instrument your system.
  • Know your numbers.
  • Prioritize data integrity.

Validate Your Ideas Early and Often

Validating our ideas both early and often helps us get the right things done. Find low-effort and iterative ways to validate that we're on the right track and to reduce wasted effort. Use A/B testing to continuously validate our product changes with quantitative data. Examine a common anti-pattern, the one-person team. Building feedback and validation loops applies to every decision we make.

Find Low-Effort Ways to Validate Your Work

In large projects, we should continually ask ourselves: Can I expend a small fraction of the total effort to collect some data and validate that what I'm doing will work?

The MVP, or “minimum viable product,” is “a version of a new product which allows a team to collect the maximum amount of validated learning about customers with the least effort.”

The strategy of faking the full implementation of an idea to validate whether it will work is extremely powerful.

Continuously Validate Product Changes with A/B Testing

In an A/B test, a random subset of users sees a change or a new feature, while the rest see the existing version. Articulate a hypothesis, construct an A/B test to validate the hypothesis, and then iterate based on what you learn.
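A common way to pick the random subset is to hash the user ID together with an experiment name, so that each user always lands in the same group without any stored state. This is a minimal sketch under that assumption. The function name, the experiment name, and the 50/50 split are illustrative.

```python
import hashlib


def assign_variant(user_id: str, experiment: str,
                   treatment_fraction: float = 0.5) -> str:
    """Deterministically place a user in 'treatment' or 'control'."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Map the first 32 bits of the hash to a value in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "treatment" if bucket < treatment_fraction else "control"
```

Including the experiment name in the hash keeps assignments independent across experiments, so a user who sees the treatment in one test is not systematically in the treatment group of the next.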

For example, one team hypothesized that “showing a visitor more marketplace items would decrease bounce rate,” ran an experiment to show images of similar products at the top of the listing page, and analyzed whether the metrics supported or rejected the hypothesis (in fact, it reduced bounce rate by nearly 10%). Based on that experiment, the team learned that they should incorporate images of more marketplace products into their final design.

When deciding what to A/B test, time is your limiting resource. Home in on differences that are high-leverage and practically significant, the ones that actually matter for your particular scale.
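To make the distinction between statistical and practical significance concrete, here is a sketch that applies a standard two-proportion z-test and then also requires a minimum absolute lift. The function names and the default thresholds (`min_lift=0.01`, `alpha=0.05`) are assumptions chosen for the example, not recommendations.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (absolute lift, z statistic, two-sided p-value) for B vs. A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        # No variation at all (every user or no user converted).
        return p_b - p_a, 0.0, 1.0
    z = (p_b - p_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value


def worth_shipping(conv_a: int, n_a: int, conv_b: int, n_b: int,
                   min_lift: float = 0.01, alpha: float = 0.05) -> bool:
    """Require both statistical significance and a practically useful lift."""
    lift, _, p_value = two_proportion_z_test(conv_a, n_a, conv_b, n_b)
    return p_value < alpha and lift >= min_lift
```

With a million users per group, a change from 1.00% to 1.03% conversion comes out as statistically significant under this test, yet it fails the `min_lift` check. Whether such a small difference matters depends on your scale, which is why the threshold is a parameter.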

Beware the One-Person Team

Working alone adds friction to the process of getting feedback, and you need feedback to help validate that what you're doing will work. It is also more demoralizing and less motivating.

Set up the necessary feedback channels to increase the chances of our projects succeeding. Here are some strategies:

  • Be open and receptive to feedback.
  • Commit code early and often.
  • Request code reviews from thorough critics.
  • Ask to bounce ideas off your teammates.
  • Design the interface or API of a new system first.
  • Send out a design document before devoting your energy to your code.
  • If possible, structure ongoing projects so that there is some shared context with your teammates.
  • Solicit buy-in for controversial features before investing too much time. -> Float the idea in conversations and build a prototype to help convince relevant stakeholders.

Build Feedback Loops for Your Decisions

Validation means formulating a hypothesis about what might work, designing an experiment to test it, understanding what good and bad outcomes look like, running the experiment, and learning from the results.

Key Takeaways

  • Approach a problem iteratively to reduce wasted effort.
  • Reduce the risk of large implementations by using small validations.
  • Use A/B testing to continuously validate your product hypotheses.
  • When working on a solo project, find ways of soliciting regular feedback.
  • Adopt a willingness to validate your decisions.

Improve Your Project Estimation Skills

In 2009, after studying over 50,000 software projects, the Standish Group concluded that 44% of projects are delivered late, over budget, or missing requirements; 24% fail to complete; and the average slipped project overruns its time budget by 79%.

Take charge of your own project plans and push back against unrealistic schedules. Decompose project estimates to increase accuracy. Budget for the unknown. Clearly define a project's scope and establish measurable milestones. Reduce risk as early as possible so that we can adapt sooner. Be careful not to use overtime to sprint toward a deadline if we find ourselves falling behind.

Use Accurate Estimates to Drive Project Planning

“A good estimate is an estimate that provides a clear enough view of the project reality to allow the project leadership to make good decisions about how to control the project to hit its targets.”

How do we produce accurate estimates that provide us the flexibility we need? Here are some concrete strategies:

  • Decompose the project into granular tasks.
  • Estimate based on how long tasks will take, not on how long you or someone else wants them to take.
  • Think of estimates as probability distributions, not best-case scenarios. -> Too often, an estimate is treated as the most optimistic prediction that has a non-zero probability of coming true.
  • Let the person doing the actual task make the estimate.
  • Beware of anchoring bias. -> Avoid committing to an initial number before actually outlining the tasks involved.
  • Use multiple approaches to estimate the same task. -> 1) decompose the project into granular tasks, estimate each individual task, and create a bottom-up estimate; 2) gather historical data on how long it took to build something similar; and 3) count the number of subsystems you have to build and estimate the average time required for each one.
  • Beware the mythical man-month. -> For example, the fact that one woman can give birth to a baby in nine months doesn't mean that nine women can give birth to a baby in one month. As additional members join, the communication overhead from meetings, emails, one-on-ones, discussions, etc., grows quadratically with the size of the team. Moreover, new team members require time to ramp up on a project before they're productive, so don't assume that adding more people will shorten a project timeline.
  • Validate estimates against historical data.
  • Use timeboxing to constrain tasks that can grow in scope. -> Plan instead to allocate a fixed amount of time, or a time box, to open-ended activities.
  • Allow others to challenge estimates.

Adopt a rule of thumb of multiplying our engineering estimates by a factor of 2 to capture unestimated tasks.
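To treat estimates as distributions, one option is to give each task an optimistic, a most likely, and a pessimistic duration and simulate the total many times. The sketch below does that with a triangular distribution, which is an assumption chosen for simplicity and not a method prescribed by the source. The task names and day counts are hypothetical.

```python
import random

# Hypothetical tasks: (optimistic, most likely, pessimistic) in days.
TASKS = {
    "schema migration": (1, 2, 5),
    "API endpoints": (2, 3, 8),
    "frontend integration": (2, 4, 10),
}


def simulate_totals(tasks, trials: int = 10_000, seed: int = 0):
    """Return sorted simulated project totals, one per trial."""
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    totals = [
        sum(rng.triangular(low, high, mode) for low, mode, high in tasks.values())
        for _ in range(trials)
    ]
    return sorted(totals)


def percentile(sorted_values, pct: float) -> float:
    """Pick the value at roughly the given percentile of a sorted list."""
    index = min(len(sorted_values) - 1, int(len(sorted_values) * pct / 100))
    return sorted_values[index]


if __name__ == "__main__":
    totals = simulate_totals(TASKS)
    print(f"50% confidence: {percentile(totals, 50):.1f} days")
    print(f"90% confidence: {percentile(totals, 90):.1f} days")
```

Because the pessimistic tails are longer than the optimistic ones, the simulated median lands above the sum of the "most likely" values (9 days here). That is one way to see why single-number estimates tend to run short.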

Budget for the Unknown

We can better deal with unknowns by acknowledging that the longer a project is, the more likely that an unexpected problem will arise.

Be explicit about how much time per day each member of the team will realistically spend on a given project. For example, Jack Heart, an engineering manager at Asana, explained that the team maps each ideal engineering day to 2 workdays to account for daily interruptions.

Explicitly track the time spent on tasks not initially part of the project plan, both to build awareness and to reduce the chance that these distractions will catch your project plan by surprise.

Define Specific Project Goals and Measurable Milestones

Define specific goals for a project based on the problem you're working to solve, and then use milestones to measure progress on those goals.

The more specific the goal, the more it can help us discriminate between features. Some examples of specific project goals are:

  • To reduce the 95th percentile of user latency for the home page to under 500 milliseconds.
  • To launch a new search feature that lets users filter their results by content type.
  • To port a service from Ruby to C++ to improve performance.
  • To redesign a web application to request configuration parameters from the server.
  • To build offline support for a mobile application so that content is accessible even when there is no cell connection.
  • To A/B test the product checkout flow to increase sales per customer.
  • To develop a new analytics report that segments key metrics by country.
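A goal such as the first one above is specific enough to check mechanically. This sketch computes a percentile with the nearest-rank method and compares it to a target. The function names and sample data are illustrative, and a production system would more likely read this value from its monitoring tool.

```python
import math


def percentile_nearest_rank(samples, pct: float) -> float:
    """Nearest-rank percentile: the smallest sample covering pct% of the data."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_latency_goal(samples_ms, pct: float = 95,
                       target_ms: float = 500) -> bool:
    """True when the chosen percentile of latency is under the target."""
    return percentile_nearest_rank(samples_ms, pct) < target_ms
```

For example, `meets_latency_goal([100] * 95 + [900] * 5)` passes because only 5% of requests are slow, while one more slow request in the same sample of 100 makes it fail.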

“Each milestone was a very clear point where we had introduced some value that we didn't have before.” -> The milestones were measurable: either the system met the criteria and behaved as promised or it didn't.

Reduce Risk Early

Effectively executing on a project means minimizing the risk that a deadline might slip and surfacing unexpected issues as early as possible. -> The goal from the beginning should be to maximize learning and minimize risk, so that we can adjust our project plan if necessary.

A common risk to all large projects comes during system integration, which almost always takes longer than planned. -> How can we reduce integration risk? One effective strategy is to build end-to-end scaffolding and do system testing earlier. Stub out incomplete functions and modules, and assemble an end-to-end system as soon as possible, even if it's only partly functional.
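As a sketch of what end-to-end scaffolding can look like, the pipeline below wires three stages together while two of them are still stubs. The invoicing scenario and all function names are hypothetical.

```python
def fetch_orders():
    """Stub: returns canned data until the real data source is integrated."""
    return [
        {"id": 1, "amount_cents": 1250},
        {"id": 2, "amount_cents": 800},
    ]


def compute_invoice(orders):
    """Real logic: the one stage that is already implemented."""
    return {
        "order_count": len(orders),
        "total_cents": sum(order["amount_cents"] for order in orders),
    }


def send_invoice(invoice, outbox: list) -> bool:
    """Stub: records the invoice instead of calling a real delivery service."""
    outbox.append(invoice)
    return True


def run_pipeline(outbox: list) -> bool:
    """Exercise every stage end to end, stubs included."""
    orders = fetch_orders()
    invoice = compute_invoice(orders)
    return send_invoice(invoice, outbox)
```

Because the whole path already runs, interface mismatches between stages surface now, and each stub can later be swapped for a real implementation without changing `run_pipeline`.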

Approach Rewrite Projects with Extreme Caution

Rewrite projects are particularly troublesome for a few reasons:

  • They share the same project planning and estimation difficulties as other software projects.
  • Because we tend to be familiar with the original version, we typically underestimate rewrite projects more drastically than we would an undertaking in a new area.
  • It is easy and tempting to bundle additional improvements into a rewrite. Why not refactor the code to reduce some technical debt, use a more performant algorithm, or redesign this subsystem while we're rewriting the code?
  • When a rewrite is ongoing, any new features or improvements must either be added to the rewritten version (in which case they won't launch until the rewrite completes) or they must be duplicated across the existing version and the new version (in order to get the feature or improvement out sooner). The cost of either option grows with the timeline of the project.

Engineers who successfully rewrite systems tend to do so by converting a large rewrite project into a series of smaller projects. -> For example, one team invested some time up front building infrastructure to support a hybrid version of the application, one that allowed them to embed HTML5 components within the Flash application.

The next best approach is to break the rewrite down into separate, targeted phases. -> For example, first take the shortest possible path toward getting the site up and running in Google's data centers.

It's discouraging to write code for earlier phases, knowing that you'll soon be throwing the intermediate code away. But it would be even more demoralizing to miss the target date by a wide margin, delay the launch of new features, or be forced to build urgent functionality twice.

Don't Sprint in the Middle of a Marathon

There are a number of reasons why working more hours doesn't necessarily mean hitting the launch date:

  • Hourly productivity decreases with additional hours worked.
  • You're probably more behind schedule than you think.
  • Additional hours can burn out team members.
  • Working extra hours can hurt team dynamics. -> Not everyone on the team will have the flexibility to pitch in the extra hours. The result can be bitterness or resentment between members of a formerly-happy team.
  • Communication overhead increases as the deadline looms.
  • The sprint toward the deadline incentivizes technical debt.

Increase the probability that overtime will actually accomplish your goals by:

  • Making sure everyone understands the primary causes for why the timeline has slipped this far.
  • Developing a realistic and revised version of the project plan and timeline.
  • Being ready to abandon the sprint if you slip even further from the revised timeline.

Key Takeaways

  • Incorporate estimates into the project plan.
  • Allow buffer room for the unknown in the schedule.
  • Define measurable milestones.
  • Do the riskiest tasks first.
  • Know the limits of overtime.

Part 3: Build Long-Term Value

Balance Quality with Pragmatism

Examine several strategies for building a high-quality code base and consider the trade-offs. Cover both the benefits and the costs of code reviews and lay out some ways that teams can review code without unduly compromising iteration speed. See how building the right abstraction can manage complexity and amplify engineering output, and how generalizing code too soon can slow us down. Show how extensive and automated testing makes fast iteration speed possible, and why some tests have higher leverage than others. Discuss when it makes sense to accumulate technical debt and when we should repay it.

Establish a Sustainable Code Review Process

The benefits of code reviews are obvious. They include:

  • Catching bugs or design shortcomings early.
  • Increasing accountability for code changes.
  • Positive modeling of how to write good code.
  • Sharing working knowledge of the code base.
  • Increasing long-term agility.

Manage Complexity through Abstraction

How the right abstraction increases engineering productivity:

  • It reduces the complexity of the original problem into easier-to-understand primitives.
  • It reduces future application maintenance and makes it easier to apply future improvements.
  • It solves the hard problems once and enables the solutions to be used multiple times.

Building an abstraction for a problem comes with trade-offs:

  • Building a generalized solution takes more time than building one specific to a given problem.
  • It's possible to over-invest in them up front.
  • It's possible to build a poor abstraction.

Good abstractions should be:

  • easy to learn
  • easy to use even without documentation
  • hard to misuse
  • sufficiently powerful to satisfy requirements
  • easy to extend
  • appropriate to the audience

Here are some ideas to get you started:

  • Find popular abstractions in your code base at work or from repositories on GitHub. Read through their documentation, dig through their source code, and try extending them.
  • Look through the open source projects at technology companies like Google, Facebook, LinkedIn, and Twitter. Learn why abstractions like Protocol Buffers, Thrift, Hive, and MapReduce have been indispensable to their growth.
  • Study the interfaces of popular APIs developed by Parse, Stripe, Dropbox, Facebook, and Amazon Web Services, and figure out what makes it so easy for developers to build on top of their platforms.

Automate Testing

[Figure: error rates over time, with and without automated testing.]

An effective way to initiate the habit of testing, particularly when working with a large codebase with few automated tests, is to focus on high-leverage tests—ones that can save you a disproportionate amount of time relative to how long they take to write.

Repay Technical Debt

Technical debt refers to all the deferred work that's necessary to improve the health and quality of the codebase and that would slow us down if left unaddressed.

The key to being a more effective engineer is to incur technical debt when it's necessary to get things done for a deadline, but to pay off that debt periodically.

It's up to individual engineers to schedule and prioritize repayment of technical debt against other work.

Not all technical debt is worth repaying. Effective engineers spend their finite time repaying the debt with the highest leverage—code in highly-trafficked parts of the codebase that takes the least time to fix up.

Key Takeaways

  • Establish a culture of reviewing code.
  • Invest in good software abstractions to simplify difficult problems.
  • Scale code quality with automated testing.
  • Manage your technical debt.

Minimize Operational Burden

Examine strategies for minimizing operational burden. Building systems to fail fast makes them easier to maintain. Relentlessly automate mechanical tasks. Making automation idempotent reduces recurring costs. Practice and develop our ability to recover quickly.

Embrace Operational Simplicity

Effective engineers focus on simplicity. Simple solutions impose a lower operational burden because they're easier to understand, maintain, and modify.

An overly complex architecture imposes a maintenance cost in a few ways:

  • Engineering expertise gets splintered across multiple systems.
  • Increased complexity introduces more potential single points of failure.
  • New engineers face a steeper learning curve when learning and understanding the new systems.
  • Effort towards improving abstractions, libraries, and tools gets diluted across the different systems.

The discipline to focus on simplicity provides high leverage. That lesson applies to a variety of scenarios:

  • It's fine to experiment with a new programming language for a prototype or a toy project, but think hard before using it in a new production system.
  • Proponents of new technologies promise that their systems solve your problems, so do your research before adopting them.
  • When tackling a new problem, consider whether re-purposing an existing abstraction or tool would be simpler than developing a custom solution.
  • If you're processing large amounts of data, consider whether the data is actually large enough such that you need a distributed cluster, or whether a single, beefy machine will suffice.

Build Systems to Fail Fast

Failing fast doesn't necessarily mean crashing your programs for users. You can take a hybrid approach: use fail-fast techniques to surface issues immediately and as close to the actual source of error as possible; and complement them with a global exception handler that reports the error to engineers while failing gracefully to the end user.
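The hybrid approach might look like the following sketch: configuration is validated at startup so that a bad value fails immediately at its source, and a single top-level handler logs unexpected errors for engineers while returning a generic message to the user. The `cache_size` setting, the function names, and the response format are assumptions for illustration.

```python
import logging

logger = logging.getLogger("app")


class ConfigError(ValueError):
    """Raised at startup when the configuration is unusable."""


def load_config(raw: dict) -> dict:
    """Fail fast: reject a bad configuration at startup, not on first use."""
    if "cache_size" not in raw:
        raise ConfigError("missing required setting: cache_size")
    size = raw["cache_size"]
    if not isinstance(size, int) or size <= 0:
        raise ConfigError(f"cache_size must be a positive integer, got {size!r}")
    return {"cache_size": size}


def handle_request(handler, request):
    """Global handler: report errors to engineers, fail gracefully for users."""
    try:
        return 200, handler(request)
    except Exception:
        # Logs the full traceback for engineers; the user sees no internals.
        logger.exception("unhandled error while serving %r", request)
        return 500, "Something went wrong. Please try again later."
```

The two pieces complement each other: the strict check keeps the error close to its cause, and the handler keeps that strictness from turning into a crash in front of users.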

Relentlessly Automate Mechanical Tasks

Engineers automate less frequently than they should, for a few reasons:

  • They don't have the time right now.
  • They suffer from the tragedy of the commons -> in which individuals act rationally according to their own self-interest but contrary to the group's best long-term interests.
  • They lack familiarity with automation tools.
  • They underestimate the future frequency of the task.
  • They don't internalize the time savings over a long time horizon.
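The last two reasons come down to arithmetic that is easy to skip. Here is a small sketch of that calculation. The function names and the example figures (a 10-minute task done 5 times a week, 8 hours to automate) are hypothetical.

```python
def automation_payoff_hours(minutes_per_run: float, runs_per_week: float,
                            weeks: float, hours_to_automate: float) -> float:
    """Net hours saved over the horizon; negative means not yet worth it."""
    manual_hours = minutes_per_run * runs_per_week * weeks / 60
    return manual_hours - hours_to_automate


def break_even_weeks(minutes_per_run: float, runs_per_week: float,
                     hours_to_automate: float) -> float:
    """Weeks until the automation has paid for itself."""
    return hours_to_automate * 60 / (minutes_per_run * runs_per_week)


if __name__ == "__main__":
    print(f"Break-even after {break_even_weeks(10, 5, 8):.1f} weeks")
    print(f"Net savings in a year: {automation_payoff_hours(10, 5, 52, 8):.1f} hours")
```

With these example figures the automation pays for itself in under ten weeks and saves about 35 hours over a year, before counting anyone else on the team who runs the same task.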

Activities where automation can help include:

  • Validating that a piece of code, an interaction, or a system behaves as expected
  • Extracting, transforming, and summarizing data
  • Detecting spikes in the error rate
  • Building and deploying software to new machines
  • Capturing and restoring database snapshots
  • Periodically running batch computations
  • Restarting a web service
  • Checking code to ensure it conforms to style guidelines
  • Training a machine learning model
  • Managing user accounts or user data
  • Adding or removing a server to or from a group of services

Make Batch Processes Idempotent

One technique to make batch processes (a sequence of actions without human intervention) easier to maintain and more resilient to failure is to make them idempotent. An idempotent process produces the same results regardless of whether it's run once or multiple times.

When full idempotence isn't possible, structuring a batch process so that it's at least retry-able or re-entrant can still help. A retry-able or re-entrant process can complete successfully when it is run again after a previously interrupted call.
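Both properties can be sketched in a few lines. The first function is idempotent because it overwrites a day's total and does not increment it. The second is re-entrant because it records each finished item in a checkpoint and skips those items on the next run. The data shapes and names are hypothetical, and a real job would persist the checkpoint somewhere durable.

```python
def apply_daily_total(store: dict, day: str, amounts: list) -> None:
    """Idempotent: running this twice leaves the same result as running once."""
    store[day] = sum(amounts)  # overwrite, never `+=`


def process_batch(items, handle, checkpoint: set) -> None:
    """Re-entrant: skip items that an earlier, interrupted run completed."""
    for item in items:
        if item in checkpoint:
            continue
        handle(item)
        # Record progress only after the item has been handled successfully.
        checkpoint.add(item)
```

If `process_batch` is interrupted partway through, calling it again with the same checkpoint resumes at the first unfinished item, so an operator or a scheduler can retry without first working out what had already been done.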

Hone Your Ability to Respond and Recover Quickly

Netflix built a system called Chaos Monkey that randomly kills services in its own infrastructure.

Other companies have also adopted strategies for simulating failures and disasters, preparing themselves for the unexpected:

  • Google runs annual, multi-day Disaster Recovery Testing (DiRT) events. They simulate disasters, like earthquakes or hurricanes, that cut the power for entire data centers and offices.
  • At Dropbox, the engineering team often simulates additional load for their production systems.

Ask “what if” questions and work through contingency plans for handling different situations:

  • What if a critical bug gets deployed as part of a release? How quickly can we roll it back or respond with a fix, and can we shorten that window?
  • What if a database server fails? How do we fail over to another machine and recover any lost data?
  • What if our servers get overloaded?
  • What if our testing or staging environments get corrupted?
  • What if a customer reports an urgent issue?

Other aspects of software engineering:

  • What if a manager or other stakeholder at an infrequent review meeting raises objections about the product plan?
  • What if a critical team member gets sick or injured, or leaves?
  • What if users revolt over a new and controversial feature?
  • What if a project slips past a promised deadline?

Key Takeaways

  • Do the simple thing first.
  • Fail fast to pinpoint the source of errors.
  • Automate mechanics over decision-making. Aggressively automate manual tasks to save yourself time.
  • Aim for idempotence and reentrancy. These properties make it easier for you to retry actions in the face of failure.
  • Plan and practice failure modes. Building confidence in your ability to recover lets you proceed more boldly.

Invest in Your Team's Growth

Sink or swim -> No life preserver was coming. I had better figure out how to stay afloat, fast.

Investing in a positive, smooth onboarding experience is extremely valuable.

Thinking early in your career about how to help your co-workers succeed instills the right habits that in turn will lead to your own success.

The secret to your own career success is to “focus primarily on making everyone around you succeed.”

Make Hiring Everyone's Responsibility

Interviewing took two hours of every day that month: talking with a candidate, writing up feedback, and debriefing on whether to make an offer. It was exhausting. But if those 40 hours resulted in even just one additional hire, the 2,000+ hours of output that the new hire would contribute per year would more than justify the cost.

A good interview process achieves two goals. First, it screens for the type of people likely to do well on the team. And second, it gets candidates excited about the team, the mission, and the culture.

As an interviewer, your goal is to optimize for questions with high signal-to-noise ratios—questions that reveal a large amount of useful information (signal) about the candidate per minute spent, with little irrelevant or useless data (noise).

An increasing number of companies have shifted toward interviews that include a hands-on programming component.

Here are a few higher-leverage strategies to keep in mind:

  • Take time with your team to identify which qualities in a potential teammate you care about the most: coding aptitude, mastery of programming languages, algorithms, data structures, product skills, debugging, communication skills, culture fit, or something else.
  • Periodically meet to discuss how effective the current recruiting and interview processes are at finding new hires who succeed on the team.
  • Design interview problems with multiple layers of difficulty that you can tailor to the candidate's ability by adding or removing variables and constraints.
  • Control the interview pace to maintain a high signal-to-noise ratio.
  • Scan for red flags by rapidly firing short-answer questions to probe a wide surface area.
  • Periodically shadow or pair with another team member during interviews.
  • Don't be afraid to use unconventional interview approaches if they help you identify the signals that your team cares about. Airbnb, for example, devotes at least two of its interviews to evaluating a candidate's culture fit.

Design a Good Onboarding Process

Training a new engineer for an hour or two a day during their first month generates much more organizational impact than spending those same hours working on the product.

How do you create a good onboarding process for your team?

  • First, identify the goals that your team wants to achieve.
  • Second, construct a set of mechanisms to accomplish these goals.

Goals that the process should achieve:

  1. Ramp up new engineers as quickly as possible.
  2. Impart the team's culture and values.
  3. Expose new engineers to the breadth of fundamentals needed to succeed.
  4. Socially integrate new engineers onto the team.

Four main pillars for Quora's onboarding program:

  1. Codelabs. -> A codelab is a document that explains why a core abstraction was designed and how it's used, walks through relevant parts of its code internals, and supplies programming exercises to validate understanding.
  2. Onboarding talks. -> These talks, given by senior engineers on the team, introduced the codebase and site architecture, explained and demoed our different development tools, covered engineering expectations and values around topics like unit testing, and introduced Quora's key focus areas, the things we believed were the most important for new hires to learn.
  3. Mentorship. -> Paired each new hire with a mentor to provide more personalized training during their first few months.
  4. Starter tasks. -> New engineers pushed commits to add themselves to the team page on their first day, and we aimed for each of them to complete a starter task (deploying a bug fix, a small new feature, or a new experiment) by the end of the first week.

Share Ownership of Code

To increase shared ownership, reduce the friction that other team members might encounter while browsing, understanding, and modifying code that you write or tools that you build. Here are some strategies:

  • Avoid one-person teams.
  • Review each other's code and software designs.
  • Rotate different types of tasks and responsibilities across the team.
  • Keep code readable and code quality high.
  • Present tech talks on software decisions and architecture.
  • Document your software, either through high-level design documents or in code-level comments.
  • Document the complex workflows or non-obvious workarounds necessary for you to get things done.
  • Invest time in teaching and mentoring other team members.

Build Collective Wisdom through Post-Mortems

After a site outage, a high-priority bug, or some other infrastructure issue, effective teams meet and conduct a detailed post-mortem.

The goal of the post-mortem is not to assign blame, which can be counterproductive to the discussion, but to work together to identify better solutions.

The mission debriefs are time-consuming but invaluable, and the cumulative lessons from 200+ space flights are captured in NASA's comprehensive tome, Flight Rules.

To build their own version of Flight Rules, companies like Amazon and Asana use methodologies like Toyota's “Five Whys” to understand the root cause of operational issues. -> For instance, when the site goes down, they might ask, “Why did the site crash?” Because some servers were overloaded. “Why were they overloaded?” Because a disproportionately high fraction of traffic was hitting a few servers. “Why wasn't traffic more randomly distributed?” Because the requests were all coming from the same customer, and their data is only hosted on those machines.

Build a Great Engineering Culture

Engineering culture consists of the set of values and habits shared by people on the team. The culture provides a shared context and a framework for making decisions, which helps teams and organizations adapt more quickly to problems they encounter.

Great engineering cultures tend to:

  1. Optimize for iteration speed.
  2. Push relentlessly towards automation.
  3. Build the right software abstractions.
  4. Focus on high code quality by using code reviews.
  5. Maintain a respectful work environment.
  6. Build shared ownership of code.
  7. Invest in automated testing.
  8. Allot experimentation time, either through 20% time or hackathons.
  9. Foster a culture of learning and continuous improvement.
  10. Hire the best.

Key Takeaways

  • Help the people around you be successful.
  • Make hiring a priority.
  • Invest in onboarding and mentoring.
  • Build shared ownership of code.
  • Debrief and document collective wisdom.
  • Create a great engineering culture.

Epilogue

Time is our most finite asset, and leverage—the value we produce per unit time—allows us to direct our time toward what matters most.

Appendix

10 Books Every Effective Engineer Should Read

Recommended Blogs To Follow