|
| 1 | +# Performance Testing |
| 2 | + |
| 3 | +Why does performance matter? |
| 4 | +1. Speed = Trust! |
| 5 | +2. If the site is performant (especially on routes that get a lot of traffic), there are fewer bottlenecks where a thundering herd can take down the site. (App servers go down first, then the DB server.) |
| 6 | + |
| 7 | +# Computer Science Refresher |
| 8 | + |
| 9 | +As a rough rule of thumb, CPU L1/L2 cache is 10x faster than |
| 10 | +memory, which is 10x faster than |
| 11 | +disk, which is 10x faster than |
| 12 | +network, which is 10x faster than |
| 13 | +a human doing something. |
| 14 | + |
| 15 | +# App Tuning |
| 16 | + |
| 17 | +## Preventative Measures |
| 18 | + |
| 19 | +1. Use `ratelimit` to limit the number of times a specific user can hit a route. |
| 20 | +2. Scale app servers horizontally (this is harder to do with DB servers). |
| 21 | +3. Use `.cache()` on a query that will take a lot of DB cycles to execute. |
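To make the first measure concrete, here is a minimal, stdlib-only sketch of per-user rate limiting with a fixed time window. It is not the API of the `ratelimit` package mentioned above; the class name `FixedWindowRateLimiter`, its parameters, and the injectable `clock` are all hypothetical names chosen for illustration.

```python
import time


class FixedWindowRateLimiter:
    """Allow at most `limit` hits per user within each `window_seconds` window.

    Illustrative only: state lives in process memory, so it is not shared
    across multiple app servers.
    """

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so tests do not need to sleep
        self._windows = {}  # user_id -> (window_start, hit_count)

    def allow(self, user_id):
        now = self.clock()
        start, count = self._windows.get(user_id, (now, 0))
        if now - start >= self.window_seconds:
            # The previous window has expired: start a fresh one.
            start, count = now, 0
        if count >= self.limit:
            return False  # over the limit for this window
        self._windows[user_id] = (start, count + 1)
        return True
```

A route handler would call `allow(user_id)` first and return an HTTP 429 when it comes back `False`. With several app servers behind a load balancer, the counters would need to live in a shared store rather than in process memory.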
| 22 | + |
| 23 | +## Proactive Measures |
| 24 | + |
| 25 | +1. Limit the number of DB queries per read |
| 26 | +2. Limit the number of in-memory operations needed to serve a request (especially if they are O(n^2) or worse) |
| 27 | +3. Leverage DB indices as much as possible |
| 28 | +4. Cache data on write (instead of generating it on read) |
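As an illustration of the second measure, the sketch below replaces an O(n^2) nested loop with a single dictionary lookup pass. The data shapes (`posts` with an `author_id`, `users` with unique `id` values) and both function names are assumptions made up for this example, not code from the app.

```python
def attach_authors_quadratic(posts, users):
    """O(len(posts) * len(users)): rescans the user list for every post."""
    result = []
    for post in posts:
        for user in users:
            if user["id"] == post["author_id"]:
                result.append({**post, "author": user["name"]})
                break
    return result


def attach_authors_linear(posts, users):
    """O(len(posts) + len(users)): build the lookup once, then use dict access."""
    names_by_id = {user["id"]: user["name"] for user in users}
    return [
        {**post, "author": names_by_id[post["author_id"]]}
        for post in posts
        if post["author_id"] in names_by_id  # skip posts with no matching user
    ]
```

Both functions return the same result when user ids are unique; the second one does far less work as the lists grow.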
| 29 | + |
| 30 | + |
| 31 | +# Handy Links |
| 32 | + |
| 33 | +- Datadog slow traces - https://app.datadoghq.com/apm/traces?end=1618411302590&paused=false&query=env%3Aprod%20service%3Adjango%20%40duration%3A%3E%3D2s&start=1618407702590&streamTraces=true&topLevelSpansOnly=true |
| 34 | +- Datadog app servers - https://app.datadoghq.com/dashboard/hqu-92q-auf/app-servers?from_ts=1611496890412&is_auto=false&live=true&page=0&to_ts=1611583290412 |
| 35 | +- Datadog app server logs - https://app.datadoghq.com/logs?index=%2A&query= |
| 36 | + |
| 37 | +# Tools |
| 38 | +- Use `ngxtop` on a server to see which requests it is serving (especially useful if the app server is overloaded) |
| 39 | +- Use `select * from pg_stat_activity` on the Postgres server to see the queries running on it (useful if the DB is overloaded or locked) |
| 40 | +- Python snippet you can use to print a timestamp (rounded to hundredths of a second) at the point a call is made: `import time; print(round(time.time(), 2))`. Profiling code is a great starting place! DO NOT STAB IN THE DARK LOOKING FOR A PERF ISSUE, PROFILE FIRST! |
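For slightly more structured timing than printing timestamps by hand, a small context manager around `time.perf_counter()` can report how long a block took. This is a minimal sketch: the helper name `timed` and its `sink` parameter are hypothetical, and the summed range is only a stand-in for the code being measured.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, sink=print):
    """Report the wall-clock duration of the enclosed block in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, so slow failures are still measured.
        elapsed_ms = (time.perf_counter() - start) * 1000
        sink(f"{label}: {elapsed_ms:.3f} ms")


# Stand-in workload; wrap the suspect call or query here instead.
with timed("sum of squares"):
    total = sum(i * i for i in range(100_000))
```

Wrapping a few suspect sections this way narrows down where the time goes; for a full per-function breakdown, the standard library's `cProfile` module is the next step.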
| 41 | + |
| 42 | +# How to spin up a new server |
| 43 | + |
| 44 | +1. In the AWS console, take a snapshot of a current app server. |
| 45 | +2. Create a new server from the snapshot and name it app(N+1), where N is the highest-numbered server we have. |
| 46 | +3. SSH into it and make sure gunicorn + nginx are running on it. Update `/etc/datadog-agent/datadog.yaml` to reflect the new server name, then run `service datadog-agent restart`. |
| 47 | +4. If gunicorn is working, add the server to the [load balancer](https://us-west-2.console.aws.amazon.com/ec2/v2/home?region=us-west-2#TargetGroups:). |
| 48 | +5. Sit back, and watch the traffic flow in. |
| 49 | +6. If you like your teammates, update the g-deploy repo so they know about the server. |
| 50 | + |
| 51 | + |