| 1 | +--- |
| 2 | +title: "Improve Deployment Infrastructure using 12 Factor: GSoC'23 Final Report 📝" |
| 3 | +date: 2023-08-27 |
| 4 | +author: Vaibhav Upreti |
| 5 | +type: post |
| 6 | +--- |
| 7 | + |
| 8 | +The goal of this blog is to showcase, in detail, the work that [Vaibhav Upreti](https://github.com/VaibhavUpreti) did on |
| 9 | +[CircuitVerse](https://circuitverse.org/) during Google Summer of Code 2023, which took place from May 29, 2023, to August 28, 2023. |
| 10 | + |
| 11 | +**CircuitVerse** is a cool open-source platform which allows users to construct digital logic circuits online. |
| 12 | + |
| 13 | +Table of Contents |
| 14 | +{{< toc >}} |
| 15 | + |
| 16 | +## Project Description 📁 |
| 17 | +___ |
| 18 | + |
| 19 | +The primary objective of my GSoC project was to upgrade CircuitVerse's deployment infrastructure to |
| 20 | +meet the 12-factor standards, paving the way for a more efficient, scalable, maintainable, and robust platform. |
| 21 | +The project involved several important tasks, each contributing to the overall enhancement of the platform. |
| 22 | + |
| 23 | +For a detailed description of the project, refer to the [project page](https://summerofcode.withgoogle.com/archive/2023/projects/r2R8ARJ9). |
| 24 | + |
| 25 | +## Accomplishments 📜 |
| 26 | +--- |
| 27 | + |
| 28 | +Here's a concise summary of my achievements: |
| 29 | + |
| 30 | +- For the first time, I made changes that directly impacted hundreds of thousands of users through large-scale data migrations. |
| 31 | +- My changes and optimisations resulted in a direct benefit to the organization by reducing infrastructure costs. |
| 32 | +- Successfully applied 12-factor principles, boosting scalability, reliability, and significantly reducing infrastructure costs. |
| 33 | +- Learnt a great deal from my mentor, a senior software engineer, about Ruby, Rails, software development practices and handling applications in production. |
| 34 | + |
| 35 | +### 1. Make CircuitVerse a 12 Factor Application ⚙️ |
| 36 | + |
| 37 | +I prioritized the implementation of 12 Factor principles throughout the development process. |
| 38 | + |
| 39 | +A key achievement was customizing CircuitVerse's Docker image for wider usability, reducing memory consumption (by using [jemalloc](https://jemalloc.net/)) and reducing Docker image build time. |
| 40 | + |
| 41 | +As suggested by my mentor, I also initialized the CircuitVerse runbooks, which provide comprehensive documentation for production deployment, including all necessary background information. |
| 42 | + |
| 43 | +- [CircuitVerse Runbooks](https://github.com/CircuitVerse/infra/tree/main/runbooks) |
| 44 | + |
| 45 | +### 2. Migrate Assets to AWS ☁️ S3 🪣 |
| 46 | + |
| 47 | +**Large-Scale Migration**: I led the migration of nearly a million assets, including user profile pictures and circuit images, from the old, deprecated configuration (CarrierWave, Paperclip) to |
| 48 | +ActiveStorage, the Rails solution for handling file uploads, backed by AWS S3. |
| 49 | +This transition not only improved storage efficiency but also set the stage for seamless expansion. |
| 50 | + |
| 51 | +**My approach**: Ensure zero downtime for users by mirroring uploads to both the new (ActiveStorage) and old (Paperclip, CarrierWave) configurations, followed by data migrations and background jobs |
| 52 | +to backfill data. |
| 53 | + |
| 54 | +Initially, we used the data migrations approach, maintaining a Redis counter to track progress and adding extra logging for insight. |
| 55 | +However, as server traffic grew, memory issues arose, so we moved to background jobs via Sidekiq. For this we used Shopify's [maintenance_tasks](https://github.com/Shopify/maintenance_tasks) gem, with a single job migrating 1000 records. |
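The batching idea above can be sketched in a language-neutral way. The snippet below is a minimal illustration in Python, not the Rails code used in CircuitVerse: `in_batches`, `run_backfill_job`, `already_migrated` and `copy_to_new_store` are hypothetical names, and the batch size of 1000 is taken from the paragraph above. Skipping records that already exist in the new store is what would make a job safe to re-run alongside mirrored uploads.

```python
from typing import Callable, Iterable, Iterator, List


def in_batches(ids: Iterable[int], size: int = 1000) -> Iterator[List[int]]:
    """Yield record ids in lists of at most `size` (one list per background job)."""
    batch: List[int] = []
    for record_id in ids:
        batch.append(record_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def run_backfill_job(
    batch: List[int],
    already_migrated: Callable[[int], bool],
    copy_to_new_store: Callable[[int], None],
) -> int:
    """Copy one batch to the new store and return how many records were copied."""
    copied = 0
    for record_id in batch:
        if already_migrated(record_id):
            continue  # mirrored uploads and earlier runs are left untouched
        copy_to_new_store(record_id)
        copied += 1
    return copied


# Usage: a set stands in for the new storage backend (hypothetical).
new_store = {1, 2, 3}  # records already mirrored by fresh uploads
total = sum(
    run_backfill_job(batch, new_store.__contains__, new_store.add)
    for batch in in_batches(range(1, 2501))
)
```

Keeping each job small bounds the memory held at any one time, which is the problem the move to background jobs was meant to address.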
| 56 | + |
| 57 | +**Scalability & Cost Reduction**: Migrating to object storage, specifically S3, reduced infrastructure costs compared to EBS and also ensured scalability, |
| 58 | +making it a preferred choice for storing large volumes of data and accommodating future growth. |
| 59 | + |
| 60 | + |
| 61 | + |
| 62 | +### 3. Improve Observability using OpenTelemetry 🔭 |
| 63 | + |
| 64 | +I configured distributed tracing with OpenTelemetry for CircuitVerse and exported the telemetry data to the [Jaeger](https://www.jaegertracing.io/) and [New Relic](https://newrelic.com/) backends. |
| 65 | +This tracing system provides invaluable insights into our platform's performance, enabling us to identify bottlenecks and enhance user experiences. |
| 66 | + |
| 67 | + |
| 68 | +OpenTelemetry's architecture and its use in our service: |
| 69 | + |
| 70 | + |
| 71 | + |
| 72 | +**Jaeger Dashboard** |
| 73 | + |
| 74 | + |
| 75 | + |
| 76 | +**New Relic Dashboard** |
| 77 | + |
| 78 | + |
| 79 | +**Inspecting a trace** |
| 80 | + |
| 81 | + |
| 82 | + |
| 83 | + |
| 84 | + |
| 85 | +- [OpenTelemetry Setup & Configuration docs](https://github.com/CircuitVerse/CircuitVerse/tree/master/.otel) |
| 86 | +- [OpenTelemetry runbook](https://github.com/CircuitVerse/infra/tree/main/runbooks/docs/opentelemetry) |
| 87 | + |
| 88 | + |
| 89 | +### 4. Zero Downtime Deployment Pipeline with GitHub Actions and Kamal 🛠️ |
| 90 | + |
| 91 | +I set up a Continuous Deployment pipeline that deploys CircuitVerse Docker images to production with zero downtime, using GitHub Actions and [Kamal](https://kamal-deploy.org/). |
| 92 | + |
| 93 | +Kamal uses the dynamic reverse proxy Traefik to hold requests while the new app container is started and the old one is stopped. It works seamlessly across multiple hosts, using SSHKit to |
| 94 | +execute commands. Originally built for Rails apps, Kamal will work with any type of web app that can be containerized with Docker. |
| 95 | + |
| 96 | + |
| 97 | +The workflow consists of two jobs: |
| 98 | + |
| 99 | +1. **`build-production`**: |
| 100 | +This job builds the Docker image and pushes it to the registry for linux/amd64 and linux/arm64 architectures. |
| 101 | +The build process is optimized using docker buildx caching, significantly reducing build times. |
| 102 | + |
| 103 | +2. **`deploy`**: |
| 104 | +After the build job completes, the deploy workflow requires a review by a repository committer. |
| 105 | +Once approved, it sets up Kamal and deploys the latest Docker image tagged with the GitHub SHA hash from the repository's current origin. |
| 106 | + |
| 107 | + |
| 108 | + |
| 109 | +As we can see in the image above, the deploy job has protection rules for the "production" environment in GitHub Actions. When a newer `deploy` job is enqueued, it cancels the previous workflow to ensure the latest image is deployed. |
| 110 | + |
| 111 | + |
| 112 | +In the deploy action, Kamal performs several key tasks: |
| 113 | +1. Pulls the image from the registry. |
| 114 | +2. Runs health checks on the servers at the `http://localhost:3999/up` route. |
| 115 | +3. If the health checks pass, Kamal proceeds to swap the existing container with the newer version. |
| 116 | +4. However, if a health check fails, Kamal acquires a lock on the deployment to prevent any conflicts or issues during the update process. |
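The health-check gate in the steps above can be sketched as follows. This is a simplified Python illustration of the control flow only, not Kamal's implementation: `http_probe`, `wait_until_healthy`, `deploy`, the retry count and the delay are all assumptions made for the example. Only the `/up` route on port 3999 comes from the list above.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True when the URL answers with HTTP 200, False on any error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_healthy(probe: Callable[[], bool], attempts: int = 5, delay: float = 1.0) -> bool:
    """Call `probe` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


def deploy(probe: Callable[[], bool], swap: Callable[[], None],
           attempts: int = 5, delay: float = 1.0) -> str:
    """Swap containers only after the new one reports healthy."""
    if wait_until_healthy(probe, attempts, delay):
        swap()
        return "swapped"
    return "aborted"  # the old container keeps serving traffic


# Usage against a real endpoint (assumes something is listening locally):
# deploy(lambda: http_probe("http://localhost:3999/up"), swap=lambda: None)
```

Because the swap happens only after a successful probe, a failing release never receives traffic, which is the property that makes the rollout zero-downtime.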
| 117 | + |
| 118 | +In addition, in the CircuitVerse CI workflows, we build Docker images for each pull request to the master branch, helping developers validate their code for production readiness. |
| 119 | + |
| 120 | +**Memory Optimisation**: Configured jemalloc in the Docker image, reducing memory fragmentation. |
| 121 | + |
| 122 | +Successful deployment of CircuitVerse to the staging environment: |
| 123 | + |
| 124 | + |
| 125 | + |
| 126 | +**Feedback** |
| 127 | + |
| 128 | + |
| 129 | +- [GitHub Action workflow file](https://github.com/CircuitVerse/CircuitVerse/blob/master/.github/workflows/deploy.yml) |
| 130 | +- [kamal runbooks](https://github.com/CircuitVerse/infra/tree/main/runbooks/docs/kamal) |
| 131 | + |
| 132 | + |
| 133 | +### 5. Monitoring Server with Monit 🔎 |
| 134 | + |
| 135 | +I introduced [Monit](https://mmonit.com/monit/), |
| 136 | +an open-source server monitoring tool that conducts automatic maintenance and repair and can execute meaningful tasks. |
| 137 | + |
| 138 | +I added Monit configuration for the following services: |
| 139 | +- Sidekiq |
| 140 | +- Procodile |
| 141 | +- Postgres |
| 142 | +- Redis |
| 143 | + |
| 144 | +Monit promptly restarts services and sends SMTP alerts when a service goes down or reaches its alert limit. |
| 145 | + |
| 146 | +**Monit Alerts** |
| 147 | + |
| 148 | + |
| 149 | +- [Monit configuration files](https://github.com/CircuitVerse/infra/tree/main/runbooks/docs/monit/conf-enabled) |
| 150 | +- [Monit Runbook](https://github.com/CircuitVerse/infra/tree/main/runbooks/docs/monit) |
| 151 | + |
| 152 | +### 6. Drop visitor tracking that stores user details and adopt HyperLogLog for project view counts 🗂️ |
| 153 | + |
| 154 | + |
| 155 | +HyperLogLog is a probabilistic data structure that estimates the cardinality of a set. As a probabilistic data structure, HyperLogLog trades perfect accuracy for efficient space utilization. |
| 156 | +Thus this algorithm can estimate the number of unique values within a very large dataset using little memory and time. |
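To make the space/accuracy trade-off concrete, here is a minimal HyperLogLog sketch in Python. It is an illustration of the general algorithm only, not code from CircuitVerse: the class name, the choice of SHA-256 as the hash, and the default of 2^12 registers are all assumptions made for the example. A production setup would more likely rely on an existing implementation such as Redis's `PFADD`/`PFCOUNT` commands.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog with 2**p registers (illustrative only)."""

    def __init__(self, p: int = 12) -> None:
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # Bias-correction constant for m >= 128 registers.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item: str) -> None:
        # 64-bit hash: the first p bits pick a register, the rest give the rank.
        h = int.from_bytes(hashlib.sha256(item.encode()).digest()[:8], "big")
        index = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def count(self) -> int:
        raw = self.alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        # Small-range correction: fall back to linear counting.
        if raw <= 2.5 * self.m and zeros:
            return round(self.m * math.log(self.m / zeros))
        return round(raw)


# Usage: count unique visitors of a project (identifiers are hypothetical).
views = HyperLogLog()
for visitor in ["alice", "bob", "alice", "carol"]:
    views.add(visitor)
estimate = views.count()  # an estimate of the number of distinct visitors
```

With 4096 registers the structure stays a fixed size no matter how many visitors are added, and repeated views from the same visitor never change the estimate, which is what makes it attractive for view counts without storing user details.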
| 157 | + |
| 158 | +**Transition Strategy**: I evaluated multiple HLL (HyperLogLog) libraries, prioritizing solutions aligned with ease of setup, precision, and strong community support. |
| 159 | + |
| 160 | + |
| 161 | +We had three options: |
| 162 | + |
| 163 | +1. Utilize the postgres-hll extension, incorporating a separate HLL field for projects. |
| 164 | +2. Implement [Redis HyperLogLog](https://redis.io/docs/data-types/probabilistic/hyperloglogs/) |
| 165 | +3. Store HLLs as text in the PostgreSQL database. |
| 166 | + |
| 167 | +Most of the HLL libraries I evaluated were outdated, hence the idea of storing HLLs as text in the database was temporarily shelved. |
| 168 | +Additionally, others had external dependencies that could complicate setup for new contributors. Using Redis HyperLogLog counters appeared viable (just like GitLab uses HLL counters) but would |
| 169 | +entail higher infrastructure costs. After discussions with my mentor, we decided to exclude this from the program's scope due to the need for further research and potential complexities. |
| 170 | + |
| 171 | +- [Postgres-hll extension approach](https://github.com/VaibhavUpreti/gsoc-work/pull/54) |
| 172 | +- [Text Based HyperLogLog](https://github.com/VaibhavUpreti/gsoc-work/pull/70) |
| 173 | + |
| 174 | + |
| 175 | +## Feedback 📈 |
| 176 | +___ |
| 177 | + |
| 178 | + |
| 179 | + |
| 180 | +## Pull Requests 🔄 & Blogs 📖 |
| 181 | +___ |
| 182 | + |
| 183 | + |
| 184 | +### [Repo - CircuitVerse/CircuitVerse](https://github.com/CircuitVerse/CircuitVerse) |
| 185 | +| Pull Request | Description | |
| 186 | +|--------------------------------------------|----------------------------------------------------------| |
| 187 | +| [fix: erb tags](https://github.com/CircuitVerse/CircuitVerse/pull/3756) | Fix for erb tags in the codebase | |
| 188 | +| [feat: mirror pfp & projects, backfill profile_pictures](https://github.com/CircuitVerse/CircuitVerse/pull/3786) | Added a feature to mirror pfp & projects while simultaneously backfilling profile_pictures | |
| 189 | +| [feat: migrate image_preview to AWS S3](https://github.com/CircuitVerse/CircuitVerse/pull/3813) | Migration of image_preview to AWS S3 storage | |
| 190 | +| [chore: update rails to 7.0.5.1](https://github.com/CircuitVerse/CircuitVerse/pull/3842) | Updated Rails version to 7.0.5.1 | |
| 191 | +| [fix: use env[] instead of fetch](https://github.com/CircuitVerse/CircuitVerse/pull/3856) | Code fix to use `env[]` instead of `.fetch` | |
| 192 | +| [feat: make member since field more readable](https://github.com/CircuitVerse/CircuitVerse/pull/3938) | Added a feature to make the 'member since' field more readable | |
| 193 | +| [feat: distributed tracing using OpenTelemetry](https://github.com/CircuitVerse/CircuitVerse/pull/3947) | Implemented distributed tracing using OpenTelemetry | |
| 194 | +| [feat: continuous deployment workflow using GitHub Actions and Kamal](https://github.com/CircuitVerse/CircuitVerse/pull/3952) | Added a continuous deployment workflow using GitHub Actions and Kamal | |
| 195 | +| [feat: serve profile_pictures with ActiveStorage](https://github.com/CircuitVerse/CircuitVerse/pull/3960) | Implemented serving profile pictures with ActiveStorage | |
| 196 | +| [chore: disable generating spans for default settings](https://github.com/CircuitVerse/CircuitVerse/pull/3961) | Disabled generating spans for default settings | |
| 197 | +| [fix: commentator profile_picture error](https://github.com/CircuitVerse/CircuitVerse/pull/3966) | Fixed commentator profile_picture error | |
| 198 | +| [chore: rerun image preview migration](https://github.com/CircuitVerse/CircuitVerse/pull/3972) | Reran the image preview migration | |
| 199 | +| [feat: migrate image_preview using Sidekiq](https://github.com/CircuitVerse/CircuitVerse/pull/3984) | Migrated image_preview using Sidekiq | |
| 200 | +| [chore: make maintenance tasks migrations safe](https://github.com/CircuitVerse/CircuitVerse/pull/3993) | Made maintenance tasks migrations safe | |
| 201 | +| [chore: mark maintenance tasks migrations safe](https://github.com/CircuitVerse/CircuitVerse/pull/4000) | Marked maintenance tasks migrations as safe | |
| 202 | +| [feat: deploy CircuitVerse to staging using Kamal](https://github.com/CircuitVerse/CircuitVerse/pull/4001) | Deployed CircuitVerse to staging using Kamal | |
| 203 | +| [feat: Serve assets using active storage](https://github.com/CircuitVerse/CircuitVerse/pull/3860) | Serve Image Preview using ActiveStorage | |
| 204 | +| [feat: production deployment using kamal](https://github.com/CircuitVerse/CircuitVerse/pull/3994) | Deploy CircuitVerse to production using kamal | |
| 205 | + |
| 206 | + |
| 207 | +### [Repo - CircuitVerse/infra](https://github.com/CircuitVerse/infra) |
| 208 | + |
| 209 | +| Pull Request | Description | |
| 210 | +|--------------------------------------------|------------------------------------------------------------| |
| 211 | +| [feat: monit config files #1](https://github.com/CircuitVerse/infra/pull/1) | Added Monit configuration files | |
| 212 | +| [feat: Intialise runbook #3](https://github.com/CircuitVerse/infra/pull/3) | Initialized CircuitVerse runbooks | |
| 213 | +| [docs: distributed tracing using OpenTelemetry #5](https://github.com/CircuitVerse/infra/pull/5) | Documented distributed tracing using OpenTelemetry | |
| 214 | +| [docs: Kamal documentation #6](https://github.com/CircuitVerse/infra/pull/6) | Added Kamal documentation | |
| 215 | + |
| 216 | +### Blog Posts |
| 217 | + |
| 218 | +I published weekly blog posts throughout this period, which you can read at https://vaibhavupreti.github.io/hugo-blog/tags/gsoc |
| 219 | + |
| 220 | +**Featured posts:** |
| 221 | + |
| 222 | +- [My Journey to GSoC 2023 with CircuitVerse.Org: How I Prepared and Applied for the Program](https://vaibhavupreti.github.io/hugo-blog/blog/my-journey-to-gsoc-2023-with-circuitverse/) |
| 223 | +- [The Ultimate Guide to Distributed Tracing: Monitor your Rails app using Opentelemetry, Jaeger and New Relic Agent](https://vaibhavupreti.github.io/hugo-blog/blog/distributed-tracing-opentelemetry/) |
| 224 | + |
| 225 | +- [Community Bonding Period at CircuitVerse.org](https://vaibhavupreti.github.io/hugo-blog/blog/community-bonding-period-gsoc/) |
| 226 | + |
| 227 | +## What's Next 📅 |
| 228 | +--- |
| 229 | +I’m excited to continue as a Core Team member, maintaining this incredible open-source project. |
| 230 | + |
| 231 | +Additionally, we plan to adopt a blue-green deployment approach and roll out the CD pipeline after rigorous testing in the staging environment. |
| 232 | +- `Blue` - older server |
| 233 | +- `Green` - current staging environment |
| 234 | + |
| 235 | +This involves copying the latest production data (the latest `pg_dump` and Redis data) to staging. |
| 236 | +Production traffic will continue on 'blue' until we replicate and scale 'green' to match or exceed its capacity. |
| 237 | +Once performance and stability are confirmed, we'll transition production traffic to 'green' (the staging server) and phase out the older 'blue' instance, ensuring a risk-minimized transition. |
| 238 | + |
| 239 | + |
| 240 | +## Acknowledgments 📝 |
| 241 | +___ |
| 242 | + |
| 243 | +I'm grateful to my mentor [Aboobacker M.K](https://github.com/tachyons), who helped me whenever I faced challenges and never |
| 244 | +overlooked any part of their mentoring. They taught me a lot about Ruby, Rails, and software development in general. |
| 245 | +The weekly meetings were exceptionally informative, and I cannot overstate how much I learned through my interactions |
| 246 | +with my mentor. I doubt I will ever encounter a similar experience. Their dedication motivates me to aspire to become |
| 247 | +a software engineer like them and to share my learnings with others. |
| 248 | + |