Commit bbafd9f

feat: improve deployment infrastructure final report (#178)
1 parent d8b76f7 commit bbafd9f

10 files changed

+289
-0
lines changed

README.md

Lines changed: 15 additions & 0 deletions
@@ -25,3 +25,18 @@ hugo new posts/my_new_post.md
2525
# Run Instructions
2626
Start Server: `hugo server -D`
2727

28+
## ShortCodes
29+
30+
Some common Hugo ShortCodes that might come in handy while writing a blog post:
31+
1. Table of Contents
32+
33+
```html
34+
<!-- Use anywhere in markdown -->
35+
{{<toc>}}
36+
```
37+
38+
2. Embed a Video
39+
40+
```html
41+
{{< video src="/path/to/video.mp4" type="video/mp4" preload="auto" >}}
42+
```

assets/scss/_mixins.scss

Lines changed: 23 additions & 0 deletions
@@ -126,3 +126,26 @@
126126
max-width: 100%;
127127
height: auto;
128128
}
129+
130+
// toc
131+
.toc {
132+
background-color: #f5f5f5;
133+
border: 1px solid #ccc;
134+
padding: 10px;
135+
margin-top: 10px;
136+
margin-bottom: 20px;
137+
}
138+
139+
body .toc a {
140+
display: block;
141+
text-decoration: none;
142+
color: #000 !important; /* Dark black text color with !important */
143+
font-weight: bold !important; /* Make the text bold with !important */
144+
cursor: pointer;
145+
margin-left: 10px; /* Adjust the margin to control the indentation */
146+
margin-bottom: 5px; /* Reduce space between headings */
147+
}
148+
149+
body .toc a:hover {
150+
color: #4CAF50 !important; /* Light green text color on hover with !important */
151+
}
Lines changed: 248 additions & 0 deletions
@@ -0,0 +1,248 @@
1+
---
2+
title: "Improve Deployment Infrastructure using 12 Factor: GSoC'23 Final Report 📝"
3+
date: 2023-08-27
4+
author: Vaibhav Upreti
5+
type: post
6+
---
7+
8+
The goal of this blog is to showcase, in detail, the work that [Vaibhav Upreti](https://github.com/VaibhavUpreti) did on
9+
[CircuitVerse](https://circuitverse.org/) during Google Summer of Code 2023, which took place from May 29, 2023, to August 28, 2023.
10+
11+
**CircuitVerse** is a cool open-source platform which allows users to construct digital logic circuits online.
12+
13+
Table of Contents
14+
{{< toc >}}
15+
16+
## Project Description 📁
17+
___
18+
19+
The primary objective of my GSoC project was to upgrade CircuitVerse's deployment infrastructure to
20+
meet the 12-factor standards, paving the way for a more efficient, scalable, maintainable, and robust platform.
21+
The project involved several important tasks, each contributing to the overall enhancement of the platform.
22+
23+
For a detailed description of the project, refer to the [project page](https://summerofcode.withgoogle.com/archive/2023/projects/r2R8ARJ9).
24+
25+
## Accomplishments 📜
26+
---
27+
28+
Here's a concise summary of my achievements:
29+
30+
- For the first time, I made changes that directly affected hundreds of thousands of users through large-scale data migrations.
31+
- My changes and optimisations resulted in a direct benefit to the organization by reducing infrastructure costs.
32+
- Successfully applied 12-factor principles, boosting scalability, reliability, and significantly reducing infrastructure costs.
33+
- Learnt a great deal from my mentor, a senior software engineer, about Ruby, Rails, software development practices and handling applications in production.
34+
35+
### 1. Make CircuitVerse a 12 Factor Application ⚙️
36+
37+
I prioritized the implementation of 12 Factor principles throughout the development process.
38+
39+
One notable achievement was customizing CircuitVerse's Docker image for wider usability, reducing memory consumption (by using [jemalloc](https://jemalloc.net/)), and shortening the Docker image build time.
40+
41+
As suggested by my mentor, I also initialized the CircuitVerse runbooks, which provide comprehensive documentation for production deployment, including all necessary background information.
42+
43+
- [CircuitVerse Runbooks](https://github.com/CircuitVerse/infra/tree/main/runbooks)
44+
45+
### 2. Migrate Assets to AWS ☁️ S3 🪣
46+
47+
**Large-Scale Migration**: I led the migration of nearly a million assets, including user profile pictures and circuit images, from the old, deprecated configuration (CarrierWave, Paperclip) to
48+
ActiveStorage, the Rails solution for handling file uploads, on AWS S3.
49+
This transition not only improved storage efficiency but also set the stage for seamless expansion.
50+
51+
**My approach**: Ensure zero downtime for users by mirroring uploads to both the new (ActiveStorage) and old (Paperclip, CarrierWave) configurations, followed by data migrations and background jobs
52+
to backfill data.
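
The mirroring idea can be sketched in plain Ruby. This is only an illustration under stated assumptions: the `MirroredUploader` class and its two in-memory hashes are hypothetical stand-ins for the old (Paperclip, CarrierWave) and new (ActiveStorage) configurations, not the code used in CircuitVerse.

```ruby
# Hypothetical sketch: two hashes stand in for the legacy and new storage backends.
class MirroredUploader
  attr_reader :legacy_store, :new_store

  def initialize
    @legacy_store = {} # stands in for Paperclip / CarrierWave
    @new_store = {}    # stands in for ActiveStorage on S3
  end

  # Every new upload is written to both backends, so either one can serve it.
  def upload(key, bytes)
    @legacy_store[key] = bytes
    @new_store[key] = bytes
  end

  # Reads prefer the new backend and fall back to the legacy one for
  # records that have not been backfilled yet.
  def read(key)
    @new_store.fetch(key) { @legacy_store[key] }
  end
end
```

Because reads fall back to the old backend, users keep seeing their files while older records are still being copied over.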
53+
54+
Initially, we employed the data_migrations approach, maintaining a Redis counter for tracking progress and enhancing logging for insights.
55+
However, with growing server traffic, memory issues arose, leading us to transition to background jobs via Sidekiq. For this we utilized Shopify's [maintenance_tasks](https://github.com/Shopify/maintenance_tasks) gem, employing a single job to migrate 1000 records.
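
A batched backfill of this kind might look roughly like the sketch below. It assumes each job handles one batch of 1000 records; `migrate_batch` and `backfill` are hypothetical helper names, plain hashes stand in for the two storage backends, and an in-process counter replaces the Redis counter. It does not use the maintenance_tasks API.

```ruby
BATCH_SIZE = 1000 # batch size mentioned in the post

# Copies one batch of legacy records into the new store and returns how many
# were copied. Stands in for a single background job.
def migrate_batch(legacy, target, keys)
  keys.count do |key|
    next false if target.key?(key) # already mirrored or migrated, so reruns are safe

    target[key] = legacy[key]
    true
  end
end

# Walks all legacy keys in batches and tracks progress with a simple counter.
def backfill(legacy, target, batch_size: BATCH_SIZE)
  progress = { jobs: 0, migrated: 0 }
  legacy.keys.each_slice(batch_size) do |keys|
    progress[:migrated] += migrate_batch(legacy, target, keys)
    progress[:jobs] += 1
  end
  progress
end
```

Keeping each job small bounds the memory used at any one time, and skipping keys that already exist makes the backfill safe to rerun.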
56+
57+
**Scalability & Cost Reduction**: Migrating to object storage, specifically S3, not only reduced infrastructure costs compared to EBS but also ensured scalability,
58+
making it a preferred choice for storing large volumes of data and accommodating future growth.
59+
60+
61+
62+
### 3. Improve Observability using OpenTelemetry 🔭
63+
64+
I configured distributed tracing with OpenTelemetry for CircuitVerse and exported the telemetry data to the [Jaeger](https://www.jaegertracing.io/) and [New Relic](https://newrelic.com/) backends.
65+
This tracing system provides invaluable insights into our platform's performance, enabling us to identify bottlenecks and enhance user experiences.
66+
67+
68+
OpenTelemetry's architecture and how it is used in our service:
69+
![Otel-arch](/images/vaibhav-upreti/otel-arch.png)
70+
71+
72+
**Jaeger Dashboard**
73+
74+
![jaeger-dashboard](/images/vaibhav-upreti/jaeger-dashboard.png)
75+
76+
**New Relic Dashboard**
77+
![new-relic-dashboard](/images/vaibhav-upreti/new-relic-otel-dashboard.png)
78+
79+
**Inspecting a trace**
80+
81+
![jaeger-trace-inspect-deep](/images/vaibhav-upreti/jaeger-trace-inspect-deep.png)
82+
![jaeger-trace-inspect](/images/vaibhav-upreti/jaeger-trace-inspect.png)
83+
84+
85+
- [OpenTelemetry Setup & Configuration docs](https://github.com/CircuitVerse/CircuitVerse/tree/master/.otel)
86+
- [OpenTelemetry runbook](https://github.com/CircuitVerse/infra/tree/main/runbooks/docs/opentelemetry)
87+
88+
89+
### 4. Zero Downtime Deployment Pipeline with GitHub Actions and Kamal 🛠️
90+
91+
Successfully set up a Continuous Deployment pipeline that deploys CircuitVerse Docker images to production using GitHub Actions and [Kamal](https://kamal-deploy.org/) with zero downtime.
92+
93+
Kamal uses the dynamic reverse-proxy Traefik to hold requests, while the new app container is started and the old one is stopped — working seamlessly across multiple hosts, using SSHKit to
94+
execute commands. Originally built for Rails apps, Kamal will work with any type of web app that can be containerized with Docker.
95+
96+
97+
The workflow consists of two jobs:
98+
99+
1. **`build-production`**:
100+
This job builds the Docker image and pushes it to the registry for linux/amd64 and linux/arm64 architectures.
101+
The build process is optimized using docker buildx caching, significantly reducing build times.
102+
103+
2. **`deploy`**:
104+
After the build job completes, the deploy workflow requires a review by a repository committer.
105+
Once approved, it sets up Kamal and deploys the latest Docker image tagged with the GitHub SHA hash from the repository's current origin.
106+
107+
![kamal-job](/images/vaibhav-upreti/kamal-job.png)
108+
109+
As we can see in the image above, the deploy job has protection rules for the "production" environment in GitHub Actions. When a newer `deploy` job is enqueued, it cancels the previous workflow to ensure the latest image is deployed.
110+
111+
112+
In the deploy action, Kamal performs several key tasks:
113+
1. Pulls the image from the registry.
114+
2. Runs health checks on the servers at the `http://localhost:3999/up` route.
115+
3. If the health checks pass, Kamal proceeds to swap the existing container with the newer version.
116+
4. However, if the health check fails, Kamal acquires a lock on the deployment to prevent any conflicts or issues during the update process.
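
The decision flow in the list above can be modelled in a few lines of Ruby. This is a hypothetical sketch of the behaviour as described in this post, not Kamal's implementation; `DeployResult`, `deploy`, and the version strings are illustrative names only.

```ruby
# Hypothetical model of the deploy decision; this is not Kamal's code.
DeployResult = Struct.new(:running_version, :locked)

# `health_check` is any callable that returns true when the new container
# answers its health route (for example, a successful GET on /up).
def deploy(current_version, new_version, health_check)
  if health_check.call(new_version)
    DeployResult.new(new_version, false)    # swap to the new container
  else
    DeployResult.new(current_version, true) # keep the old container, hold the lock
  end
end
```

In this model a failed health check never replaces the running container, which is what keeps the deployment free of downtime.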
117+
118+
In addition, the CircuitVerse CI workflows build a Docker image for each pull request to the master branch, helping developers validate their code for production readiness.
119+
120+
**Memory Optimisation**: Configured jemalloc for the Docker image, reducing memory fragmentation.
121+
122+
CircuitVerse was successfully deployed to the staging environment.
123+
124+
![cv-staging](/images/vaibhav-upreti/cv-staging.png)
125+
126+
**Feedback**
127+
![staging-feedback](/images/vaibhav-upreti/staging-feedback.png)
128+
129+
- [GitHub Action workflow file](https://github.com/CircuitVerse/CircuitVerse/blob/master/.github/workflows/deploy.yml)
130+
- [kamal runbooks](https://github.com/CircuitVerse/infra/tree/main/runbooks/docs/kamal)
131+
132+
133+
### 5. Monitoring Server with Monit 🔎
134+
135+
I introduced [Monit](https://mmonit.com/monit/) for server monitoring.
136+
Monit is an open-source server monitoring tool that conducts automatic maintenance and repair and can execute meaningful tasks.
137+
138+
I added Monit configuration for the following services:
139+
- Sidekiq
140+
- Procodile
141+
- Postgres
142+
- Redis
143+
144+
Monit promptly restarts services and sends SMTP alerts when a service goes down or reaches its alert limit.
145+
146+
**Monit Alerts**
147+
![monit-alerts](/images/vaibhav-upreti/monit-alerts.png)
148+
149+
- [Monit configuration files](https://github.com/CircuitVerse/infra/tree/main/runbooks/docs/monit/conf-enabled)
150+
- [Monit Runbook](https://github.com/CircuitVerse/infra/tree/main/runbooks/docs/monit)
151+
152+
### 6. Drop visitor tracking that stores user details and adopt HyperLogLog for project view counts 🗂️
153+
154+
155+
HyperLogLog is a probabilistic data structure that estimates the cardinality of a set. As a probabilistic data structure, HyperLogLog trades perfect accuracy for efficient space utilization.
156+
Thus this algorithm can estimate the number of unique values within a very large dataset using little memory and time.
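
To make the trade-off concrete, here is a minimal HyperLogLog sketch in plain Ruby. It is a teaching example only: the `TinyHyperLogLog` class, the SHA-1 hash, and the precision of 12 bits are assumptions made for illustration, and it is not the implementation used by Redis or the postgres-hll extension.

```ruby
require "digest"

# Minimal HyperLogLog sketch (illustrative only).
class TinyHyperLogLog
  def initialize(precision = 12)
    @precision = precision               # number of hash bits used as register index
    @size = 1 << precision               # number of registers (4096 for precision 12)
    @registers = Array.new(@size, 0)
    @alpha = 0.7213 / (1 + 1.079 / @size) # bias-correction constant, valid for 128+ registers
  end

  def add(value)
    hash = Digest::SHA1.digest(value.to_s).unpack1("Q>") # first 64 bits of the digest
    width = 64 - @precision
    index = hash >> width                    # top bits choose the register
    rest = hash & ((1 << width) - 1)         # remaining bits
    rank = width - rest.bit_length + 1       # position of the leftmost 1-bit
    @registers[index] = rank if rank > @registers[index]
  end

  def count
    sum = @registers.sum { |r| 2.0**(-r) }
    estimate = @alpha * @size * @size / sum
    zeros = @registers.count(0)
    # Small-range correction: fall back to linear counting while registers are sparse.
    estimate = @size * Math.log(@size.to_f / zeros) if estimate <= 2.5 * @size && zeros > 0
    estimate.round
  end
end
```

The registers take a fixed amount of space no matter how many values are added, and adding the same visitor twice never changes the estimate, which is why the structure suits unique view counts.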
157+
158+
**Transition Strategy**: I evaluated multiple HLL (HyperLogLog) libraries, prioritizing solutions aligned with ease of setup, precision, and strong community support.
159+
160+
161+
We had three options:
162+
163+
1. Utilize the postgres-hll extension, incorporating a separate HLL field for projects.
164+
2. Implement [Redis HyperLogLog](https://redis.io/docs/data-types/probabilistic/hyperloglogs/)
165+
3. Store HLLs as text in the PostgreSQL database.
166+
167+
Most of the libraries that evaluate HLLs were outdated, so the idea of storing HLLs as text in the database was temporarily shelved.
168+
Additionally, others had external dependencies that could complicate setup for new contributors. Using Redis HyperLogLog counters appeared viable (just as GitLab uses HLL counters) but would
169+
entail higher infrastructure costs. After discussions with my mentor, we decided to exclude this from the program's scope due to the need for further research and potential complexities.
170+
171+
- [Postgres-hll extension approach](https://github.com/VaibhavUpreti/gsoc-work/pull/54)
172+
- [Text Based HyperLogLog](https://github.com/VaibhavUpreti/gsoc-work/pull/70)
173+
174+
175+
## Feedback 📈
176+
___
177+
178+
![vu-midterm-feedback](/images/vaibhav-upreti/vu-midterm-feedback.png)
179+
180+
## Pull Requests 🔄 & Blogs 📖
181+
___
182+
183+
184+
### [Repo - CircuitVerse/CircuitVerse](https://github.com/CircuitVerse/CircuitVerse)
185+
| Pull Request | Description |
186+
|--------------------------------------------|----------------------------------------------------------|
187+
| [fix: erb tags](https://github.com/CircuitVerse/CircuitVerse/pull/3756) | Fix for erb tags in the codebase |
188+
| [feat: mirror pfp & projects, backfill profile_pictures](https://github.com/CircuitVerse/CircuitVerse/pull/3786) | Added a feature to mirror pfp & projects while simultaneously backfilling profile_pictures |
189+
| [feat: migrate image_preview to AWS S3](https://github.com/CircuitVerse/CircuitVerse/pull/3813) | Migration of image_preview to AWS S3 storage |
190+
| [chore: update rails to 7.0.5.1](https://github.com/CircuitVerse/CircuitVerse/pull/3842) | Updated Rails version to 7.0.5.1 |
191+
| [fix: use env[] instead of fetch](https://github.com/CircuitVerse/CircuitVerse/pull/3856) | Code fix to use `env[]` instead of `.fetch` |
192+
| [feat: make member since field more readable](https://github.com/CircuitVerse/CircuitVerse/pull/3938) | Added a feature to make the 'member since' field more readable |
193+
| [feat: distributed tracing using OpenTelemetry](https://github.com/CircuitVerse/CircuitVerse/pull/3947) | Implemented distributed tracing using OpenTelemetry |
194+
| [feat: continuous deployment workflow using GitHub Actions and Kamal](https://github.com/CircuitVerse/CircuitVerse/pull/3952) | Added a continuous deployment workflow using GitHub Actions and Kamal |
195+
| [feat: serve profile_pictures with ActiveStorage](https://github.com/CircuitVerse/CircuitVerse/pull/3960) | Implemented serving profile pictures with ActiveStorage |
196+
| [chore: disable generating spans for default settings](https://github.com/CircuitVerse/CircuitVerse/pull/3961) | Disabled generating spans for default settings |
197+
| [fix: commentator profile_picture error](https://github.com/CircuitVerse/CircuitVerse/pull/3966) | Fixed commentator profile_picture error |
198+
| [chore: rerun image preview migration](https://github.com/CircuitVerse/CircuitVerse/pull/3972) | Reran the image preview migration |
199+
| [feat: migrate image_preview using Sidekiq](https://github.com/CircuitVerse/CircuitVerse/pull/3984) | Migrated image_preview using Sidekiq |
200+
| [chore: make maintenance tasks migrations safe](https://github.com/CircuitVerse/CircuitVerse/pull/3993) | Made maintenance tasks migrations safe |
201+
| [chore: mark maintenance tasks migrations safe](https://github.com/CircuitVerse/CircuitVerse/pull/4000) | Marked maintenance tasks migrations as safe |
202+
| [feat: deploy CircuitVerse to staging using Kamal](https://github.com/CircuitVerse/CircuitVerse/pull/4001) | Deployed CircuitVerse to staging using Kamal |
203+
| [feat: Serve assets using active storage](https://github.com/CircuitVerse/CircuitVerse/pull/3860) | Serve Image Preview using ActiveStorage |
204+
| [feat: production deployment using kamal](https://github.com/CircuitVerse/CircuitVerse/pull/3994) | Deploy CircuitVerse to production using kamal |
205+
206+
207+
### [Repo - CircuitVerse/infra](https://github.com/CircuitVerse/infra)
208+
209+
| Pull Request | Description |
210+
|--------------------------------------------|------------------------------------------------------------|
211+
| [feat: monit config files #1](https://github.com/CircuitVerse/infra/pull/1) | Added Monit configuration files |
212+
| [feat: Intialise runbook #3](https://github.com/CircuitVerse/infra/pull/3) | Initialized CircuitVerse runbooks |
213+
| [docs: distributed tracing using OpenTelemetry #5](https://github.com/CircuitVerse/infra/pull/5) | Documented distributed tracing using OpenTelemetry |
214+
| [docs: Kamal documentation #6](https://github.com/CircuitVerse/infra/pull/6) | Added Kamal documentation |
215+
216+
### Blog Posts
217+
218+
I published weekly blog posts throughout this period, which you can read at https://vaibhavupreti.github.io/hugo-blog/tags/gsoc
219+
220+
**Featured posts:**
221+
222+
- [My Journey to GSoC 2023 with CircuitVerse.Org: How I Prepared and Applied for the Program](https://vaibhavupreti.github.io/hugo-blog/blog/my-journey-to-gsoc-2023-with-circuitverse/)
223+
- [The Ultimate Guide to Distributed Tracing: Monitor your Rails app using Opentelemetry, Jaeger and New Relic Agent](https://vaibhavupreti.github.io/hugo-blog/blog/distributed-tracing-opentelemetry/)
224+
225+
- [Community Bonding Period at CircuitVerse.org](https://vaibhavupreti.github.io/hugo-blog/blog/community-bonding-period-gsoc/)
226+
227+
## What's Next 📅
228+
---
229+
I’m excited to continue as a Core Team member, maintaining this incredible open-source project.
230+
231+
Additionally, we plan to adopt a blue-green deployment approach and roll out the CD pipeline after rigorous testing in the staging environment.
232+
- `Blue` - older server
233+
- `Green` - current staging environment
234+
235+
This involves copying the latest production data to staging (the latest `pg_dump` and Redis data).
236+
Production traffic will continue on 'blue' until we replicate and scale 'green' to match or exceed its capacity.
237+
Once performance and stability are confirmed, we'll transition production traffic to 'green' (the staging server) and phase out the older 'blue' instance, ensuring a low-risk transition.
238+
239+
240+
## Acknowledgments 📝
241+
___
242+
243+
I'm grateful to my mentor [Aboobacker M.K](https://github.com/tachyons) who helped me whenever I faced challenges and never
244+
overlooked any part of their mentoring. They taught me a lot about Ruby, Rails, and software development in general.
245+
The weekly meetings were exceptionally informative, and I cannot overstate how much I learned through my interactions
246+
with my mentor. I doubt I will ever encounter a similar experience. Their dedication motivates me to aspire to become
247+
a software engineer like them and to share my learnings with others.
248+

layouts/shortcodes/toc.html

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
<div class="toc">
2+
{{ .Page.TableOfContents }}
3+
</div>