Journey into Kubernetes: Building Resilience, One Container at a Time

Mohammed
5 min read · Dec 21, 2023

The question that kept echoing in my mind was, “Is Kubernetes the right fit for us?” There’s no definitive answer to that, and sometimes, the only way to find out is through hindsight.

The picture above resembles Gartner’s Hype Cycle. The cycle begins with chaos sparking innovation, followed by heightened expectations. As planning and reality checks set in, there’s a phase of disappointment. However, through implementation, hope is restored, and over time, value is realised with widespread adoption.

This post breaks down the journey into different phases, explaining each step.

Chaos 🔥

Scaling our existing infrastructure to handle traffic that grew day by day was a huge challenge. Imagine this: our services, which depended on virtual machines (VMs), would get overwhelmed when a lot of traffic and data hit them all at once. It was like a recurring nightmare where our systems struggled to cope with the load.

As demand increased, we tried to solve the problem by adding more VMs. But instead of helping, this created a never-ending cycle of issues. Late-night calls became a regular thing, and our services struggled with too many requests, turning our operations into chaos.

At that crucial moment, we realized the only way forward was to create a solution that could scale.

Innovation 💡

We started looking at Kubernetes as the only solution. After all, “imposter syndrome” is a fitting name for the anxiety we face as we try to keep current with technology and what is happening in the industry.

Here’s what we wanted to solve:
1. Reduce infrastructure management as much as possible.
2. Auto-scale infrastructure and services to cope with dynamic load.
3. Have a platform that can deliver resiliency if we plan to migrate our entire setup.
4. Improve security and visibility into our systems.
5. Reduce cost.

However, as we delved deeper, we refused to settle for a single solution. Exploring various tech stacks, we considered auto-scaling VMs and Nomad. Alas, each alternative fell short of covering the breadth of our challenges. So, we took a big chance on Kubernetes, and that’s where our journey started!

Planning 🤞

Sharing plans with people (it’s scary...) 😓

I find it kind of scary to tell people what I’m working on!

Here are a few reasons I find it a bit scary:

  1. Imposter syndrome — I’m telling everyone we can do it, but what if I mess up? People might think I’m not good at my job.
  2. What if something unexpected happens and I’m unable to handle it?
  3. What if the duration or cost exceeds the initial projection I made?

It’s much safer to start doing the task, stay focused until it’s finished, and then announce, “Hey, we did it, and it worked!”

Luckily, I didn’t have the option to avoid it — my manager clearly instructed me to share plans with both leadership and the teams before the project could begin.

How I did it

I found it helpful to assume that what I’m doing will work out. While it’s important to think about risks, starting with the belief that things will probably succeed makes it easier to get things done and share with others. Confidence seems pretty key when dealing with tough tasks.

  1. I prepared a straightforward document outlining what I was working on, why it mattered, how I would approach it, and the expected cost and timelines. Sharing plans also helps others tackling similar tasks. For instance, if your team reveals a project that needs assistance from another team, that team can allocate someone to lend a hand.
  2. Sharing plans also helped uncover issues we missed, refining our goals and dealing with risks.
  3. Writing down a plan brought clarity to our objectives.
  4. Communicating clear plans helped management… manage.
  5. I shared a RACI (Responsible, Accountable, Consulted, and Informed) chart to clarify everyone’s roles and ensure a smooth migration.

However, sharing plans came with challenges:
1. The learning curve within the team
2. Uncertainty about ownership
3. Questions on timing (Why do we have to do it right now?)
4. Cost concerns
5. People concerns

Rejections often boiled down to a “Fear of the unknown.” Communication is crucial — plans should be mostly true, if not 100% perfect. A proof of concept and discussions with stakeholders can help.

In a nutshell, while planning, I:

  1. Crafted a reasonable plan.
  2. Shared it with a reasonable group.
  3. Assumed things would work out and challenges could be tackled.
  4. Listened to feedback and adjusted as needed.

The most important thing while working on big projects that greatly affect the business: get used to hearing no, and remember that “every no is one step closer to yes.”

Execution

We divided our services into different categories based on the ease of migration:

  1. Mouse: Services that are simple to containerize, requiring minimal effort, and with low traffic.
  2. Rabbit: Client-facing services with moderate traffic.
  3. Deer: Services handling critical business messages but not client-facing.
  4. Mammoth: Legacy code services with a significant impact on the business.

To ensure a strategic approach, we set clear boundaries. “Lift and shift” wasn’t about moving things as they are; we aimed for improved performance and observability. If code changes were necessary, we involved the respective teams — no more applying temporary fixes on the new platform.

We started with the small Mouse services to build our confidence. Then we moved on to bigger challenges, the Rabbits. After that, we tested ourselves with a prototype involving Deer-sized services to see if we could handle them and extract value. This step-by-step process let us confidently tackle the largest challenge of all, the metaphorical “Mammoth.”
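The categories above can be read as a simple ordering rule. Here is a minimal Python sketch of that idea. The field names, the example service names, and the exact precedence between the rules are assumptions made for illustration, not the criteria we actually used.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """Migration tiers, ordered from easiest to hardest."""
    MOUSE = 1
    RABBIT = 2
    DEER = 3
    MAMMOTH = 4


@dataclass(frozen=True)
class Service:
    # All fields are hypothetical attributes chosen for this sketch.
    name: str
    legacy: bool
    business_critical: bool
    client_facing: bool


def classify(svc: Service) -> Tier:
    """Map a service to a tier; the hardest matching tier wins."""
    if svc.legacy and svc.business_critical:
        return Tier.MAMMOTH
    if svc.business_critical and not svc.client_facing:
        return Tier.DEER
    if svc.client_facing:
        return Tier.RABBIT
    return Tier.MOUSE


def migration_order(services: list[Service]) -> list[Service]:
    """Sort services so the easiest tier is migrated first."""
    return sorted(services, key=lambda s: (classify(s), s.name))


# Hypothetical inventory, listed in no particular order.
inventory = [
    Service("billing-core", legacy=True, business_critical=True, client_facing=False),
    Service("checkout-api", legacy=False, business_critical=False, client_facing=True),
    Service("internal-docs", legacy=False, business_critical=False, client_facing=False),
    Service("order-events", legacy=False, business_critical=True, client_facing=False),
]

for svc in migration_order(inventory):
    print(f"{classify(svc).name:8} {svc.name}")
```

Running it lists the Mouse service first and the Mammoth last, which mirrors the order in which we built up confidence.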

At first, I thought getting things done would be the hardest part. But when we actually got to it, it was easier than expected. Even with challenges popping up, it was smooth because everyone on the team knew what we were up to. Asking for help was easy, and we had bandwidth set aside for any unexpected issues.

Value 💲

Coming back to the statement, “Sometimes, the only way to ascertain the right choice is in hindsight”: we migrated over 25 services and ran them for over 90 days without any issues.

Here are the key value points:

  • Handled 30 million messages seamlessly, using KEDA with Redis to scale workloads.
  • Eliminated midnight calls.
  • Improved system reliability significantly.
  • Cut costs by running some workloads on Spot instances in Kubernetes instead of dedicated VMs.
  • Implemented Argo CD and used Helm chart templates for easy deployments with minimal learning curve.
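To give a feel for the queue-based scaling mentioned in the first point, here is a rough Python sketch of the arithmetic behind it: the replica count follows the queue length divided by a per-replica target, clamped between a minimum and a maximum. This is a simplification. A real autoscaler also applies tolerances and cooldown periods, and the function name, parameters, and numbers below are illustrative assumptions, not our production settings.

```python
import math


def desired_replicas(queue_length: int,
                     target_per_replica: int,
                     min_replicas: int = 0,
                     max_replicas: int = 10) -> int:
    """Return how many workers a queue of the given length calls for.

    Simplified model: one replica per `target_per_replica` pending
    messages, rounded up, then clamped to [min_replicas, max_replicas].
    """
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    wanted = math.ceil(queue_length / target_per_replica)
    return max(min_replicas, min(max_replicas, wanted))


# Illustrative values only.
print(desired_replicas(0, 100))      # empty queue: scale down to the minimum
print(desired_replicas(250, 100))    # 250 pending, 100 per worker: round up
print(desired_replicas(5000, 100))   # burst: capped at max_replicas
```

The cap matters as much as the scaling itself: without a maximum, a sudden burst could request more workers than the cluster or the budget can absorb.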

However, is this the sole solution? The answer is no. Decisions were made based on unique circumstances and available bandwidth at the time.

I believe Kubernetes is not the solution to every problem.

Is the journey complete? No. I was only able to prove that the platform can deliver value; widespread adoption and migration are going to take time.

Key Learning 🔆

I found the technical side of engineering easy, but dealing with people — sharing plans, onboarding, listening to feedback — was the real challenge. Planning was a bit of a pain, and I had to push through my own fears. It wasn’t until I wrote this document that I realised human dynamics, communication, and collaboration were the keys to making execution easy.
