Making Deployments Boring with ArgoCD

Jordan Violet
SailPoint Engineering Blog
Jul 21, 2021 · 8 min read



Author: Justin Watkinson

Delivering applications quickly and safely is a major focus for the SailPoint DevOps team. You may even say we want them to be boring. Uneventful. Predictable. Reliable.

This past year we adopted ArgoCD and the GitOps methodology in our production systems, and we want to share how we made our deployments boring. ArgoCD is an open-source CNCF project that provides continuous delivery capabilities for Kubernetes. It applies git version control concepts to managing applications in Kubernetes via popular YAML management tools, including Kustomize, Helm, and Jsonnet.

SailPoint Kubernetes 1.0

Our Kubernetes journey began a few years ago, when Amazon’s EKS (Elastic Kubernetes Service) arrived on the scene. We quickly adopted EKS and built some of the foundational pieces of our AI functionality using Kubernetes. At first, we had a lot of Helm charts and plain YAML manifests in git. The deployment model was a well documented kubectl apply or helm upgrade --install, ensuring the values files were in the correct order.

A typical deployment looked something like this:

helm upgrade --install \
  my-release \
  chart-repo/name \
  --values ./values/account/values.yaml \
  --values ./values/us-east-1/values.yaml \
  --values ./values/my-cluster/values.yaml

Using the helm diff plugin, we would then do a sanity check before applying the manifest to each cluster. This worked well with a handful of production clusters, a small number of charts, and a small team.
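
As a rough sketch, that pre-apply check looked along these lines, assuming the helm-diff plugin is installed; the release, chart, and values paths simply reuse the illustrative names from the example above:

# Preview what the upgrade would change before applying it to the cluster
helm diff upgrade \
  my-release \
  chart-repo/name \
  --values ./values/account/values.yaml \
  --values ./values/us-east-1/values.yaml \
  --values ./values/my-cluster/values.yaml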

As we onboarded more applications, the growing number of values files, and the need to keep them in the correct order, increased our deployment complexity. This demanded additional manual review, and we discovered the need for slightly different patterns depending on the application being deployed. Was it a per-cluster application, or per namespace? Did it have dependencies? How would we orchestrate those? Deployment automation proved challenging as well. While we made some initial attempts to create Jenkins pipelines to automate these deployments, they were often too opinionated to be reusable, and getting deployment feedback meant doing a lot of eyeball checks ourselves.

We also began to observe issues with manifest drift and difficulty upgrading applications. For example, we needed to upgrade the external-dns chart in one of our dev clusters. The upgrade was completed by updating the Docker image variable in the Helm values file with no other changes. A mixture of Helm chart versions and some manual changes resulted in a situation where the required RBAC permissions were not as expected, and even the diff plugin was unable to surface that delta. Despite all our normal safeguards, it was not until we fully uninstalled and reinstalled the Helm chart that we could get it working and ultimately deduce the root cause.

So let’s summarize the growing pains we were facing:

  • Helm chart upgrades and managing the Helm chart SDLC felt too hands-on
  • Manifest drift occurred, and the available diff tools made it difficult to find bugs
  • Release pipelines were not reusable
  • Deployment feedback needed improvement

We knew we had some issues to address, and we needed to get better in order to operate at a larger scale. We were managing our deployments, but they were not boring yet. It was time to take a step back, challenge our assumptions, and take action on the future of our Kubernetes ecosystem.

Early Deployments with ArgoCD

Several members of the team had already heard of ArgoCD and it was easy to see the value this tool provides. The ability to see your application health across multiple clusters (and regions!) in a single view was quite valuable to our team. Built-in authentication and RBAC meant we could further enable our Engineering teams to have the same visibility that we do in DevOps. Plus, we could move forward knowing that, between the CLI and REST API, we could probably automate our way around anything that ArgoCD didn’t offer out-of-the-box.

The first deliverable using ArgoCD focused on a greenfield project — the Jaeger Distributed Tracing system, which we planned to deliver using the Jaeger Operator on top of Kubernetes. This would be deployed to net new EKS clusters which came with the added scope of setting up common Kubernetes tools such as cluster-autoscaler, external-dns, and cert-manager. The plan was to use ArgoCD to deploy Jaeger in addition to the above standard cluster tools.

First, we had to select which YAML manifest generation tool would best suit our needs. We needed something that allowed us to quickly take and adopt open-source software into our clusters without a lot of extra steps. We also had to be able to add our own customizations, whether to support operational requirements like resource limits or developer needs such as environment variables and container arguments.

It was clear to us that almost every open-source project we were using or considering provided raw YAML manifests for its application spec. Rather than try to add Helm where it wasn’t needed, we determined that Kustomize was the best fit for us. It struck a good balance between the ability to customize (with overlays and components), adoption within the open-source community, and relative simplicity to get started.

We use the standard pattern of base and overlays from the Kustomize README. This led us to structure our repo something like this:

./cluster-autoscaler
├── base
│   ├── deployment.yaml
│   ├── ingress.yaml
│   ├── kustomization.yaml
│   ├── rbac.yaml
│   └── service.yaml
└── overlays
    ├── cluster1
    │   └── kustomization.yaml
    └── cluster2
        └── kustomization.yaml
./jaeger
├── base
│   ├── operator.yaml
│   └── kustomization.yaml
└── overlays
    ├── cluster1
    │   └── kustomization.yaml
    └── cluster2
        └── kustomization.yaml

For open-source deployments, the base for each application is the 100% unmodified manifest from the project maintainers. We then use the overlays to provide the last-mile configuration, including things like ingress host names and Service Account ARNs when using IAM Roles for Service Accounts. This layout makes it easy to upgrade open-source applications: we simply swap out the old base for the new version and let the overlays apply all the SailPoint-specific customizations. Combining Kustomize with the argocd app diff CLI command, we can easily evaluate the upcoming changes and rapidly roll out new versions of software.
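
To illustrate the last-mile pattern, an overlay’s kustomization.yaml might look roughly like the sketch below; the base path, Service Account name, and role ARN are placeholders rather than our actual configuration:

# overlays/cluster1/kustomization.yaml (illustrative)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  # Attach the cluster-specific IAM role for IRSA to the Service Account
  - target:
      kind: ServiceAccount
      name: cluster-autoscaler
    patch: |-
      apiVersion: v1
      kind: ServiceAccount
      metadata:
        name: cluster-autoscaler
        annotations:
          eks.amazonaws.com/role-arn: arn:aws:iam::111111111111:role/cluster1-cluster-autoscaler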

We also had to decide how to map these configurations to each individual EKS cluster, which led us to the ArgoCD App of Apps pattern. This allows us to map the overlays and provide DRY, repeatable configurations on a per-cluster basis. We could now share things like the ArgoCD project naming conventions, the EKS cluster endpoint address, and any ArgoCD application-specific customizations such as namespace auto-creation or setting ignoreDifferences.
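
As a hedged sketch, a parent Application in the App of Apps pattern can look something like this; the repo URL, project name, and paths below are placeholders:

# Parent Application pointing at a directory of child Application manifests (illustrative)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cluster1-apps
  namespace: argocd
spec:
  project: cluster1
  source:
    repoURL: https://github.com/example-org/k8s-deployments.git
    targetRevision: main
    path: app-of-apps/cluster1
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true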

At the end of this project, we felt increasingly confident that we picked the right tools for us. We were able to deploy the entire stack consistently and repeatedly, and the production deployment was boring, just the way we wanted it.

Accelerating Application Delivery

With the Jaeger project blazing a new path for us, we shifted our focus towards some of our own home-grown applications. The best candidate was our Cloud Access Management system, which was already on Kubernetes and whose team was looking to improve its continuous delivery story.

We set out to accomplish two major objectives. First, we wanted to fully convert this system to ArgoCD and Kustomize, moving away from Helm. This would help ensure we could keep customizing manifests without a hard dependency on the Helm chart supporting those needs, and it allows both Dev and Ops to use the same repo and manifests with looser coupling. Second, we wanted to build our first fully continuous delivery pipeline using ArgoCD, creating a repeatable model which could be reused in future projects.

The Helm conversion ended up being fairly straightforward. We picked a point in time to create a new YAML base configuration using the helm template command and then reproduced the values files as Kustomize patches. We then staged the application in ArgoCD but used the sync window feature to ensure we could not sync the application before it was ready. This allowed us to use ArgoCD itself to check for drift and make sure all changes were expected. When we were ready, we simply allowed syncing of the project and pushed the sync button. Everything went green, and we were now running with ArgoCD! A perfectly boring transition to the new deployment tool.
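
The one-time conversion looked roughly like the sketch below, reusing the illustrative release and chart names from earlier; the exact values files and output path are placeholders:

# Render the chart once to create the new Kustomize base (illustrative names)
helm template \
  my-release \
  chart-repo/name \
  --values ./values/account/values.yaml \
  --values ./values/my-cluster/values.yaml \
  > base/rendered.yaml
# Reference base/rendered.yaml from base/kustomization.yaml, then express the
# remaining per-cluster differences from the values files as overlay patches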

For the deployment pipelines, we weighed the pros and cons of using the REST API directly versus orchestrating the pipeline with the ArgoCD CLI itself. We ultimately settled on the ArgoCD CLI, as it handles project credentials, application selection by label, and application synchronization, all out-of-the-box. We borrowed what Jenkins was already doing by gathering Jira tickets and deployment metadata, and let the agent orchestrate the actual CLI commands. The pipeline issues the kustomize edit set image command in each overlay to update the microservice to a new version, and then commits the change back to git. The next build stage then issues the argocd app sync command to deploy the new version to staging, and then production. The CLI made this a fairly painless process and meant we didn’t have to write any custom wrapper code to fully automate it.
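
In rough terms, the pipeline’s key steps look something like the sketch below; the overlay path, image reference, and application names are placeholders rather than our real ones:

# Bump the image in the environment overlay and commit it back to git (illustrative)
cd my-service/overlays/staging
kustomize edit set image my-service=1234567890.dkr.ecr.us-east-1.amazonaws.com/my-service:1.2.3
git commit -am "Deploy my-service 1.2.3" && git push
# Later pipeline stages sync the change, staging first and then production
argocd app sync my-service-staging
argocd app sync my-service-production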

Now that we were on ArgoCD, we were able to share all of the visibility with our engineering team in ways we had not been able to previously. The ability to access logs even before they have been fully processed into Elasticsearch is a fan-favorite. Also, the self-service aspect of seeing exactly what’s deployed and having an extra set of eyes when things are less boring has been quite an advantage for us. Virtual high-fives were had by all.

Next Steps

Today, we are managing over 150 applications with ArgoCD. The time it takes to go from a raw manifest to production-ready has been dramatically reduced, and the features offered by ArgoCD made that transition pretty boring. As we have grown, we’ve kept an eye on the ArgoCD ApplicationSet feature. While the App of Apps pattern is viable, it does feel like an “extra step” when we want to roll out an application. ApplicationSets offer the ability to generate these ArgoCD application specs based on your repository structure or cluster labels. Adopting ApplicationSets should remove one more manual step for new applications and enable concepts like default configurations for new clusters.
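
For illustration only, an ApplicationSet using the git directory generator might look roughly like this; the repo URL, directory glob, and naming scheme are placeholders and not our actual layout:

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: cluster1-tools
  namespace: argocd
spec:
  generators:
    # One Application per matching overlay directory in the repo
    - git:
        repoURL: https://github.com/example-org/k8s-deployments.git
        revision: main
        directories:
          - path: '*/overlays/cluster1'
  template:
    metadata:
      name: '{{path[0]}}-cluster1'
    spec:
      project: cluster1
      source:
        repoURL: https://github.com/example-org/k8s-deployments.git
        targetRevision: main
        path: '{{path}}'
      destination:
        server: https://kubernetes.default.svc
        namespace: '{{path[0]}}'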

We’re also aiming to advance our development story by improving the application diff instrumentation and making it part of the GitHub pull request approval workflow. This will help us shift some of our checks left and further speed up application delivery with fewer iterations.
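
One way to wire that in, sketched here under assumptions about our CI setup, is to run argocd app diff against the pull request’s revision and surface the output in the review; the application name and CI variable are placeholders:

# Illustrative shift-left check run from a pull request build
# (argocd app diff exits non-zero when a diff exists, so capture the output rather than failing)
argocd app diff my-service-staging --revision "$PR_COMMIT_SHA" > app-diff.txt || true
# Post app-diff.txt as a comment on the pull request via the CI system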

Conclusion

Our Kubernetes deployments have become more boring than ever! The manifest creation and deployment processes are simple, well understood, and easy to roll out. A special thanks is due to the Argo Slack community. They are absolutely fantastic, helpful, and open to questions. I’ve received plenty of great advice and help from this group, and it has been a large part of making this move a successful one.

It has been a very exciting year and we are looking forward to taking the GitOps platform powered by ArgoCD to the next level through 2021 and beyond.
