ArgoCD with Helm on Kubernetes

Motivation

Most of the CI/CD pipelines I’ve met so far have followed a well-known pattern: build the code using Docker, push the images to a managed private container registry, package the images in a Helm chart, connect to the Kubernetes cluster and deploy the application using the Helm CLI.

While this approach is fine for many small to mid-sized projects consisting of just a few microservices deployed on Kubernetes, it gets harder to maintain and keep track of all the pipeline files and chart repos once their number grows. I’ve usually seen this situation in projects where a lot of self-hosted software is used: HCP Vault, Keycloak, RabbitMQ etc.

Recently, at the request of one of my clients, I was involved in a migration from traditional deployment pipelines to GitOps, and that’s how I stumbled upon the ArgoCD world. Before that migration, I did a bit of my own testing with dummy examples on my homelab cluster:

Application’s codebase and Helm charts:
https://github.com/ptisma/argocd-helm
ArgoCD manifests:
https://github.com/ptisma/k3s-cluster/tree/main/manifests/examples/argocd
External Secret:
https://github.com/ptisma/argocd-helm/blob/main/charts/argocd-helm-server/secrets.yaml
External Secret Store:
https://github.com/ptisma/k3s-cluster/tree/main/manifests/examples/external-secrets

GitOps principle

The clutter of declarative pipeline files, bash scripts, Helm charts, Kubernetes manifests, kubectl scripts, monorepo vs. multirepo debates etc. leads us to the birth of the GitOps principle.

In short, GitOps is a principle where we manage our infrastructure the very same way we manage the application codebase: the infrastructure is described by code which represents the source of truth, that code is versioned in Git, and the state of the actual infrastructure is constantly reconciled with the source of truth.


For the context of this article, I’ll mostly focus on the “App infra definition” (aka our app in Kubernetes), hence the name ArgoCD with Helm on Kubernetes, but bear in mind that GitOps goes way beyond this context: you can, for example, create a repo for cloud infrastructure on AWS written in CloudFormation, use the Git sync option to connect it to AWS, and with each commit to the repo sync the changes to the live AWS infra.

Practical GitOps implementation

In order to enable GitOps we first need to decouple the application and infrastructure codebases. For example, we have a repository called app-repo where we store our code and the pipeline for building and pushing the image.
In the other, infrastructure repo, infra-repo, we define the Helm chart for our application: the chart consists of manifests, and the manifests use the images. How do we get those?

That’s why we need to connect these two repos with a pipeline: on each image build in app-repo, we need to update the image tag in infra-repo.
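
As a sketch of what that glue could look like (a hypothetical GitHub Actions workflow in app-repo; repo names, paths and the values key are placeholders), the job below checks out infra-repo and bumps the image tag after the image has been built and pushed:

name: update-image-tag   # hypothetical workflow in app-repo

on:
  push:
    branches: [main]     # in practice this would run after the image build/push job

jobs:
  bump-tag:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          repository: your-org/infra-repo          # placeholder for the infra repo
          token: ${{ secrets.INFRA_REPO_TOKEN }}   # token with push access to infra-repo
      - name: Bump image tag in the Helm values file
        run: |
          yq -i '.image.tag = "${{ github.sha }}"' charts/app/values-main.yaml
          git config user.name "ci-bot"
          git config user.email "ci-bot@users.noreply.github.com"
          git commit -am "chore: bump image tag to ${{ github.sha }}"
          git push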

The call for ArgoCD

Alright, so we decoupled the repos and we have our infrastructure manifests in the form of a Helm chart. How do we deploy it to Kubernetes now?

This is where ArgoCD steps in: we no longer have a pipeline to deploy it; instead, ArgoCD will do it for us. Let’s first introduce ArgoCD.
ArgoCD is a declarative, GitOps continuous delivery tool for Kubernetes. It automates the deployment and management of applications and infrastructure within Kubernetes clusters by using Git repositories as the source of truth for the desired application state.

The following picture illustrates the full architecture of ArgoCD:


In my case, I deployed ArgoCD on the Kubernetes cluster using Helm and exposed the UI on a subpath via a Traefik ingress. I ran ArgoCD in insecure mode because TLS termination is done at the Ingress level.
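
For reference, the values override I used for the community argo-cd Helm chart looks roughly like this (a sketch; the exact key layout can vary between chart versions, and the Traefik ingress itself is defined separately):

# Values override for the argo-cd Helm chart (sketch; keys may differ
# between chart versions): run the API server without its own TLS,
# since Traefik terminates TLS at the ingress.
configs:
  params:
    server.insecure: true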

One thing the picture does not make obvious: ArgoCD works on a pull mechanism (it watches the repos for changes and applies them at the cluster level, as already mentioned).
ArgoCD is a stateless deployment: it communicates with the repos and stores its data in Kubernetes etcd. The Helm chart ArgoCD ships in also deploys Redis, but only for caching.

ArgoCD kinds

Once you get into the ArgoCD UI you can configure it to your liking: register your Git credentials if needed, and set up an Application which targets the Helm chart in the Git repo and deploys it to your cluster in the selected namespace.

The previous step is fine for testing, but for production purposes ArgoCD Applications, Projects and settings can be defined declaratively using Kubernetes manifests, so we can version them in Git as well.

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: argocd-helm-server
  namespace: argocd # The namespace must match the namespace of your Argo CD instance
spec:
  project: argocd-example-project
  source:
    repoURL: https://github.com/ptisma/argocd-helm
    targetRevision: main
    path: charts/argocd-helm-server
    helm:
      valueFiles:
        - values-main.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd-helm

Another popular approach is to implement the “App of apps” pattern in ArgoCD: we define a “root” application which manages the “child” apps. It gives us logical grouping of our apps, global configuration, easier management and syncing, and lets us scale out more easily: just add new Application manifests to the root application’s target directory and that’s it.

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: argocd-helm
  namespace: argocd
  finalizers:
  - resources-finalizer.argocd.argoproj.io
spec:
  project: argocd-example-project
  destination:
    namespace: argocd-helm
    server: https://kubernetes.default.svc
  source:
    path: manifests/examples/argocd/apps  # contains application spec files
    repoURL: https://github.com/ptisma/k3s-cluster
    targetRevision: main

Secret management in ArgoCD using external-secrets

If you noticed, in the “child” application manifests we are specifying the Helm values files: a common practice I’ve seen is to have a values file for each environment, like dev, qa and prod, and then just use the one matching the environment.
Usually, for secrets like connection strings, passwords and credentials, I had been using cloud-managed variable libraries, for example the Azure Pipelines Library.

Thus we stumble upon a problem: as we said, there are no pipelines anymore, and we can’t just bake our secret values into the values files in the Git repo.

The solution to our problem is the External Secrets Operator, which I also deployed on my cluster using Helm.

External Secrets Operator is a Kubernetes operator that integrates external secret management systems like HashiCorp Vault. The operator reads information from external APIs and automatically injects the values into a Kubernetes Secret.
External Secrets Operator uses its own CRDs: we create an ExternalSecret which references a SecretStore; think of the SecretStore as an object instance of some external secret API. Once the ExternalSecret “fetches the data”, it stores it locally in our cluster as a regular Kubernetes Secret.
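
As a rough sketch of that relationship (resource names, namespace and keys below are placeholders rather than the exact ones from my repo), an ExternalSecret could look like this:

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: app-credentials
  namespace: argocd-helm
spec:
  refreshInterval: 1h              # how often to re-fetch from the external API
  secretStoreRef:
    name: example-secret-store     # the SecretStore to fetch from
    kind: SecretStore
  target:
    name: app-credentials          # name of the Kubernetes Secret that gets created
    creationPolicy: Owner
  data:
    - secretKey: DB_PASSWORD       # key inside the created Kubernetes Secret
      remoteRef:
        key: /app/db-password      # key in the external secret backend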


In the context of our problem, we solve it this way: we first deploy the SecretStore for our secret management service and configure it with the right credentials. Then we bake the ExternalSecret together with the other manifests into our application’s Helm chart, so once we deploy the chart, the ExternalSecret is created as well, and thus the real Kubernetes Secret with the actual data becomes available too.
Inside the ExternalSecret we can control the name of the newly created Secret and the namespace it resides in; using this information we can then simply use the Secret’s name and data to inject the needed information into our pods.

And that’s it: our pods now get the sensitive information, like connection strings, without us exposing it inside Git-versioned code.
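
For illustration, assuming the ExternalSecret sketch above created a Secret named app-credentials, a container in the application’s Deployment could consume it like this:

# Snippet from a pod template in the application's Deployment (names are placeholders)
env:
  - name: DB_PASSWORD
    valueFrom:
      secretKeyRef:
        name: app-credentials   # the Secret created by the ExternalSecret
        key: DB_PASSWORD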

Luckily, the External Secrets Operator gives us the ability to use a fake provider for the SecretStore locally on the Kubernetes cluster, so I did not need to connect my dummy examples to a real secret management store.
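
For reference, a SecretStore using the fake provider, matching the store name from the sketch above, could look roughly like this (the values are dummies by design):

apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: example-secret-store
  namespace: argocd-helm
spec:
  provider:
    fake:                          # serves static, in-manifest values instead of a real backend
      data:
        - key: /app/db-password
          value: not-a-real-password
          version: v1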

ArgoCD syncing

While there are tons of options when it comes to syncing strategies, I’ve used the basic ones for starters; further optimization and fine-tuning always depends on your application/repo architecture, so it is out of scope for this article.

Before diving deep into syncs, let’s first compare them to the refresh option we can see in the GUI. While a sync reconciles the current cluster state with the target state in Git, a refresh fetches the latest manifests from Git and computes the diff.


We want to automate the sync between our application manifests and the live application on the cluster, so we are going to use the following options on our Application kind:

spec:
  syncPolicy:
    syncOptions:
      - CreateNamespace=true
    automated:
      prune: true
      selfHeal: true
This enables the following: automated sync, automatic pruning of resources that are no longer defined in the repo, and reverting back to the state defined in the repo if we make live changes on the cluster.
Since we are using the app-of-apps pattern, make sure to add this spec to all of the child apps as well, so the actual Kubernetes manifests get synced too.

The default polling interval is 3 minutes (180 seconds) with a configurable jitter. You can change this by updating the timeout.reconciliation and timeout.reconciliation.jitter values in the argocd-cm ConfigMap. If there are any Git changes, Argo CD will only update applications with the auto-sync setting enabled.
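
For example, the relevant argocd-cm entries would look roughly like this (the 300s/60s values are just illustrative):

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
  labels:
    app.kubernetes.io/part-of: argocd
data:
  timeout.reconciliation: 300s         # poll Git every 5 minutes instead of the default 3
  timeout.reconciliation.jitter: 60s   # add up to 60s of random jitter to spread the load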

After taking care of the automation, we want to control the flow of our syncs, and we can do that with sync waves and phases. Sync waves provide finer control over the order in which resources are applied during the sync process.

Each sync consists of three phases: PreSync, Sync and PostSync. They are useful for things like running init scripts, cleanup jobs and so on.

By default, all resources are in sync wave 0, but we can assign our own sync waves, with both negative and positive values.

To configure a sync wave, add this annotation to the Application kind:

metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "1"

This now applies to the manifests targeted by this Application, so if, like in my example, the root application is in sync wave 0 and the child applications are in sync waves 1 and 2, ArgoCD will first sync child application 1 and its Helm chart (manifests), then continue to the second one.

We have already mentioned the sync phases: while we cannot assign regular resources to them directly the way we do with sync waves, we can use resource hooks.

Hooks are simply Kubernetes manifests tracked in the source repository of your Argo CD Application and annotated with argocd.argoproj.io/hook.
In my example, inside every child Application tracked by the root Application, I have added a simple Kubernetes Job which sleeps for a couple of seconds to demonstrate that sync waves do in fact work sequentially.
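
A sketch of what such a hook Job might look like (the name and sleep duration here are placeholders):

apiVersion: batch/v1
kind: Job
metadata:
  name: presync-sleep
  annotations:
    argocd.argoproj.io/hook: PreSync                      # run before the main manifests are applied
    argocd.argoproj.io/hook-delete-policy: HookSucceeded  # clean up the Job once it succeeds
spec:
  template:
    spec:
      containers:
        - name: sleep
          image: busybox
          command: ["sh", "-c", "sleep 10"]
      restartPolicy: Never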