One of the main drivers behind Kubernetes adoption is that it allows teams to deploy faster and more safely. In this series of blog posts, we will cover how to use ArgoCD to turn an Exoscale Scalable Kubernetes Service (SKS) cluster into a GitOps-compatible Continuous Delivery (CD) platform.

The series is split into 3 parts:

  • Part 1: covers the creation of an SKS cluster using Terraform
  • Part 2: covers basic SKS cluster configuration and how to perform it
  • Part 3: covers the installation and usage of ArgoCD

The blog posts will assume some familiarity with Terraform and Kubernetes. You should have some basic tools like terraform and kubectl already installed.
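If you want a quick sanity check of your local setup before starting, both tools can print their versions:

➜ terraform version
➜ kubectl version --client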

All of the code and configuration from this series of blog posts is also available on GitHub.

Follow-along video for Part 1 (10:00 min)

Exoscale Terraform Provider

While we have previously published an article that explains how to create an SKS cluster using Terraform, this time we will take a slightly different approach. Instead of relying on a preconfigured module, we will create the cluster using plain Terraform resources. Hopefully, this gives a better view of how everything wires together and makes it easier to adapt the content to your own use cases.

Getting the provider

The first thing we have to do is tell Terraform to use the Exoscale provider. This is done by declaring the provider in any .tf file. A good convention is to use provider.tf for this.

terraform {
  required_providers {
    exoscale = {
      source = "exoscale/exoscale"
      version = "~> 0.30.1"
    }
  }
}

provider "exoscale" {}

The first block is the global Terraform configuration for the project: it tells Terraform which resource providers the project will use. Since providers are plugins, we also have to tell Terraform which versions we want to use. Here we will only be using the Exoscale provider at version 0.30.1, the latest at the time of writing; the ~> constraint allows newer patch releases within the 0.30.x series.
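As a side note, if you prefer fully reproducible builds over automatic patch upgrades, the same block can pin an exact provider version instead; this is just a variation and not what we use in this demo:

terraform {
  required_providers {
    exoscale = {
      source  = "exoscale/exoscale"
      # exact pin: Terraform will only ever use 0.30.1
      version = "= 0.30.1"
    }
  }
}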

We can now instruct Terraform to download the provider plugin with terraform init:

➜ terraform init

Initializing the backend...

Initializing provider plugins...
- Finding exoscale/exoscale versions matching "~> 0.30.1"...
- Installing exoscale/exoscale v0.30.1...
- Installed exoscale/exoscale v0.30.1 (signed by a HashiCorp partner, key ID 8B58C61D4FFE0C86)

[[ ... ]]

Configuring the provider

The second block in our previous code example declares the Exoscale provider along with its credentials. We have deliberately left the provider configuration empty because we don’t want its sensitive content, such as IAM keys and secrets, to end up in a git repository. Instead, we will be providing this information via environment variables.

A convenient way is to put them in a secrets.env file (don’t forget to add this file to your .gitignore):

export EXOSCALE_API_KEY=yourapikeygoeshere
export EXOSCALE_API_SECRET=yourapisecretgoeshere

You can then load these with source secrets.env.

Terraform Resources

Now that we’ve downloaded and configured the Exoscale provider, we can start declaring our resources. We will explain the parameters we are using, but it’s always nice to keep the documentation handy to discover all the options you can play with.

SKS Cluster

Before we declare any resources, we will declare a Terraform locals block. Locals are Terraform’s equivalent of local variables: they let us declare some data without tying it to any specific resource. In this case, we define a zone local, which contains the Exoscale zone we want to deploy our resources into, and some labels to tag our resources with. This prevents us from making typing mistakes in the various resources that require the zone as input.

locals {
  zone   = "de-fra-1"
  labels = { "project" : "sks-demo"}
}

Now that this is out of the way, we can declare our SKS cluster, named demo (bonus points for originality).

# This resource will create the control plane
# Since we're going for the fully managed option, we will ask sks to preinstall
# the calico network plugin and the exoscale-cloud-controller
resource "exoscale_sks_cluster" "demo" {
  zone            = local.zone
  name            = "demo"
  version         = "1.22.3"
  description     = "Webinar demo cluster"
  service_level   = "pro"
  cni             = "calico"
  exoscale_ccm    = true
  metrics_server  = true
  auto_upgrade    = true
  labels          = local.labels
}

We’ll go through each of the options we’ve configured one-by-one:

  • zone: the Exoscale zone in which the cluster provisioning takes place. We’ve set this to the local value we specified earlier, so we can’t mistype it.
  • name: the name of the cluster in our account. For easy reference, we used the same name as the Terraform resource.
  • version: the version of Kubernetes we will deploy. 1.22.3 is the latest version at the time of writing.
  • description: a description of the cluster that will show up in the Exoscale portal
  • service_level: The pro service level will deploy a highly available control plane configuration. There is also a starter level, but it comes with no SLA and no redundancy.
  • cni: the container network interface (CNI) plugin to install. We only support calico for now. You can set this to the empty string "" and install your own CNI plugin. Because Kubernetes can’t schedule any workloads without a CNI plugin, we advise you not to override our default unless you have particular needs and know what you are doing.
  • exoscale_ccm: whether the Exoscale Cloud Controller Manager (also known as ‘CCM’) will be installed for the cluster. It provides integration between the Kubernetes cluster and the Exoscale platform, for example to create network load balancers. Installing this add-on is highly recommended. It defaults to true, so if you do not explicitly specify it, the CCM will be installed. The CCM is entirely managed by Exoscale and does not consume any resources in the created cluster.
  • metrics_server: whether the Kubernetes Metrics Server will be installed in the cluster. It defaults to true. Like the CCM, it is entirely managed by Exoscale and does not consume any resources in the created cluster.
  • auto_upgrade: If set to true, the control plane will automatically be upgraded to the latest patch releases.

SKS Node Pool

While the exoscale_sks_cluster resource provisions a fully-functioning Kubernetes control plane, it does not create any nodes on which this control plane can schedule your workloads.

To do that, we need to create an exoscale_sks_nodepool. This is a specialized version of our instance pools: a group of homogeneous machines that is easy to scale up or down. Node Pools come with a specialized VM template, and nodes will securely join your Kubernetes cluster automatically.

You can create multiple node pools for your Kubernetes cluster, so you have different classes of hardware available, but in this blog post, we’ll keep it simple and create a single node pool with medium type instances.

# This provisions an instance pool of nodes which will run the kubernetes
# workloads.
# We can attach multiple nodepools to the cluster
resource "exoscale_sks_nodepool" "workers" {
  zone               = local.zone
  cluster_id         = exoscale_sks_cluster.demo.id
  name               = "workers"
  instance_type      = "standard.medium"
  size               = 3
  security_group_ids = [exoscale_security_group.sks_nodes.id]
  labels             = local.labels
}

As before, we’ll go through each of the options we’ve configured one by one:

  • zone: the Exoscale zone in which the Node Pool provisioning takes place
  • cluster_id: the id of the SKS cluster to which the Node Pool belongs. We use a reference to the id output variable of the SKS cluster we defined earlier. Terraform will know to create the cluster first and will substitute the value here once it is known.
  • name: the name of the Node Pool in our account. Again we chose to use the same name as the Terraform resource.
  • instance_type: the Exoscale instance type definition for the VMs created by this Node Pool. standard.medium provides a nice amount of RAM and CPU for general-purpose use cases.
  • size: the number of VMs to be created. Resizing at a later point is possible if the capacity turns out to be under- or over-provisioned.
  • security_group_ids: the security groups that the instances should belong to (see next section)
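If you do want several classes of hardware later on, additional pools follow exactly the same pattern. As a sketch only (the pool name, instance type, and size below are illustrative and not part of this demo), a second, larger pool could look like this:

# Hypothetical second Node Pool with bigger instances for heavier workloads.
# It attaches to the same cluster and reuses the same security group.
resource "exoscale_sks_nodepool" "workers_large" {
  zone               = local.zone
  cluster_id         = exoscale_sks_cluster.demo.id
  name               = "workers-large"
  instance_type      = "standard.large"
  size               = 2
  security_group_ids = [exoscale_security_group.sks_nodes.id]
  labels             = local.labels
}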

Firewall Rules

The security group setup of your Kubernetes nodes is very use-case specific. For this demo, we assume that all traffic will come into the cluster using an Exoscale NLB and that we do not need to contact any protected services outside the cluster.

For all features of the cluster to work correctly, some security group rules need to be present. There are plans to automate this in the future, but we have to create them ourselves for now.

First, we declare the security group:

# A security group so the nodes can communicate and we can pull logs
resource "exoscale_security_group" "sks_nodes" {
  name        = "sks_nodes"
  description = "Allows traffic between sks nodes and public pulling of logs"
}

We will use a single security group for all the necessary rules. The Node Pool definition references this security group so that the rules apply to the nodes it creates.

Our first rule is the ‘logs_rule’:

resource "exoscale_security_group_rule" "sks_nodes_logs_rule" {
  security_group_id = exoscale_security_group.sks_nodes.id
  type              = "INGRESS"
  protocol          = "TCP"
  cidr              = "0.0.0.0/0"
  start_port        = 10250
  end_port          = 10250
}

The logs_rule opens port 10250 to the internet. kubectl logs uses this port to pull logs from a node. If you don’t set it, you will need another way to extract logs from the cluster.
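If exposing the kubelet port to the whole internet is not acceptable in your environment, the same rule can be restricted to a trusted network instead. A sketch, where the CIDR is a documentation placeholder you would replace with your own office or VPN range:

resource "exoscale_security_group_rule" "sks_nodes_logs_rule_restricted" {
  security_group_id = exoscale_security_group.sks_nodes.id
  type              = "INGRESS"
  protocol          = "TCP"
  cidr              = "203.0.113.0/24" # example range, replace with your own
  start_port        = 10250
  end_port          = 10250
}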

Our second rule is the ‘calico_rule’:

resource "exoscale_security_group_rule" "sks_nodes_calico" {
  security_group_id      = exoscale_security_group.sks_nodes.id
  type                   = "INGRESS"
  protocol               = "UDP"
  start_port             = 4789
  end_port               = 4789
  user_security_group_id = exoscale_security_group.sks_nodes.id
}

This rule allows UDP traffic on port 4789 between all nodes that are part of the Security Group. It doesn’t open any ports to the internet. It’s needed for Calico to provide the networking inside the cluster. If you do not use Calico in your SKS cluster, you do not need to define a calico_rule, but you’ll have to set some rules specific to your CNI plugin of choice.
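For illustration only: VXLAN-based CNIs such as Cilium or Flannel typically use UDP port 8472 rather than 4789, so an equivalent node-to-node rule might look like the sketch below. Always check your CNI’s documentation for the ports it actually requires.

# Hypothetical rule for a CNI that uses VXLAN on UDP 8472 (not needed with Calico)
resource "exoscale_security_group_rule" "sks_nodes_alt_cni" {
  security_group_id      = exoscale_security_group.sks_nodes.id
  type                   = "INGRESS"
  protocol               = "UDP"
  start_port             = 8472
  end_port               = 8472
  user_security_group_id = exoscale_security_group.sks_nodes.id
}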

The final rule is the ‘ccm_rule’:

resource "exoscale_security_group_rule" "sks_nodes_ccm" {
  security_group_id = exoscale_security_group.sks_nodes.id
  type              = "INGRESS"
  protocol          = "TCP"
  start_port        = 30000
  end_port          = 32767
  cidr              = "0.0.0.0/0"
}

This rule opens port range 30000-32767 to the internet so that any load balancers created by the Exoscale CCM can communicate with the services on the cluster. Whenever the CCM creates a load balancer, it will bind a service to a randomly chosen port in this range on the cluster nodes, which the Exoscale NLB will target. If you do not install the CCM, you do not need this rule.

Terraform Outputs: connect to SKS

The final element of our Terraform project will be a Terraform output. Outputs provide a convenient way to render the data needed to use the resources we created.
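For instance, a minimal output could expose the cluster’s API endpoint, an attribute of the exoscale_sks_cluster resource (you will see it as "known after apply" in the plan below). This is optional and just a sketch, not part of the demo code:

# Optional: print the Kubernetes API endpoint once the cluster exists
output "cluster_endpoint" {
  value = exoscale_sks_cluster.demo.endpoint
}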

In our case, we want to get a kubeconfig file that provides access to our newly created cluster.

Because kubeconfig files contain secrets, the Exoscale provider doesn’t directly expose access to kubeconfig files. If it did, they would end up in the terraform state, which poses a security risk. What we’ll do instead is use the exo CLI to retrieve the kubeconfig file, and we’ll prepare a Terraform output that contains the command we need to run:

output "kubectl_command" {
  value = "exo sks kubeconfig ${exoscale_sks_cluster.demo.id} user -z ${local.zone} --group system:masters"
}

A small note about the system:masters group. It is not good security practice to have users belong to this group: its permissions can’t be revoked, and because Kubernetes doesn’t allow revoking the certificates used for authentication, this can pose a persistent threat. This blog post does an excellent job of explaining the risks and countermeasures: https://blog.aquasec.com/kubernetes-authorization For production clusters, you want to have a strategy to drop these permissions during bootstrap.
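As a sketch of such a strategy (the user and group names below are made up for the example): you can generate a second kubeconfig bound to a custom group and grant that group only the built-in view ClusterRole via RBAC, keeping the system:masters kubeconfig for emergencies only.

# Hypothetical read-only kubeconfig bound to a custom group instead of system:masters
CLUSTER_ID=yourclusteridgoeshere
exo sks kubeconfig ${CLUSTER_ID} readonly-user -z de-fra-1 --group sks:viewers > kubeconfig-viewer.yaml

# With an admin kubeconfig, bind that group to the built-in "view" ClusterRole
kubectl create clusterrolebinding sks-viewers --clusterrole=view --group=sks:viewers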

Now that we have done all the necessary setup, we can do a dry run with terraform plan:

➜ terraform plan

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # exoscale_security_group.sks_nodes will be created
  + resource "exoscale_security_group" "sks_nodes" {
      + description = "Allows traffic between sks nodes and public pulling of logs"
      + id          = (known after apply)
      + name        = "sks_nodes"
    }

  # exoscale_security_group_rule.sks_nodes_calico will be created
  + resource "exoscale_security_group_rule" "sks_nodes_calico" {
      + end_port               = 4789
      + id                     = (known after apply)
      + protocol               = "UDP"
      + security_group         = (known after apply)
      + security_group_id      = (known after apply)
      + start_port             = 4789
      + type                   = "INGRESS"
      + user_security_group    = (known after apply)
      + user_security_group_id = (known after apply)
    }

  # exoscale_security_group_rule.sks_nodes_ccm will be created
  + resource "exoscale_security_group_rule" "sks_nodes_ccm" {
      + cidr                = "0.0.0.0/0"
      + end_port            = 32767
      + id                  = (known after apply)
      + protocol            = "TCP"
      + security_group      = (known after apply)
      + security_group_id   = (known after apply)
      + start_port          = 30000
      + type                = "INGRESS"
      + user_security_group = (known after apply)
    }

  # exoscale_security_group_rule.sks_nodes_logs_rule will be created
  + resource "exoscale_security_group_rule" "sks_nodes_logs_rule" {
      + cidr                = "0.0.0.0/0"
      + end_port            = 10250
      + id                  = (known after apply)
      + protocol            = "TCP"
      + security_group      = (known after apply)
      + security_group_id   = (known after apply)
      + start_port          = 10250
      + type                = "INGRESS"
      + user_security_group = (known after apply)
    }

  # exoscale_sks_cluster.demo will be created
  + resource "exoscale_sks_cluster" "demo" {
      + addons         = (known after apply)
      + auto_upgrade   = true
      + cni            = "calico"
      + created_at     = (known after apply)
      + description    = "Webinar demo cluster"
      + endpoint       = (known after apply)
      + exoscale_ccm   = true
      + id             = (known after apply)
      + labels         = {
          + "project" = "sks-demo"
        }
      + metrics_server = true
      + name           = "demo"
      + nodepools      = (known after apply)
      + service_level  = "pro"
      + state          = (known after apply)
      + version        = "1.22.3"
      + zone           = "de-fra-1"
    }

  # exoscale_sks_nodepool.workers will be created
  + resource "exoscale_sks_nodepool" "workers" {
      + cluster_id           = (known after apply)
      + created_at           = (known after apply)
      + disk_size            = 50
      + id                   = (known after apply)
      + instance_pool_id     = (known after apply)
      + instance_pool_prefix = "pool"
      + instance_type        = "standard.medium"
      + labels               = {
          + "project" = "sks-demo"
        }
      + name                 = "workers"
      + security_group_ids   = (known after apply)
      + size                 = 3
      + state                = (known after apply)
      + template_id          = (known after apply)
      + version              = (known after apply)
      + zone                 = "de-fra-1"
    }

Plan: 6 to add, 0 to change, 0 to destroy.

Changes to Outputs:
  + kubectl_command = (known after apply)

[[ ... ]]

This indeed shows all the resources we wanted to create, so it’s now safe to create the infrastructure:

➜ terraform apply   

...

Changes to Outputs:
  + kubectl_command = (known after apply)

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

[[ ... ]]

exoscale_sks_cluster.demo: Creating...
exoscale_sks_cluster.demo: Still creating... [10s elapsed]
exoscale_sks_cluster.demo: Still creating... [20s elapsed]
exoscale_sks_cluster.demo: Still creating... [30s elapsed]
exoscale_sks_cluster.demo: Still creating... [40s elapsed]
exoscale_sks_cluster.demo: Still creating... [50s elapsed]
exoscale_sks_cluster.demo: Still creating... [1m0s elapsed]
exoscale_sks_cluster.demo: Still creating... [1m10s elapsed]
exoscale_sks_cluster.demo: Still creating... [1m20s elapsed]
exoscale_sks_cluster.demo: Still creating... [1m30s elapsed]
exoscale_sks_cluster.demo: Creation complete after 1m36s [id=5bfe7cab-9318-4512-b104-a244a681c35d]
exoscale_sks_nodepool.workers: Creating...
exoscale_sks_nodepool.workers: Creation complete after 4s [id=b3a5ae8a-9db5-4e36-8905-68c42746b737]

Apply complete! Resources: 6 added, 0 changed, 0 destroyed.

Outputs:

kubectl_command = "exo sks kubeconfig 5bfe7cab-9318-4512-b104-a244a681c35d user -z de-fra-1 --group system:masters"

Now that Terraform has created our cluster, we can copy-paste the output command and run it. The exo CLI will pick up our credentials from the same environment variables we set earlier. We save the output of the command to a kubeconfig.yaml file.

exo sks kubeconfig 5bfe7cab-9318-4512-b104-a244a681c35d user -z de-fra-1 --group system:masters > kubeconfig.yaml

Finally, we’re going to set a KUBECONFIG environment variable that points to the file we’ve just created. kubectl will load its configuration from the file this variable specifies, so we can freely move between folders and still be connected to our cluster.

➜ export KUBECONFIG="$(pwd)/kubeconfig.yaml"

And to wrap things up, we will query our newly created cluster for its nodes:

➜ kubectl get nodes
NAME               STATUS   ROLES    AGE     VERSION
pool-4bb74-dtxlb   Ready    <none>   2m53s   v1.22.3
pool-4bb74-mtahy   Ready    <none>   2m45s   v1.22.3
pool-4bb74-ryuwa   Ready    <none>   2m37s   v1.22.3

We now have a Kubernetes cluster with three nodes ready to run our software.