Self-hosted Gitlab Runners on AKS, with Managed Identities

This post is long overdue; I've been meaning to write it for a long time but never got around to it. So here goes!

The code used in this post can be found at https://gitlab.com/ascodenl/gitlab-runner-aks

The great thing about using Managed Identities in Azure is that they cannot be (ab)used elsewhere, like Service Principals can. Yes, you can apply Conditional Access policies to workload identities nowadays, but SPs are basically just another username/password combination that can be abused for purposes other than the ones they were meant for. Managed Identities are a special type of Service Principal (an Entra ID application registration under the hood) that can only be used when attached to an Azure resource. When connected to a Kubernetes service account, the Managed Identity's permissions can be assumed by any pod that has that service account attached. This allows fine-grained permissions for specific purposes, for specific Gitlab runners. For example, a runner that is purpose-built for building container images and pushing them to Azure Container Registry, using a Managed Identity that only has the AcrPush role on the ACR(s) it needs to push to.
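As a quick illustration of that last example, creating such a narrowly scoped identity with the Azure CLI could look something like this (the identity, resource group, and registry names are hypothetical):

# Create a user-assigned managed identity for the image-building runner
az identity create \
  --name mi-gitlab-runner-acr \
  --resource-group rg-gitlab-runners

# Grant it only the AcrPush role, scoped to a single registry
az role assignment create \
  --role AcrPush \
  --assignee "$(az identity show --name mi-gitlab-runner-acr \
      --resource-group rg-gitlab-runners --query principalId -o tsv)" \
  --scope "$(az acr show --name myregistry --query id -o tsv)"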

The case for self-hosted Gitlab Runners

"Why not use the Gitlab provided runners"? I hear that quite often. In summary, self-hosted GitLab runners are ideal when you need maximum control, security, and flexibility for your CI/CD jobs, want to optimize costs at scale, or have specific compliance and infrastructure requirements that (what Gitlab calls) Instance runners cannot meet. Yes, it requires maintenance but it outweighs the added benefits - especially in environments with heavy compliance requirements like the Financial industry.

In slightly more detail, the main reasons are:

1. Full Control Over the Build Environment

2. Security and Compliance

3. Cost Efficiency and Scalability

4. Performance and Flexibility

5. Advanced Customization

Types of runners

Gitlab has 3 types of runners, scoped by where they can pick up jobs:

1. Instance runners, available to every project on a Gitlab instance (these are the shared runners that Gitlab.com provides)

2. Group runners, available to all projects and subgroups in a group

3. Project runners, associated with one or more specific projects

Types of runtime environments

When registering a GitLab runner, you must select an executor, which determines the environment in which your CI/CD jobs will run. Each executor offers different levels of isolation, scalability, and compatibility, making them suitable for various scenarios.

| Executor | Isolation | Typical Use Case | Pros | Cons |
| --- | --- | --- | --- | --- |
| Shell | Low | Simple, local jobs | Easy, minimal setup | Low isolation, less secure |
| Docker | High | Reproducible, isolated builds | Clean, scalable, supports services | Needs Docker, some limits |
| Docker Autoscaler | High | Scalable cloud builds | Auto-scales, cloud support | Complex setup |
| Instance | Very high | Full VM per job, high isolation | Max isolation, flexibility | Resource intensive |
| Kubernetes | High | Cloud-native, Kubernetes environments | Scalable, cloud integration | Needs Kubernetes |
| SSH | Varies | Remote, legacy, or custom environments | Remote execution | Limited support |
| VirtualBox/Parallels | High | VM-based isolation on local hardware | Good isolation | Slower, needs virtualization |
| Custom | Varies | Anything not covered above | Flexible | Requires custom scripts |

Choosing the right executor depends on your project's requirements for isolation, scalability, environment, and available infrastructure.

Our default platform of choice is Kubernetes; this article covers Azure's managed Kubernetes offering, Azure Kubernetes Service (AKS).

Creating infrastructure in Azure (or any environment, for that matter) is done using Infrastructure as Code (IaC). The tool of choice is Terraform or OpenTofu, whichever you prefer. The idea is to let a CI/CD pipeline handle the creation, updating, and destruction of Azure resources, using Gitlab Runners on AKS. We need several resources to make that happen.

Here is a quick overview of what we are building:

[Diagram: Gitlab runners on AKS]

Azure configuration

The runner will use a Kubernetes service account, which is "connected" to an Azure Managed Identity that is assigned roles with permissions to create resources in Azure. You will need an AKS cluster with the OIDC issuer enabled; read here how to enable it if it is not configured yet.
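If the cluster is not configured yet, something along these lines should do it (the cluster and resource group names are placeholders):

# Enable the OIDC issuer and the workload identity webhook on an existing cluster
az aks update \
  --name my-aks-cluster \
  --resource-group rg-aks \
  --enable-oidc-issuer \
  --enable-workload-identity

# Retrieve the issuer URL; we need it later for the federated identity credential
az aks show --name my-aks-cluster --resource-group rg-aks \
  --query oidcIssuerProfile.issuerUrl -o tsv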

The Managed Identity is created as a separate resource, outside of any Terraform module. The main reason is that we use RBAC to assign permissions on Azure resources. If you want to allow a Managed Identity access to Entra ID (primarily to read groups), assigning those permissions in Entra ID requires elevated privileges that we do not want to delegate to a Managed Identity. To prevent (re)creation of the Managed Identity as part of a module, we create it separately.

locals {
  runners = {
    tf = {
...
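The snippet above is abbreviated. The essential pieces are the identity itself and the federated identity credential that ties it to the Kubernetes service account we create later on. A minimal sketch, assuming the OIDC issuer URL and the service account name/namespace come in as variables:

resource "azurerm_user_assigned_identity" "gitlab_runner" {
  for_each            = local.runners
  name                = "mi-gitlab-runner-${each.key}"
  location            = var.location
  resource_group_name = var.resource_group_name
}

# The federated credential lets the projected Kubernetes service account
# token be exchanged for an Entra ID token for this identity
resource "azurerm_federated_identity_credential" "gitlab_runner" {
  for_each            = local.runners
  name                = "gitlab-runner-${each.key}"
  resource_group_name = var.resource_group_name
  parent_id           = azurerm_user_assigned_identity.gitlab_runner[each.key].id
  audience            = ["api://AzureADTokenExchange"]
  issuer              = var.aks_oidc_issuer_url
  subject             = "system:serviceaccount:${var.namespace}:${var.service_account_name}"
}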

Note that we make the MSI specific to deploying Terraform resources (azurerm_user_assigned_identity.gitlab_runner["tf"].principal_id) Owner of the subscriptions. Contributor is not going to be enough, as we also want to use the pipeline to do role assignments and RBAC; therefore it needs Owner permissions.
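A sketch of that role assignment (the subscription id variable is an assumption):

resource "azurerm_role_assignment" "gitlab_runner_tf_owner" {
  # Owner is required because the pipeline also manages role assignments
  scope                = "/subscriptions/${var.subscription_id}"
  role_definition_name = "Owner"
  principal_id         = azurerm_user_assigned_identity.gitlab_runner["tf"].principal_id
}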

⚠️ Warning: This means that this MSI has very powerful privileges! Make sure you lock down your pipelines so that not just anyone can run them and make sure you do proper Merge Requests and code reviews!

Kubernetes resources

Remember, we are using Kubernetes as the Gitlab executor. What we are deploying is best described as a "runner manager", which spins up containers (pods, actually) that run the pipeline jobs. Once a job is finished, its pod is destroyed.

The Gitlab Runner is deployed using Helm. Gitlab maintains a Helm chart that you can find on https://gitlab.com/gitlab-org/charts/gitlab-runner/.

Gitlab Runner configuration (which the chart renders into the runner's config.toml) is done through the Helm values, which we deploy using a template:

gitlabUrl: "${gitlab_url}"
unregisterRunners: true
terminationGracePeriodSeconds: 3600
...

Then we use Terraform to transform the template and deploy the Helm chart:

locals {
  gitlab_runner_vars = {
    gitlab_url                = var.gitlab_url
...
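Abbreviated again; a sketch of how the rendered template feeds into the helm_release resource (the chart version variable and template path are assumptions):

resource "helm_release" "gitlab_runner" {
  name       = "gitlab-runner"
  repository = "https://charts.gitlab.io"
  chart      = "gitlab-runner"
  version    = var.chart_version
  namespace  = kubernetes_namespace_v1.gitlab.metadata[0].name

  # Render the values template with the variables defined above
  values = [templatefile("${path.module}/templates/values.yaml.tpl", local.gitlab_runner_vars)]
}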

To check for the latest version(s) of the chart:

helm repo add gitlab https://charts.gitlab.io
helm repo update
helm search repo -l gitlab/gitlab-runner | head -5

The last part is to create the Kubernetes Service Account that "glues" the Managed Identity to the Gitlab Runner:

resource "kubernetes_service_account_v1" "gitlab_runner" {
  metadata {
    name      = "gitlab-runner-${random_id.this.hex}"
    namespace = kubernetes_namespace_v1.gitlab.metadata[0].name
    annotations = {
      "azure.workload.identity/client-id" = var.msi_client_id
      "azure.workload.identity/tenant-id" = var.tenant_id
    }
    labels = {
      "azure.workload.identity/use" = "true"
    }
  }
}

Finally, some required Kubernetes resources to make this all work. I set rbac.create = false in the Helm values because I like to be in control of what is created (setting it to true means the service account gets auto-created, and you would have to annotate that one with the correct values instead). The namespace comes first; a sketch of the RBAC resources follows after it.

resource "kubernetes_namespace_v1" "gitlab" {
  metadata {
    name = "gitlab-${random_id.this.hex}"
...
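Because rbac.create is false, we also create a Role and RoleBinding ourselves so the runner manager may manage job pods. A minimal sketch; the resources and verbs below are a reasonable baseline for the Kubernetes executor, not a definitive list:

resource "kubernetes_role_v1" "gitlab_runner" {
  metadata {
    name      = "gitlab-runner"
    namespace = kubernetes_namespace_v1.gitlab.metadata[0].name
  }
  # The runner manager creates a pod per job and attaches to it
  rule {
    api_groups = [""]
    resources  = ["pods", "pods/exec", "pods/attach", "secrets", "configmaps"]
    verbs      = ["get", "list", "watch", "create", "patch", "update", "delete"]
  }
}

resource "kubernetes_role_binding_v1" "gitlab_runner" {
  metadata {
    name      = "gitlab-runner"
    namespace = kubernetes_namespace_v1.gitlab.metadata[0].name
  }
  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "Role"
    name      = kubernetes_role_v1.gitlab_runner.metadata[0].name
  }
  subject {
    kind      = "ServiceAccount"
    name      = kubernetes_service_account_v1.gitlab_runner.metadata[0].name
    namespace = kubernetes_namespace_v1.gitlab.metadata[0].name
  }
}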

Runner registration

ℹ️ Note: This article describes the new way of registering runners. See https://docs.gitlab.com/ci/runners/new_creation_workflow/ for how to migrate.

When deploying a runner, it needs to be registered against a group or repository. Each group or repository has its own unique token. You can do this from the CI/CD settings of the repository or group: create a project or group runner, fill in the details, and out comes a runner token. But who wants to do that manually? Let's automate this.

Terraform has a great provider for Gitlab, found at https://registry.terraform.io/providers/gitlabhq/gitlab/latest/docs. You can use it to fully automate your Gitlab environment: repositories, groups, authorizations, integrations, and so on. We'll focus on the gitlab_user_runner resource to create the runner and obtain its token.

Each group or project in Gitlab has a unique id, which is hard to find and even harder to remember. We use the path to look up the id, which is a lot easier to remember. If you use Terraform to also create your groups and projects, you can even reference the Terraform resource directly!

Terraform provider configuration is required for Gitlab and Kubernetes (the runner token is stored in a Kubernetes secret):

provider "kubernetes" {
  config_path = "~/.kube/config" # Need to create this file from the pipeline or run locally
}

provider "gitlab" {
  base_url = "https://gitlab.com/"
  token    = data.azurerm_key_vault_secret.gitlab_token.value # this can be a Group token or a PAT token with the create_runner scope
}

First, we need to determine if we are deploying a group runner or a project runner:

data "gitlab_group" "group" {
  count     = var.runner_type == "group_type" ? 1 : 0
  full_path = var.repo_path
}

data "gitlab_project" "project" {
  count               = var.runner_type == "project_type" ? 1 : 0
  path_with_namespace = var.repo_path
}

repo_path is the path to your repo, for example ascodenl/infra/tools.

Then, depending on what type of runner you want, create a token and store it in a Kubernetes secret. Note the reference to a Kubernetes service account; this will become clear later on.

resource "gitlab_user_runner" "gitlab_runner_project" {
  count       = var.runner_type == "project_type" ? 1 : 0
  runner_type = var.runner_type
...
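The gitlab_user_runner resource exports the authentication token, which we then store in the secret the Helm chart reads. A sketch, assuming a matching gitlab_user_runner.gitlab_runner_group resource for group runners and a runners.secret entry in the Helm values pointing at this secret name; the empty runner-registration-token key is what the chart expects with the new registration workflow:

resource "kubernetes_secret_v1" "gitlab_runner_token" {
  metadata {
    name      = "gitlab-runner-token-${random_id.this.hex}"
    namespace = kubernetes_namespace_v1.gitlab.metadata[0].name
  }
  data = {
    # Must be present but empty when using the new registration workflow
    "runner-registration-token" = ""
    "runner-token" = var.runner_type == "project_type" ? gitlab_user_runner.gitlab_runner_project[0].token : gitlab_user_runner.gitlab_runner_group[0].token
  }
}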

Using this in a Gitlab pipeline

Now that the runner is deployed with the proper permissions, it is time to create a pipeline to implement this in CI/CD.

Creating a full multi-environment pipeline is material for a separate blog post, so here is the most important part:

before_script:
  - |
    if ! [ -x "$(command -v az)" ]; then
      echo -e "\e[31mError: az is not installed.\e[0m"
      exit 1
    else
      echo "Logging in to Azure using client_id $AZURE_CLIENT_ID..."
      # Log in with the federated token the workload identity webhook projects into the pod
      az login --service-principal -u "$AZURE_CLIENT_ID" --tenant "$AZURE_TENANT_ID" --federated-token "$(cat "$AZURE_FEDERATED_TOKEN_FILE")"
      if [[ -n ${ARM_SUBSCRIPTION_NAME} ]]; then az account set -n "${ARM_SUBSCRIPTION_NAME}"; fi
      # Export the ARM_* variables so the Terraform azurerm provider can use OIDC auth as well
      export ARM_OIDC_TOKEN=$(cat "$AZURE_FEDERATED_TOKEN_FILE")
      export ARM_CLIENT_ID=$AZURE_CLIENT_ID
      export ARM_TENANT_ID=$AZURE_TENANT_ID
    fi

If OIDC is working correctly, the Azure token is stored in a file referenced by $AZURE_FEDERATED_TOKEN_FILE, which usually points to /var/run/secrets/azure/tokens/azure-identity-token. Enabling workload identity on AKS deploys a mutating admission webhook that injects a projected service account token with a limited lifetime into the pod; Kubernetes refreshes that token before it expires. If you are interested in how this works under the hood, look here.
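Roughly, this is what the webhook injects into a pod whose service account carries the workload identity label and annotations (simplified; exact fields may vary per version):

containers:
- name: build
  env:
  - name: AZURE_CLIENT_ID            # taken from the service account annotation
    value: "<client-id-of-the-managed-identity>"
  - name: AZURE_TENANT_ID
    value: "<tenant-id>"
  - name: AZURE_FEDERATED_TOKEN_FILE
    value: /var/run/secrets/azure/tokens/azure-identity-token
  volumeMounts:
  - name: azure-identity-token
    mountPath: /var/run/secrets/azure/tokens
    readOnly: true
volumes:
- name: azure-identity-token
  projected:
    sources:
    - serviceAccountToken:
        audience: api://AzureADTokenExchange
        expirationSeconds: 3600
        path: azure-identity-token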

I hope you enjoyed this, thanks for sticking around till the end. Until the next!