Spinning-up your own auto-scalable Selenium Grid in Kubernetes: Part 2

AMIT RAWAT
Apr 19, 2020

In Part 1 of this blog series, we learned how to spin up a Selenium Grid from scratch inside a Kubernetes cluster. In this second part, we will focus on the auto-scalability aspect of this Selenium Grid cluster.

Auto-scaling is an essential attribute for any application, making it more elastic, resilient and available.

Kubernetes has a built-in auto-scaling feature called the Horizontal Pod Autoscaler (HPA), which automatically scales the number of pods in a replication controller, deployment, replica set or stateful set based on observed CPU utilization.

However, the native HPA does not fit our requirement: it can only scale on the basis of metrics such as CPU, memory or requests per second for each pod, and it is better suited to stateless containers.

Selenium Grid Auto-Scalability Requirement

In the case of Selenium, our auto-scaling rule is not governed by the resource consumption of each browser pod. Instead, we need to monitor the Selenium Hub console to know how many browser requests have been queued, and scale up or down accordingly.

Let’s list the scaling rules for our Grid:

Scaling Up:

  • By default, at least one browser pod should always be running in our cluster. This long-running pod will be restarted once per day.
  • We have to continuously poll the Selenium Grid console for queued requests, and then increment the scale of our selenium-node deployment by the number of queued requests (see the sketch after this list).
  • We have to keep an upper limit on this auto-scaling so that our Selenium Grid does not end up consuming all the resources of our K8s cluster. Once we hit that maximum scale limit, we stop scaling up and let new incoming requests queue.
New Scale = Present Scale + No. of Queued Requests
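For reference, a Selenium Grid 3 hub exposes a JSON status endpoint, so we do not have to scrape the console HTML. A minimal check, assuming the hub service from Part 1 is reachable as selenium-hub on port 4444:

curl -s http://selenium-hub:4444/grid/api/hub

The response looks roughly like this (trimmed); newSessionRequestCount is the number of queued requests, i.e. exactly the increment in the formula above:

{"newSessionRequestCount": 3, "slotCounts": {"free": 0, "total": 1}, ...}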

Scaling Down:

  • Let’s say four pods are running and only one of them is busy with an execution. If we apply a rule similar to the one used for scaling up, we would have to bring down the three free pods.
New Scale = Present Scale - No. of Free Nodes
P.S. This rule will only be applied when all the pods are free.
  • Here is a limitation of Kubernetes (an open bug): when we scale down, k8s may terminate any pods at random, and we cannot specify which particular pods should be brought down. This is critical for our requirement, as it could kill a running execution.
  • So here is the caveat: to scale down, we have to wait for all executions to finish, and only then bring the pod scale back to one.

Building our own Custom Auto-Scaler

Now that we know what needs to be done, let’s start implementing it. First, we need a way to change the scale of a K8s deployment programmatically. Kubernetes exposes a REST API for administrative tasks on the cluster, such as changing a deployment’s scale.

Let’s dive deeper into this Kubernetes REST API by going through the API documentation.

When you (a human) access the cluster (for example, using kubectl), you are authenticated by the apiserver as a particular User Account. Processes in containers inside pods can also contact the apiserver. When they do, they are authenticated as a particular Service Account (for example, default).

Every namespace has a default service account resource called default. You can list this and any other serviceAccount resources in the namespace with this command:

kubectl get serviceaccounts

The output is similar to this:

NAME      SECRETS   AGE
default   1         1d

But the problem is that this default service account does not have permission to change the scale of deployments, so we need to define a new service account for our Selenium auto-scaler.

Creating a New Service Account:

YAML for our new service account:
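A minimal sketch, assuming the default namespace; the account name matches the kubectl commands used later:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: auto-scale-robot-sa
  namespace: default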

Next, we have to create a Role that grants the permissions required to modify our deployment’s scale.

YAML for Role:
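A sketch of the Role, assuming the minimal set of verbs needed to read deployments and patch their scale subresource (the role name is inferred from the YAML file names used below):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: auto-scale-robot-role
  namespace: default
rules:
- apiGroups: ["apps"]
  resources: ["deployments", "deployments/scale"]
  verbs: ["get", "list", "update", "patch"]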

Now we have to bind our Role to the service account using a RoleBinding object.

YAML for Role-Binding:
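A sketch of the RoleBinding, tying the two objects above together (names assumed, consistent with the YAML file names below):

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: auto-scale-robot-rb
  namespace: default
subjects:
- kind: ServiceAccount
  name: auto-scale-robot-sa
  namespace: default
roleRef:
  kind: Role
  name: auto-scale-robot-role
  apiGroup: rbac.authorization.k8s.io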

Let’s create the three objects defined in the YAML files above.

kubectl create -f auto-scale-robot-sa.yaml && \
kubectl create -f auto-scale-robot-role.yaml && \
kubectl create -f auto-scale-robot-rb.yaml

Now we can see a new service account created in our namespace:
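kubectl get serviceaccounts

The list should now include our new account alongside default, similar to this:

NAME                  SECRETS   AGE
auto-scale-robot-sa   1         1m
default               1         1d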

Generating the API Token for our service account:

Now we need to extract the API token for our service account so that we can authenticate when changing the scale of our selenium-node deployment through the REST API.

First, we look up the secret that holds this API token:

kubectl get serviceaccounts/auto-scale-robot-sa -o yaml
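The returned YAML ends with a secrets section naming the auto-generated token secret; the random suffix will differ in your cluster, but it looks roughly like this:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: auto-scale-robot-sa
  namespace: default
secrets:
- name: auto-scale-robot-sa-token-7z9db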

Now we can extract the actual token from the secret listed in the output above:

kubectl describe secrets/auto-scale-robot-sa-token-7z9db
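Alternatively, the token can be pulled out and decoded in a single step (substitute your own secret name):

kubectl get secret auto-scale-robot-sa-token-7z9db -o jsonpath='{.data.token}' | base64 --decode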

Let’s use this token in a curl request to change the scale of our Chrome node deployment from 1 to 2:

API_TOKEN=<token value extracted above>
curl -X PATCH https://192.168.99.100:8443/apis/apps/v1/namespaces/default/deployments/selenium-node-chrome-deployment/scale \
--header "Authorization: Bearer $API_TOKEN" \
--header 'Accept: application/json' \
--header 'Content-Type: application/strategic-merge-patch+json' \
--insecure \
--data '{"spec": {"replicas": 2}}'

Amazing! Now we can see two nodes on the Grid console, which means our curl command did the magic. We have found a way to change the scale programmatically, on demand.
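We can also confirm the change through the API itself: a GET on the same scale subresource returns the current and desired replica counts.

curl -X GET https://192.168.99.100:8443/apis/apps/v1/namespaces/default/deployments/selenium-node-chrome-deployment/scale \
--header "Authorization: Bearer $API_TOKEN" \
--header 'Accept: application/json' \
--insecure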

Packaging the auto-scaler Application:

Now we have to build a small application, in any language of our choice, that reads the state of the Grid from the console and changes the scale of our node deployment accordingly, using the REST API calls above.

I chose to build this application in Java using Spring Boot and Spring scheduled tasks; it reads the following parameters at run time.
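As a sketch, the runtime configuration boils down to a handful of properties; only the three names below actually appear in this article, and the values shown are examples:

k8s.host=https://192.168.99.100:8443
k8s.token=<token value extracted above>
node.chrome.max.scale.limit=2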

It is also important to containerize the application so that it can run inside the K8s cluster. I have already containerized mine, and the image can be pulled from Docker Hub.

Here is the link for the Application on docker hub and here is the source code.

Here is the YAML to run this auto-scaler spring boot application inside our K8S cluster.
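A rough sketch of that deployment, assuming the token and API server address are injected as environment variables (the container image and the env variable names are placeholders; use the ones from the Docker Hub image and source code linked above):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: selenium-grid-k8s-autoscaler-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: selenium-grid-k8s-autoscaler
  template:
    metadata:
      labels:
        app: selenium-grid-k8s-autoscaler
    spec:
      containers:
      - name: autoscaler
        image: <auto-scaler image from Docker Hub>
        env:
        - name: K8S_HOST     # assumed name; feeds the k8s.host property
          value: "https://192.168.99.100:8443"
        - name: K8S_TOKEN    # assumed name; feeds the k8s.token property
          value: "<service-account token extracted above>"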

Remember to update the values for k8s.host and k8s.token in this YAML under the env section.

Now, let’s try to run our auto-scaler application.

kubectl create -f selenium-grid-k8s-autoscaler-deployment.yaml

We can see that our auto-scaler deployment has been created successfully.

Once the auto-scaler pod is running, we can monitor its logs and watch the automatic scaling of the Selenium Grid happen.

kubectl logs -f selenium-grid-k8s-autoscaler-deployment-7ccc88dcd5-cnhsf

Every 10 seconds, the application polls the Selenium Grid console; if scaling is required, it changes the scale and then waits until the updated scale is reflected on the Grid console.

Now we can change the value of the property “node.chrome.max.scale.limit” from “2” to a larger value, which allows the scale to grow to that level, provided our K8s cluster has sufficient capacity.
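After editing the property value in the YAML, re-apply it so that a fresh auto-scaler pod picks up the change:

kubectl apply -f selenium-grid-k8s-autoscaler-deployment.yaml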

Here is a quick live demo where you can see requests coming in real time and our auto-scaler scaling up the grid automatically.

I have been using this kind of auto-scalable Selenium Grid in an enterprise environment for the last couple of years; I haven’t seen a single outage, and no manual maintenance has been required.

Please feel free to provide feedback in your comments.


AMIT RAWAT

I am a Civil Engineer by qualification, an Engineering Manager by profession and a Developer by passion. (amitrawat.dev)