Does Next.js Rendering Host performance depend on the available CPU count of the Kubernetes pod, and how do we define an HPA rule?

Overview

Next.js is a modern, widely adopted, Jamstack-friendly React framework and is part of Sitecore’s headless development suite. It offers various benefits such as improved performance, enhanced SEO capabilities, and component-level data fetching, as well as rendering features like SSG, SSR, and ISR.

Implementing the Sitecore Next.js Rendering Host on Kubernetes entails deploying and managing the infrastructure through the k8s cluster. By utilizing container orchestration features, we can effectively oversee resource usage and scale the environment as required. In this blog, we will explore how to tune the CPU limits of the Rendering Host pod in order to harness the full potential of Next.js, and how this tuning improves throughput and response time. Finally, we will look at how a Horizontal Pod Autoscaler rule can be defined for better auto scaling during peak load on the website.

Rendering Host pod configuration and its performance with No CPU Limit

In short, the Rendering Host I’m using is hosted in AKS; specifically, the cluster falls under the Standard tier. It offers 3 node pools to accommodate our Authoring, Delivery, and various Jobs instances. Each node within the node pool has a capacity of 8 CPUs.

Initially, no CPU limits are set for the Rendering Host pods. As a result, they are free to utilize the complete available CPU capacity of the node as required. The pod is also scaled to 2 replicas, as you can observe in its specification below.
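For reference, a minimal sketch of what such a Deployment might look like — the resource names and container image here are illustrative placeholders, not the actual manifest from my cluster:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rendering-host            # illustrative name
spec:
  replicas: 2                     # scaled to 2 replicas
  selector:
    matchLabels:
      app: rendering-host
  template:
    metadata:
      labels:
        app: rendering-host
    spec:
      containers:
        - name: rendering-host
          image: myregistry.azurecr.io/rendering-host:latest  # illustrative image
          ports:
            - containerPort: 3000
          # Note: no resources.limits.cpu is set, so each pod may
          # consume as much CPU as the node makes available.
```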

Let us check the BlazeMeter report to see how these pods respond to a load test with 1000 virtual users.

From the BlazeMeter graphs above, it’s evident that the Average Throughput and Average Response Time are not at the anticipated levels. Additionally, we can see that the response time increases gradually as the load increases, and then remains elevated.

📍 Notes

The following factors that enhance Next.js performance have already been reviewed and validated for correctness:

  • Implementing proper caching on Content Delivery servers
  • ISR/SSR caching and the ISR revalidation interval are configured properly
  • Enhancing data fetching strategies to minimize unnecessary requests to content delivery servers.
  • Optimizing the Layout Response to exclude unnecessary data and retain only the essential elements required for proper component functionality

If we take a look at the Azure Insights for the AKS cluster, the CPU and memory usage of the Rendering Host pods is under control: the pods consume only around 45% of the node’s maximum CPU capacity and do not push beyond that.

CPU usage of the Node
Memory usage of the Node

Furthermore, I have not seen any issues with other metrics, such as the total number of failed requests or failed connections to the web database.

Next.js performance with a specific CPU limit

The outcomes were nearly identical when I executed various BlazeMeter load tests with specific CPU and memory limits assigned, while keeping the Rendering Host at 2 pods.

For example, the load tests below were conducted on Rendering Host pods with a replica count of 2, each configured with a CPU limit of 2000 millicores, equivalent to 2 CPUs each.
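A hedged sketch of the `resources` section for this configuration — the limit values come from the text above, while the request and memory values are illustrative assumptions:

```yaml
# Excerpt from the Rendering Host container spec (sketch).
resources:
  requests:
    cpu: "1000m"      # illustrative request value
    memory: "1Gi"     # illustrative request value
  limits:
    cpu: "2000m"      # 2000 millicores = 2 CPUs, as described above
    memory: "2Gi"     # illustrative memory limit
```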

Multiple BlazeMeter tests have been run with this configuration. There are some fluctuations, but no significant difference in terms of Average Throughput and Response Time. The other Azure metrics also remain the same as previously observed.

This looks odd! With plenty of available CPU capacity, we would have expected different results in the BlazeMeter load tests.

Let us look at this in detail.

Understanding the Next.js Workload – Under the hood, it’s Node.js

Up to this point, we have assumed that giving the Rendering Host container a generous CPU limit (or no limit at all) would make it outperform. But there is more happening under the hood.

Next.js is built on top of Node.js and relies on it as its underlying runtime environment to handle requests, execute server-side JavaScript code, perform SSR, and carry out various other server-side tasks.

And Node.js is single-threaded by design: it uses a single main thread to perform all of the aforementioned tasks. Despite this, its event-driven, non-blocking architecture lets it handle many concurrent operations efficiently on that one thread. What a single Node.js process cannot do, however, is take full advantage of a multi-core system.

A single thread cannot run on multiple CPUs simultaneously. This explains why the Next.js workload performed poorly even with no CPU limits set on our Rendering Host pod: no matter how many CPUs we throw at it, the process is effectively limited to roughly 1 CPU. Hence there was no improvement in the BlazeMeter results.

What is the best approach to running a Node.js workload?

A more effective approach to running Node.js workloads is to allocate each instance roughly 1 CPU and then scale horizontally as needed.

Let us apply this change to our Rendering Host yaml definition.

Scaling the Rendering Host Horizontally

As depicted in the image below, our Rendering Host container is now scaled horizontally to 6 pods, each set with a CPU limit of 1000 millicores (1 CPU).
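The relevant change to the Deployment can be sketched as follows — only the replica count and CPU limit come from the text; the container name is an illustrative placeholder:

```yaml
# Sketch of the changed fields in the Rendering Host Deployment.
spec:
  replicas: 6                  # scaled out horizontally to 6 pods
  template:
    spec:
      containers:
        - name: rendering-host # illustrative name
          resources:
            limits:
              cpu: "1000m"     # 1 CPU per pod, matching Node.js's
                               # single-threaded execution model
```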

With these changes to our Rendering Host pods, I ran the same load test with 1000 VUs from BlazeMeter, and the results are below.

Hooray 🎉✨ !!! Great results in both throughput and response time!

The table below provides a concise comparison between running Rendering Host Pods without any CPU limits versus horizontally scaling the Rendering Host to 6 pods, each with 1 CPU core.

                      2 RH: No CPU Limit    6 RH: 1 CPU Core
Avg. Response Time    ~ 4 seconds           ~ 96 milliseconds
Avg. Throughput       ~ 279 hits/sec        ~ 450 hits/sec

Comparison

Verdict

  • Based on the various BlazeMeter analyses above, coupled with the inherently single-threaded nature of Node.js, it is evident that the more effective way to run a Next.js Rendering Host workload is to provision only a single CPU per pod and then scale horizontally based on the load
  • Running multiple instances of the Rendering Host application on separate nodes can lead to better performance, as each instance can process requests independently

Defining Horizontal Pod Autoscaler rule

The Horizontal Pod Autoscaler feature in Kubernetes can be employed to adjust the Rendering Host pod count based on the website load. Based on the various BlazeMeter tests conducted so far, I have defined a rule to increase the pod count to a maximum of 12 when the average CPU utilization surpasses 80% of the pods’ requested CPU. Conversely, when CPU utilization drops back below that threshold, Kubernetes automatically decreases the pod count toward the defined minimum. The YAML definition below depicts it.
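A minimal sketch of such an HPA rule, assuming the Rendering Host runs as a Deployment — the min replica count and resource names here are illustrative assumptions, while the maximum of 12 pods and the 80% CPU target come from the text:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: rendering-host-hpa       # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: rendering-host         # illustrative target Deployment
  minReplicas: 6                 # illustrative minimum
  maxReplicas: 12                # scale up to 12 pods under load
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80 # % of the pods' requested CPU
```

Note that `averageUtilization` is measured against the CPU the pods request, so the Rendering Host containers must declare `resources.requests.cpu` for this rule to take effect.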

📍Notes

  • Ensure that your Kubernetes cluster’s tier configuration provides sufficient space for accommodating additional pods, as doing so may result in extra infrastructure costs
  • Check your Kubernetes cluster’s Node pool and node configurations to define your scaling strategy
  • Remember that improved website performance doesn’t rely solely on distributing the load across horizontally scaled Next.js Rendering Host instances. It also involves other aspects, such as increasing memory, optimizing network bandwidth, refining CD server configurations, and improving data fetching strategies
  • Offload assets such as images, scripts, and fonts to a CDN as much as possible
  • Make sure to apply proper cache headers
  • Perform load testing with different CPU limit of your Rendering Host pod to determine the initial replica count. And employ a Horizontal Pod Autoscaler rule to scale the container based on the website load
