Why You Should Go with AKS when Deploying a Big Data Cluster

A few months ago I wrote a post on deploying a Big Data Cluster (BDC) using the built-in Azure Data Studio (ADS) notebook. This post covers the deployment options available for Big Data Clusters and the benefits of going with Azure Kubernetes Service (AKS).

BDC Deployment Options

According to the Microsoft documentation, there are three ways to deploy a Big Data Cluster:

  1. Minikube
  2. Kubeadm
  3. AKS

I’ll go into each and list the pros and cons.

Minikube – This method is for a stand-alone, single-node cluster deployment. It is mainly used for “dev/test” environments. I don’t think there are any “pros” worth discussing. The cons are that you will need a minimum of 64 GB of RAM, and at the end of the day you’re stuck with a single-node Kubernetes cluster. Pointless if you ask me.

Kubeadm – This option is for hosting the Big Data Cluster on-premises. You will have to deploy your own Kubernetes cluster with a minimum of 64 GB of RAM on each host/node. The pro to this option is that you are in total control of the underlying Kubernetes cluster. The con is that you are in total control of the Kubernetes cluster. :) You will be in charge of not only the worker nodes but also the master node, so you will need to ramp up on your Kubernetes administration skills. It is doable, just a bit of a steep learning curve for the DBA, or data professional, who’s looking to get their feet wet with BDCs.

Azure Kubernetes Service (AKS) – This option allows you to deploy a Big Data Cluster on a managed Kubernetes cluster in Azure. The pros are many. From a Kubernetes admin perspective, you only have to worry about managing and maintaining the worker nodes. The master node is managed by Azure. You choose the VM size and node count before deploying the AKS cluster, and the rest is taken care of. If you need more RAM, CPU or storage, you can pick a larger VM size or add nodes. The only con that I can think of is cost. If you are a small business or an individual who is looking to learn BDCs, then you will want to be cautious of the costs associated with deploying multiple virtual machines, etc. Azure does offer a free 30-day trial and $200 credit that you can use to learn how to deploy a BDC. I *highly* recommend using that to get your feet wet. That is plenty of credit to learn the deployment process. (I created a 4-part series on deploying a BDC that provides all the links to get started. Check out the series here.)
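
To make the "choose the VM and node count" step and the cost concern a little more concrete, here is a minimal Python sketch. It only builds the `az aks create` argument list (it does not run it) and does a rough estimate of how long a credit might last. The resource group name, cluster name, VM size (`Standard_E8s_v3`, picked here only as an example of a 64 GB size) and the hourly rate are all placeholders I made up for illustration, not recommendations or real prices. Check the BDC documentation for sizing and the Azure pricing page for actual rates.

```python
import shlex


def aks_create_command(resource_group, cluster_name, vm_size, node_count):
    """Build the `az aks create` argument list for a chosen VM size and node count."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [
        "az", "aks", "create",
        "--resource-group", resource_group,
        "--name", cluster_name,
        "--node-vm-size", vm_size,
        "--node-count", str(node_count),
        "--generate-ssh-keys",
    ]


def credit_hours(credit_usd, hourly_rate_per_node_usd, node_count):
    """Rough hours of runtime a credit covers, counting only the worker VMs.

    Assumption: a flat hourly rate per node. Disks, load balancers and
    other resources are ignored, so the real figure will be lower.
    """
    if hourly_rate_per_node_usd <= 0 or node_count < 1:
        raise ValueError("rate must be positive and node_count at least 1")
    return credit_usd / (hourly_rate_per_node_usd * node_count)


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    cmd = aks_create_command("bdc-rg", "bdc-aks", "Standard_E8s_v3", 3)
    print(shlex.join(cmd))

    # 0.50 USD per node-hour is an invented rate, not an Azure price.
    hours = credit_hours(200.0, 0.50, 3)
    print(f"Approximate hours of runtime on the credit: {hours:.1f}")
```

The takeaway is that the cost scales with both the VM size and the node count, so deleting or stopping the cluster when you are done exploring stretches the credit much further.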

Why You Should Go With AKS

In a nutshell, ease and convenience. As a DBA, learning something new can be daunting, especially if that new thing has a steep learning curve like Big Data Clusters. Microsoft has helped ease that “struggle” with AKS. You can use Azure Data Studio’s built-in notebook to deploy a BDC on AKS in no time. You don’t need to become a Kubernetes expert, or even know much about containers, to get your hands dirty with BDCs. I love deploying Big Data Clusters on AKS because it allows me to focus on investigating the technology rather than spending time figuring out how to deploy a Kubernetes cluster, storage, networking, etc. I am *not* saying do not learn Kubernetes.

What You Won’t Have to Worry About

Remember, a BDC is nothing without Kubernetes. What I *am* saying is you can speed up the process of learning BDCs by deploying one on an AKS cluster. That way Microsoft Azure handles the master node, CPU, RAM, storage, networking, etc. while you focus on learning and exploring how BDCs are set up and how they work. For example, deploying a BDC on AKS gives you load balancing through Azure Load Balancer, which provides an external IP and DNS names. Azure will also manage the master Kubernetes node as well as upgrades of the Kubernetes version.
