How to Deploy a Big Data Cluster to a Multi Node Kubeadm Cluster

In my previous post, I talked about deploying a Big Data Cluster on a single node Kubernetes cluster. That’s cool and all, but what if you’re a business or organization that cannot have your data in the cloud for whatever reason? Is there a way to deploy a Big Data Cluster on-premises? Absolutely! In this blog post, I’ll walk you through deploying a 3-node Kubernetes cluster and then deploying a Big Data Cluster on top of it.

Assumptions

There are a few assumptions before we get started:

  1. You have at least 3 virtual machines that meet the minimum hardware requirements
  2. All your virtual machines are running Ubuntu Server 16.04 or 18.04 and have OpenSSH installed
  3. All the virtual machines have static IPs and are on the same subnet
  4. All the virtual machines are updated and have been rebooted (see below for the command):

Command to update and reboot the servers:

sudo apt update && sudo apt upgrade -y
sudo systemctl reboot

Prepare All Nodes

Now that the above is done, it’s time to start preparing all the nodes (master and worker nodes). I will refer to servers as nodes from this point forward.

Let’s start by connecting to all the nodes via SSH and running the below command on each node to add the node to the /etc/hosts file:

Note: The /etc/hosts file translates hostnames or domain names to IP addresses.
echo $(hostname -i) $(hostname) | sudo tee -a /etc/hosts
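
For example, on a node named kube-master with an IP of 192.168.1.10 (illustrative values; yours will differ), the command appends a line like this to /etc/hosts:

192.168.1.10 kube-master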

1. Disable swap by running the below commands on each of your nodes:

sudo sed -i "/ swap / s/^/#/" /etc/fstab
sudo swapoff -a
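
As an optional sanity check (my own habit, not part of the original steps), confirm swap is fully off before moving on:

free -h   # the Swap line should show 0B across the board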

2. It’s time to import the keys and register the repository for Kubernetes by executing the below command on each of your nodes:

curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
echo 'deb http://apt.kubernetes.io/ kubernetes-xenial main' | sudo tee -a /etc/apt/sources.list.d/kubernetes.list

3. You will need to configure docker and Kubernetes prerequisites on the machine by running the below command on each of your nodes:

KUBE_DPKG_VERSION=1.17.0-00 
sudo apt-get update && \
sudo apt-get install -y ebtables ethtool && \
sudo apt-get install -y docker.io && \
sudo apt-get install -y apt-transport-https && \
sudo apt-get install -y kubelet=$KUBE_DPKG_VERSION kubeadm=$KUBE_DPKG_VERSION kubectl=$KUBE_DPKG_VERSION && \
curl https://raw.githubusercontent.com/kubernetes/helm/master/scripts/get | bash
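
If you want to double-check that the pinned versions landed, and optionally stop apt from upgrading the Kubernetes packages out from under you on a later apt upgrade (an extra precaution I like, not a step from the original script), run:

kubeadm version                              # should report v1.17.0
kubectl version --client                     # should report v1.17.0
sudo apt-mark hold kubelet kubeadm kubectl   # optional: pin the packages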

4. Set net.bridge.bridge-nf-call-iptables=1. (On Ubuntu 18.04, the following commands first enable br_netfilter).

. /etc/os-release
if [ "$VERSION_CODENAME" == "bionic" ]; then sudo modprobe br_netfilter; fi
sudo sysctl net.bridge.bridge-nf-call-iptables=1
Note: You can read more about iptables and Kubernetes here
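
One caveat: the sysctl call above only lasts until the next reboot. If you want the setting (and the br_netfilter module on 18.04) to survive reboots, a common approach is to persist both, assuming the standard sysctl.d and modules-load.d layout:

echo 'br_netfilter' | sudo tee /etc/modules-load.d/br_netfilter.conf
echo 'net.bridge.bridge-nf-call-iptables = 1' | sudo tee /etc/sysctl.d/99-kubernetes.conf
sudo sysctl --system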

You are now done getting all the nodes prepared. Now we will focus on the node that will become the master.

Configure the Kubernetes Master

1.  Create an rbac.yaml file in your current directory with the following command:

cat <<EOF > rbac.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: default-rbac
subjects:
- kind: ServiceAccount
  name: default
  namespace: default
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
EOF
Note: You can read more about using RBAC authorization in Kubernetes here

2. It’s time to initialize the Kubernetes master on this node. The example below specifies Kubernetes version 1.17.0, matching the package version installed earlier; adjust it if you installed a different version:

KUBE_VERSION=1.17.0
sudo kubeadm init --pod-network-cidr=10.244.0.0/16 --kubernetes-version=$KUBE_VERSION

3. Note the kubeadm join command (the last two lines in the screenshot below; your output will be unique). You will need it in upcoming steps when joining worker nodes to the cluster.

[Screenshot: kubeadm init output ending with the kubeadm join command]
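
For reference, the join command follows this general shape. The IP, token, and hash below are placeholders; always use the exact command from your own kubeadm init output:

kubeadm join 192.168.1.10:6443 --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:<hash>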

If for some reason you cannot find it, run the below command on the master node to print it out:

kubeadm token create --print-join-command

4. Set up a Kubernetes configuration file in your home directory by executing the below command on your master node:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
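
As a quick optional check that kubectl can now reach the API server:

kubectl cluster-info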

5. Configure the cluster and the Kubernetes dashboard by running the below command on your master node:

kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
helm init
kubectl apply -f rbac.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v1.10.1/src/deploy/recommended/kubernetes-dashboard.yaml
kubectl create clusterrolebinding kubernetes-dashboard --clusterrole=cluster-admin --serviceaccount=kube-system:kubernetes-dashboard
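
To actually open the dashboard, the usual route for the v1.10.x dashboard is kubectl proxy plus the proxied service URL in the kube-system namespace (double-check against the dashboard docs if the URL 404s for you):

kubectl proxy
# then browse to:
# http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/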

You can check the master status by running kubectl get nodes (as shown below):

kubectl get nodes

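Right after kubeadm init, only the master is listed, and it should report Ready once the flannel pod network is up. The output will look roughly like this (the node name and age are illustrative):

NAME         STATUS   ROLES    AGE   VERSION
kubemaster   Ready    master   2m    v1.17.0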

Configure the Worker Nodes

Now that the master node is configured and the status is ready, it’s time to join the worker nodes to the Kubernetes cluster and configure them.

1. Run the kubeadm join command from your kubeadm init output (with sudo) on each of your worker nodes. Wait until you see the following output and are returned to the prompt:

[Screenshot: kubeadm join output confirming the node joined the cluster]

Now if you go back to the master node and execute the kubectl get nodes command, you will see all the nodes in the cluster (as shown in the screenshot below):

[Screenshot: kubectl get nodes output listing the master and all worker nodes]

2. Run the below command on each of the worker nodes to download the script that creates the local storage volumes:

wget https://raw.githubusercontent.com/microsoft/sql-server-samples/master/samples/features/sql-big-data-cluster/deployment/kubeadm/ubuntu/setup-volumes-agent.sh

3. Run the below command to make the script executable:

chmod +x setup-volumes-agent.sh

4. Run the script by executing the below command:

sudo ./setup-volumes-agent.sh

The script creates 25 volumes in the /mnt/local-storage/ folder. (If you do an ls /mnt/local-storage/ you will see all the folders.)
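
A quick way to confirm the script did its job is to count the folders it created:

ls /mnt/local-storage/ | wc -l   # should print 25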

Back to the Master

1. Download the following file on the master node:

wget https://raw.githubusercontent.com/microsoft/sql-server-samples/master/samples/features/sql-big-data-cluster/deployment/kubeadm/ubuntu/local-storage-provisioner.yaml

2. Run the below command on the master node to apply it:

kubectl apply -f local-storage-provisioner.yaml
Note: You can read more about the local storage class in Kubernetes here
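
Once the provisioner is applied, it’s worth confirming that the local-storage storage class exists, since you will type that exact class name into the deployment prompts later:

kubectl get storageclass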

3. It’s time to install azdata on the master node. First you need to get the packages required for the installation by running the commands below:

sudo apt-get update
sudo apt-get install gnupg ca-certificates curl wget software-properties-common apt-transport-https lsb-release -y

4. Download and install the signing key on the master node:

curl -sL https://packages.microsoft.com/keys/microsoft.asc |
gpg --dearmor |
sudo tee /etc/apt/trusted.gpg.d/microsoft.asc.gpg > /dev/null

5. Add the azdata repository information by running the below command against the master node:

Ubuntu 16.04, run:

sudo add-apt-repository "$(wget -qO- https://packages.microsoft.com/config/ubuntu/16.04/mssql-server-2019.list)"

Ubuntu 18.04, run:

sudo add-apt-repository "$(wget -qO- https://packages.microsoft.com/config/ubuntu/18.04/mssql-server-2019.list)"

6. Update repository information and install azdata by running the below commands against the master node:

sudo apt-get update
sudo apt-get install -y azdata-cli

7. Verify the azdata install by running the below command against the master node:

azdata --version

You should see the installed version of azdata printed out. To get the latest news about azdata, go here.

Deploy the Big Data Cluster

We are almost across the finish line, trust me! Now that we have the Kubernetes cluster working and all the worker nodes joined, it’s time to kick off the Big Data Cluster deployment.

1. SSH onto the master node and execute the below azdata command:

azdata bdc create

2. Accept the terms by pressing y and then return.

3. For the deployment type, choose option 3 (“kubeadm-dev-test”) and press return.

4. Enter an azdata username and password (choose whatever you want).

5. Type “local-storage” at both Kubernetes storage class prompts.

You should see the “Starting cluster deployment” message, as shown in the screenshot below:

[Screenshot: azdata output showing “Starting cluster deployment”]

After 20-30 minutes, you should see output similar to the screenshot below:

[Screenshot: azdata output confirming the Big Data Cluster deployed successfully]

Finally, run the below command to get a list of all the endpoints (see screenshot below):

Note: You will first have to log in by running azdata login, giving it the namespace (mssql-cluster) and then your username and password. That logs you into azdata and sets mssql-cluster as the default namespace. Then you can run the command below.
azdata bdc endpoint list -o table

[Screenshot: azdata bdc endpoint list output showing all BDC endpoints]
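
Side note: in the azdata versions I used, you can also pass the namespace straight on the command line instead of typing it at the login prompt (check azdata login --help if your version differs):

azdata login -ns mssql-cluster
azdata bdc endpoint list -o table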

Now you can use the SQL Server Master Instance Front-End endpoint to connect via Azure Data Studio or SSMS. If you have any questions or encounter any issues, feel free to contact me so I can help you out.
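
If you want to test the connection from a terminal before firing up Azure Data Studio or SSMS, sqlcmd works against the same endpoint. Everything in angle brackets is a placeholder: substitute the IP and port from the SQL Server Master Instance Front-End row of the endpoint list, plus the azdata credentials you chose during deployment:

sqlcmd -S <endpoint-ip>,<port> -U <azdata-username> -P '<password>' -Q 'SELECT @@VERSION'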

Additional Resources

If you are eager to learn more about Kubernetes I highly recommend Anthony Nocentino’s courses on Pluralsight. You can view his Pluralsight author page here. What? You don’t have a Pluralsight subscription? No problem! You can also reach out to him on Twitter @nocentino for a FREE 30-day trial code. Tell him “Mo” sent you.

Enjoy!
