In my previous post, I talked about deploying a Big Data Cluster on a single-node Kubernetes cluster. That’s cool and all, but what if you’re a business or organization that cannot have your data in the cloud for whatever reason? Is there a way to deploy a Big Data Cluster on-premises? Absolutely! In this blog post, I’ll walk you through deploying a 3-node Kubernetes cluster, then deploying a Big Data Cluster on top of it.
There are a few assumptions before we get started:
- You have at least 3 virtual machines running with the minimum hardware requirements
- All your virtual machines are running Ubuntu Server 16.04, or 18.04, and have OpenSSH installed
- All the virtual machines have static IPs and are on the same subnet
- All the virtual machines are updated and have been rebooted
Commands to update and reboot the servers:
sudo apt update && sudo apt upgrade -y
sudo systemctl reboot
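If you want to double-check the Ubuntu version assumption before going further, here is a minimal sketch. The function name is_supported_ubuntu is my own and not part of any tool; it only reads the standard os-release file and compares the values against the two releases listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Ubuntu or Kubernetes tooling).
# Succeeds only if the given os-release file describes Ubuntu 16.04 or 18.04.
# The path is a parameter so the check can be tried on a copy of the file;
# it defaults to /etc/os-release.
is_supported_ubuntu() {
  local file="${1:-/etc/os-release}"
  local id version
  # Source the file in a subshell so its variables do not leak into the caller.
  id="$( unset ID VERSION_ID; . "$file" && printf '%s' "${ID:-}" )" || return 1
  version="$( unset ID VERSION_ID; . "$file" && printf '%s' "${VERSION_ID:-}" )" || return 1
  [ "$id" = "ubuntu" ] && { [ "$version" = "16.04" ] || [ "$version" = "18.04" ]; }
}
```

Calling is_supported_ubuntu with no argument checks the machine you are logged in to, so you can run it on each virtual machine and look at the exit status.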
Prepare All Nodes
Now that the above is done, it’s time to start preparing all the nodes (master and worker nodes). I will refer to servers as nodes from this point forward.
Let’s start by connecting to each node via SSH and running the below command, which adds the node’s IP address and hostname to its /etc/hosts file:
echo $(hostname -i) $(hostname) | sudo tee -a /etc/hosts
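One thing to keep in mind is that the command above appends a line every time it runs, so running it twice leaves a duplicate entry. A small sketch of a repeat-safe variant follows. The function name missing_host_entry is hypothetical; it prints the entry only when that exact line is not already in the file, and the output can then be piped to sudo tee -a as before.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "IP HOSTNAME" only if that exact line is not
# already present in the given hosts file. Prints nothing when it is there.
missing_host_entry() {
  local file="$1" ip="$2" name="$3"
  local entry="$ip $name"
  # -x matches the whole line, -F treats the entry as a fixed string.
  grep -qxF -- "$entry" "$file" 2>/dev/null || printf '%s\n' "$entry"
}

# Example use on a node (commented out so that loading this file changes nothing):
# missing_host_entry /etc/hosts "$(hostname -i)" "$(hostname)" | sudo tee -a /etc/hosts
```

This assumes hostname -i returns a single address, which should hold for the static-IP setup described above but is worth confirming on your machines.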
1. Disable swapping by running the below command on each of your nodes:
sudo sed -i "/ swap / s/^/#/" /etc/fstab
sudo swapoff -a
2. It’s time to import the keys and register the repository for Kubernetes by executing the below command on each of your nodes:
sudo curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
echo 'deb http://apt.kubernetes.io/ kubernetes-xenial main' | sudo tee -a /etc/apt/sources.list.d/kubernetes.list
3. You will need to configure docker and Kubernetes prerequisites on the machine by running the below command on each of your nodes:
KUBE_DPKG_VERSION=1.17.0-00
sudo apt-get update && \
sudo apt-get install -y ebtables ethtool && \
sudo apt-get install -y docker.io && \
sudo apt-get install -y apt-transport-https && \
sudo apt-get install -y kubelet=$KUBE_DPKG_VERSION kubeadm=$KUBE_DPKG_VERSION kubectl=$KUBE_DPKG_VERSION && \
curl https://raw.githubusercontent.com/kubernetes/helm/master/scripts/get | bash
4. Set net.bridge.bridge-nf-call-iptables=1. (On Ubuntu 18.04, the following commands first enable br_netfilter).
. /etc/os-release
if [ "$VERSION_CODENAME" == "bionic" ]; then sudo modprobe br_netfilter; fi
sudo sysctl net.bridge.bridge-nf-call-iptables=1
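If you would like to confirm that steps 1 and 4 actually took effect on a node, here is a minimal sketch. Both function names are my own invention. They read /proc/swaps and the bridge setting under /proc/sys by default, and accept another path so you can try them on sample files.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking a prepared node.

# Succeeds when the swaps table lists no active swap device.
# The first line of /proc/swaps is a header, so only later lines count.
swap_is_off() {
  local swaps="${1:-/proc/swaps}"
  [ "$(tail -n +2 "$swaps" | grep -c .)" -eq 0 ]
}

# Succeeds when net.bridge.bridge-nf-call-iptables reads 1.
bridge_nf_enabled() {
  local path="${1:-/proc/sys/net/bridge/bridge-nf-call-iptables}"
  [ -r "$path" ] && [ "$(cat "$path")" = "1" ]
}
```

On a node you could run something like swap_is_off && bridge_nf_enabled && echo "node looks ready" before moving on.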
You are now done getting all the nodes prepared. Now we will focus on the node that will become the master.
Configure the Kubernetes Master
1. Create an rbac.yaml file in your current directory with the following command:
cat <<EOF > rbac.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: default-rbac
subjects:
- kind: ServiceAccount
  name: default
  namespace: default
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
EOF
2. It’s time to initialize the Kubernetes master on this node. The example script below specifies Kubernetes version 1.17.0. The version you use depends on your Kubernetes cluster:
KUBE_VERSION=1.17.0
sudo kubeadm init --pod-network-cidr=10.244.0.0/16 --kubernetes-version=$KUBE_VERSION
3. Note the kubeadm join command (the last two lines in the screenshot below; your output will be unique). You will need it in upcoming steps when joining the worker nodes to the cluster.
If for some reason you cannot find it, run the below command on the master node to print it out:
kubeadm token create --print-join-command
4. Set up a Kubernetes configuration file in your home directory by executing the below command on your master node:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
5. Configure the cluster and the Kubernetes dashboard by running the below command on your master node:
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
helm init
kubectl apply -f rbac.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v1.10.1/src/deploy/recommended/kubernetes-dashboard.yaml
kubectl create clusterrolebinding kubernetes-dashboard --clusterrole=cluster-admin --serviceaccount=kube-system:kubernetes-dashboard
You can check the master’s status by running the kubectl get nodes command on the master node (as shown below):
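If you prefer not to eyeball the output, here is a small sketch that filters it for you. The function name not_ready_nodes is hypothetical. It assumes the usual column layout of kubectl get nodes, where the first column is the node name and the second is the status.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl get nodes --no-headers` output on stdin
# and print the names of nodes whose STATUS column is not exactly "Ready".
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}

# Example use on the master node (commented out so loading this file runs nothing):
# kubectl get nodes --no-headers | not_ready_nodes
```

An empty result means every listed node reports Ready. It can take a little while after the pod network is applied for the status to change, so give it a minute before worrying.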
Configure the Worker Nodes
Now that the master node is configured and the status is ready, it’s time to join the worker nodes to the Kubernetes cluster and configure them.
1. Execute the kubeadm join command you noted earlier on each of your worker nodes. Wait until you see the following output (and are returned to the prompt):
Now if you go back to the master node and execute the kubectl get nodes command, you will see all the nodes in the cluster (as shown in the screenshot below):
2. Run the below command on each of the worker nodes to download the script that creates the local storage volumes:
wget -O setup-volumes-agent.sh https://raw.githubusercontent.com/microsoft/sql-server-samples/master/samples/features/sql-big-data-cluster/deployment/kubeadm/ubuntu/setup-volumes-agent.sh
3. Run the below command to grant the .sh file execution privilege:
chmod +x setup-volumes-agent.sh
4. Run the script by executing the below command:
sudo ./setup-volumes-agent.sh
This creates 25 volumes in the /mnt/local-storage/ folder. (If you do an ls /mnt/local-storage/ you will see all the folders.)
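To check the result on each worker without counting folders by hand, here is a minimal sketch. The function name count_volumes is my own. It simply counts the directories directly under the given path, which defaults to /mnt/local-storage.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print how many directories sit directly under the
# given path (default /mnt/local-storage).
count_volumes() {
  local dir="${1:-/mnt/local-storage}"
  # -mindepth 1 skips the directory itself, -maxdepth 1 skips anything nested.
  find "$dir" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' '
}
```

Running count_volumes on a worker should print 25 if the script behaved as described above. The exact number depends on the version of the script you downloaded, so treat a different count as something to look into rather than a definite failure.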
Back to the Master
1. Download the following file on master node:
wget -O local-storage-provisioner.yaml https://raw.githubusercontent.com/microsoft/sql-server-samples/master/samples/features/sql-big-data-cluster/deployment/kubeadm/ubuntu/local-storage-provisioner.yaml
2. Run the below command on the master node to apply it:
kubectl apply -f local-storage-provisioner.yaml
3. It’s time to install azdata on the master node. First you need to get the packages needed for the installation by running the command below:
sudo apt-get update
sudo apt-get install gnupg ca-certificates curl wget software-properties-common apt-transport-https lsb-release -y
4. Download and install the signing key on the master node:
curl -sL https://packages.microsoft.com/keys/microsoft.asc | gpg --dearmor | sudo tee /etc/apt/trusted.gpg.d/microsoft.asc.gpg > /dev/null
5. Add the azdata repository information by running the below command against the master node:
Ubuntu 16.04, run:
sudo add-apt-repository "$(wget -qO- https://packages.microsoft.com/config/ubuntu/16.04/mssql-server-2019.list)"
Ubuntu 18.04, run:
sudo add-apt-repository "$(wget -qO- https://packages.microsoft.com/config/ubuntu/18.04/mssql-server-2019.list)"
6. Update repository information and install azdata by running the below commands against the master node:
sudo apt-get update
sudo apt-get install -y azdata-cli
7. Verify the azdata install by running the below command against the master node:
azdata --version
You should see the latest version of azdata printed out. To get the latest about azdata go here.
Deploy the Big Data Cluster
We are almost across the finish line, trust me! Now that we have the Kubernetes cluster working and all the worker nodes joined, it’s time to kick off the Big Data Cluster deployment.
1. SSH onto the master node and execute the below azdata command:
azdata bdc create
2. Accept the terms by pressing y then pressing return.
3. For the deployment type, choose option 3 (“kubeadm-dev-test”) and press return.
4. Enter an azdata username and password (choose whatever you want).
5. Type “local-storage” for the Kubernetes storage class prompts (twice).
You should see the “Starting cluster deployment” message as shown in the screenshot below:
After 20-30 minutes you should see a print out similar to the below screenshot:
Finally, run the below command to get a list of all the endpoints (see screenshot below):
azdata bdc endpoint list -o table
Now you can use the SQL Server Master Instance Front-End endpoint to connect via Azure Data Studio or SSMS. If you have any questions or encounter any issues, feel free to contact me so I can help you out.
If you are eager to learn more about Kubernetes I highly recommend Anthony Nocentino’s courses on Pluralsight. You can view his Pluralsight author page here. What? You don’t have a Pluralsight subscription? No problem! You can also reach out to him on Twitter @nocentino for a FREE 30-day trial code. Tell him “Mo” sent you.