If you’d like to stay updated, without doing the heavy work, feel free to register for my newsletter. I will email out blog posts of my journey down the wonderful road of BDCs.
So far, Microsoft does not have a simple way to create a Big Data Cluster. It’s a bit cumbersome of a process and the learning curve is a bit steep. However, Microsoft is currently working on making it easier to deploy a Big Data Cluster via Notebook in Azure Data Studio and eventually some type of “deployment wizard.” But for now, the only option is to do it the long way.
This blog post will be the “pre-staging” of the machine needed to create the BDC. This can be a physical or virtual machine. It doesn’t need much RAM or CPU. The sole purpose of this machine is to connect to Azure and run the commands necessary to create and deploy the BDC. There are certain tools and utilities that need to be installed on this machine. You can watch the YouTube video I created for this or continue to read below.
Let’s get started.
I have VMware Fusion running on my Mac Mini (2018). I created a VM with 4 GB RAM. It’s running Windows Server 2016 and I applied all the server updates. By the way, you don’t need Windows Server. You can use Windows 10. I’ve done this exact same thing on my Surface Book 2.
There are a few things in terms of access/credentials that you’ll need first:
- An Azure account – Sign up for a free account (comes with $200 credit)
- Make sure to download and install Azure Data Studio Insider version (download link)
Once you have everything above, then you will need to install the following tools/packages:
- Python version 3.5+
- pip3 Package
- Azure CLI
- kubectl
- azdata
- curl for Windows
To make the package installation process easy, I will install the Chocolatey Windows Package manager. (You can read about Chocolatey here.)
Open Powershell (Run as admin) and run the following commands:
Set-ExecutionPolicy Bypass -Scope Process -Force; iex ((New-Object System.Net.WebClient).DownloadString('https://chocolatey.org/install.ps1')) choco feature enable -n allowGlobalConfirmation
Next, I will use Chocolatey to install the Azure CLI:
choco install azure-cli
Install Python and then git by executing the below command:
choco install python3 choco install git
Next, I will install kubectl:
choco install kubernetes-cli
And finally I will install azdata:
pip3 install -r https://aka.ms/azdata
Make sure to download curl for Windows as you will need it in later on. That’s pretty much it for now. You can find part 2 of my BDC series here. It goes into creating the Kubernetes cluster on Azure Kubernetes Service (AKS). Fun times ahead!
ERROR: Cannot install -r https://aka.ms/azdata (line 18), azdata_cli_notebook and azdata_cli_sql because these package versions have conflicting dependencies.
The conflict is caused by:
azdata-cli-core 20.3.14 depends on prompt-toolkit==3.0.7; python_version >= “3.6”
jupyter-console 6.2.0 depends on prompt-toolkit!=3.0.0, !=3.0.1, =2.0.0
mssql-cli 1.0.0 depends on prompt-toolkit=2.0.0
got error:
ERROR: Command errored out with exit status 1:
command: ‘c:\python38\python.exe’ -c ‘import sys, setuptools, tokenize; sys.argv[0] = ‘”‘”‘C:\\Users\\dnavi\\AppData\\Local\\Temp\\pip-install-kttvl_6r\\pymssql\\setup.py'”‘”‘; __file__='”‘”‘C:\\Users\\dnavi\\AppData\\Local\\Temp\\pip-install-kttvl_6r\\pymssql\\setup.py'”‘”‘;f=getattr(tokenize, ‘”‘”‘open'”‘”‘, open)(__file__);code=f.read().replace(‘”‘”‘\r\n'”‘”‘, ‘”‘”‘\n'”‘”‘);f.close();exec(compile(code, __file__, ‘”‘”‘exec'”‘”‘))’ egg_info –egg-base pip-egg-info
cwd: C:\Users\dnavi\AppData\Local\Temp\pip-install-kttvl_6r\pymssql\
Complete output (7 lines):
c:\python38\lib\site-packages\setuptools\dist.py:45: DistDeprecationWarning: Do not call this function
warnings.warn(“Do not call this function”, DistDeprecationWarning)
Traceback (most recent call last):
File “”, line 1, in
File “C:\Users\dnavi\AppData\Local\Temp\pip-install-kttvl_6r\pymssql\setup.py”, line 88, in
from Cython.Distutils import build_ext as _build_ext
ModuleNotFoundError: No module named ‘Cython’
—————————————-
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
Please watch https://youtu.be/FDHZ3itTfaI as it tells you exactly how to get your base machine setup for BDCs