My Neighbor
I love to share real-life stories when I give talks. I usually start my sessions with the story of how I got interested in Big Data Clusters. It all started last year (2018) with my neighbor Tom (not his real name). I was at the bus stop with my 5-year-old son on his first day of kindergarten. As I was waiting patiently for the bus to arrive, I heard a voice say,
“Are you in IT?”
I ignored it, thinking it wasn’t meant for me, but then I heard the question again. This time I turned around and noticed my neighbor, Tom. He had seen the SQLSaturday shirt I was wearing, and that sparked the question. We talked a little, and I replied,
“Yes, I am. How about you? What do you do?”
He replies, “I work for a pharmaceutical company as a data scientist.”
Immediately I thought, he’s too smart for me! So I started asking questions about his day-to-day activities. (Mind you, as a SQL DBA, I had never worked with, or needed to know about, big data, data scientists, etc.) He told me that he was working on speeding up the first phase of FDA drug approval by using ML (and algorithms). Tom went on to explain what the FDA drug approval phases are. I’m not going to go into details (you can read the link if you want), but the first phase (or the beginning phases) requires studies. A lot of this can be sped up by having ML sift through and analyze huge amounts of data. So instead of the first phase taking (for example) 2-3 years to complete, it can now be done in 8 months (or even less). Around the same time, I started hearing about SQL Server 2019 and a new thing they were calling “Big Data Clusters.”
As I started researching what exactly big data, Big Data Clusters, ML and AI, Hadoop/HDFS, etc. are, I came across some jaw-dropping statistics that I’d like to share with you.
Dr Evil
I came across a study that estimated that 90% of all data today was generated in the last two years. (Like, whoa!) For example, Walmart collects 2.5 petabytes of data from 1 million customers every hour. Then there are the billions of Facebook messages, tweets, YouTube videos, etc.
Here’s a quote from Walmart’s CEO back in 2013:
“We want to know what every product in the world is. We want to know who every person in the world is. And we want to have the ability to connect them together in a transaction.”
Kind of like Dr Evil, eh? (Now, I’m not saying Walmart’s CEO is evil, but that quote is quite eerie.)
If you want to read more chilling facts on how much data Walmart collects on its customers, feel free to read this.
Industry Use Cases for SQL Server Big Data Clusters
Healthcare and retail are not the only industries that provide perfect use cases for Big Data Clusters. Others include finance, manufacturing, agriculture, and the public sector. There are tons of scenarios in which BDCs would be a perfect fit for analyzing an organization’s big data.
For example, take the agriculture industry. Data can now be collected from soil sensors, GPS-equipped tractors, and external sources such as local weather channels. Farmers who implement these precision agriculture technologies are gaining greater visibility into their operations. This in turn gives them a better ability to assess risk and increase yields.
The future of big data is looking brighter than ever. Microsoft is making a huge, and wise, investment with SQL Server Big Data Clusters. A BDC is the “all-in-one” location to ingest, store, prepare, and analyze high-volume, high-value data, and to train models on it, all under one umbrella. It would be wise to get on board sooner rather than later!