We Need More Data Science

Christian Garrett
5 min read · Oct 17, 2020

I have become more and more fascinated with Enterprise AI lately. Some of the work being done in how we manage, interpret, and apply data is incredible. One thing I have noticed is that the once-nascent market of Data Science is exploding. What’s equally astounding is not just the need for more data scientists, but also the development of tools that enable better and more scalable Data Science while letting other business functions do Data Science too. What gets me excited is what happens when data scientists, engineers, business analysts, marketing directors, product leads, and HR leaders are all able to build and deploy AI-based applications.

But first, let’s briefly revisit Artificial Intelligence (AI), which can simply be defined as computer systems with the ability to perform tasks that ordinarily require human intelligence. Within AI, there are subsets of capabilities that are powered by Machine Learning (ML) and Deep Learning (DL). Machine Learning is the ability of computer systems to automatically learn and improve from experience without being explicitly programmed, accessing and parsing data to repeat tasks or notice patterns. Deep Learning is a part of ML that uses neural networks, which are loosely modeled on how our brains learn. This allows machines to solve complex problems even when the data set is very diverse, unstructured, and interconnected. ML generally works best with numerical data, categorical data, time-series data, and text data. Deep Learning is more specialized for images, video, audio, and other more difficult types of data.

Data Science is the practice of analyzing and interpreting complex digital data in order to assist in decision-making. Data Science applies ML and DL to data (numbers, text, images, video, audio, etc.) and produces specialized AI systems to do specific tasks — such as checking for risks in supply chains, or looking for fraud within banking transactions. These AI systems produce enterprise value in automating, optimizing, or producing actionable insight — which impacts earnings.

Now that we have covered data science and the field of AI in general terms, let’s go a bit deeper. First, it’s important to note that being a data scientist requires a diverse array of skills, and there are not enough of these highly skilled individuals. There are predicted to be 2.7 million open jobs in data analysis, data science, and related careers in 2020, with 39% growth in employer demand for both data scientists and data engineers by 2020 (source: IBM). In fact, data scientists have an average earning potential of $8,736 more per year than any other bachelor’s degree job (source: IBM).

Second, it’s important to know that data science sits at the center of a Venn diagram of three skill sets: domain expertise (do you know your industry?), programming skills (are you technically capable?), and mathematical/statistical skills (can you apply the right thinking?).

Third, the skills and tools for data science are rapidly advancing. The old way of making predictions and getting insights followed these steps:

  1. Prepare your dataset from the data source
  2. Import the data
  3. Structure your dataset
  4. Assess and validate the model
  5. Collect new data and retrain the model
  6. Deploy
  7. Monitor and manage
  8. Make predictions and get insights
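
To make those steps concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the data is synthetic, the "model" is a one-variable least-squares line, and the helper names (`load_dataset`, `fit_line`, `predict`, `mean_abs_error`) are hypothetical. Deployment and monitoring (steps 6 and 7) are left out.

```python
import random
import statistics

def load_dataset(n, seed):
    """Steps 1-3: stand-in for preparing, importing, and structuring data (synthetic here)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [3.0 * x + 2.0 + rng.gauss(0, 1) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict(model, x):
    slope, intercept = model
    return slope * x + intercept

def mean_abs_error(model, xs, ys):
    """Average absolute gap between predictions and true values."""
    return statistics.fmean(abs(predict(model, x) - y) for x, y in zip(xs, ys))

xs, ys = load_dataset(n=200, seed=0)
train_x, train_y = xs[:150], ys[:150]
test_x, test_y = xs[150:], ys[150:]

model = fit_line(train_x, train_y)                   # fit (implied before step 4)
error = mean_abs_error(model, test_x, test_y)        # step 4: assess on held-out data

new_x, new_y = load_dataset(n=50, seed=1)            # step 5: collect new data...
model = fit_line(train_x + new_x, train_y + new_y)   # ...and retrain

prediction = predict(model, 4.0)                     # step 8: make a prediction
print(f"held-out error: {error:.2f}, prediction at x=4: {prediction:.2f}")
```

Even in this toy version, every step is a separate manual call, which is where the friction described next comes from.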

Most machine learning algorithms have parameters that must be chosen before training (often called hyperparameters). Empirical strategies can help, but tuning is still complex, and there is generally no deterministic way to find the optimal settings. There is also a risk of error, because creating and maintaining ML/DL models and AI systems involves choices and manual interventions that affect the efficiency of the ML/DL pipeline.
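
One common empirical strategy is a grid search: try a handful of candidate values and keep the one that scores best on held-out data. The sketch below assumes a toy k-nearest-neighbors regressor on synthetic data. The function names and the grid of `k` values are hypothetical choices, not a recommendation.

```python
import math
import random

def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def validation_error(train, valid, k):
    """Mean squared error on the validation set for a given k."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in valid) / len(valid)

rng = random.Random(42)
inputs = [rng.uniform(0, 6) for _ in range(300)]
data = [(x, math.sin(x) + rng.gauss(0, 0.3)) for x in inputs]
train, valid = data[:200], data[200:]

grid = [1, 3, 5, 9, 15, 31, 75]   # candidate values for the parameter k
scores = {k: validation_error(train, valid, k) for k in grid}
best_k = min(scores, key=scores.get)
print(f"best k on this validation split: {best_k}")
```

Note that the answer is only the best value in the grid for this particular split. A different grid or a different split can give a different winner, which is the "no deterministic way" problem in miniature.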

The new way involves MLOps. Machine Learning Operations (MLOps) is an ML engineering culture and practice that aims to unify ML system development (Dev) and ML system operations (Ops). Practicing MLOps means that you advocate for automation and monitoring at all steps of ML system construction, including integration, testing, releasing, deployment, and infrastructure management. Simply put, MLOps is the set of technologies and practices that provide a scalable and governed means to rapidly deploy and manage ML applications in production environments.
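
As one small illustration of automated monitoring, the sketch below flags when live input data drifts away from what the model was trained on. It is an assumption-heavy toy: the function name `drift_detected`, the threshold of three standard errors, and the sample numbers are all hypothetical, and real MLOps platforms do far more than this.

```python
import statistics

def drift_detected(baseline, live, threshold=3.0):
    """Flag drift when the live mean is more than `threshold` standard errors from the baseline mean."""
    baseline_mean = statistics.fmean(baseline)
    baseline_std = statistics.stdev(baseline)
    std_error = baseline_std / len(live) ** 0.5
    return abs(statistics.fmean(live) - baseline_mean) > threshold * std_error

# Feature values seen at training time (made-up numbers).
baseline = [10.0, 11.0, 9.0, 10.5, 9.5, 10.2, 9.8, 10.1]

stable_batch = [10.1, 9.9, 10.3, 9.7]     # looks like the training data
shifted_batch = [14.0, 15.0, 13.5, 14.5]  # clearly different

print(drift_detected(baseline, stable_batch))   # False
print(drift_detected(baseline, shifted_batch))  # True
```

In an automated pipeline, a `True` here could trigger an alert or a retraining job instead of waiting for someone to notice that predictions have degraded.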

MLOps is being enhanced by enterprise tools that automate AI processes into simpler and more efficient steps. This is called AutoML. Automated machine learning (AutoML) is a general discipline that involves automating any part of the process of building AI systems. By working on various stages of the machine learning/deep learning process, engineers develop solutions to expedite, enhance, and automate parts of the AI system pipeline. These tools enable data scientists to do their job better and faster, but they will also allow anyone to do data science work. Business analysts already use these tools, and soon we will see data science tools across all orgs within a business, from HR to Marketing.
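
A rough way to picture the idea behind AutoML is a loop that fits several candidate models and keeps whichever does best on held-out data, so the user never picks a model by hand. The sketch below is a deliberately tiny stand-in, not how any particular AutoML product works. The three candidate models, the synthetic data, and the `auto_select` helper are all hypothetical.

```python
import random
import statistics

def fit_mean(xs, ys):
    """Baseline: always predict the average target."""
    mean_y = statistics.fmean(ys)
    return lambda x: mean_y

def fit_linear(xs, ys):
    """Least-squares straight line."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return lambda x: mean_y + slope * (x - mean_x)

def fit_nearest(xs, ys):
    """Predict the target of the single closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def auto_select(candidates, train, valid):
    """Fit every candidate and return (name, model, mse) with the lowest validation error."""
    results = []
    for name, fit in candidates.items():
        model = fit(*train)
        mse = statistics.fmean((model(x) - y) ** 2 for x, y in zip(*valid))
        results.append((mse, name, model))
    mse, name, model = min(results, key=lambda r: r[0])
    return name, model, mse

rng = random.Random(7)
xs = [rng.uniform(0, 10) for _ in range(600)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.5) for x in xs]
train, valid = (xs[:300], ys[:300]), (xs[300:], ys[300:])

candidates = {"mean": fit_mean, "linear": fit_linear, "nearest": fit_nearest}
best_name, best_model, best_mse = auto_select(candidates, train, valid)
print(f"selected model: {best_name}")
```

The person using this only supplies data and reads off the result, which is what makes this style of tooling approachable for non-specialists.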

So what’s the actual impact we are talking about here? For one, companies that lead in AI adoption are the ones investing more in their future. Look below at which sectors lead in AI adoption. In fact, the projected global spend on AI technologies in 2020 was $125B, and the projected global GDP impact by 2030 is $15.7T (yes, trillion).

In conclusion, we need more data scientists to implement AI systems, and we need to empower them with the best tools. We also need AutoML and data infrastructure and management tools so that all kinds of business functions can do data science and scale AI systems and applications within their organizations. This is still a growing market with immense potential, and we are just beginning to see the breadth of impact it will have.

— Opinions expressed are solely my own and do not express the views or opinions of my employer, 137 Ventures
