2021 Predictions for Data Science Teams and Technologies
2020 was a year of irrational responses and overcorrections. And I’m not just talking about our responses to Covid-19 or movements in the stock market. I am talking about how data science was treated in the modern enterprise.
Data science “took it on the chin” in 2020. Across industries (except for the unexpected Covid winners), my contacts have indicated reduced budgets for technology and people. Data science headcounts and technology spend were down in 2020. As we look to 2021, here is my forecast.
1) Move to a Federated Data Science Organization
Centralized teams have been gutted, and they will not be rehired quickly. My industry contacts say they have been tasked with improving reporting while reducing headcount. This has led to changing priorities and an emphasis on reporting skills over predictive modeling.
Changing budgets and remote work will lead to smaller and more agile teams in the coming year. As headcount returns, look for analysts and data scientists to be hired and managed in a federated environment, closer to the business and less centralized.
2) Data Science cloud budgets under attack
For many data scientists and analysts, 2020 was the year they were finally allowed to work in cloud-based environments. AWS SageMaker and Databricks (primarily on Azure) were introduced to IT and quickly adopted for prototyping and production jobs. But then a dangerous reality set in…
99% of data science is experimentation, and the cloud doesn’t differentiate experimentation from production.
Costs are EXPLODING. Look for IT to rein in these costs by encouraging a hybrid, multi-cloud infrastructure approach that emphasizes experimentation on cheap compute.
3) Model explainability takes a back seat to data explainability
I know it’s early in the lifecycle of ML explainability companies, but I’m still throwing cold water on this. MLOps has been around for at least four years, and the leader in the space (Algorithmia) is only now starting to get real commercial traction. Fiddler.ai, Truera, and features of Dataiku, Domino, DataRobot, H2O.ai, and others all try to explain why models predicted certain outcomes. These products are nice-to-have features, but they are far from essential. Data explainability is essential, and fortunately several companies are starting to generate chatter.
Tellius and Sisu are two of the companies talking about auto-EDA. DataRobot, Alteryx, and Dataiku have some features as well, and there are open-source packages too (SweetViz is one). Good auto-EDA tools use ML concepts to help users explain interesting observations.
There are sooo many companies trying to auto-explain models. But let’s get our data story straight first.
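To make the idea concrete, here is a minimal sketch of what an auto-EDA pass might look like, using only pandas. It flags two kinds of “interesting observations”: columns with heavy missingness and strongly correlated numeric column pairs. The function name, column names, and thresholds are illustrative assumptions, not the API of any of the products mentioned above.

```python
import numpy as np
import pandas as pd

def auto_eda(df: pd.DataFrame, corr_threshold: float = 0.8,
             missing_threshold: float = 0.2) -> list:
    """Flag 'interesting' findings: heavy missingness and
    strongly correlated numeric column pairs."""
    findings = []
    # Columns with a high share of missing values
    for col, frac in df.isna().mean().items():
        if frac > missing_threshold:
            findings.append(f"{col}: {frac:.0%} missing")
    # Strongly correlated numeric pairs (candidate redundant features)
    corr = df.select_dtypes("number").corr().abs()
    cols = corr.columns
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if corr.iloc[i, j] > corr_threshold:
                findings.append(f"{cols[i]} ~ {cols[j]}: corr {corr.iloc[i, j]:.2f}")
    return findings

# Tiny demo dataset
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 6, 8, 10],            # perfectly correlated with x
    "z": [np.nan, np.nan, 5, 1, 3],   # 40% missing
})
print(auto_eda(df))  # → ['z: 40% missing', 'x ~ y: corr 1.00']
```

Commercial tools go much further (targeted drill-downs, ML-driven explanations of segments), but the core move is the same: surface what in the data deserves a human look.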
4) Mainstream IT begins to understand “Flattened data”: Take a hike SQL
SQL is great for querying data from relational database management systems (RDBMS), but SQL is terrible for analytics. Is IT finally figuring this out?
You need “flat data” to build machine learning models. That is, the data must be in their most atomistic form to be of any use. There are three reasons why I see hope for flattened data in 2021.
Feature Store Companies: Companies like Tecton.ai, dotdata, and others help data scientists build “features” for machine learning models. That is, data that are stored at the observation level for use in a model of individual behavior.
Databricks: Databricks is really just hosted Spark, and Spark is a way to keep atomistic data in memory for use in downstream models. The more Databricks usage we see from IT, the more IT comes to understand the value of “flat data”.
Virtualization plays: Data scientists need access to data from transactional systems to build features. Data virtualization tools (Dremio, Denodo, Apache Arrow) provide self-service access to transactional data systems. This is a step forward.
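The flattening step the three points above describe can be sketched in a few lines of pandas: raw transactional rows are aggregated into one row per customer, i.e., observation-level features ready for a model of individual behavior. The table and feature names here are hypothetical; feature-store products like Tecton do this at scale with freshness and serving guarantees on top.

```python
import pandas as pd

# Raw transactional data: one row per event, as it might sit in an RDBMS
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2, 3],
    "amount":      [20.0, 35.0, 5.0, 10.0, 15.0, 50.0],
    "channel":     ["web", "store", "web", "web", "store", "web"],
})

# "Flatten" to one row per customer: observation-level features
# suitable for an individual-behavior model
features = transactions.groupby("customer_id").agg(
    n_orders=("amount", "size"),
    total_spend=("amount", "sum"),
    avg_order=("amount", "mean"),
    web_share=("channel", lambda s: (s == "web").mean()),
).reset_index()

print(features)
```

A SQL `GROUP BY` can produce the same table, which is exactly the point: the hard part is not the query but getting IT to treat this flat, per-observation shape as a first-class data product.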
5) Analytics Literacy 2.0
Everyone has Power BI, so what? While I am thrilled to see so many clients adopting this powerful dashboard tool, technology is not enough. Companies need to change their processes to get value out of these new technologies.
Analytics literacy 2.0 goes beyond the “buzzword bingo” of most organizational change approaches to analytics literacy. Instead, companies need to change the culture around analytics by involving SMEs in the analytics process. How does this happen?
Look for more posts on the subject, but here is a graphic explaining the process. In short, you need your SMEs to convey the data-generating process to the analyst/data scientist.
So that’s it. What did I miss? What do you think is in store for data science in the next 12 months? Let us know what you think.
Stay Current in Data Science Technology
Subscribe to our free weekly newsletter.