Andrew Ng has fueled the engine of Artificial Intelligence (AI) for more than 15 years. In a recent interview, he described Data-Centric AI (DCAI) as the next big step in the field of AI. He also co-organized a workshop at NeurIPS 2021 that focused on bringing together key players working in DCAI. The key principle behind DCAI is “moving from model-centric to data-centric AI”.
I have worked on many aspects of Data and AI, at IBM and now at Kyndryl, for the last 20 years. I found that Data and AI in the enterprise domain is not as simple as building models by crunching data and pushing those models to production to solve real customer problems. We spent 80% of our time on data engineering, such as ETL (extract, transform, and load), cleansing the data, and standardizing the data. This 80% of data engineering is done independently of the models. The AI community has mostly focused on collecting “big data for building deep learning models”. Enterprise clients have lots of “big data”, but it is not usable for building sophisticated deep learning models. There are many ways to create models, generate synthetic data, build scalable AI algorithms, etc. DCAI is the next big thing for dealing with complex enterprise problems: we shift from building sophisticated models using lots of data to engineering the data so that the models become consumable.
Enterprise clients face many challenges in adopting AI systems and realizing business value.
- Enterprise clients have data spread across multiple repositories.
- Data is stored and consumed in many different formats.
- It is not easy to get access to all these different sources due to regulatory compliance.
- It is impossible to bring all this data to one place to crunch.
- Enterprise clients have many siloed business units with shadow IT (information technology) and shadow data.
- Enterprise clients lack resources (talented people, budget, and management approvals) and are held back by legacy processes.
- Enterprise clients do not have AI models, or do not share them, even across their own siloed business units.
Enterprise clients need a lot of help in picking the right AI models, and this usually involves building new models or borrowing models from a similar problem domain. Industry-specific models, for industries such as financial services, healthcare, and retail, can provide base AI models for enterprise clients. There are two levels of AI iteration going on, as illustrated in the figure below. The outer (slower) iteration focuses on building and training models to some acceptable level of performance using a limited amount of data. The inner (faster) iteration focuses on fine-tuning the data for a model generated by the outer iteration. The inner iteration can even generate new (synthetic) data that can support the model. The dual-iteration AI system continues until some acceptable level of performance is achieved, within some bounded cost.
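To make the dual iteration concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not a description of any actual DCAI system: the "model" is a toy one-dimensional threshold classifier, the inner iteration is simulated by asking a hypothetical domain expert (`expert`) to re-check the labels closest to the current decision boundary, and the cost units (`train_cost`, `review_cost`) and all function names are invented for the example.

```python
from typing import Callable, List, Tuple

Example = Tuple[float, int]  # (feature, label) for a toy 1-D problem


def fit_threshold(data: List[Example]) -> float:
    """Toy 'model': the threshold t that best separates the training labels."""
    def correct(t: float) -> int:
        return sum((1 if x >= t else 0) == y for x, y in data)
    return max(sorted(x for x, _ in data), key=correct)


def accuracy(t: float, data: List[Example]) -> float:
    """Fraction of examples for which 'predict 1 when x >= t' is right."""
    return sum((1 if x >= t else 0) == y for x, y in data) / len(data)


def dual_iteration(train: List[Example], valid: List[Example],
                   expert: Callable[[float], int], target: float, budget: int,
                   train_cost: int = 5, review_cost: int = 1,
                   reviews_per_round: int = 2) -> Tuple[float, int, int]:
    """Alternate model training and data fixing until target or budget is hit.

    All costs are hypothetical units chosen for illustration.
    Returns (validation accuracy, cost spent, outer rounds run).
    """
    data = list(train)   # work on a copy; the caller's data is untouched
    reviewed = set()     # indices the expert has already checked
    cost, acc, rounds = 0, 0.0, 0
    while cost + train_cost <= budget:
        # Outer (slower) iteration: retrain the model and evaluate it.
        t = fit_threshold(data)
        cost += train_cost
        rounds += 1
        acc = accuracy(t, valid)
        if acc >= target:
            break
        # Inner (faster) iteration: improve the data for the current model
        # by re-checking the unreviewed examples nearest its boundary.
        pending = sorted((i for i in range(len(data)) if i not in reviewed),
                         key=lambda i: abs(data[i][0] - t))
        if not pending:
            break        # no data left to improve
        for i in pending[:reviews_per_round]:
            if cost + review_cost > budget:
                break
            x = data[i][0]
            data[i] = (x, expert(x))  # the expert supplies the trusted label
            reviewed.add(i)
            cost += review_cost
    return acc, cost, rounds


# Toy setup (assumed): the true rule is "label 1 when x >= 5";
# two training labels near the boundary are wrong.
def true_label(x: float) -> int:
    return 1 if x >= 5 else 0

train = [(float(x), true_label(x)) for x in range(10)]
train[3] = (3.0, 1)
train[4] = (4.0, 1)
valid = [(x + 0.5, true_label(x + 0.5)) for x in range(10)]

print(dual_iteration(train, valid, true_label, target=0.95, budget=25))
# (1.0, 19, 3)
```

In this toy run the model itself never changes; only the data is repaired, and validation accuracy climbs from 0.8 to 1.0 over three outer rounds. Lowering `budget` to 12 stops the loop early at 0.9 accuracy, which mirrors the "within some bounded cost" condition.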
Enterprise clients have a deep understanding of their domain and of their data. However, because of cost cutting and retiring talent, especially where data sits in legacy systems, enterprise clients are struggling to extract knowledge and build automation systems. It is therefore important to create tools and services that help enterprise clients transform their domain knowledge and core business data into valuable assets that drive business outcomes.
We should explore DCAI as a key strategy to help enterprise clients with their Data and AI needs. There are five areas that require immediate attention to make DCAI consumable for enterprise clients.
- DCAI Professional Services: Enterprise clients need help navigating their data and assessing its potential, including model building and value creation from data.
- Industry Vertical DCAI: Industry verticals such as financial services, healthcare, and the public sector can share models across business units and clients. We need to create a base set of industry AI models that can be used to accelerate DCAI. We need to create a DCAI Hub where companies can share AI models (like Docker Hub or GitHub).
- DCAI Management: Managing the DCAI process can be quite complex, as it spans data privacy, governance, people, tools, etc.
- DCAI Engineering: DCAI engineering should start with an “as code” model, with changes tracked through DevOps and CI/CD processes, and should apply software engineering discipline.
- DCAI Operations: DCAI operations should follow approaches like SRE (site reliability engineering), production engineering, change management, and other service management processes.
At Kyndryl, we are exploring DCAI to help our enterprise clients adopt AI systems. Please reach out to me if you want to co-create a DCAI strategy and solution to improve your business outcomes.