Rebuilding the data stack for AI

Artificial intelligence may be dominating boardroom agendas, but many enterprises are discovering that the biggest obstacle to meaningful adoption is the state of their data. While consumer-facing AI tools have dazzled users with speed and ease, enterprise leaders are finding that deploying AI at scale requires something far less glamorous but far more consequential: data infrastructure that is unified, governed, and fit for purpose.

That gap between AI ambition and enterprise readiness is becoming one of the defining challenges of this next phase of digital transformation. As Bavesh Patel, senior vice president for Go-to-Market at Databricks, puts it, “the quality of that AI and how effective that AI is, is really dependent on information in your organization.” Yet in many companies, that information remains fragmented across legacy systems, siloed applications, and disconnected formats, making it nearly impossible for AI systems to generate trustworthy, context-rich outputs.

“Really, the big competitive differentiator for most organizations is their own data and then their third-party data that they can add to it,” says Patel.

For enterprise AI to deliver value, data must be consolidated into open formats, governed with precision, and made accessible across functions. Without that foundation, businesses risk “terrible AI,” as Patel bluntly describes it. That means moving beyond siloed SaaS platforms and disconnected dashboards toward a unified, open data architecture capable of combining structured and unstructured data, preserving real-time context, and enforcing rigorous access controls. When the groundwork is laid correctly, organizations can move toward measurable outcomes, unlocking efficiencies, automating complex workflows, and even launching entirely new lines of business.

That value focus is critical, says Rajan Padmanabhan, unit technology officer at Infosys, especially as enterprises seek precision in the outputs driving business decisions. Rather than treating AI initiatives as isolated innovation projects, leading companies are tying AI deployment directly to business metrics, using governance frameworks to determine what delivers results and what should be abandoned quickly.

“We see this big opportunity just with AI literacy with business users, where they’re very eager to understand how they should be thinking about AI,” adds Patel. “What does AI mean when you peel the covers? What are the pieces and the building blocks that you need to put in place, both from a technology and a training and an enablement standpoint?”

The possibilities ahead are substantial. As AI agents evolve from copilots into autonomous operators capable of managing workflows and transactions, the organizations that win will be those that build the right foundation now.

“What we are seeing as a new way of thinking is moving from a system of execution or a system of engagement to a system of action,” notes Padmanabhan. “That is the new way we see the road ahead.”

The future of AI in the enterprise will be determined by whether businesses can turn fragmented information into a strategic asset capable of powering both smarter decisions and entirely new ways of operating.

This episode of Business Lab is produced in partnership with Infosys Topaz.

Full Transcript:

Megan Tatum: From MIT Technology Review, I’m Megan Tatum, and this is Business Lab, the show that helps business leaders make sense of new technologies coming out of the lab and into the marketplace.

This episode is produced in partnership with Infosys Topaz.

Now, recent advancements in AI may have unlocked some compelling new industrial applications, but a reliance on inadequate data models means that many enterprises are hitting a brick wall. AI, and agentic AI in particular, places a whole new set of demands on data. The technology requires greater access, context, and guardrails to operate effectively. Existing data models often fall short. They’re too fragmented or siloed. Data itself often lacks quality. To bridge the gap, these data models require an AI-ready upgrade.

Two words for you: data reconfigured.

My guests today are Bavesh Patel, senior vice president for Go-to-Market at Databricks, and Rajan Padmanabhan, unit technology officer for data analytics and AI at Infosys.

Welcome, Bavesh and Rajan.

Rajan Padmanabhan: Thank you. Thanks for having us.

Bavesh Patel: Thanks for having us.

Megan: Fantastic. Thank you both so much for joining us today. Bavesh, if I could come to you first, when we talk about AI-ready data, what exactly do we mean? What new demands does AI place on data, and how does this impact the way it needs to be structured and used?

Bavesh: Yeah. Great question. Appreciate you hosting us today. I think that obviously the whole world is enamored with AI because of all of the power that we can all see as users. AI is now democratized across hundreds of millions of users. And when we think about enterprises and businesses using AI, the quality of that AI and how effective that AI is, is really dependent on information in your organization, and that’s data. And what we found is that in most enterprises, data is kind of locked away in these different applications and different systems. And it’s very difficult to get a good view of: What is all my data? How trustworthy is it? How recent and fresh is it? And all of that is being injected into the AI. Unless you have a proper understanding of your data and the ability to ensure that it’s accurate and usable so that the AI can take advantage of it, you’re actually going to end up with terrible AI.

We see a lot of customers spend time on cleansing their data, organizing their data, making sure it’s access controlled correctly, and that tends to be the fuel of good AI.

Megan: Yeah. It’s such a foundational thing, isn’t it? But it can be missed, I think, quite easily. Rajan, what difference can having AI-ready data really make for enterprises as they unlock that full potential of AI and its applications?

Rajan: First and foremost, thanks for having us. It’s a pleasure. In continuation of what Bavesh talked about, data and AI are pretty synonymous. And consumer AI, enterprise AI, and enterprise agentic AI are different, because first and foremost, the business needs to have the context. That context comes from your enterprise information, which is not only structured; both structured and unstructured data, user-generated content, all forms of data are going to be very, very critical to really get the context right and to ground any model that you pick. That’s where platforms like Databricks really help, with a plethora of models, whether you want to build your own models or whether you want to ground a model based on your data. That is where getting the data ready for AI is going to be very, very critical.

The third critical part, and this actually will be one of the roadblocks for the adoption of AI: AI adoption on the consumer side is skyrocketing, but on the enterprise side, enterprises are struggling, primarily around the precision of the output. You are making business decisions: a buy decision, a sell decision, or a recommendation of something like content. It could be 20 different use cases. For that, the precision is going to be very critical. We are seeing with our customers, the successful customers, that precision of more than 92% is not an aspiration; it is a must-have. If you have that, AI-ready data is going to be the enabler for it.

Megan: And I suppose, if we’ve outlined there how critical this is, where should enterprises start? At a practical level, what are the foundations when it comes to building an AI-ready data model?

Bavesh: Yeah. And I think Rajan hit the nail on the head. I mean, enterprises are grappling with a different set of problems than consumer AI. The first thing is that you’ve got to get a handle on your data. As I mentioned, a lot of the data is locked in. Ensure that you have the ability to put your data in a place where you can get a holistic view of as much of your data as possible. That kind of starts with putting your data in open formats. A lot of the valuable data today in an organization is locked away in some proprietary SaaS app or some system, and all the datasets aren’t connected together to form that context. The first step is to really do an analysis: What is your data estate? What are the critical pieces of data that need to be put into a place where you can start to understand them and how they’re connected to one another?

Thinking about how you set up your data catalog, thinking about how the relationships between the data assets work, putting data governance around it, that seems to be the first step. And if you think about how ChatGPT was built, it took all the data on the internet, aggregated it, synthesized it, and then built these transformer models, while enterprises don’t really have a handle on all their data within the organization. That’s the first foundation that you really want to think about. The second thing is that you don’t want to just go ad hoc and do random AI projects. You really need to be thinking about business value. A lot of our customers are looking at AI much more strategically in that they want to be able to get projects on the board with wins and then generate business value.

Building an AI value roadmap, which is connected to how well your data is organized, those two things seem to be foundational to how do you launch AI successfully in your organization.

Megan: That value piece is so important, isn’t it? And as I understand it, Infosys and Databricks have worked closely together to guide organizations through this transformation. I wondered, can you share some examples of the impact you’ve seen with enterprises you’ve worked with, Rajan? What difference has it made to the ways in which they can integrate more sophisticated AI and agentic AI applications?

Rajan: Well, that’s a very, very good question. What both Databricks and Infosys have done is come up with a kind of framework first. First and foremost, it all needs to start with the value. At one of the largest food products companies, where we collaborated together, we applied this framework. The framework consists of six different things. First and foremost, and very critical, is the value management, which Bavesh touched upon. We worked together to come up with a measurement framework around what we call adaptability, business value, and responsibility. You can’t just go and do a garage project. It has to be measurable. It should be responsible, following all those things. That is going to be very critical. And we helped this client prioritize which investments would give them the most value for money.

The second critical part here is that most of the enterprises today are not AI-born companies. Some of them were born in the analog days; some were born in the digital days. There are companies which are applying AI for modernization, because a lot of your historical information is actually helping you to build that long-term context. And that is where we have worked closely with some of the native tools of Databricks, like Lakebridge or the AI assistants that are there, and then created composable services on top of them to help clients unlock the value by bringing it into Databricks. And then the next part where we help the client is exactly that point: the readying of data. Now you have brought in the data; now you have to bring in the structured, the unstructured, the analytical, all these aspects.

And that is where the third layer comes in. We work closely with Databricks, leveraging all the great capabilities within Databricks, be it Unity Catalog, be it the open formats, or be it the gateways and other aspects. We were able to make the data available for this client. What has really helped our client, the third part, is Agent Bricks, which is one of the differentiators. It gives you the flavor for the enterprise. That is where we have closely worked, and we built some of our industry-specific agents, be it CPG, be it energy, be it FS. And for this client, we have taken some of those CPG-specific use cases, whether in the HR space, the procurement space, or the marketing space. This has really helped our client build a business capability around this and unlock eight to nine use cases, what we call agentic AI products, which can really drive more value for them, solving the real business problems.

And this kind of comprehensive set of frameworks, plus a suite of services, plus our solution assets, Infosys solution assets, as well as unlocking the value from Databricks, has really helped these clients. And we see similar patterns across a lot of these successful engagements, where we were able to continuously drive the value by applying this framework.

Megan: Right. Sounds like it made a real material difference. Rajan mentioned a few of the tools in the Databricks catalog there, Bavesh. I know you’ve recently worked to launch an operational database for AI agents and apps. I wonder, how does a platform like that help organizations in this journey? What makes it different from some of the other platforms out there right now?

Bavesh: Databricks has come to market with a new offering called Lakebase, which is really an OLTP database where you can build your AI apps. And if you think about it, there’s really two main types of data in an enterprise. There’s all the historical data, which is all the things that have happened, and that’s really what your analytics is based on. You have an OLAP system where you have put all your historical data, and Databricks has come to market with what we call the Lakehouse, which is essentially a data warehouse with all of your data that is not operational in nature. It’s historical data. And I think that Lakehouse concept is really pushing forward with AI because a lot of our customers have thousands of users within their business and they need to get data. And what they’ve done is they’ve actually gone down the BI route, which is really building a dashboard or a report.

Most organizations have had thousands of these dashboards and reports proliferate across the organization and then they need to be customized. It just takes a long time for users inside of the business to actually get access to the data. AI now is really making that a lot easier from just the analytics perspective where we can now democratize access to the data, which has really been the holy grail for most data teams. They really want to get out of the way and just give the right data to the right people inside of the business with the right access.

With a product like Genie at Databricks, you can just use English language or whatever your language is to ask questions of the data. And it’ll give you back data that answers your questions in context. It’ll give you not just what ChatGPT will give you, which is information about a topic that’s on the internet, but it will actually tell you, “Well, why did my sales numbers not reflect what I expected in the month of April?”

It’ll give you some root cause analysis based on your enterprise data. Genie is going to be one of these things that’s really important, where it’s going to truly democratize data inside of the business. That’s the OLAP world, which is what the Lakehouse is. More recently, we’ve come to market with what we call the Lakebase, which is the OLTP world. What we’re finding is that agents are now being deployed in these organizations, and those agents need a place to keep all of their orchestration, all of the context of what’s happening in that particular workflow. On the one hand, you’ve got users just asking questions. On the other hand, the next chapter is going to be around automating an entire business process. Take a function like generating a campaign in marketing, right? There are a lot of tools and a lot of steps you use.

An agent can come in and really automate a lot of that. But on the back end of that agent, you’re going to need to stand up a real-time database to keep track of all the things that the agent is doing. That’s what Databricks has brought to market, which is this OLTP Lakebase solution. The innovation that we have brought to market is that it’s a modern kind of Postgres database where we have separated the compute and storage, very much like what we did with the data warehouse on the Lakehouse. With the Lakebase, there is one copy of the data inside of your cloud storage, and then the compute is separated and it’s serverless. You can do things like branching, and you can start up the OLTP database really quickly. What we found is that agents are actually starting these Lakebases because they can very quickly go start one up, keep it running, shut it down when they need to, make a copy of it.

When agents are doing this, they need the velocity, and they need a cost-effective solution. And the beauty of all this is that when you take the OLTP side, which is all around the Lakebase and real time, and you take the OLAP side, you now have one system for all your data. You don’t have to copy the data around, you don’t have to manage all the permissions separately, and you can set the context against it. We see these AI apps being really the future of how businesses run, where they’re going to take away all of the bottlenecks of humans having to do repetitive work and automate these using LLMs and all these new technologies. We want to be the default for powering all that because we believe that our Lakebase technology is going to be faster, cheaper, and more secure as an AI database.
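To make Patel’s description concrete, here is a minimal sketch, not Databricks’ actual Lakebase API, of an agent checkpointing its workflow state to a Postgres-compatible operational database. The connection string, table, and workflow names are invented for illustration.

```python
# Hypothetical sketch: an agent persisting each step of a workflow to a
# Postgres-compatible operational store (a Lakebase-style database).
# The DSN, table, and payload shapes are invented for illustration.
import json
import os

import psycopg2  # standard PostgreSQL driver; works with any Postgres-compatible endpoint

conn = psycopg2.connect(os.environ["AGENT_DB_DSN"])  # e.g. your database's connection string

with conn, conn.cursor() as cur:
    # One row per step the agent takes, so every action is traceable later.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS agent_steps (
            run_id  TEXT        NOT NULL,
            step    INT         NOT NULL,
            tool    TEXT        NOT NULL,
            payload JSONB,
            ts      TIMESTAMPTZ DEFAULT now(),
            PRIMARY KEY (run_id, step)
        )
    """)
    # Record one step of a (made-up) marketing-campaign workflow.
    cur.execute(
        "INSERT INTO agent_steps (run_id, step, tool, payload) VALUES (%s, %s, %s, %s)",
        ("campaign-042", 1, "draft_copy",
         json.dumps({"segment": "loyalty", "status": "done"})),
    )
```

Because the state lives in a transactional database rather than in the agent’s memory, a run can be resumed, audited, or copied, which is the property Patel is pointing at when he describes agents branching and cloning databases.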

Megan: Sounds like a real game changer. And we’ve touched on this a couple of times already, I mean, this idea of value. We know that gauging the commercial value of investments in AI is really high on the list of priorities right now for senior leaders. How important is this value measurement piece when it comes to creating AI-ready data systems, Rajan? How can organizations ensure they’re monitoring what is delivering and what isn’t?

Rajan: This is of paramount importance, and most of the successful AI implementations or agentic AI implementations really require this value measurement. I’ll just extend the client example that I talked about, the large global food products company, to answer this question. I want to offer a metaphor. When the initial digital wave came, we had a lot of analytics, primarily around defining performance management KPIs; fact-based decisioning and other things evolved over a period of time. Typically, a lot of these metrics are going to be very critical for measuring how a function, how a business, is doing. Along similar lines for value measurement, if I take the same client example, what is very critical for an organization is actually to map the outcome that you are expecting.

In this case, it was: how do I optimize my spend on direct and indirect purchases? By applying AI, I would like to identify the areas where I can optimize the spend. That means one of the critical measures you have is your indirect expense classification: what spend has been classified, and how much you are able to reduce by bringing this in. Establishing these measures and metrics is going to be very, very critical. And once you establish these base metrics and the measurement, the beauty of it is, to extend what Bavesh was talking about, the capabilities that Databricks gives you, like metrics views, features, tools, and other things, would actually help you translate the AI telemetry and business telemetry coming from your applications into measurable metrics in terms of an outcome, which you can actually measure using the Genie room for value measurement.

Then there are two things you can do. Take the use cases, the products that, as I said, we built for this client, either on the procurement side or on the market research side. If you find there is value, because they identify that they are able to optimize spend, or because of the reach it delivers, you can accelerate that use case and further fine-tune that product to expand it. Or, if you find it is not really driving the value, or you are not able to see the value that it is going to deliver, you can very well apply a fast-failure method: rather than trying to make it work, you can understand it and then take a call to pivot to something else.

There are three aspects here. What we see from our experience, not only with this client but across some of our other clients in industrial manufacturing, FS, or energy, is, first, setting up this metrics-driven valuation method upfront; second, leveraging the capabilities to transform these telemetries and signals into a measurement, what we call an AI compass room; and third, giving the business stakeholders, whether from the marketing office, the supply chain office, or the CFO office, a way to say, “Hey, this is what it is intended to do, this is the current measurement, and this is where it’s falling short,” which can help them pivot. And this will actually drive and democratize AI, all the agentic AI, across the enterprise, and that really drives the value.

This is going to be one of the critical parts that enterprises need to do. And that is where the six-part framework that I talked about comes in: applying that framework, like the value office, applying the ready-for-AI piece, applying the transformation fabric. Then there is the governance, which is going to be the enabler of this. Then running your operations based not on SLAs but on experience-level agreements and business metrics, so you can continually measure. Bringing all these six layers together is going to be very critical. That’s when we see organizations be very successful, and some of our proven examples do exactly the same. This is going to be very critical for organizations from a measurement standpoint.

Megan: Lots of tangible ways there that you can actually gauge value. And you touched on governance there; the impact of AI on governance is another huge talking point among senior leaders, and interactions with data are a core part of that. To what extent is having the right governance and security protocols an integral part of having AI-ready data? Bavesh, what scenarios do these systems need to handle? What does that mean for data models?

Bavesh: This is becoming kind of the prerequisite to deploying a successful AI project. I think MIT produced a report that said 95% of these new AI projects fail to actually generate business value. A big reason for that is you can go and prototype and stand up and vibe code a pilot, but when you’re actually moving a workload into production, you realize that governance becomes so critical.

So what do we really mean by governance? I think the first thing is getting your data in order, like I said, in open formats. Most companies realize now that the way they engage with their customers, the way they develop a drug, the way they approve a person for a credit limit increase, all of that enterprise information is actually their competitive advantage. Because you can go and use a frontier model like ChatGPT or Claude that everybody has access to.

Really, the big competitive differentiator for most organizations is their own data and then the third-party data that they can add to it. Getting your data into an open format is about being able to understand your data, and understanding your data is where governance comes in. Because when you think about governance, you really want to be able to find the data.

If I’m an end user or if I’m building an AI product, I want to know what data’s available to me. Can I trust the data? How fresh is the data? Is it coming from my analytics world, or do I need a real-time system like an OLTP system? I need to find the data. I also need to make sure that access is controlled in a way that doesn’t cause any huge headaches for my organization. This becomes critical. If I have a whole bunch of PDFs that have purchase orders in them, who actually has access to all that data?

In a clinical trial, for example, in healthcare, you really want to ensure that people across trials don’t have visibility into patient data. Maybe the model that was built was trained across trials. Who has access to all the data? Who has access to only parts of the data? You really have to think about this. We also look at the semantics of the data. Rajan brought this up right at the beginning, which is: what is the context? How do we think about the metrics and all the things that the business users know in their heads? We need to start codifying that somewhere. We have a product at Databricks called Unity Catalog where you can do the discovery, the access control, and the business semantics. You also want to share the data.

And in the world of agents, what we see is something called agent sprawl. In very short order, agents will proliferate, just like SaaS applications became very prevalent within organizations because they really solved a business problem. You go to a line of business and you say, “I need to be able to do credit underwriting,” or “I am doing a prior authorization use case,” or pick thousands of use cases. There’s a SaaS app for that. Much like that, there’s going to be this world in which agents come into play, and most organizations are going to have lots of agents running all the time. But the reality of it is: how did that agent perform? What was the feedback loop from the user? What was the cost of running that workload, and is it going up dramatically? And if you don’t have a way to monitor, understand, and trace all the questions and answers and responses at scale, you’re going to find yourself in a big pickle. This actually could hurt your organization because users will be very confused about what to do.

When you look at governance, most organizations are recognizing that they have to start to understand what it is that they have put in place from a systems, process, and tooling standpoint. Focus on one use case, build out the governance for that, but build it in a way that’s going to allow you to become repeatable. AI is not going to be about one use case or two use cases. It’s whoever builds the flywheel of delivering many use cases in a safe, secure, cost-effective way that’s driving a business outcome. If you don’t apply governance, it’s going to be very hard.

At Databricks, we made a big bet on governance four or five years ago. This is one of the main reasons our company’s growing right now because we can ensure that there’s quality data that’s going into all of your AI. You can use things like Genie and you can use things like Agent Bricks and you can build apps using Lakebase. None of that really works without governance. It’s really what we call the brain inside of Databricks.

Most of our customers spend a lot of time inside of Unity Catalog. And the great news is that AI is helping governance get set up much more quickly. We have a customer that, three years ago, was trying to get all of the data assets across all their domains: from the customer, from the loyalty app, from the e-commerce engine. They had to go and map out all these data assets. AI is now doing a lot of that work for them. The human in the loop is just checking things.

We’ve made this much easier with AI. We always think about AI as a business use case and an outcome, which I think is going to be where the biggest value is. But at Databricks, we’re using AI inside of our platform to make it much easier to operate and to provide all the right things for your business. This is a super critical part of how we plan to innovate as AI comes to fruition in the market.
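As a concrete illustration of the kind of governance Patel describes: Unity Catalog exposes access control through SQL GRANT statements. The sketch below, which assumes the databricks-sql-connector package and uses placeholder workspace, warehouse, table, and group names, shows table- and schema-level grants of the sort he mentions for the clinical-trial case.

```python
# Hedged sketch: issuing Unity Catalog-style GRANTs over a SQL connection.
# Hostname, HTTP path, token, and all object/group names are placeholders.
import os

from databricks import sql  # pip install databricks-sql-connector

with sql.connect(
    server_hostname=os.environ["DATABRICKS_HOST"],  # placeholder workspace host
    http_path=os.environ["DATABRICKS_HTTP_PATH"],   # placeholder SQL warehouse path
    access_token=os.environ["DATABRICKS_TOKEN"],    # pulled from a secret store in practice
) as conn:
    with conn.cursor() as cur:
        # Analysts may read the curated sales table, and nothing more.
        cur.execute("GRANT SELECT ON TABLE main.sales.orders TO `analysts`")
        # Clinical-trial example: each trial team is scoped to its own schema.
        cur.execute("GRANT USE SCHEMA ON SCHEMA main.trial_a TO `trial_a_team`")
        cur.execute("GRANT SELECT ON SCHEMA main.trial_a TO `trial_a_team`")
```

The point is not the specific syntax but that access rules live in one governed catalog, so every AI workload, whether a dashboard, a Genie question, or an agent, inherits the same permissions.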

Megan: And Rajan, Bavesh touched on this a little bit there, but does the integration of agentic AI add another layer of complexity here too? What new considerations around governance does that raise?

Rajan: That’s a very, very valid question. I would like to use a metaphor to explain. We are getting into the world of self-driving cars, robotaxis, and other things. While that takes us to the autonomous world, there are still rules that you need to adhere to when you are driving on a road. The reason I’m bringing up this metaphor is that what is actually required is adhering to the rules, and those rules differ with the topography, depending on where you are driving. That is going to be very, very critical. The complexity that agents are going to add is basically how you operate with those constraints.

For example, as a UTO, I can do 10 things, but I cannot approve a discount of more than 70%, and I cannot give someone a bonus, because that is the purview of the CFO, and an agent should be aware of that.

That is one aspect: applying the constraints and making sure that the agents are adhering to them. The second set of complexity is around the tools agents access. As a business, in today’s world, when you define a process, certain processes need a certain set of tools to really action it. There are certain entitlements; only people entitled to do certain things, based on their identity or the needs of the situation, can do them, and you need to govern that. The third is information sharing. While MCP, UCP, and other aspects are great, one critical thing is what you need to share and what you don’t need to share. Those are the critical considerations.

The last part is learning and relearning. Sometimes when you learn good things, you should keep them. Sometimes it is better for you to completely remove something, reevaluate it in a newer way, relearn it in a newer way. These are all the critical things that are required. Along similar lines for agents, this is going to be paramount, because when you are operating agents for an enterprise, they need to know, learn, and adhere to certain compliance-related rules, business-related constraints, and then entitlement and identity, and then sharing. Whatever applies to a physical human will also start applying to an agent. That is where this is going to be very critical. This requires a new set of operating systems, but that doesn’t really mean going out and getting a whole new thing. That is where I would extend what Bavesh touched upon with Unity Catalog.

The best part, which we see some of our clients implementing, is extending Unity Catalog and its capabilities: now you can catalog the tools, catalog the MCP servers, as well as catalog these agents, and then govern those agents based on the constraints and ground them based on the constraints.

It’s going to be very, very critical. Doing it not later but starting it as part of your strategy, and enforcing this as one of the critical dimensions when you measure the value, is also going to be very important for an organization. It is like making sure that you not only build the autonomous car but also that the car drives by the rules of the road, not going rogue.
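To make the discount example concrete, here is a minimal sketch, with invented roles and limits, of the kind of policy check Padmanabhan describes sitting between an agent and its actions.

```python
# Minimal sketch of a constraint layer for agents: the agent proposes an
# action, and a policy check enforces entitlements before anything executes.
# Roles, caps, and the default-deny rule are invented for illustration.
from dataclasses import dataclass

@dataclass
class Action:
    kind: str      # e.g. "approve_discount"
    amount: float  # discount as a fraction, e.g. 0.40 = 40%

# Per-role discount caps: the UTO example from the conversation (70% cap);
# anything above that is the CFO's call.
DISCOUNT_CAP = {"uto_agent": 0.70, "cfo_agent": 1.00}

def is_permitted(role: str, action: Action) -> bool:
    if action.kind == "approve_discount":
        return action.amount <= DISCOUNT_CAP.get(role, 0.0)
    return False  # default-deny: actions we haven't modeled are blocked

assert is_permitted("uto_agent", Action("approve_discount", 0.40))
assert not is_permitted("uto_agent", Action("approve_discount", 0.80))  # must escalate
```

In production, a check like this would live in the governance layer, alongside the catalog of tools and agents Padmanabhan mentions, rather than inside any single agent.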

Megan: Lots to think about there. Fascinating stuff. Thank you. Just to close, with a quick look ahead: we all know the pace of development in AI and agentic AI is so rapid. For those organizations that prioritize AI-ready data now, what are the most compelling use cases for the technology that you can see coming to the fore in the next few years, Bavesh?

Bavesh: I think the excitement level is at its peak. We’ve seen so much investment in AI. I think the reason there’s a lot of excitement is that you can look at the early adopters and see massive gains that these organizations are seeing. The one thing I will tell you is that there are really three categories. The companies that I think are doing well, a lot of them started out with just copilots and things that give people quick answers. Think about it as making an individual productive. That is the first phase, and the ROI on that has been somewhat questionable. With something like Genie, it becomes a lot more effective because it’s actually on your data, and your data is contextualized in your organization. That’s one area where we’re going to see a lot of innovation. We’ll see most organizations just start to get the right information to the right person at the right time. And that has been a dream for a lot of organizations.

The second one is around automating entire business processes. We see functions within marketing, like I described earlier, or processes like rebates for a company. There’s a whole bunch of steps involved where you have to go into three different apps, export data from Excel, and put it over here. There are thousands of people doing very laborious, monotonous, repeatable work. These agents are really going to deliver an immense amount of productivity for the business process, and they’re just going to make things faster. Processes that took weeks are now going to take days. Processes that took days are going to take hours and minutes.

One trend we’ve seen is that the AI world is so dynamic. In a world where you’ve got lots of different players, you want to think about first principles: what are the foundations? You want to think about owning your data, making sure you have a handle on your structured and unstructured data. You want to put governance on that. But the other thing you want to make sure you don’t do is lock yourself in.

Today, if you think about it, Gemini is really good with multimodal. Anytime you have pictures or videos or things like that, Gemini is just super good. Whereas if you’re writing code, Claude is really good. If you’re doing certain types of questions around introspection, ChatGPT is really good. What you really want is an open data platform where you can build your AI in the open, on multiple clouds, which is what we built at Databricks.

I think that’ll help with the second piece, which is that you can pick and choose, because when you build these agents, you don’t have to be locked into just one. You should be picking the best quality, the best security, and the best ROI and cost for a particular workload. One workload may use multiple of these models, and they might even be industry-specific models. You need a system and a platform that can really handle this complexity.

I think the third category is business reimagination. A lot of people talk about this where, yes, you’re going to go and take the data and make it available and give everybody access to the data. You’re going to make existing processes much more efficient. But the third thing is there’s going to be brand new things that come out of it.

We have a very large customer that is a bank, and they have built a product that they didn’t have a year ago. Essentially, it’s machine learning and LLMs helping treasury departments forecast what their balances are going to be, because they have more data at their fingertips. Historically, it took a long time for the data to get to the bankers. They were not able to really predict what a balance would be for a treasury department. Think about this: a big enterprise company has now built a brand-new data and AI solution that they’re monetizing, and it’s generated hundreds of millions of dollars in the first six months. We’re seeing brand-new lines of business open up, and that is going to be really exciting because that’s where a lot of the transformation is going to happen. There’s going to be productivity. There’s going to be automation at the business process level. Then there are going to be these big new things that we didn’t even imagine that people are going to come up with.

We are actually seeing the early signals of this in every industry. We see retailers getting data at the hourly and minute level so that they can integrate much more closely with their supply chains. We’re seeing much more targeted customer-360 use cases. As consumers, we get annoyed by ads, but now it’s so contextualized, and you have so much information about what really matters to your target customer, that you’re giving them value-added information, and that’s engaging them more. There’s a whole bunch of innovation happening with agentic commerce and things like concierge and virtualized shopping.

You look at any industry, and there are definitely new ways of doing things. This is what’s really exciting about AI, but you really have to not get too far ahead without thinking about the foundational things we covered earlier: an open data platform, making sure you have governance set up correctly, thinking about both your historical analytical data and your real-time application data. Having a good foundation to build on is what’s going to allow you to scale, move more quickly, and compete in this new world.

We’re very excited about what we’re seeing with our customers and what they’re building. And honestly, that’s the best part about being in my role at Databricks: our teams really go to customers and say, “What are the outcomes you’re driving?” The early signals have been super positive. For companies that get serious about all the foundational elements and are really methodical about building outcome-based AI solutions, that 5% of projects that are successful, those are wildly successful. That’s why we’re growing as a company, because once you get a good project under your belt, that gets visibility with executives.

The last thing is that historically, a lot of tech has been in the IT department. You have the business designing how they want to go to market, how they’re going to compete, and what products and services they want to offer. IT was the enabler and, in many cases, became the cost center and was relegated to rationalizing the portfolio of spend and tools.

But now we’re seeing the business kind of take the lead with AI where they want to understand, they want to know, “Hey, what can I be doing now that was not possible before?” We see this big opportunity just with AI literacy with business users where they’re very eager to understand how they should be thinking about AI. What does AI mean when you peel the covers? What are the pieces and the building blocks that you need to put in place, both from a technology and a training and an enablement standpoint? We’re spending a lot of time with executives helping them along this journey. We definitely see a lot of amazing opportunities ahead.

Megan: Yeah. So much innovation going on. And finally, how about yourself, Rajan? What on the horizon is exciting you the most?

Rajan: I think Bavesh covered quite a bit, but the way I’m seeing it, today we are predominantly talking about a labor shift. That means unlocking human potential, or shifting the current way of working to a new way of working. It’s predominantly an efficiency game, and the majority of the successful use cases we are seeing now are around that labor shift. But what is pretty promising is the second kind of shift, the business shift.

What we are seeing as a new way of thinking, the new thing that is coming up, is moving from a system of execution or a system of engagement to a system of action. That is the new way we see the road ahead. That is where some of the points I touched upon come in. The business wants to have access to it, but how does it really make a real difference?

One classic example, which we have implemented for one of our customers in the manufacturing space, is around the lifecycle of creating a product and then publishing the content around the product across their different B2B marketplaces. In some of those cases, you are not just talking about recommending or creating; you are actually able to reimagine the process. What used to involve five different departments can now be done much faster, and at the same time it gives you veracity in the decisioning you are able to do and in how you are able to take action. That is the second thing we are seeing.

The third part, I think, is going to be the way commerce evolves. It is not just agentic commerce; what we are seeing is agent-to-agent commerce, agent-to-human commerce, agent-to-agent payments, agent-to-human payments, and then content monetization.

These are the new sets of business opportunities, like building new agentic business products. It could be for fintechs, it could be on the consumer side, or it could be on the industrial technology side. These are what I’m calling the economy shift, the labor shift, and the business shift, because they are going to bring a new set of systems of action, moving from systems of execution, the typical SaaS application with bolt-on agents, the so-called agentic application. That is going to be a major transformation, and we are underway. But on the technology side, what is very critical for enterprises is that in today’s world you have analytical data, operational data, and then there is intelligence; there are different facets of it.

I think this analytical core and operational core are really going to come together into one. That’s why we are so gung-ho about the releases of Lakebase and other things, because that is the way the future is going to go. When organizations are really thinking about being ready for AI, they should really think: how do you create this unified core for the newer world?

The second part is that people have to reimagine integration. If I take SAP as an example, today you have hundreds of edge applications, business applications, that need to integrate with one another. Typically, we create a sprawl of these integrations. As one technology use case, people can ask, “How do I create a domain-based service mesh on top of this unified core, and how do I make it more ready for agentic integration?” That is one of the technology use cases that we are advising clients on.

I think now, with a lot of the new areas that are coming around SAP BDC with Databricks and this zero-based integration, that makes them rethink the way they need to integrate and the way they need to do things.

The third part, from a technology investment standpoint, is this: don’t just think about now. This is the time to rethink the way you staff your organizations, the FTEs. Agents are going to be your new FTEs.

That means part of the new technology paradigm is that you will end up creating these co-intellects within your organization. You need to invest in what we call the agentic grid, which becomes like a unified agentic fabric in which agents can really collaborate and integrate, building on top of the same unified operational and analytical core, with unified agentic integration on top of it. That is going to create a new set of experiences: agentic experiences, rather than the traditional experiences or conversational experiences.

Then the new collaboration methods are going to be some of the critical aspects that people have to really think about from a technology standpoint. To start with, I would say you look at it from a data standpoint: building that unified core, building that unified integration, and building that collaboration layer, for both sharing and collaborating with intelligence as well as agentic collaboration, all governed under a single umbrella. That is going to be the one critical use case which no one will feel bad about, and they are going to get really 100x their investment out of it.

Megan: Certainly no shortage of exciting developments on the horizon. Thank you both so much for that conversation. That was Bavesh Patel, senior vice president for Go-to-Market at Databricks and Rajan Padmanabhan, unit technology officer for data analytics and AI at Infosys, whom I spoke with from Brighton, England.

That’s it for this episode of Business Lab. I’m your host, Megan Tatum. I’m a contributing editor and host for Insights, the custom publishing division of MIT Technology Review. We were founded in 1899 at the Massachusetts Institute of Technology, and you can find us in print, on the web and at events each year around the world. For more information about us and the show, please check out our website at technologyreview.com.

This show is available wherever you get your podcasts, and if you enjoyed this episode, we hope you’ll take a moment to rate and review us. Business Lab is a production of MIT Technology Review, and this episode was produced by Giro Studios. Thanks for listening.

This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff. It was researched, designed, and written by human writers, editors, analysts, and illustrators. This includes the writing of surveys and collection of data for surveys. AI tools that may have been used were limited to secondary production processes that passed thorough human review.

The missing step between hype and profit

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

In February, I picked up a flyer at an anti-AI march in London. I can’t say for sure whether or not its writers meant to riff on South Park’s underpants gnomes. But if they did, they nailed it: “Step 1: Grow a digital super mind,” it read. “Step 2: ? Step 3: ?”

Produced by Pause AI, an international activist group that co-organized the protest, it ended with this plea to the reader: “Pause AI until we know what the hell Step 2 is.” 

In the South Park episode “Gnomes,” which first aired in 1998, Kenny, Kyle, Cartman, and Stan discover a community of gnomes that sneak out at night to steal underpants from dressers. Why? The gnomes present their pitch deck. “Phase 1: Collect underpants. Phase 2: ? Phase 3: Profit.”

The gnomes’ business plan has since become one of the greats among internet memes, used to satirize everything from startup strategies to policy proposals. Memelord in chief Elon Musk once invoked it in a talk about how he planned to fund a mission to Mars. Right now, it captures the state of AI. Companies have built the tech (Step 1) and promised transformation (Step 3). How they get there is still a big question mark.

As far as Pause AI is concerned, Step 2 must involve some kind of regulation. But exactly what it will call for and who will enforce it are up for debate.

AI boosters, on the other hand, are convinced that Step 3 is salvation and tend to gloss over the middle bit. They see us racing toward sunny uplands on the back of an “economically transformative technology,” as OpenAI’s chief scientist, Jakub Pachocki, put it to me a few weeks ago. They know where they want to go—more or less: It’s hazy up there and still some way off. But everyone’s taking a different route. Will they all make it? Will anyone?

For every big claim about the future, there is a more sober assessment of how the rubber meets the road—one that quells the hype. Consider two recent studies. One, from Anthropic, predicted what types of jobs are going to be most affected by LLMs. (A takeaway: Managers, architects, and people in the media should prepare for change; groundskeepers, construction workers, and those in hospitality, not so much.) But their predictions are really just guesses, based on what kinds of tasks LLMs seem to be good at rather than how they really perform in the workplace.   

Another study, put out in February by researchers at Mercor, an AI hiring startup, tested several AI agents powered by top-tier models from OpenAI, Anthropic, and Google DeepMind on 480 workplace tasks frequently carried out by human bankers, consultants, and lawyers. Every agent they tested failed to complete most of its duties.   

Why is there such wide disagreement? There are a number of factors. For a start, it’s crucial to consider who is making the claims (and why). Anthropic has skin in the game. What’s more, most of the people telling us that something big is about to happen have reached that conclusion largely on the basis of how fast AI coding tools are getting. But not all tasks can be hacked with coding. Other studies have found that LLMs are bad at making strategic judgment calls, for example.

What’s more, when they’re deployed, the tools aren’t just dropped into a cleanroom. They need to work in places contaminated with people and existing workflows. And sometimes adding AI will make things worse. Sure, maybe those workflows need to be torn up and refashioned around the new technology for it to achieve transformative status, but that will take time (and guts).  

That big hole? It’s right where Step 2 should be. The lack of agreement on exactly what’s about to happen—and how—creates an information vacuum that gets filled by the latest wild claim of the week, evidence be damned. We’re so unmoored from any real understanding of what’s coming and how it will be deployed that a single social media post can (and does) shake markets.

We need fewer guesses and more evidence. But that’s going to require transparency from the model makers, coordination between researchers and businesses, and new ways to evaluate this technology that tell us what really happens when it’s rolled out in the real world.

The tech industry (and with it the world’s economy) rests on the held-out promise that AI really will be transformative. But that is not yet a sure bet. Next time you hear bold claims about the future, remember that most businesses are still figuring out what to do with their underpants.

Elon Musk and Sam Altman are going to court over OpenAI’s future

After a yearslong legal feud, Elon Musk and OpenAI CEO Sam Altman are heading to trial this week in Northern California in a case that could have sweeping consequences. Ahead of OpenAI’s highly anticipated IPO, the court could rule on whether the company is allowed to exist as a for-profit enterprise and might even oust its current executive leadership, including Altman.

Musk is suing OpenAI, alleging that Altman and OpenAI president Greg Brockman deceived him into bankrolling the company in its early days by promising to maintain it as a nonprofit dedicated to developing AI that benefits humanity, only to later restructure the company to operate a for-profit subsidiary. Musk cofounded OpenAI with Altman and others in 2015, but he left in 2018 after a bitter power struggle. 

Musk is seeking as much as $134 billion in damages from OpenAI and Microsoft, one of OpenAI’s biggest financial backers. He is also asking the court to remove Altman and Brockman from their roles and to restore OpenAI as a nonprofit. Musk has asked the court to award any damages to OpenAI’s nonprofit rather than to him personally. 

Nine jurors will deliver an advisory verdict, a non-binding recommendation, to guide the judge in deciding Musk’s claims against Altman. Musk, Altman, and Brockman will take the stand. Former OpenAI chief scientist Ilya Sutskever, former OpenAI CTO Mira Murati, and Microsoft CEO Satya Nadella are also expected to testify. Cringey texts, raw diary entries, and endless scheming behind the founding and growth of OpenAI are expected to come to light.

In an industry enveloped in secrecy, the trial will be a rare opportunity for the public to look behind the curtain and find out what’s going on in the companies creating the most transformative technology ever built. 

What are they fighting about?

When OpenAI was originally founded as a nonprofit, backed by a $38 million donation from Musk, the company vowed to create open-source technology for the public’s benefit, unconstrained by a need to generate financial returns. But over the years, the company began to claim that intensifying competition could make it dangerous to share how it develops its AI models and that a nonprofit structure could not raise enough money to keep building AI. (MIT Technology Review was first to report on OpenAI’s internal conflicts around its mission.)

The court has already found that in 2017 Altman and Brockman wanted to establish a for-profit arm, while Musk proposed merging OpenAI with his electric-car company, Tesla. When Musk threatened to stop funding, Altman and Brockman told him that they were committed to keeping the company a nonprofit. Musk alleges that they pursued plans to pivot to a for-profit without informing him. According to OpenAI, Musk agreed that the company needed a for-profit entity and even wanted to be its CEO. 

But even if Musk proves he was duped by Altman and Brockman, he may not have standing in the first place to sue them for restructuring the company to operate a for-profit subsidiary. Some legal scholars are puzzled over why the judge allowed him to bring this claim. “The idea that Elon Musk can sue because he was a donor or used to be on the board is pretty puzzling,” says Jill Horwitz, a law professor who studies nonprofit law at Northwestern University. “Typically, it’s up to the attorneys general to bring such a claim to enforce the charitable purposes. And that’s already happened.” 

In October 2025, state attorneys general of California, where OpenAI is headquartered, and Delaware, where OpenAI is incorporated, struck a deal with OpenAI to approve its new corporate structure on a series of conditions. For example, a safety and security committee at the nonprofit would review safety-related decisions made by the for-profit subsidiary. Critics of the restructuring, including Musk, AI safety advocates, and civil society groups, have tried to stop it. 

California’s attorney general has declined to join Musk’s lawsuit, saying that the office did not see how his action serves the public interest.

Still, whether the deal holds OpenAI to its nonprofit mission is an open question. “Elon Musk should have to show … what the deficiencies are in what’s been agreed to by OpenAI with the attorneys general,” says Rose Chan Loui, the director of the UCLA School of Law’s philanthropy and nonprofit program. Even with the terms in place, holding OpenAI to them depends on “how much they can enforce it and how much transparency they get into OpenAI’s work.”

More importantly, legal experts say the case is being considered under the wrong body of law. Musk argues that Altman and Brockman breached OpenAI’s charitable trust by creating a closed-source, for-profit subsidiary. As a result, the court has been analyzing the claim under the law of trusts. “But OpenAI is not a trust. OpenAI is a corporation. And so really they should be looking at … the law of charitable nonprofit organizations,” says Chan Loui.

What’s on the line?

Despite all the legal muddiness, the outcome of the trial could upend the AI race. Any one of the remedies that Musk seeks could cripple OpenAI as it races to go public by the end of the year. OpenAI, which is valued at over $850 billion, has described the litigation with Musk as a potential risk to its business. Musk’s rival company xAI, which makes the chatbot Grok, is expected to go public as a part of his rocket company SpaceX as early as June. If Musk prevails, xAI, which in combination with SpaceX is valued at $1.25 trillion, could get a big advantage in the AI race. 

And the trial has helped expose the bitter schism between Musk and the company he once helped to found. An OpenAI spokesperson referred MIT Technology Review to a post on X: “This lawsuit has always been a baseless and jealous bid to derail a competitor.” Although Musk’s lawyers did not immediately respond to a request for comment, he has posted on X that “Scam Altman lies as easily as he breathes.”  

MIT Technology Review will have ongoing coverage of Musk v. Altman until its conclusion. Follow @techreview or @michelletomkim on X for up-to-the-minute reporting. 

Three reasons why DeepSeek’s new model matters

On Friday, Chinese AI firm DeepSeek released a preview of V4, its long-awaited new flagship model. Notably, the model can process much longer prompts than its last generation, thanks to a new design that helps it handle large amounts of text more efficiently. Like DeepSeek’s previous models, V4 is open source, meaning it is available for anyone to download, use, and modify.

V4 marks DeepSeek’s most significant release since R1, the reasoning model it launched in January 2025. R1, which was trained on limited computing resources, stunned the global AI industry with its strong performance and efficiency, turning DeepSeek from a little-known research team into China’s best-known AI company almost overnight. It also helped set off a wave of open-weight model releases from other Chinese AI firms. 

DeepSeek has kept a relatively low profile since then—but earlier this month, it effectively teased V4’s release when it added “expert” and “flash” modes to the online version of its model, prompting speculation that the updates were tied to a bigger upcoming release.

While the company has become a powerful symbol of China’s AI ambitions, its big return to frontier models comes after months of turbulence—including major personnel departures, delays to previous model launches, and growing scrutiny from both the US and Chinese governments. 

So, will V4 shake the AI field the way R1 did? Almost certainly not, but here are three big reasons why this release matters.

1. It breaks new ground for an open-source model.

As with R1 before it, DeepSeek claims that V4’s performance rivals the best models available at a fraction of the price. This is great news for developers and for companies using the tech, because it means they can access frontier AI capabilities on their own terms, and without worrying about skyrocketing costs.

The new model comes in two versions, both of which are available on DeepSeek’s website and in its app, with API access also open to developers. V4-Pro is a larger model built for coding and complex agent tasks, and V4-Flash is a smaller version designed to be faster and cheaper to run. Both versions offer reasoning modes, in which the model can carefully parse a user’s prompt and show each step as it works through the problem.

For V4-Pro, DeepSeek charges $1.74 per million input tokens and $3.48 per million output tokens, a fraction of the cost of comparable models from OpenAI and Anthropic. V4-Flash is even cheaper, at about $0.14 per million input tokens and about $0.28 per million output tokens, making it one of the cheapest top-tier models available and a very appealing option to build applications on.
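
At those rates, the arithmetic is easy to sanity-check. Here is a minimal Python sketch using the listed prices; the token counts are hypothetical, chosen only to illustrate the scale of a typical request:

    PRICES = {  # (input $/million tokens, output $/million tokens), as listed above
        "V4-Pro": (1.74, 3.48),
        "V4-Flash": (0.14, 0.28),
    }

    def request_cost(model, input_tokens, output_tokens):
        # Dollar cost of one request at the listed per-million-token rates.
        in_rate, out_rate = PRICES[model]
        return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

    # Hypothetical request: a 100,000-token prompt and a 5,000-token answer.
    for model in PRICES:
        print(model, round(request_cost(model, 100_000, 5_000), 4))
    # V4-Pro 0.1914, V4-Flash 0.0154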

In terms of performance, V4 is, perhaps unsurprisingly, a huge jump from R1—and it seems to be a strong alternative to just about all the latest big AI models. On the major benchmarks, according to results shared by the company, DeepSeek V4-Pro competes with leading closed-source models, matching the performance of Anthropic’s Claude-Opus-4.6, OpenAI’s GPT-5.4, and Google’s Gemini-3.1. And compared to other open-source models, such as Alibaba’s Qwen-3.5 or Z.ai’s GLM-5.1, DeepSeek V4 exceeds them all on coding, math, and STEM problems, making it one of the strongest open-source models ever released. 

DeepSeek also says that V4-Pro now ranks among the strongest open-source models on benchmarks for agentic coding tasks and performs well on other tests that measure the ability to work through multistep problems. Its writing ability and world knowledge also lead the field, according to benchmarking results shared by the company. 

In a technical report released alongside the model, DeepSeek shared results from an internal survey of 85 experienced developers: More than 90% included V4-Pro among their top model choices for coding tasks.

DeepSeek says it has specifically optimized V4 for popular agent frameworks such as Claude Code, OpenClaw, and CodeBuddy.

2. It delivers on a new approach to memory efficiency.

One of the key innovations of V4 is its long context window—the amount of text the model can process at once. Both versions can handle 1 million tokens, which is large enough to fit all three volumes of The Lord of the Rings and The Hobbit combined. The company says this context window size is now the default across all DeepSeek services and it matches what is offered by cutting-edge versions of models like Gemini and Claude. 

But it’s important to know not just that DeepSeek has made this leap, but how it did so. V4 makes significant architectural changes relative to the company’s previous models—especially in the attention mechanism, the feature of AI models that helps them understand each part of a prompt in relation to the rest. As the prompt text gets longer, these comparisons become much more costly, making attention one of the main bottlenecks for long-context models.

DeepSeek’s innovation was to make the model more selective about what it pays attention to. Instead of treating all earlier text as equally important, V4 compresses older information and focuses on the parts most likely to matter in the present moment, while still keeping nearby text in full so it does not miss important details. 
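
The general shape of that idea can be sketched in a few lines of Python. The toy below keeps a recent window of tokens in full, compresses older tokens into block averages, and attends only within the top-scoring blocks; the block size, mean-pooling, and top-k selection here are illustrative assumptions, not DeepSeek’s actual design:

    import numpy as np

    def toy_sparse_attention(q, keys, values, window=64, block=16, k_blocks=4):
        """q: (d,), keys/values: (n, d). Returns one attention-weighted value vector."""
        n, d = keys.shape
        recent = np.arange(max(0, n - window), n)      # keep nearby text in full
        older = keys[: max(0, n - window)]
        selected = np.array([], dtype=int)
        if len(older) >= block:
            # Compress older tokens into block means, score each block against q,
            # then attend only within the top-k scoring blocks.
            nb = len(older) // block
            blocks = older[: nb * block].reshape(nb, block, d).mean(axis=1)
            top = np.argsort(blocks @ q)[-k_blocks:]
            selected = np.concatenate(
                [np.arange(b * block, (b + 1) * block) for b in top])
        idx = np.concatenate([selected, recent]).astype(int)
        scores = keys[idx] @ q / np.sqrt(d)
        w = np.exp(scores - scores.max()); w /= w.sum()  # softmax over kept tokens
        return w @ values[idx]

    rng = np.random.default_rng(0)
    K = rng.normal(size=(1000, 32)); V = rng.normal(size=(1000, 32))
    print(toy_sparse_attention(rng.normal(size=32), K, V).shape)  # (32,)

The point of the sketch is the cost profile: instead of comparing the current token against all 1,000 earlier ones, the model scores 128 of them, which is where savings like those DeepSeek reports come from.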

DeepSeek says this sharply reduces the cost of using long context. In a 1-million-token context, V4-Pro uses only 27% of the computing power required by its previous model, V3.2, while cutting memory use to 10%. The reduction in V4-Flash is even larger, using just 10% of the computing power and 7% of the memory. In practice, this could make it cheaper to build tools that need to work across huge amounts of material, such as an AI coding assistant that can read an entire codebase or a research agent that can analyze a long archive of documents without constantly forgetting what came before.

DeepSeek’s interest in long context windows didn’t start with V4. Over the past year and a half, the company has quietly published a series of papers on how AI models “remember” information, experimenting with compression and mathematical techniques to extend what AI models could realistically handle.

3. It marks the first steps on the hard road away from Nvidia.

V4 is DeepSeek’s first model optimized for domestic Chinese chips, such as Huawei’s Ascend—a move that has turned the launch into something of a test of whether China’s homegrown AI industry can begin to loosen its dependence on US chip giant Nvidia. 

This was largely expected: The Information reported earlier this month that DeepSeek did not give American chipmakers like Nvidia and AMD early access to V4, even though prerelease access is commonly granted so chipmakers can optimize support for a new model ahead of launch. Instead, the company reportedly gave early access only to Chinese chipmakers. 

On Friday, Huawei said its Ascend supernode products, based on the Ascend 950 series, would support DeepSeek V4. This means that companies and individuals who want to run their own modified version of DeepSeek V4 will be able to do so easily on Huawei chips.

Reuters previously reported that Chinese government officials recommended that DeepSeek integrate Huawei chips in its training process. And this pressure fits a broader pattern in China’s industrial policy: Strategic sectors are often pushed, and sometimes effectively required, to align with national self-reliance goals. But there’s a particular urgency when it comes to AI. Since 2022, US export controls have cut Chinese firms off from Nvidia’s most powerful chips, and they later also restricted access to downgraded China-market versions. Beijing’s response has been to accelerate the push for a domestic AI stack, from chips to software frameworks to data centers.

Chinese authorities have reportedly been pushing data centers and public computing projects to use more domestic chips, including through reported bans on foreign-made chips, sourcing quotas, and requirements to pair Nvidia chips with Chinese alternatives from companies such as Huawei and Cambricon. 

Still, replacing Nvidia is not as simple as swapping one chip for another. Nvidia’s advantage lies not only in its chips, but in the software ecosystem developers have spent years building around them. Moving to Huawei’s Ascend chips means adapting model code, rebuilding tools, and proving that systems built around those chips are stable enough for serious use.

To be clear, DeepSeek does not appear to have fully moved beyond Nvidia. The company’s technical report reveals that it is using Chinese chips to run the model for inference, or when someone asks the model to complete a task. But Liu Zhiyuan, a computer science professor at Tsinghua University, told MIT Technology Review that DeepSeek appears to have adapted only part of V4’s training process for Chinese chips. The report does not say whether some key long-context features were adapted to domestic chips, so Liu says V4 may still have been trained mainly on Nvidia chips. Multiple sources who spoke on the condition of anonymity, due to political sensitivity around these issues, told MIT Technology Review that Chinese chips still don’t perform as well as Nvidia chips but are better suited for inference than training.

DeepSeek is also tying the future costs of V4 to this hardware shift. The company says V4-Pro prices could fall significantly after Huawei’s Ascend 950 supernodes begin shipping at scale in the second half of this year. 

If that works, V4 could be an early sign that China is successfully building a parallel AI infrastructure.

AI needs a strong data fabric to deliver business value

Artificial intelligence is moving quickly in the enterprise, from experimentation to everyday use. Organizations are deploying copilots, agents, and predictive systems across finance, supply chains, human resources, and customer operations. By the end of 2025, half of companies used AI in at least three business functions, according to a recent survey.

But as AI becomes embedded in core workflows, business leaders are discovering that the biggest obstacle is not model performance or computing power but the quality and the context of the data on which those systems rely. AI essentially introduces a new requirement: Systems must not only access data — they must understand the business context behind it. 

Without that context, AI can generate answers quickly but still make the wrong decision, says Irfan Khan, president and chief product officer of SAP Data & Analytics. 

“AI is incredibly good at producing results,” he says. “It moves fast, but without context it can’t exercise good judgment, and good judgment is what creates a return on investment for the business. Speed without judgment doesn’t help. It can actually hurt us.”

In the emerging era of autonomous systems and intelligent applications, that context layer is becoming essential. To provide context, companies need a well-designed data fabric that does more than just integrate data, Khan says. The right data fabric allows organizations to scale AI safely, coordinate decisions across systems and agents, and ensure that automation reflects real business priorities rather than making decisions in isolation. 

Recognizing this, many organizations are rethinking their data architecture. Instead of simply moving data into a single repository, they are looking for ways to connect information across applications, clouds, and operational systems while preserving the semantics that describe how the business works. That shift is driving growing interest in data fabric as a foundation for AI infrastructure.

Losing context is a critical AI problem

Traditional data strategies have largely focused on aggregation. Over the past two decades, organizations have invested heavily in extracting information from operational systems and loading it into centralized warehouses, lakes, and dashboards. This approach makes it easier to run reports, monitor performance, and generate insights across the business, but in the process, much of the meaning attached to that data — how it relates to policies, processes, and real-world decisions — is lost. 

Take two companies using AI to manage supply-chain disruptions. One uses raw signals such as inventory levels, lead times, and supplier scores; the other adds context from business processes, policies, and metadata. Both systems will rapidly analyze the data but will likely come to different conclusions. 

Information such as which customers are strategic accounts, what tradeoffs are acceptable during shortages, and the status of extended supply chains will allow one AI system to make strategic decisions, while the other will not have the proper context, Khan says. 

“Both systems move very quickly, but only one moves in the right direction,” he says. “This is the context premium and the advantage you gain when your data foundation preserves context across processes, policies and data by design.”

In the past, a lack of context was manageable because human experts implicitly supplied the missing information; with AI, that shortfall creates serious limitations. AI systems do not just display information; they act on it. If a system does not explain why data matters, an AI model may optimize for the wrong outcome. Inventory numbers, payment histories, or demand signals might be accurate, but they do not necessarily reveal which customers must be prioritized, which contractual obligations apply, or which products are strategically important. As a result, the system can produce answers that are technically correct but operationally flawed.

This realization is changing how companies think about AI readiness. Most acknowledge that they do not have the mature data processes and infrastructure in place to trust their data and their AI systems. Only one in five organizations consider their approach to data to be highly mature, and only 9% feel fully prepared to integrate and interoperate with their data systems.

Don’t consolidate, integrate

The emerging solution is a data fabric: An abstraction layer that spans infrastructure, architecture, and logical organization. For agentic AI, the fabric becomes the primary interface, allowing agents to interact with business knowledge rather than raw storage systems. Knowledge graphs play a central role, enabling agents to query enterprise data using natural language and business logic.

The value of the data fabric rests on three components: Intelligent compute to provide speed, a knowledge pool to provide business understanding and context, and agents to provide autonomous action grounded in that understanding. What makes this powerful is how these capabilities work together, says Khan. 

The technology provides the architecture — a foundation that makes agent-to-agent communication and coordination possible. Process defines how business and IT share ownership and establish governance, and people create a culture in which they trust the system enough to adopt it. All three must work together for a business data fabric to truly be successful.

“It empowers confident, consistent decisions, and when these elements all come together, AI doesn’t just analyze and interpret the data — it drives smarter, faster decisions that really create business impact,” he says. “This is the promise of a thoughtfully designed business data fabric, where every part reinforces the other, and every insight is grounded in trust and clarity.”

Technically, building a data-fabric layer requires several capabilities. Data must be accessible across multiple environments through federation rather than forced consolidation. A semantic or knowledge layer is needed to harmonize meaning across systems, often supported by knowledge graphs and catalog-driven metadata. Governance and policy enforcement must also operate across the fabric so that AI systems can access data securely and consistently.

Together, these elements create a foundation where AI interacts with business knowledge instead of raw storage systems — an essential step for moving from experimentation to real enterprise automation.
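
As a rough illustration of how those pieces fit together, here is a deliberately simplified Python sketch: a catalog acts as the semantic layer, a stub function stands in for federated access, and a role check stands in for policy enforcement. All names and data are hypothetical, not any vendor’s actual API:

    CATALOG = {
        # business term -> (source system, field, roles allowed to read it)
        "strategic_accounts": ("crm", "accounts.tier", {"planner", "analyst"}),
        "inventory_on_hand": ("erp", "stock.qty", {"planner"}),
    }

    def fetch(source, field):
        # Stand-in for a federated query pushed down to a remote system;
        # no data is copied into a central store.
        fake_data = {("crm", "accounts.tier"): ["gold", "silver"],
                     ("erp", "stock.qty"): [120, 40]}
        return fake_data[(source, field)]

    def query(term, role):
        # Resolve a business term through the semantic layer, enforcing policy.
        source, field, allowed = CATALOG[term]
        if role not in allowed:
            raise PermissionError(f"role '{role}' may not read '{term}'")
        return fetch(source, field)

    print(query("strategic_accounts", "analyst"))  # ['gold', 'silver']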

Beyond data isolation and dashboards

In the emerging era of agentic AI, the responsibility for monitoring, analyzing, and making decisions based on data increasingly shifts to software. AI agents can monitor events, trigger workflows, and make decisions in real time, often without direct human intervention. That speed creates new opportunities, but it also raises the stakes. When multiple agents operate across finance, supply chain, procurement, or customer operations, they must be guided by the same understanding of business priorities.

Without a common knowledge layer connecting disparate data together, coordination between systems quickly breaks down. One system might optimize for margin, another for liquidity, and another for compliance, each working from a different slice of data. 

Importantly, most enterprises already possess much of the knowledge needed to make this work, says Khan. Years of operational data, master data, workflows, and policy logic already exist across business applications — companies just need to make it accessible. Companies that deploy data fabrics gain greater trust in their data, with more than two-thirds of enterprises seeing improved data accessibility and visibility and gaining more control over their data. 

“The opportunity isn’t just inventing context from scratch, it’s activating and connecting the context across your business that already exists,” he continues, adding that a data fabric is the “architecture that ensures data semantics, business processes and policies are connected as a unified system across all the clouds.”

This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff. It was researched, designed, and written by human writers, editors, analysts, and illustrators. This includes the writing of surveys and collection of data for surveys. AI tools that may have been used were limited to secondary production processes that passed thorough human review.

Chinese tech workers are starting to train their AI doubles—and pushing back

Tech workers in China are being instructed by their bosses to train AI agents to replace them—and it’s prompting a wave of soul-searching among otherwise enthusiastic early adopters. 

Earlier this month a GitHub project called Colleague Skill, which claimed workers could use it to “distill” their colleagues’ skills and personality traits and replicate them with an AI agent, went viral on Chinese social media. Though the project was created as a spoof, it struck a nerve among tech workers, a number of whom told MIT Technology Review that their bosses are encouraging them to document their workflows in order to automate specific tasks and processes using AI agent tools like OpenClaw or Claude Code. 

To set up Colleague Skill, a user names the coworker whose tasks they want to replicate and adds basic profile details. The tool then automatically imports chat history and files from Lark and DingTalk, both popular workplace apps in China, and generates reusable manuals describing that coworker’s duties—and even their unique quirks—for an AI agent to replicate. 

Colleague Skill was created by Tianyi Zhou, who works as an engineer at the Shanghai Artificial Intelligence Laboratory. Earlier this week he told Chinese outlet Southern Metropolis Daily that the project was started as a stunt, prompted by AI-related layoffs and by the growing tendency of companies to ask employees to automate themselves. He didn’t respond to requests for further comment.

Internet users have found humor in the idea behind the tool, joking about automating their coworkers before themselves. However, Colleague Skill’s virality has sparked a lot of debate about workers’ dignity and individuality in the age of AI.

After seeing Colleague Skill on social media, Amber Li, 27, a tech worker in Shanghai, used it to recreate a former coworker as a personal experiment. Within minutes, the tool created a file detailing how that person did their job. “It is surprisingly good,” Li says. “It even captures the person’s little quirks, like how they react and their punctuation habits.” With this skill, Li can use an AI agent as a new “coworker” that helps debug her code and replies instantly. It felt uncanny and uncomfortable, Li says. 

Even so, replacing coworkers with agents could become the norm. Since OpenClaw became a national craze, bosses in China have been pushing tech workers to experiment with agents. 

Although AI agents can take control of your computer, read and summarize news, reply to emails, and book restaurant reservations for you, tech workers on the ground say their utility has so far proven to be limited in business contexts. Asking employees to make manuals describing the minutiae of their day-to-day jobs the way Colleague Skill does is one way to help bridge that gap. 

Hancheng Cao, an assistant professor at Emory University who studies AI and work, believes that companies have good reasons to push employees to create work blueprints like these, beyond simply following a trend. “Firms gain not only internal experience with the tools, but also richer data on employee know-how, workflows, and decision patterns. That helps companies see which parts of work can be standardized or codified into systems, and which still depend on human judgment,” he says.

To employees, though, making agents or even blueprints for them can feel strange and alienating. One software engineer, who spoke with MIT Technology Review anonymously because of concerns about their job security, trained an AI (not Colleague Skill) on their workflow and found that the process felt reductive—as if their work had been flattened into modules in a way that made them easier to replace. On social media, workers have turned to bleak humor to express similar feelings. In one comment on Rednote, a user wrote that “a cold farewell can be turned into warm tokens,” quipping that if they use Colleague Skill to distill their coworkers into tasks first, they themselves might survive a little longer.

The push for creating agents has also spurred clever countermeasures. Irritated by the idea of reducing a person to a skill, Koki Xu, 26, an AI product manager in Beijing, published an “anti-distillation” skill on GitHub on April 4. The tool, which took Xu about an hour to build, is designed to sabotage the process of creating workflows for agents. Users can choose between light, medium, and heavy sabotage modes depending on how closely their boss is observing the process, and the agent rewrites the material into generic, non-actionable language that would produce a less useful AI stand-in. A video Xu posted about the project went viral, drawing more than 5 million likes across platforms.

Xu told MIT Technology Review that she has been following the Colleague Skill trend from the start and that it has made her think about alienation, disempowerment, and broader implications for labor. “I originally wanted to write an op-ed, but decided it would be more useful to make something that pushes back against it,” she says.

Xu, who has undergraduate and master’s degrees in law, said the trend also raises legal questions. While a company may be able to argue that work chat histories and materials created on a work laptop are corporate property, a skill like this can also capture elements of personality, tone, and judgment, making ownership much less clear. She said she hopes Colleague Skill prompts more discussion about how to protect workers’ dignity and identity in the age of AI. “I believe it’s important to keep up with these trends so we (employees) can participate in shaping how they are used,” she says. Xu herself is an avid AI adopter, with seven OpenClaw agents set up across her personal and work devices.

Li, the tech worker in Shanghai, says her company has not yet found a way to replace actual workers with AI tools, largely because they remain unreliable and require constant supervision. “I don’t feel like my job is immediately at risk,” she says. “But I do feel that my value is being cheapened, and I don’t know what to do about it.”

How robots learn: A brief, contemporary history

Roboticists used to dream big but build small. They’d hope to match or exceed the extraordinary complexity of the human body, and then they’d spend their careers refining robotic arms for auto plants. Aim for C-3PO; end up with the Roomba. 

The real ambition for many of these researchers was the robot of science fiction—one that could move through the world, adapt to different environments, and interact safely and helpfully with people. For the socially minded, such a machine could help those with mobility issues, ease loneliness, or do work too dangerous for humans. For the more financially inclined, it would mean a bottomless source of wage-free labor. Either way, a long history of failure left most of Silicon Valley hesitant to bet on helpful robots.

That has changed. The machines are yet unbuilt, but the money is flowing: Companies and investors put $6.1 billion into humanoid robots in 2025 alone, four times what was invested in 2024. 

What happened? A revolution in how machines have learned to interact with the world. 

Imagine you’d like a pair of robot arms installed in your home purely to do one thing: fold clothes. How would it learn to do that? You could start by writing rules. Check the fabric to figure out how much deformation it can tolerate before tearing. Identify a shirt’s collar. Move the gripper to the left sleeve, lift it, and fold it inward by exactly this distance. Repeat for the right sleeve. If the shirt is rotated, turn the plan accordingly. If the sleeve is twisted, correct it. Very quickly the number of rules explodes, but a complete accounting of them could produce reliable results. This was the original craft of robotics: anticipating every possibility and encoding it in advance.
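
To see why the rules multiply, consider a toy version in Python. This is an illustrative sketch of the hand-coded approach, with hypothetical garment states and stubbed-out helpers, not any real robot’s control code:

    # Stub helpers standing in for real motion primitives.
    def rotate_to_canonical(shirt): shirt["rotated"] = False
    def untwist(shirt, sleeve): shirt[sleeve] = "flat"
    def fold_sleeve(shirt, side, force): print(f"fold {side} sleeve, grip={force}")

    def fold_shirt(shirt):
        if shirt["rotated"]:                      # one rule per misalignment...
            rotate_to_canonical(shirt)
        if shirt["left_sleeve"] == "twisted":     # ...per sleeve state...
            untwist(shirt, "left_sleeve")
        grip = 0.5 if shirt["stretchy"] else 1.0  # ...per fabric type, and so on.
        fold_sleeve(shirt, "left", grip)
        fold_sleeve(shirt, "right", grip)

    fold_shirt({"rotated": True, "left_sleeve": "twisted", "stretchy": False})

Every new garment, fabric, or failure mode demands another branch, which is exactly the explosion the paragraph above describes.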

Around 2015, the cutting edge started to do things differently: Build a digital simulation of the robotic arms and the clothes, and give the program a reward signal every time it folds successfully and a ding every time it fails. This way, it gets better by trying all sorts of techniques through trial and error, with millions of iterations—the same way AI got good at playing games.
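
A stripped-down version of that loop fits in a few lines. In the hypothetical sketch below, the “robot” learns a single fold distance by jittering its current best guess and keeping whatever earns more simulated reward; real systems learn far richer policies over millions of iterations:

    import random

    IDEAL_FOLD = 0.42  # hidden property of the simulated shirt

    def simulate(fold_distance):
        # Reward: 1 for a perfect fold, falling off as the fold gets worse.
        return 1.0 - abs(fold_distance - IDEAL_FOLD)

    best, best_reward = 0.0, simulate(0.0)
    for trial in range(10_000):                    # millions, in real systems
        candidate = best + random.gauss(0, 0.05)   # jitter the current policy
        reward = simulate(candidate)
        if reward > best_reward:                   # keep what works: trial and error
            best, best_reward = candidate, reward

    print(f"learned fold distance: {best:.3f}")    # converges near 0.42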

The arrival of ChatGPT in 2022 catalyzed the current boom. Trained on vast amounts of text, large language models work not through trial and error but by learning to predict what word should come next in a sentence. Similar models adapted to robotics were soon able to absorb pictures, sensor readings, and the position of a robot’s joints and predict the next action the machine should take, issuing dozens of motor commands every second.
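
Conceptually, that prediction step looks like next-word prediction with actions in place of words. The toy sketch below uses a simple frequency table instead of a transformer, with made-up observation and command tokens, just to show the input-to-next-action framing:

    from collections import Counter, defaultdict

    # (observation, previous command) -> next motor command, from logged demos.
    demos = [
        (("cup_left", "idle"), "move_left"),
        (("cup_left", "move_left"), "close_gripper"),
        (("cup_right", "idle"), "move_right"),
        (("cup_right", "move_right"), "close_gripper"),
        (("cup_left", "idle"), "move_left"),
    ]

    counts = defaultdict(Counter)
    for state, action in demos:
        counts[state][action] += 1

    def next_action(observation, prev_command):
        # Pick the most likely next command given the current context.
        return counts[(observation, prev_command)].most_common(1)[0][0]

    print(next_action("cup_left", "idle"))  # move_left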

This conceptual shift—to reliance on AI models that ingest large amounts of data—seems to work whether that helpful robot is supposed to talk to people, move through an environment, or even do complicated tasks. And it was paired with other ideas about how to accomplish this new way of learning, like deploying robots even if they aren’t yet perfect so they can learn from the environment they’re meant to work in. Today, Silicon Valley roboticists are dreaming big again. Here’s how that happened. 


Jibo

A movable social robot carried out conversations long before the age of LLMs.

An MIT robotics researcher named Cynthia Breazeal introduced an armless, legless, faceless robot called Jibo to the world in 2014. It looked, in fact, like a lamp. Breazeal’s aim was to create a social robot for families, and the idea pulled in $3.7 million in a crowdfunding campaign. Early preorders cost $749.

The early Jibo could introduce itself and dance to entertain kids, but that was about it. The vision was always for it to become a sort of embodied assistant that could handle everything from scheduling and emails to telling stories. It earned a number of devoted users, but ultimately the company shut down in 2019.

A crowdfunding campaign started in 2014 and drew 4,800 Jibo preorders.
COURTESY OF MIT MEDIA LAB

In retrospect, one thing that Jibo really needed was better language capabilities. It was competing against Apple’s Siri and Amazon’s Alexa, and all those technologies at the time relied on heavy scripting. In broad terms, when you spoke to them, software would translate your speech into text, analyze what you wanted, and create a response pulled from preapproved snippets. Those snippets could be charming, but they were also repetitive and simply boring—downright robotic. That was especially a challenge for a robot that was supposed to be social and family oriented. 

What has happened since, of course, is a revolution in how machines can generate language. Voice mode from any leading AI provider is now engaging and impressive, and multiple hardware startups are trying (and failing) to build products that take advantage of it. 

But that comes with a new risk: While scripted conversations can’t really go off the rails, ones generated by AI certainly can. Some popular AI toys have, for example, talked to kids about how to find matches and knives. 


Dactyl

A robot hand trained with simulations tries to model the unpredictability and variation of the real world.

By 2018, every leading robotics lab was trying to scrap the old scripted rules and train robots through trial and error. OpenAI tried to train its robotic hand, Dactyl, virtually—with digital models of the hand and of the palm-size cubes Dactyl was supposed to manipulate. The cubes had letters and numbers on their faces; the model might set a task like “Rotate the cube so the red side with the letter O faces upward.”

Here’s the problem: A robotic hand might get really good at doing this in its simulated world, but when you take that program and ask it to work on a real version in the real world, the slight differences between the two can cause things to go awry. Colors might be slightly different, or the deformable rubber in the robot’s fingertips could turn out to be stretchier than it was in simulation.

Dactyl, part of OpenAI’s first attempt at robotics, was trained in simulation to solve Rubik’s Cubes.
COURTESY OF OPENAI

The solution is called domain randomization. You essentially create millions of simulated worlds that all vary slightly and randomly from one another. In each one the friction might be less, or the lighting more harsh, or the colors darkened. Exposure to enough of this variation means the robots will be better able to manipulate the cube in the real world. The approach worked on Dactyl, and a year later OpenAI was able to use the same core techniques to do something harder: solving Rubik’s Cubes (though it worked only 60% of the time, and just 20% when the scrambles were particularly hard). 
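
In code, the core of domain randomization is almost disappointingly simple: sample a new set of physical parameters for every training run. The Python sketch below is illustrative, and the parameter ranges are invented:

    import random

    def make_random_world(seed):
        rng = random.Random(seed)
        return {
            "friction": rng.uniform(0.2, 1.2),           # slipperier or stickier
            "fingertip_stretch": rng.uniform(0.8, 1.3),  # softer or stiffer rubber
            "light_intensity": rng.uniform(0.5, 1.5),    # dimmer or harsher light
            "color_shift": rng.uniform(-0.1, 0.1),       # slightly off colors
        }

    # Train the same policy across millions of slightly different worlds;
    # here we just show the variation.
    for seed in range(3):
        print(make_random_world(seed))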

Still, the limits of simulation mean that this technique plays a far smaller role today than it did in 2018. OpenAI shuttered its robotics effort in 2021 but has recently started the division up again—reportedly focusing on humanoids. 


RT-2

Training on images from across the internet helps robots translate language into action.

Around 2022, Google’s robotics team was up to some strange things. It spent 17 months handing people robot controllers and filming them doing everything from picking up bags of chips to opening jars. The team ended up cataloguing 700 different tasks.

The point was to build and test one of the first large-scale foundation models for robotics. As with large language models, the idea was to input lots of text, tokenize it into a format an algorithm could work with, and then generate an output. Google’s RT-1 received input about what the robot was looking at and how the many parts of the robotic arm were positioned; then it took an instruction and translated it into motor commands to move the robot. When it had seen tasks before, it carried out 97% of them successfully; it succeeded at 76% of the instructions it hadn’t seen before. 

The model RT-2, for Robotic Transformer 2, incorporated internet data to help robots process what they were seeing.
COURTESY OF GOOGLE DEEPMIND

The second iteration, RT-2, came out the following year and went even further. Instead of training on data specific to robotics, it went broad: It trained on more general images from across the internet, like the vision-language models lots of researchers were working on at the time. That allowed the robot to interpret where certain objects were in the scene.

“All these other things were unlocked,” says Kanishka Rao, a roboticist at Google DeepMind who led work on both iterations. “We could do things now like ‘Put the Coke can near the picture of Taylor Swift.’” 

In 2025, Google DeepMind further fused the worlds of large language models and robotics, releasing a Gemini Robotics model with improved ability to understand commands in natural language. 


RFM-1

An AI model that allows robotic arms to act like coworkers.

In 2017, before OpenAI shuttered its first robotics team, a group of its engineers spun out a project called Covariant, aiming to build not sci-fi humanoids but the most pragmatic of all robots: an arm that could pick up and move things in warehouses. After building a system based on foundation models similar to Google’s, Covariant deployed this platform in warehouses like those operated by Crate & Barrel and treated it as a data collection pipeline. 

By 2024, Covariant had released a robotics model, RFM-1, that you could interact with like a coworker. If you showed an arm many sleeves of tennis balls, for example, you could then instruct it to move each sleeve to a separate area. And the robot could respond—perhaps predicting that it wouldn’t be able to get a good grip on the item and then asking for advice on which particular suction cups it should use. 

This sort of thing had been done in experiments, but Covariant was launching it at significant scale. The company now had cameras and data collection machines in every customer location, feeding back even more data for the model to train on.

A Covariant robot demonstrates “induction”—the common warehouse task of placing objects on sorters or conveyors.
COURTESY OF COVARIANT

It wasn’t perfect. In a demo in March 2024 with an array of kitchen items, the robot struggled when it was asked to “return the banana” to its original location. It picked up a sponge, then an apple, then a host of other items before it finally accomplished the task. 

It “doesn’t understand the new concept” of retracing its steps, cofounder Peter Chen told me at the time. “But it’s a good example—it might not work well yet in the places where you don’t have good training data.”

Chen and fellow founder Pieter Abbeel were soon hired by Amazon, which is currently licensing Covariant’s robotics model (Amazon did not respond to questions about how it’s being used, but the company runs an estimated 1,300 warehouses in the US alone). 


Digit

Companies are putting this humanoid to the test in real-world settings.

The new investment dollars flowing to robotics startups are aimed largely at robots shaped not like lamps or arms but like people. Humanoid robots are supposed to be able to seamlessly enter the spaces and jobs where humans currently work, avoiding the need to retool assembly lines to accommodate new shapes such as giant arms. 

It’s easier said than done. In the rare cases where humanoids appear in real warehouses, they’re often confined to test zones and pilot programs. 

Amazon and other companies are using Digit to help move shipping totes.
COURTESY OF AGILITY ROBOTICS

That said, Agility’s humanoid Digit appears to be doing some real work. The design—with exposed joints and a distinctly unhuman head—is driven more by function than by sci-fi aesthetics. Amazon, Toyota, and GXO (a logistics giant with customers like Apple and Nike) have all deployed it—making it one of the first examples of a humanoid robot that companies see as providing actual cost savings rather than novelty. Their Digits spend their days picking up, moving, and stacking shipping totes.

The current Digit is still a long way from the humanlike helper Silicon Valley is betting on, though. It can lift only 35 pounds, for example—and every time Agility makes Digit stronger, its battery gets heavier and it has to recharge more often. And standards organizations say humanoids need stricter safety rules than most industrial robots, because they’re designed to be mobile and spend time in proximity to people. 

But Digit shows that this revolution in robot training isn’t converging on a single method. Agility relies on simulation techniques like those OpenAI used to train its hand, and the company has worked with Google’s Gemini models to help its robots adapt to new environments. That’s where more than a decade of experiments have gotten the industry: Now it’s building big.

Why having “humans in the loop” in an AI war is an illusion

The availability of artificial intelligence for use in warfare is at the center of a legal battle between Anthropic and the Pentagon. This debate has become urgent, with AI playing a bigger role than ever before in the current conflict with Iran. AI is no longer just helping humans analyze intelligence. It is now an active player—generating targets in real time, controlling and coordinating missile interceptions, and guiding lethal swarms of autonomous drones.

Most of the public conversation regarding the use of AI-driven autonomous lethal weapons centers on how much humans should remain “in the loop.” Under the Pentagon’s current guidelines, human oversight supposedly provides accountability, context, and nuance while reducing the risk of hacking.

AI systems are opaque “black boxes”

But the debate over “humans in the loop” is a comforting distraction. The immediate danger is not that machines will act without human oversight; it is that human overseers have no idea what the machines are actually “thinking.” The Pentagon’s guidelines are fundamentally flawed because they rest on the dangerous assumption that humans understand how AI systems work.

Having studied intentions in the human brain for decades and in AI systems more recently, I can attest that state-of-the-art AI systems are essentially “black boxes.” We know the inputs and outputs, but the artificial “brain” processing them remains opaque. Even their creators cannot fully interpret them or understand how they work. And when AIs do provide reasons, they are not always trustworthy.

The illusion of human oversight in autonomous systems

In the debate over human oversight, a fundamental question is going unasked: Can we understand what an AI system intends to do before it acts?

Imagine an autonomous drone tasked with destroying an enemy munitions factory. The automated command and control system determines that the optimal target is a munitions storage building. It reports a 92% probability of mission success because secondary explosions of the munitions in the building will thoroughly destroy the facility. A human operator reviews the legitimate military objective, sees the high success rate, and approves the strike.

But what the operator does not know is that the AI system’s calculation included a hidden factor: Beyond devastating the munitions factory, the secondary explosions would also severely damage a nearby children’s hospital. The emergency response would then focus on the hospital, ensuring the factory burns down. To the AI, maximizing disruption in this way meets its given objective. But to a human, it is potentially committing a war crime by violating the rules protecting civilian life. 

Keeping a human in the loop may not provide the safeguard people imagine, because the human cannot know the AI’s intention before it acts. Advanced AI systems do not simply execute instructions; they interpret them. If operators fail to define their objectives carefully enough—a highly likely scenario in high-pressure situations—the “black box” system could be doing exactly what it was told and still not acting as humans intended.

This “intention gap” between AI systems and human operators is precisely why we hesitate to deploy frontier black-box AI in civilian health care or air traffic control, and why its integration into the workplace remains fraught—yet we are rushing to deploy it on the battlefield.

To make matters worse, if one side in a conflict deploys fully autonomous weapons, which operate at machine speed and scale, the pressure to remain competitive would push the other side to rely on such weapons too. This means the use of increasingly autonomous—and opaque—AI decision-making in war is only likely to grow.

The solution: Advance the science of AI intentions

The science of AI must comprise both building highly capable AI technology and understanding how this technology works. Huge advances have been made in developing and building more capable models, driven by record investments—forecast by Gartner to grow to around $2.5 trillion in 2026 alone. In contrast, the investment in understanding how the technology works has been minuscule.

We need a massive paradigm shift. Engineers are building increasingly capable systems. But understanding how these systems work is not just an engineering problem—it requires an interdisciplinary effort. We must build the tools to characterize, measure, and intervene in the intentions of AI agents before they act. We need to map the internal pathways of the neural networks that drive these agents so that we can build a true causal understanding of their decision-making, moving beyond merely observing inputs and outputs. 

A promising way forward is to combine techniques from mechanistic interpretability (breaking neural networks down into human-understandable components) with insights, tools, and models from the neuroscience of intentions. Another idea is to develop transparent, interpretable “auditor” AIs designed to monitor the behavior and emergent goals of more capable black-box systems in real time.  

Developing a better understanding of how AI functions will enable us to rely on AI systems for mission-critical applications. It will also make it easier to build more efficient, more capable, and safer systems.

Colleagues and I are exploring how ideas from neuroscience, cognitive science, and philosophy—fields that study how intentions arise in human decision-making—might help us understand the intentions of artificial systems. We must prioritize these kinds of interdisciplinary efforts, including collaborations between academia, government, and industry.

However, we need more than just academic exploration. The tech industry—and the philanthropists funding AI alignment, which strives to encode human values and goals into these models—must direct substantial investments toward interdisciplinary interpretability research. Furthermore, as the Pentagon pursues increasingly autonomous systems, Congress must mandate rigorous testing of AI systems’ intentions, not just their performance.

Until we achieve that, human oversight over AI may be more illusion than safeguard.

Uri Maoz is a cognitive and computational neuroscientist specializing in how the brain transforms intentions into actions. A professor at Chapman University with appointments at UCLA and Caltech, he leads an interdisciplinary initiative focused on understanding and measuring intentions in artificial intelligence systems (ai-intentions.org).

Making AI operational in constrained public sector environments

The AI boom has swept across industries, and public sector organizations are facing pressure to accelerate adoption. At the same time, government institutions face distinct constraints around security, governance, and operations that set them apart from their business counterparts. For this reason, purpose-built small language models (SLMs) offer a promising path to operationalizing AI in these environments.  

A Capgemini study found that 79 percent of public sector executives globally are wary of AI’s data security, an understandable figure given the heightened sensitivity of government data and the legal obligations surrounding its use. As Han Xiao, vice president of AI at Elastic, says, “Government agencies must be very restricted about what kind of data they send to the network. This sets a lot of boundaries on how they think about and manage their data.”

The fundamental need for control over sensitive information is one of many factors complicating AI deployment, particularly when compared against the private sector’s standard operational assumptions.

Unique operational challenges

When private-sector entities expand AI, they typically assume certain conditions will be in place, including continuous connectivity to the cloud, reliance on centralized infrastructure, acceptance of incomplete model transparency, and limited restrictions on data movement. For many state institutions, however, accepting these conditions could be anything from dangerous to impossible. 

Government agencies must ensure that their data stays under their control, that information can be checked and verified, and that operational disruptions are kept to an absolute minimum. At the same time, they often have to run their systems in environments where internet connectivity is limited, unreliable, or unavailable. These complexities prevent many promising public sector AI pilots from moving beyond experimentation. “Many people undervalue the operating challenge of AI,” Xiao says. “The public sector needs AI to perform reliably on all kinds of data, and then to be able to grow without breaking. Continuity of operations is often underestimated.” An Elastic survey of public sector leaders found that 65 percent struggle to use data continuously in real time and at scale. 

Infrastructure constraints compound the problem. Government organizations may also struggle to obtain the graphics processing units (GPUs) used to train and access complex AI models. As Xiao points out, “Government doesn’t often purchase GPUs, unlike the private sector—they’re not used to managing GPU infrastructure. So accessing a GPU to run the model is a bottleneck for much of the public sector.” 

A smaller, more practical model

The many nonnegotiable requirements in the public sector make large language models (LLMs) untenable. But SLMs can be housed locally, offering greater security and control. SLMs are specialized AI models that typically use billions rather than hundreds of billions of parameters, making them far less computationally demanding than the largest LLMs.

The public sector does not need to build ever-larger models housed in offsite, centralized locations. An empirical study found that SLMs performed as well as or better than LLMs. SLMs allow sensitive information to be used effectively and efficiently while avoiding the operational complexity of maintaining large models. Xiao puts it this way: “It is easy to use ChatGPT to do proofreading. It’s very difficult to run your own large language models just as smoothly in an environment with no network access.” 

SLMs are purpose-built for the needs of the department or agency that will use them. The data is stored securely outside the model and is accessed only when queried. Carefully engineered prompts ensure that only the most relevant information is retrieved, producing more accurate responses. Using methods such as smart retrieval, vector search, and verifiable source grounding, AI systems can be built that cater to public sector needs. 
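
In outline, that retrieve-then-answer pattern can be sketched in a few lines of Python. The bag-of-words “embeddings,” documents, and prompt format below are toy stand-ins for a real vector search and a locally hosted SLM:

    import math, re
    from collections import Counter

    DOCS = {
        "procurement-2024.txt": "Invoices above 10,000 euros need two approvals.",
        "records-policy.txt": "Case files are retained for seven years.",
    }

    def embed(text):
        # Toy bag-of-words "vector"; a real system would use a neural embedding.
        return Counter(re.findall(r"[a-z0-9]+", text.lower()))

    def cosine(a, b):
        dot = sum(a[w] * b[w] for w in a)
        norm = (math.sqrt(sum(v * v for v in a.values()))
                * math.sqrt(sum(v * v for v in b.values())))
        return dot / norm if norm else 0.0

    def retrieve(query):
        # Return the most similar document; the data never enters the model.
        q = embed(query)
        return max(DOCS.items(), key=lambda kv: cosine(q, embed(kv[1])))

    source, passage = retrieve("how many approvals for large invoices?")
    prompt = f"Answer ONLY from this source.\n[{source}] {passage}\nQuestion: ..."
    print(prompt)  # this grounded prompt is what gets sent to the local SLM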

Thus, the next phase of AI adoption in the public sector may be to bring the AI tool to the data, rather than sending the data out into the cloud. Gartner predicts that by 2027, small, specialized AI models will be used three times more than LLMs.

Superior search capabilities

“When people in the public sector hear AI, they probably think about ChatGPT. But we can be much more ambitious,” says Xiao. “AI can revolutionize how the government searches and manages the large amounts of data they have.”

Looking beyond chatbots reveals one of AI’s most immediate opportunities: dramatically improved search. Like many organizations, the public sector has mountains of unstructured data—including technical reports, procurement documents, minutes, and invoices. Today’s AI, however, can deliver results sourced from mixed media, like readable PDFs, scans, images, spreadsheets, and recordings, and in multiple languages. All of this can be indexed by SLM-powered systems to provide tailored responses and to draft complex texts in any language, while ensuring outputs are legally compliant. “The public sector has a lot of data, and they don’t always know how to use this data. They don’t know what the possibilities are,” says Xiao.

Even more powerful, AI can help government employees interpret the data they access. “Today’s AI can provide you with a completely new view of how to harness that data,” says Xiao. A well-trained SLM can interpret legal norms, extract insights from public consultations, support data-driven executive decision-making, and improve public access to services and administrative information. This can contribute to dramatic improvements in how the public sector conducts its operations.

The small-language promise

Focusing on SLMs shifts the conversation from how comprehensive the model can be to how efficient it is. LLMs incur significant performance and computational costs and require specialized hardware that many public entities cannot afford. Although SLMs require some capital expense, they are less resource-intensive than LLMs, so they tend to be cheaper to run and have a smaller environmental impact. 

Public sector agencies often face stringent audit requirements, and SLM algorithms can be documented and certified as transparent. Some countries, particularly in Europe, also have privacy regulations such as GDPR that SLMs can be designed to meet.

Tailored training data produces more targeted results, reducing errors, bias, and hallucinations that AI is prone to. As Xiao puts it, “Large language models generate text based on what they were trained on, so there is a cut-off date when they were trained. If you ask about anything after that, it will hallucinate. We can solve this by forcing the model to work from verified sources.”

Risks are also minimized by keeping data on local servers, or even on a specific device. This isn’t about isolation but about strategic autonomy to enable trust, resilience, and relevance.

By prioritizing task-specific models designed for environments that process data locally, and by continuously monitoring performance and impact, public sector organizations can build lasting AI capabilities that support real-world decisions. “Do not start with a chatbot; start with search,” Xiao advises. “Much of what we think of as AI intelligence is really about finding the right information.”

This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff. It was researched, designed, and written by human writers, editors, analysts, and illustrators. This includes the writing of surveys and collection of data for surveys. AI tools that may have been used were limited to secondary production processes that passed thorough human review.

Treating enterprise AI as an operating layer

There’s a fault line running through enterprise AI, and it’s not the one getting the most attention. The public conversation still tracks foundation models and benchmarks—GPT versus Gemini, reasoning scores, and marginal capability gains. But in practice, the more durable advantage is structural: who owns the operating layer where intelligence is applied, governed, and improved. One model treats AI as an on-demand utility; the other embeds it as an operating layer—the combination of operations software, data capture, feedback loops, and governance that sits between models and real work—that compounds with use.

Model providers like OpenAI and Anthropic sell intelligence as a service: you have a problem, you call an API, you get an answer. That intelligence is general-purpose, largely stateless, and only loosely connected to the day-to-day operations where decisions are made. It’s highly capable and increasingly interchangeable. The distinction that matters is whether intelligence resets on every prompt or accumulates over time.

Incumbent organizations, by contrast, can treat AI as an operating layer: instrumentation across operations, feedback loops from human decisions, and governance that turns individual tasks into reusable policy. In that setup, every exception, correction, and approval becomes a chance to learn—and intelligence can improve as the platform absorbs more of the organization’s work. The organizations most likely to shape the enterprise AI era are those that can embed intelligence directly into operational platforms and instrument those platforms so work generates usable signals.

The prevailing narrative says nimble startups will out-innovate incumbents by building AI-native from scratch. If AI is primarily a model problem, that story holds. But in many enterprise domains, AI is a systems problem—integrations, permissions, evaluation, and change management—where advantage accrues to whoever already sits inside high-volume, high-stakes operations and converts that position into learning and automation.

The inversion: AI executes, humans adjudicate

Traditional services organizations are built on a simple architecture: humans use software to do expert work. Operators log into systems, navigate operations, make decisions, and process cases. Technology is the medium. Human judgment is the product.

An AI-native platform inverts this. It ingests a problem, applies accumulated domain knowledge, executes autonomously what it can with high confidence, and routes targeted sub-tasks to human experts when the situation demands judgment that the system can’t yet reliably provide.

But inverting human-AI interaction isn’t just a UI redesign—it requires raw material. It’s only possible when the platform is built on a foundation of domain expertise, behavioral data, and operational knowledge accumulated over years.

The three compounding assets incumbents already own

AI-native startups begin with a clean architectural slate and can move quickly. What they can’t easily manufacture is the raw material that makes domain AI defensible at scale:

  • Proprietary operational data
  • A large workforce of domain experts whose day-to-day decisions generate training signals
  • Accumulated tacit knowledge about how complex work actually gets done

Services companies already have all three. But these ingredients aren’t moats on their own. They become an advantage only when a company can systematically convert messy operations into AI-ready signals and institutional knowledge—then feed the results back into operations so the system keeps improving.

Codifying expertise into reusable signals

In most services organizations, expertise is tacit and perishable. The best operators know things they cannot easily articulate: heuristics developed over years, edge-case intuitions, and pattern recognition that operates below the level of conscious reasoning.

At Ensemble, the strategy for addressing this challenge is knowledge distillation: the systematic conversion of expert judgment and operational decisions into machine-readable training signals.

In health-care revenue cycle management, for example, systems can be seeded with explicit domain knowledge and then deepen their coverage through structured daily interaction with operators. In Ensemble’s implementation, the system identifies gaps, formulates targeted questions, and cross-checks answers across multiple experts to capture both consensus and edge-case nuance. It then synthesizes these inputs into a living knowledge base that reflects the situational reasoning behind expert-level performance.
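
One way to picture the cross-checking step is as a simple vote among experts, with consensus becoming a training label and disagreement flagged for review. The Python sketch below is purely illustrative, with made-up questions and thresholds, and is not Ensemble’s implementation:

    from collections import Counter

    answers = {"deny or appeal when payer X rejects code 99213?":
               ["appeal", "appeal", "appeal", "deny_if_under_100"]}

    for question, votes in answers.items():
        tally = Counter(votes)
        consensus, n = tally.most_common(1)[0]
        agreement = n / len(votes)
        # High agreement becomes a label; split opinions capture edge-case nuance.
        label = consensus if agreement >= 0.75 else "needs-review"
        print(question, "->", label, f"(agreement {agreement:.0%})")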

Turning decisions into a learning flywheel

Once a system is constrained enough to be trusted, the next question is how it gets better without waiting for annual model upgrades. Every time a skilled operator makes a decision, they generate more than a completed task. They generate a potential labeled example—context paired with an expert action (and sometimes an outcome). At scale, across thousands of operators and millions of decisions, that stream can power supervised learning, evaluation, and targeted forms of reinforcement—teaching systems to behave more like experts in real conditions.

For example, if an organization processes 50,000 cases a week and captures just three high-quality decision points per case, that’s 150,000 labeled examples every week without creating a separate data-collection program.
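
In practice, capturing those decision points can be as simple as logging a context-action pair as a side effect of normal work. The sketch below shows one hypothetical record format (all field names are invented), plus the arithmetic above:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class LabeledExample:
        context: dict             # what the operator saw: claim fields, history, flags
        action: str               # what the expert actually did
        outcome: Optional[str]    # filled in later, when the result is known

    log = []

    def record_decision(context, action):
        log.append(LabeledExample(context, action, outcome=None))

    record_decision({"claim_amount": 1240, "payer": "X", "edge_case": False},
                    action="resubmit_with_modifier")

    # The arithmetic above: 50,000 cases/week x 3 decision points each.
    print(50_000 * 3)  # 150,000 labeled examples per week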

A more advanced human-in-the-loop design places experts inside the decision process, so systems learn not just what the right answer was, but how ambiguity gets resolved. Practically, humans intervene at branch points—selecting from AI-generated options, correcting assumptions, and redirecting operations. Each intervention becomes a high-value training signal. When the platform detects an edge case or a deviation from the expected process, it can prompt for a brief, structured rationale, capturing decision factors without requiring lengthy free-form reasoning logs.

Building toward expertise amplification

The goal is to permanently embed the accumulated expertise of thousands of domain experts—their knowledge, decisions, and reasoning—into an AI platform that amplifies what every operator can accomplish. Done well, this produces a quality of execution that neither humans nor AI achieve independently: higher consistency, improved throughput, and measurable operational gains. Operators can focus on more consequential work, supported by an AI that has already completed the analytical groundwork across thousands of analogous prior cases.

The broader implication for enterprise leaders is straightforward. Advantage in AI won’t be determined by access to general-purpose models alone. It will come from an organization’s ability to capture, refine, and compound what it knows (its data, decisions, and operational judgment) while building the controls required for high-stakes environments. As AI shifts from experimentation to infrastructure, the most durable edge may belong to the companies that understand the work well enough to instrument it and can turn that understanding into systems that improve with use.

This content was produced by Ensemble. It was not written by MIT Technology Review’s editorial staff.