When did Confluent, Inc. (CFLT) hold its May 20, 2025 earnings call?

Confluent, Inc. held its May 20, 2025 earnings call on May 20, 2025.

How can I access Confluent, Inc. earnings call transcripts via API?

Confluent, Inc. (CFLT) earnings call transcripts and historical archives are available through the EarningsCalls.dev API. The API returns full transcripts, speaker segments and AI-generated summaries for Confluent, Inc. and 9,000+ listed companies across 70+ countries, typically within ~30 minutes of the call's publication.

Confluent, Inc. (CFLT) Earnings Call Transcript & Summary

May 20, 2025

executive

#16

All right. Well, that was really cool. I think if there were two topics in technology I really like, it would be data streaming and AI. But if there are three topics in technology I really like, it would be data streaming, AI and what's happening in the world of energy, which is really fascinating. So I really enjoyed that. I'm going to be talking here at the end a little bit about this unification of streaming and batch, a little bit of what it means in the age of AI and agents, a little bit about the kind of data-intensive applications that Shaun talked about, and how these all fit together with some of the functionality we talked about, the snapshot queries, what's happening in the world of Flink. But to frame all of this, let me step back a little bit and start with the big picture. So what does all this mean? What's the point of all these data-intensive applications? I think there's a larger theme here, which is, in some sense, companies are becoming much more software. It used to be a purely human activity, but now we've started to build software systems that don't just support the business. They actually run it. They're actually operating core activities, the interaction with customers, the production of goods and services, really things right at the foundation. I think what you heard from E.ON is very similar to that, where right in the flow, there's a feedback loop that's optimizing that business. And I think it's not unique to energy. It's happening in every industry. So how did we get here? What does it mean? What are the implications for data systems and infrastructure? Well, to talk about that, let me rewind a little bit and start with kind of the quick history. If you think about how software was adopted in organizations maybe going back a few decades, it was bit by bit: applications which kind of mostly stood alone. They were kind of a silo with their own data, their own database. The user types in the data and sees it back in some way. 
They're basically a kind of passive repository of data. And this is kind of the paradigm. These are UI-centric tools that are meant to show things back to humans. And the humans are ultimately driving the intelligence, the decision-making, a lot of the interesting stuff. The software is just there to kind of hold it for the human. And if you think about how this evolves, well, we started to need to plug these together, mostly for business intelligence. And so the rise of data warehousing was all about extracting all this data from these little silos and putting it together in one place where you could run analysis on it, where you could look across, where you could do reporting across these things. And now if we fast forward over the last few decades, what's happened to this architecture? Well, it's mostly accelerated. Mostly, we have a lot more of all these things. We have more applications. We have more databases. We have more analytics systems, we have more environments that all this is running in, and it's all more interconnected. So part of this is just, well, there's a lot more of it. There's a lot more data. But I think there's been a change. It's not just that there's more software or more applications. The nature of these applications is shifting, right? There's really a transition from that kind of UI-centric tool, which is there to support user input, to something that's much more a significant backbone of the business, the rise of these data-intensive applications that we were talking about. And I think if you think about what this means fundamentally, it's actually a pretty significant change in paradigm. And it's a change that goes all the way to how the applications are built. After all, if you're making a decision with people and the software is just supporting that, then ultimately, the role of the software is to hold on to the data until the user wants to see it, and then retrieve it and put it on the screen. 
And so your concept of data infrastructure is mostly about storage and retrieval. That's really what it does. And the action is going to be very periodic. It's when the user shows up at the desk and happens to look at the thing. As you move into a world where software is doing more of the work, where it's making more of the decisions on its own, suddenly things change, because it's not like the software comes into work at 9 a.m. and finally decides to open up that browser window. The software is working all the time. And so things that used to happen periodically start to happen continuously. They become a much more real-time activity. And the problem with data isn't just about storage. You still need to store data, but that isn't the primary thing. It is about the flow of real-time activity and working continuously off of it. And as this happens, the loop inside businesses speeds up. This is no longer something that happens periodically. It's something where one thing happens, which triggers another thing, which triggers another thing. And this kind of closed-loop decision-making and this translation of business into software systems, what do you think is happening now in this new world of AI? Well, this is a trend that's massively accelerating. This is something that I think didn't start with AI. We were doing this probably for the last 10 years in some sense, but the capabilities are massively accelerating. And for really obvious reasons, right? If you're trying to put decision-making and autonomy into the software, the limiting factor is the things that you can express in a bunch of hard-coded rules, right? It's really hard to capture certain business processes in a formal algorithm. And it's a lot easier if you can apply intelligence directly, right? 
The intelligence is no longer just in people's heads, which requires getting all the data through a UI and then back again. The intelligence can now be out there with the software itself. And this really redefines the scope of what the software systems can do. I think it really is going to accelerate the adoption of software in companies, the automation. I think if we think about what's going to happen in the next 10 years of software development, it's going to focus on this problem: how can we internalize this, how can we take it on, how can we apply it in the organizations where we work. That's what's happening. And so ultimately, the question is, how can we build this type of data-driven, data-intensive application? What are these new agents and AI-driven applications going to look like? What's the architecture for them? What are the problems we have to solve? How hard is this ultimately to do? Now after all, we've all used AI in some form. You probably use ChatGPT or Claude or something. That's not that hard. So is it the same if we're trying to integrate this into a business? And the answer is, well, it's actually a lot harder. If you're actually trying to take something a business does and translate it into an AI-driven software system, there's a bunch of challenges that you have to solve. And I'm going to talk about three of the things that are really different, I think, from traditional software engineering, that we have to adapt to if we want to be able to do this. I think these are kind of three principles that we have to get right for building this kind of system. The first difference is these applications are ultimately built with data, in a way that's really different from a traditional enterprise app. You can absolutely sit down and build some piece of software and write a bunch of unit tests and performance tests and integration tests, all with fake data, and you can deploy that to production and have it work correctly. 
And say, yes, this software works, it's logically correct. I've validated all of its rules, and it totally works. With AI-driven systems, if you haven't seen it work with the real data, you have no ability to say if it works at all, right? Think about a practical example of this. Let's say I'm trying to build an agent that's going to answer support requests, right? I have no ability, with just wiring that up to some other things, to say if it works or not until I've seen it actually run with real support requests from real customers, acting on real data from the actual parts of the business applicable to that customer. And until I've seen that, I can't say that that software works or doesn't work. I can't say anything about it, right? So the data in this new world is much more closely connected to the application than ever. And you have to be able to directly build on that, and you have to be able to harness a much wider ecosystem of data to actually build this type of application. It's not just about the stuff that's in its local database. It's about the stuff that's all over the organization that can help it. The second thing that's different is the ability to actually iterate on these data-driven applications, which is quite different, right? I need to be able to actually take this, run this on real support requests in my support example, be able to see output, benchmark that and say, yes, these are good. These are better. And then I have to be able to do that as I change the model, change the prompt, or add more data that's going to act as context. I have to be able to do that in development and then out into production as the system runs. Is it actually producing good results? Is it still working as I expected? If I need to make future changes to that, is that still working as I expected? That evolution and iteration is now a very different life cycle that's much more metrics driven. 
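The benchmark-and-iterate loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the labelled requests in `SNAPSHOT`, the two keyword-based routing functions, and the accuracy metric are all hypothetical stand-ins for a real agent and real support data, and none of them come from the talk. The point it shows is that each variant is scored against the same frozen inputs, so a change in the score can be attributed to the change in logic.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical frozen snapshot of labelled support requests:
# (request text, expected queue). Real data would come from production.
SNAPSHOT: List[Tuple[str, str]] = [
    ("I was charged twice this month", "billing"),
    ("The app crashes when I log in", "technical"),
    ("How do I update my invoice address?", "billing"),
    ("Password reset email never arrives", "technical"),
]


def variant_a(text: str) -> str:
    """First attempt (stand-in for an agent): route on a single keyword."""
    return "billing" if "charged" in text.lower() else "technical"


def variant_b(text: str) -> str:
    """Revised attempt: a slightly wider keyword set."""
    lowered = text.lower()
    keywords = ("charged", "invoice", "refund")
    return "billing" if any(k in lowered for k in keywords) else "technical"


def accuracy(agent: Callable[[str], str],
             snapshot: List[Tuple[str, str]]) -> float:
    """Fraction of snapshot requests the agent routes as expected."""
    hits = sum(1 for text, expected in snapshot if agent(text) == expected)
    return hits / len(snapshot)


def benchmark(variants: Dict[str, Callable[[str], str]],
              snapshot: List[Tuple[str, str]]) -> Dict[str, float]:
    """Score every variant on the same frozen inputs."""
    return {name: accuracy(agent, snapshot) for name, agent in variants.items()}


scores = benchmark({"a": variant_a, "b": variant_b}, SNAPSHOT)
print(scores)
```

Swapping a model, a prompt, or extra context would correspond to adding another entry to the `variants` dictionary and re-running the same benchmark.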
And finally, I need to be able to integrate this back into the actual operation of the business. I need to somehow take something that's very data-intensive and apply it in the flow of a company, actually run it like a production system, have it actually act in real time. So I think these are the three characteristics. So when we think about, well, what's the right way of building this kind of thing, we have different paradigms for building programs. And one that we might lean to right away, if we say, hey, where is the data, we might say, well, maybe we can build this kind of thing as a kind of batch process. Why not? And you might expect me to kind of beat up on that, kind of make fun of the batch processing agent. But actually, batch processing isn't that bad, in some ways. In fact, I would say we should have 2 cheers for batch processing. And it's not quite 3 cheers. There's not going to be quite 3 cheers, but there's at least 2 solid cheers, because there are some things that it gets really right. If you've ever built a simple process that just takes input from a text file, some Python script that munges a text file, this could be simple rules. It could be a more sophisticated machine learning process. It could be something with Gen AI. It's actually really easy. You take your inputs, you produce your outputs, you run it over and over again until it works, right? Similar thing if you work in a data warehouse: you have all your data there, you write your SQL script. You can actually build that with the data, you run the process, it produces an output. It's pretty easy. You can actually really do a good job of working with that. And the reasons, I think, are kind of simple. First of all, the batch systems actually have the data. You go to a data warehouse, it's full of data. It has a bunch of tables. It has the full history of everything. It comes from across different domains. If you're writing a data-intensive thing, this is actually not too bad. 
And secondly, the batch systems have this really nice iteration loop. I can actually take my inputs, transform them into outputs, tweak my logic, do it again and again until I get the right thing. And I can do this in a way that's productive and that I can test on data at scale, if I need to. So these things are pretty good. So maybe the world should just run on batch agents that kind of kick off at midnight and run until morning. And I think that's kind of where the problem is, right? In those three criteria, you kind of get the first two, but you don't really get the third thing. And for anyone who's ever tried to take batch processing and integrate it back into the operation of a real-time business, you start to find this very hard. There's a lot of things that show up that just aren't right, right? Customers expect all the things that they see to be up to date and in sync. The business is ultimately happening continuously. And you end up having to do all these kind of weird hacks to integrate this back into production. You get all kinds of delays. This is ultimately kind of a hopeless process. Reality is out there happening in the world all the time. The applications we build have to do that as well. The more of the business the software is taking over, the more it's going to need to do that. And so we're probably not going to fix batch processing to be this. But there was something good about it. So what about what we usually do? What about request response applications? Well, these are great. We build lots of them: REST services, microservices, web apps. We're very familiar with this. We know how to run it in production. We know how to scale it. Maybe this is the right way to do it. Is this going to be the path to building AI agents? And the answer is, well, kind of. Like, we definitely know how to scale it. We know how to run it reliably. It is real time, but it's really hard to iterate on data this way. 
We've lost something of what was good in a data warehouse. The ability to actually draw on many data sources, the ability to actually test and iterate, this is harder to do against a bunch of REST services. And it's harder for a number of reasons. First of all, they're not really set up for this kind of test, retest, benchmark methodology. I'm probably going to cause some chaos in production if I try and do that. Secondly, that data is going to be changing all the time. So if I run it once and I run it again and I get better or worse results, I can't really say if that's because the input changed or it's because of some change I made, right? And finally, there's a whole set of security concerns in trying to do that in production. There's nothing like the data playground that the data warehouse had. And so there's a bunch of limitations if I'm trying to think about how to build this kind of data-intensive application purely on request response. And this will particularly become apparent as I think that, hey, I need to see the outcomes it's going to invoke. What are the actions this agent is going to take? I need to see those actions before it does it. Ultimately, I need to be able to run that benchmark. And so the question that arises is, well, what about streaming? I think this is where streaming can come in, done right. Streaming has the strength of both of these paradigms. It has the ability to do something in real time. It has the ability to work with data. But not necessarily. Stream processing, in particular, has often had a gap between theory and practicality. It's been something that kind of makes sense intellectually, but it's kind of hard to do, right? And it hasn't always fulfilled this vision of making it really easy to work with data in real time. And I think there's a couple of reasons for this. One of the reasons is that in a lot of stream processing systems, you just don't have all the historical data. 
So a lot of Kafka clusters have maybe got 7 days of data, but you don't have everything going back all the way. If you're trying to run something that's built on data and you don't have all the stuff you need, that's really hard. And if that's missing, then you often have to graft on some other system to make it work. Likewise, a lot of the stream processing systems haven't been good at kind of high-throughput processing of historical data, which makes it much harder to work with that full set of things. And the result is you're trying to build across maybe many different systems to get something that will do the historical data but can come into real time to do the real-time data and somehow bridge it all. This is not an easy thing to do. What can we do about this? Well, there's been an idea that's out there that you can really imagine streaming as a kind of generalization of batch, like something that's batch plus-plus, where you can run something at a point in time and get a result, but you can also run it and have it keep running as that data evolves. So this is not a new idea, but in practice, to get this right, there's a number of details you have to put together to actually make something that's practicable. You have to actually have all the parts and have them really integrate well. So what are those parts? And how do you actually make this work? Well, there's two key concepts that a system has to deal with to do this well. One is this idea of a stream: what's the flow of events happening in the world? Everybody here would probably know this from the world of Kafka. You also need the tables, the kind of state of the world: what's the current state of the world? An intelligent system kind of inherently needs both of these. I think we have some intuition that you need these. The stream is, in some sense, giving you awareness. It's like the sensory system of what's happening out in the business. 
And the tables are kind of like the memory banks, like what's all the stuff that we know right now. And inherently, you're going to want to connect these. But if this seems a little bit theoretical, let me give a practical example. So this is a process that is common out in the world, the process of taking a census. So the U.K. has a census, the U.S. has a census, a number of countries have a census. And the goal is to really compute a bunch of stats about the population. Where do people live, how many people are there, nationality, et cetera, origin, all the things you might want to know. And so in a really funny way, the U.S. census is actually hardcoded into the Constitution as a batch process that runs every 10 years. So every 10 years, it runs and you literally just count every single person. You go find every person, and you write them down. So it is the full table scan of census operations. And it's not that different in a lot of other places. It's actually not that different in the U.K. And this probably made sense in its time. Certainly, when the U.S. Constitution was written, the data was all collected on paper and transported by horseback. So I think it was inherently not going to be a stream processing system. But nonetheless, it just doesn't match with the current world. You want something where you know the state of things more than every 10 years, and you don't want something that's out of date by the time it finishes. So how could you do this with these two concepts? Well, inherently, you have these two things, right? You have the state of the population: who are the people? Where do they live? That enumeration of people, that's kind of the table. And then you have the stream of what's happening, the stream of births, deaths, people moving between cities. These two things together give you everything you need to know, right? As these events happen, the underlying table is evolving to reflect it. And these two things are intricately related. 
And in the stream processing world, this is often referred to as the table-stream duality, right? So if I had the stream of all the births and deaths and movements, I can actually recreate the table of where everybody lives, as long as that stream went all the way back to the beginning of time. But also, if I watched that table and I watched it evolve and I wrote down all the changes that happened in the table, I would actually end up with that stream. The two concepts are kind of interchangeable. That's what I mean. And if I had this, I can actually do real-time processing on my population. I can have a view of where everybody was and all the stats I wanted that continually updated. And this is not science fiction. It turns out there's actually a number of countries who do this. Not the U.S., and not the U.K. either, I don't think. But there are some countries in Europe, and I think to some degree India, that have continuous population registries that update all the time. So it's not entirely impossible. And it's not just a thing that happens in the census system. These concepts also show up at the core of databases, where you have the idea of a commit log of changes, which is effectively the stream of updates to your data, and you have the tables of data that are sitting in the database. And these two things are actually directly related. The stream of changes is what actually populates the tables. And people say, in a sense, the tables are kind of just like an optimization. If you had that log of all the changes, you would actually have all the data and be able to recreate it, not just in its current state, but at any point in time. And this is the same relationship, if you think about it, that Kafka and Tableflow actually have to a company at large. So if you think about what Kafka is doing, it's a hub of real-time data that is ultimately acting as a kind of commit log across the company. 
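The table-stream duality in the census example can be made concrete with a small Python sketch. Everything here is an illustrative assumption rather than anything shown in the talk: the event shape `(time, person, city)`, the sample registry events, and the helper names `table_at` and `changes_between` are hypothetical. Replaying the change stream up to a chosen time rebuilds the table as of that time, and diffing two table snapshots recovers the changes between them.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical event shape: (time, person, city).
# city=None means the person has left the registry.
Event = Tuple[int, str, Optional[str]]

# Sample change stream, assumed to be ordered by time.
EVENTS: List[Event] = [
    (1, "ana", "Leeds"),    # birth
    (2, "ben", "York"),     # birth
    (3, "ana", "London"),   # move
    (4, "ben", None),       # death
]


def table_at(events: List[Event], as_of: int) -> Dict[str, str]:
    """Stream to table: replay changes up to and including `as_of`."""
    table: Dict[str, str] = {}
    for time, person, city in events:
        if time > as_of:
            break
        if city is None:
            table.pop(person, None)   # removal
        else:
            table[person] = city      # insert or update
    return table


def changes_between(before: Dict[str, str],
                    after: Dict[str, str]) -> List[Tuple[str, Optional[str]]]:
    """Table to stream: diff two snapshots to recover the changes."""
    changes: List[Tuple[str, Optional[str]]] = []
    for person in sorted(set(before) | set(after)):
        if before.get(person) != after.get(person):
            changes.append((person, after.get(person)))
    return changes


early = table_at(EVENTS, as_of=2)
latest = table_at(EVENTS, as_of=4)
print(early, latest, changes_between(early, latest))
```

Because the log is kept in full, `table_at` can answer for any point in time, which is the sense in which the table is an optimization over the log. Note that diffing two snapshots only recovers the net change per person, not every intermediate event.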
It's taking all the updates, all the things that are happening all across the organization, and it's applying them across all the systems you've got. And that's primarily used in this operational estate for the kind of real-time applications running the business. If you go over to the analytical estate, there's a similar basis, there's a similar hub of data, which is these tables, right? These open table formats, Delta or Iceberg, which are serving a set of query engines. And this is again an open service, not just the internals of a single database. So both of these are taking these concepts and doing it at kind of a company-wide scale. And so the very natural thing, as we talked about, is to connect these, to actually have the stream feed and continuously update the tables, have the stream of changes actually populate the tables. And when you do this, you get a kind of unified data set across stream and table that's actually the basis for stream processing. And one of the points I want to make is that this is not just a skin-deep integration in Confluent. It's not just a connector that sends the data off to some Iceberg thing. This is actually a really fundamental representation. In Kafka, we've long represented this idea of a table as a kind of compacted stream. And in Delta and Iceberg, they actually represent tables in a stream-oriented fashion. They're built on a kind of LSM-oriented design, which is effectively a table that is written out in streaming chunks and compacted together to deduplicate it. And you can plug both of these together in a very literal way to create a unified storage layer across streaming. And so this is what we've done with Tableflow. We've actually literally unified this stream-table duality into something that actually represents the full life cycle of data. And for data systems, this is actually a basis for starting to think about how to make stream processing really practical. 
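The idea of a table represented as a compacted stream can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions: the sample `LOG` and the `compact` helper are hypothetical, records are plain `(key, value)` pairs, and a `None` value is treated as a delete marker that is dropped immediately. Real Kafka log compaction runs in the background on log segments and keeps delete markers for a configurable period, which this sketch does not model.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, Optional[str]]  # (key, value); value=None marks a delete

# Hypothetical keyed change log, oldest record first.
LOG: List[Record] = [
    ("k1", "a"),
    ("k2", "b"),
    ("k1", "c"),    # newer value for k1 supersedes "a"
    ("k2", None),   # delete marker for k2
    ("k3", "d"),
]


def compact(log: List[Record]) -> List[Record]:
    """Keep only the latest record per key, preserving log order."""
    latest: Dict[str, Tuple[int, Optional[str]]] = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # later offsets overwrite earlier ones

    # Drop keys whose latest record is a delete marker, then restore
    # the original ordering by offset.
    survivors = sorted(
        (offset, key, value)
        for key, (offset, value) in latest.items()
        if value is not None
    )
    return [(key, value) for _, key, value in survivors]


compacted = compact(LOG)
table = dict(compacted)  # the compacted log read as a key/value table
print(compacted, table)
```

The compacted log is still a stream that can be read in order, and reading it end to end yields exactly the current table, which is why the two representations can share storage.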
But data systems don't just have a storage layer, they also have a processing layer. And this is where Flink comes in, where it actually unifies these two things, and it gives you the ability to look across. It gives you the ability to treat batch as just a kind of bounded stream, a stream that's stopping now. And if you have these two things together, you can really start to realize this vision of making streaming a generalization of batch. And where this comes into play is when we think about, okay, how are we going to build these data-intensive applications, what are the requirements we have? Well, one of the really important ones was the ability to reprocess data in a really easy way. And for this reprocessing with streaming data, there's been a model for how to do this for a long time, where you just do it on the stream, right? You process the stream, and if you need to go back and do it again, you kind of rewind, start over, and do it again. This is sometimes called the Kappa architecture, although it's such a simple thing I don't know if it needs a name. And it's actually a really simple approach, but it has some practical limitations. And two of those limitations are, one, it's often not that easy to rewind all the way to the beginning of the stream, because you may not have all the data in one place. You may not have it all in Kafka. You may not want to store it all there. That might be too expensive, that might not be desirable. It might be duplicative. And secondly, it can just be slow. Actually reprocessing all that historical data could just take a lot of time. If we're trying to iterate on top of this data, we don't want some process which is slow to repopulate everything. And I'm going to show you how a combination of Tableflow and Flink solves this. And so it starts with the unified processing in Flink. So here is an example of two programs that do the same thing, one in SQL, one in Java. 
And when we look at this, we can say, well, okay, is this a batch program? Or is it a stream processing program? And the answer, of course, is yes. It is both of those things, right? It's a batch program if you run it on just the data at a point in time, the snapshot. It's a stream processing program if you let it continue to run. And so you can have a single definition of what you want to do and treat it as both. And then as Shaun described, we've done a ton of work now to optimize the combination of these two. So using data that is in Iceberg or Delta, you can significantly speed up the processing of this data. When you're running these kinds of snapshot queries, they'll actually be 50 to 100x faster. And this is very meaningful. If you're working on a good chunk of data, this will take something that might have taken 20, 25 minutes and it'll take it down to seconds. And so if you're doing kind of iteration on data, this is a very meaningful change. And we're not stopping there. As Shaun said, we're actually generalizing this further, so the streaming queries will always take advantage of these underlying optimizations, whether they're running in snapshot mode or not. They'll actually fall back on this and use it whenever they need to catch up. And so if you think about what you get when you put this together, you get a unified data system where the storage layer is built on Kafka, acting as a kind of commit log, Iceberg and Delta acting as a kind of table representation, and Flink acting as a query layer. And we put these together into a unified system that shares schema, that shares a set of capabilities, so that you have unification across all of this. And what this enables is solving some of these problems I talked about. This unification makes reprocessing of data really easy. I can now work on these data sets iteratively. I can work on my program and process them and look at the output and process them and look at the output. 
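The SQL and Java programs shown during the talk are not part of this transcript, so the following is a separate, minimal Python sketch of the same idea rather than a reproduction of them, and it does not use the Flink API. The names `running_counts`, `snapshot_result`, `HISTORY`, and `live_source` are hypothetical. One definition of the computation is run two ways: over a bounded snapshot to get a final answer, and over history followed by an unbounded live source to get continuously updated results.

```python
import itertools
from typing import Dict, Iterable, Iterator, Tuple


def running_counts(events: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Single definition: emit an updated (key, count) after every event."""
    counts: Dict[str, int] = {}
    for key in events:
        counts[key] = counts.get(key, 0) + 1
        yield key, counts[key]


def snapshot_result(events: Iterable[str]) -> Dict[str, int]:
    """Batch view: run the same definition on bounded input, keep the final state."""
    final: Dict[str, int] = {}
    for key, count in running_counts(events):
        final[key] = count
    return final


def live_source() -> Iterator[str]:
    """Hypothetical unbounded source standing in for a live topic."""
    return itertools.cycle(["login", "purchase"])


# Bounded historical data, e.g. what a table snapshot would hold.
HISTORY = ["login", "purchase", "login"]

# Batch: the stream stops now, so the result is a finished table.
batch = snapshot_result(HISTORY)

# Streaming: catch up on history first, then keep going on live data.
stream = running_counts(itertools.chain(HISTORY, live_source()))
first_five = list(itertools.islice(stream, 5))

print(batch)
print(first_five)
```

The streaming run passes through exactly the batch result once it has consumed the history, then continues updating it. That is the sense in which batch is a stream that happens to stop.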
I can integrate AI into that process and do the same thing. And this makes it really easy to build with data. And that's really important. But not only can I do that and not only can I iterate on it, I can actually translate it seamlessly into production. I can take that same thing and run it as continuous stream processing. So that actually really gives me all three of these characteristics: the ability to build with data, the ability to iterate on that, and the ability to act in real time. And I think this is a really powerful combination. And I think this takes stream processing from something that was niche, that was kind of on the side, and makes it into something that's an incredibly powerful tool in the toolbox. And with Tableflow acting as the basis of this, you really don't have to pick and choose. You can kind of have your cake and eat it too. Because Tableflow is going to also populate all the data that you need in the rest of the analytics ecosystem. It's going to help you fill up all the tables that you need in your lakehouse or your warehouse. And that data will actually get better. It will actually arrive faster, and it will be transformed on the fly. So you can land data in a way that's immediately usable for your analytics users. And you can do this in a way where you don't have all the painful mappings from the operational estate to try and guess the schema and put it into the analytical estate. That will be actually maintained end-to-end with the unified schemas. So instead of having some ETL process break the next day if data changes in an incompatible way, that will be caught in development. People will know that, oh, I can't publish to that topic with that schema, that actually will break. So your lakehouse, your warehouse, the analytical side of the business gets better. But so does the streaming world. 
Suddenly, your data streaming platform has infinite retention and gets this essentially for free, because you were going to keep that data for the warehouse anyway. And the data is shared. It's the same Iceberg table. You get the ability to reprocess historical data on demand, and you get a compacted table version of that data to use for reference. So both sides of this get more powerful by sharing. And I think that these two platforms are both really key to the future that we're moving to. The world of data streaming is powering these kind of data-intensive applications, the things that are running in real time, that are applying AI to run the business, that are driving some of the customer-facing operational applications. The lakehouse and warehouse are the basis for analytics, the basis for a lot of the intelligence, the insight, the ad hoc analysis, the data science. And both of these need to somehow suck in all the data across the business and harness it, and make it available. And that's ultimately what we're trying to do at Confluent. And to do that, we've really brought together these core open standards that we think people want to build around: Flink, Kafka, Delta, Iceberg. We brought them together into a unified system. And we've added to that the connectors that actually hook this up to the rest of the organization, that pull in the streams, and the set of unified schemas and governance that tie it all together, so that Flink knows about the same data with the same structure that's in Kafka, the same data with the same structure that's in Iceberg or Delta. All of those are operable in the same way. And we think that this can act as a basis that really powers the next generation of data-intensive applications, that actually connects all the different systems across an organization. And we think that this is really the fundamental platform when we think about what's going to happen with data, how we're going to harness that data with AI, how our company is going to translate more of what it does into software. 
And we're incredibly excited to build that future with all of you guys. And so that's kind of ultimately what this conference is about. We've got a fantastic lineup, I think, with really engaging content across all of these topics. Kafka stuff, stream processing, AI, all kinds of aspects of the governance of data, the use of data, all of it you can find in some of the different sessions here. And you can not only do that, but talk to some of the people who are doing it for real, who are building these systems, who are using these systems, who are applying it in different parts of the economy. So go to some talks, meet some people, and let's all explore this new world together. I couldn't be more excited. Thank you very much.


Earnings Call Speaker Segments

Unknown Attendee

Shaun Clowes

Ahmed Saef Zamzam

Shaun Clowes

Addison Huddy

Shaun Clowes

Ahmed Saef Zamzam

Shaun Clowes

Ahmed Saef Zamzam

Shaun Clowes

Robin Sutara

Ahmed Saef Zamzam

Robin Sutara

Shaun Clowes

Dora Simroth

Edward Kreps
