Confluent, Inc. (CFLT) Earnings Call Transcript & Summary

March 19, 2025

NASDAQ US Information Technology conference_presentation 101 min

Earnings Call Speaker Segments

Kamal Brar

executive
#1

Hello. Welcome. Good morning. [Foreign Language] Bengaluru. Welcome to Current 2025. Amazing. Look at all the people in this room. Last year, when we did this event, I think we had 2,100 folks register. This year, we raised the bar even further. We have 2,500 people registered. We have breakout rooms for folks to go over and learn. And it doesn't stop there: We have thousands of you watching online, so I just want to say thank you to every single one of you for making Current Bangalore one of the most important events for Confluent as a company. Thank you. I also want to take a moment to acknowledge our customers and partners. Folks have traveled not just from across Asia Pacific but from all over the world, and I really want to acknowledge their participation in today's event. We have a ton of exciting content and a great lineup of speakers. And of course, I know a lot of you want to hear about the product innovation that we're going to announce today, so without further ado, I want to invite on stage the Confluent executive team, our Co-founder and CEO, Jay Kreps; our Chief Product Officer, Shaun Clowes; and our India leader, Rubal Sahni, to join me to start with our "lighting of the lamp" ceremony. Thank you. [Presentation]

Kamal Brar

executive
#2

All right, thank you, everyone. Let's get started. [Presentation]

Unknown Attendee

attendee
#3

Please welcome Co-founder and CEO of Confluent, Jay Kreps.

Edward Kreps

executive
#4

Hello, everyone. Welcome to Current. So this is our second year in this venue, as Kamal said. Incredible growth in the event. And we completely sold it out well ahead of schedule, so thanks to all of you for making it here. And sorry to everyone who couldn't quite get a ticket. Hopefully, you're watching online. We're going to have incredible content over the course of the day, diving into a lot of what's happening with Kafka, data streaming and stream processing, but I'm going to lead it off by stepping back and talking a little bit about what's happening in our technology ecosystem. I'm going to talk about what's driven the rise of Kafka and streaming and where it's going in an age of AI. And to do that, I'm going to start by stepping back and looking at the larger trend. We didn't have a ton of streaming data 20 years ago. Why now? We've had computers for a long time, but I think that there's an underlying force that's pushing this along. And that's, in some sense, that companies are becoming software. Not just using software, but actually big parts of the processes that run organizations are now built in software: how you interact with customers, how the goods and services are produced, all the logistics and operations. More and more of that is getting built in software systems. And as part of that, all of these systems have to interconnect. They have to key off of each other. When something happens in the business here, something has to react and respond over there. And if we think of how we've gotten to this point, it wasn't all at once. We started with something much simpler. If you rewind a few decades: We started with simpler software that was actually more siloed. These are the early enterprise tools that we might have, the ERPs and CRMs and HRISes, all the 3-letter acronyms. What were these things? They were ultimately a UI, right? You type some data into the app, hit save, right? It's stored.
Maybe it shows it back in a few different ways, but it's ultimately an island in and of itself. And the paradigm is really showing data back to the user of that application. It's very local, right? And as these sprang up all over the organization, you did have a bit of a problem, which is you couldn't get any insight across the business, right? You could see what was in one app, but you couldn't see across the business. And this gave rise to kind of the next movement in the data and application world, which is the rise of data warehousing. If the left-hand side here is all of the applications that run the business, this is a place to start to do some of the analytics, where we could suck in the data maybe at the end of the day, process it, get it into some usable form and be able to run some analysis, reporting, ad hoc analysis on different parts of the business. And this is kind of the basic framework that we have today for thinking about systems, but over the last 10 years, all of this has accelerated. We have more applications doing more things, more interconnected across more environments, up into the cloud, across different cloud providers in different regions. And even the analytical estate, the data warehousing tools, has exploded as well. There are lake houses and AI tools and specialized data marts, vertical SaaS reporting layers. There's a lot more stuff, right? And it's a lot more interconnected than it used to be. And the change isn't just that. It isn't just that we have more things. It's also that the software systems themselves are changing. It's no longer the case that we're just talking about isolated UIs. We're actually talking about software systems that run large parts of the business, and this is a big change.
So if -- the back-end platform to support an interactive UI-based application is kind of a storage system, something to hold your data and look it up at the right time when the user comes and views the screen, but in a world where these systems are interconnected and in a world where a lot of the logic is not just the user making a decision and pressing go, that action starts to become continuous. It's no longer about just storing the data. It's about reacting to things happening continuously. And this is a lot of what's driven the rise of data streaming. You start to have systems that are reacting to different parts of the business. They're not just storing data and waiting. They're actually active. They're working all the time, and those systems are taking on large parts of what organizations do. And if you think about the trends today, this is really accelerating. And it's accelerating because of the rise of AI. So if you think about what was -- our ability to build the parts of the business in software. What held that back? Well, one of the big things that held it back was it was pretty hard to come up with a list of hard-coded rules to describe everything that happens in a business. There's a lot of nuance. There's a lot of things that people were doing. It was just hard to program. And this ability to take some of that and encapsulate it with powerful AI, that makes it possible to really widen the scope, so if the last 10 years was driving this interconnection and more things moving into software, the next 10 years, I think, is going to be even more of that. I think it's going to happen even faster. I think it's going to be broader, and I think it's pretty exciting. So if you think about where we're going. In a sense, we're kind of moving out of a world of business intelligence, and into a world of artificial intelligence. 
If you think of what did it mean to be data-driven in that older world I described, it meant that you could put all your data in one place and be able to run a report that would explain the business to you, right? That was being data-driven, but ultimately the action was happening through some executive that looked at the report and maybe called somebody. If you think about what it means to be data-driven going forward, I don't think it's just that report, right? There's -- it's about applications that use that data and take action and not just taking the thing the executive was doing and putting it into software but actually making decisions, taking action customer by customer, transaction by transaction in a much more granular way directly as part of running the business. So what does all of this have to do with data streaming? Well, actually a lot. If you think about the back-end platform for business intelligence, it was this big batch platform. We're going to suck a bunch of data into our data lake, lake house, warehouse. We're going to run a bunch of processing on that to get it into shape. It's going to support a set of use cases that allow us to report off of that, but ultimately this was about presenting data back to humans. It was a very people-, user-oriented use case. And that was kind of the basis for the last paradigm of cross-company infrastructure, but if you think about what's the place where data all comes together now to run the business, that's actually something very different, right? In a world of AI, with applications that are taking action on data all the time, suddenly you have a lot of the same capabilities but translated into real time. You still need to be able to bring all the data together, but it's no longer at the end of the day. It's continuously. It's as a real-time stream. You still have the need to process and build structure and quality around the data, but this is no longer a set of batch processes. 
It's real-time stream processing. And the set of applications this can power is things that actually execute parts of the business, core services that run the business. And so I think this is the role for data streaming, for the data streaming platform. This is what has driven the rise of data streaming over the last few years. I think the rise of AI takes this and accelerates it even further. And part of the reason for that is gen AI is actually very different as a paradigm for using data than the traditional machine learning we've seen. We've had machine learning applications for a long time. There might be some real-time aspects of that, but mostly that's something that kind of happens off in some data lake somewhere. But that's actually changing. And the people who work on this are changing it, so I want to talk a little bit about that. Let me start by just explaining what the classical machine learning workflow is, right? It's something like this. You have some applications that have data. Maybe you're extracting that data. You have some data engineers that are building pipelines and transformation. And ultimately a data scientist is going to build a custom bespoke model for one problem. Maybe you are predicting fraud. Maybe you're scoring relevance, but this model just does that one thing. And a lot of the intelligence is not really in the model. It's actually in all the feature extraction that's being done in processing the data to make that prediction. So in a classical machine learning world, you actually have a lot of the work in the model building side. And that's mostly being done off-line, in batch, at the end of the day. Most of the use of data is there. It's off-line, in batch, for these feature derivation pipelines. And there is a fair amount of effort to string together a feature that cuts across 3 personas, 3 sets of systems -- and makes its way all the way back to production. 
So what's changing now when we move from kind of classical machine learning to a gen AI world? Well, the paradigm is quite different. And the fundamental reason is the models are reusable. You don't have to build a new model for every single problem in a company. You're starting with these pretrained models that are much more powerful. They're much more generally applicable. And they're actually not built by most organizations. They're built by a relatively small set of companies, the OpenAIs and Anthropics and Googles and Metas and so on, right? And now we don't need to build that model from scratch. We just need to apply it to our problems. And so what does that mean? It means that, from an enterprise point of view, the use of data, the key stuff that each company has that's essential for making the right decisions with the right context, moves out of the off-line world and into run time. It moves into the path of inference. And so what that means is a lot of the use of data now in an enterprise context becomes real time, becomes streaming. And streaming has a really big role to play in this. And we're seeing this with our customers, in a bunch of different use cases. The first is around building these RAG (retrieval-augmented generation) applications. Any AI application has to have the right context. It doesn't matter how smart the model is. It can't make a good decision if it doesn't know about the particular customer or the particular transaction, the particular activity in the business that's happening. And to do that, you have to somehow take all the different data sources spread across that big, tangled mess I described. You have to be able to bring that together, unlock it and bring it together with the model to make decisions and produce output. And streaming is a key way of doing this because it allows that data to be fresh. It allows it to be up to date. It allows you to transform it into what you need and really take action on it.
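
To make "context moves into the path of inference" concrete, here is a minimal Python sketch. It assumes a hypothetical `ContextStore` that is kept fresh by applying change events (plain dicts standing in for records read from Kafka topics) and a hypothetical `build_prompt` helper that assembles the model's context at request time. No model is called; the only point is that the prompt reflects whatever the stream has delivered so far.

```python
from dataclasses import dataclass, field


@dataclass
class ContextStore:
    """Hypothetical per-customer context, kept fresh by applying stream events."""
    facts: dict[str, dict[str, str]] = field(default_factory=dict)

    def apply(self, event: dict) -> None:
        # Each event updates one attribute of one customer; the latest value wins.
        self.facts.setdefault(event["customer_id"], {})[event["attr"]] = event["value"]


def build_prompt(store: ContextStore, customer_id: str, question: str) -> str:
    # Context is read at inference time, so it includes every event applied so far.
    facts = store.facts.get(customer_id, {})
    context = "\n".join(f"- {k}: {v}" for k, v in sorted(facts.items()))
    return f"Customer context:\n{context}\n\nQuestion: {question}"


store = ContextStore()
for event in [
    {"customer_id": "c1", "attr": "tier", "value": "gold"},
    {"customer_id": "c1", "attr": "last_order", "value": "o-17"},
    {"customer_id": "c1", "attr": "tier", "value": "platinum"},  # a later update
]:
    store.apply(event)

print(build_prompt(store, "c1", "Where is my order?"))
```

Because the later `tier` event overwrites the earlier one, the prompt carries "platinum" rather than "gold"; a batch snapshot taken before that event would have handed the model stale context.
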
So that's the first use case we're seeing around streaming with AI is building these types of RAG applications. The second is around building agents, building applications that actually act on what's happening. And so I'll introduce a use case for agents because people often mean very different things when you hear "agent." It's kind of a buzzy word, but I mean something very straightforward, which is just taking some activity that might have required human interaction and being able to move it into software with these models. And so I'll take a specific, very simple example. So in retail and in a lot of the delivery companies that we work with, there's this problem of maintaining a product catalog, the view of all the products that you have. And it's particularly interesting in the delivery world because, if you're delivering groceries, you have to suck in all of the information about all the products in stock and inventory that all of the grocery stores have in all of the places. And you have to create a normalized view of that. What products actually are there, and where can you get them? And you have to have a good description. If you want to be able to drive relevance in advertising, you have to really know which things are the same, which products are substitutable for each other. You have to really understand the world of groceries. And not surprisingly, this data is really messy. Not all grocery stores are kind of the pristine data shops that you might imagine. And so the actual workflow for something like this, it's a lot of software, but it also has some people historically. You're sucking in this data and you have some people kind of tagging stuff and manually connecting it. And it's a fair amount of work to actually make that accurate, keep it up to date. And it creates a delay in the flow, and so a very simple use case for AI and agents is starting to supplement this. 
Instead of having humans go in and crunch through each data thing with some very repetitive tasks, start to apply an LLM. Start to give it the context of what other products there are. Are these the same? What's the best way of describing this given these 5 descriptions which kind of overlap? What category does this product belong in? How can I tag it? What might people want to advertise against it? All of this can be done with a language model. So how do you start to build an agent like this? It sounds kind of very science fiction-y, right? It sounds like you're going to need some very futuristic tools that maybe haven't even been invented yet. And to some extent, it is, but most of the science fiction is actually in the language model. That's actually the hard part. How you plug this into your organization is not that complicated. At the end of the day, we want to be able to connect these systems up to other bits of software, other side context data. We want it to be able to react as things are happening in the business. At the end of the day, this is actually not that different from a traditional microservice. In fact, I would say, for this kind of simple agent, it's basically a microservice that uses LLMs in the background. It's kind of using some of these events as a sensory system to understand what's happening. It's using that to construct the context data that it needs to make decisions. And then it's acting on what it sees, on what happens, on the input, on the product uploads that are coming, the descriptions that it sees. It's taking action. And there's a reason that we build microservices with Kafka. This kind of event-driven microservice has a bunch of properties that help it be very successful across a lot of domains even without LLMs, right? It allows you to decouple some of the different things. It allows you to plug things together. You can have more modular deployments. 
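
The "agent as an event-driven microservice" idea for the product-catalog example can be sketched as a loop that reads events, consults context built from earlier events, and emits output events. This is an illustration under stated assumptions, not Confluent's design: `fake_llm_categorize` is a hypothetical keyword stand-in for a real language-model call, the event list stands in for a Kafka topic, and "same normalized description" is a deliberately crude proxy for "are these the same product?"

```python
from typing import Callable, Iterable, Optional

# Hypothetical keyword table standing in for what a real LLM would infer.
KEYWORDS = {"milk": "dairy", "cheese": "dairy", "apple": "produce", "banana": "produce"}


def fake_llm_categorize(description: str) -> str:
    """Stand-in for a language-model call that assigns a category."""
    text = description.lower()
    for word, category in KEYWORDS.items():
        if word in text:
            return category
    return "uncategorized"


def run_agent(events: Iterable[dict], categorize: Callable[[str], str]) -> list[dict]:
    """Consume product-upload events, decide on each one, and emit output events."""
    seen: dict[str, str] = {}  # normalized description -> first SKU seen (the agent's context)
    outputs = []
    for event in events:
        key = " ".join(event["description"].lower().split())  # normalize case and spacing
        duplicate_of: Optional[str] = seen.get(key)
        seen.setdefault(key, event["sku"])
        outputs.append({
            "sku": event["sku"],
            "category": categorize(event["description"]),
            "duplicate_of": duplicate_of,
        })
    return outputs
```

Swapping `fake_llm_categorize` for a real model client would not change the shape of the service: it still reads events, uses context accumulated from the stream, and writes output events that downstream services can subscribe to.
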
There's a bunch of things that are good, but in the context of building agents, I think there are 3 properties that are really powerful. The first is that they're real time. The second is that they're replayable. And the last is that they're decoupled. And I'll say what I mean by each of these. I'll start with real time. So maybe the easiest way to apply an LLM would be in batch. I just take a dump of the data. I run the language model. I look at the output. I can kind of iterate on that: get the right context, the right prompts; try different models; see if it's good. Running in batch might be the simplest way to build something, but it's not the simplest way to actually run something. If you actually try and plug something into a business, doing it periodically at the end of the day, that's really hard. At the end of the day, companies happen in real time. If I'm automating something, it should probably get faster, not slower, as part of that automation. And so data streaming allows a way of doing this that actually speeds stuff up, that actually happens in the line of the business. And I think that makes it much more appealing than batch systems if you're trying to actually build microservices or agents that use this, but even though I'm moving it into real time, I still have the ability to rerun or replay. After all, the thing that made the batch model appealing was the fact that I could benchmark, that I could try different contexts, that I could try different prompts, that I could try different models. I could run that on the input data. I could produce the output. I could score that output. I could send it off for evaluation. I could run it again. That ability to run and rerun is really important, and this becomes a key aspect of how you run this kind of model. You could run it on the input data the first time. It produces some output.
And then you can rewind, start over and do it again with a different model and see what the difference in that output is. And this isn't just about the development part of this. This translates into how you operationalize these agents, right? If I have something that's running part of the business, and I want to roll out a new model or different context data, I want to know: Is it really going to continue to make good decisions? And so one of the nice things about this kind of microservice is you can run them in parallel. You can almost A/B test the two. You can run like a blue-green deployment and have a second version of it, look at the output it's producing, compare the two, see if it's in fact getting better. And you can do this not just with a single agent but in the large. After all, my purpose in building these was to decouple these. And so if event-driven microservices decouple bits of software, they decouple bits of software that have LLMs in them as well. And that allows me to work on parts of this system independently. It allows me to run this without actually causing side effects. I can look at the output events. I don't have to actually trigger the side effects in the downstream services. That allows me to work on these parts independently and support a relatively complicated flow of traditional software systems and AI agents that are taking action, all put together into one set of workflows, one set of business rules, one part of the company that all plugs together. So this is a little bit about why data streaming is really getting adopted in the use of AI agents. So where are these new applications going to live? I talked a little bit about this operational estate, the kind of production applications, things that run the business. I talked a little bit about the analytical world, the data warehouses. Where are we going to put -- where is this AI stuff happening? And it's actually sort of a complicated question to answer.
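
Replayability and the blue-green comparison can be shown with a toy append-only log in which the list index plays the role of the offset. In this sketch, which assumes hypothetical `process` and `compare` helpers and two hypothetical "model versions" for triaging support messages, both versions read the same input from offset 0 and write to in-memory outputs instead of triggering downstream side effects, so their decisions can be diffed before the candidate is promoted.

```python
from typing import Callable

Model = Callable[[str], str]


def process(log: list[dict], model: Model, start_offset: int = 0) -> dict[int, str]:
    """Replay the log from start_offset, collecting outputs instead of causing side effects."""
    return {
        offset: model(record["text"])
        for offset, record in enumerate(log[start_offset:], start=start_offset)
    }


def compare(log: list[dict], current: Model, candidate: Model) -> dict:
    """Run two versions over the same input (blue-green style) and report where they differ."""
    old_out = process(log, current)
    new_out = process(log, candidate)
    changed = [offset for offset in old_out if old_out[offset] != new_out[offset]]
    return {"total": len(old_out), "changed_offsets": changed}


def v1(text: str) -> str:
    # Hypothetical current version: only "broken" triggers a refund.
    return "refund" if "broken" in text else "ignore"


def v2(text: str) -> str:
    # Hypothetical candidate version: also treats "damaged" as a refund case.
    return "refund" if ("broken" in text or "damaged" in text) else "ignore"


log = [{"text": "item arrived broken"}, {"text": "box was damaged"}, {"text": "love it"}]
print(compare(log, v1, v2))  # {'total': 3, 'changed_offsets': [1]}
```

The log itself is never modified, which is what makes the rerun cheap: any number of versions can start again from an earlier offset and be scored against the same input.
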
If you think about it, are these kind of applications analytics? Well, in a sense, they are, right? I'm doing complicated gathering of data, building pipelines, processing it into something that's usable as context. It's definitely sophisticated use of data, so it does have some of the characteristics of analytics, but it also has some of the characteristics of operational applications. After all, this is part of a core flow in my business. If this is maintaining my product catalog or interacting with customers, it can't go down. It has to run reliably. It has to be highly available. It's going to have to hit certain SLOs. I'm going to have to be able to monitor it and know what it's doing all the time. So it has a lot of the characteristics of an operational application. And so it's actually a very interesting time for these kind of things. We have things which are both analytics but also applications. It's kind of the operationalization of analytics and it kind of blurs the line. And if we want to be able to build these things and we want to be able to build these across the organization, we're going to have to have the same data available everywhere. And that's going to have to be available to these operational applications which are going to be embedding bits of AI. It's going to have to be available in my lake house, my warehouse, in other areas where use of AI is going to be happening. We have to really unify all the data across. And that brings me to a pretty exciting product area that we've been working on at Confluent, a feature called Tableflow. And I want to talk a little bit about the motivation for this. So in the analytics world, the fundamental unit of data is mostly tables. In your data warehouse, you have a bunch of tables. In your lake house, you have a bunch of tables. And for the longest time, it's been the case that each analytics system had its own tables. It had its own world. 
If you wanted to fill this data warehouse, you would have to create custom pipelines and flows to fill it up, but if you want to use some other system, you have to fill that one up too. And you end up recreating a lot of the same data. You end up seeing the same problems, replicating the same logic, but what's happened as these analytics systems have moved to the cloud is -- in the end, all the data, all the tables are kind of in cloud object storage, anyway. And so there really is an opportunity to start to open up that data set and share it across. And this has driven the rise of formats like Delta or Iceberg that give a standardized way of representing this data and opening it up, using it across different data warehouses and lake houses, really connecting these. And increasingly, this is a way of actually connecting all the different systems and applications in the analytical world, actually allowing all of these things to share a data set. So if this is connecting up the analytical estate and Kafka is connecting up all these microservices and applications and real-time data, how can we connect these? How can we actually start to populate these tables with real-time data, feed it off of the streams? And there have been ways of doing this for a long time. You can have different connectors that read data out of Kafka and write it into your lake house and try and format data in this way, but there's a bunch of reasons that this has been difficult. It's been challenging to maintain. At the end of the day, a lot of the paradigm downstream is you're getting this data. The application upstream determines what it is, and you have to somehow carefully map it into these tables. If that ever breaks, then all of your application logic is broken. If some field goes away or changes meaning, everything downstream changes, so it's been relatively hard to maintain these mappings. And this is where this feature Tableflow comes in.
And we're really excited to be announcing the general availability (GA) of this feature. And we're going to talk a little bit more about how to use it. We'll show some of the demos that actually show this in action, but what this does is make it possible to take every stream of data that's stored in Kafka and represent it as an Iceberg table or a Delta table that's fully structured and continually populated. And it's not just a connector that copies data from one thing to the other. It's actually unifying these 2 concepts. It's actually bringing together the stream of data with the actual table. They're actually the same thing represented in object storage, and you're able to use these across both of these worlds. You can have the real-time feed of updates as well as the full representation of the history. And you can take any of these analytical systems that work with these formats, the Snowflakes and Databricks, tools from Amazon or Google that run on top of these tables, and now they can have access to this full set of real-time data. And this is actually a very powerful concept that's coming into the streaming world. So inside of databases, we've always had these 2 concepts. We've had a concept of a commit log or a stream of changes that are coming into the database. And then we had the actual tables of data the database maintains. And inside the database, these 2 concepts are connected. Whatever gets committed to the log gets applied to the table. And if you think about what we've done with Kafka, it's actually to take that commit log or stream and open it up as a kind of open service across the whole data center. So all the applications can tap into that stream. All of them can use that set of changes. All of them can apply that. And if you think of what's happening with Iceberg and Delta, they're actually taking a similar tack.
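
The commit-log-to-table relationship described here can be sketched as a fold over a keyed changelog: replaying the log from the start rebuilds the table, and stopping early gives the table as it stood at an earlier point. This is a minimal illustration of the database idea only, not how Tableflow is implemented. `materialize` is a hypothetical helper, and the convention that a `None` value deletes a key mirrors the tombstone records Kafka uses in compacted topics.

```python
from typing import Optional


def materialize(changelog: list[dict], upto: Optional[int] = None) -> dict[str, dict]:
    """Fold a keyed changelog into a table; later records for a key replace earlier ones."""
    table: dict[str, dict] = {}
    for record in changelog[:upto]:  # upto=None replays the whole log
        if record["value"] is None:  # tombstone: delete the row
            table.pop(record["key"], None)
        else:  # upsert: insert or replace the row
            table[record["key"]] = record["value"]
    return table


changelog = [
    {"key": "sku-1", "value": {"qty": 5}},
    {"key": "sku-2", "value": {"qty": 1}},
    {"key": "sku-1", "value": {"qty": 3}},
    {"key": "sku-2", "value": None},
]
print(materialize(changelog))          # {'sku-1': {'qty': 3}}
print(materialize(changelog, upto=2))  # {'sku-1': {'qty': 5}, 'sku-2': {'qty': 1}}
```

The same log serves both worlds: a streaming consumer reads the individual changes, while an analytical reader only needs the folded table.
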
They're taking these tables of data and they're opening them up to a broader set of applications and making that an open-data service across the analytics world. And so by connecting these 2 things, we're kind of doing what was done inside of databases but as an open service. Now you have streams as a first-class primitive that all your applications can subscribe to. You have the resulting tables that are derived off of those that are open and available across the analytical world that anything can use. And this is not just a curiosity. It's actually a pretty big paradigm shift in how you use data. The old paradigm and the paradigm that we're mostly stuck in for using data in batch is you kind of ship a bunch of raw data into the data lake and then you try and clean it up. You try and fix everything that's bad with it, get it into some well-structured form, but that turns out to be very fragile. And it turns out to be a lot of work. And it turns out to be very hard because ultimately there's no contract between the application that's making the data, that's changing the data, that's producing it, and the downstream systems that are going to try and process it or use it. So if anything changes upstream, it may break downstream. And if all that you're using your data for is reporting, then that's annoying. Like, your report may break. But as you start to think about these AI applications which have significant data dependencies and that are now running production parts of your environment that are responsible for interacting with customers, that are critical parts of your infrastructure, you can't just have things breaking. You have to have a reliable way of having a contract for how data is going to flow from all the upstream applications, how you can create that context, how you can use it. And that's what this enables.
It enables a pattern we would call shift left, taking that contract and moving it from something done after the fact to something that's upstream so that applications publish well-formed data; so that, early in the development cycle, if you make a breaking change, you know. You know that, okay, this can't be deployed. This is actually not compatible with what's downstream. And this has been the case in the world of Kafka for quite some time, right? We've had this Schema Registry where you can register schemas. You can say, "Okay, this is the structure. I want to ensure compatibility. I can evolve this schema, but I can't create breaking changes. I can't break the downstream people who subscribe to this stream," but now the same Kafka schema is actually directly tied to the Tableflow tables so it's actually translated directly through. And that means that, as that schema evolves, the table will evolve, but likewise, if the application tries to publish something that's wrong, that's incompatible, that can't be used, that will be rejected. And that error shows up with the upstream team, in development, not in the ETL run the next day when suddenly all your data is unavailable or written in the wrong format. So this unification is not just at the storage layer. It's up into the schemas and governance for data, and it goes all the way to the processing as well. One of the things that we offer in Confluent is Flink. And Flink provides these really powerful stream processing primitives so that, if the stream of data I want to publish out to the organization is not exactly what I'm getting, I can actually join things together. I can aggregate it. I can enrich it. I can produce a stream that's broadly usable and that I can maintain as an application owner. And what this is doing is taking the idea of data products, well-formed data that is actually maintainable over time and can be treated like a contract.
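
A simplified sketch of the shift-left check: a registry that refuses a schema version that would break existing data, so the failure surfaces at registration time rather than in a downstream pipeline. It assumes a hypothetical flat schema format (field name mapped to a type and an optional default) and a rule loosely modeled on Avro-style backward compatibility: a newly added field must carry a default, and an existing field cannot change type. Real Schema Registry compatibility checking covers far more (nested records, type promotions, several compatibility modes), so treat `is_backward_compatible` and `SubjectRegistry` as illustrations only.

```python
Schema = dict[str, dict]  # hypothetical flat format: field name -> {"type": ..., "default": ...}


class IncompatibleSchemaError(ValueError):
    """Raised when a new schema version would break existing data."""


def is_backward_compatible(old: Schema, new: Schema) -> bool:
    """Simplified check: can a reader using `new` still read records written with `old`?"""
    for name, spec in new.items():
        if name not in old:
            if "default" not in spec:  # old records have no value for a new required field
                return False
        elif spec["type"] != old[name]["type"]:  # this sketch allows no type changes at all
            return False
    return True  # fields removed in `new` are simply skipped when reading old records


class SubjectRegistry:
    """Hypothetical registry for one stream: stores versions, rejects breaking changes."""

    def __init__(self) -> None:
        self.versions: list[Schema] = []

    def register(self, schema: Schema) -> int:
        if self.versions and not is_backward_compatible(self.versions[-1], schema):
            raise IncompatibleSchemaError("breaking change rejected at registration time")
        self.versions.append(schema)
        return len(self.versions)  # version numbers start at 1
```

For example, adding a `currency` field with a default of `"USD"` to an order schema would be accepted, while adding a required `customer_id` field would raise `IncompatibleSchemaError` before anything is published.
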
And it's making it very practical for application owners to produce those, produce it out to the rest of the ecosystem; and producing it out to the ecosystem not just of other operational applications but now, with Tableflow, out to the analytical world as well. And this is incredibly important as we move into this world of AI, where the use of data moving across is increasingly to power these relatively powerful processes that are doing important parts of the business and are going to depend on this data to get the right answer, to do the right thing, to have the right information at the right time. And suddenly these data pipelines go from some back-end ETL thing for reporting to a critical part of what makes the business run, a critical part of what makes money. So this ability to unify these 2 worlds, I think, is a really powerful feature that we're quite excited about. So I guess that just leaves one question. So we talked about AI. We talked about where this lives in the data center. The last question is, okay, if this new generative AI -- it's -- kind of spans the analytics and the application world, who is going to build this? Like, we talked about the different personas in the classical machine learning workflow. Who is ultimately going to build these AI applications? Is it the application developers? Is it data engineers? Is it data scientists? Like, who do you need to actually go be successful in building these things? Well, I think, as we move into a gen AI world, the answer is all of these people. All of these people are going to do this. And the reason is because these tools are actually a lot easier to use than building traditional machine learning models. I mean we've all played with AI, and it's actually quite possible to get something pretty usable. So all of these people, I think, are AI engineers. And I would say every engineer that works with data that is building functionality is going to be an AI engineer. 
This is going to be a key part of the tool set in the same way that using databases is a key part of the tool set, learning different programming languages, learning basic deployment tools, working in the cloud. These are just basic skills that we've all mastered and bring to bear in solving problems. And I think, as a result, we're at a really incredible time where what you can do as a software engineer is much broader. What you can do as an organization with software is much broader. And I think a lot of what's happening in the world of data streaming is going to make this practical. And it's going to become a core part of the foundation that enables this. And so I'm really excited about what we have left -- what we have coming up in this conference. You're going to hear a lot about the new functionality and tools and all of that, but before we get into that, I want to welcome up to the stage somebody who's doing some of this stuff for real, somebody who's putting data streaming in practice. I want to welcome Rajesh Kandasamy of Marriott, who's going to talk about data streaming at Marriott. Rajesh?

Rajesh Kandasamy

attendee
#5

Thank you, Jay. That's really amazing. Thanks, Jay. Thanks for hosting this amazing event in our Marriott property. We really appreciate your partnership. I am very excited to be here with all of you at Current. I am Rajesh Kandasamy, vice president, application development architect at Marriott. I manage data streaming for the entire organization, which I'm going to talk about today. I also manage the customer domain, which includes customer identity and access management and profile management, covering more than 230 million customers across the globe. I'm here to talk about the data modernization behind Marriott Bonvoy and how data streaming elevates the customer experience, from shop to book and throughout your journey and your stay at our properties. At Marriott, we are the leader in hospitality, and we have been in business since 1927. We actually started as a root beer stand in Washington, D.C. in 1927, quickly moved into hospitality and became the leader in hospitality. Our portfolio has 9,300 properties, with an amazing 32 brands across the globe in 144 countries and territories. Our vision is to become the world's favorite travel company, not just the hospitality leader. Hopefully, you all know Marriott Bonvoy, which is the leading and largest loyalty program. If any of you are not a member, please let us know. We'll be happy to sign you up. Delivering customer loyalty is a high-stakes business for Marriott because 70% of our revenue comes from valuable, loyal customers like you. Our customers have high expectations of us: to know who you are, what you need and the optimal ways to engage you, so having customer data in real time is key to our success. That is why data streaming has become a critical part of our business.
If you are a traveler who loves to plan at the last minute, who books a reservation in the last 24 hours like me, then when you make that reservation, we get it in real time today and understand who you are, what you need and your preferences. You may like that extra towel in your room or that special wine with your meal. We understand you. We fetch all of that data from different systems in real time, enrich the reservation, make it a mature data product and deliver it to the property. The associates at our property react to that and make sure they have everything prepared to welcome you and to provide a best-in-class experience during your stay. In the past, this used to be a fragmented experience because our batch integration took anywhere from 24 to 48 hours to deliver this data, so I would like to take a step back and talk about where this all started. In the beginning, we had a spaghetti-complex P2P integration architecture where even a small schema change in our reservation was a big deal, because our reservation system integrates with 20 to 30 different systems, internally and externally. It wasn't just the P2P integration. The data was moving from system A to system B to system C. That created a lack of auditability and data redundancy all over, which you're all familiar with. And our lovely batch took 24 to 48 hours to deliver the most valuable data that I was talking about. So we embarked on our data streaming journey by introducing our first-ever Kafka platform via IBM SoftLayer. Then we quickly evolved to Confluent Platform, which we managed ourselves. Today, we have a simple, flexible architecture powered by Confluent Cloud, which is built to scale with the business. As a result, our engineers no longer invest their valuable time upgrading the platform, which is not just time consuming and complex but also disruptive to the business.
Instead, they invest their most valuable time creating an ecosystem of products and tools for our business users, developers and quality engineers, and they created a self-service platform. Because of that, we have 900 business-critical transactions, and 1,200 consumers are consuming this data in real time, which was not possible with that spaghetti architecture. At Marriott, we have core and emerging products. Our core products are our lodging products: your room, spa, golf and all the products that you get at the property. Our emerging products are all our partner products: your co-brand cards, your airline partners and our partnerships with Uber and Starbucks. We have 100-plus amazing partnerships across the globe because our Marriott Bonvoy members are not just earning points during their stay. They're earning points when they have a sip of coffee at Starbucks or take a ride with Uber, so partner integration is critical to our business. In the past, any partner integration took 6 to 9 months, which was a big no for supporting the scale of the organization, so we introduced data streaming and made our platform self-service so our business can automate the onboarding of partners. And by using Confluent's connector ecosystem to integrate easily with those partners, we brought that down from 9 months to 3 months [ in enrollment ]: 3 weeks for the integration and 6 weeks to production. That's an amazing evolution, and it shows what technology and the platform can bring to the business. What is next? Marriott is in the middle of the biggest digital technology transformation in its history. It's a multiyear journey, and we are at the tail end of that journey.
When we started that transformation, the biggest one, we started with a basic, simple architecture assumption: everything is going to be driven by data streaming, and everything needs to be real-time data. So we are very excited about what the technology will bring to our Bonvoy members, our associates and our property owners. I am so excited and looking forward to that transformation coming to life. I would like to leave 3 key takeaways with all of you. Number one, make your business-critical data real time, and make your platform scalable, efficient and operationalized. Number two, once you have that real-time data in the hub, enrich it as much as possible to make it a mature data product. Number three, there is no topic these days without AI, so apply AI on top of that to provide the best hyper-personalization to your most valuable customers. On top of those 3, there is one that is very close to my heart: create an ecosystem. Create tools and products for your business users, whether you automate onboarding or provide APIs through a portal for self-service capability. Or create tools and capabilities for your developers, engineers or quality engineers because -- I'll tell you this -- they are the ones who are going to make you successful. Today and in the future, data streaming is a critical aspect of what we do at Marriott. And I would like to take this moment to thank Confluent for their amazing partnership along this journey. And thank you all for joining us today. Thank you so much. I would like to welcome our next guest -- speaker, Shaun.

Shaun Clowes

executive
#6

All right, thank you, Rajesh. It is amazing to hear about the impact that Confluent's data streaming platform is having at major organizations like Marriott and many others all over the globe. Also I like that last point that Rajesh had on his slide about where Marriott is headed, the idea that they are moving towards AI and taking the data streaming platform as the underlying foundation to get them there. And I'm sure that many of you are probably thinking the same thing. "How do I move my organization towards AI?" And that's where what Jay shared earlier about the operationalization of analytics is so critical. We can't continue with the current siloed approach. Why is that? Well, the problem is that, because of this messy, fragmented, duplicative ecosystem, organizations really struggle to get at their data and use it. In the operational estate, we have a mass of point-to-point code and application integration tools trying to make the operational applications work together. And over in the analytical estate, we have a whole separate mass of ETL and ELT tools and data warehousing technologies that are sucking data out of the operational estate, loading it into the analytical estate and trying to piece it back together for analysis. Now as Jay shared earlier, it was kind of okay to do AI and ML in that old architecture because traditional AI and ML workloads existed almost entirely in the analytical estate. And that's because traditional ML models were typically trained over a period of months using just corporate proprietary data usually in the data lake. And then your data scientists would, through trial and error, arrive at some relatively simple statistical models that they could then deploy to make predictions, but the world of gen AI has turned that totally on its head. The foundational models now come pretrained from one of the major providers. 
And so when you're trying to build something with AI, let's say you're trying to build a chatbot or task automation or you're trying to build an AI agent, then your problem actually has nothing to do with training anymore. It's about continuously feeding that AI with the information that it needs to actually deliver good results and to work for you. There's a need for a continuous, unbroken stream of data from the operational estate to the AI and from the AI back to the operational applications to update data or to take action. Now that continuous, unbroken loop of information from the operational applications to the AI and back has to operate at unprecedented speed and scale. It has to be really fast because you don't want your AI agents taking action on out-of-date, incorrect information. It has to operate at really massive scale because you need to be able to keep up with the ever-increasing demands for data inside the enterprise, but you also have to be able to keep up with that scale cost effectively because you want to be able to take these really smart LLMs and put them to work on all of your business problems and still get an acceptable return on investment. And finally, you have to bring really rich, complete data to these LLMs. You want to fill up the context window with all of the most important data to get the best possible outcomes with fewer hallucinations or errors. Now any break in that continuous flow, whether latency spikes or the context is incomplete because of some processing error upstream, is going to significantly impact your customer experience. You can very quickly have an AI agent that was once super smart turn into an absolute chaos machine making terrible decisions, harming your customer experience and damaging the brand, so honestly, if organizations are looking to tap into the power of gen AI, they fundamentally have to rethink their data architecture.
This traditional siloed approach just isn't able to keep up with the demands of AI. It's not about clunky, static point-to-point pipelines anymore. You have to think about your data as a living, breathing network of streams that's dynamic, real time and reliable. And when you think about it, it's actually kind of obvious. Gen AI has almost nothing in common with the traditional analytics stack that was built around reports and batch-based decision-making. Gen AI isn't about dashboards. These are experiences. Gen AI isn't about reports. These are actually applications. And so gen AI is driving a Cambrian explosion of new applications that are more powerful and can be built more quickly than any of us could have ever imagined 2 years ago. And I know you've all heard that many AI projects are stuck in the pilot phase. And that's true, but it isn't because you can't build those experiences quickly. It's because organizations really struggle to feed them with reliable, real-time, rich information. And as a result, they can't deploy these AI apps into production at scale safely. If you want to put it really simply: Every AI problem is fundamentally just another data problem. So how do we solve our data problems? Well, that's where the data streaming platform comes in. The data streaming platform is the key to unifying the operational and analytical estates. It's how you take your data mess, turn it into contextualized, rich, reliable data and use it to power AI across all of your business needs. But today, instead of just telling you how to do this, we want to show it to you in a series of demos. Now you just heard from Marriott, the largest hotel chain in the world, so let's stay on theme. We're going to talk about a hospitality example. We're going to talk about [ River hotels ], which is a global hotel chain with hundreds of franchisees all around the world.
Now [ River ] is looking to build an AI experience that will enable them to promote highly rated locations that aren't selling as well as they expected, and to do so with minimal human effort or intervention. Now to power this type of AI use case at the scale of a global chain like [ River ], we're going to need a few things. First, we're going to need data streaming, because real-time experiences and real-time AI are powered by real-time data. Second, we're going to need to make certain that our data is governed and enriched through stream processing, because the quality of our customer experience is driven by the quality, reliability and recency of our data. And third, we're going to have to take that rich, reliable, real-time data and feed it into the AI models, applications and reports we need for our various AI use cases. So let's start with that first step, streaming. And for that, I'd like to welcome to the stage Addison Huddy, VP of Product Management and the Head of Kafka and Streaming at Confluent.

Addison Huddy

executive
#7

Awesome. Thank you, Shaun. It is great to be back in Bengaluru. I'm excited to talk to you all about the future of Kafka and streaming at scale. Now about a year ago, I was on this very stage and I made 3 predictions. You might call them hopes or dreams for the year to come. And the cool thing is, we're here 10 months later and all of them came true as of yesterday. So the first thing I hoped for was that Apache Kafka 4.0 would go GA and you would no longer have to rely on ZooKeeper. As of yesterday, Apache Kafka 4.0 is GA, yes. You can clap for that. That's a big deal. All right, next, I hoped that Freight clusters would go GA to give you all a more cost-effective way to stream high-throughput workloads. Confluent customers are already saving up to 90% using Freight. It's pretty cool. Okay. And number three, and this is 100% true, I predicted KKR would win the 2024 IPL. Don't believe me? I have Slack messages to prove it. All right. So enough of this victory lap. Let's get into the details of all these predictions, okay. So like I said, just yesterday, Apache Kafka 4.0 went GA. It is the biggest release of Apache Kafka ever. The transition to KRaft is complete. Apache Kafka no longer ships with ZooKeeper. Consumer rebalances are faster and more stable, making applications faster and more stable. And there's early access for queues on Kafka, and so much more. There are something like 37 KIPs, so check out all the release notes. It's a big, big deal. Now I've said it before and I'll say it again. I've learned firsthand that just taking Apache Kafka and putting it in the cloud doesn't make good on a lot of the cloud-native promises of elasticity, cost efficiency and scale. So to deliver on these cloud-native promises at Confluent, we needed to take a different approach that would allow us to have a best-in-class Kafka service. So we got to work and invested in a ground-up system that we call Kora.
Kora is a fully disaggregated system that gives us the flexibility, performance and scale we needed to handle all of the different types of workloads that our customers send our way. Kora is an ever-evolving system. And we're hard at work on some next-gen enhancements to make it more performant, more scalable, more reliable. And you can read about all those details by checking out the link at this QR code, but today, I want to focus on one thing in particular. And that's cost efficiency. So at Confluent, our mission is to put data in motion across your entire organization while saving you money. Now saving money on Kafka, there's a lot to it, which I'm sure you're all very aware of, but I want to focus on 2 things that I think Kora is particularly good at. That's optionality and auto-scaling, so let's first talk about optionality. Now one of the challenges with data streaming is that not all data moves the same way. Some require high throughput. Some require really low latency. Others just need to move data from A to B as cheaply as possible. And the typical way of handling this is by creating a massively overprovisioned shared services cluster or by stamping out and maintaining a bunch of different clusters for a bunch of different teams. The result is poor cost efficiency and no teams getting exactly what they need. Now Kora changes this. Kora has allowed us to deliver the right cost profile with the right feature set, all in the same stack, so Kora has allowed us to build different types of clusters for different workload requirements and budgets. We built a fully optimized set of options that auto-scale instantly to your different workloads, so you can pick the best cluster to fit your use case and budget, and then we auto-scale the workload from there. Now for low-latency workloads, that's where we have 3 different types of clusters that are great for real-time pipelines, event-driven architectures and microservices. So that's basic, standard and enterprise.
Basic and standard are great for dev clusters or getting those first production workloads into market. For even more scale and private networking, that's where enterprise comes in. And we've been hard at work on making enterprise even better. By the end of the month, enterprise clusters are going to be on Azure, GCP and AWS. And they'll scale to over 7.5 gigabytes per second of throughput, meaning they can handle almost any workload that you can send their way, yes. That's a lot of throughput. That's a lot of throughput. So last year, we talked about our vision for Freight. Freight clusters are now generally available on AWS. They support private networking with a new networking option that we call private network interfaces, which are built on native AWS ENIs (elastic network interfaces). Freight clusters replace a lot of the expensive inter-AZ replication charges with "direct to object store" writes, so they're great for relaxed-latency workloads where you don't particularly need low latency, which allows you to trade off latency for significantly reduced costs, making them perfect for workloads like logging, metrics and observability. And they're the perfect complement to Tableflow. All right, now let's talk about auto-scaling and how it's the cornerstone of saving money in the cloud. Most companies dramatically overprovision their Kafka infrastructure for increased stability. Even at peak usage, we found that, on average, a self-managed Kafka cluster is underutilized by over 50%. So let's check out this graph: The dotted line up top, that's an overprovisioned static Kafka cluster. The green line is your workload. That red-shaded area, that's all wasted money. So on average, you're wasting over 50% of your infrastructure costs, at best. The true cost could be much higher. Kora brings auto-scaling to Kafka, making more efficient use of resources.
So Confluent auto-scaling clusters are powered by elastic CKUs, or eCKUs, that are always shaping to your workload, making sure that it's rightsized. This means by default you're saving over half on your infrastructure costs. Now we've talked about Confluent Cloud. And just last year, WarpStream joined the Confluent family, and we could not be more excited. WarpStream uses object store writes to avoid all those cross-AZ charges that I talked about earlier, making it great for logging and observability use cases. And it's fantastic for those that have chosen a BYOC deployment strategy, because it has a best-in-class architecture for this type of deployment. And for all those on-premises workloads, that's where Confluent Platform comes in. These offerings aren't alternatives to one another. They are different offerings to fit different use cases and workloads throughout your enterprise, so don't just pick one. You can mix and match them to get the right cost profile and to fit your use case and strategy. We're also very excited to announce that Confluent Cloud is available on JioCloud Services, making it easier than ever for businesses in India to get started with data streaming. This is a very big deal, yes, very, very big deal. Today, Confluent Cloud Dedicated clusters are available there. Scan this QR code to learn more. Now here's the best part. All of these clusters work together to form a streaming mesh. You can mesh your streams together. Thanks to cluster linking and WarpStream's Orbit, you can replicate data across your enterprise: any cloud, any data center, all in real time. And it's our vision to unify management and monitoring of all these different clusters so that all your clusters, whether in the cloud or on-prem, are displayed in Confluent Cloud under a unified view: all your metrics, all your topics, lineage across all your streams.
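The "red-shaded area" in the graph is simple arithmetic: idle capacity summed over time, divided by total provisioned capacity. The sketch below uses made-up hourly throughput samples in MB/s and compares a static cluster provisioned above the peak against an idealized elastic cluster that tracks the workload with an assumed 25% headroom. The numbers are illustrative only and are not Confluent measurements.

```python
from typing import Sequence


def wasted_fraction(capacity: Sequence[float], usage: Sequence[float]) -> float:
    """Share of provisioned capacity left idle, given per-interval capacity and usage.

    Both sequences must have the same length and unit (e.g. MB/s sampled hourly).
    """
    if len(capacity) != len(usage) or not usage:
        raise ValueError("capacity and usage must be non-empty and the same length")
    provisioned = sum(capacity)
    idle = sum(max(c - u, 0.0) for c, u in zip(capacity, usage))
    return idle / provisioned


# Hypothetical hourly workload (MB/s); the peak is 900.
usage = [200.0, 300.0, 400.0, 900.0, 300.0, 200.0]

# Static cluster provisioned once, above the peak, for stability.
static = [1000.0] * len(usage)

# Idealized elastic cluster: capacity follows the workload with 25% headroom (an assumption).
elastic = [u * 1.25 for u in usage]

print(f"static:  {wasted_fraction(static, usage):.1%} idle")   # static:  61.7% idle
print(f"elastic: {wasted_fraction(elastic, usage):.1%} idle")  # elastic: 20.0% idle
```

The elastic line is idealized: it ignores scaling granularity and reaction time, so it shows the direction of the saving rather than a guaranteed figure.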
Imagine being able to set an encryption policy in one place and then have it apply to your entire streaming estate with a click of a button. This will soon be a reality. And we're very excited to share more about this as the year goes on. And with that, I'd like to invite Shaun and [ Mark ] back on stage for some demos. Thank you.

Shaun Clowes

executive
#8

All right, thank you, Addison. I am incredibly excited to give you all of those different options so that you can bring streaming to all of your data workloads, no matter the volume or importance of the data that you're pushing in your enterprise, but it's time to get back to our use case. [ River hotels ] is looking to reengage customers who've recently browsed hotels that were highly rated but who didn't end up completing their reservation. Now if we're going to power an AI use case like this, we're going to need to connect and stream a bunch of data from across our enterprise. Now the data they need is in a few different sources. There is an Oracle database that has customer and hotel information, but there's other information they need, for example, clickstream data. They're going to need recent reviews and ratings and recent bookings by customers, so let's bring all of that together. [ Mark ], are you up to tag-team with me on this demo?

Unknown Executive

executive
#9

Absolutely happy to. Great to be here with everyone at Current today. We've got a lot to cover, so let's go ahead and get started. Can we queue that demo, please? [Presentation]

Unknown Executive

executive
#10

All right. So Confluent makes the streaming part super easy. We have a bunch of different cluster options, as Addison was just talking about, so you can tailor them to your individual workload profiles and cost requirements. So in this case, I've got some real-time data, so I'm going to use a provisioned enterprise cluster here that is perfect for low-latency workloads while also using private networking. And for my clickstream data, I'm going to use the new Freight clusters that Addison was also just talking about, which are perfect for high volumes of data, are privately networked and are very cost effective as well. So getting started is super easy. Let's just get that provisioned. And so now that we have our clusters, we've got to get some data into them. Shaun was mentioning that we have some data in an Oracle database, our customers and hotel catalog, so this is a good opportunity for us to use our new Oracle XStream CDC source connector, which is 1 of over 80 fully managed connectors we offer in Confluent Cloud. Getting started is super simple; in this case, we're just setting up some basic database information. Then we are simply going to tell it what format we want the Kafka data to be in, where we want to send the data and from where we want to query the data. And then we'll provide just a little bit of sizing information, some capacity planning, and name it. We'll just leave the default name. And then that's it. Once it's provisioned, data is streaming from our Oracle database directly to our enterprise cluster. Here is what a sample of our hotel catalog data looks like. So I mentioned that we also have clickstream data that's going into that Freight cluster. It's coming from a good, old Java application, a good, old Kafka producer that we all know and love. Let's take a look at an example of what these messages look like here. It's also important to note that all of our topics are backed by a schema.
So we make sure that only good-quality data is getting into the system from the source. And this is an example of what that schema looks like. So that was just a couple of minutes to get everything started. We got our clusters provisioned. We have data flowing into our clusters. Shaun, back to you.

Shaun Clowes

executive
#11

Thank you, [ Mark ]. You made that look really easy, but sadly, it's not always that way. We've seen customers spend 3 to 6 months when they try to build their own connectors to different systems, or weeks or months when they stand up their own infrastructure to try to self-manage Kafka Connect clusters. It gets really complicated really fast. And that's why we built over 80 fully managed connectors in Confluent Cloud to all of the most popular source and sink systems that give you the 2-minute experience that [ Mark ] just showed you, but we wanted to go further than that. With Connect with Confluent, our partner program, we're bringing an even simpler connectivity experience directly within the user interface of tools that you know and love. There are over 50 of these integrations already, which means you can set up connectivity to Confluent directly within major applications like MongoDB, Elastic and AWS Lambda, but for cases where you really do need a custom connector for your use case, we'll also take away the management burden and host it for you in Confluent Cloud. And today, as [ Mark ] showed you, we're really excited to announce our new Oracle CDC connector. And you actually just saw it in action in the demo. Now our Oracle connectors are actually some of the most popular in our entire portfolio. And this is a brand-new premium connector powered by the Oracle XStream technology, which delivers unprecedented scale, performance and resiliency. It is compatible with even the most complicated Oracle environments. Now the new connector is currently in early access. And it will go GA over the next couple of weeks, so it's worth checking out, but it's not always just about connectors. As [ Mark ] mentioned, oftentimes, data is streaming natively from applications that were built to stream using the Kafka clients.
And we're making it easier than ever for developers everywhere to build, test and deploy applications that stream natively by bringing that experience directly into one of the world's most popular integrated development environments, Visual Studio Code. Now our brand-new Visual Studio Code plug-in is already GA. You can download it today from the Visual Studio Code marketplace. Now back to our use case. We've got our data connected. We've got streams up and running, but if we really want to power AI, we don't just need the real-time streams. We also need the streams to be trustworthy. And we need them to be contextualized with all of the relevant information that the AI is going to need to make a decision or power a workflow. Now Jay mentioned earlier this idea that we need to shift left, the idea that we have to move governance and processing closer to the streaming source of the data. Why? Well, honestly, for AI to be truly effective and powerful, businesses need to be able to take action on what is happening right now: not yesterday, not an hour ago, but in real time. Now Marriott even mentioned this earlier, but consider our hotel reengagement use case. What are we going to need to know to make this work? Well, we're going to need to know the most recent hotels that the customer has looked at. We're going to need to know the available rooms across different price points. We're going to need the most recent and important reviews and ratings of those hotels. We're going to need to know the traveler's loyalty program tier and a bunch more, all to help make a personalized offer to that customer. And it's not just about information that is generated in real time. You also need to get access to a whole bunch of context information quickly as well. So you heard Rajesh talk about information like a customer's purchasing preferences or a customer's prior purchases.
All of that contextual information may not be generated in real time, but it's needed in real time at the point that the AI is making the decision or the offer. But organizations really struggle to bring all of their data to bear against problems like these. They really struggle because it is so siloed. In the operational estate, you do have your applications exchanging data in real time, but that's usually done using a mix of a bunch of point-to-point custom code or application integration tools. And that means that none of the operational applications can see all of the data across the organization. And adding new data sets is really expensive and time consuming. Now if you look over in the analytical estate, the picture is very different. The analytical estate can have a broad view of a lot of the data across the enterprise, but the way that, that view is created is with ETL and ELT pipelines that suck the data out of the operational estate and then process it in large batches to try and put it all back together. Now that means that the data in the analytical estate is only as recent as the slowest of the input batch jobs, and the quality is only as good as the worst of the input batch files. Now if we want to break down these silos, get access to more of our data and put it to work, we're actually going to have to move some of the analytical capabilities into the operational estate. And that's what Flink does. Flink, in conjunction with Kafka, brings the very best of the stuff you might think of in the analytical estate over into the operational estate. And that's why Flink has seen really dramatically fast adoption. Really innovative organizations like Uber, LinkedIn, Stripe, Shopify, Disney+ and many more have all adopted Flink as a way to break down the silos and seamlessly process data across the operational and analytical estates. 
Now this enables really powerful and important workflows: ad targeting, personalized recommendations, dynamic pricing, real-time logistics. These capabilities matter, but they're often hard to deliver. Flink makes them not just achievable but approachable and affordable too. Flink is so powerful for a few reasons. It's one unified engine that supports processing both in continuous streaming mode and in batch. It can support really complicated transformations and applications by seamlessly maintaining state. And it can operate at really high volume with low latency and low cost. What that means is that developers can work with, shape and enrich their data as it is created at the source. That reduces the need for duplicative downstream processing later on, and it increases the opportunity to reuse that data across the operational and analytical estates. Put really simply: things that were previously only possible in the analytical estate are now possible in the operational estate and in streaming, and the result is not just more valuable; it's actually less expensive too. It's why Marriott has Flink on their list of key upcoming initiatives. And it's why Flink is a cornerstone of our product innovation strategy. But let's go back to our example and see it in action. [ Mark ], [ River ] has their data streaming. Can you show us how they can make it contextualized and trustworthy for the AI use cases?

Unknown Executive

executive
#12

Absolutely, Shaun. All right. So to level set: [ River ] wants to engage customers and promote highly rated hotels that aren't necessarily selling well. To do that, we're going to send personalized offers to customers who have been browsing hotels but didn't complete a booking within 20 minutes. To make this happen, we're going to need to enrich our bookings, hotels and clickstream data in a bunch of different ways. This is where Confluent Cloud for Apache Flink comes into play. It's the easiest way to get started with stream processing, so let's check it out. Go ahead and queue that demo, please. [Presentation]

Unknown Executive

executive
#13

So here I'm starting in Data Portal. This is our one-stop shop for making all of our data products easily discoverable. We can see a bunch of useful information, like who owns this data product, which for some reason tends to be hard to track down. You can see an example of the schema here, but like I mentioned, we want to use Flink to do some stream processing, so let's get a sample of this clickstream data to see what we're working with. This should give us everything we need to understand the actions a user was taking on the web page. You can see in the actions column that we have views, clicks and booking clicks. Let's take a look at what our bookings data looks like. This should give us all the information we need about the actual bookings the customer was interested in, like the check-in and check-out information, the hotel ID and their e-mail. Like I mentioned, we want to find and target customers who were interested in hotels but didn't book within 20 minutes. We start by getting all of the customer interactions in these 20-minute intervals. Then we filter out all of the actions except the ones where the user actually clicked on a booking link. That's our signal that these users are potentially interested in booking a hotel. Then we join in more metadata about the hotels themselves, and we join against the hotel information one more time, this time to find similar hotels in the same city the user was interested in, with similar amenities, that we might be able to recommend to them. And notice that we're not thinking about infrastructure, provisioning or operations. We simply run the query, and compute is dynamically provisioned behind the scenes to accommodate this workload.
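The query logic Mark describes (20-minute intervals, keep only booking clicks, then join in similar hotels by city and amenities) can be sketched in plain Python. This is not the Flink SQL from the demo: it assumes tumbling 20-minute windows, assumes a user counts as "not booked" only if they have no booking in the same window, and all record and field names (`ClickEvent`, `booking_click`, `email`, and so on) are hypothetical.

```python
from dataclasses import dataclass

WINDOW_SECONDS = 20 * 60  # 20-minute tumbling windows (assumed semantics)


@dataclass(frozen=True)
class ClickEvent:
    email: str      # hypothetical field names throughout
    hotel_id: str
    action: str     # e.g. "view", "click", "booking_click"
    ts: int         # event time, epoch seconds


@dataclass(frozen=True)
class Booking:
    email: str
    hotel_id: str
    ts: int


@dataclass(frozen=True)
class Hotel:
    hotel_id: str
    city: str
    amenities: frozenset


def window_start(ts: int) -> int:
    """Start of the 20-minute tumbling window containing ts."""
    return ts - ts % WINDOW_SECONDS


def abandoned_intent(clicks, bookings):
    """(window, email, hotel_id) for booking clicks with no booking by that user in the same window."""
    booked = {(window_start(b.ts), b.email) for b in bookings}
    hits = {
        (window_start(c.ts), c.email, c.hotel_id)
        for c in clicks
        if c.action == "booking_click" and (window_start(c.ts), c.email) not in booked
    }
    return sorted(hits)


def similar_hotels(hotel_id, hotels):
    """Other hotels in the same city sharing at least one amenity."""
    by_id = {h.hotel_id: h for h in hotels}
    target = by_id[hotel_id]
    return sorted(
        h.hotel_id
        for h in hotels
        if h.hotel_id != hotel_id
        and h.city == target.city
        and h.amenities & target.amenities
    )


def build_offers(clicks, bookings, hotels):
    """One row per abandoned intent: who to contact and which hotels to recommend."""
    return [
        (email, hotel_id, similar_hotels(hotel_id, hotels))
        for _, email, hotel_id in abandoned_intent(clicks, bookings)
    ]
```

With tumbling windows, a booking click at minute 19 and a booking at minute 21 fall into different windows; a session window or an interval join would model "within 20 minutes of the click" more precisely.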
And what we're left with now is a data product that tells us who we should send personalized offers to and which hotels we can recommend based on the city and the amenities they offer. But I think we can do a bit better. Let's bring in reviews for these hotels from the last 30 days. Just having a whole bunch of reviews for each hotel isn't super useful or contextual for our customers, so we're going to use a new feature in Flink called native model inference, which lets us run AI models locally within Confluent Cloud, so we can run these jobs against models like this Llama LLM. All the processing runs locally; the data doesn't have to leave our environment. With this model, we're going to augment our data product by passing the reviews through it, and it will condense them into a short, concise summary that's actually useful. That's what the ML.PREDICT function is doing: it invokes the model as we pass the reviews through it. If all goes to plan, we're left with a new, richer data product that, again, tells us who we should send personalized offers to and which hotels we should recommend based on the city and the amenities the user was potentially interested in. And now we have a concise summary of the last 30 days of reviews, so we can make sure the hotels we recommend are highly rated, which makes the offer more contextual and relevant to the user. All right, back to the slides, please. Everything you just saw is great and all, and we're going to reuse all of it for the next part as well, but I think we can take this to the next level. Let's augment the same workflow, but this time with agentic AI. We're going to use agents to actually create and drive these personalized offers on our behalf.
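A minimal sketch of the review step, assuming reviews arrive as `(hotel_id, text, created_at)` tuples (a hypothetical shape): filter to the last 30 days, group by hotel, and call a model once per hotel. In the demo this call was Flink's model inference via ML.PREDICT against a Llama model; here `summarize` is any callable you pass in, and `first_sentence` is only a placeholder, not a real summarizer.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Callable, Dict, Iterable, List, Tuple

Review = Tuple[str, str, datetime]  # (hotel_id, text, created_at); hypothetical shape


def recent_reviews(reviews: Iterable[Review], now: datetime, days: int = 30) -> Dict[str, List[str]]:
    """Group review texts by hotel, keeping only those from the last `days` days."""
    cutoff = now - timedelta(days=days)
    grouped: Dict[str, List[str]] = defaultdict(list)
    for hotel_id, text, created_at in reviews:
        if created_at >= cutoff:
            grouped[hotel_id].append(text)
    return dict(grouped)


def summarize_per_hotel(
    reviews: Iterable[Review],
    now: datetime,
    summarize: Callable[[str], str],
    days: int = 30,
) -> Dict[str, str]:
    """Call the model once per hotel on its concatenated recent reviews."""
    return {
        hotel_id: summarize("\n".join(texts))
        for hotel_id, texts in recent_reviews(reviews, now, days).items()
    }


def first_sentence(text: str) -> str:
    """Placeholder 'model': returns the first sentence. Swap in a real LLM call."""
    return text.split("\n")[0].split(".")[0].strip() + "."
```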
Now this is easy in Confluent because these agents just run in the background as stateful microservices. And microservices are kind of our thing, as you heard Jay talk about. We've got 3 different agents running in the background, each with a different task to perform. One is a customer insights agent, responsible for generating a report or profile based on the customer's activities and interactions. There's a hotel insights agent that creates a report based on things like the amenities and the reviews we're passing through. And finally, there's a content creation agent, whose job is to take the summaries that the customer insights agent and the hotel insights agent create and turn them into the personalized offer that gets sent to the user. So let's see this in action. Here we are back with model inference. This time, we're actually going out to OpenAI and using a GPT-4o model. We're going to use this prompt and model as what we call our agent orchestrator. This is the job we're going to start running here. It takes the inputs from the interactions, passes them through this prompt to the GPT-4o model, and classifies and routes each input to the agent that should handle it. Now we're just starting the data feed. Like I mentioned, we have a bunch of these microservices running in the background. What's happening now is that data is streaming from our data product through that agent orchestrator and the GPT-4o model, getting classified and getting sent to the hotel insights agent and the customer insights agent. And then, in just a moment here, yes, that data was passed to the content creation agent.
And we're left with a bunch of interesting information, but most relevant is, at the bottom there, the very personalized offer we can recommend to the customer. It also tells us why it's making that recommendation. It looks oddly specific: it says the guest frequently uses wellness facilities and has shown interest in hypoallergenic accommodations, but there you have it. Back to the slides, please. So think about what you just saw. This isn't just AI for the sake of having AI. This is real-time AI working with itself to drive real engagement by dynamically adapting to each customer's preferences. And all of this is made possible because we reused the same quality data products to power the agents that run this campaign on our behalf. Shaun, back to you.
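The orchestrator pattern described in this demo (classify each incoming event, route it to a stateful agent, then hand both agents' reports to a content creation agent) can be sketched as follows. The classifier is a rule-based stand-in for the GPT-4o prompt, the offer is a string template rather than an LLM, and all class, field and action names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def classify(event: dict) -> str:
    """Rule-based stand-in for the GPT-4o routing prompt (the demo used an LLM here)."""
    if "amenities" in event or "review" in event:
        return "hotel_insights"
    if "action" in event:
        return "customer_insights"
    return "ignored"


class CustomerInsightsAgent:
    """Keeps running state: a count of each action the customer took."""

    def __init__(self) -> None:
        self.actions: Counter = Counter()

    def handle(self, event: dict) -> None:
        self.actions[event["action"]] += 1

    def report(self) -> Dict[str, int]:
        return dict(self.actions)


class HotelInsightsAgent:
    """Keeps running state: the set of amenities seen for the hotel."""

    def __init__(self) -> None:
        self.amenities: set = set()

    def handle(self, event: dict) -> None:
        self.amenities.update(event.get("amenities", []))

    def report(self) -> List[str]:
        return sorted(self.amenities)


def content_creation_agent(customer: Dict[str, int], hotel: List[str], hotel_name: str) -> str:
    """Combine both reports into an offer (a template here, an LLM in the demo)."""
    perks = ", ".join(hotel) if hotel else "our service"
    attempts = customer.get("booking_click", 0)
    return f"Still thinking about {hotel_name}? ({attempts} booking attempt(s)) Enjoy {perks}."


def orchestrate(events: List[dict], hotel_name: str) -> str:
    """Route every event to its agent, then ask the content agent for an offer."""
    agents = {
        "customer_insights": CustomerInsightsAgent(),
        "hotel_insights": HotelInsightsAgent(),
    }
    for event in events:
        route = classify(event)
        if route in agents:
            agents[route].handle(event)
    return content_creation_agent(
        agents["customer_insights"].report(),
        agents["hotel_insights"].report(),
        hotel_name,
    )
```

Keeping each agent's state inside its own object mirrors the "stateful microservice" framing: the orchestrator only decides where an event goes, not how it is interpreted.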

Shaun Clowes

executive
#14

Thank you, [ Mark ]. It is amazing how easy that looks, amazing, amazing, amazing, honestly. You just saw [ Mark ] power 2 great offers for [ River hotels ] using AI, and he did it in about 3 minutes, unbelievable, using trustworthy, reliable data products and streams that he already had available and could act on immediately. It's a great example of how Confluent directly bridges streaming and AI. A major part of what you just saw [ Mark ] demonstrate was Flink and new capabilities that are coming to Flink, so I want to take a moment to recap some of the recent innovation in Flink and some things that are arriving today. First, late last year, we announced the open preview of the Table API for Flink. The Table API lets developers use all of the power of the Flink engine from Java and Python, some of their favorite languages, in addition to the Flink SQL that you saw [ Mark ] show a fair bit of just then. But we want to keep it powerful for developers, so we're also introducing user-defined functions. User-defined functions let developers define complex processing or transformations in code and then call that user-defined function directly from their Flink SQL, so you take powerful processing and put it back into the simple low-code configuration that [ Mark ] just showed you. UDFs are already GA on AWS, and they're coming to Azure in the next couple of weeks. We also announced flexible schema management in September last year. Flexible schema management lets Flink work on all of your streaming data in Kafka, whether or not that data was originally serialized with a schema. That means you can use Flink on top of Avro, Protobuf or JSON data, with no fiddly knobs or configuration. You literally just declare the schema, and off you go. It just works.
And we also know that many of you were waiting to try Flink until we had private networking for your preferred CSP, so we just wanted to recap and share that private networking is now GA on both AWS and Azure, but that's not all we have for you in Flink. You just saw [ Mark ] demonstrate some capabilities that are bringing Flink directly to your AI use cases. With model inference, we're enabling you to directly use models as first-class citizens in Flink. You saw [ Mark ] just literally using ML.PREDICT function. You declare the model and you can directly use it. You can use your own proprietary models. You can use open-source models. Or you can use models from major providers like OpenAI, Azure OpenAI, AWS Bedrock or Google Vertex. And finally, with federated search, we're making it easy for you to bring together context from across your business so you can access information from major applications like Databricks or Snowflake and bring it together to fill up the LLM context window with all of your best data, produce the best possible outcomes and reduce errors or hallucinations. Now we want to bring the power of Flink everywhere you need it. And that's why we're bringing it on prem with Confluent Platform for Apache Flink. Now this actually went GA in December last year as part of Confluent Platform 7.8. It brings our years of experience of running Flink at scale in the enterprise to all of our on-prem customers. And we're enhancing Flink with new capabilities to make it easier to manage in the enterprise. For example, we also introduced application life cycle management to manage applications in large enterprise deployments. Now in the demo, you also saw our stream governance capabilities. Now those allow organizations to manage, govern, trust and reuse their data at scale. Governance is the foundation of data reuse and agility in organizations everywhere. Data producers share high-quality data. 
And data consumers can find, access and leverage data that they can trust. It means that application developers can build rich new customer experiences with off-the-shelf real-time data. Data scientists can train ML and AI models and run inference on that data. Data analysts can uncover new insights or power new reports from that exact same data. And the data platform team can serve all of those constituencies with one set of rich data and capabilities. Fundamentally, Jay said earlier that we're all becoming AI developers. If we're all going to be AI developers, it sure will be good to have a safe, easy way to discover and reuse trustworthy, rich data. Now back to our demo. We just saw 2 great examples of operational, business-facing AI use cases, but the world doesn't stop there. What about what's going on in our analytical ecosystem? What about the reports, business decisions and AI workflows that are powered from the analytical estate? How can we take all of this great work we've done in the operational estate to deliver real-time, reusable data and carry it over into the analytical estate? Jay shared a little about that earlier, and it is the news of the day. The good news is that you can now directly project all of the work you are doing in streaming in the operational estate into the analytical estate with zero effort. How? In the operational estate, streaming has become the de facto standard for the way data moves between applications, and that has unified the architecture of the operational estate. With Confluent, you can stream with Kafka, process with Flink and govern to make your data trustworthy, reusable and contextualized. But as Jay shared, over in the analytical estate, the picture has been a lot more confusing and messy.
In general, a lot of the data that arrives in the analytical estate actually comes from Kafka in the first place, because it's coming from the operational estate. But in the analytical estate, the data warehouses and data lakes have usually been tightly coupled to their tables. The data warehouse thinks about the world in tables, and those tables can only be used in the engine that created them. For example, traditionally, a table in Redshift could not be reused in Trino or any other processing engine. That mismatch between streams and tables, and the tight coupling between tables and their processing engines, meant that even when data was coming from Kafka, you had to do a lot of work to get it into the data warehouse. You'd have to land the data from Kafka as a file. You'd often then want to convert the file to a format that's best for the data warehouse, for example, Parquet. You'd have to apply a schema. You'd have to map types correctly. Once you had done all that, you would load it into the data warehouse or the data lake, and that's just the work to load it successfully. Once it was loaded, you'd still need to do work to make it ready: for example, joining, aggregation and filtering to make the data rich and ready for whatever your use case was. That's a ton of cost and compute just to get the data ready for use. And in the process, you've lost the governance you had and the real-time nature of the data in its original streaming form. But it doesn't have to be that way anymore. Over the past several years, we've seen the emergence of open table formats like Delta Lake and Iceberg. Those formats allow analytical tables to be separate from the engines that work on top of the data. That development unlocked our mission to make data connected and accessible across the operational and analytical estates, and it's why we built Tableflow.
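To make the manual steps concrete, here is a minimal sketch of two of them, applying a schema and mapping types, for message values landed from Kafka. It assumes JSON-encoded values and a hypothetical bookings schema; converting to Parquet, loading into the warehouse and the post-load joins are not shown.

```python
import json
from datetime import date
from typing import Any, Callable, Dict, List

# Hypothetical declared schema: column name -> converter into the warehouse type.
SCHEMA: Dict[str, Callable[[Any], Any]] = {
    "booking_id": str,
    "hotel_id": str,
    "price": float,                  # JSON string or number -> DOUBLE
    "check_in": date.fromisoformat,  # "YYYY-MM-DD" -> DATE
}


def apply_schema(raw_messages: List[bytes]) -> List[Dict[str, Any]]:
    """Decode landed message values (assumed JSON) and coerce each field to its declared type.

    Raises KeyError for a missing column and ValueError for an unconvertible value,
    the kind of failure a hand-built pipeline has to handle itself.
    """
    rows = []
    for raw in raw_messages:
        record = json.loads(raw)
        rows.append({col: convert(record[col]) for col, convert in SCHEMA.items()})
    return rows
```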
As Jay shared, Tableflow is an evolution of the Kafka storage engine. It means that the data in a Kafka stream can be accessed directly as an Iceberg or a Delta table. I want to be clear: the exact same real-time, reusable, reliable data that is already powering your operational estate through streams can now be directly and immediately accessed in all of your analytical tools, your BI tools, your data warehouses and your data lakes. This is one of the most powerful technologies we have ever launched, and I am incredibly excited to share that Tableflow is going GA today, right now, on AWS. If you'd like to try it, please scan the QR code on the screen. All right, so what do we get from that? The good news is no more clunky ELT or ETL and no more endless data prep to try to make data usable; just effortless data flow from the operational estate into the data warehouse and the data lake ecosystem beyond. You save time. You cut costs. And you can get back to the important work of innovating and spend less time wrangling your data. Now let's see all of this in action. Let's power some analytics and AI workflows in the analytical estate. To talk us through that, please welcome Swaroop Oggu from Databricks.

Swaroop Oggu

attendee
#15

Awesome. Thanks, Shaun. Super excited to be here with you all. At Databricks, we constantly try to improve the experience of customers ingesting real-time data into the lakehouse for AI and analytics. With Confluent's partnership, we are trying to make this super seamless and much easier for all of our customers.

Unknown Executive

executive
#16

That's right. And a big part of that journey is Tableflow. In the demo you're about to see, we're going to take the data product we were just working with and make it accessible in our lake, in this case with our friends over at Databricks. Then we're going to empower their users to create web campaigns from the reviews, the customer information and the bookings information you saw us working with earlier. This time, we're going to empower all kinds of users to work with this data, even those with minimal technical expertise. Let's go ahead and queue that demo. [Presentation]

Unknown Executive

executive
#17

All right, by now this should look familiar to everybody. We're back in the Flink workspace. This time, we're augmenting that hotel bookings data product with more information about the hotels themselves. We're trying to make this data product a bit more human readable, because our end users in this case are actual humans. So we're adding things like the country, the price and the actual hotel name instead of just the hotel ID. This is much more usable. Now, as we mentioned, we want to make this data accessible in our lake, and this is where Tableflow comes in. Tableflow simply takes the stream and represents it as a table. It's really that simple. To get started, you just hit enable Tableflow. You choose which format you'd like; in this case, we're going to be working in Delta. Then you tell us where you want us to store the data. It can be in your own storage, or we can manage it for you. In this case, I'm using my own S3 bucket. And that's it. We hit continue, and in just a second, our stream is instantly represented as a Delta Lake table in our lake. No more ETL work, no more type conversions, no more worrying about any of that. It's that simple. Now for what I think is the more exciting part. Swaroop, do you want to show us what we can actually do with this data now that it's accessible in Databricks?

Swaroop Oggu

attendee
#18

Sure, Mark. Awesome. On the Databricks side, what you're looking at is called AI/BI Genie. AI/BI Genie is a conversational experience that lets business teams interact with data in natural language. Let's try it by asking Genie to show the hotel booking counts across all the countries. What Genie is doing as magic behind the scenes is breaking down that ask, generating a SQL query, understanding the context, hitting the right tables and helping me visualize the result. Quickly, I can see that India is ranked fourth. Let's try another one: the booking revenue across all the cities in India. Again, the same magic: Genie thinks, breaks the ask into a query, understands the context of the data and where the data is, and quickly helps me visualize it in a bar graph. Here I can see that Hyderabad is busy. So we just went from having data ingested to having unlocked real-time insights. This is amazingly powerful. Let's extend it further. Let's use agents to unlock [ whatever just using ] the insights to take some actions. What you're seeing on the left is a Databricks notebook, where we have 2 functions. The first function analyzes the sales data for a given city. The second function summarizes the best reviews for a given hotel and lists the reasons. So I have 2 functions that can generate 2 outcomes. Now what I want to do is start with an intent, and the intent here is to promote the lowest-performing hotel in Bengaluru using an Instagram post. I'd like to use an open-source agentic AI orchestration framework like LangChain and bind these 2 functions; you can see the agent being mapped to the tools, which are the functions we just defined. Right after that, if we invoke the agent and give it the context of me trying [ to impose ] an Instagram post, it triggers the 2 functions we just walked through.
And right after that, we can see that the agent was able to successfully generate a nice Instagram post that can help me promote my lowest-performing hotel in Bangalore. That's going from ingestion to insights to outcomes. This is why we are super excited. Awesome. Back to the slides, please. What you've just witnessed is only a glimpse of the art of the possible. What we want to do is unlock a lot more use cases with this partnership. It's not just about bringing the data or moving data faster. It's about ensuring the AI is operating on high-quality, real-time data and consistently improving the accuracy of the insights we are unlocking. What that yields is higher-impact business outcomes. What we're confident about, going to market together, is building this partnership and deepening the relationship across all the product features between Confluent and Databricks, starting with Unity Catalog and Tableflow, to bidirectionally integrate all of Tableflow's capabilities with Unity Catalog and vice versa. So to wrap it up, I would say real-time data is what unlocks AI. And together, we're going to make it seamless. Back to you, Shaun.
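A rough sketch of the two notebook functions and the intent they serve, using Python's built-in `sqlite3` in place of Databricks tables. Table and column names are hypothetical, and the tool calls happen in a fixed order here, whereas in the demo a LangChain agent backed by an LLM decided which tools to call and wrote the post.

```python
import sqlite3
from typing import List, Optional


def lowest_performing_hotel(conn: sqlite3.Connection, city: str) -> Optional[str]:
    """Tool 1 (hypothetical): hotel with the lowest total booking revenue in a city."""
    row = conn.execute(
        "SELECT hotel_name, SUM(price) AS revenue FROM bookings "
        "WHERE city = ? GROUP BY hotel_name ORDER BY revenue ASC, hotel_name LIMIT 1",
        (city,),
    ).fetchone()
    return row[0] if row else None


def best_reviews(conn: sqlite3.Connection, hotel_name: str, limit: int = 2) -> List[str]:
    """Tool 2 (hypothetical): the highest-rated review texts for a hotel."""
    rows = conn.execute(
        "SELECT text FROM reviews WHERE hotel_name = ? ORDER BY rating DESC, text LIMIT ?",
        (hotel_name, limit),
    ).fetchall()
    return [text for (text,) in rows]


def promote_lowest_performer(conn: sqlite3.Connection, city: str) -> Optional[str]:
    """Fixed tool sequence; in the demo, an LLM agent chose the tools and wrote the copy."""
    hotel = lowest_performing_hotel(conn, city)
    if hotel is None:
        return None
    reasons = " | ".join(best_reviews(conn, hotel))
    return f"Discover {hotel} in {city}! Guests say: {reasons} #travel"
```

Because revenue is aggregated from the `bookings` table, a hotel with no bookings at all never appears; a `LEFT JOIN` from a hotels table would be needed to surface it.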

Shaun Clowes

executive
#19

Thank you. All right, thank you, [ Mark ] and Swaroop. That was amazing. Honestly, in my career, I've had the opportunity to launch a bunch of really powerful technology, and I don't know how many people were counting there, but it took 4 clicks to get the data usable in Databricks and then less than a minute to get more value out of that data. In terms of clicks to power, this is one of the most powerful technologies I've ever had the opportunity to introduce. I'm extremely excited about the opportunity ahead of us. Ultimately, we're on a mission. We want to make your data accessible everywhere you need it, in the operational estate and the analytical estate. We want to bring it, in all of its rich, reliable and real-time form, to all of your analytical tools, your data warehouses, your data lakes, your BI tools and beyond. So I'd like to take a moment to acknowledge our partners on that mission. We have our commercial partners Databricks, AWS and Snowflake, with whom we're building deep catalog integrations so that all of your streams literally appear immediately as tables. You don't have to do anything. And we're excited to be launching Tableflow with a variety of ecosystem partners who are already invested in open table formats. Tableflow represents the easiest possible way to land your data for use inside their tool sets. And that's it, folks. We've solved the problems we set out at the start of this presentation section, but what we've really done is harness [ River hotels' ] data to unlock new opportunities: more engagement, better retention, more loyalty, more dollars on the table. That's a great customer experience, and it's actually just the beginning. We haven't just solved for this specific use case. Because we shifted governance and processing closer to the source, we've unleashed a network effect from our data. In the future, we're never going to have to start from scratch building custom-code pipelines.
We have ready-to-go, high-quality, reusable, reliable data that is ready to be put to work whenever and wherever we need it. And that extends even into our analytical use cases. Tableflow makes that data directly accessible in Iceberg or Delta format, and accessible through all of your BI tools. So imagine all of your business data, rich, reliable and real time, available to power your online operational applications and all of your analytics tool sets. That's the power of the data streaming platform. And it's how we go from a point-to-point world that's an absolute mess to a clean world of reusable, reliable, rich data that enables fast-moving use cases that can be delivered in days and weeks, or even minutes like you saw in the demo, rather than months and years. And with that, I'd like to say thank you very much for joining us here today. We have an incredible program of breakouts planned for you. Thank you, and have a great day.
