Appen Limited (APX) Earnings Call Transcript & Summary

June 23, 2022

Australian Securities Exchange (AU) | Information Technology | IT Services | Investor Day | 88 min

Earnings Call Speaker Segments

Mark Brayan

executive
#1

Hello, everyone, and welcome to Appen's 2022 Investor Technology Day. I'm Mark Brayan, the CEO of Appen. It's a pleasure to be joining you from San Francisco and to share some insights into the AI market, our technology and how it benefits our customers and our business. Our agenda today starts with a brief introduction of the team. Before we get into the tech demos, we'll recap our strategy and discuss how data is used throughout the AI life cycle to build the products that all of us use every day. Our tech demos will showcase areas of our technology that align with each pillar of our strategy. For our first pillar, grow and diversify, we will explore knowledge graphs and our China platform. Our China platform enables our participation in the high-growth Chinese market, and our knowledge graph product productizes our data structuring expertise. For the second pillar, automate, we will examine automated computer vision and audio labeling, both of which improve the speed and gross margin of data delivery. For our third pillar, expand, we will look at point-of-interest and synthetic data. These new products add to our offerings and expand our addressable market. And for our fourth pillar, evolve, we'll discuss how we are improving our internal operations and the productivity of our crowd management and data quality processes. Last year, we welcomed Sujatha Sagiraju and Mike Davie to the team. Sujatha, our Chief Product Officer, joined Appen in September 2021 after a long and distinguished career at Microsoft, where she led their machine learning operations products. She has deep AI experience and an extensive product background, which is essential for our product-led strategy. Mike Davie founded Quadrant. He and his team joined Appen following our acquisition of Quadrant last year. Many of you will know Wilson Pang and Ryan Kolln from previous Tech Days. 
Wilson, our Chief Technology Officer, has been with Appen since 2018 and has over 20 years' experience in software engineering and data science. Ryan, our VP of Strategy and Corporate Development, also joined us in 2018 and brings technology and strategy expertise to Appen. I'll now hand the session over to Ryan to take you through our strategy. Thanks, Ryan.

Ryan Kolln

executive
#2

Thanks, Mark. I'll kick off the session with a recap of Appen's strategy. Appen is the leading provider of data for the AI life cycle. Building great AI models is an iterative process. First, you need to source high-quality training data. These are the examples that the AI models will learn from. When it comes to AI, the more data, the better. It's important to have sufficient volumes of data that are representative of the use case for the model you're trying to build. Second, you need to prepare the data to be used for training AI. This includes the critical step of labeling the data. There are many different techniques and approaches used for labeling; however, the common factor is that accurate labels result in high-performing models. Third, the model is built using the labeled data. This is known as model training. Finally, it's important to test the performance of models in the real world. It's very common for model performance to degrade over time, something known as model drift. Ongoing model validation is key to understanding AI performance and knowing when to source and prepare new data. Appen operates in 3 of the 4 essential steps of the AI life cycle: data sourcing, data preparation and real-world model evaluation. For data sourcing, we offer a variety of options. We have custom data collection services, leveraging our global crowd of over 1 million people. We have pre-labeled data sets that customers can buy directly, including valuable point-of-interest data from Quadrant. We also have a partnership with Mindtech that provides high-quality synthetic data. For data preparation, we have a leading annotation platform that supports all the key data modalities. For model training, we integrate with clients and partner with leading model management companies. For model evaluation, our global crowd provides substantial scale and breadth of validation capabilities. This is essential because it's important to test models with real-world users. 
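Ryan describes ongoing model validation as the signal for knowing when to source and prepare new data. A minimal Python sketch of that decision follows. It assumes accuracy is measured on successive batches of real-world evaluation data and uses a hypothetical tolerance of 5 percentage points below the accuracy measured at deployment; the function name `needs_new_data` and the threshold are illustrative, not Appen's method.

```python
from statistics import mean

def needs_new_data(baseline_accuracy: float,
                   recent_accuracies: list[float],
                   tolerance: float = 0.05) -> bool:
    """Return True when average real-world accuracy on recent evaluation
    batches has fallen more than `tolerance` below the deployment baseline.
    The 0.05 default is a hypothetical choice for illustration."""
    if not recent_accuracies:
        return False  # no new evaluations yet, so no evidence of drift
    return mean(recent_accuracies) < baseline_accuracy - tolerance

print(needs_new_data(0.92, [0.91, 0.90, 0.92]))  # False: still within tolerance
print(needs_new_data(0.92, [0.85, 0.84, 0.83]))  # True: likely drift, refresh the data
```

In practice the tolerance would depend on how costly errors are for the specific model; a fixed threshold is simply the easiest rule to read.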
Technology underpins our strategy. It delivers efficiency, quality, reliability and scalability for the business. We have a suite of products that are used across the business to deliver work for our customers. First, we have Appen Mobile. It's the interface for our crowd workers, where they can sign on to Appen, review projects and complete some tasks, such as data collection. Appen Connect is our powerful matching engine that matches our crowd workers to projects. We do this at very large scale, with between 50,000 and 100,000 contributors each month. The Appen Data Annotation Platform is our labeling platform. We acquired this capability through the acquisition of Figure Eight, and we continue to build on it to create a highly efficient and automated labeling platform. We also have our China platform. In China, we have built a crowd management and annotation platform that is highly tailored to specific market requirements. Wilson will talk more about this later in the presentation. Finally, through the acquisition of Quadrant, we have the Geolancer and Hydra products, which are specifically tailored to point-of-interest and location data. Mike will also talk about these later in the presentation. We're trusted by leading global tech companies. This slide shows a subset of our customers. As you can see, we have an enviable customer base, and they trust us to deliver data to the highest standards of quality. While our key customers include the world's largest and most sophisticated participants in the AI sector, including Google, Amazon, Microsoft and Salesforce, we have many more customers across a wide variety of industries and verticals. The growth in AI is driving the need for training data, as you can see from the chart on the left-hand side. This chart from research firm Cognilytica is very important. It's some of the first comprehensive research on training data. 
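Appen Connect's matching logic is not described in detail here. As a hypothetical sketch only, matching can be pictured as filtering contributors by a project's country and language requirements, then ranking the eligible ones by a past-quality score. The `Contributor` and `Project` fields, the `quality_score` scale and the `match` function are all assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Contributor:
    contributor_id: str
    country: str
    languages: set[str]
    quality_score: float  # hypothetical 0-1 score from past work

@dataclass
class Project:
    project_id: str
    countries: set[str]
    language: str
    min_quality: float = 0.0  # hypothetical quality floor

def match(project: Project, pool: list[Contributor], limit: int) -> list[str]:
    """Return up to `limit` eligible contributor IDs, highest quality first."""
    eligible = [
        c for c in pool
        if c.country in project.countries
        and project.language in c.languages
        and c.quality_score >= project.min_quality
    ]
    eligible.sort(key=lambda c: c.quality_score, reverse=True)
    return [c.contributor_id for c in eligible[:limit]]

pool = [
    Contributor("c1", "AU", {"en"}, 0.95),
    Contributor("c2", "AU", {"en"}, 0.70),
    Contributor("c3", "PH", {"en", "tl"}, 0.99),
    Contributor("c4", "AU", {"fr"}, 0.90),
]
print(match(Project("p1", {"AU"}, "en", min_quality=0.8), pool, limit=5))  # ['c1']
```

A real matching engine would weigh many more signals; a filter-then-rank structure just makes the two roles of eligibility and preference explicit.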
We're often asked if techniques such as self-supervised learning will make training data redundant. Our view has always been that the need for training data will continue to grow alongside the emergence of new technologies, and this research confirms that. We also spend a lot of time speaking with our customers about the market, and we hear a consistent set of messages from them; in particular, that the combination of humans and technology is critical for delivering large volumes of high-quality training data. There are many factors that differentiate Appen from our competitors. We offer our customers a unique combination of leading annotation technology and a global crowd. The chart on the left-hand side plots annotation technology on the X axis and workforce capacity on the Y axis. Appen is the only provider with leading capabilities across both dimensions. We've been operating for more than 25 years, delivering high-quality AI training data, and we're trusted by global AI leaders to support both deployed and emerging AI-enabled products. Our crowd supports large-scale data requirements with high quality. And our business is organized into dedicated units focused on the breadth of our customers: Global, which is our big tech customers; China; government; and enterprise. We have built and acquired incredibly valuable technology that enables Appen to take a product-led focus. This will help us build scalable, repeatable products and services. Technology is helping us drive growth by targeting customer segments and building delivery solutions that support the growth of specific segments. We're also combining our expertise and technology to automate our crowd and labeling services. This will help us leverage AI and machine learning in our labeling operations to improve the productivity of our crowd. 
We have invested in new tech-enabled products to expand the market opportunities available to us. This is helping Appen transform the way we do business, using technology to improve scalability and productivity across the business. Before we get into the tech demos, I'm going to hand over to Sujatha, who's going to take us through a real-world example of data for the AI life cycle.

Sujatha Sagiraju

executive
#3

Thank you, Ryan. I'm very excited to share with you how Appen enables customers to build their mission-critical AI applications. Prior to joining Appen, I worked at Microsoft for over 20 years, including in the Bing and Azure Machine Learning groups. That experience taught me how businesses across different verticals are building AI applications at mega scale. I strongly believe that within the next 5 to 10 years, every business application will leverage AI; those that don't will lose their competitive advantage. When the opportunity to work at Appen came up, I jumped at it because I could see the critical role that Appen is playing in enabling the AI transformation. To build any AI application, companies spend the majority of their time on data, and data is our superpower. We are the leading trusted partner for data for the AI life cycle. At Appen, we work with several customers on a wide range of use cases. Today, I'm going to show you real customer examples, enabled by Appen, that keep my family safe. I have a teenage son who recently started driving, and as you can imagine, it gives me heartburn. I keep reminding him to focus on the road while driving, to be careful when changing lanes and not to look at his phone while driving. What gives me comfort are the AI features in my car and his cell phone. Now I'm going to walk you through how customers leverage Appen to build these AI applications. The first stage of the AI life cycle is data sourcing. My car has a distracted driver detection feature, which makes a sound when the driver is not paying attention to the road. Appen sources data for our customers, who use it to build distracted driver detection AI models that help prevent potential collisions. Data preparation is the second stage of the AI life cycle. My car also has lane assist, so my son is alerted when the car swerves into a different lane. 
Appen enables our customers to prepare and label the road scene data so that they can build their lane-assist AI models. Model evaluation is the fourth stage of the AI life cycle. My son's cell phone has hands-free features so that he can activate it by voice. Appen enables our customers by ensuring their voice activation models perform well in the real world. The examples that I'm going to share with you today are real Appen customer projects. Data sourcing is the first step for data scientists building a distracted driver detection AI model. To build a model that performs well, data scientists need a lot of diverse data that is representative of the real world. The image you see here is from a data collection project we did for a customer. We sourced a few hundred diverse participants and recorded their eye motion while they looked at different predetermined points in the car. The indicator here visualizes the eye movement in the camera. For this project, we had participants of different genders, races and ages. We also had diversity in how they looked: with and without facial hair, with and without makeup. Diversity in the data is very important. When my son is not focused on the road while driving, I need the AI model in my car to detect it regardless of whether he's wearing a cap or how long his hair is. Sourcing and managing a diverse crowd is a very complicated technical problem. We have a crowd of over 1 million contributors spanning more than 170 countries, and we have over 1,000 projects of different types running in parallel at any given time. Our cutting-edge crowd management platform leverages very sophisticated AI models to match the right contributor to each project in order to ensure high data quality and timely delivery for our customers. Wilson is going to talk about the crowd management platform later in the session. 
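The diversity requirement Sujatha describes can be checked mechanically while a collection is running. A minimal sketch follows, assuming each participant is recorded with self-reported attributes such as an age band, facial hair and makeup. The attribute names, age bands, `min_count` threshold and `coverage_gaps` helper are hypothetical. It lists which required attribute values are still under-represented.

```python
from collections import Counter

def coverage_gaps(participants: list[dict[str, str]],
                  required: dict[str, list[str]],
                  min_count: int) -> dict[str, list[str]]:
    """For each attribute, list the required values that have fewer than
    `min_count` participants so far (hypothetical coverage rule)."""
    gaps: dict[str, list[str]] = {}
    for attribute, values in required.items():
        counts = Counter(p.get(attribute) for p in participants)
        missing = [v for v in values if counts[v] < min_count]
        if missing:
            gaps[attribute] = missing
    return gaps

participants = [
    {"age_band": "18-30", "facial_hair": "yes", "makeup": "no"},
    {"age_band": "18-30", "facial_hair": "no", "makeup": "yes"},
    {"age_band": "31-50", "facial_hair": "no", "makeup": "no"},
]
required = {
    "age_band": ["18-30", "31-50", "51+"],
    "facial_hair": ["yes", "no"],
    "makeup": ["yes", "no"],
}
print(coverage_gaps(participants, required, min_count=2))
# {'age_band': ['31-50', '51+'], 'facial_hair': ['yes'], 'makeup': ['yes']}
```

Checking each attribute separately is the simplest version; a stricter collection plan might require coverage of attribute combinations as well.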
I'm now going to switch to the lane-assist AI application, which alerts the driver when the car swerves into a different lane. To enable our customers to build this model, we first source different road scene data. The second step in the AI life cycle is data preparation. The data scientists need road scene data with the lanes labeled so that they can use it as input to train their models. Data labeling can be tedious and is not always straightforward. This simple road scene has different types of lane lines: solid lines, dashed lines and lines of different colors. It also has a road shoulder, which is not a lane. Then there's the added complexity of road lanes looking different depending on the country. If the data scientists were to label thousands of road scenes themselves, it would be a very time-consuming process. And in most companies, data science resources are scarce and expensive, so manual labeling is not the best use of a data scientist's time. Our customers use our platform to easily label data at scale. In this example, customers specify the location of the road scene data in our platform, along with instructions on how the lane lines need to be labeled. Our platform matches the lane annotation job with the right contributors, who label the data according to the customer's instructions. This image shows the annotations in our tool, where the different lane lines are marked appropriately. This lane-assist AI feature in my car helped my son quite a bit when he was learning how to drive, especially in narrow lanes. In the demo section, I'll share some machine learning-powered capabilities in our platform that enable fast labeling of the data by improving crowd productivity. Now I'm going to switch to how Appen does model evaluation for our customers to ensure voice activation AI models in cell phones perform well in the real world. 
For voice activation to work properly, AI models must not only match keywords but also understand the speaker's intent. If my son sees that his car is running low on gas, he may use the voice assistant on his cell phone to find out 2 things: first, where the nearest gas station is; and second, how much gas costs at the nearest gas station. Both queries contain the phrase "nearest gas station," but the intent is very different. The first is about directions, and the second is about price. To evaluate real-world model performance, customers upload the data containing different queries, along with instructions on how the contributors should rate the results. Our crowd management platform matches the evaluation job to the right set of contributors. The image I'm showing here is an example of a contributor experience, where they rate whether the voice assistant AI model activated the right function. In this example, the AI model worked correctly for the first query but not the second one. Customers measure the model's real-world performance based on contributor ratings and use those ratings to retrain the model. The latest [indiscernible] State of AI results show that over half of respondents retrain their models at least monthly, and over 90% of respondents retrain them at least quarterly. Model evaluation is the first step in the retraining process. Our children's generation expects voice models to work in any situation, even picking up the latest slang. It is important that AI models perform with high accuracy in the real world. As the world evolves, the model must evolve as well. One of the demos I will show you later covers the industry-leading quality controls in our platform, which enable customers to build high-performing models. Let me summarize the key takeaways. Appen is a one-stop shop for data for the AI life cycle. 
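One simple way to turn contributor ratings into a model-level measurement is a per-query majority vote. The sketch below assumes each rating is a yes/no judgment of whether the assistant activated the right function, and it treats ties as failures. The aggregation rule, the query wording and the `model_accuracy` name are assumptions, not a description of Appen's platform.

```python
from collections import defaultdict

def model_accuracy(ratings: list[tuple[str, bool]]) -> tuple[float, list[str]]:
    """Aggregate contributor judgments per query by majority vote.

    Each rating is (query, model_was_correct). Ties count as failures
    (an assumed, conservative rule). Returns the share of queries judged
    correct and the list of failing queries.
    """
    votes: dict[str, list[bool]] = defaultdict(list)
    for query, correct in ratings:
        votes[query].append(correct)
    if not votes:
        raise ValueError("no ratings to aggregate")
    failing = [q for q, v in votes.items() if 2 * sum(v) <= len(v)]
    return 1 - len(failing) / len(votes), failing

directions = "where is the nearest gas station"
price = "how much is gas at the nearest gas station"
ratings = [
    (directions, True), (directions, True), (directions, True),
    (price, False), (price, False), (price, True),
]
accuracy, failing = model_accuracy(ratings)
print(accuracy, failing)  # 0.5 ['how much is gas at the nearest gas station']
```

The failing queries are the useful by-product here: they point at exactly the intents the retraining data should cover.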
We are the only company that supports all stages of data for the AI life cycle across all data modalities. The combination of our platform, expertise and crowd makes us the leading trusted partner for data for the AI life cycle. We have over 25 years of rich expertise that we are productizing, which enables us to execute at scale in a repeatable fashion. Our machine learning-powered capabilities improve contributor efficiency, which improves our gross margin. This also enables us to deliver quality data faster to our customers, which is a competitive advantage. With that, I'll hand it over to Mark.

Mark Brayan

executive
#4

Thanks, Sujatha. We'll now step through each pillar of our strategy and provide a couple of examples of how our technology brings that strategy to life. With grow, we'll talk about our China platform, which gives us access to the vast market in China. We'll also talk about knowledge graphs, a new product that productizes our data structuring expertise. With automate, we'll show you how we automate the computer vision and audio data-labeling processes. In expand, we'll showcase some new products: point-of-interest data from Quadrant and synthetic data from Mindtech. And finally, in evolve, we'll show you how we're evolving our internal operations to improve our scalability and gross margins. First, we'll take a look at the grow pillar of our strategy, and I'll hand it over to Wilson to take you through the China platform.

Wilson Pang

executive
#5

Thank you, Mark. China is one of the biggest AI markets in the world, and there's huge demand for training data. China is also a market very different from the rest of the world. To win in the China market, we have built a separate China annotation platform. Appen started to invest in China a few years ago and has had phenomenal growth. Our quarterly revenue has grown more than 20x, from $400,000 in Q1 2020 to almost USD 10 million in Q4 2021. Our China platform is a key enabler behind this phenomenal growth, but why didn't we just reuse our Global platform? Here are a few major motivations. First, the requirements from China customers can be very different from those in the rest of the world, so we need a platform tailored to their specific needs. Second, the China market is very dynamic and demanding; a dedicated team and platform are needed to provide rapid responses. Third, data protection is key in our business. We need to make sure all China customers' data remains in China, while data from global customers is not accessible from China. Finally, an air-gapped platform protects our customers' IP. Given all these considerations, we built the China platform on a separate tech stack with a dedicated engineering team. Let's look at the platform. We have leveraged our global knowledge and operational expertise to build the China platform. It is a mini version of our Global platform but tailored for the China market. The China platform has a rich set of annotation tools supporting different data modalities, including image, video, audio, text and content relevance. It's also worth calling out that Appen is a top autonomous driving training data provider in China, and our LiDAR and computer vision tools are key differentiators. Our China platform also has a lot of AI pre-labeling capabilities. It supports object detection and object tracking for computer vision, auto labeling for audio transcription and many other capabilities. 
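As a quick consistency check on the China revenue figures, taking USD 10 million as an upper bound (the speaker says "almost"), the growth ratio and the implied compound quarter-on-quarter growth rate over the seven quarterly steps from Q1 2020 to Q4 2021 work out as follows:

```latex
\frac{\text{USD } 10\text{M}}{\text{USD } 0.4\text{M}} = 25,
\qquad
g_{25} = 25^{1/7} - 1 \approx 0.58,
\qquad
g_{20} = 20^{1/7} - 1 \approx 0.53
```

A ratio of at most 25x is consistent with the "more than 20x" claim, and it implies average compound growth of roughly 53% to 58% per quarter over the period.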
The platform also has functions to support project management and resource management, including crowd management, with a powerful workflow engine underneath to stitch all these different operations together in a very flexible way. This is also an area where we see quite different requirements in China. It is very common in China to see crowd workers, BPO workers and internal workers working on the same project, which rarely happens in our Global market. Moreover, customers themselves also want to get involved in project delivery and perform quality reviews and checks. We have built these tailored solutions into the China platform. How does the China platform interact with our Global platform? The short answer is that there are almost no interactions except in one area. We allow our Global crowd to opt in and work on the China platform without sharing their personal information. They can use a single ID across the China platform and Appen Connect. Their personal information remains in Appen Connect, and all their payments are handled by Appen Connect. This gives our China business another advantage: it can leverage a crowd with different language skills and cultural backgrounds, which is very hard for local players to replicate. Other than that, the China platform is fully air-gapped. Let's recap. Tech has been very important for our growth in China. We have invested in a separate platform tailored to the needs of the China market and [indiscernible] that to grow in China and increase our total addressable market. An air-gapped platform provides strong IP and data protection, and our customers really value our approach. I will now hand over to Ryan to talk about the knowledge graph. Ryan, over to you.
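The transcript says only that opted-in contributors use a single ID across the China platform and Appen Connect while their personal information stays in Appen Connect. It does not say how that ID is produced. One common technique for this kind of separation is a keyed hash (HMAC), sketched below under that assumption. The key, the choice of email address as input and the `platform_id` helper are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret that would be held only on the Appen Connect side.
# In practice it would come from a secrets manager, never from source code.
CONNECT_SIDE_KEY = b"example-key-for-illustration-only"

def platform_id(contributor_email: str) -> str:
    """Derive a stable, opaque ID that can be shared with a separate platform.

    The same contributor always gets the same ID, but without the key the ID
    cannot feasibly be linked back to the email address.
    """
    normalized = contributor_email.strip().lower()
    return hmac.new(CONNECT_SIDE_KEY, normalized.encode("utf-8"),
                    hashlib.sha256).hexdigest()

# Personal details and payments stay keyed by email on the Connect side;
# the other platform would only ever see the opaque ID.
opaque = platform_id("Worker@Example.com")
print(len(opaque))  # 64 hex characters for SHA-256
```

Using a keyed hash rather than a plain hash matters here: a plain SHA-256 of an email can be reversed by hashing guessed addresses, whereas the HMAC output is useless without the key.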

Ryan Kolln

executive
#6

Thanks, Wilson. I'm going to spend some time now talking about a pretty exciting part of the AI industry called knowledge graphs, and the tool we've built, which productizes a lot of the expertise from our linguistic heritage into something that creates a huge amount of value for our customers. Knowledge graphs are a way to map relationships between data. They're similar to a database; however, instead of being arranged into rows and columns, the structure is based on the relationships between different data points. On the left-hand side is an example of a very simple knowledge graph. At the top is a circle called a node, which in this case represents Elon Musk, a person. The nodes below are attributes related to Elon Musk; in this case, a city and 2 companies. The line between two nodes is the relationship. So in the first case, Elon Musk was born in Pretoria, and so on. Knowledge graphs underpin the AI of technology companies whose products we use every day. Another example you can think about is LinkedIn: if you have a connection with someone, the graph arranges how all the connections fit together, including common connections based on companies worked for, schools attended and so on. Appen has a long history of providing linguistic services to structure data, particularly in the field of ontology design. We've been doing this for many years for our leading customers. What we're excited about is that we've built a product called Appen Ontology Studio, which combines our linguistic expertise with a no-code interface for customers to design and annotate knowledge graphs in a highly intuitive way. We'll now show a short demo of the Appen Ontology Studio product. We'll start with a simple knowledge graph for a retailer that includes some product attributes. Adding and editing nodes is simple, thanks to our no-code interface. In this case, we're adding a dress as a subtype of clothing and high heels as a subtype of shoes. 
We'll also add cocktail party as a type of event. This is the process of designing the ontology. Now that we have the structure, we can begin to annotate products. On the right-hand side, there's an image, and we'll draw a bounding box around the dress. We enter the product ID, and then we assign attributes from the knowledge graph to that product ID. We simply bring in the knowledge graph that we designed earlier and choose the attributes that we want to assign to the cocktail dress. So in this case, we've imported the knowledge graph, and with the drag-and-drop interface, again with no code, we can allocate attributes from the knowledge graph to the dress and also add other things, such as color. So in this case, we've specified that product 125 is a dress and that the occasion you may wear this dress to is a cocktail party. If you start to think about search terms, it becomes really important to have those relationships mapped. In this case, we'll also enter a new product, 316, which is the shoes, allocate them as high heels and then create a relationship between the 2 products: the shoes are to be worn with the dress and vice versa. This is a really simple example; much of the work we do on knowledge graphs is highly complex. Knowledge graphs are useful for structuring data. They're also useful for defining relationships within real-world environments. In the example here, on the left-hand side, you can see some machinery, and each piece of machinery is labeled. There are different pipes and other equipment. There's also a fault that's been labeled on pipe 27. Because the knowledge graph understands the relationship between pipe 27 and pipe 189, namely that they're adjacent to each other, shown by that first oval, when you see a fault on pipe 27, it creates an automated check, for preventative maintenance purposes, to go and check pipe 189. 
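The retail demo can be mirrored in a few lines of code by storing the ontology and the product annotations as (subject, relation, object) triples. This is a minimal in-memory sketch, not how Appen Ontology Studio stores data. The relation names `subtype_of`, `is_a`, `occasion` and `worn_with` are hypothetical labels for the relationships shown in the demo.

```python
from collections import defaultdict

class KnowledgeGraph:
    """Minimal in-memory store of (subject, relation, object) triples."""

    def __init__(self) -> None:
        self.edges: dict[tuple[str, str], set[str]] = defaultdict(set)

    def add(self, subject: str, relation: str, obj: str) -> None:
        self.edges[(subject, relation)].add(obj)

    def objects(self, subject: str, relation: str) -> set[str]:
        """Forward lookup: everything `subject` points to via `relation`."""
        return self.edges.get((subject, relation), set())

    def subjects(self, relation: str, obj: str) -> set[str]:
        """Reverse lookup: every subject linked to `obj` by `relation`."""
        return {s for (s, r), objs in self.edges.items() if r == relation and obj in objs}

    def is_a(self, node: str, category: str) -> bool:
        """True if `node` reaches `category` via is_a / subtype_of links."""
        stack, seen = [node], set()
        while stack:
            current = stack.pop()
            if current == category:
                return True
            if current in seen:
                continue
            seen.add(current)
            stack.extend(self.objects(current, "is_a"))
            stack.extend(self.objects(current, "subtype_of"))
        return False

kg = KnowledgeGraph()
# Ontology design (hypothetical relation names)
kg.add("dress", "subtype_of", "clothing")
kg.add("high heels", "subtype_of", "shoes")
kg.add("cocktail party", "is_a", "event")
# Product annotation
kg.add("product 125", "is_a", "dress")
kg.add("product 125", "occasion", "cocktail party")
kg.add("product 316", "is_a", "high heels")
kg.add("product 316", "worn_with", "product 125")
kg.add("product 125", "worn_with", "product 316")

print(kg.subjects("occasion", "cocktail party"))  # {'product 125'}
print(kg.is_a("product 125", "clothing"))          # True
print(kg.objects("product 125", "worn_with"))      # {'product 316'}
```

The `is_a` traversal is what makes the search use case work: a query for "clothing" finds product 125 even though it was only ever tagged as a dress.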
This example is actually from a real-world project where we are supporting a customer. The key takeaways for Appen Ontology Studio are as follows. We're productizing a core component of our linguistic expertise. This creates more stickiness with our customers and enables us to scale our expertise and support a much wider set of customers. We're democratizing technology that's typically only available to leading technology companies, which grows our addressable market, particularly in the enterprise sector. The product also improves the productivity of our crowd on knowledge graph annotation projects, through the no-code interface and the simple, intuitive nature of the work. We're super excited by this leading-edge product. I'll now hand back to Mark.
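The preventative-maintenance example follows the same pattern: once the graph records which assets are adjacent, a fault on one asset can be turned into inspection tasks for its neighbors. A minimal sketch, assuming adjacency is stored as undirected pairs; `maintenance_checks` is a hypothetical helper, and only pipes 27 and 189 come from the example.

```python
from collections import defaultdict

def maintenance_checks(adjacent_pairs: list[tuple[str, str]],
                       faulty_asset: str) -> list[str]:
    """Return the assets adjacent to `faulty_asset`, sorted, as inspection tasks."""
    neighbours: dict[str, set[str]] = defaultdict(set)
    for a, b in adjacent_pairs:
        # Adjacency is symmetric, so record the link in both directions.
        neighbours[a].add(b)
        neighbours[b].add(a)
    return sorted(neighbours.get(faulty_asset, set()))

print(maintenance_checks([("pipe 27", "pipe 189")], "pipe 27"))  # ['pipe 189']
```

Storing adjacency once per pair and mirroring it at query time keeps the annotation work simple for the crowd: they label a relationship once, not twice.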

Mark Brayan

executive
#7

Thanks, Ryan, and thanks, Wilson. I hope these examples help you see how we're using our technology to support the growth pillar of our strategy. Next is our automate pillar and some super exciting stuff. This is where we use machine learning to pre-label data and streamline the data labeling process. We'll show you 2 examples: one for computer vision and one for audio data. I'll now hand it over to Sujatha to talk about computer vision annotation.

Sujatha Sagiraju

executive
#8

Customers tell us that they need a lot of data labeled fast. To meet customer demand for speed, we augment human labeling with automation. To accelerate the labeling process for images and video, our platform has several machine learning-powered automation capabilities. We have 3 types of labeling automation. The first is speed labeling, where our AI models assist contributors by automating slow tasks. This significantly reduces labeling time. The second is pre-labeling, where our AI models make an initial best guess. The pre-labels are then sent to contributors, who check and correct them as needed. This capability also significantly reduces labeling time. The third is smart validation, where our AI models verify the contributor's work before it is submitted. The contributor is notified of any errors so they can fix them. This improves data quality for our customers and overall contributor performance.

Unknown Attendee

attendee
#9

In this demo, I'm going to first show you how a contributor would label a road scene manually and then demo the same image using the machine learning capabilities in Appen's platform to work more efficiently and decrease labeling time. This shows how a contributor would manually label a scene that includes things like the road, vegetation and the sky. They would use a thick paintbrush to annotate large areas quickly and then a thinner paintbrush to label areas that require more detail. As you can see, it's a very tedious process. It is also pretty error prone; even in this example, the annotator will need to go back and correct the lane lines that were mislabeled. Now I'm going to show you the automation that speeds up this labeling process. The first machine learning capability is called speed labeling. It works by grouping similar pixels in a scene together, which gives finer control around the edges of objects. For example, it works around the edges of the trees and even the lane line that was mislabeled in the first demo. It can annotate the details in the scene with a single tool, without the annotator having to adjust any settings. Our tests showed at least a 25% improvement in crowd productivity with this tool. Next is our pre-labeling feature. To use it, I uploaded this road scene image into the Appen platform. In the workflow, I can easily select a model from dozens of pretrained models in Appen's catalog. Here, I'm selecting a model that will pre-label the pixels in my street scene image. Then I can configure this model for my scenario by adding categories such as car and road to my pre-label. Once that is done, the model runs on the data that I uploaded. We can see that the job I have for the crowd now has the pre-labels ready for quality checking. 
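Appen has not described how speed labeling groups pixels. A classic, much simpler technique in the same spirit is region growing: starting from a pixel the annotator clicks, flood-fill outward to neighboring pixels of similar intensity. The sketch below assumes a grayscale image stored as nested lists and a hypothetical intensity `threshold`. Production tools typically use more sophisticated superpixel or learned segmentation methods.

```python
from collections import deque

def grow_region(image: list[list[int]], seed: tuple[int, int],
                threshold: int) -> set[tuple[int, int]]:
    """Flood-fill from `seed` over 4-connected pixels whose intensity is
    within `threshold` of the seed pixel's intensity."""
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and abs(image[nr][nc] - seed_value) <= threshold):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region

# Toy 4x4 grayscale scene: bright "sky" top-left, mid-gray "vegetation"
# bottom-left, dark "road" in the two right-hand columns.
image = [
    [200, 198, 50, 52],
    [201, 199, 51, 49],
    [90, 92, 48, 50],
    [91, 93, 47, 51],
]
print(sorted(grow_region(image, (0, 0), threshold=10)))
# [(0, 0), (0, 1), (1, 0), (1, 1)]
```

One click labels a whole coherent region, and the region stops at intensity edges, which is the behavior the demo highlights around tree outlines and lane lines.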
Lastly, we have a smart validation capability that contributors can use to verify their annotations before submitting, which reduces rework. For example, in this image, when the contributor validates the annotation, it highlights all the pixels in the scene that have not been labeled. The contributor can then fix the missing annotations before submitting their work. In summary, speed labeling, pre-labeling (which draws on dozens of pretrained models) and smart validation all work together to enable our contributors to produce high-quality annotations in a fraction of the time.
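The unlabeled-pixel check in the smart validation demo can be sketched as a scan over a per-pixel label mask. This assumes unlabeled pixels are stored as `None`; the mask layout and the `find_unlabeled` name are hypothetical, and the platform's validators may check much more than coverage.

```python
from typing import Optional

def find_unlabeled(mask: list[list[Optional[str]]]) -> list[tuple[int, int]]:
    """Return (row, col) for every pixel that still has no label.

    An empty result means every pixel is covered and the work can be submitted.
    """
    return [
        (r, c)
        for r, row in enumerate(mask)
        for c, label in enumerate(row)
        if label is None
    ]

mask = [
    ["sky", "sky", "sky"],
    ["vegetation", None, "road"],
    ["road", "road", None],
]
print(find_unlabeled(mask))  # [(1, 1), (2, 2)]
```

Returning coordinates rather than a simple pass/fail is what lets a tool highlight the exact pixels the contributor missed.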

Sujatha Sagiraju

executive
#10

In summary, automation improves our crowd's productivity, speed of delivery and data quality. This improves our gross margin and creates a competitive advantage for Appen. With that, I'm going to hand it to Wilson to talk about audio automation.

Wilson Pang

executive
#11

Thank you, Sujatha. Voice recognition has become very mature, and we see significantly increased adoption across different domains and use cases. To build domain-specific voice recognition models, we need a large volume of use-case-specific training data. Over the years, we have evolved our audio automation capabilities to support specialized domains. Let's look at a customer service AI example. Our clients want to build AI to provide customer service in drive-through scenarios, and voice recognition is a key component. However, voice recognition in drive-through scenarios can be very challenging. Let's listen to a real example. [Presentation]

Wilson Pang

executive
#12

Clearly, audio in the drive-through environment is very different from audio when people are talking to Alexa or Google Home. There's a lot of background noise and voices from multiple people, sometimes even a crying baby. In this particular example, there are also a lot of food-related terms. It's not surprising that generic voice recognition models don't work well here, so our customers need to build a drive-through-specific voice recognition model. Let's take a look at how Appen provides training data to our customers for this particular use case. We first load the use-case-specific data into our annotation platform, and then we use our own in-house models to produce a first-pass transcription of the audio. Our crowd members then review those preliminary results and make adjustments if needed. With this approach, we can provide large volumes of high-quality training data quickly, and our customers can use it to build great customer service AI for this challenging drive-through environment. There are 2 things really worth calling out. First, pre-labeling changes the crowd's task: it becomes a QA function. But the crowd is still needed; we cannot automate this job 100%. Second, generic voice recognition models don't work well, even for pre-labeling. This is why we need to build our own in-house models to support this use case. Let's see why. The left chart shows the accuracy of different ASR models on this drive-through audio data. The gray bars represent the accuracy of leading public models, and the red bar represents the accuracy of our in-house models. All the mainstream ASR models provide less than 50% accuracy, while our models provide 82% accuracy because they are trained on this particular data set. The right chart shows the correlation between worker productivity and model accuracy. The workers need to work on the pre-labeled results generated by the models. 
So model accuracy matters. The Y axis is worker productivity, measured by how many segments are transcribed per hour, and the X axis is the accuracy of the pre-labeling ASR models. Using our in-house models customized for this particular use case, our workers can be 3x more productive than with generic ASR models. So what are the key takeaways for our audio automation? Pre-labeling significantly improves crowd productivity, which leads to higher gross margin. Our in-house audio automation models help us grow in different sectors, like customer service and drive-throughs. And adding this automation capability to our platform enriches our ADAP features, which increases the stickiness of current customers and leads to greater revenue. With that, Mark, back to you.
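Wilson's point that pre-label accuracy drives reviewer productivity can be made concrete with a small sketch. It assumes, as a simplification, that reviewer effort scales with the number of word-level edits (insertions, deletions, substitutions) needed to turn a model's pre-label into the final transcript, which is the same edit distance used to compute word error rate. The function names and the sample order text are hypothetical, not Appen's tooling or data.

```python
# Hypothetical sketch: reviewer effort modelled as word-level edits to a pre-label.


def word_edits(hypothesis: str, reference: str) -> int:
    """Minimum number of word insertions, deletions and substitutions needed
    to turn `hypothesis` into `reference` (Levenshtein distance over words)."""
    hyp, ref = hypothesis.split(), reference.split()
    # prev[j] = edits needed to turn the hyp words seen so far into the first j ref words
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete hypothesis word h
                curr[j - 1] + 1,         # insert reference word r
                prev[j - 1] + (h != r),  # substitute, free when the words match
            )
        prev = curr
    return prev[-1]


def word_accuracy(hypothesis: str, reference: str) -> float:
    """1 - word error rate, floored at 0; a rough proxy for pre-label quality."""
    n_ref = len(reference.split())
    if n_ref == 0:
        return 1.0 if not hypothesis.split() else 0.0
    return max(0.0, 1.0 - word_edits(hypothesis, reference) / n_ref)


# Illustrative drive-through order (made-up text, not customer data)
final = "can i get a large fries and a chocolate shake"
generic_prelabel = "can i get a lodge price and a chocolate steak"
in_house_prelabel = "can i get a large fries and the chocolate shake"

print(word_edits(generic_prelabel, final))               # 3 words for the reviewer to fix
print(word_edits(in_house_prelabel, final))              # 1 word to fix
print(round(word_accuracy(generic_prelabel, final), 2))  # 0.7
```

In this toy case the reviewer fixes one word instead of three. It illustrates the direction of the effect only; the 3x productivity figure is Appen's own measurement and is not derived from this code.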

Mark Brayan

executive
#13

Thanks, Wilson, and thanks, Sujatha. I hope those examples show you how automating the work that our crowd does makes them more efficient, makes our business more efficient and makes us more productive for our customers. The computer vision example that Sujatha showed you gives up to 2x the productivity, and productivity was up 3x in the audio example that Wilson showed you. We'll now move on to expand, which is how we've added to our product set to increase the offerings for our current customers as well as expand the addressable markets we can go after. I'll now hand it over to Mike, who will take you through our point-of-interest data solution, Geolancer.

Mike Davie

executive
#14

Thanks, Mark. It's a pleasure to be here, everyone. I'm Mike Davie, the founder of Quadrant, and I've spent the last 20 years commercializing new data technologies. I'm here today to talk about POI data. They say the only constant in this world is change, and man, has the world of POI data changed over the last few years. COVID has wreaked havoc across mapping because millions upon millions of locations have gone out of business or changed. And that's where Geolancer comes in. The world of mapping has evolved, and you just can't scrape the Internet to get the data you need. You need to get out into the field and see what's going on. Behind the camera is a restaurant that is not open anymore. And last night, the restaurant behind me here closed much earlier than the hours specified in the local mapping applications. So you need to get into the field to see what is going on. What we're also seeing with our clients is that what people get in the digital world is now expected in the physical world, so much richer experiences are needed. That's where Geolancer comes in. So let's have a look at it. Geolancer is an application where our crowd, whom we call Geolancers, go into the physical world and collect data. Our standard data collection is a mass-market, scalable solution. These basic attributes are needed globally by all major mapping, logistics, transportation and geospatial analytics firms. For clients who require more, our application seamlessly allows us to collect additional, customizable data for their project. This allows us to capture rich data suitable for many industries and use cases where [ data ] from the field is needed. Our process involves multiple layers to ensure quality: first, a Geolancer collects the data; second, another Geolancer verifies it; and third, our in-house verification ensures high-quality work from the crowd.
Strict quality control measures ensure this quality and also serve as a training tool that helps Geolancers understand how to improve going forward. With all field data, privacy is always a concern, so our proprietary algorithms blur sensitive details, including faces and license plates, to ensure compliance and remove PII. Our product is built to be future-proof: we see mapping companies globally moving towards AR-enhanced [ road ] mapping, where reference points and unique POI landmarks are needed, and Geolancer is designed to support these use cases. So what do Geolancer and the acquisition of Quadrant mean for Appen? First, digital mapping and data collection is a $4.6 billion market that's growing 14% year-over-year, so this could have a great impact on our top line. Second, because everything around us is changing all the time, this data set always has to be updated, which means recurring revenue streams. Lastly, Appen's global crowd of more than 1 million allows us to enter new markets quickly with Geolancer. And with that, I'll hand it over to Ryan.
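The three quality layers Mike describes (collect, peer-verify, in-house verify) map naturally onto a small state machine. The sketch below is a hypothetical illustration, not Geolancer's actual data model: the `PoiSubmission` fields, the status names and the rule that the verifier must differ from the collector are assumptions based on the description of "a second" Geolancer verifying the data.

```python
# Hypothetical sketch of a three-layer POI verification flow; not Geolancer's real schema.
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING_PEER = "pending peer verification"
    PENDING_INTERNAL = "pending in-house verification"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


@dataclass
class PoiSubmission:
    """One point-of-interest record collected in the field (layer 1)."""
    poi_id: str
    collector: str
    attributes: dict
    status: Status = Status.PENDING_PEER
    feedback: list = field(default_factory=list)  # notes returned to the collector


def peer_verify(sub: PoiSubmission, verifier: str, confirmed: bool, note: str = "") -> PoiSubmission:
    """Layer 2: a different Geolancer checks the collected data."""
    if sub.status is not Status.PENDING_PEER:
        raise ValueError(f"{sub.poi_id} is not awaiting peer verification")
    if verifier == sub.collector:
        raise ValueError("peer verification must be done by a second Geolancer")
    sub.status = Status.PENDING_INTERNAL if confirmed else Status.REJECTED
    if note:
        sub.feedback.append(note)
    return sub


def internal_verify(sub: PoiSubmission, passed: bool, note: str = "") -> PoiSubmission:
    """Layer 3: in-house quality check; notes double as training feedback."""
    if sub.status is not Status.PENDING_INTERNAL:
        raise ValueError(f"{sub.poi_id} is not awaiting in-house verification")
    sub.status = Status.ACCEPTED if passed else Status.REJECTED
    if note:
        sub.feedback.append(note)
    return sub


# Usage: collected by one Geolancer, confirmed by another, then passed in-house
sub = PoiSubmission("poi-001", collector="geo_a", attributes={"name": "Cafe", "open": False})
peer_verify(sub, verifier="geo_b", confirmed=True)
internal_verify(sub, passed=True, note="Storefront photo clear; closure confirmed")
print(sub.status)  # Status.ACCEPTED
```

Keeping the feedback on the record itself is one way to support the "training tool" role Mike mentions: rejected or annotated submissions carry their reasons back to the collector.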

Ryan Kolln

executive
#15

Thanks, Mike. Another area where we've added capability for our customers is synthetic data. We do a lot of work in real-world data collection; however, there are cases where this can be difficult. Edge cases are one example. These are things that occur very rarely. Think about autonomous driving: a car driving down the wrong side of the road doesn't occur very frequently, but when it does, it's super important to recognize it and know how to respond. Data privacy can be another issue. For example, AI that requires images of faces can create challenges due to PII limitations. Data bias can also be present in training data. For example, data sets may not have sufficient skin tone representation to operate effectively in the real world. External conditions play a big role in the performance of outdoor AI systems. There's a wide array of weather conditions that need to be taken into consideration when creating AI training data. Back to autonomous driving, for example: there's a big difference between snow, rain and sunshine. And finally, when our customers are building brand-new products that are highly secretive and sensitive, they may not want people out in the real world testing those products. These are some of the challenges we face with real-world data collection, and synthetic data offers a way to solve some of them. In the computer vision context, synthetic data is computer-generated, photorealistic imagery based on 3D environments. Because the data is created in a simulated environment, it's automatically labeled with 100% accuracy. On the right-hand side, the top image shows a 3D environment of a home interior, and the bottom image shows the areas of the scene, in green, that are used for the data labeling. The real value of synthetic data comes from combining it with real-world data.
It's not a one-for-one replacement for human-annotated data; the value comes from bringing human data and synthetic data together. We're super excited to be a partner of and investor in Mindtech, one of the leading synthetic data companies. Combined, we have a unique competitive advantage: we can bring real-world data together with synthetic training data. I'll now play a short video to show some of the capabilities of the Mindtech platform. Mindtech has built software to develop high-quality computer vision training data. Customers can create and edit scenes using a wide array of variables and attributes, including the camera locations used to record the images and create the data. The platform also contains simulation capabilities that let customers recreate real-world interactions, in this example, a crowd of people walking around a scene. It also supports a wide variety of body types, times of day and weather conditions. The platform is super important for customers to create edge-case data across a wide variety of applications and markets. The human-centric focus is really important to support metaverse-related applications or anything related to AR and VR. Mindtech has also built in some sophisticated tools to view the composition of the data being created and to edit and sort that data. So, the key takeaways: there are some limitations to real-world data, and they can impact AI performance, particularly in edge-case scenarios. Synthetic data plays a role in improving model performance, and this, in turn, increases market growth as we expand the breadth of use cases. AI models benefit from the combination of synthetic and real-world data, but human data is going to be an ongoing need. So again, synthetic data is not a one-for-one replacement for human-annotated data but rather a complement.
Finally, we're super excited about the combination of Appen and Mindtech to form an industry-leading partnership. It's a unique competitive advantage for us and something that we're bringing to our customers. I'll now hand back to Mark.
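Ryan's point that simulated data comes "automatically labeled with 100% accuracy" follows from the generator knowing where it put everything. The toy below shows that idea in 2D with the Python standard library only; it is not Mindtech's platform or API, and real tools render photorealistic 3D scenes with lighting, weather and camera models rather than filled rectangles. The object names, sizes and canvas dimensions are arbitrary assumptions.

```python
# Toy illustration of "labeled by construction"; not Mindtech's software or API.
import random
from dataclasses import dataclass


@dataclass
class BoundingBox:
    label: str
    x: int  # left column
    y: int  # top row
    w: int
    h: int


def generate_scene(width, height, objects, seed=None):
    """Place each (label, w, h) object at a random position on a blank canvas.

    Returns a 2D grid of class ids (0 = background) and the bounding boxes.
    Because the generator chose every position itself, the boxes are exact
    by construction; no annotator is involved. Later objects are drawn over
    earlier ones (occlusion is not handled in this toy)."""
    rng = random.Random(seed)
    grid = [[0] * width for _ in range(height)]
    boxes = []
    for class_id, (label, w, h) in enumerate(objects, start=1):
        x = rng.randint(0, width - w)  # randint is inclusive, so the object fits
        y = rng.randint(0, height - h)
        for row in range(y, y + h):
            for col in range(x, x + w):
                grid[row][col] = class_id
        boxes.append(BoundingBox(label, x, y, w, h))
    return grid, boxes


grid, boxes = generate_scene(64, 48, [("sofa", 20, 10), ("person", 6, 16)], seed=7)
for box in boxes:
    print(box)  # exact label and box for each placed object
```

Varying the seed, sizes or (in a real tool) weather and time of day is how rare edge cases can be produced on demand, which is the gap in real-world collection that Ryan describes.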

Mark Brayan

executive
#16

Thanks, Ryan, and thanks, Mike, for highlighting some pretty cool products that give us more things to sell to our existing customers as well as expanding the markets that we can address. We'll now move on to the final and fourth pillar of our growth strategy, which is evolve. And that's about evolving our internal operations for both scalability and margin expansion. There are 2 examples here: the first being how we match crowd workers or our contributors to the work we do at scale, and the second is how we ensure high levels of data quality very efficiently for our customers. First of all, it's over to Wilson to talk about crowd management.

Wilson Pang

executive
#17

Thank you, Mark. We have over 1 million crowd members from more than 170 countries. They're our foundation for delivering customer projects as well as our competitive advantage over pure software players. At the same time, crowd management is complicated. It can take a lot of time and energy from our internal teams. Over the years, we have built many AI capabilities to help us manage the crowd. So what problem do we want AI to solve when managing our crowd? Matching the right contributor with the right project. On one side, we have over 1 million crowd members from different countries, speaking different languages, with different skill sets and different preferences for projects. On the other side, we have more than 1,000 live projects at any given time. Those projects have different data types, different skill-set needs, different quality measurements and different rates. Every month, we pay anywhere from 50,000 to 100,000 workers. Matching those contributors with projects is a very complex task, and we do it at large scale every day by leveraging AI. AI starts with data. We use 3 categories of data to train our project-worker matching AI models. The first is worker profile information: the languages the user speaks, country of residence, demographic information, e-mail, user name and other useful information. Those are key to our matching model. We also leverage contributor activity data, which includes user behavior data: how they log in, their web browsing history, which links they click, how long they spend on a particular page. All of that is useful information, too. The third category is project history: what kinds of projects the worker has worked on, their quality, how many hours they spend every week and their efficiency.
We combine the data from those 3 categories and build the worker-matching AI so that it can decide which workers to choose, or not to choose, for a particular project. Let's look at some examples of how we use user login activity. A user login is a very simple activity, but there are many signals we can extract from it. We know the country the user logged in from and the IP address of the computer they logged in on. We also know the time of the login and how many times the user logged in during a certain time period. Here are 2 examples. Worker B claims he's from the U.S., but he logged in from a Brazilian IP address, and he made 300 logins between 3 a.m. and 5 a.m. That's very suspicious, right? It's either a bot or a bad actor, while Worker A looks completely normal. So you can clearly see there are a lot of signals we can use to help detect bad actors. Now let's look at how we use project history data. Here are 2 workers. Worker A has worked on an autonomous vehicle image labeling project. He works 20 hours a week. His quality is pretty good, 92, and he can deliver 50 images every hour; a pretty good worker. Worker B worked on a retail image labeling project. He worked 40 hours a week, a hard worker, but his quality is just okay, only 85%. So whenever we have a new project, such as an autonomous vehicle labeling project, the AI model will know that Worker A is the right choice, not Worker B. Now let's put everything together and see how we train our AI model and constantly improve it. First, we capture all the signals from worker profiles, contributor activity and project history. We put all this data into a crowd DIA database. This database has more than 200 signals, and those signals are updated in real time. Then we use the crowd DIA database to train our AI models.
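The two examples Wilson walks through can be expressed as simple checks. The sketch below is a rule-based stand-in written under stated assumptions: Appen describes learned models trained on more than 200 signals, whereas the thresholds here (more than 50 logins in 2 hours, a minimum quality of 90) and the field names are hypothetical and chosen only to reproduce the Worker A/Worker B examples. In a learned system, signals like these would more likely be model features than hard rules.

```python
# Hypothetical rule-based stand-in for the learned matching/fraud models described above.
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Login:
    when: datetime
    ip_country: str  # country resolved from the login IP address


@dataclass
class ProjectRecord:
    domain: str  # e.g. "autonomous_vehicle", "retail"
    quality: float  # quality score on the project
    hours_per_week: float


def login_flags(claimed_country, logins, max_logins=50, window=timedelta(hours=2)):
    """Hypothetical rules: IP-country mismatch and login bursts in a sliding window."""
    flags = []
    foreign = sorted({login.ip_country for login in logins if login.ip_country != claimed_country})
    if foreign:
        flags.append(f"claims {claimed_country}, logged in from {', '.join(foreign)}")
    times = sorted(login.when for login in logins)
    start = 0
    for end, t in enumerate(times):
        while t - times[start] > window:  # drop logins older than the window
            start += 1
        if end - start + 1 > max_logins:
            flags.append(f"more than {max_logins} logins within {window}")
            break
    return flags


def match_score(history, project_domain, min_quality=90.0):
    """Hypothetical score: best quality on same-domain projects, 0 if below the bar."""
    best = max((r.quality for r in history if r.domain == project_domain), default=0.0)
    return best if best >= min_quality else 0.0


# Worker B: claims the U.S., 300 logins from a Brazilian IP between 3 and 5 a.m.
night = datetime(2022, 6, 1, 3, 0)
worker_b_logins = [Login(night + timedelta(seconds=24 * i), "BR") for i in range(300)]
print(login_flags("US", worker_b_logins))
# ['claims US, logged in from BR', 'more than 50 logins within 2:00:00']

histories = {
    "worker_a": [ProjectRecord("autonomous_vehicle", 92, 20)],
    "worker_b": [ProjectRecord("retail", 85, 40)],
}
ranked = sorted(histories, key=lambda w: match_score(histories[w], "autonomous_vehicle"), reverse=True)
print(ranked)  # ['worker_a', 'worker_b']
```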
After the model is trained and deployed, it starts to generate worker validation and project-worker match results, and we store those results in a match DB. The match database is refreshed every hour. The results in the match DB are then used for fraud detection, worker selection for projects and project recommendations for workers. Meanwhile, for predictions with lower confidence from our AI model, we have specialists in the loop to review and correct the results. This is very important when detecting bad actors, where we don't want any false positives. Those corrected results are also fed back into the crowd DIA database, and the model is retrained regularly. Just like our clients, we need to constantly retrain our AI models, and a human in the loop is key to the model's success. Here are the key takeaways for our crowd management automation. Crowd management AI helps us select workers and onboard them quickly. That improves our project delivery speed, which leads to greater revenue. It also saves time for our internal teams and helps them scale to support more projects without increasing team size. Finally, it helps our workers find better projects, which leads to better-paid workers and higher retention. With that, I'll hand over to Sujatha to talk about how we're evolving our platform's quality features.
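The human-in-the-loop step, where low-confidence predictions go to specialists and their corrections flow back into training data, can be sketched as confidence-based routing. The thresholds, label names and data structures below are hypothetical; the stricter bar for bad-actor calls reflects Wilson's point about avoiding false positives, but the specific values are assumptions.

```python
# Hypothetical confidence-based routing with specialist review; values are illustrative.
from dataclasses import dataclass


@dataclass
class Prediction:
    worker_id: str
    label: str  # "bad_actor" or "ok" in this hypothetical setup
    confidence: float  # model confidence in [0, 1]


def route(predictions, threshold=0.9, bad_actor_threshold=0.99):
    """Split predictions into auto-applied ones and ones for specialist review.

    Bad-actor calls need a stricter bar because a false positive would wrongly
    block a genuine contributor."""
    auto, review = [], []
    for p in predictions:
        bar = bad_actor_threshold if p.label == "bad_actor" else threshold
        (auto if p.confidence >= bar else review).append(p)
    return auto, review


def apply_reviews(review, corrections, training_rows):
    """Apply specialist corrections and queue them as new labelled training rows."""
    final = []
    for p in review:
        label = corrections.get(p.worker_id, p.label)  # reviewer may confirm or overturn
        final.append(Prediction(p.worker_id, label, 1.0))
        training_rows.append((p.worker_id, label))  # feeds the next retraining run
    return final


preds = [
    Prediction("w1", "ok", 0.97),
    Prediction("w2", "bad_actor", 0.95),  # below 0.99, so a human checks it
    Prediction("w3", "ok", 0.60),
]
auto, review = route(preds)
training_rows = []
final = apply_reviews(review, {"w2": "ok"}, training_rows)
print([p.worker_id for p in auto])  # ['w1']
print(training_rows)  # [('w2', 'ok'), ('w3', 'ok')]
```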

Sujatha Sagiraju

executive
#18

Thank you, Wilson. Andrew Ng is an adjunct professor at Stanford University. He was the co-founder and head of Google Brain and was formerly Chief Scientist at Baidu. He's one of the world's most famous and influential data scientists. This quote from him highlights the importance of data quality in building AI models. Controlling data quality is critical at every single step of the AI life cycle; otherwise, it becomes a garbage in, garbage out scenario. If the training data is not of high quality, the model's predictions will not be accurate. At Appen, we have been delivering high-quality training data to our customers for over 25 years. We have now taken that rich expertise and productized it so that our customers can easily use the quality controls in a self-serve fashion. While customers realize the importance of quality data, it's very challenging to prepare high-quality data to train AI models with. I'm going to show you an example to illustrate why this is challenging and then demo a self-serve quality control we have in the platform. The example I'm sharing here was inspired by a customer project we did for a social media platform company that wanted to evaluate whether its search engine was showing relevant images. The search term here is mixer, and the contributors must evaluate the image results. For search relevance, it is extremely important to have multiple evaluations for every result, because as humans, we all have different backgrounds, context and biases, and that plays a role in how contributors rate the results. At the same time, it is equally important to have just the right number of contributors, to keep the cost under control while maintaining high quality. In this example, all the contributors rated the first image, of a kitchen mixer, correctly. Only some contributors rated the social mixer, handheld mixer and drink mixer correctly.
Two contributors incorrectly rated the beaker used in chemistry labs as a mixer. You'll also notice that the first contributor rated every single result as correct; maybe they're not paying attention. For managed services projects, we have internal tools and processes that our quality managers use. For example, we use test questions to identify low-quality contributors. We have tools to audit contributors, check the project status, view the estimated time of completion and monitor contributor efficiency. We have productized these tools and processes into our platform, which enables our customers to get high-quality data in an easy, self-serve fashion. I'm now going to show you a demo of this industry-leading platform capability.
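The mixer example combines two ideas: aggregating several judgments per result, and spotting contributors whose answers look inattentive. A minimal sketch, assuming simple majority voting and a "same answer to everything" check; the judgments below are made up to mirror the example, and Appen's platform may aggregate differently (for example, by weighting contributors according to trust). A constant rater is only a candidate for review, not proof of low quality.

```python
# Hypothetical aggregation sketch; the judgments are invented to mirror the mixer example.
from collections import Counter


def majority_label(votes):
    """Most common label; ties go to the label seen first (Counter.most_common order)."""
    return Counter(votes).most_common(1)[0][0]


def agreement(votes):
    """Share of contributors who gave the majority label; low values suggest
    the item needs more judgments."""
    return votes.count(majority_label(votes)) / len(votes)


def constant_raters(judgments):
    """Contributors who gave the identical label to every item they rated."""
    seen = {}
    for ratings in judgments.values():
        for contributor, label in ratings.items():
            seen.setdefault(contributor, set()).add(label)
    return sorted(c for c, labels in seen.items() if len(labels) == 1)


# Made-up judgments for the query "mixer" (R = relevant, N = not relevant)
judgments = {
    "kitchen_mixer": {"c1": "R", "c2": "R", "c3": "R", "c4": "R", "c5": "R"},
    "drink_mixer": {"c1": "R", "c2": "R", "c3": "N", "c4": "R", "c5": "N"},
    "beaker": {"c1": "R", "c2": "N", "c3": "R", "c4": "N", "c5": "N"},
}
for item, ratings in judgments.items():
    votes = list(ratings.values())
    print(item, majority_label(votes), agreement(votes))
print(constant_raters(judgments))  # ['c1']
```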

Unknown Attendee

attendee
#19

Customers use the project editor to reach their data preparation goals without any coding or complex technical work. This is a robust and flexible editor, enabling any customer to easily set up, build and customize their project workflow as well as control their QA policies and practices. The project editor gives customers self-serve flexibility and absolute control over who annotates their data and who checks those annotations. They can choose to use their own internal contributors or Appen's world-class contributors, and all of this can be managed by customers themselves without writing any code. Every Appen project comes with a suite of powerful built-in quality mechanisms that help ensure our customers acquire data of the highest quality. The Appen platform provides test questions, which enable customers to screen their contributors for readiness and continuously monitor their contributors' performance throughout the entire project life cycle. While the project is running, customers can now view real-time overall progress and quality performance metrics on a new project dashboard. These metrics can be broken down by contributor and by the job they are performing for the project. The project dashboard is designed specifically to equip customers with actionable insights so that they can take the right action at the right time: they can understand where their data is in the project execution life cycle, how contributor accuracy is performing relative to the quality goals and when the project is estimated to complete. Another new quality management mechanism, quality audit, is very useful at the end of the project, enabling customers to audit all annotations in order to identify potential biases, unexpected trends and even isolated anomalies. In the detailed view, customers can see the aggregated responses of the contributors.
They can mark the responses as correct or incorrect, or make corrections as they see fit. Customers tell us this is so useful that we are looking to add a similar spot check report that can be accessed at any time. Let's review these easy-to-use, no-code quality mechanisms: project editor, test questions, project dashboard, quality audit and the new spot check reports. It's the combination of all of these that allows the Appen platform to provide easy-to-use, powerful quality control, measurement and visualization capabilities that ensure customers achieve the needed accuracy during project execution.
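Test questions work by hiding items with known answers among ordinary work and scoring contributors only on those. The sketch below assumes a hypothetical policy (at least 5 scored test questions and 80% accuracy) and hypothetical function names; it is not the Appen platform's API or its actual thresholds.

```python
# Hypothetical test-question policy; names and thresholds are illustrative only.


def score_test_questions(answers, gold):
    """Score a contributor on hidden test questions only; ordinary work items are ignored.

    Returns (accuracy, number_scored); accuracy is None if no test question was answered."""
    scored = [q for q in answers if q in gold]
    if not scored:
        return None, 0
    correct = sum(answers[q] == gold[q] for q in scored)
    return correct / len(scored), len(scored)


def contributor_status(answers, gold, min_accuracy=0.8, min_scored=5):
    """Keep screening until enough test questions are scored, then trust or remove."""
    accuracy, n_scored = score_test_questions(answers, gold)
    if n_scored < min_scored:
        return "screening"
    return "trusted" if accuracy >= min_accuracy else "removed"


gold = {"t1": "R", "t2": "N", "t3": "R", "t4": "N", "t5": "R"}  # known answers
careful = {"w1": "R", "t1": "R", "t2": "N", "t3": "R", "t4": "N", "t5": "N"}  # 4 of 5 right
always_r = {"w1": "R", "t1": "R", "t2": "R", "t3": "R", "t4": "R", "t5": "R"}  # 3 of 5 right
newcomer = {"t1": "R", "t2": "N"}  # only 2 test questions so far

print(contributor_status(careful, gold))  # trusted
print(contributor_status(always_r, gold))  # removed
print(contributor_status(newcomer, gold))  # screening
```

Here `always_r` is a contributor who answers "relevant" to everything; hidden test questions catch that pattern without needing to compare against other contributors.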

Sujatha Sagiraju

executive
#20

Let me summarize the key takeaways. High-quality training data is absolutely critical for our customers to build models that perform well in the real world. We have been delivering high-quality training data for over 25 years. We have productized our expertise and built it into our platform so that our customers can use it easily in a self-serve way. These are industry-leading capabilities that give us a competitive advantage. With that, I'll hand it back to Mark.

Mark Brayan

executive
#21

Thanks, Sujatha. Thanks, Ryan. Thanks, Wilson. Thanks, Mike. To close, I'd like to reiterate our growth strategy and the critical role that our technology plays in delivering it, as well as the benefits to our customers and the value to our company. We are the market leader. We generate more revenue than our competitors. We have clear competitive advantages in the size and diversity of our crowd, the breadth and capability of our technology and the unrivaled experience and skill of our people. Our customers, which include the largest technology companies in the world, both here in the U.S. and in other countries, including China, have trusted us for over 25 years to deliver their critical training data assets. Our products underpin our growth strategy. Humans will continue to play an important role in the data collection, annotation and model testing phases of the AI life cycle, but our products will play an increasing role in streamlining and automating these phases to deliver more high-quality data to our customers faster than ever before, while growing our revenue and expanding our margins. Today, we showed examples of how our products support our strategy and deliver value to our customers and our company. Our China platform gives us access to the high-growth Chinese market, and our Ontology Studio product productizes our deep data knowledge to increase revenue and margin as the need for knowledge bases grows. Our machine learning models are helping us create higher volumes of high-quality computer vision and audio data much faster than ever before. This makes us more competitive and will generate more revenue and higher margins in time. We also showed how we're expanding our product set and, with it, our addressable market and revenue potential. Quadrant gives us real-world point-of-interest data that's essential for any product that relies on location.
And our partnership with Mindtech provides synthetic data to help us address more use cases. Finally, we're evolving our internal operations with our products to improve margins and scalability. We showed you how we're using machine learning models to match contributors to projects and how we're simplifying our quality management with our workflow product to ensure reliable, high-quality data for our customers. Our products are essential to delivering our long-term goals of at least doubling revenue by 2026, greater customer diversification and operating margins of 20%. Thanks for attending our 2022 Technology Day. Apologies again for the technical glitches at the outset. We hope you've enjoyed the presentation and learned more about our products and how they underpin our growth strategy.

Mark Brayan

executive
#22

I'll now go through the questions that we've been given online and take any more that come in over the next little while. The first question is: in terms of data sourcing, data preparation and model evaluation, can you talk to Appen's revenue exposure and the industry revenue split between these 3 buckets? We don't call that out, other than in our full-year presentation, where we mentioned that about 80% of our revenue comes from relevance work, which is in the model evaluation bucket. So we haven't split out the other revenue types beyond that model evaluation piece. The question goes on: can you also talk to the margin differential, if any, between data sourcing, preparation and evaluation? And given the scale Appen has in model evaluation/relevance, should we assume that gross margins are higher in evaluation? We do have a lot of scale in the relevance work, and our margins are pretty healthy there, as you know. For data sourcing and data preparation, it depends a lot on the project and on the technology we can deploy for that project. For example, the Quadrant application allows us to collect data for multiple customers concurrently, so we get a lot of scalability every time a Geolancer goes out in the field. If we're able to deploy our machine learning models in data preparation, we can get higher margins on those projects than on ones that are fully manual. The next question: how do we think about the need to validate models frequently for drift, et cetera, and Appen's recent performance, wherein some relevance projects have seen less work? Have companies changed their refresh schedules? The short answer is yes. A lot of our large customers, as we've explained over the last few periods, have been redeploying engineering assets to new product development to help reduce their reliance on advertising-based revenue.
By doing so, they've taken a risk, if you will, on the frequency of updates they're applying to some of their core models. That's entirely reasonable and depends on the model and the business case. I might ask Wilson: can you give us some examples of models that may need frequent updating versus models that need only infrequent updating?

Wilson Pang

executive
#23

[indiscernible] needs frequent updating probably can go on [indiscernible], right? There are many examples. Take search: if you train a search ranking model, it needs to understand all the new terminology, like new products; a lot of new stuff appears. You need to retrain it very frequently. Advertising is the same, since it's also relevance. Then there's voice recognition. Say you talk to your phone and it needs to recognize your voice: there are so many new terms and phrases every day that you need to retrain the model very frequently. That's audio. Then there's computer vision, like object detection and object tracking. New things appear all the time, so you need to retrain the model. So for the majority of real-world AI applications, you need to retrain the models pretty frequently. For models you don't need to retrain very frequently, I can think of 3 categories. Number one: right now there's a big CVPR conference, with professors and students from universities. When they train a model to produce a paper, it's probably a [ one time ] deal, or maybe 3 or 10 times, right? You don't need to retrain it every month. You [ generate ] the paper, and that's it. That's the first category. The second category is underlying models, such as models that convert text into vectors, or [indiscernible] into vectors. Those models are used by other models, and they probably don't need to be retrained as frequently. That's the second one. The third category I'd put is real-world AI that people don't retrain frequently because they don't know there's an issue: their model has already drifted, and they're not aware there's a big model performance problem. That's the area I think everybody should pay attention to, really monitoring performance and retraining the model.
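Wilson's closing advice, to monitor performance and retrain when a model drifts, can be sketched as a rolling-accuracy monitor. It assumes a steady stream of human-judged evaluation samples (for instance, relevance judgments on live results) and a hypothetical rule that retraining is due when rolling accuracy drops more than a set tolerance below the accuracy measured at deployment; the window size and tolerance are placeholders, not recommendations.

```python
# Hypothetical drift monitor; baseline, window and tolerance values are placeholders.
from collections import deque


class DriftMonitor:
    """Track accuracy on a rolling window of human-judged samples and flag
    when it falls more than `tolerance` below the accuracy measured at deployment."""

    def __init__(self, baseline_accuracy, window=500, tolerance=0.05):
        self.baseline = baseline_accuracy
        self.tolerance = tolerance
        self.outcomes = deque(maxlen=window)  # oldest results drop off automatically

    def record(self, correct):
        self.outcomes.append(bool(correct))

    def rolling_accuracy(self):
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else None

    def needs_retraining(self):
        # Wait for a full window so a handful of early errors doesn't trigger a retrain
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.rolling_accuracy() < self.baseline - self.tolerance


monitor = DriftMonitor(baseline_accuracy=0.90, window=10, tolerance=0.05)
for _ in range(10):
    monitor.record(True)
print(monitor.needs_retraining())  # False
for _ in range(3):
    monitor.record(False)  # e.g. new terms the model has never seen
print(monitor.rolling_accuracy(), monitor.needs_retraining())  # 0.7 True
```

The third category Wilson describes is exactly the case where nothing like this is running, so the drop in accuracy goes unnoticed.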

Mark Brayan

executive
#24

Thanks, Wilson. So in response to the question, there are a number of factors that may affect how frequently a model is updated, but it's also the model owner's decision whether to update it or not, and they take on the risk to model performance. The next question is: how will increasing automation impact gross margins, and how should we think about Appen's gross margin assumptions in the context of the FY '26 goals for EBITDA margins? Our whole strategic thesis is to improve gross margins with our products and drop those margins through to the bottom line to reach the EBITDA margin of 20% that we called out for 2026. That will take -- well, we've given ourselves 5 years to achieve that. It's early days. We are seeing improvements on some projects, and another question refers to that. But it is early days in how we're deploying these models. We gave you a few examples today of how we're deploying models in computer vision automation and audio automation, and our transformation program will get those models into more projects and improve those margins over time. The next question: regarding automation and the crowd productivity uplift, while it should increase gross margins over time, can Appen talk to the experience to date? Has Appen had to give up the productivity gains through better pricing for customers, or have margins improved? It's mixed. I can think of some audio projects where we've benefited from margin expansion, but I can also think of some computer vision projects where we've had to use the models to be competitive and win the business. So for now, it's mixed. But we think that over time, we'll be able to improve those models and products such that we'll get very high margins, and we'll be able to hang on to most of that ourselves. The last question we have is: who performs the knowledge graph work, and what's the revenue model for Ontology Studio? Do companies pay a separate fee? It's a very new product.
Currently, we're using it to deliver a service to the customer. It helps our linguists and project managers deliver that service, or that project, far more efficiently. So currently, we're using it internally to deliver the service. Whether we sell it as a product over time, we'll decide as the product matures. But for now, it underpins a service that we provide to our customers. Another question has just come in real time, and it is: given the product investment and product enhancements, in order to grow enterprise revenue -- sorry, I can't see that. Given the product investment and product enhancements, in order to grow the enterprise revenue base, is it now more about investment in sales staff/go-to-market? If so, can you provide us with an update on the investment in sales staff? It's a combination. We're doing 2 things in parallel: we're building the products, which give us more things to sell, and we're also adding sales teams to take those things to market. For now, our sales teams are fully staffed for 2022, at the levels we need to achieve our enterprise objectives for the year. If we run ahead of those objectives, we may add more sales staff. But we're doing both things in parallel: building the products and developing the go-to-market resources to take them to market. Next: what does the sales cycle look like with enterprise clients? How long are the lead times? Can they test or try your platform before committing, through APIs, et cetera? Sales cycles vary anywhere from a couple of months to 6 or maybe even 9 months for the bigger deals. Very importantly in our world, given the experimental nature of AI, we often start with a pilot study. The question asked whether they can test or try the platform. What we ideally like to do is hold the customer's hand through that pilot to prove out a use case for them.
That then forms the foundation for what we call expansion revenue over time. Many projects start small and grow over time. So that first deal may take a few months because it's a small pilot, and a larger deal may take a few more months to grow from there. But once we start providing high-quality data to the customer, we get a fairly regular flow of projects from them. I might ask Sujatha to add to that, given you work with many of our enterprise clients.

Sujatha Sagiraju

executive
#25

When we think about enterprise customers, one of the things they consider is which data modalities we support, because when customers talk to us, they're not interested in just one data modality right now. They're also interested in the other data modalities they might expand into in the future. So while they might start a pilot with one capability, they're also thinking about the platform for the other data modalities while the pilot is going on, and we enable that.

Mark Brayan

executive
#26

Yes, that's a good point. The fact that we can handle so many data modalities is very valuable to our clients, and we do see them move from one to another as they start to build more AI for their enterprise or for their customers. The next question is about the platform and models you're building. How do you monetize? Is it only through margin improvement/helping internal labeling? Or is there an opportunity to sell directly to your customers? So we have a number of different revenue models. We do sell our platform direct to our customers. For those of you that have followed us for some years, you may recall we bought a company called Figure Eight. They historically sold the platform to their customers, and the customer used the platform and ran the project themselves. In those instances, we're selling a SaaS license to those customers. Increasingly, however, because of the complexity of projects and things like designing the project to, for example, tick whether something is a mixer or not, which sounds trivial, our customers are relying on our expertise. So we provide a service layer over the top of the platform use. In those cases, they may pay us for the amount of data that we provide, and within that is the platform license and the service fee. The other model, and this is for some of the relevance projects, particularly the large ones for our large customers, is based on the time that we provide for them. They have a good understanding of how much data a contributor can provide per hour, so we charge them per hour. So those are the 3 models: the software license model, the data volume model and the time-taken model. Can you tell us more about your strategy to continue growing the high-value China segment? So the strategy for China is, in many ways, to mimic what we've done in the West. That's worked very well for us.
And so in China, as Wilson explained, we're building our China platform for reasons of IP and data protection. And then we're targeting, first of all, the large tech companies. We're also targeting the autonomous vehicle companies. And then we are also winning business with mobile companies. Wilson, of course, has a lot of experience in China. So I might ask him to comment on the growth of our business in China as well.

Wilson Pang

executive
#27

Yes. China is a very different market. The customers there are very demanding, right? Sometimes they want a feature today, and they don't want that feature tomorrow or a week later. That's [ why ] we invested in a separate China platform, so we can have this flexibility. And besides the platform, we also have a dedicated China engineering team behind it that can respond very rapidly. I really think that strategy helped us move fast and win that market.

Mark Brayan

executive
#28

Yes. Speed is important.

Wilson Pang

executive
#29

Yes.

Mark Brayan

executive
#30

China speed. And particularly so. Thank you. Compared to the competition from a tech standpoint, where is Appen leading? And are there any areas that need further improvement? I think I'll hand this over to our experts. First of all, to Sujatha, given you know a lot about the competition. And then maybe Wilson can add some comments as well.

Sujatha Sagiraju

executive
#31

Sure. So when I think about competition, the knowledge comes from working with our customers and also from the external information we look at on competitors' websites and from doing feature comparisons. We can say a few things. One, we are the only one that does all the data stages of the AI life cycle. Other players that we see here do 1 or 2 stages. We are the only ones that do data sourcing, data preparation and model evaluation. The second key factor is that we are the only one that does all data modalities. You will see some companies that are specialized in [ ABCV ], and some companies that are specialized in audio and speech. But again, we are the only one that does all stages of the data cycle and all the data modalities. That's a huge competitive advantage for us. The third thing is the combination of our expertise, crowd and platform. You'll see pure platform players, or you will see BPOs who have the crowd but don't have the software. The combination of crowd and platform, along with our expertise, is, I think, a key differentiating factor for us.

Mark Brayan

executive
#32

Wilson?

Wilson Pang

executive
#33

Yes. I do want to add to that, really from the technology perspective. When we talk about all those different products or tools across this whole life cycle, take data collection: it sounds very trivial or easy, but you need a powerful mobile app. You need other data collection software. We are probably the only provider that has all of these data collection products. Come to data labeling: data labeling has many players. Some players focus on computer vision. Some players focus on content relevance, but we are probably the only one-stop shop. You can do audio. You can do computer vision. You can do [indiscernible]. You can do everything. And we also have low-code, no-code software, so you can build your own tools. That's what it's enabling, right? Then model evaluation. We are also the only player that has been doing this for many, many years. There's a lot of expertise. That's one part. The second part I also want to emphasize is that people think this kind of data labeling, data collection and model evaluation is just a software business, right? It's not. There's operational expertise. There's also crowd management. Those components are super important. But meanwhile, building all the AI, software and products that help us manage this crowd and productize all that operational expertise is also critical. I think we have a lot of advantage there, for sure.

Mark Brayan

executive
#34

Yes, thank you. And another thing that occurred to me while you were both providing those answers is that, as a public company, we're highly accountable to many groups of stakeholders, including everybody on the call today. And our customers like that. They like that we take care around things like data privacy. They like that we take care around ESG compliance, et cetera, and looking after our crowd. So these are nonproduct items, but they are also things that make us stand apart from our competitors. There are 2 more questions that we'll take. One for you. Can Sujatha talk to any changes that Appen has made in terms of product strategy since she joined?

Sujatha Sagiraju

executive
#35

Absolutely. Let me pick my favorite one. When I joined Appen, we positioned ourselves as a training data company. And I honestly believe we were underselling ourselves when we did that. What we are empowering customers to do is much, much bigger. It's the whole data for the AI life cycle. We are the one-stop shop for the data for the AI life cycle, and that's one of the big changes that we have made. That's one part. The second part, in terms of our product strategy, is building capabilities that customers can use in an easy, self-serve way. That's a huge change, because if you think about it, we have the expertise. We have been doing this for 25 years. We know how to do this for customers. Now we are building those capabilities so that customers can do it themselves. And I'm super proud of the capabilities that we are building there.

Mark Brayan

executive
#36

Yes, it's very exciting. And Ryan used the expression before that we're democratizing training data, and that's the self-serve aspect of that. So very exciting. The final question, and I know we have run over time, but because of the technical problems before, we're happy to do so. But the final question is just following up on the product front. Can you also talk to any areas that Appen lags versus the competition and where we need to invest? Wilson, first of all.

Wilson Pang

executive
#37

Sure. The area where I think we are a little bit behind is the products or tools needed to win the autonomous vehicle market. We invested in those products, and we had a huge success in China. We are probably the #1 training data provider for autonomous vehicles in China. But in the global market, we have just started, right? I think as we win more projects and more customers there, we can enhance our autonomous vehicle product suite. That's the area I see, and it's an area where we can improve and grow.

Mark Brayan

executive
#38

Absolutely. Sujatha, are there any other comments, areas we lag or areas you'd like to invest in?

Sujatha Sagiraju

executive
#39

I want to answer more of the second part because, again, as a product person, I look at where we are right now, where our customers are and where our competitors are. And of course, we'll always have a long list of things that we want to do. But in terms of where we want to go, the vision is to build products that easily enable customers to do the entire data for the AI life cycle, highly automated and in a self-serve fashion, for all data modalities and all stages of the AI life cycle. That's what we'll be building.

Mark Brayan

executive
#40

And in near real time.

Sujatha Sagiraju

executive
#41

Yes, in near real time. Yes.

Mark Brayan

executive
#42

Thanks very much. Well, thank you, Wilson. Thank you, Sujatha. Thanks to everybody for joining. Thanks so much for the questions. It's good to talk to you all. Again, apologies for the technical challenges at the outset, but we hope that this was a very useful session for all of you. And we're looking forward to the next time that we all meet. Thank you, and good afternoon, good evening, good morning, wherever you may be. Thank you.