FactSet Research Systems Inc. (FDS) Earnings Call Transcript & Summary
March 25, 2020
Earnings Call Speaker Segments
Bijan Beheshti;Vice President, Director of Quantitative Strategy
Good morning, everyone. Thank you for joining us on this webcast, Uncovering Investment Opportunities with DataRobot on FactSet. My name is Bijan Beheshti. I lead our Quantitative Analytics team at FactSet, and I'm pleased to be joined by Rob Hegarty, General Manager of Financial Markets at DataRobot. We have a lot to get through today. But first, I just want to get through a couple of items here. Today's event will last approximately 30 minutes. The audio is going to be broadcast through your speakers. Throughout the presentation, if you have any questions, please feel free to submit them via the questions window of the control panel. We will address these during the Q&A at the end. If you have any technical questions, please submit them in the same way. We'll get to them right away. After the presentation, you will be receiving a recording via e-mail. So in just a minute, I'm going to pass this off to Rob to give you all an overview of DataRobot, talk a little bit about the partnership between FactSet and DataRobot and discuss how DataRobot is used in financial markets today. Rob will then hand it back to me. I'm going to go through an actual use case. I'll provide you guys with a full-on demo and walk through building a model, then putting a model into production, and then we'll wrap up with Q&A. So with that, I'll pass it over to Rob.
Rob Hegarty;GM of Financial Markets and Fintech
Great. Thanks so much, Bijan. Good morning, everyone. Good afternoon. I'm happy to have you here. So I'm just going to take a few minutes here upfront, take about 5 minutes and talk a little bit about DataRobot, the platform, the partnership with FactSet and just how we're helping asset managers and financial markets firms globally make better predictions. So very quickly, DataRobot on FactSet helps you accelerate your investment decisions by combining all of the best things that you know about FactSet with the world's leading automated machine learning and AI enablement platform, DataRobot. And that allows you to go from data, using all of the unique content data sets that FactSet has, along with the core Quant applications. We've combined that now with AI and machine learning capabilities from DataRobot, all with an open provider, to enable you to have a powerful workflow solution all the way from idea to execution. I wanted to talk a little bit about the importance of the time to build models, and it's never been more important, given the current environment that we're in. With this type of volatile market environment, with changes happening so quickly, it's important to be able to build your models as quickly as possible. And it's now possible to build more models with less data and still get those valuable insights, just in a much more accelerated fashion. DataRobot draws from hundreds of open source and proprietary algorithms. You'll see some of those at work today with Bijan's demo. And that improves the likelihood that many of those algorithms will capture the current conditions, as opposed to relying on just a few algorithms that you may be familiar with. And then lastly, the features that drive the existing Quant models in the current scenario, the current market environment that we're in, are completely different. And a lot of those models have not been exposed to these conditions.
That is why it's important to be able to draw on as many models as possible when building out predictive capabilities. So real quickly, a little bit about DataRobot, for those of you that don't know us, we serve the Global 2000. We are a cross-industry platform. You'll see in a second across every industry globally. We've returned on investment. We've delivered ROI of over $10 billion to our customers. We've been around since 2012, an advantage of being a relatively new company when we talk about this, but certainly a veteran in the AI space is that we are AI native and built for the cloud with a singular focus on AI since our beginnings. Also important in this kind of environment, we are well funded. We're backed by $430 million in funding with a good -- great set of investors that are -- understand our marketplace. And we also have over 1,200 employees globally. We operate in 35 different countries around the globe. We are trusted by customers across all industries. As I said, financial services is our largest industry. And we also have 8 of the top 10 global financial institutions as clients of DataRobot and 3 of the top 5 global asset managers as well. So a very good cross-section of industries with a lot of depth in financial services. Very quickly, a little bit about DataRobot and the platform and how it operates. The goal for us is to be able to take you from data to value very quickly through an end-to-end artificial intelligence platform. So that's everything from a data catalog to be able to understand all of the data sets that you can pull in. You'll see some of those at work in Bijan's demo, where we're pulling in numerous FactSet data sets. We do data preparation as part of that, feature engineering, so we automatically generate an engineer features based off the data sets that are -- you pull into our platform. At the core of our platform is automated machine learning, which is the model creation and validation process. 
And then we also take care of things in post building -- post model building processing, which is when you want to go to deploy these models. We have model ops and deployment, model risk management, monitoring and so forth. So everything it takes to actually deploy, manage, monitor and maintain those models once you get those. In terms of users, who are the users of the DataRobot platform? In the investment management world, it's everything from quants to data scientists to portfolio managers to research analysts. We also have, outside of financial services and investment management, data engineers, software developers, business analysts and so forth. So it's a platform that's usable by highly quantitative users as well as business users, non-data scientists. And we sit all of that on top of enterprise-grade open source models. So you'll see in the demo that we're pulling from a number of different models that we use, whether that's Vowpal Wabbit, Python, R, TensorFlow, XGBoost. You have hundreds of open source models at your disposal. And the point of the platform, and you'll see how this works in real-time in just a minute, is just to be able to add data and hit start. My last slide is really to just talk about how we add value in every area of asset management. What you're going to see today will very much focus on the front office of an investment manager, so portfolio management and research. There are a number of different use cases within an asset manager. You'll see a few of them here, and you'll get a good look at these use cases that Bijan will go through. But we also have use cases deployed and in use at asset managers outside of the front office, whether that's trading, operations, compliance and risk. We even go into technology, doing things like cyber threat detection, as well as out to the client experience, whether that's customer relationship management or new customer acquisition.
So with that, I'm actually going to turn this back over to Bijan, and he's going to take it from here and give you a demo of the platform.
Bijan Beheshti;Vice President, Director of Quantitative Strategy
Great. Thank you, Rob. So just give me one moment here, I'm going to go ahead and share my screen. Okay. Great. Let's go ahead and get started here. So as Rob mentioned, DataRobot is an automated machine learning platform. And what that really means is we're going to be leveraging this tool to structure supervised machine learning problems. It's really a natural fit for FactSet because we have so much content on our platform, and ultimately having DataRobot hosted on FactSet will allow you to create machine learning problems, basically take any sort of data that's on this platform and ask questions, right? We could ask questions like what is the liquidity of the company going to be in the future, what is the probability that a bond is going to default. As long as we have the historical data available, we can set that data as a target variable, and we can set other types of data items as features that we want to use to make predictions. And then DataRobot will create these machine learning models for us to be able to generate those predictions going forward. So what I'm going to do today is, I'm going to focus on a very short-term example. Generally, we look at a lot of history, we look at longer-term models, and we try to create these predictions, look for stable market environments and so forth. But given everything that's going on in the market today, given the very, very high levels of volatility, I wanted to create a model that's just focused on basically year-to-date information, looking at weekly data to predict forward 1 week volatility for various securities. And so I'm going to do this for the Russell 3000 Index. So why don't we just jump right in, and I'll go ahead and start setting up this model. So as you can see, I have my FactSet workstation up here. I have a bunch of tabs. This is just how I've organized the FactSet workstation. If I go over to the DataRobot tab, there's going to be a couple of things in here for me to work with.
So I'm going to start with this data integration dashboard. And this is really where I'm going to be able to pull in all the different content that I want to use to fuel my machine learning model. So let me just quickly go ahead and set this up. And we'll call this Russell 3000 weekly volatility model, let's call it weekly volatility ML model. And then I can give it a description: forecast security-level volatility. And so the first thing I'm going to do is I'm going to set up a universe. Now, for those of you on the call who have never used FactSet so far or aren't familiar with screening functions, all I'm doing is using a set of FactSet formulas to define my universe and to define all my different parameters. These formulas can be looked up very easily. I'll show you how you can do that if you're not sure where to start. But basically, what I did is I used a function called FG FactSet global constituents. I just plug in the identifier of the index that I want as well as the date. 0 is a dynamic date, it goes back through time when I'm back-testing. So this is going to give me all the constituents for the Russell 3000 Index. Next, I'm going to set my start date and end date for my model. So let's go ahead and do a year-to-date model. So we'll start with December 31. And we're going all the way up until March 13, since we're doing forward 1 week volatility predictions. We want to go back to the point where we at least have 1 full week of data for an out-of-sample validation. We're going to do lots of out-of-sample validations throughout this model. But we want to end it at the point where there's still 1 week left for 1 more validation period. I'm going to rebalance this every week. And FactSet has pretty much every trading calendar available. Because we're using U.S. securities, I'm just going to go down and select the United States. So now I can basically add any sort of variables that I want here. So I'll just give some quick examples.
Let's say I just want to see the name of every company within the Russell 3000. I'm going to type in a formula, PROPER_NAME, and I'm going to get all the names of all the -- sorry, Russell 3000 companies down here at the bottom. I can go ahead and continue to add features. Let's say that I want to add the price. And let's say, I don't know what the formula is for price. When I just start typing price and go to formulas, you can see all the different formulas and have the word price in them. I'm just going to pick the first one here. And this is going to show me all the different arguments that I could potentially add, maybe if I want to do different currency or change rate. So I'm just going to add in a price, and price is going to drop in there. If I don't know what I want, I can go through and browse for different data items. You can see all the different libraries that FactSet has available here. And I'm not going to spend too much time going through this, I'll just give one quick example. If I want, maybe fundamental data, I can come in here. I can pick any sort of metric. I think there's over 1,000 fundamental items just in this one library. Let's say I want sales. I can pick the report basis, I can pick the currency. And as I'm doing this, you can see the formula preview down at the bottom. It's going to build the formula for me. So maybe I just want fully reported data. All my settings have been selected. I can go ahead and hit okay and this formula will be dropped into this grid, and we can see the results. So I'm not going to spend a bunch of time looking up formulas here. I actually have a number of them on my other monitor in excel. So what I'm actually going to do is I'm just going to copy paste them here. So I'm going to do a control V. And what I've actually done is I've pulled in a bunch of different factors from our quant factor library. 
So this is actually a new database that -- I'm showing you the in-house version right now, but within a couple of weeks, this is going to go live. There's over 2,000 factors in this database across over 20 factor families. And I'll talk a little bit more about what we're looking at in just a second. But what I did was I threw in a ton of factors in here, about 75 different factors, and I've also thrown in a target variable. So that's 1 week forward volatility. When I throw all my metrics into this field, I can actually select any one of them to use as the target, and that's going to be the metric that we're going to -- want to predict. Now -- and just to give you a sense of the source of data items that we're looking at, I'm going to drag over an Excel spreadsheet over here, just so you can see what type of metrics are in our model. So I've kind of categorized it here based on types of data. So we're looking at different reference data. So our target variable, again, is the future 1 week volatility for every asset. We have some basic asset level data, like how many days until earnings report, what is the actual business description, some size metrics, country concentration, so how diversified is a company in terms of country exposure, different industry metrics, some fundamental stuff like value metrics, growth, efficiency, profitability, some quality scores and solvency scores. A lot of pricing and market-based information. So we'll be looking at different momentum indicators, volatility indicators, market sensitivity indicators, liquidity, technical indicators as well as exogenous variables. So these exogenous variables will share the same value for every security on a given day. This will be things like gold and oil prices, the level of -- and the actual level of pairwise correlations across the entire universe. So how correlated are all these securities' returns relative to one another. And a few alternative data sources as well. 
So we'll be looking at some sentiment data from analyst consensus information, we'll have some corporate governance data there, some insider transactions as well as some crowding information using our ownership database. So we can see what percentage of a stock is held by ETFs or hedge funds, active managers or passive managers. So as we kind of put this all in here, what you'll see is a preview down at the bottom. Let me go ahead and maximize this. And so the idea is pretty simple here. We have all of our different companies in this universe. We can see what the 1 week forward volatility is for every single one of these companies. And again, this is as of the end date of our model, which is going to be March 13, 2020. And we're going to use all of these other metrics over here on the right to basically predict what the 1 week forward volatility is. Now you'll notice that some of these metrics are continuous variables. Some of them might be binary variables. And some of them are actually just open form text. If you go all the way over to the right, you can see that business description. Some of these are 2,000, 3,000 characters, maybe even more of text describing what the business does. And some of these are actually categorical variables like sector or economy, where we'll have a certain number of categories. The type of data item really doesn't matter. We can throw in any sort of data into the system and use these as features. And the same goes for the target variable. The target variable can be a continuous variable, it can be a binary event, could be a probability of something happening, it could be a category that we're trying to predict. It doesn't really matter. The system is very flexible in that regard. So from here, now that I have my data ready, I'm going to jump into my next tab here, which is the actual DataRobot application. And so what I'm going to do here is first set up my target variable. So our target again is that 1 week forward volatility. 
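The transcript does not say how the 1 week forward volatility target is calculated. As a minimal sketch, assuming it is the sample standard deviation of the next five daily returns (the function name, the `horizon` parameter and that definition are all illustrative assumptions, not FactSet's formula), the target could be built like this:

```python
import math


def forward_volatility(daily_returns, as_of, horizon=5):
    """Sample standard deviation of the `horizon` daily returns after index `as_of`.

    Assumption: "1 week forward volatility" means the standard deviation of the
    next five daily returns. The webcast does not give the exact definition.
    """
    window = daily_returns[as_of + 1 : as_of + 1 + horizon]
    if len(window) < horizon:
        return None  # the future week is not fully observed, so the target is unknown
    mean = sum(window) / horizon
    variance = sum((r - mean) ** 2 for r in window) / (horizon - 1)
    return math.sqrt(variance)


# Illustrative returns only, not market data.
returns = [0.0, 0.01, -0.01, 0.01, -0.01, 0.0]
target = forward_volatility(returns, as_of=0)
```

The `None` branch mirrors the point made earlier about ending the model on March 13: a row can only carry a target once a full week of later data exists.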
We very quickly get a nice distribution, we get some insights from the data. So there's not a lot of missing data. It says about 1% of the data is missing, but good to know. What DataRobot does automatically here is it starts analyzing this target variable. So we can see the distribution. We can see the number of missing values. We can see the type of metric that we're looking at, which is a continuous variable. And based on all of that, DataRobot is going to suggest an optimization metric for us to use. And in this case, we're using the root mean squared error. And I'll walk through a few options in just a moment of other types of metrics that we could use, depending on what we want our final model to look like. If I go down to the bottom here, you can actually see all of our data. It's been parsed into the system. So you can see all these different features that we have. So if I want to open any one of these up, say I want to look at operating cash flow yield, I can see the full distribution for all the time periods in my data set, and I can make this a little bit more granular. I can get some specifics on the number of unique values, number of missing values, the mean, standard deviation and so forth. I can also select which features I want to use in a feature list to make my predictions. So if I only want to run predictions using any of these 5 features, I can create my own custom feature list and then use that to build a model. What I'm going to do is, I'm actually going to select a default feature list. I'm going to select informative features, which is something that DataRobot builds for us automatically. It looks at all of our features. It looks for features that might not be useful, if there's maybe too few values or if there's target leakage, too high a correlation between the feature and the target variable. And it's going to filter those out for us and leave us with the features that make the most sense for our model.
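The kind of screening described here, dropping features with too few distinct values or a suspiciously high correlation with the target, can be sketched in a few lines. This is only an illustration of the idea for numeric features: the thresholds, the use of Pearson correlation and the function names are assumptions, and DataRobot's actual rules for its informative features list are not given in the webcast.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def informative_features(features, target, min_unique=2, leakage_corr=0.99):
    """Keep numeric features that vary and are not near-copies of the target.

    Both thresholds are hypothetical defaults chosen for this sketch.
    """
    kept = []
    for name, values in features.items():
        if len(set(values)) < min_unique:
            continue  # too few distinct values to be useful
        if abs(pearson(values, target)) >= leakage_corr:
            continue  # almost perfectly tracks the target: possible leakage
        kept.append(name)
    return kept


# Toy data: one constant column, one leaked column, one ordinary factor.
toy_target = [1.0, 2.0, 3.0, 4.0]
toy_features = {
    "constant": [5.0, 5.0, 5.0, 5.0],
    "leak": [2.0, 4.0, 6.0, 8.0],
    "momentum": [1.0, -1.0, 2.0, 0.0],
}
kept_names = informative_features(toy_features, toy_target)
```

A linear correlation check will miss nonlinear leakage, so a real screen would likely use something stronger; the sketch only shows the shape of the filter.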
So I'm going to stick with informative features, and I'm actually going to set up our machine learning model now. So many of you may know machine learning to function like this, right? We take our data and we train on a subset of that data, we then take a different portion of that data set, and we validate our results. And this is called out-of-sample validation. Because we're dealing with time series data, financial data that changes through time, we're going to have to set up time-aware modeling. And I'll really show you what that looks like in just a couple of clicks here. So basically, what we're doing is instead of just randomly partitioning our data, we're going to sort it by time period. And you can see here in blue, we're going to use this chunk of data to train our data, train our model. And in green, we're going to validate our model. And so I can add a few different back-tests in there. So let me go ahead and have 4 different back-tests. And I'm going to create a walk forward model. So I'm basically taking about 5 weeks of information and I'm training my model, and then I'm going to validate on 1 week out-of-sample. Again, this is a very short-term model. This was intentional. We're just trying to focus on this period of high volatility to see if we can get a stable model that can create predictions for us. So you can see that we're training, validating, training, validating, training, validating and so forth, all the way until we get to this final partition, which is our holdout. So our holdout is going to be locked away until the very end, and we can see how we performed in that most recent out-of-sample validation period. So really, what we're looking at is this red bar is going to be last week's data that we're validating, and we have 2 weeks ago, 3 weeks ago, 4 weeks ago and 5 weeks ago. There's a lot of additional options here. We just don't have enough time to go through all of them. I want to point one out. 
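The walk-forward layout described above (about 5 weeks of training, 1 week of out-of-sample validation, 4 back-tests and a final holdout) can be sketched as a simple index calculation over an ordered list of weekly periods. The function name, the fold labels and the dictionary layout are assumptions made for illustration; the numbering of back-tests in the actual product may differ.

```python
def walk_forward_splits(periods, train_size=5, n_backtests=4):
    """Build time-ordered train/validate folds; the newest fold is the holdout.

    `periods` must be sorted from oldest to newest. Each fold trains on
    `train_size` consecutive periods and validates on the period right after.
    """
    n_folds = n_backtests + 1  # the back-tests plus the final holdout
    needed = train_size + n_folds
    if len(periods) < needed:
        raise ValueError(f"need at least {needed} periods, got {len(periods)}")
    recent = periods[-needed:]
    folds = []
    for k in range(n_folds):
        is_holdout = k == n_folds - 1
        folds.append(
            {
                # Hypothetical labels: backtest_1 is the most recent back-test.
                "name": "holdout" if is_holdout else f"backtest_{n_backtests - k}",
                "train": recent[k : k + train_size],
                "validate": recent[k + train_size],
            }
        )
    return folds


# Ten weekly periods, numbered 1 (oldest) to 10 (most recent).
folds = walk_forward_splits(list(range(1, 11)))
for fold in folds:
    print(fold["name"], fold["train"], "->", fold["validate"])
```

The property that matters for time-aware modeling is that every training period in a fold is earlier than that fold's validation period, so no future information leaks into training.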
Depending on how you want your model to perform, you can come in here and select a different optimization metric. So if you want maybe the most explainability from your model, you can pick R-Squared. If you want your model to maybe rank your predictions more accurately than anything else, you might select Gini Norm. We're going to go with this recommended Root Mean Squared Error to just minimize the error of our final models. And I'm going to go ahead and just hit this big start button. And that's pretty much it. Now we're ready to start. So what's going on here is DataRobot is now analyzing all of our data. So we are basically setting up those partitions that we just discussed a moment ago, we're going to be characterizing the target variable. We're going to be characterizing all of our features. And this is really where the AI starts to kick in. DataRobot is going to be analyzing all the different features in our model. It's going to be looking at types of data, the distribution of that data, number of missing values. And it's going to build a model repository that's bespoke to our problem. So we want to predict 1 week forward volatility using all of these different features. We're going to get a list of different models with different feature engineering methods, data transformation methods, different parameter settings based off of our underlying data set. And so the first thing that's happening here is our data was just reordered. And you can kind of see here all of our data has now shifted in order of importance. So this is a univariate metric of nonlinear importance. And what that means is we're looking at a model-agnostic view and just trying to see which one of these metrics has the -- you can almost think of it like a correlation that incorporates nonlinear relationships to our target variable.
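To make the trade-off between the three optimization metrics concrete, here is a small sketch using their common textbook definitions. Whether DataRobot computes Gini Norm exactly this way is an assumption; the toy numbers are illustrative, not results from the demo.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: penalizes the size of the errors."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r_squared(actual, predicted):
    """Share of the target's variance explained by the predictions."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def _gini(actual, scores):
    """Unnormalized Gini: cumulative actuals when sorted by descending score."""
    n = len(actual)
    order = sorted(range(n), key=lambda i: (-scores[i], i))
    total = sum(actual)
    running = 0.0
    area = 0.0
    for i in order:
        running += actual[i]
        area += running
    return (area / total - (n + 1) / 2) / n


def gini_norm(actual, predicted):
    """Normalized Gini: 1.0 means the predictions rank the actuals perfectly."""
    return _gini(actual, predicted) / _gini(actual, actual)


# Predictions that rank every security correctly but are half the true level.
actual_vol = [0.10, 0.20, 0.30, 0.40]
predicted_vol = [0.05, 0.10, 0.15, 0.20]
print(rmse(actual_vol, predicted_vol))
print(r_squared(actual_vol, predicted_vol))
print(gini_norm(actual_vol, predicted_vol))
```

In this toy case the ranking metric is perfect while the error-based metrics are poor, which is the distinction drawn above: Gini Norm rewards getting the order right, RMSE rewards getting the level right.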
So if I were to open any one of these up, if I open semivariance up, for example, you can see that as our target variable goes up, so does our exposure to semivariance. So there's almost a linear relationship there, and we can basically observe that model agnostic univariate relationship for any one of these metrics. For momentum that's more nonlinear. You can see it's kind of U shaped. But at the same time, what's happening, if you look over on the right, is we're actually running machine learning models. We step away from the data for a second, you can actually see that there's 2 models running right now, and they're actually running in their own environments. So these are running in individual docker containers. So we have a Eureqa generalized additive model running here and the decision tree regressor. And ultimately, these models are going to be running through Python, through R, through Julia, through a number of different languages and environments, and they're going to be coming from a number of different sources, and they'll have different blueprints for each model. And I'll talk about what that means in just a moment. I'm running 2 models in parallel. What you notice is there's a number of workers in the top right. So I can actually scale this. If I want to run more machine learning models in parallel, all of this is running on FactSet's AWS server. So we can just scale this up. And let's say, I want to run 20 of these models in parallel. I can go ahead and just up this number to 20. And now you can see that I have 20 different machine learning models running here at the same time. Everything from neural networks like TensorFlow, XGBoost models, we even had a basic linear regression at the top, I don't know if you saw that, but I scrolled down. And as these models are running and they're completing, a model leaderboard is being generated. So I'm going to go ahead and click on that second tab at the top. 
And based on our Root Mean Squared Error, you can see which models are at the top and which models are at the bottom. And as these models continue to execute, you'll be able to see various models get displaced and new models surface up to the top. So in interest of time, I'm actually going to jump to a completed model. By the time that we're done speaking, this whole model should probably be complete itself. But I'm going to go ahead and bring over another screen here. It's essentially the same data. But it's run for a model that's already been completed. So I'm going to go over here into my model view. And you can see that 87 different model blueprints that have been completed. We just don't have enough time today to go through a number of them and to kind of talk about all the different reasons for selecting one model versus another. So what I'm actually going to do is I'm going to hone in on a single model here, this Light Gradient Boosting on ElasticNet Predictions model. And in just a couple of minutes, I want to explain some of the things that we can do here. So first off, we can see the blueprint. So again, this blueprint is going to be unique to this model, and there's 87 of these blueprints that will run in this particular project. Now as I look through this blueprint, I can see the different types of data items. I can see the different sort of feature engineering that's going on, all the way up to the algorithm. And every step of the way, we have documentation. So if I wanted to maybe click on a link, I can go ahead and pop this documentation open and see exactly what is a Light Gradient Boosting trees regressor, when would we use it, what are the strengths, what are the weaknesses, different parameter settings in here all the way down to academic references. I wanted to go ahead and look at that. So if I want to go ahead and analyze this model, and we'll do this again very quickly, I'd love to spend an hour going through the details of this model. 
But given the short time frame that we have, we'll just try to go through this in a few minutes. I'm just going to show you the structure real quick. So you have -- we have 4 different back-tests here. So again, this back-test ended 6 weeks ago, this one ended 5 weeks ago, 4 weeks ago, 3 weeks and 2 weeks ago. And we're validating out-of-sample in that green bar for each one of these periods. And we can actually see what the Root Mean Squared Error value is for each one of them, so we can see that stability through time. The first place that I'm going to start is actually a lift chart. And so this is -- let me actually go to a first back-test. This is a great place to start. To make this a little bit more granular, so you can see the accuracy of our model, we're essentially going to break it up into buckets. So in this case, I'm taking 30 different fractiles actually. And I'm sorting my predictions for the Russell 3000 securities from our lowest volatility predictions to our highest volatility predictions. And you can see that in general, we underestimated volatility, but directionally, it was accurate for this time period. Again, this was our first back-test, so this was a couple of weeks ago that we're validating. If we look at our holdout, our model caught up. And so now that band is much tighter. And you can see our predictions. This is for last week's results for volatility. So the predictions we made 2 weeks ago for last week are the volatility predictions in blue, with the actual volatility numbers here in this orange line. You might be wondering, okay, this looks a lot tighter, why is the Root Mean Squared Error for the holdout period almost double the first back-test? Well, look at the scale. 2 weeks ago, our average volatility was a lot lower than it was last week. And so that Root Mean Squared Error is just going to be higher based on that. But it does look like we are making more accurate predictions in general, it looks like a better lift chart.
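The lift chart described above can be sketched as follows: sort the securities by predicted volatility, cut them into equal-sized fractiles, and compare the average prediction with the average realized value in each bucket. The function name and the tuple layout are assumptions for illustration, and the sample numbers are made up.

```python
def lift_chart(actual, predicted, n_bins=30):
    """Average predicted vs. actual value per fractile, sorted by prediction.

    Returns a list of (bin_number, mean_predicted, mean_actual) tuples, from
    the lowest-prediction bucket to the highest. Empty buckets are skipped.
    """
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    n = len(order)
    rows = []
    for b in range(n_bins):
        members = order[b * n // n_bins : (b + 1) * n // n_bins]
        if not members:
            continue  # fewer observations than bins
        mean_pred = sum(predicted[i] for i in members) / len(members)
        mean_act = sum(actual[i] for i in members) / len(members)
        rows.append((b + 1, mean_pred, mean_act))
    return rows


# Toy case: predictions rank correctly but sit at half the realized level.
sample_predicted = [0.4, 0.1, 0.3, 0.2]
sample_actual = [0.8, 0.2, 0.6, 0.4]
rows = lift_chart(sample_actual, sample_predicted, n_bins=2)
```

In the toy case the actual line sits above the predicted line in every bucket while still rising with it, which is the "underestimated, but directionally accurate" pattern. Because RMSE is measured in the units of the target, the same relative miss produces a larger RMSE in a week when volatility itself is higher.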
There's a lot of different ways to analyze this. We can look at stability through those different back-test periods, you can see how that's kind of increased, its volatility has increased and the Root Mean Squared Error. We can understand this model better. So a big criticism, I think of machine learning in general is that these models are black box. We don't really know what's going into them. That's not the case here. For every single one of these models, we can identify which features were most important in making our predictions. What's interesting here is that you can see that all of our exogenous variables are in the top 10. So VIX, gold, pairwise correlations, oil these sort of macro top level type of metrics are influencing our model a lot more in this environment. At the very top, we have a technical indicator, average true range, so that's a volatility measure that is based on the actual range of security prices. So we can see the relative importance of all these different features. But what's even more interesting, I think, I'm just going to spend a couple of minutes on this. I know we'll go over the half hour mark here, but I do think this is pretty interesting to look at. Because it gives us a real sense of what our factors are doing within this model. How are these different features resulting in our predictions, looking the way that they do. And so I can pick any one of our features. So average true range, you can see as my exposure to average true range on the x-axis goes up, my model predictions also go up. You can think of this as a marginal impact. So if a stock had a higher average true range, we're going to predict a higher future volatility for that stock. If I look at something like days to report, if a stock has been -- is going to report the earnings within the next 7 days or so, you can get a much higher prediction for forward volatility with a sharp drop. And then a fairly linear decline as we go all the way up to 90 days. 
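The "marginal impact" view described above resembles what is usually called partial dependence: hold everything else as observed, force one feature to a given value for every row, and average the model's predictions. The sketch below assumes that interpretation. The `toy_model` function, its coefficients and the feature names are hypothetical stand-ins, not the model from the demo.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is set to each grid value for all rows."""
    curve = []
    for value in grid:
        predictions = [predict({**row, feature: value}) for row in rows]
        curve.append((value, sum(predictions) / len(predictions)))
    return curve


def toy_model(row):
    """Hypothetical stand-in model: volatility rises with average true range
    and jumps when earnings are due within 7 days."""
    earnings_bump = 0.10 if row["days_to_report"] <= 7 else 0.0
    return 0.02 * row["avg_true_range"] + earnings_bump


sample_rows = [
    {"avg_true_range": 1.0, "days_to_report": 3},
    {"avg_true_range": 2.0, "days_to_report": 40},
]
curve = partial_dependence(toy_model, sample_rows, "days_to_report", grid=[5, 30])
```

Plotting the grid values on the x-axis against the averaged predictions gives a curve of the kind shown in the demo for average true range and days to report.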
I can pretty much search for anything here. I can look for momentum, and I can see this nonlinear relationship. So higher predictions for very low momentum, higher predictions of volatility for very high momentum. If I want to look at maybe hedge fund ownership, so as hedge fund ownership increases, our model is going to make a lower prediction of volatility for assets. As ETF ownership increases, our model is going to predict higher volatility for an asset. If I want to look at our fundamental metrics like the Altman Z-Score. If you're not familiar with this metric, it's a pretty interesting predictor of potential bankruptcy. What I found really interesting is that if you look at the theory behind this, it's written all over different articles and papers that once you get to value of 2.99, that means that your company is pretty much safe from bankruptcy risk. And at that point is when the model actually bottoms out their volatility predictions, which is I thought was kind of interesting. And the last thing I'll show here is just a country concentration. So I thought this was quite interesting. Two weeks ago, our model -- if you have an exposure of 1, which means you're 100% exposed to a single country -- and this is essentially a Herfindahl index, so the lower value means you're more diversified across country exposures. And since these are U.S. companies, so 1 basically means you're pure U.S. play. And so there's a sharp spike in volatility here if you are a pure U.S. company in terms of predictions from our model. And if you kind of go back there -- to 6 weeks ago, you can see that U.S. was actually lower volatility. If I go to my third back-test, which was 5 weeks ago, you can see the same trends. More U.S. exposure means our model is predicting less volatility. Once you get to 2 weeks ago, kind of the same trends here, once you get to 1 week -- I'm sorry, 2 weeks ago here, you get a sharp increase. 
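For reference, a Herfindahl index is the sum of squared shares, so a value of 1 means all exposure sits in a single country and lower values mean more diversification. The sketch below assumes exposure is measured by revenue per country; the transcript does not say which exposure measure FactSet uses, and the function name is illustrative.

```python
def herfindahl(exposure_by_country):
    """Sum of squared exposure shares; 1.0 means a single-country company."""
    total = sum(exposure_by_country.values())
    if total <= 0:
        raise ValueError("total exposure must be positive")
    return sum((value / total) ** 2 for value in exposure_by_country.values())


# Illustrative exposures only (assumed here to be revenue by country).
pure_us = herfindahl({"US": 100.0})
two_way = herfindahl({"US": 50.0, "DE": 50.0})
four_way = herfindahl({"US": 25.0, "DE": 25.0, "JP": 25.0, "GB": 25.0})
```

With n equally weighted countries the index equals 1/n, which is why a pure U.S. play scores 1 and more diversified companies score closer to 0.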
And then last week, our holdout -- our model is adapting through time, and our model is a lot more linear here. So we still have more risk from the U.S., but it's not as sharp of an increase when you get to that 1 value. So the last thing I'm just going to show here -- there's a lot I want to go through, and we just, again, don't have the time. But I want to show how this kind of ties back into FactSet. So if I go back to my data integration dashboard, you can actually see a list of different models that I have set up. And down here, this is the model that we just created -- sorry, let me go ahead and find the right one. And so what we've actually done is we've deployed this model. And so we get a formula here that can be placed anywhere within FactSet. And just in 1 minute real quick, I'm just going to show you a couple of examples of that. If I jump into the Quant tab here, you can see this formula in a screen I've created. And when I go and add it to this report, you can get our volatility forecast there. So this is essentially -- this function is taking all the underlying formulas that we had plugged into our model, and it's refreshing them. Those 75 factors are getting refreshed. They're getting sent to the DataRobot engine. They're getting run through the deployed model that we were just looking at. So all the feature engineering is happening, that entire model blueprint is happening behind the scenes. And we're getting a prediction that comes back to us. And we can see that prediction for next week's volatility using today's data, the most recent data for every company in a particular universe. In this case, I'm looking at the S&P 500, a different universe than what I trained my model on. So I can go ahead and sort this. I can see which securities have the lowest volatilities and some common characteristics across them. They have very low average true range. The earnings yield values aren't too high, they're not too low.
If I kind of reverse this, I can see my highest volatility prediction names. So Apache, Royal Caribbean, Norwegian Cruise Line. They have very high true range values, earnings yield is either very high or very low, you can just see extreme values there. Year-to-date return is generally very low. All those different variables are going to come into play in our prediction. I'm just highlighting a few different variables here. But again, the hedge fund ownership, whether a company is founder-led or not, all those different factors that we used in some way are going to take -- have some impact on this final prediction value. Just another quick way to look at this. If you guys haven't seen this, there's a new tab, a coronavirus tab, in the FactSet workstation with some stats on the coronavirus. It can be helpful to kind of navigate through the market under these vulnerable times. If we look at a real-time performance view of the S&P 500, what I've done is I took this coronavirus tab and I just plugged in our volatility forecast in here, so we can actually monitor this in real-time for all these securities alongside our existing analysis. And we can do portfolio level analytics here, too. If you look over on the top right, you can actually see the average volatility forecast for the different industries within the S&P. And so we can see which ones are up top, which ones are down low. If I want to dive into any one of them, like consumer discretionary, I can see here that hotels, restaurants and leisure is the top industry within that sector for our volatility estimates from our model, and Internet and direct marketing retail is at the bottom. If I want to maybe dive into our most volatile industry, I can see the stocks at the top: Royal Caribbean, Norwegian Cruise, CCL -- cruises and casinos. At the bottom, you have some restaurants, fast food chains.
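The portfolio-level view described above, with an average volatility forecast per industry ranked from highest to lowest, amounts to a group-and-average step over the security-level forecasts. This is a minimal sketch of that aggregation only, not of how the FactSet workstation computes it. The security names, the field names `sector` and `forecast`, and the values are all made up.

```python
from collections import defaultdict
from statistics import mean


def average_by_group(rows, group_key, value_key):
    """Return (group, average value) pairs sorted from highest to lowest."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    averages = {group: mean(values) for group, values in groups.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Made-up security-level forecasts.
rows = [
    {"name": "Stock A", "sector": "Consumer Discretionary", "forecast": 0.75},
    {"name": "Stock B", "sector": "Consumer Discretionary", "forecast": 0.25},
    {"name": "Stock C", "sector": "Utilities", "forecast": 0.25},
    {"name": "Stock D", "sector": "Utilities", "forecast": 0.125},
]

ranked = average_by_group(rows, "sector", "forecast")
```

The same function can be reused one level down, for example by passing an industry field as `group_key` for the rows of a single sector.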
And if I want to maybe find out what's going on with Royal Caribbean, I can just go ahead and double click on that, get some news and get more information on what's going on with that company. So I know we're about 6 minutes over right now. So I'm going to go ahead and wrap up here. Again, there was just a whole lot that we could show. We just wanted to give you a quick preview here. Hopefully, this was helpful for you guys. At this point, I'm going to go ahead and take a look at the questions that have been submitted, and we'll go through Q&A before wrapping up. So just give us one moment here while we pull up the questions.
Bijan Beheshti;Vice President, Director of Quantitative Strategy
executiveSo we do have a question here from a FactSet user, it looks like, who's asking whether or not DataRobot access through FactSet has a prerequisite of subscribing to our Alpha Testing module. So for those of you who don't know, Alpha Testing is our factor back-testing tool. It's a tool that's used to design different factors and back-test them through time. A subscription to Alpha Testing is not required. In order to use DataRobot through FactSet, you will need access to the FactSet terminal. This will get you access to all the different content sets in FactSet that you'll be able to use within DataRobot. You're not limited to those content sets. You can certainly use your own data, you can use third-party data that's integrated in the system as well. But you don't need access to Alpha Testing in order to use DataRobot through FactSet. And the application itself, I was showing it through the FactSet workstation, you can also access it through a web browser through FactSet web. So the next question here, could you speak briefly as to the tools and features available in the data preparation stage? Sure. So there's a lot of different things that DataRobot has available in their repository that can be added into any type of machine learning model. Now it's going to depend on what type of data we're looking at. I just opened up a random model here from the ones that we just ran, this XGBoost model. And you can see a number of different feature engineering steps here. So again, these steps are going to be related to this particular model, and it's going to have to do with the fact that this model has categorical variables, has numerical variables and has text data, and it has a date variable. And so based on that, what I often see is that missing values will be imputed in either one way or multiple ways.
And if you click on that and get more information, you can actually see how those missing values are changed, and there's a tab devoted to that as well. Pretty much in every model, there's some form of standardization or normalization of the data that's going to happen, especially if we're looking at continuous variables. There's a lot of these different feature engineering methods, and they're getting updated pretty much every single day. So it's hard to kind of go through every single item that could potentially come up here. But what we can certainly do is I can look back to see if we have some reference material on all the different types of data transformations and feature engineering steps that will be taken in any sort of model within DataRobot and certainly provide it after the call. And maybe what I'll do is I'll quickly check in with Rob as well. I don't know, Rob, if you have anything to add to that?
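The two preprocessing steps named in this answer, imputing missing values and standardizing continuous variables, can be illustrated generically. The call does not specify which imputation or scaling methods a given blueprint applies, so median imputation and z-score standardization are assumptions chosen for illustration, and the function names are hypothetical.

```python
from statistics import mean, median, pstdev


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


raw = [1.0, None, 3.0, 5.0]      # one missing observation
filled = impute_median(raw)      # None replaced by the median of 1, 3 and 5
scaled = standardize(filled)
```

In practice the fill value, mean and standard deviation would be estimated on the training data only and then reused when scoring new data, so that the model sees consistently transformed inputs.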
Rob Hegarty;GM of Financial Markets and Fintech
executiveYes. I think the only thing I'd add to that is, again, sort of outside the integration point but in terms of data preparation, we do a lot even ahead of the feature engineering side. In other words, we have a data catalog as part of our offering. We also have a data preparation capability, both of those done through acquisitions we did in the last year. One was a company called Cursor, a data catalog; the other is a company called Paxata, which is data preparation software. So there's a lot of built-in data preparation and cataloging within the platform.
Bijan Beheshti;Vice President, Director of Quantitative Strategy
executiveGreat. Thanks, Rob. So I think we just have time for one more question. I do apologize for going over. Again, there's just so much that we want to show. So let's see here. Okay. So somebody noticed the Jupyter tab up top over here in this hyperlink. Maybe what I'll do, Rob, do you mind taking this one as well? So the question was, what does the Jupyter tab do up at the top of DataRobot?
Rob Hegarty;GM of Financial Markets and Fintech
executiveYes. That just -- that allows you to actually use Jupyter Notebooks in order to build your model. So it's part of giving the user different ways to be able to build and deploy models. So if you want to do that through a Jupyter notebook, you can do that as well.
Bijan Beheshti;Vice President, Director of Quantitative Strategy
executiveGreat. Thanks, Rob. Okay. Just real quick, maybe we'll just do one very last question here. So there's a question about different types of use cases, especially outside of equities. So I just want to highlight that the system is not limited to a particular type of asset class, it's not even limited to a particular type of data set. So we were looking at constituents of the Russell 3000 Index. Those were the rows in our model. So each row was a different stock. Those rows can be bonds, they can be dividends, they can be funds, they can be factors, they can really be anything. We can generate predictions for any sort of target variable. So if you think about fixed income, we might want to predict rating changes, we might want to predict probability of default. If we have a bunch of NA data for sectors, we might want to predict the sector classification for a security. We might want to predict something like fund flows for fund level data. For factors, we might want to predict factor volatility or factor return and use features like different factor statistics, maybe information coefficients or distributions of factor exposures or correlation levels for factors. The system can really run any sort of data that we provide it. So it's not limited in that context. And if there are ideas that any of you have on the phone you want to speak to us about, please feel free to reach out. We'd love to talk to you and work through those use cases. So I'm just going to go ahead and pop up our last slide here as we wrap up. I want to take this time just to thank Rob once again for joining us on this call. DataRobot has been phenomenal in working with us and our clients. They've been super helpful and an extremely valuable resource as we've worked on different projects with our clients that relate to machine learning. If any of you have additional questions, you can feel free to check out the link on this slide or e-mail us or reach out to your account team.
So with that, again, thank you all very much for taking the time to join us today. Please have a great day and stay safe out there. Thank you.