Praxis Precision Medicines, Inc. (PRAX) Earnings Call Transcript & Summary
November 24, 2025
Earnings Call Speaker Segments
Douglas Tsao
AnalystsOkay. Good afternoon, everybody. Thank you for joining us. I'm Doug Tsao, Senior Analyst at H.C. Wainwright. We are thrilled today for what I think is a very unique event. We are joined by Professor Chuck McCullogh, who is a Professor of Biostatistics at UCSF and a real expert in the use of mixed models and really author of what I think many consider as the sort of definitive textbook on the subject. We are also joined by Marcio Souza, who I think most of you are familiar with, who is the CEO of Praxis. So we're focusing today on the Essential3 program. I think this is a little bit of an unconventional format. The reason we're doing it this way is not only can we get the perspectives and insights from Chuck on some of the questions that have been coming up around Essential3, but also sort of accelerate the sort of cycle time for addressing questions by having Marcio here. So I also want to make it clear upfront that Chuck is not affiliated with Praxis in any way. He is an independent consultant. So my hope is, is that Chuck will provide some insights that will help everybody feel sort of better educated on the program and as well as ask some tough questions that Marcio can provide some insights.
Douglas Tsao
AnalystsSo Chuck, I think as a starting point, I think it would be really helpful if you just provided sort of a 2-minute refresher on why you use an MMRM model and its value for this type of clinical study? And ultimately, do you think this was the right model for them to use. And then maybe if you just have an overall impression of the data as you've seen it.
Chuck McCullogh
AttendeesCertainly. So I mean, this is longitudinal data. We collect data repeatedly over time on the same people. So from a statistical point of view, that's called correlated data because, of course, data within the same person is similar over time, more similar to their own data than to other people's data. So that requires a statistical modeling method that can accommodate this repeated measures correlated data. You also need to be able to have an analysis method that allows for flexible modeling. So for example, you want to adjust for baseline ADLs. In the models that Praxis used, they also adjusted for things like a family history of tremor. So you have to have a flexible modeling framework. So that basically boils you down to 2 choices. There's mixed model repeated measures analyses, MMRMs, and what are called generalized estimating equations; the name is not so important. It's just another method for flexible modeling of correlated data. But a practical reality of any clinical study is there's going to be missing data, there's going to be dropout, and that needs to be accommodated. And so mixed models are well known to produce more reliable results with missing data compared to these generalized estimating equation approaches. And in this study, there was appreciable dropout leading to missing data. We didn't get to measure the activities of daily living for every person for every week. And so that gives a clear preference to mixed models in this context. So if you had come to me completely independently and said, "Help me write a statistical analysis plan", this would almost certainly be the path down which I would have led you. So the start is using an analysis method that I would deem most appropriate. My general reaction is, as I start looking at the results, the results are highly statistically significant. If I back out p-values from the confidence intervals there, the p-values are really tiny, very strong statistical evidence. 
And so this should give robustness to any violations of assumptions in the analysis methods. So all that is good. There are some concerns that we'll talk about, I think, a little bit later that would have been concerns in my mind as well that need to be addressed by analysis of how robust these results are. But that's -- those are my sort of initial impressions.
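The back-calculation of a p-value from a published confidence interval mentioned above can be sketched roughly as follows. This is a minimal illustration that assumes a symmetric 95% interval built on a normal approximation; the function name p_value_from_ci and the numbers passed to it are hypothetical placeholders, not results from the Essential3 program.

```python
import math

def p_value_from_ci(estimate, lower, upper, z_crit=1.959964):
    """Approximate two-sided p-value from a point estimate and its 95% CI.

    Assumes the interval is estimate +/- z_crit * SE (normal approximation).
    """
    se = (upper - lower) / (2 * z_crit)   # recover the standard error from the CI width
    z = estimate / se                     # test statistic against a null effect of 0
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical treatment difference and interval (placeholders only)
p = p_value_from_ci(estimate=-2.0, lower=-3.0, upper=-1.0)
print(f"approximate p-value: {p:.2e}")
```

A narrower interval around the same estimate implies a smaller standard error and therefore a smaller p-value, which is the sense in which an interval far from zero "backs out" to a tiny p-value.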
Douglas Tsao
AnalystsOkay. And then, Marcio, with that as a backdrop, maybe just provide an overview of the Essential3 program and sort of the key considerations in the design following the Essential1 study results?
Marcio Souza
ExecutivesYes. I think I'm going to try to build a little bit on what Chuck just mentioned, right? So from the very beginning of this program, from the very first interaction with the FDA we had, there was no real other choice of model, other choice of methods in the discussion, right? I think that if you go across the board and look into neurology in general, but even outside of neurology, longitudinal data is almost always analyzed using a mixed model. So that may be important because I think there were some questions about like when did this come into play. Even on the Phase II, the Phase IIb, that was always the case, right? But I believe, and correct me here, Doug, if that's not where your question is coming from, when you look into the actual hypothesis that was generated to design Essential3, right? So we didn't come up with Essential3, with the Phase III program, like out of thin air. There was a prior study, there was a previous study that generated the hypothesis here. There were a number of key elements there. The first one was the population, right? So we always define a population first. It's like, what are we studying? And very similarly to what we're doing -- what we've done in Essential3, the idea was to use a population that was reasonably severe, right? When you look into the baseline for those patients, that was pretty high in terms of the severity and equally, actually a little bit more severe arguably, on Essential3, but very similar. So that was important. They were not really treating people without like significant impact on their daily living. The second part that was incredibly important is -- the FDA, actually, they were ahead of us on that in a sense, they insisted that we use the ADL modified in a way that they requested, right? That's why we call it mADL, it's the modification on the scoring after the data is assessed, and that was a request by the agency. 
Actually, when you talk to physicians, they normally quote the ADL because that's what they assess, but the actual measure is slightly smaller numerically and therefore, it is harder to reach statistically significant results, as we said before. With that knowledge in mind, the program was created. You got to remember as well that when you go back to Essential1, the result on the mADL, while it was not the primary at that point in time, was positive; the p-value was actually lower than the 5% threshold that had been defined for that as well. And based on that, the Essential3 program was created. Now we knew at that point in time, we needed more patients to actually get the certainty to be higher. But that's how the study was created in general. So it's been very consistent, if I may, throughout the program.
Douglas Tsao
AnalystsWell, and I guess just really quickly, just the overall structure of Essential1 because there were 2 studies, right, Study 1, the parallel design as well as Study 2. On Study 1, you did have -- you changed the primary endpoint in the study midway through. Maybe just talk through quickly the process what led you to do that? That was a decision you made after the interim analysis, but you didn't sort of make the change until September. So kind of what took so long?
Marcio Souza
ExecutivesSure, sure. So I'm glad you asked those questions in sequence, right? Because if you look back to Essential1, that was an 8-week study. When we conducted -- the IDMC conducted the interim analysis and we decided to continue the study, we had to, I'm going to say, slow down, slow down by asking the questions like everything we know about the program, how do we know the drug to be effective, how fast, and all these questions. And for how long do we have certainty on the estimations for the projection of the primary endpoint, right? Like reasonable, I believe, questions to be asked. And there was one thing we knew, that 8 weeks was how we based the study, so day 56, number one. But there was something that was quite important as well when you talk about the design of Essential3. We made the decision, one could argue a very high bar decision, a very complex one, to randomize patients to Study 1, the parallel group, or to Study 2, the stable responder randomized withdrawal. That is incredibly difficult to do, number one. And a lot of people don't do it because you increase the -- like the bar in terms of enrolling 2 studies at the same time. One of the things that came from that decision was that the studies were identical for the first 8 weeks, identical in every possible way, the patient pool because they're randomized based on the same stratification factors that Chuck just mentioned, that we had as well as covariates for the baseline. We -- or the assessment, they were blinded to study personnel, to patients, all the way. And that equality, if I may, ended at week 8. And then after week 8, the responders on Study 2 got randomized to stay on drug or placebo. So we changed the assessment after that. So in order for us to actually even further refine the estimation, we went back and said, where does that end? At week 8. And it took a while, of course, one would always like go back, this is a high stakes decision. We should not take it in a rush. 
We like took our time to like think about that, to simulate and so on. And it takes time to think. It takes time to simulate and takes time to write the changes and to submit to the agency. That's why we've done it this way. In the grand scheme of things, I don't actually believe that's a lot of time.
Douglas Tsao
AnalystsAnd Marcio, just to confirm, right, you did submit that ahead of locking the database for the study.
Marcio Souza
ExecutivesAbsolutely. It would not be a valid change if we had not done this without knowledge, right? So there was no knowledge of the allocation at the study or the group level, the individual level, any of that, before the change was memorialized, number one. And implemented, because we implemented the change. We amended the protocol, amended the SAP, we submitted to the IND, we sent a letter to the agency, so on and so forth. So everything was done way before the database was actually locked.
Douglas Tsao
AnalystsAnd I think Marcio, it's important to remember that sort of changing the midpoint or the endpoint mid-study is not without significant precedent, right? I mean we even saw it recently with donanemab's application where, in fact, you actually had the agency disagree, but that ultimately was not deemed sort of consequential to the assessment of efficacy. And then I want to come back to Chuck in a second, but Marcio, I know you've sort of provided a lot of detail and you said that it was actually highly significant. What was the actual p-value at day 56 on the primary endpoint?
Marcio Souza
ExecutivesYes. I think that is the reason why we didn't show this before, I'll tell you right now, but it gets to a point where it was like why are we even going there, right? Like I described this before as silly, right? There are certain things that we do that get silly. But the actual p-value was on the order of 10 to the minus 6. So if you think about that, after the decimal point there were like five 0s and then the first positive integer. I think it's hard to believe that anyone would consider that even close on the payer tests. But again, I should ask Chuck what he thinks.
Douglas Tsao
AnalystsAnd Marcio, you have said that you were positive on the original endpoint, right? As much as you had made the change, ultimately, in some ways, it was much ado about nothing. I mean, would you care to provide some color on how successful it was on that endpoint as well?
Marcio Souza
ExecutivesYes. So it's worth saying that it was not only at the original endpoint, right? There was the day 84, assessed as the average contrast between day 77 and 84, and that was on the order of 10 to the minus 3, so like the p-value. But it was at every time point, each one of the time points assessed, including day 14, right? So day 14, day 28, day 49, 56, like 63, 77 and 84, all of them were significant. And I think it matters, right? We're basically saying at no point in time was there like a weird fluctuation here where you lost significance -- that would be a problem. We've seen that in several trials before, that there is some fluctuation that did not impair their ability to get approved. But I think in this case, it's quite important as well to mention that.
Douglas Tsao
AnalystsAnd Chuck, when you hear those degrees of sort of statistical significance, I'm curious, does that -- how does that influence your sort of initial assessment?
Chuck McCullogh
AttendeesYes. So let me back up just a second. So whenever there's sort of last-minute changes in an analysis plan, especially the primary outcome, it sort of sends up signals to me that I better pay attention, I better scrutinize this a little more carefully. And so the same considerations that you brought up come to my mind. Was this decision made before database lock? Yes. And so that's important because things haven't been unblinded yet. And so you don't have -- you're not making these decisions based on what you know the results are going to be. So that's number one. I'm less swayed by the necessity of having equal time periods in the 2 arms of the overarching studies. You've got the data for both. You can analyze them using only the consistent data if you want to. It's a question of what you declare to be the primary endpoint. Okay. So it's been changed. Okay. So stepping back, in the bigger scheme of things, this is a pretty minor change. It's not like you changed from one metric to another metric. All you did was you shifted the time at which the primary assessment was going to be declared. I've seen people in clinicaltrials.gov just say, the primary outcome is mADL, they don't even specify the time in clinicaltrials.gov. So it's a relatively minor change. Okay. So still, okay, there's a little suspicion here. There was a last-minute change in the primary outcome. So now I turn to the sort of issues that Marcio was talking about. Okay. So in kind of a worst-case scenario, let's imagine a conceptual trial where you declared those endpoints, 84 and 56, to be co-primary. What would you have done? A relatively draconian adjustment would be to do a Bonferroni adjustment. That means instead of testing at 0.05, you test at 0.025. That lets you look at both endpoints and take either one that's statistically significant and declare success. Well, these p-values are tiny, as Marcio was saying. 
And again, I back calculated them from the confidence intervals that are public information. And they're very, very small. So that still leaves me pretty convinced because even if I say, okay, I'm going to adjust for co-primary outcomes, I've still got statistically significant results.
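The co-primary thought experiment above can be written out as a short check. This is only a sketch: the function name bonferroni_check is hypothetical, and the two p-values are placeholders set to the orders of magnitude quoted on the call (10 to the minus 6 at day 56, 10 to the minus 3 at day 84), not the exact trial values.

```python
def bonferroni_check(p_values, alpha=0.05):
    """Test each endpoint at alpha divided by the number of endpoints."""
    threshold = alpha / len(p_values)     # 0.05 / 2 = 0.025 for two co-primary endpoints
    verdicts = {name: p <= threshold for name, p in p_values.items()}
    return threshold, verdicts

# Placeholder p-values at the orders of magnitude mentioned in the discussion
threshold, verdicts = bonferroni_check({"day_56": 1e-6, "day_84": 1e-3})
print(f"per-endpoint threshold: {threshold}")
for endpoint, significant in verdicts.items():
    print(endpoint, "significant" if significant else "not significant")
```

Under this adjustment, success can be declared if either endpoint clears the stricter threshold, which is why very small p-values survive the correction comfortably.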
Douglas Tsao
AnalystsAnd Chuck, that's really helpful. And I think one of the things that you talked about, right, when you talked about the value of an MMRM, is that there is missing data. And that has become a focal point for investors in sort of thinking about the various sensitivity analyses that the company has presented. Can you just give us sort of a brief tutorial on missing data and how an MMRM model handles it and help us understand what missing at random and missing not at random actually mean?
Chuck McCullogh
AttendeesSure. And again, I think we're transitioning to a different topic because I do see these as sort of separate issues, the choice of the primary outcome versus how you deal with missing data. So let me talk a little bit about mixed models and missing data. So as I said earlier, mixed models are clearly preferred in situations where there's missing data. And it's almost always the case in any clinical study, there is some. And so that's often the reason that people are guided to use these. Why is it that people like them? It's because analysis of data using a mixed model approach without any formal consideration of missing data. So just sort of pretending that, okay, the data is unbalanced, but there's no real bias to why we had missing data for some people under certain circumstances still gives valid results. So that's the key reason people like that. Okay. What are the certain assumptions? These assumptions are technically known as missing at random. Now that's a terrible term, and I wish that whoever made that term popular would be strung up because if you try and parse it as an English language, it leads you to the wrong conclusion. I prefer to think of it as missing that's predictable from the observed data. In this case, that would include anything in the model like family history of tremors as well as any previously recorded value of ADL on that person. And so that then extends the protection fairly widely. Missing not at random, that's the more problematic case because the mixed models don't necessarily protect you there, means the missingness is dependent upon other things like the value of ADL if we got to see it, which, of course, we didn't. When the data are missing not at random, then a mixed model analysis can give you results that deviate systematically from an analysis that you would get if you suddenly magically had access to all the missing data. And so then that's the legitimate concern. 
Again, it's not just a concern in this particular study. That's a concern with any study that has missing data, which again is virtually any clinical longitudinal study.
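The distinction between missing at random and missing not at random described above can be illustrated with a small simulation. This is a simplified sketch under stated assumptions: two visits with correlated normal outcomes, a hypothetical dropout rule, and a regression-on-the-earlier-visit estimator standing in for a likelihood-based mixed model (for this two-visit setup the two agree, but it is not an MMRM fit). All names and numbers are illustrative.

```python
import random

def simulate(mechanism, n=20000, seed=0):
    """Return (y1, y2) pairs; y2 is None when the second visit is missing."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y1 = rng.gauss(0.0, 1.0)
        y2 = 0.7 * y1 + rng.gauss(0.0, 0.51 ** 0.5)   # true mean of y2 is 0
        # MAR: dropout driven by the observed first visit.
        # MNAR: dropout driven by the second-visit value we never get to see.
        driver = y1 if mechanism == "MAR" else y2
        missing = driver > 0.5 and rng.random() < 0.8
        rows.append((y1, None if missing else y2))
    return rows

def completer_mean(rows):
    """Naive estimate: average y2 over people who were observed."""
    seen = [y2 for _, y2 in rows if y2 is not None]
    return sum(seen) / len(seen)

def model_based_mean(rows):
    """Regress y2 on y1 among completers, then average predictions over everyone."""
    obs = [(y1, y2) for y1, y2 in rows if y2 is not None]
    mx = sum(y1 for y1, _ in obs) / len(obs)
    my = sum(y2 for _, y2 in obs) / len(obs)
    slope = (sum((y1 - mx) * (y2 - my) for y1, y2 in obs)
             / sum((y1 - mx) ** 2 for y1, _ in obs))
    intercept = my - slope * mx
    return sum(intercept + slope * y1 for y1, _ in rows) / len(rows)

for mechanism in ("MAR", "MNAR"):
    rows = simulate(mechanism)
    print(mechanism,
          "completers only:", round(completer_mean(rows), 3),
          "model-based:", round(model_based_mean(rows), 3))
```

In this toy setup the model-based estimate stays close to the true value of 0 when dropout depends only on the earlier observed visit, while it drifts away from 0 when dropout depends on the unseen value, which is the systematic deviation described in the passage.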
Douglas Tsao
AnalystsAnd I want to clarify, because I think this has been sort of a misconception amongst some, that a patient dropping out because they're doing poorly on an observed basis is not a violation of MAR, right, missing at random, and that an imbalance in discontinuation between active and placebo is also not automatically a violation of MAR.
Chuck McCullogh
AttendeesRight. So when people are dropping out because you've seen this -- look now, say, a patient is not getting better and they decide to drop out of the study, that's predictable from their pattern of ADL measurements up until that point. And so that could well be missing at random. And you're right, the mere fact that the discontinuation rates or dropout rates are different between the arms is not necessarily an indication that the missing at random assumption is violated. I'm a senior statistician for a recently completed randomized trial treating depression. And one of the arms was an antidepressant drug. Our endpoint was declared to be depression after the end of treatment, but we had, of course, intermediate measures of depression and people taking these pills got immediately better and discontinued. But that was completely predictable from these early measurements of depression. So again, that led to a big difference in discontinuation rates even though the MAR, missing at random assumptions are still quite plausible.
Douglas Tsao
AnalystsAnd is it fair to say that the model that Praxis used assumes MAR, but we do or should consider the possibility that the mechanism of missingness is missing not at random, and we need to stress test the data for that possibility as well?
Chuck McCullogh
AttendeesYes. I'm not so sure I'd say it assumes that. But again, I'd go back to this: under missing at random assumptions, it still gives valid results, and it is not guaranteed to give valid results when the data is missing not at random. And it's an unfortunate fact, maybe not too surprising, but you can't tell if the data process is missing at random or missing not at random by looking at the data, because it depends on things you didn't get to see and assumptions about things you didn't get to see. So it's almost always a good idea to stress test these missing data assumptions by doing sensitivity analyses.
Douglas Tsao
AnalystsAnd what would be sort of the way that you do sort of assess the plausibility of MAR and what are some of the questions that we should be thinking about here?
Chuck McCullogh
AttendeesYes. So I go through this process and try and think about it in a sort of twofold. First, is the situation. Unfortunately, this is -- because it depends on unknowable things from the data. It depends strongly on the clinical context and what you know about what's likely to cause missing data. So first, I try and think through is the scenario likely to generate data that's missing at random? Or is it likely to generate missing not at random. I mean just to give an example, so in a recent study that had completed, we were -- we had a scale we were using. It was not validated nor proven to be useful in the population we were studying until the study had already started. So the first few people, the first 100 people out of many hundreds we recruited, we were not able to use this scale because it had not yet been validated. But after the first 100, we decided it was the better scale. We made the hard decision to replace the one we were using. So we now introduced missing data for the first 100 people recruited into this trial. So you know from the context that there's a strong argument that that's missing at random because the fact that it was missing was just related to the fact we recruited them earlier. I mean you can imagine things that it's missing not at random, but they're pretty implausible. So there, the situation gives you sort of confidence that it's missing at random. On the other hand, and probably more relevant to today's discussion, when dropout is related to adverse events and causes discontinuation in the active comparator arm, then we have more suspicion that things are likely to be missing not at random. And that's where it's even more important to stress test by doing sensitivity analysis.
Douglas Tsao
AnalystsAnd when you -- Marcio, sort of how do you think about it? How did you go about addressing these issues?
Marcio Souza
ExecutivesYes, absolutely. So the first thing is like what we hypothesized before, right? So the question is always -- should be always first, what is the primary model, and we talked about that already. And then how do we stress test the model, right? Like what do you -- if you consider things that might be missing not at random, how do you stress test? So the prespecified sensitivity analysis, which, by the way, was the same one on Essential1, the previous study, was the one in every statistical analysis plan here, and it's actually a commonly used one for MMRMs, is the tipping point. The principle there is that there is a point at which, as you keep removing benefit or adding a penalty, however you want to think about it, the result would tip, right, would become nonsignificant. And the second judgment you have to make there is like what is clinically plausible, because you can't interpret just a mathematical change. You have to interpret that as clinically plausible as well. And when you put the 2 together, the proposed and the memorialized one on the statistical analysis plan was a tipping point analysis starting at half a point as penalty and going to 2.5 points. So that was the maximum on the actual analysis plan. Subsequent to that, and as you know, it did not tip, like not even close to actually becoming nonsignificant at that point. The tipping point is actually much larger than that, if you care to know, is much, much larger than 2.5 points, which is already pretty absurd because the patients don't get worse and so on in this case. But then the other question that we asked ourselves, and you've seen this in our disclosures, is like for these patients that we didn't have information on, I think Chuck just talked about it, we can only hypothesize things. But you can ask what could be a reasonable replacement for their values, right? That's all we are doing here, replacing the things we don't know. And we said, okay, placebo would be a reasonable replacement for that. 
A reference, right, would be reasonable. And we've done a different method. That was very clearly not prespecified. It's very common to be done, but it was not. The only prespecified one was the tipping point. We checked the box already. But we've done that. And we tested, although the methods are very similar once you complete the data, we tested using the MMRM, feeding the MMRM again now that the data is no longer missing, right? So you just fit the model there. And ANCOVA as well, in very similar terms. And what we're seeing there, once again, as one could actually expect, is highly significant results on those sensitivities. So you're stress testing the model. We're not necessarily -- you made the example of donanemab. So I'm going to bring that back because I actually think it's quite important since Chuck was mentioning recent studies as well, right? That is relatively recent. As you know, there was not only a change on endpoint, but the agency actually commented quite eloquently that it doesn't really matter because the other endpoint was actually positive as well, kind of similar to the change in time points for us. But quite interestingly, they actually requested a tipping point analysis. The agency said -- asked Eli Lilly, which is the sponsor here, to conduct it, and it actually tipped at the first level. That did not preclude at all -- the first level is very low, by the way -- did not preclude at all the approval of the drug. It was just stress testing, I'm going to call it stress testing, how reliable under stress the endpoint was. It happens to be that on that study, the discontinuation rate was pretty high as well. So it's not unlike the situation we're dealing with. So it's not only that there is plenty of regulatory precedent, number one, very recent as well, same personnel in the division for that matter. We've also tested with other methods, and those methods also result in very robust results at the end.
Douglas Tsao
AnalystsAnd Chuck, we did get a question from somebody watching who did want to clarify or just -- but you alluded to it that if you do have discontinuations because of adverse events, how does that impact our assumption of MAR?
Chuck McCullogh
AttendeesYes. So when you have discontinuation due to adverse events that you think are treatment related, so people are discontinuing the drug. It's unrealistic to expect that they're going to follow, let's say, we have a couple of values of ADL for them already and then they have an adverse event and they discontinue. It's probably unreasonable to assume that they're going to follow the trajectory that they were on while they were on drug when they suddenly discontinued. So this is the place for what Marcio was talking about. There are a couple of widely accepted methods for doing these MNAR, missing not at random, analyses, one of which is tipping point and the other is return to reference, where you say, well, I think they're going to switch to some other profile, and a reasonable profile might be, okay, I think they're going to switch to look like a placebo patient that otherwise had sort of similar characteristics to this person who started in the active comparator arm. So that's a situation where I do think stress testing is more indicated and a missing not at random mechanism is more likely. Again, we can never know for certain, but the clinical context here would suggest it.
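The return-to-reference idea described above can be sketched in a few lines. This is a deliberately simplified illustration: missing active-arm outcomes are filled with draws from the placebo arm's overall distribution, whereas a real reference-based analysis would typically use multiple imputation conditional on baseline characteristics and earlier visits. The function name, the data, and the single-imputation shortcut are all assumptions for illustration.

```python
import random
import statistics

def return_to_reference_effect(active_obs, placebo, n_missing, seed=0):
    """Estimate the treatment effect after imputing missing active-arm
    improvements as if those patients behaved like placebo patients."""
    rng = random.Random(seed)
    ref_mean = statistics.mean(placebo)
    ref_sd = statistics.stdev(placebo)
    # Hypothetical shortcut: one draw per missing patient from the reference arm
    imputed = [rng.gauss(ref_mean, ref_sd) for _ in range(n_missing)]
    return statistics.mean(active_obs + imputed) - ref_mean

# Made-up improvement scores (higher = more improvement); not trial data
active_obs = [2.0, 3.0, 4.0, 5.0, 6.0] * 20
placebo = [0.0, 1.0, 2.0, 3.0, 4.0] * 20
observed_only = statistics.mean(active_obs) - statistics.mean(placebo)
rtr = return_to_reference_effect(active_obs, placebo, n_missing=30)
print("observed-only effect:", round(observed_only, 2))
print("return-to-reference effect:", round(rtr, 2))
```

Because the imputed patients are assumed to look like placebo, the estimated effect shrinks toward zero in proportion to how many active-arm patients are missing, which is what makes this a conservative stress test.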
Douglas Tsao
AnalystsAnd so -- but based on what you've heard so far in your assessment of the data, do you think MAR is still plausible?
Chuck McCullogh
AttendeesWell, it depends on which type of patients we're talking about. If we're talking about somebody who's got quite a few measurements of ADL, leading up to the primary endpoint of 56 days and is still on drug, probably quite plausible. If we're talking about somebody who dropped out very early, we have no follow-up measurements or somebody who discontinued drug, I think, less likely.
Douglas Tsao
AnalystsAnd Marcio, I think you sort of referenced it, and you did talk about the tipping point analysis that you conducted. Maybe, Chuck, it might be helpful for you to just sort of walk through how a tipping point analysis works. And Marcio said that I think it was -- they got to 2.5 standard deviation, 2.5 points. What is your initial impression of that, right? And I think that the p-value at half standard deviation was 0.026. Correct me if I'm wrong, Marcio.
Marcio Souza
ExecutivesNo, you're right.
Chuck McCullogh
AttendeesYes. So okay. So let me back up and talk a little bit about how a tipping point analysis works. So again, this is in the context of data that's missing not at random. So we have to make certain assumptions about how much, in this case, worse an active comparator patient would do when the data are missing. So we're basically saying, okay, usually, you start from, okay, here's what we'd expect their trajectory to be ordinarily. But now that they have generated missing data, we're not -- we're going to take that expected trajectory and make it worse by some amount. So typically, these are called like a delta adjustment. So we make a little drop. So we say, okay, maybe we expected that patient to have improved by 3 points on the ADL scale. Now we're going to decrement their improvement. They're not -- they didn't improve by 3 points. If we have a delta of 1, we are going to say, okay, we're going to just make them say they only had an improvement of 2. So that would be like a delta of 1. And so we apply this value of delta, and we check to see whether or not it overturns the results, typically asking, are the results still statistically significant after I've employed this little delta. And so -- and then typically, you march and you increase the delta. Now I would nitpick with Marcio's approach. I mean a tipping point analysis should keep going until you tip it over and no longer get statistically significant results. They stopped at about 0.5 standard deviation. I'll come back and talk about that in just a second. So that's how the tipping point analysis works. And in their case, they went up to 0.5 standard deviation, still had statistically significant results. So basically, they hadn't yet reached the tipping point for this analysis. There's been some questions about is 0.5 standard deviation too much, too little. 
Well, 0.5 standard deviation is a moderate-sized effect, and a 2.5 point change on the scale is right around what a clinically important difference is. So even though, again, I would nitpick and say they should have gone farther until they actually tipped the analysis to be not statistically significant, they did go up to a pretty big effect size and did not see a tipping of the analysis towards not statistically significant.
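The delta-adjustment procedure walked through above can be sketched as a loop that makes the imputed active-arm patients progressively worse until significance is lost. This is a toy version under loud assumptions: it uses simulated improvement scores, a single imputation from the observed active-arm distribution, and a simple two-sample normal test, whereas the actual analysis would use multiple imputation with an MMRM. Every name and number below is hypothetical.

```python
import math
import random
import statistics

def compare_arms(active, placebo):
    """Return (difference in mean improvement, two-sided p-value), normal approximation."""
    diff = statistics.mean(active) - statistics.mean(placebo)
    se = math.sqrt(statistics.variance(active) / len(active)
                   + statistics.variance(placebo) / len(placebo))
    return diff, math.erfc(abs(diff / se) / math.sqrt(2))

def tipping_point(active_obs, placebo, n_missing, deltas, alpha=0.05, seed=0):
    """Increase the penalty delta until the result stops favoring active at level alpha."""
    mu = statistics.mean(active_obs)
    sd = statistics.stdev(active_obs)
    results = []
    for delta in deltas:
        rng = random.Random(seed)   # same draws for every delta, so only the penalty changes
        # Expected improvement under MAR, made worse by delta points
        imputed = [rng.gauss(mu, sd) - delta for _ in range(n_missing)]
        diff, p = compare_arms(active_obs + imputed, placebo)
        results.append((delta, p))
        if p > alpha or diff <= 0:
            return delta, results   # first delta at which the conclusion tips
    return None, results            # never tipped within the grid examined

# Simulated improvement scores (higher = better); placeholders, not trial data
rng = random.Random(1)
active_obs = [rng.gauss(3.0, 5.0) for _ in range(150)]
placebo = [rng.gauss(1.0, 5.0) for _ in range(200)]
tip, results = tipping_point(active_obs, placebo, n_missing=50,
                             deltas=[0.5 * k for k in range(21)])
for delta, p in results:
    print(f"delta={delta:.1f}  p={p:.4f}")
print("tipping delta:", tip)
```

Returning None when the grid is exhausted mirrors the situation discussed here: stopping at a prespecified maximum shows the result held up to that penalty, but does not by itself locate the tipping point.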
Douglas Tsao
AnalystsAnd Marcio, a couple of questions. But first, I want to -- somebody pointed out, I misspoke. I think I dropped a 0 on your...
Marcio Souza
ExecutivesYou did drop a 0, I was going to correct you.
Douglas Tsao
AnalystsYes, you were 2.5 [indiscernible].
Marcio Souza
ExecutivesThere is an extra 0 there, but that's the start of this program, having an [indiscernible].
Douglas Tsao
AnalystsToo many 0s. How did you settle on 2.5 points as half a standard deviation? What was sort of the rationale or identification of that? I mean you haven't necessarily talked about or fully disclosed all the standard deviations on the primary endpoint. I think we had the baseline at 2.4, but just what was that? And at what point, to sort of Chuck's question, I think that's a good one, at what point did you lose significance?
Marcio Souza
ExecutivesYes. So number one, I agree, right? It's like if you were to reasonably hypothesize that you're going to keep going, like remember, you control studies. I know we are going for this MAR and MNAR and scrutinizing the other endpoints. But you control the study at the 5% alpha level. And in a sense, deliver an eye on that, right? That's how we declare the study a success or not. I think all we are saying right now is, in a sense, gravy, the way I look into it. Should we have, in our craziest dream, imagined we'd go beyond 2.5 points of penalty on the sensitivity? No. Did the FDA comment when we submitted 2.5 as the maximum? No, they did not criticize that either. Now in retrospect, would I have prespecified to keep going? Yes, absolutely. Like there's nothing wrong with that, right? We're talking about like a 0.006 value at 2.5%; of course, the number is much bigger. And I would kind of leave a sense with that, it's like we normally would say you have to cross the 5%. We're at the 0.26%, like you've got to imagine that the number is significantly bigger there. Do we need to be bigger is the question? And the answer is no. Are we bigger? And the answer is yes. Are we bigger by a lot? Yes. And that should kind of close a chapter, right? Because this is a sensitivity, it's not even the primary. I think if the primary was 0.0026, we would be like happy. And we're talking about a sensitivity with penalty here. So I talk about silliness. This is one of those things that got a little silly.
Douglas Tsao
AnalystsAnd Chuck, maybe you could comment just from your perspective, when you look at a study like this, on how standard a half-standard-deviation penalty is for a tipping point analysis, right? How robust is that? Or would you have said, gee, maybe it should have been 0.75 standard deviation or a full standard deviation? I mean I think to Marcio's point, like they came up with a number, and they could clearly go past it. So -- but to people who are saying, "Oh, gee, why didn't they go further?" From standard practice or from your perspective, how robust is half a standard deviation?
Chuck McCullogh
AttendeesYes. So again, to be nitpicky, I would have preferred to see a formal tipping point analysis where you keep going until you tip. And then you have the thing in hand, you say, "Look, I didn't tip until one whole standard deviation, or something, and that's completely implausible." I mean that's how these sensitivity analyses work. But that said, half a standard deviation is pretty well accepted as a moderate effect size. So you're saying that you're going to decrement the active comparator arm by a moderate-sized effect. And again, I'm also convinced by the very small P-value there because I know that, just as Marcio said, that means you can go further and not flip it over. So again, I feel like I'm more nitpicking than raising strong concerns. And yes, I looked at it and I said, "Well, why didn't they keep going?" I said, well, okay, the P-value is 0.026 and half a standard deviation is a moderate effect size. And so I'm not overly concerned with this.
Marcio Souza
ExecutivesOne could jump there, right? And kind of, again, given that we are in active discussions with the FDA, I'm going to be a little bit careful as well. But does it tip at 3? No. Does it tip at 3.5? No. Does it tip at 4? No. And then at one point, it gets to the point that I said is silly, right? Because now we're penalizing the entire study by an amount that is not reasonable in a highly heterogeneous patient population, in one where you know placebo does not do much. You've got to put the clinical context on this statistical analysis as well.
Douglas Tsao
AnalystsRight. And to Marcio, to your point, when you're -- if it's holding at 4 points, you're taking patients to well below even the placebo response at that level, right?
Marcio Souza
ExecutivesNow we are damaging them. And like, is that reasonable to actually say that? And I think that's where it becomes unreasonable.
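The tipping point logic discussed above (keep raising the penalty on imputed active-arm values until significance is lost) can be sketched as follows. This is a minimal illustration, not the company's analysis: the data are synthetic, the names `synthetic_arm`, `two_sided_p`, and `tipping_point` are hypothetical, and a single mean imputation with a z-test stands in for the multiple imputation and MMRM that a real analysis would use.

```python
from statistics import NormalDist, mean, stdev

def synthetic_arm(mu, sd, n):
    """Deterministic, evenly spaced normal quantiles (hypothetical data)."""
    dist = NormalDist(mu, sd)
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]

def two_sided_p(drug, placebo):
    """Two-sample z-test p-value for the difference in mean change scores."""
    se = (stdev(drug) ** 2 / len(drug) + stdev(placebo) ** 2 / len(placebo)) ** 0.5
    z = (mean(drug) - mean(placebo)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def tipping_point(drug_obs, placebo_obs, n_drug_missing,
                  step=0.5, max_delta=20.0, alpha=0.05):
    """Smallest penalty (delta) at which p >= alpha, or None if never reached.

    Missing active-arm patients are imputed at the observed active-arm mean
    plus delta, where a positive delta means a worse outcome (assumption:
    negative change scores are improvements).
    """
    delta = 0.0
    while delta <= max_delta:
        imputed = [mean(drug_obs) + delta] * n_drug_missing
        if two_sided_p(drug_obs + imputed, placebo_obs) >= alpha:
            return delta
        delta += step
    return None

# Hypothetical numbers, chosen only to make the sketch run.
drug_obs = synthetic_arm(-3.0, 5.0, 150)
placebo_obs = synthetic_arm(-1.0, 5.0, 150)
tip = tipping_point(drug_obs, placebo_obs, n_drug_missing=40)
print("penalty at which significance is lost:", tip)
```

Reporting the penalty at which the result tips, rather than stopping at a prespecified maximum, is what lets a reader judge whether that penalty is clinically plausible.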
Douglas Tsao
AnalystsAnd Marcio, the analyses have all been on the MITT population. And to be in that, you needed to have a post-baseline assessment. Now I know you had the FDA agreement and alignment that this was the primary analysis population. We have gotten questions in terms of the ITT population as well. And so can you provide some perspective on sort of how that impacted your analysis?
Marcio Souza
ExecutivesYes. No, absolutely. I would maybe separate two things here, right? So one is what the ITT would do to the actual results. That is nothing because there is no post-baseline term. So the MMRM will just drop that. But that is probably not where your question is coming from. Your question is like what happened to these patients that are not available. And I think that's where using the jump to reference here is important, right? Like you're saying, okay, for those patients as well, not only the other ones, we're going to replace their value. We're going to treat them as if they're not responders. We know it's not the case, right? We know that a very large proportion of patients respond. So we're actually penalizing quite aggressively, and we showed that data for the jump to reference. But I'm going to throw another one here because since this is a relative call. We did do another sensitivity analysis that is not in our deck where we completely replace their numbers by a 0 change. So like basically a baseline carried forward, right, or saying there is no change, like these patients are a straight line, because placebo arguably has a small change in the direction of effect. And that is also positive. So if there is any concern about what sensitivity does to the primary, there shouldn't be. Because, again, when you're talking about a 10 to the minus 6 primary, when you're talking about every time point being positive, it is just very, very hard to actually negate the node by just going through these different imputations as we may.
Douglas Tsao
AnalystsChuck, how do you think about the reasonableness, if you will, of considering patients that dropped out before day 14 as missing at random since we don't have sort of an observed outcome for these patients?
Chuck McCullogh
AttendeesYes. So as I said before, unfortunately, we can never know for certain whether things are missing at random or missing not at random just from the observed data. We sometimes get clues. Again, as I was talking about earlier, if we've got a sequence of values and then we know that a person had an adverse event and discontinued drug, we've got sort of a clinical expectation that they're going to change somehow. But these are patients for whom we have virtually no information, not even clues as to what we should be assuming. So again, this is a situation where if you wanted to go to an ITT, again, the primary is modified intention to treat. So I don't get too concerned when an accepted modified intention to treat is the preplanned analysis. It's always good to think about intention to treat. But this is a situation where, yes, you would definitely want to be doing some stress testing by using one of the methods of assessing sensitivity to missing not at random data just to see what happens.
Douglas Tsao
AnalystsAnd Marcio referenced jump to reference and that being sort of one of the analyses that they did. Chuck, maybe provide sort of a quick refresher or sort of tutorial, if you will, on how that works and how that adds to the robustness of our picture?
Chuck McCullogh
AttendeesYes. So it's another well-accepted method of trying out a missing not at random mechanism. Again, it's one that in many situations is clinically appropriate. It's just saying that if I've got a drug, and I think that as soon as somebody discontinues it, they will look like a placebo patient, then I'm going to choose the placebo pattern of data as my reference group. I'm assuming that a person immediately goes to look like a control patient. I'm assuming that's the reference group that was used in this situation. And so that's a pretty conservative approach, saying, I'm assuming that instantaneously from the beginning, this person looked exactly like a placebo patient. So you're immediately diluting the active comparator effect about as much as possible by assuming it looks exactly like a placebo.
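The dilution described in this answer can be illustrated with a small sketch. It is a simplification under stated assumptions: the numbers are made up, the helper names `impute_dropouts` and `effect` are hypothetical, and each dropout is filled with a single value (the placebo mean for jump to reference, or a zero change for baseline carried forward) rather than drawn through the model-based multiple imputation a formal analysis would use.

```python
from statistics import mean

def impute_dropouts(drug_completers, n_dropouts, fill_value):
    """Return the active arm with every dropout assigned one fill value."""
    return list(drug_completers) + [fill_value] * n_dropouts

def effect(drug, placebo):
    """Difference in mean change from baseline (negative favors drug)."""
    return mean(drug) - mean(placebo)

# Hypothetical change-from-baseline scores; negative means improvement.
drug_completers = [-4.0, -3.0, -2.0, -3.0, -3.0, -3.0]   # mean -3
placebo_observed = [-1.0, 0.0, -2.0, -1.0]               # mean -1
n_dropouts = 2

# Jump to reference: dropouts are assumed to look like placebo patients.
j2r_arm = impute_dropouts(drug_completers, n_dropouts, mean(placebo_observed))
# Baseline carried forward: dropouts are assumed to have zero change.
bcf_arm = impute_dropouts(drug_completers, n_dropouts, 0.0)

effect_completers = effect(drug_completers, placebo_observed)
effect_j2r = effect(j2r_arm, placebo_observed)
effect_bcf = effect(bcf_arm, placebo_observed)
print(effect_completers, effect_j2r, effect_bcf)
```

In this toy setup the jump-to-reference effect is the completer effect scaled by the fraction of patients who stayed on drug, and the zero-change fill is harsher still whenever placebo itself improves.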
Douglas Tsao
AnalystsAnd Marcio, a question that came in that I think might be helpful is just how do you think secondary endpoints sort of help inform us on the overall picture? And then I'd be curious to hear Chuck's perspective on that as well.
Marcio Souza
ExecutivesYes. So maybe first on the statistics, how they are treated, right? So the primary had to be positive and then the secondaries were sequentially tested. I think it's incredibly important. I'm glad you asked that, right? There are a number of things we assess here as the secondaries and they're all positive and they're all very small P-values. One thing we haven't said publicly before, they're also positive at every single time point, right? And when you consider the way we structured the secondary endpoints, right? Like the first one is a clinical outcome assessment, the primary. So the MADL, the clinician assesses the patients, then we ask what happens on the overall effect? What happens on the entire trajectory of these patients, not only at one time point, but on the entire trajectory? That is to the order of like 10 to the minus 7, the P-value, right? So overall, in the study, they're doing really well. Then we ask the question like how patients see, from the beginning of the study, their health improving, like using the PGI, and how the clinicians see the severity of the changes. For all those things, if you look across studies in general, they don't tend to be positive. They don't tend to be positive throughout on studies that are not really giving a benefit because they are not assessing the same thing, right? So we consider, like, yes, we've been talking about sensitivity to the primary, but then go and ask the question, what happens to the overall health of these patients, and it's incredibly strong as well on each one of those. So I personally think it's always very good to see secondary endpoints supporting the overall effect of the drug and not being conflicting, right? Because sometimes we see conflicting secondary endpoints in other studies. That was not the case here at all. They're all showing benefits.
Douglas Tsao
AnalystsAnd Chuck, I guess the question to you, how do you sort of look at secondary endpoints sort of influencing your overall assessment of the data?
Chuck McCullogh
AttendeesYes. So I often work outside the regulatory environment and it's especially important there. But even in the regulatory environment, I don't really have much to add over Marcio. I mean I do exactly the same thing. I look at the secondary outcomes, especially ones that I expect to be highly correlated with the primary outcome. And I start seeing it as a red flag when they disagree, especially if they go in the wrong direction, which sometimes even happens. And again, so when everything lines up, effect sizes are in the right direction, especially when everything is statistically significant, I think that is pretty strong support for the primary analysis.
Douglas Tsao
AnalystsAnd Marcio, I guess, you did reference jump to reference as having sort of been successful and, obviously, at a fairly robust level. I mean, how much further past placebo were you able to take patients before the model breaks?
Marcio Souza
ExecutivesYes. Well, and you can go back, and I already said like you can replace their change with 0 for the ones that discontinued before day 14, right? Meaning you replace completely -- you remove the effect by, in that case, 1.7 points on average because that's what placebo was, and it's still highly significant. So you can make patients worse than they were at baseline. And it would still be significant from the missing data perspective. I think someone would have to be institutionalized if they think this drug makes patients worse. And so therefore, like that should be a chapter that is closed, right, in terms of like how robust the primary is on this analysis.
Douglas Tsao
AnalystsOkay. And Marcio, I think I just want to, before we talk about sort of some of the integrated data, I thought it might be helpful to just sort of do a quick summary of kind of what we've covered so far today. So I've actually been working during the call to sort of pull together a little bit of a matrix. So if you give me a second.
Marcio Souza
ExecutivesOkay. I don't like surprises, but go for it.
Douglas Tsao
AnalystsAnd so just as we go through, I thought we would just hit the primary endpoint, which, as you talked about, Marcio, was sort of significant. When we did the sort of all-time-points MMRM analysis, we're still significant. PGI, CGI, secondary endpoints, as observed, you just noted significant. Going to a sensitivity analysis using missing not at random with an imputation of greater than 2.5, we're still okay. Jump to reference placebo, we're still okay. I think you referenced maybe an ANCOVA jump to placebo, we're still okay, as well as going back earlier to the day 77, 84, right, MMRM sort of initial analysis, we are still there. So just sort of sticking to that. Chuck, I guess when you look at this, and we won't hold it up for another second. But I guess, Chuck, coming back to just the main thing, when you look at that matrix, what is your response or perspective when you think about sort of a program that has that body of work?
Chuck McCullogh
AttendeesYes. So again, I kind of distinguish the 2 issues. One is choice of the primary endpoint, which, again, I find to be relatively minor, and the strength of evidence is there, again, even if I go with co-primary endpoints. And then the key to dealing with missing data is doing lots of reasonable sensitivity analyses. So again, I'm pleased that there's multiple ways in which the sensitivity analyses were approached for missing data.
Douglas Tsao
AnalystsAnd then I want to sort of turn to -- we're sort of getting close to the hour, and I want to cover the integrated analysis, and we've gotten some questions from the audience, as you can imagine. But Marcio, we have some sort of alternative hypotheses, and sort of additional integrated effectiveness, I think hypothesis 3 as well as 4. Maybe just quickly provide some perspectives on how that you think will sort of inform the agency's view? And Chuck, maybe you could provide some perspectives on how you look at these types of analyses.
Marcio Souza
ExecutivesYes. So the -- I'm going to start with the -- I never do this, but I'm going to start with the problem with comparing things like this, right? And normally, what we hear is there was no control on the second study. They're not done concomitantly. They didn't use the same covariates, blah, blah, blah. It's not the case here, right? So these two studies were literally stratified on the same parameters. And from the beginning, we wanted to ask a question: how consistent is the arm on Study 2 to the one on Study 1? Since patients were unaware, right? They couldn't know which study they were in. And of course, we're not looking for exactly the same because, by definition, exactly the same is not something we see, but it's incredibly consistent, incredibly consistent. You saw the integrated analysis. What I can tell you is -- what is super interesting in this study is placebo on Essential1 and placebo on Essential3 are very, very similar, number one, and drug on the first arm of Study 1 for Essential3 and on the run-in period for Study 2 are incredibly consistent. But then we asked a slightly different question. It was actually another brilliant statistician that suggested it, I wish I could take credit for that. It's like, why don't you formally compare placebo in Study 1 to drug in Study 2 since they were unaware, right? And that's hypothesis 4. And all of that is significant. Of course, when everything is significant, when you put them together, they get even more significant. We're talking about a small p-value. That p-value is insanely small, right? But the consistency of the effect, I think it's important, number one, as we're making decisions on putting patients on drug clinically. But secondly, you see again and again in FDA reviews how they talk about consistency of effect being important, but I'd love to hear Chuck's perspective on that.
Chuck McCullogh
AttendeesI think you're right. Any time you look at sort of different ways to address the same question, and you see results that are not similar, that sends up alarm bells. So yes, again, and of course, you're right that if you're finding statistically significant results with each individual comparison, when you combine them, and the results are consistent in estimated size, it's just going to get more statistically significant. I'm much more suspicious when it looks like people are rescuing a bunch of nonstatistically significant results by combining. That's when red flags start going up in my mind.
Douglas Tsao
AnalystsAnd so what you would mean by that is when by sort of expanding or sort of by adding the populations you sort of start to overpower your analysis for arguably a non-clinically meaningful effect?
Chuck McCullogh
AttendeesNo, no. I mean, I did this 1 study, and I got a P-value of 0.06 and it didn't quite meet the threshold and I got a similar size effect in this other part of our overarching program, and it had a p-value of 0.08. So because they're consistent, I'm going to put them together and then magically, I get a p-value of 0.03.
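The arithmetic behind this "rescue by combining" concern can be sketched with Fisher's method for pooling independent p-values. The speaker did not name a method and his numbers are illustrative, so this is only one way to reproduce the effect; the function name `fisher_combined_p` is hypothetical, and the method assumes the studies are independent and point in the same direction.

```python
import math

def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    Under the null, X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom. For even degrees of freedom the upper tail
    has a closed form: exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)
    return math.exp(-half_x) * sum(half_x ** i / math.factorial(i) for i in range(k))

# Two studies that each narrowly miss the 0.05 threshold.
combined = fisher_combined_p([0.06, 0.08])
print(round(combined, 4))
```

With these two inputs the combined value falls below 0.05 even though neither study did on its own, which is the pattern described here as a red flag.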
Marcio Souza
ExecutivesNot at all, right? What happened here is like you have positive studies. Of course, you put them together, just trying to be intellectually honest. And then you get a p-value of 10 to the minus 12. Of course, if everything is positive and you continue to put them together, they're going to get smaller and smaller, like that is just logic in general, but we do get very, very consistent results, which tells me the true benefit is being assessed on individual studies and on the combined study, and that's important. That's the #1 reason why you have to submit an integrated summary of efficacy to the FDA on an NDA, to assess whether or not they are similar, and here we are, in a sense, in a controlled way, getting that effect.
Douglas Tsao
AnalystsAnd Chuck, a question from the audience was sort of coming back to the question of dropout, right, especially due to AEs and when you have an imbalance, which we saw in this study, does it ever reach a point where it makes sort of you lose trust in the sensitivity analysis or tipping point analysis even?
Chuck McCullogh
AttendeesSo I think the -- it's not that I lose faith at a certain missing data rate. The way that I think about it is it puts more emphasis on how appropriate you think the modeling of the missing not at random is. So again, that's why you have to go to pretty high extremes in order to really stress test the system because the results then are very much model dependent, dependent on the model you're hypothesizing for the missing not at random data. So that's why I think it's important to do things like assume that the drug has no benefit or maybe even push it as far as a slight detriment just to make sure that things still hold.
Douglas Tsao
AnalystsSo to your point, that's where things like the jump to reference analysis become more important.
Chuck McCullogh
AttendeesRight, and pushing the tipping point all the way out. So again, if it's at some ridiculous level that it requires to tip it over, then you feel much more confident that even if some of the assumptions in your modeling weren't quite correct, you're still getting very strong and convincing results.
Douglas Tsao
AnalystsChuck, one thing that I meant to ask, so I'm going to ask, and I think that some people sort of looking at this have struggled with, is the fact that the company had a futility analysis back in March. And they've had difficulty sort of wrapping their heads around that we would have a futility analysis in March, yet when it comes time to finally reading out the full study, not only do we have a positive study, but we have one that is overwhelmingly statistically significant. And so maybe just walk through from your perspective, does that raise red flags for you in any way? Or just what are the real possibilities or probabilities that that would actually occur?
Chuck McCullogh
AttendeesYes. So yes, the things that would cause me to sort of scrutinize a study like this more carefully were, as we've already talked about, the change in outcome, relatively high rates of missing data and, sort of more of a curiosity, a futility analysis that suggested that there might be a reason to stop the study. So I don't know the details of how the futility analysis was conducted. But typically, you project ahead and you ask what's the probability of a positive, i.e., statistically significant, result once I've concluded with all the data collection. There are certain assumptions that have to go into that model. And if the data that's collected in the interim analysis is at the halfway point, then you've got another full half of the data to collect. If that is more optimistic than the data that you used to project ahead, of course, that projected probability of futility, of not finding a statistically significant result, can be off. And the proof is in the pudding in my mind. We don't need to go back and say what was the probability back then, given that we have convincing results now. I've seen things like this happen in the past. Things that only happen 5% or 10% of the time happen 5% or 10% of the time. And so yes, it can happen.
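The "project ahead" step described in this answer is often done with a conditional power calculation. The sketch below uses the standard B-value formulation; it is not the company's futility rule, which was not described on the call, and the interim z-score, information fraction, and drift values are hypothetical.

```python
from statistics import NormalDist

def conditional_power(z_interim, info_frac, drift, alpha_two_sided=0.05):
    """Probability that the final z-statistic exceeds the critical value.

    z_interim: z-statistic observed at the interim look.
    info_frac: fraction of total statistical information collected (0 to 1).
    drift: assumed expected final z-statistic for the rest of the study.
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha_two_sided / 2)
    b_value = z_interim * info_frac ** 0.5      # B(t) = Z(t) * sqrt(t)
    remaining = 1 - info_frac
    # The remaining increment is normal with mean drift * (1 - t), variance 1 - t.
    return 1 - nd.cdf((z_crit - b_value - drift * remaining) / remaining ** 0.5)

# Hypothetical interim look at the halfway point with a weak signal.
z_interim, info_frac = 0.5, 0.5

# Assumption 1: the second half looks like the first half (current trend).
cp_trend = conditional_power(z_interim, info_frac, drift=z_interim / info_frac ** 0.5)

# Assumption 2: the second half follows a design effect sized for 90% power.
design_drift = NormalDist().inv_cdf(0.975) + NormalDist().inv_cdf(0.90)
cp_design = conditional_power(z_interim, info_frac, drift=design_drift)
print(cp_trend, cp_design)
```

With these made-up inputs the projection moves from roughly 4% to roughly 50% depending only on what is assumed about the data still to come, which is why a futility signal at an interim look does not rule out a positive final result.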
Douglas Tsao
AnalystsAnd is there any way that the sort of futility analysis could have informed the company's decision to change the endpoint? And does that sort of raise questions for you?
Chuck McCullogh
AttendeesRight. So again, I turn back to the issue that it's a multiple testing issue. So what were the options at the time? So you did a futility analysis. And if I put myself in Marcio's place, it's like, what could I do that's legit that can help to make this study be successful in the end? And changing little things like tiny tweaks on the analysis strategy is probably not going to make much of a difference. But for example, if I thought that the drug was going to have a much more immediate impact early on, that might suggest I should move the time point earlier. Okay. So now if I'm going to critique this, I'm in the multiple testing arena. Okay, what's the possible benefit that might be gained by the company? Again, even if I go with thinking of these things as 2 co-primary endpoints, that's the capitalization on the multiple testing issue that I might have been advantaged by, knowing about this futility analysis. The penalties that I would apply wouldn't account for that sort of a discrepancy.
Douglas Tsao
AnalystsMarcio, in hindsight, what was sort of the futility threshold that you set? And have you, in hindsight, sort of recognized what perhaps was flawed about that assumption?
Marcio Souza
ExecutivesLike we make -- like it's very easy, right? Sometimes we don't like the outcome, then we say the decision was wrong. Actually, decisions are, a priori, kind of always right, and then we judge the outcome. Here it's like -- can you go back and say, if we hadn't changed anything, it would be positive, therefore, we made the wrong call? Now I think the call we made was scientifically sound, just like Chuck just mentioned, asking about what do we know about the drug. And we know it acts pretty fast, like we have very high concentrations here as we expected, right? And a minor change, I completely agree with that, would be a time point assessment. Now when you go back and recalculate, knowing the results, what the probability of being futile was, it's actually not that small. Most people think, "Oh, that is a futility recommendation, you are at a 0.001% probability of being successful." That's not the case, right? It is actually -- you're making an assumption at that point in time. And it happened. I'm glad we didn't stop. I'm glad we were pretty much fully enrolled on the study at that point in time, which allowed us to finish. But I don't think like sitting here and saying what if is actually very helpful. It's like they said, it's positive.
Douglas Tsao
AnalystsAnd I think we're out of time. I mean, one final question, Marcio, I had: did you perform sensitivities, including things like jump to reference, which is arguably the most conservative, for things like the original primary endpoint?
Marcio Souza
ExecutivesYes. And you can -- yes, and you can imagine it's sensitive to stress as well, right? And we're looking at other things, I think, that Chuck mentioned, like we removed all the covariates and reran. We added each one of them and reran, like things where you could say, is there anything that could break this, which for us was important, like, are we being misled by the results? And the answer is no. This is just a very strong result. And to be honest, I'll end with this, Chuck, like I know we all care about like the markets, definitely our clients do, and I appreciate all our investors. But there is nothing for these patients out there. And ultimately, that's why we are developing this. So this is a very strong drug that's going to give a lot of relief to a lot of people, and we're just happy to be in this position to have the discussion with the FDA.
Douglas Tsao
AnalystsOkay. And so with that, Chuck, if you can give me sort of 1 or 2 minutes, just an overview of everything that you learned today, because you sort of gave an assessment that you felt the data set was strong. We've learned some more things from Marcio in the course of the call, which I don't think have necessarily been that dramatic, but they sort of add to the body of knowledge that we have. And so when you're leaving this call, how do you come out feeling about the robustness of this data set?
Chuck McCullogh
AttendeesYes, to both summarize and update slightly, strong analysis strategy. I wouldn't have suggested anything different. So I don't fault them at all on the analysis strategy. There are some technical details we haven't talked about on this call that were chosen that seemed to be pretty conservative and would lead to robust analyses. So I have absolutely no problem with the analysis strategy. The potential minor red flags, of course, are the change in the outcome, the relatively high rate of missing data, a tipping point analysis that wasn't continued all the way to the end and the sort of curiosity about the futility analysis. And I feel pretty reassured about all those. They did a large number of sensitivity analyses, some preplanned and some not preplanned. And I think that's important because, again, you don't want to be depending on one single model for data that's missing not at random, again, as brought up by one of the participants in this call, especially when the rates are high. So it is important to try different mechanisms. And I'm also convinced by the fact that even when you stress test the system, the resulting P-values are relatively small.
Douglas Tsao
AnalystsOkay. Chuck, that was really helpful. And Marcio, thank you very much for taking the time and accommodating us and being willing to sort of come under the gun both from myself as well as Chuck. And so with that, we'll let everybody go back to their day.
Marcio Souza
ExecutivesSounds good. Thanks. Very nice meeting you, Chuck, and thanks, Doug for everything.