© 2002 AlphaMed Press
Interpreting Measures of Treatment Effect in Cancer Clinical Trials
a Department of Public Health Sciences, b Department of Internal Medicine, c Comprehensive Cancer Center of Wake Forest University, Wake Forest University School of Medicine, Winston-Salem, North Carolina, USA
Correspondence: L. Douglas Case, Ph.D., Section on Biostatistics, Department of Public Health Sciences, Wake Forest University School of Medicine, Medical Center Boulevard, Winston-Salem, North Carolina 27157, USA. Telephone: 336-716-1048; Fax: 336-716-5425; e-mail: dcase{at}wfubmc.edu
The efficacy of a new cancer regimen is usually assessed by analyzing outcomes such as tumor response and overall survival. Many publications summarizing results of cancer clinical trials report measures such as odds ratios and hazard ratios, as these are the estimators of treatment effect obtained from regression models used to analyze the data. However, these measures are sometimes misinterpreted, as they are not necessarily familiar to many readers. The most common mistake is to interpret both measures as relative risks, an interpretation that can lead to an incorrect impression of the impact of the treatment on response and survival.
Key Words. Relative risk • Tumor response • Odds ratio • Survival • Hazard ratio
The effectiveness of a new agent or combination therapy in a prospective cancer clinical trial is usually assessed by analyzing outcomes such as tumor response and overall survival. Due to the widespread use of regression techniques for analyzing these outcomes, many publications summarizing results of cancer trials report measures of treatment effect such as odds ratios (ORs) and hazard ratios (HRs). Logistic regression, used to analyze tumor response, and Cox's proportional hazards regression, used to analyze censored survival data, give rise to the OR and HR, respectively, as measures of treatment effect. These are not necessarily familiar concepts to many readers, and they are sometimes misinterpreted. The tendency is to interpret both of these measures as relative risks (RRs), a more natural measure of effect in prospective clinical trials. For example, a doubling of the odds might be interpreted as a doubling in response or a 30% decrease in the hazard might be interpreted as a 30% improvement in 1-year or 5-year survival, both of which are incorrect interpretations. This misunderstanding is due partly to lack of training and partly to confusing definitions in the literature, as discussed by Granados [1].
Several articles have appeared in the past few years on the confusion between the OR and the RR [2-9]. Holcomb et al. [2] recently reviewed the obstetrics and gynecology literature for uses of the OR. In 151 articles, they found that the OR was interpreted as a risk ratio without justification 26% of the time; in addition, it overestimated the RR by more than 20% in 44% of the articles for which an RR could be calculated. McNutt et al. [3] discuss this issue in their study of health outcomes in violence research. Altman et al. [4] and Sinclair and Bracken [5] discuss the extent to which the OR overestimates the RR in prospective studies and clinical trials. Altman et al. [4] cite an example where an RR of 88 was reported, based on the OR, when the actual value should have been 7. Sinclair and Bracken [5] cite a similar example where an RR, based on the OR, was 16.2, when the actual value should have been 8.6. Davies et al. [6], Senn [7], Sackett et al. [8], and Walter [9] add additional insight to the controversy concerning the OR.
Clinicians likely want to know what the OR means in terms of differences or ratios in tumor response, and they might want to know what the HR means in terms of 1-year or 5-year survival. In this paper, we briefly review some of the measures of treatment effect typically reported in cancer clinical trials. Discussion is limited to the case of prospective trials, although these measures have applicability for other study designs as well. The definition of RR is reviewed first, since this is how the other measures are frequently interpreted. We then discuss ORs and HRs and show how they are related to the RR. We point out areas where, in our experience, misinterpretations are common. Examples are given to illustrate appropriate interpretations. An appendix is included to show how the OR and the HR arise from the logistic and proportional hazards regression models, respectively.
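In this section, we consider what is meant by RR, one of the most natural and least confusing of all the measures of treatment effect in prospective clinical trials. Consider the 2 x 2 cross-tabulation shown in Table 1, in which a and b denote the numbers of patients on regimen A with and without the event, and c and d denote the corresponding numbers for regimen B. The probability of an event is then $p_A = a/(a + b)$ for regimen A and $p_B = c/(c + d)$ for regimen B. The risk difference (RD) is $p_A - p_B$, and the RR is $p_A/p_B$.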
An RD of 0 means the event occurred equally in the two regimens, while an RD of 0.5 means the event occurred 50% more often (in absolute terms, e.g., 70% versus 20%) for patients on regimen A compared with those on regimen B. An RR of 1 indicates the probability of an event is the same in both treatment groups, while an RR of 1.5 indicates that the event occurred 50% more often (in relative terms, e.g., 30% versus 20%) for patients treated with A compared with those treated with B. Another measure of effect that is sometimes reported is the reciprocal of the risk difference, which denotes the number of patients that must be treated with the experimental therapy to obtain an additional favorable event or to prevent an additional unfavorable event.
Examples
Consider another study looking at antiemetic therapies to reduce the incidence of nausea and vomiting (N/V). Suppose 20% of the patients on the experimental and 30% of the patients on the standard therapy experience N/V. This can be interpreted as an absolute decrease of 10%, an RR of 0.2/0.3 = 0.67, or a 33% reduction in N/V (100% × (0.3 - 0.2)/0.3 = 33%). Since the risk difference is 0.1, we would need to treat 10 patients with the experimental regimen (1/0.1) to keep one additional patient from experiencing N/V.
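The arithmetic in this example can be checked with a few lines of Python. This is only a sketch: the function name `risk_measures` and its argument names are illustrative choices, not something defined in the article, and the risk difference is computed as standard minus experimental so that a reduction in N/V comes out positive.

```python
def risk_measures(p_experimental, p_standard):
    """Effect measures for an unfavorable event (e.g., N/V) in a two-arm trial."""
    risk_difference = p_standard - p_experimental    # absolute reduction in risk
    relative_risk = p_experimental / p_standard      # RR, experimental vs. standard
    relative_reduction = 1.0 - relative_risk         # proportional reduction in risk
    number_needed_to_treat = 1.0 / risk_difference   # reciprocal of the risk difference
    return risk_difference, relative_risk, relative_reduction, number_needed_to_treat


# Probabilities taken from the antiemetic example in the text.
rd, rr, rrr, nnt = risk_measures(p_experimental=0.2, p_standard=0.3)
print(f"RD = {rd:.2f}, RR = {rr:.2f}, reduction = {rrr:.0%}, NNT = {nnt:.0f}")
```

The same function can be applied to any pair of event probabilities, provided the two probabilities differ (otherwise the reciprocal of the risk difference is undefined).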
Cautions
The odds of an event, for those who do not gamble, is not necessarily an intuitive concept. If the probability of an event is p, the odds of that event are given by $p/(1 - p)$. For example, if the probability of an event is 0.75, then the odds of that event are 0.75/0.25 = 3, or 3 to 1. That is, the probability of the event occurring is three times the probability that it will not occur. Referring back to Table 1, the odds of an event in group A, $O_A$, is $a/b$.
Likewise, the odds of an event in group B, $O_B$, is $c/d$. The ratio of these two odds, the OR, is given by

$$OR = \frac{O_A}{O_B} = \frac{a/b}{c/d} = \frac{ad}{bc}.$$
Note that this is the odds of having an event for group A patients relative to that for group B patients. A nice feature of the OR, which is not shared by the RR, is that the reciprocal of the OR for having an event is the OR for not having the event for patients on regimen A relative to those on regimen B. That is, the measure of association is the same whether the outcome is an event or the lack of an event. This symmetry led Senn [7] to suggest that the OR should be the gold standard measure of association. Sinclair and Bracken [5] note the lack of symmetry of the RR, but discount this as a reason for choosing the OR over the RR.
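The symmetry property can be illustrated numerically. The sketch below assumes the cell layout implied by the text (a and b are events and non-events on regimen A, c and d are events and non-events on regimen B); the counts themselves are hypothetical and chosen only for illustration.

```python
def odds_ratio(a, b, c, d):
    """OR for regimen A relative to regimen B: (a/b) / (c/d)."""
    return (a / b) / (c / d)


def relative_risk(a, b, c, d):
    """RR for regimen A relative to regimen B: [a/(a+b)] / [c/(c+d)]."""
    return (a / (a + b)) / (c / (c + d))


# Hypothetical counts: 60 events and 40 non-events on A, 30 and 70 on B.
a, b, c, d = 60, 40, 30, 70

# Swapping the roles of "event" and "no event" swaps the columns.
or_event = odds_ratio(a, b, c, d)
or_no_event = odds_ratio(b, a, d, c)
rr_event = relative_risk(a, b, c, d)
rr_no_event = relative_risk(b, a, d, c)

print(f"OR(event) x OR(no event) = {or_event * or_no_event:.4f}")
print(f"RR(event) x RR(no event) = {rr_event * rr_no_event:.4f}")
```

The product of the two ORs equals 1, so each is the reciprocal of the other, whereas the product of the two RRs is not 1 for these counts.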
The OR is frequently reported in prospective cancer clinical trials, since it is the measure of effect that arises naturally from logistic regression. The popularity is certainly not due to its interpretability, as it has been labeled "incomprehensible" by some [10], a difficulty compounded by erroneous and misleading definitions in medical and statistical dictionaries [1]. The tendency is to interpret an OR as an RR, a practice that is reasonable if the probability of an event is small. However, in many clinical trials, the probability of an event (e.g., tumor response) is not small, so interpreting the OR as an RR can be misleading. Table 2
As mentioned above, the OR owes much of its popularity in prospective clinical trials to the use of logistic regression. Although, in a randomized clinical trial, one expects patient and disease characteristics to be fairly well balanced between the two regimens, treatment differences in response are usually assessed after adjusting for the chance imbalance in covariates that may have occurred between the two regimens. The principal analytic method used to assess the effect of treatment on response after adjustment for covariates is logistic regression [11-13]. A brief description of logistic regression can be found in the appendix. There, it can be seen that the exponential of the treatment coefficient can be interpreted as an estimate of the odds of response for patients treated with regimen A relative to that for patients treated with regimen B (i.e., the OR). So, for example, if the estimate of β associated with treatment is 0.693, then the odds of response for the experimental patients are twice [exp(0.693) = 2.0] that of the patients on standard therapy. This does not mean that patients on the experimental regimen are twice as likely to respond (that is the interpretation of an RR).
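To see how far an OR of 2 can be from an RR of 2, the OR can be converted to an RR for a given response probability on the standard regimen. The sketch below uses only the definitions of odds and RR; the function name `rr_from_or` and the list of baseline probabilities are illustrative assumptions, not values from the article.

```python
def rr_from_or(odds_ratio, p_standard):
    """RR implied by an OR when the response probability on standard therapy is known."""
    odds_standard = p_standard / (1.0 - p_standard)
    odds_experimental = odds_ratio * odds_standard
    # Convert the experimental-arm odds back to a probability.
    p_experimental = odds_experimental / (1.0 + odds_experimental)
    return p_experimental / p_standard


# Hypothetical baseline response probabilities, all with the same OR of 2.
for p_b in (0.01, 0.10, 0.30, 0.50, 0.70):
    print(f"p_B = {p_b:.2f}: OR = 2.00, implied RR = {rr_from_or(2.0, p_b):.2f}")
```

When the baseline probability is small, the implied RR is close to the OR; as the baseline probability grows, the RR falls well below the OR, which is why reading the OR as an RR overstates the treatment effect.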
Examples
In the second example, the probability of nausea and vomiting in the groups receiving standard and experimental therapy was 0.3 and 0.2, respectively. Thus, the odds of nausea and vomiting are 0.3/0.7 = 0.4286 and 0.2/0.8 = 0.25, respectively, and the OR = 0.25/0.4286 = 0.5833, fairly close to the RR of 0.6667.
Cautions
The most important and most objective end point in many cancer clinical trials is survival time, typically measured as the time from randomization until death or the last date of contact. Methods used to analyze time to death can be used to analyze other time to event end points (generically termed survival data), such as the time to disease recurrence or progression (i.e., disease-free survival and time to progression). Most clinical researchers are familiar with measures such as median survival and 1- or 5-year survival estimates, which are typically obtained using methods described by Kaplan and Meier [16]. A simple analysis of survival, and the one usually done first, is to calculate and present graphically the Kaplan-Meier estimates of survival for each treatment regimen. In addition, estimates of the median survival (with confidence intervals) or 1-, 3-, or 5-year survival probabilities (with standard errors) can be presented. A log-rank test [17] might then be used to assess the significance of any differences observed in these survival curves.
As is the case for treatment response mentioned above, survival differences are usually assessed after adjustment for covariates. A large proportion, if not the majority, of these analyses is done using Cox's proportional hazards model [18]. A good basic introduction to this methodology is given by Kleinbaum [19], and more advanced discussions can be found in Kalbfleisch and Prentice [20] and Lawless [21]. A brief review of Cox's proportional hazards model is given in the appendix. In this model, the ratio of the hazard functions (i.e., the HR of regimen A to regimen B) is given by exp[β], where β is the estimate of treatment effect derived from the model. If, for example, the estimate of treatment effect in a proportional hazards model is -0.5, this means the HR of regimen A to regimen B patients is exp[-0.5] = 0.6065.
That is, the hazard at any time for patients on the experimental therapy is 61% that of patients on the standard therapy. How does this translate into survival at time t (which is denoted by S(t))? It is shown in the Appendix that the relationship between the two is given by $S_A(t) = [S_B(t)]^{HR}$.
Thus, an experimental treatment that decreases the hazard by 35% relative to the standard therapy (i.e., an HR of 0.65) increases survival at time t to $[S_B(t)]^{0.65}$. Table 3
A few observations are immediately obvious. First, despite the fact that the Cox model looks complicated, it is relatively easy to interpret the model coefficients (i.e., exp(β) is an HR). Second, a decrease in the hazard results in an increase in survival. Third, it is easy to calculate the expected effect that a reduction in the hazard will have on survival (i.e., predicted survival in group A will equal the survival in group B raised to the HR power).
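The third observation can be sketched in a few lines of Python: survival on regimen A is survival on regimen B raised to the power of the HR. The function name `survival_under_hr` and the list of baseline survival probabilities are illustrative assumptions; the HR of 0.65 is the value used in the text.

```python
def survival_under_hr(survival_standard, hazard_ratio):
    """Predicted survival on regimen A, given survival on regimen B and the HR (A vs. B)."""
    return survival_standard ** hazard_ratio


hazard_ratio = 0.65  # a 35% reduction in the hazard, as in the text

# Hypothetical survival probabilities on the standard regimen at some time t.
for s_b in (0.10, 0.30, 0.50, 0.70, 0.90):
    s_a = survival_under_hr(s_b, hazard_ratio)
    print(f"S_B = {s_b:.2f} -> S_A = {s_a:.3f} "
          f"(absolute gain {s_a - s_b:.3f}, relative gain {s_a / s_b - 1:.1%})")
```

The absolute and relative gains in survival vary with the baseline survival probability, so a 35% reduction in the hazard does not correspond to a single fixed improvement in 1-year or 5-year survival.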
Examples
Cautions
Logistic regression and Cox's proportional hazards regression are valuable tools for analyzing response and survival data in prospective cancer clinical trials. These methods allow for the assessment of treatment outcomes after adjustment for multiple continuous and categorical covariates. Despite the seeming mathematical complexity of the models, the measures of treatment effect that result from their use are relatively easy to understand. However, they are not the measures of effect with which medical researchers are most familiar, so they are sometimes misinterpreted. The most common mistake is to interpret ORs and HRs as RRs. Unfortunately, some of the medical, epidemiologic, and statistical literature adds to the confusion. Granados [1] reported that the definition of odds was not given in several medical and mathematical dictionaries, that another dictionary incorrectly defined the OR as an RR, and that three others defined the OR using the definition of odds, rather than the ratio of two odds. Given such inconsistency, it should not be surprising that measures of treatment effect are misinterpreted. Some of the misinterpretations (such as interpreting the OR as an RR) can lead to an overestimation of the clinical effect. It is important, given the ubiquity of statistical models for analyzing response and survival in cancer clinical trials, that measures such as ORs and HRs be clearly understood, lest we misstate the benefit of clinical treatments. This is particularly relevant for physicians, who must translate the statistical findings of cancer clinical trials into terms their patients can understand.
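Logistic Regression
The logistic model can be written as follows:

$$\ln\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 \qquad \text{(Equation 1)}$$

where p is the probability of response, and X1, X2, and X3 might denote treatment (X1 = 1 for group A and X1 = 0 for group B), age, and gender, respectively.
So, for group A (X1 = 1) the odds of response can be written as

$$\frac{p_A}{1 - p_A} = \exp(\beta_0 + \beta_1 + \beta_2 X_2 + \beta_3 X_3)$$

and for group B (X1 = 0) as

$$\frac{p_B}{1 - p_B} = \exp(\beta_0 + \beta_2 X_2 + \beta_3 X_3).$$

The ratio of these two odds is given by

$$OR = \frac{p_A/(1 - p_A)}{p_B/(1 - p_B)} = \exp(\beta_1).$$

Thus, the treatment effect estimated with a logistic regression analysis can be interpreted rather simply. That is, the exponential of β1 is the estimate of the odds of response for patients treated with regimen A relative to that for patients treated with regimen B. So, for example, if the estimate of β associated with treatment is 0.693, then the odds of response for the experimental patients are twice [exp(0.693) = 2.0] that of the patients on standard therapy. This does not mean that patients on the experimental regimen are twice as likely to respond (that is the interpretation of an RR). The effect of other covariates can be interpreted similarly. The exponential of their regression coefficients represents the increase in the odds of response for a one unit change in the covariate.
Note that Equation 1 can be rewritten as

$$p = \frac{\exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3)}{1 + \exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3)}$$

so the effect of changes in the covariates on the probability of response (p) can be easily calculated.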
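As a sketch of that calculation, the code below assumes the usual logistic form, in which the log odds of response are a linear function of the covariates, and converts the linear predictor to a probability. All coefficient values and covariate values are hypothetical, chosen only so that the treatment coefficient matches the 0.693 used in the text; the function names are illustrative.

```python
import math


def response_probability(beta, covariates):
    """Probability of response from a logistic model.

    beta = (intercept, b1, b2, b3); covariates = (x1, x2, x3).
    """
    linear_predictor = beta[0] + sum(b * x for b, x in zip(beta[1:], covariates))
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)


# Hypothetical coefficients: intercept, treatment, age (per year), gender.
beta = (-1.0, 0.693, 0.01, 0.2)

# Same hypothetical patient (age 60, gender coded 1) on regimen A vs. regimen B.
p_a = response_probability(beta, (1, 60, 1))
p_b = response_probability(beta, (0, 60, 1))

odds_ratio = odds(p_a) / odds(p_b)   # equals exp(0.693) whatever the other covariates are
risk_ratio = p_a / p_b               # depends on the other covariates

print(f"p_A = {p_a:.3f}, p_B = {p_b:.3f}")
print(f"OR = {odds_ratio:.2f}, RR = {risk_ratio:.2f}")
```

For this hypothetical patient the OR is about 2 while the RR is noticeably smaller, which is the distinction the text draws between doubling the odds and being twice as likely to respond.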
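Cox's Proportional Hazards Regression
The proportional hazards model can be written as

$$h(t) = h_0(t) \exp(\beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3) \qquad \text{(Equation 3)}$$

where h(t) is the hazard function at time t, that is, the probability that a patient dies between time t and time t + Δt, given that he has survived up to time t, divided by Δt, as Δt approaches zero. In this model, h0(t) denotes the baseline hazard at time t (the hazard of a patient with all covariates = 0), and X1, X2, and X3 might denote treatment, age, and gender, respectively, as defined above.
So, for regimen A patients (i.e., X1 = 1), the hazard function is given by

$$h_A(t) = h_0(t) \exp(\beta_1 + \beta_2 X_2 + \beta_3 X_3)$$

and for regimen B patients (i.e., X1 = 0) by

$$h_B(t) = h_0(t) \exp(\beta_2 X_2 + \beta_3 X_3).$$

The ratio of these hazard functions (i.e., the HR of regimen A to regimen B) is given by

$$HR = \frac{h_A(t)}{h_B(t)} = \exp(\beta_1).$$

If, for example, the estimate of treatment effect in a proportional hazards model is -0.5, this means the HR of regimen A to regimen B patients is exp[-0.5] = 0.6065. That is, the hazard at any time for patients on the experimental therapy is 61% that of patients on the standard therapy. How does this translate into survival at time t (which is denoted by S(t))? S(t) and h(t) go hand in hand; that is, one can be derived from the other. The relationship between the two is given by

$$S(t) = \exp\left(-\int_0^t h(u)\,du\right).$$

Thus, the survival functions for patients on the two regimens are given by

$$S_B(t) = \exp\left(-\int_0^t h_B(u)\,du\right)$$

and

$$S_A(t) = \exp\left(-\int_0^t h_A(u)\,du\right) = \exp\left(-HR \int_0^t h_B(u)\,du\right) = [S_B(t)]^{HR}.$$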
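The relationship between the two survival functions under proportional hazards can be checked numerically. The sketch below is only an illustration: it assumes a Weibull-shaped hazard for regimen B (the shape and scale values are hypothetical), integrates the hazard with a simple trapezoidal rule, and compares survival on regimen A with survival on regimen B raised to the HR power.

```python
import math


def hazard_b(t, shape=1.5, scale=2.0):
    """Hypothetical Weibull hazard for regimen B (illustrative parameters)."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)


def survival(hazard, t, steps=10000):
    """S(t) = exp(-integral of the hazard from 0 to t), via the trapezoidal rule."""
    width = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    for i in range(1, steps):
        total += hazard(i * width)
    return math.exp(-total * width)


hr = math.exp(-0.5)  # treatment coefficient of -0.5, as in the text


def hazard_a(t):
    """Hazard for regimen A: proportional to the regimen B hazard at every time."""
    return hr * hazard_b(t)


t = 3.0
s_b = survival(hazard_b, t)
s_a = survival(hazard_a, t)

print(f"HR = {hr:.4f}")
print(f"S_B({t}) = {s_b:.4f}, S_A({t}) = {s_a:.4f}, S_B^HR = {s_b ** hr:.4f}")
```

The agreement does not depend on the Weibull choice; any baseline hazard works, because the HR factors out of the integral.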
Supported in part by grants P30-CA-12197 and U10-CA-81851 from the Public Health Service, National Institutes of Health.