The Oncologist, Vol. 7, No. 3, 181-187,
June 2002
© 2002 AlphaMed Press
Interpreting Measures of Treatment Effect in Cancer Clinical Trials
L. Douglas Case (a, c),
Gretchen Kimmick (b, c),
Electra D. Paskett (a, c),
Kurt Lohman (a),
Robert Tucker (b, c)
(a) Department of Public Health Sciences,
(b) Department of Internal Medicine,
(c) Comprehensive Cancer Center of Wake Forest University, Wake Forest University School of Medicine, Winston-Salem, North Carolina, USA
Correspondence:
L. Douglas Case, Ph.D., Section on Biostatistics, Department of Public Health Sciences, Wake Forest University School of Medicine, Medical Center Boulevard, Winston-Salem, North Carolina 27157, USA. Telephone: 336-716-1048; Fax: 336-716-5425; e-mail: dcase{at}wfubmc.edu
ABSTRACT
The efficacy of a new cancer regimen is usually assessed by analyzing outcomes such as tumor response and overall survival. Many publications summarizing results of cancer clinical trials report measures such as odds ratios and hazard ratios, as these are the estimators of treatment effect obtained from regression models used to analyze the data. However, these measures are sometimes misinterpreted, as they are not necessarily familiar to many readers. The most common mistake is to interpret both measures as relative risks, an interpretation that can lead to an incorrect impression of the impact of the treatment on response and survival.
Key Words. Relative risk • Tumor response • Odds ratio • Survival • Hazard ratio
INTRODUCTION
The effectiveness of a new agent or combination therapy in a prospective cancer clinical trial is usually assessed by analyzing outcomes such as tumor response and overall survival. Due to the widespread use of regression techniques for analyzing these outcomes, many publications summarizing results of cancer trials report measures of treatment effect such as odds ratios (ORs) and hazard ratios (HRs). Logistic regression, used to analyze tumor response, and Cox's proportional hazards regression, used to analyze censored survival data, give rise to the OR and HR, respectively, as measures of treatment effect. These are not necessarily familiar concepts to many readers, and they are sometimes misinterpreted. The tendency is to interpret both of these measures as relative risks (RRs), a more natural measure of effect in prospective clinical trials. For example, a doubling of the odds might be interpreted as a doubling in response or a 30% decrease in the hazard might be interpreted as a 30% improvement in 1-year or 5-year survival, both of which are incorrect interpretations. This misunderstanding is due partly to lack of training and partly to confusing definitions in the literature, as discussed by Granados [1].
Several articles have appeared in the past few years on the confusion between the OR and the RR [2-9]. Holcomb et al. [2] recently reviewed the obstetrics and gynecology literature for uses of the OR. In 151 articles, they found that the OR was interpreted as a risk ratio without justification 26% of the time; in addition, it overestimated the RR by more than 20% in 44% of the articles for which an RR could be calculated. McNutt et al. [3] discuss this issue in their study of health outcomes in violence research. Altman et al. [4] and Sinclair and Bracken [5] discuss the extent to which the OR overestimates the RR in prospective studies and clinical trials. Altman et al. [4] cite an example where an RR of 88 was reported, based on the OR, when the actual value should have been 7. Sinclair and Bracken [5] cite a similar example where an RR, based on the OR, was 16.2, when the actual value should have been 8.6. Davies et al. [6], Senn [7], Sackett et al. [8], and Walter [9] add additional insight to the controversy concerning the OR.
Clinicians likely want to know what the OR means in terms of differences or ratios in tumor response, and they might want to know what the HR means in terms of 1-year or 5-year survival. In this paper, we briefly review some of the measures of treatment effect typically reported in cancer clinical trials. Discussion is limited to the case of prospective trials, although these measures have applicability for other study designs as well. The definition of RR is reviewed first, since this is how the other measures are frequently interpreted. We then discuss ORs and HRs and show how they are related to the RR. We point out areas where, in our experience, misinterpretations are common. Examples are given to illustrate appropriate interpretations. An appendix is included to show how the OR and the HR arise from the logistic and proportional hazards regression models, respectively.
RELATIVE RISK
In this section, we consider what is meant by RR, one of the most natural and least confusing of all the measures of treatment effect in prospective clinical trials. Consider the 2 x 2 cross-tabulation shown in Table 1. In this table, a is the number of patients treated with regimen A who have an event (this could be something positive [e.g., response or pain relief] or something negative [e.g., toxicity or death within a year]), b is the number of patients treated with regimen A who do not have an event, and $p_A = a/(a + b)$ is the probability (or risk) of getting the event for regimen A patients; c, d, and $p_B = c/(c + d)$ are interpreted likewise for regimen B patients. The difference between the two risks is called the risk difference and is denoted by $RD = p_A - p_B$, while their ratio is called the risk ratio or the RR and is given by $RR = p_A/p_B$.
An RD of 0 means the event occurred equally in the two regimens, while an RD of 0.5 means the event occurred 50% more often (in absolute terms, e.g., 70% versus 20%) for patients on regimen A compared with those on regimen B. An RR of 1 indicates the probability of an event is the same in both treatment groups, while an RR of 1.5 indicates that the event occurred 50% more often (in relative terms, e.g., 30% versus 20%) for patients treated with A compared with those treated with B. Another measure of effect that is sometimes reported is the reciprocal of the risk difference, which denotes the number of patients that must be treated with the experimental therapy to obtain an additional favorable event or to prevent an additional unfavorable event.
Examples
Consider a clinical trial in which 100 patients receive an experimental treatment and 100 receive the standard therapy. Each patient is followed for 1 year; 30 patients respond to standard therapy, and 70 patients respond to the experimental therapy. The probability of response in the first year is 30% for patients receiving the standard therapy and 70% for those receiving the experimental therapy, an absolute increase of 40%. The RR is 0.7/0.3 = 2.33, or a 133% increase in the probability of response in the experimental arm (100% × (0.7 - 0.3)/0.3 = 133%).
Consider another study looking at antiemetic therapies to reduce the incidence of nausea and vomiting (N/V). Suppose 20% of the patients on the experimental and 30% of the patients on the standard therapy experience N/V. This can be interpreted as an absolute decrease of 10%, an RR of 0.2/0.3 = 0.67, or a 33% reduction in N/V (100% × (0.3 - 0.2)/0.3 = 33%). Since the risk difference is 0.1, we would need to treat 10 patients with the experimental regimen (1/0.1) to keep one additional patient from experiencing N/V.
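The arithmetic in these two examples can be reproduced with a short Python sketch. The function name `risk_measures` and the sample size of 100 patients per arm in the N/V example are assumptions made for illustration; the text gives only percentages for that example.

```python
def risk_measures(events_a, n_a, events_b, n_b):
    """Summarize a 2 x 2 table: regimen A (experimental) vs. regimen B (standard)."""
    p_a = events_a / n_a          # risk of the event on regimen A
    p_b = events_b / n_b          # risk of the event on regimen B
    rd = p_a - p_b                # risk difference
    rr = p_a / p_b                # relative risk (risk ratio)
    # Number needed to treat: reciprocal of the absolute risk difference.
    nnt = float("inf") if rd == 0 else 1 / abs(rd)
    return {"p_a": p_a, "p_b": p_b, "rd": rd, "rr": rr, "nnt": nnt}


# Tumor response: 70/100 on experimental vs. 30/100 on standard therapy.
response = risk_measures(70, 100, 30, 100)
# N/V: 20% vs. 30%; 100 patients per arm is an assumed sample size.
nausea = risk_measures(20, 100, 30, 100)

for label, m in (("response", response), ("N/V", nausea)):
    print(f"{label}: RD = {m['rd']:+.2f}, RR = {m['rr']:.2f}, NNT = {m['nnt']:.1f}")
```

Working from counts rather than percentages keeps the 2 x 2 layout of Table 1 explicit, which makes the later comparison with the OR easier to follow.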
Cautions
As mentioned earlier, there is generally little confusion in the interpretation of an RR. The confusion that sometimes arises is when a reduction in risk is interpreted as an equivalent improvement in the event-free proportion of patients. This is only true if the probability of the event is 50%. For example, suppose 20% of patients have an immediate toxic reaction to standard therapy. A new experimental drug reduces this risk by 50% so that 10% of patients have the toxic reaction. The percentage of patients reaction free increases from 80% to 90%, a 12.5% relative increase. Had those numbers been 80% and 40%, the reaction-free percentage would have increased from 20% to 60%, a 200% increase. Had the numbers been 50% and 25%, the reaction-free percentage would have increased from 50% to 75%, a 50% increase. Additionally, since RR is a relative measure, an increase in response from 5% to 10% gives the same RR as an increase from 50% to 100%. But clearly the two results are quite different, and it is important also to provide information on the proportion of patients responding in each arm.
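The three scenarios in this caution can be checked numerically. The sketch below uses a hypothetical helper, `relative_change_event_free`, to show that the same 50% relative reduction in risk produces very different relative gains in the event-free proportion.

```python
def relative_change_event_free(p_event_old, p_event_new):
    """Relative change in the event-free proportion when the event risk changes."""
    free_old = 1 - p_event_old
    free_new = 1 - p_event_new
    return (free_new - free_old) / free_old


# Each pair halves the risk of the toxic reaction (a 50% relative reduction).
scenarios = [(0.20, 0.10), (0.80, 0.40), (0.50, 0.25)]
for old, new in scenarios:
    gain = 100 * relative_change_event_free(old, new)
    print(f"risk {old:.0%} -> {new:.0%}: reaction-free proportion rises by {gain:.1f}%")
```

Only the last scenario, with a baseline risk of 50%, gives a relative gain equal to the relative risk reduction.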
ODDS RATIOS AND LOGISTIC REGRESSION
The odds of an event, for those who do not gamble, is not necessarily an intuitive concept. If the probability of an event is p, the odds of that event are given by $p/(1 - p)$. For example, if the probability of an event is 0.75, then the odds of that event are 0.75/0.25 = 3, or 3 to 1. That is, the probability of the event occurring is three times the probability that the event will not occur. Referring back to Table 1, let $O_A$ be the odds of an event in group A, then
$$O_A = \frac{p_A}{1 - p_A} = \frac{a/(a + b)}{b/(a + b)} = \frac{a}{b}.$$
Likewise, the odds of an event in group B, $O_B$, is c/d. The ratio of these two odds, the OR, is given by
$$OR = \frac{O_A}{O_B} = \frac{a/b}{c/d} = \frac{ad}{bc}.$$
Note that this is the odds of having an event for group A patients relative to that for group B patients. A nice feature of the OR, which is not shared by the RR, is that the reciprocal of the OR for having an event is the OR for not having the event for patients on regimen A relative to those on regimen B. That is, the measure of association is the same whether the outcome is an event or the lack of an event. This symmetry led Senn [7] to suggest that the OR should be the gold standard measure of association. Sinclair and Bracken [5] note the lack of symmetry of the RR, but discount this as a reason for choosing the OR over the RR.
The OR is frequently reported in prospective cancer clinical trials, since it is the measure of effect that arises naturally from logistic regression. The popularity is certainly not due to its interpretability, as it has been labeled "incomprehensible" by some [10], a difficulty compounded by erroneous and misleading definitions in medical and statistical dictionaries [1]. The tendency is to interpret an OR as an RR, a practice that is reasonable if the probability of an event is small. However, in many clinical trials, the probability of an event (e.g., tumor response) is not small, so interpreting the OR as an RR can be misleading. Table 2 illustrates the relationship between ORs and RRs. A few observations can be made from this table. First, the OR and RR agree only when there is no association between response and treatment (i.e., RR = OR = 1). Second, when there is association, the OR is always further from 1.0. Thus, if the OR is being used to estimate the RR (or is interpreted as such), it is likely to be an overestimate. Third, the two estimates are closer for smaller probabilities and for RRs closer to 1.0. For example, if the response to standard therapy is only 10%, RRs of 2.0 and 3.0 correspond with ORs of 2.25 and 3.86. However, if the response to standard therapy is 30%, RRs of 2.0 and 3.0 correspond with ORs of 3.5 and 21.0.
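The relationship between the two measures can be explored with a few lines of Python. This is a minimal sketch; the helper names `odds` and `or_from_rr` are introduced here for illustration, and the grid of baseline responses is chosen to match the values quoted in the text rather than the full contents of Table 2.

```python
def odds(p):
    """Odds corresponding to a probability p (0 < p < 1)."""
    return p / (1 - p)


def or_from_rr(rr, p_standard):
    """OR implied by a given RR and a given response on standard therapy."""
    p_experimental = rr * p_standard
    if not 0 < p_experimental < 1:
        raise ValueError("RR and baseline response imply a probability outside (0, 1)")
    return odds(p_experimental) / odds(p_standard)


for p_standard in (0.10, 0.30):
    for rr in (1.0, 2.0, 3.0):
        print(f"standard response {p_standard:.0%}, RR = {rr:.1f}: "
              f"OR = {or_from_rr(rr, p_standard):.2f}")
```

The gap between the OR and the RR widens both as the baseline response rises and as the RR moves away from 1.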
As mentioned above, the OR owes much of its popularity in prospective clinical trials to the use of logistic regression. Although, in a randomized clinical trial, one expects patient and disease characteristics to be fairly well balanced between the two regimens, treatment differences in response are usually assessed after adjusting for the chance imbalance in covariates that may have occurred between the two regimens. The principal analytic method used to assess the effect of treatment on response after adjustment for covariates is logistic regression [11-13]. A brief description of logistic regression can be found in the appendix. There, it can be seen that the exponential of the treatment coefficient can be interpreted as an estimate of the odds of response for patients treated with regimen A relative to that for patients treated with regimen B (i.e., the OR). So, for example, if the estimate of β associated with treatment is 0.693, then the odds of response for the experimental patients are twice [exp(0.693) = 2.0] that of the patients on standard therapy. This does not mean that patients on the experimental regimen are twice as likely to respond (that is the interpretation of an RR).
Examples
Consider the example mentioned above in which 70/100 patients responded to experimental therapy compared with 30/100 patients receiving standard therapy. The odds of response for patients receiving experimental therapy are 0.7/0.3 = 2.33. The odds of response for patients receiving standard therapy are 0.3/0.7 = 0.43. Thus, the OR equals 2.33/0.43 = 5.44. (Compare this to the RR of 2.33.) Were logistic regression to be used in this example with treatment as the only covariate, the estimate of the β associated with treatment would be ln(5.44) = 1.695.
In the second example, the probability of nausea and vomiting in the groups receiving standard and experimental therapy was 0.3 and 0.2, respectively. Thus, the odds of nausea and vomiting are 0.3/0.7 = 0.4286 and 0.2/0.8 = 0.25, respectively, and the OR = 0.25/0.4286 = 0.5833, fairly close to the RR of 0.6667.
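Both ORs can be computed directly from the cells of a 2 x 2 table as ad/bc. The sketch below assumes 100 patients per arm for the N/V example (the text gives only proportions) and also checks the symmetry property noted earlier: the OR for not having the event is the reciprocal of the OR for having it.

```python
import math


def odds_ratio(a, b, c, d):
    """OR for group A vs. group B, where a and c are the counts with the event
    and b and d are the counts without it."""
    return (a * d) / (b * c)


# Response: 70 of 100 on experimental (A) vs. 30 of 100 on standard (B).
or_response = odds_ratio(70, 30, 30, 70)
beta_treatment = math.log(or_response)   # logistic coefficient with treatment as the only covariate

# N/V: 20% vs. 30%, with an assumed 100 patients per arm.
or_nausea = odds_ratio(20, 80, 30, 70)

# Symmetry: swapping "event" and "no event" inverts the OR.
or_no_response = odds_ratio(30, 70, 70, 30)

print(f"OR (response) = {or_response:.2f}, ln(OR) = {beta_treatment:.3f}")
print(f"OR (N/V) = {or_nausea:.4f}")
print(f"OR (no response) = {or_no_response:.4f}, 1/OR (response) = {1 / or_response:.4f}")
```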
Cautions
ORs are not RRs, and in prospective clinical trials, one should be cautious in interpreting them in that fashion. This is because response in prospective trials is frequently too high for the OR to be a good estimate of the RR. Care should be taken when reporting the results of a logistic regression analysis. For example, suppose the estimated OR is 2. One should not state that the event is twice as likely to occur for patients receiving the new therapy, as that would refer to an RR. One could state that the odds of response were twice as great for patients receiving the new therapy. This is correct, but assumes that readers will interpret the OR correctly. Alternatively, one could translate the estimated OR into an estimated RR [14] or provide estimated responses in the two treatment groups [15].
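One way to translate an OR into an RR is the conversion associated with Zhang and Yu [14], RR = OR / [(1 - p0) + p0 × OR], where p0 is the probability of the event in the reference (standard therapy) group. The Python sketch below applies it to the two examples above; the function name `or_to_rr` is an assumption of this sketch. The conversion is exact for an unadjusted OR from a 2 x 2 table and should be treated as an approximation when the OR comes from a covariate-adjusted model.

```python
def or_to_rr(odds_ratio, p_reference):
    """Convert an OR to an RR given the event probability in the reference group."""
    return odds_ratio / ((1 - p_reference) + p_reference * odds_ratio)


# Response example: OR = 5.44 with a 30% response on standard therapy.
rr_response = or_to_rr(49 / 9, 0.30)

# N/V example: OR = 0.5833 with a 30% N/V rate on standard therapy.
rr_nausea = or_to_rr(0.25 / (0.3 / 0.7), 0.30)

print(f"RR (response) = {rr_response:.2f}")
print(f"RR (N/V) = {rr_nausea:.4f}")
```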
SURVIVAL ANALYSIS AND HAZARD RATIOS
The most important and most objective end point in many cancer clinical trials is survival time, typically measured as the time from randomization until death or the last date of contact. Methods used to analyze time to death can be used to analyze other time to event end points (generically termed survival data), such as the time to disease recurrence or progression (i.e., disease-free survival and time to progression). Most clinical researchers are familiar with measures such as median survival and 1- or 5-year survival estimates, which are typically obtained using methods described by Kaplan and Meier [16]. A simple analysis of survival, and the one usually done first, is to calculate and present graphically the Kaplan-Meier estimates of survival for each treatment regimen. In addition, estimates of the median survival (with confidence intervals) or 1-, 3-, or 5-year survival probabilities (with standard errors) can be presented. A log-rank test [17] might then be used to assess the significance of any differences observed in these survival curves.
As is the case for treatment response mentioned above, most survival differences are usually assessed after adjustment for covariates. A large proportion, if not the majority, of these analyses is done using Cox's proportional hazards model [18]. A good basic introduction to this methodology is given by Kleinbaum [19], and more advanced discussions can be found in Kalbfleisch and Prentice [20] and Lawless [21]. A brief review of Cox's proportional hazards model is given in the appendix. In this model, the ratio of the hazard functions (i.e., the HR of regimen A to regimen B) is given by exp[β], where β is the estimate of treatment effect derived from the model. If, for example, the estimate of treatment effect in a proportional hazards model is -0.5, this means the HR of regimen A to regimen B patients is exp[-0.5] = 0.6065. That is, the hazard at any time for patients on the experimental therapy is 61% that of patients on the standard therapy. How does this translate into survival at time t (which is denoted by S(t))? It is shown in the Appendix that the relationship between the two is given by $S_A(t) = [S_B(t)]^{HR}$.
Thus, an experimental treatment that decreases the hazard by 35% relative to the standard therapy (i.e., an HR of 0.65) increases survival at time t to $[S_B(t)]^{0.65}$. Table 3 shows survival estimates (and percent relative improvements in survival) in a group of patients receiving experimental therapy for various treatment effects (i.e., reductions in the hazard) and various survival estimates for patients receiving the standard therapy. Note that a reduction in the hazard of x% is equivalent to an HR of 1 - x/100 (e.g., a reduction in the hazard of 30% results in an HR of 0.7). Also, note that the survival estimates listed in the table apply to any time, as long as the times are the same for the patients on both regimens. That is, if the value is a 2-year value for the patients on the standard regimen, the corresponding value for the patients on the experimental regimen is also 2 years.
Table 3. Survival estimates (relative increase in estimate) for patients receiving experimental therapy for various reductions in the hazard relative to patients on standard therapy, assuming proportional hazards
A few observations are immediately obvious. First, despite the fact that the Cox model looks complicated, it is relatively easy to interpret the model coefficients (i.e., exp(β) is an HR). Second, a decrease in the hazard results in an increase in survival. Third, it is easy to calculate the expected effect that a reduction in the hazard will have on survival (i.e., predicted survival in group A will equal the survival in group B raised to the HR power).
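The third observation lends itself to a short calculation. The Python sketch below generates entries of the kind summarized in Table 3, under the proportional hazards assumption. The function names and the particular grid of hazard reductions and standard-therapy survival values are chosen here for illustration and are not taken from the published table.

```python
def survival_under_hr(s_standard, hr):
    """Survival on experimental therapy at a given time, assuming proportional hazards."""
    return s_standard ** hr


def relative_improvement(s_standard, hr):
    """Percent relative increase in survival at that same time."""
    s_experimental = survival_under_hr(s_standard, hr)
    return 100 * (s_experimental - s_standard) / s_standard


standard_survival = (0.2, 0.5, 0.8)      # illustrative values for standard therapy
for reduction in (10, 20, 30, 40, 50):   # percent reduction in the hazard
    hr = 1 - reduction / 100
    cells = ", ".join(
        f"{survival_under_hr(s, hr):.3f} ({relative_improvement(s, hr):.1f}%)"
        for s in standard_survival
    )
    print(f"hazard reduction {reduction}% (HR = {hr:.1f}): {cells}")
```

Each printed cell gives the experimental-arm survival followed by the percent relative increase, mirroring the layout described in the table caption.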
Examples
Suppose an experimental regimen reduces the hazard (relative to a standard regimen) by 20% (i.e., HR = 0.8). If patients receiving standard therapy had a 1-year survival of 0.8, patients receiving experimental therapy would be expected to have a 1-year survival of $0.8^{0.8} = 0.8365$, that is, a 4.6% relative improvement at 1 year ([(0.8365 - 0.8)/0.8] × 100%). If survival at 5 years in the standard group was 0.2, the experimental group would be expected to have a 5-year survival of $0.2^{0.8} = 0.2759$, a 38% relative improvement at 5 years.
Cautions
A few cautions are also in order. First, the HR should not be interpreted as an RR as described earlier (i.e., the ratio of survival or death probabilities). The ratio of survival probabilities in the two groups at any given time typically will not equal the HR. For example, if survival at 1 year for patients on the standard therapy is 60%, a decrease in the hazard of 40% (an HR of 0.6) would result in a 1-year survival of 74% (a ratio in survival probabilities at 1 year of 0.82 or 1.23). Note that under the proportional hazards assumption, the ratio of the logs of the survival probabilities would equal the HR. Second, the relative improvement in survival (given by $[(S_A(t) - S_B(t))/S_B(t)] \times 100$) may or may not be close to the percent reduction in the hazard (chances are it will not be close). This should not be surprising. Consider, for example, a 1-year survival of 90% for patients receiving standard therapy. The maximum percent relative increase in 1-year survival, corresponding with a 100% reduction in the hazard, would be [(1 - 0.9)/0.9] × 100% = 11.1%. However, had 1-year survival been 10%, the maximum percent increase would be [(1 - 0.1)/0.1] × 100% = 900%.
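These cautions can be verified numerically. The following Python sketch reuses the 60% survival and HR of 0.6 from the example above; the variable and function names are illustrative. It shows that the ratio of survival probabilities differs from the HR, while the ratio of the logs of the survival probabilities recovers it.

```python
import math

s_standard = 0.60                 # 1-year survival on standard therapy
hr = 0.60                         # 40% reduction in the hazard

s_experimental = s_standard ** hr                 # about 0.74
survival_ratio = s_experimental / s_standard      # about 1.23, not 1/HR
log_ratio = math.log(s_experimental) / math.log(s_standard)   # equals the HR


def max_relative_increase(s_standard):
    """Largest possible percent relative gain in survival (hazard reduced by 100%)."""
    return 100 * (1 - s_standard) / s_standard


print(f"experimental survival = {s_experimental:.3f}")
print(f"ratio of survival probabilities = {survival_ratio:.2f}")
print(f"ratio of log survival probabilities = {log_ratio:.2f}")
print(f"ceiling at 90% baseline = {max_relative_increase(0.9):.1f}%")
print(f"ceiling at 10% baseline = {max_relative_increase(0.1):.0f}%")
```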
DISCUSSION
Logistic regression and Cox's proportional hazards regression are valuable tools for analyzing response and survival data in prospective cancer clinical trials. These methods allow for the assessment of treatment outcomes after adjustment for multiple continuous and categorical covariates. Despite the seeming mathematical complexity of the models, the measures of treatment effect that result from their use are relatively easy to understand. However, they are not the measures of effect with which medical researchers are most familiar, so they are sometimes misinterpreted. The most common mistake is to interpret ORs and HRs as RRs.
Unfortunately, some of the medical, epidemiologic, and statistical literature adds to the confusion. Granados [1] reported that the definition of odds was not given in several medical and mathematical dictionaries, that another dictionary incorrectly defined the OR as an RR, and that three others defined the OR using the definition of odds, rather than the ratio of two odds. Given such inconsistency, it should not be surprising that measures of treatment effect are misinterpreted. Some of the misinterpretations (such as interpreting the OR as an RR) can lead to an overestimation of the clinical effect. It is important, given the ubiquity of statistical models for analyzing response and survival in cancer clinical trials, that measures such as ORs and HRs be clearly understood, lest we misstate the benefit of clinical treatments. This is particularly relevant for physicians, who must translate the statistical findings of cancer clinical trials into terms their patients can understand.
Appendix 1. Logistic and proportional hazards regression
Logistic Regression
The logistic model can be written as follows:
$$\ln\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \cdots \tag{1}$$
where p denotes the probability of an event for a patient with covariates $X_1, X_2, X_3, \ldots$ That is, the natural log of the odds (called logit) is written as a linear combination of the covariates. In this model, $\beta_0$ denotes the logit for a patient with all covariates = 0, $X_1$ might denote treatment (coded such that $X_1 = 1$ for patients treated with regimen A, the experimental therapy, and $X_1 = 0$ for patients treated with regimen B, the standard therapy), $X_2$ might be age (coded in years), and $X_3$ might be gender (1 for females, 0 for males).
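Equation 1 can be rewritten as
$$\frac{p}{1 - p} = \exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \cdots).$$
So, for group A ($X_1 = 1$) the odds of response can be written as
$$O_A = \exp(\beta_0 + \beta_1 + \beta_2 X_2 + \beta_3 X_3 + \cdots),$$
while for group B ($X_1 = 0$) the odds are written as
$$O_B = \exp(\beta_0 + \beta_2 X_2 + \beta_3 X_3 + \cdots).$$
The ratio of these two odds is given by
$$OR = \frac{O_A}{O_B} = \exp(\beta_1).$$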
Thus, the treatment effect estimated with a logistic regression analysis can be interpreted rather simply. That is, the exponential of β1 is the estimate of the odds of response for patients treated with regimen A relative to that for patients treated with regimen B. So, for example, if the estimate of β associated with treatment is 0.693, then the odds of response for the experimental patients are twice [exp(0.693) = 2.0] that of the patients on standard therapy. This does not mean that patients on the experimental regimen are twice as likely to respond (that is the interpretation of an RR). The effect of other covariates can be interpreted similarly. The exponential of their regression coefficients represents the increase in the odds of response for a one unit change in the covariate.
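Note that Equation 1 can be rewritten as
$$p = \frac{\exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \cdots)}{1 + \exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \cdots)} \tag{2}$$
so the effect of changes in the covariates on the probability of response (p) can be easily calculated.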
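As a minimal sketch of that calculation, the Python code below evaluates the inverse-logit form of the model for a treatment-only fit to the response example in the main text (70/100 versus 30/100). The coefficient values are derived from that example; the function name `predicted_probability` is hypothetical and not part of any statistical package.

```python
import math


def predicted_probability(beta0, betas, xs):
    """Probability of the event from the logistic model, given coefficients and covariates."""
    logit = beta0 + sum(b * x for b, x in zip(betas, xs))
    return 1 / (1 + math.exp(-logit))     # algebraically equal to exp(logit) / (1 + exp(logit))


# Treatment-only model for the response example.
beta0 = math.log(0.3 / 0.7)               # logit of response on standard therapy (X1 = 0)
beta1 = math.log((0.7 / 0.3) / (0.3 / 0.7))   # ln(OR) = ln(5.44)

p_standard = predicted_probability(beta0, [beta1], [0])
p_experimental = predicted_probability(beta0, [beta1], [1])

print(f"OR = exp(beta1) = {math.exp(beta1):.2f}")
print(f"p(standard) = {p_standard:.2f}, p(experimental) = {p_experimental:.2f}")
print(f"RR = {p_experimental / p_standard:.2f}")
```

Reporting the two fitted probabilities alongside the OR lets readers see the RR directly instead of inferring it from the odds.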
Cox's Proportional Hazards Regression
The Cox proportional hazards model can be written as follows:
$$h(t, X) = h_0(t)\exp(\beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \cdots) \tag{3}$$
where h(t,X) denotes the hazard at time t for a patient with covariates $X_1, X_2, X_3, \ldots$ A hazard is a failure rate at time t. It is the probability that a patient fails between t and t + Δt, given that he has survived up to time t, divided by Δt, as Δt approaches zero.
In this model, $h_0(t)$ denotes the baseline hazard at time t (the hazard of a patient with all covariates = 0), and $X_1$, $X_2$, and $X_3$ might denote treatment, age, and gender, respectively, as defined above. Equation 3 can be rewritten as
$$\ln\left[\frac{h(t, X)}{h_0(t)}\right] = \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \cdots \tag{4}$$
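So, for regimen A patients (i.e., $X_1 = 1$), the hazard function is given by
$$h_A(t) = h_0(t)\exp(\beta_1 + \beta_2 X_2 + \beta_3 X_3 + \cdots),$$
while for regimen B patients (i.e., $X_1 = 0$), the hazard function is given by
$$h_B(t) = h_0(t)\exp(\beta_2 X_2 + \beta_3 X_3 + \cdots).$$
The ratio of these hazard functions (i.e., the HR of regimen A to regimen B) is given by
$$HR = \frac{h_A(t)}{h_B(t)} = \exp(\beta_1).$$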
If, for example, the estimate of treatment effect in a proportional hazards model is -0.5, this means the HR of regimen A to regimen B patients is exp[-0.5] = 0.6065. That is, the hazard at any time for patients on the experimental therapy is 61% that of patients on the standard therapy. How does this translate into survival at time t (which is denoted by S(t))? S(t) and h(t) go hand in hand; that is, one can be derived from the other. The relationship between the two is given by
$$S(t) = \exp\left(-\int_0^t h(u)\,du\right).$$
Thus, the survival functions for patients on the two regimens are given by
$$S_B(t) = \exp\left(-\int_0^t h_B(u)\,du\right)$$
and
$$S_A(t) = \exp\left(-\int_0^t h_A(u)\,du\right) = \exp\left(-HR\int_0^t h_B(u)\,du\right) = [S_B(t)]^{HR}.$$
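The link between the hazard and survival functions can also be checked numerically. The Python sketch below integrates a hazard to obtain survival and confirms that multiplying the hazard by a constant HR raises the survival function to the HR power. The Weibull-type baseline hazard and its parameter values are purely hypothetical choices for illustration; the relationship does not depend on them.

```python
import math


def survival_from_hazard(hazard, t, steps=10_000):
    """S(t) = exp(-integral of the hazard from 0 to t), using the midpoint rule."""
    dt = t / steps
    cumulative_hazard = sum(hazard((i + 0.5) * dt) for i in range(steps)) * dt
    return math.exp(-cumulative_hazard)


def hazard_standard(u):
    """Hypothetical baseline hazard (Weibull form with scale 0.3 and shape 1.5)."""
    return 0.3 * 1.5 * u ** 0.5


hr = math.exp(-0.5)                       # HR of 0.6065, as in the example above


def hazard_experimental(u):
    """Proportional hazards: the experimental hazard is HR times the standard hazard."""
    return hr * hazard_standard(u)


t = 2.0
s_b = survival_from_hazard(hazard_standard, t)
s_a = survival_from_hazard(hazard_experimental, t)

print(f"S_B({t}) = {s_b:.4f}")
print(f"S_A({t}) = {s_a:.4f}")
print(f"S_B({t}) ** HR = {s_b ** hr:.4f}")
```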
ACKNOWLEDGMENT
Supported in part by grants P30-CA-12197 and U10-CA-81851 from the Public Health Service, National Institutes of Health.
REFERENCES
1. Granados JAT. Odds and odds ratio: an odd confusion [letter]. Epidemiology 1995;6:571-572.
2. Holcomb Jr WL, Chaiworapongsa T, Luke DA et al. An odd measure of risk: use and misuse of the odds ratio. Obstet Gynecol 2001;98:685-688.
3. McNutt LA, Holcomb JP, Carlson BE. Logistic regression analysis: when the odds ratio does not work: an example using intimate partner violence data. J Interpers Violence 2000;15:1050-1059.
4. Altman DG, Deeks JJ, Sackett DL. Odds ratios should be avoided when events are common [letter]. BMJ 1998;317:1318.
5. Sinclair JC, Bracken MB. Clinically useful measures of effect in binary analyses of randomized trials. J Clin Epidemiol 1994;47:881-889.
6. Davies HTO, Crombie IK, Tavakoli M. When can odds ratios mislead? BMJ 1998;316:989-991.
7. Senn S. Rare distinction and common fallacy [electronic response]. Available at: bmj.com/cgi/eletters/317/7168/1318 #3089 May 1999.
8. Sackett DL, Deeks JJ, Altman DG. Down with odds ratios! Evidence-Based Med 1996;1:164-166.
9. Walter SD. Choice of effect measure for epidemiological data. J Clin Epidemiol 2000;53:931-939.
10. Lee J. Odds ratio or relative risk for cross-sectional data [letter]? Int J Epidemiol 1994;23:201-203.
11. Lee J. Covariance adjustment of rates based on the multiple logistic regression model. J Chronic Dis 1981;34:415-426.
12. Kleinbaum DG, Kupper LL, Morgenstern H. Epidemiologic Research: Principles and Quantitative Methods. New York: Van Nostrand Reinhold, 1982;1-529.
13. Hosmer DW, Lemeshow S. Applied Logistic Regression. New York: John Wiley & Sons, 1989;1-307.
14. Zhang J, Yu KF. What is the relative risk? A method of correcting the odds ratio in cohort studies of common outcomes. JAMA 1998;280:1690-1691.
15. Brant R. Digesting logistic regression results. Am Stat 1996;50:117-119.
16. Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. J Am Stat Assoc 1958;53:457-481.
17. Peto R, Pike MC, Armitage P et al. Design and analysis of randomized clinical trials requiring prolonged observation of each patient. II. Analysis and examples. Br J Cancer 1977;35:1-39.
18. Cox DR. Regression models and life-tables. J R Stat Soc B 1972;34:187-202.
19. Kleinbaum DG. Survival Analysis: A Self-Learning Text. New York: Springer, 1997;1-324.
20. Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data. New York: John Wiley & Sons, 1980;1-321.
21. Lawless JF. Statistical Models and Methods for Lifetime Data. New York: John Wiley & Sons, 1982;1-580.
Received January 25, 2002;
accepted for publication March 1, 2002.