The Oncologist, Vol. 11, No. 6, 541-552, June 2006; doi:10.1634/theoncologist.11-6-541 © 2006 AlphaMed Press
Uses and Abuses of Tumor Markers in the Diagnosis, Monitoring, and Treatment of Primary and Metastatic Breast Cancer
Department of Internal Medicine, Breast Oncology Program, University of Michigan Comprehensive Cancer Center, Ann Arbor, Michigan, USA
Key Words. Tumor marker • Her-2/neu • Breast cancer • Estrogen receptor • Prognostic factor • Predictive factor
Correspondence: Daniel F. Hayes, M.D., University of Michigan Comprehensive Cancer Center, 1500 East Medical Center Drive, Ann Arbor, Michigan 48109, USA. Telephone: 734-615-6725; Fax: 734-615-3947; e-mail: hayesdf{at}umich.edu
Received February 22, 2006; accepted for publication April 11, 2006.
Although breast cancer incidence continues to increase, mortality has been decreasing, principally as a result of earlier detection and improvements in adjuvant systemic therapy. Nonetheless, because antineoplastic agents are associated with substantial morbidity and occasional mortality, efforts to individualize treatment strategies are desirable. In addition to classic histopathologic diagnosis, molecular and cellular tumor markers may help in establishing prognosis or prediction of benefit. Recommendations for routine use of tumor markers in breast cancer have been conservative. Although several studies have been reported, few are of sufficiently high level of evidence to permit solid conclusions. Three key issues in tumor marker evaluation are utility, magnitude, and reliability. Poorly conceived study designs cloud the issue of how the marker might be used. Reliance on p-values, rather than on the size of the differences in outcome between patients who are positive and those who are negative for the factor, obscures the clinical importance of a marker. Technical issues result in poor reproducibility and interpretability of assays. Analytical issues lead to poorly defined cutoff values for marker levels. Poor patient selection leads to difficulty interpreting results because of confounders such as differences in treatment regimens. This review focuses on these issues, with an emphasis on currently accepted tumor markers. Finally, new tumor marker reporting recommendations are discussed, the adoption of which may lead to improved design and publication of tumor marker studies in the future.
Breast cancer is the most common cancer in women in the U.S., with an estimated 213,000 new cases diagnosed in 2005 [1]. Despite these increasing numbers, mortality from breast cancer continues to decline. This decline is thought to result from a combination of earlier detection through screening and improved treatment with adjuvant systemic therapies [2, 3]. A majority of patients are cured with surgery and radiation therapy alone, and these patients will gain no additional benefit from adjuvant systemic therapies. In addition, having a high risk for recurrence does not imply that systemic therapy will prevent it. Even for those who recur, overall survival and palliation of symptoms for patients with metastatic breast cancer (MBC) have improved with the advent of new therapies. However, currently available methods are inadequate to help the clinician precisely predict a priori which patients will benefit from many of the available therapies. For patients with early-stage breast cancer, it would be helpful to identify which patients will relapse without adjuvant systemic therapy, so that only patients who stand to benefit are exposed to the inherent toxicities. The approach to treatment of metastatic disease is generally with palliative intent rather than for cure. In this setting, identification of those patients with rapidly progressive disease permits selection of more rapidly acting but perhaps more toxic therapy. During the past few decades, with the explosion of molecular technology and understanding of the biology of breast cancer, numerous studies have been performed to identify prognostic and predictive factors in breast cancer, with mixed success. Multiple expert panels have convened to analyze available data in order to establish guidelines for the use of tumor markers, but their recommendations have been very conservative [4, 5].
In this review, we address the pitfalls that have led to difficulties establishing tumor markers for routine clinical use, with a specific focus on tumor markers in breast cancer.
The American Society of Clinical Oncology (ASCO) convened a panel of experts that first published recommendations regarding the use of circulating and tissue-based tumor markers in breast cancer in 1996 [6] and most recently updated these recommendations in 2001 (Table 1).
Routine measurement of multiple tissue markers was also discussed in the guidelines. The panel recommended routine measurement of estrogen and progesterone receptors (ER and PgR, respectively) to identify patients most likely to benefit from endocrine therapy in either the early or metastatic disease settings. In addition, measurement of Her-2/neu overexpression and possibly amplification was recommended for all patients at the time of initial diagnosis or recurrence, as it is predictive of response to trastuzumab (Herceptin®; Genentech, South San Francisco, CA), a monoclonal antibody directed against the Her-2/neu receptor [8-10]. The panel felt that data to support assessment of other tissue-based markers, including p53, cathepsin D, and flow cytometry-derived estimates of DNA content or S phase, were insufficient to recommend usage in routine clinical practice. Therefore, despite the large number of research studies evaluating the prognostic and predictive ability of numerous tumor markers in breast cancer, the ASCO panel recommended few for routine use in clinical practice. Why were these recommendations so conservative? In the succeeding sections of this paper, we outline the multiple factors that underlie this conservative approach.
When evaluating tumor markers for use in clinical practice, clinicians should consider their utility, the magnitude of their effects, and their reliability (Table 2).
Tumor markers can also help determine prognosis independent of therapy and predict response to therapy. Prognostic factors reflect the metastatic potential and/or growth rate of the tumor and are used to estimate patient outcomes without consideration of treatment given [13]. Predictive factors, on the other hand, reflect the sensitivity or resistance of a tumor to a therapeutic agent and therefore are used to predict which patients are likely to respond to a specific treatment [14]. Pure prognostic and predictive factors are depicted schematically in Figures 1A and 1B.
Few tumor markers are purely prognostic or predictive. In fact, most tumor markers have mixed prognostic and predictive features, and the utility typically depends on the therapeutic agent in question. For example, ER expression is weakly favorably prognostic but strongly predictive of response to treatment with endocrine therapy, as illustrated in Figure 1C.
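The distinction between prognostic and predictive effects can be made concrete with a small calculation. The sketch below is illustrative only: the recurrence rates are invented numbers rather than data from any study, and the function names are hypothetical. It reads a prognostic effect as the outcome difference between marker-negative and marker-positive patients in the absence of the treatment, and a predictive effect as the difference in treatment benefit between the two marker groups (a marker-by-treatment interaction).

```python
# Illustrative only: the recurrence rates below are made-up numbers.
# rates[(marker_status, treated)] = fraction of patients who recur.

def prognostic_effect(rates):
    """Recurrence difference between marker-negative and marker-positive
    patients when neither group receives the treatment in question."""
    return rates[("neg", False)] - rates[("pos", False)]

def treatment_benefit(rates, marker):
    """Absolute reduction in recurrence from treatment within one marker group."""
    return rates[(marker, False)] - rates[(marker, True)]

def predictive_effect(rates):
    """Marker-by-treatment interaction: how much more the marker-positive
    group benefits from treatment than the marker-negative group."""
    return treatment_benefit(rates, "pos") - treatment_benefit(rates, "neg")

# Hypothetical marker with a mixed profile: weakly favorable prognosis,
# strongly predictive of benefit from therapy.
mixed_marker = {
    ("pos", False): 0.40, ("pos", True): 0.20,
    ("neg", False): 0.45, ("neg", True): 0.44,
}

print("prognostic effect:", round(prognostic_effect(mixed_marker), 2))
print("predictive effect:", round(predictive_effect(mixed_marker), 2))
```

With these assumed rates the untreated outcomes differ only slightly by marker status, whereas the benefit of treatment is concentrated almost entirely in the marker-positive group, which is the pattern described for a weakly prognostic, strongly predictive marker.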
Once a tumor marker use has been identified, it is important to determine the magnitude of the difference in outcomes for that particular use between those who are marker positive and those who are not. By evaluating the difference in outcome, regardless of treatment, between a patient positive for a given prognostic factor and one who is negative for the factor, the relative strength of a prognostic factor can be determined [15]. This assessment requires the selection of an appropriate outcome of interest, such as improvement in symptoms or survival, or surrogates of these end points, such as response rates or progression-free survival.
For example, a breast cancer patient with disease in the lymph nodes at the time of diagnosis is two to three times more likely to have a breast cancer event (local recurrence or distant metastasis) than a patient without lymph node involvement, regardless of treatment. Since lymph node status has classically been used to make clinical decisions, we have arbitrarily designated it as a "strong" prognostic factor, using it as the gold standard to set the criteria for consideration of other, putative markers [16]. A strong prognostic factor is depicted by Factor 1 in Figure 1A.
Predictive factors can also be classified as weak (Factor 1), moderate, or strong (Factor 2), depending on their ability to predict response to, and therefore benefit from, a given therapy, as illustrated in Figure 1B.
The preceding discussion illustrates the importance of estimating the magnitude of the relative tumor marker effect for a selected use. However, the marker is only useful if the estimate of its magnitude is reliable and reproducible. In this regard, many investigators conclude that their marker of interest has clinical utility if, in their study, the difference in outcomes between marker "positive" and marker "negative" patients meets the conventional threshold for statistical significance (p < .05). This conclusion may be mistaken. Statistical significance only suggests that, in the population chosen for that study, the differences observed are unlikely to be a result of chance alone. It does not imply clinical utility, nor does a p-value < .05 document the validity of the tumor marker. Although it is important to determine that the differences in outcome achieve statistical significance, statistical significance alone does not determine clinical utility. In addition to determining when to use a tumor marker and the magnitude of its effect, it is important to ensure that the technical aspects of the marker are reliable and reproducible and that the study design and conduct are appropriate to test the marker for a clinical use of interest. Several problems with tumor marker studies, including technical, analytical, and trial design issues, have limited the introduction of new prognostic and predictive factors into routine clinical practice [11].
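The gap between statistical significance and magnitude can be shown with a short calculation. The following is a minimal sketch under stated assumptions: the event counts are invented, the function names are hypothetical, and a pooled two-proportion z-test is used only as a convenient stand-in for whatever test a given study might apply.

```python
import math

def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    p_a, p_b = events_a / n_a, events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

def relative_risk(events_a, n_a, events_b, n_b):
    """Event rate in group A divided by event rate in group B."""
    return (events_a / n_a) / (events_b / n_b)

# Hypothetical large study: 22% vs. 20% recurrence (marker positive vs. negative).
large = (2200, 10000, 2000, 10000)
# Hypothetical small study: 40% vs. 20% recurrence.
small = (12, 30, 6, 30)

for label, counts in (("large study", large), ("small study", small)):
    print(label,
          "relative risk =", round(relative_risk(*counts), 2),
          "p =", round(two_proportion_p_value(*counts), 4))
```

Under these assumed counts, the large study yields p < .05 for a relative risk of only 1.1, whereas the small study shows a relative risk of 2.0 (comparable to the two- to threefold effect of lymph node status described above) without reaching p < .05. The p-value alone therefore says little about whether a marker separates patients enough to change a clinical decision.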
What Technical Factors Influence Measurement of Markers?
For example, Her-2/neu status can be determined by measures of protein expression (by immunohistochemistry [IHC], Western blotting, or enzyme-linked immunosorbent assay), measures of RNA expression (by Northern blotting or reverse transcriptase-polymerase chain reaction [RT-PCR]), and/or measures of DNA amplification (by fluorescence or chromogenic in situ hybridization [FISH and CISH, respectively]). Furthermore, even within these categories, different reagents (e.g., different antibodies in IHC assays) are used in different tests. The results are not interchangeable, either within or between classes of assays, and therefore researchers must decide which methodology they will employ. Once that decision is made, researchers must then decide how to perform the assay. For example, when assessing Her-2/neu overexpression by IHC, technical issues such as antibody concentration and antigen retrieval methods may cause unacceptably high false-positive or false-negative rates. In one study, IHC and FISH resulted in only a 65% agreement for Her-2/neu status [23]. In a different study, results obtained from local laboratories were compared with those from a central laboratory for two Her-2/neu assays, the HercepTest™ IHC assay (Dako North America, Inc., Carpinteria, CA) and the FISH assay, with 79% concordance for HercepTest™ and 85% concordance for FISH [24]. Therefore, for the same test at multiple laboratories, and for different tests for the same marker, there is a significant degree of discordance for two commonly used tests for the evaluation of Her-2/neu status. The stakes are high. Recently reported data suggest that adjuvant trastuzumab decreases recurrence rates by 50%. However, up to 5% of patients who receive trastuzumab develop cardiac dysfunction, and the cost of 1 year of therapy may exceed $100,000.
Therefore, it is essential that Her-2/neu, the target for trastuzumab, be assayed accurately and precisely for every tissue sample. Expert panels are now being convened to establish guidelines for the conduct and interpretation of common tumor marker assays, including ER and Her-2/neu. These guidelines should lead to standardization of the assays, which should allow for more reliable results both for routine clinical practice and for use of these assays in clinical trials.
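Concordance figures such as those quoted above are simple proportions of a two-by-two table, and it can help to see how they are computed. The sketch below is an illustration under stated assumptions: the counts are invented (chosen only so that overall agreement comes out at 79%), the function name is hypothetical, and Cohen's kappa is added as one common chance-corrected agreement measure, not as a statistic reported in the studies cited.

```python
def agreement_stats(both_pos, a_pos_b_neg, a_neg_b_pos, both_neg):
    """Return (observed agreement, Cohen's kappa) for two assays, or two
    laboratories, that each classify the same samples as positive or negative."""
    total = both_pos + a_pos_b_neg + a_neg_b_pos + both_neg
    observed = (both_pos + both_neg) / total
    a_pos = (both_pos + a_pos_b_neg) / total   # fraction called positive by A
    b_pos = (both_pos + a_neg_b_pos) / total   # fraction called positive by B
    # Agreement expected by chance if the two calls were independent.
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 100%")
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Hypothetical counts for 100 samples tested locally (A) and centrally (B).
observed, kappa = agreement_stats(both_pos=15, a_pos_b_neg=15,
                                  a_neg_b_pos=6, both_neg=64)
print("observed agreement:", round(observed, 2))
print("Cohen's kappa:", round(kappa, 2))
```

Because most samples are marker negative, a substantial share of the raw agreement in this invented table would be expected by chance, so the chance-corrected figure is noticeably lower than the raw concordance.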
What Analytical Issues Are Important to Consider?
Assay Interpretation
Cutoff Point Determination
Cutoff points may be defined using either arbitrary or data-derived methods (Table 4).
Deriving cutoff points based on patient outcome data may provide more accurate values. For example, the cutoff point for ER expression was first defined by limits of the assay and later by determining the optimal level that distinguished those patients who respond to hormonal therapy from those who do not. In another example, the cutoff point for the CellSearch™ assay for circulating tumor cells was initially determined based on differences in time to progression of a test set of patients with metastatic disease, and this cutoff was then validated with an independent but similar patient cohort from the same study [31]. Another common method to generate a data-derived cutoff point is to construct a receiver operating characteristic curve, which demonstrates the tradeoff between the sensitivity and specificity of an assay at different cutoff points. Recently, a novel data-derived method to select cutoff points, designated subpopulation treatment effect pattern plot (STEPP) analysis, has been proposed [32]. STEPP analysis evaluates outcomes to specific treatments in subpopulations of patients within randomized clinical trials or meta-analyses [32]. For example, it has been proposed that recurrence rates after treatment with chemotherapy should be evaluated in the context of the endocrine responsiveness of tumors, since hormone receptor (HR)-positive and HR-negative tumors appear to behave differently. Rather than arbitrarily defining cutoff points for ER positive and negative, the authors performed a STEPP analysis of data from a previously conducted randomized clinical trial and were able to demonstrate a benefit from chemotherapy only in the subset of patients with very low ER values.
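The receiver operating characteristic approach mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the marker values and outcomes are invented, the function names are hypothetical, and the Youden index (sensitivity + specificity - 1) is used as one simple rule for picking a point on the curve; it is not the method of any study cited here.

```python
def roc_points(values, outcomes):
    """Return (cutoff, sensitivity, specificity) for each candidate cutoff.

    A sample is called marker positive when its value is >= cutoff.
    outcomes[i] is True if patient i had the event (e.g., progression).
    """
    n_event = sum(outcomes)
    n_no_event = len(outcomes) - n_event
    points = []
    for cutoff in sorted(set(values)):
        true_pos = sum(1 for v, o in zip(values, outcomes) if v >= cutoff and o)
        true_neg = sum(1 for v, o in zip(values, outcomes) if v < cutoff and not o)
        points.append((cutoff, true_pos / n_event, true_neg / n_no_event))
    return points

def youden_cutoff(values, outcomes):
    """Pick the cutoff that maximizes sensitivity + specificity - 1."""
    return max(roc_points(values, outcomes), key=lambda p: p[1] + p[2] - 1)

# Hypothetical marker levels and outcomes for ten patients.
marker_values = [0, 1, 2, 3, 4, 5, 6, 8, 10, 12]
had_event = [False, False, False, True, False, False, True, True, True, True]

for cutoff, sens, spec in roc_points(marker_values, had_event):
    print(f"cutoff {cutoff:>2}: sensitivity {sens:.1f}, specificity {spec:.1f}")
print("selected:", youden_cutoff(marker_values, had_event))
```

A cutoff chosen this way is tuned to the data set that produced it, so it would still need to be confirmed in an independent but similar cohort before being relied upon.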
Cutoff Point Validation
For example, the Oncotype DX™ assay is based on the principle of evaluating expression of multiple candidate genes using quantitative RT-PCR [29]. The investigators initially screened more than 200 candidate genes with the aim of developing a test that would predict the likelihood of recurrence of cancer in patients with HR-positive, lymph node-negative breast cancer. Breast cancer tissues from 447 patients with HR-positive, lymph node-negative tumors were used retrospectively to generate an algorithm using 16 of these genes that permitted division of patients into subgroups with very low, intermediate, or very high risk for recurrence. Patients are assigned to these groups based on a "recurrence score" derived from the algorithm. The majority of the test samples were obtained from patients treated with tamoxifen alone in the National Surgical Adjuvant Breast and Bowel Project (NSABP) B-20 trial [33], and the data-derived algorithm was then validated using a separate retrospective cohort of patients from the NSABP B-14 trial [34], who had similar clinical characteristics and were also treated with tamoxifen alone. If the cohorts had different clinical characteristics or had been treated differently, the validation would not have been legitimate. Validation of the results in a separate patient population strongly suggests that the test is reliable and that the results are likely to be meaningful in a larger population, as long as the patients tested are similar to the cohorts included in the original studies.
Statistical Analysis
One of the key steps in identifying and confirming the benefit of a new tumor marker is appropriate study design. The Tumor Marker Utility Grading System (TMUGS) was initially developed by members of the ASCO panel to provide a framework within which the utility of tumor markers can be graded based on published information [11]. Markers are assigned a grade based on the level of evidence (LOE) available. These LOE reflect the relative quality of the studies used to generate an estimate of the effect of the marker (Table 5).
Systematic overviews and/or pooled analyses of well-conducted LOE II studies are equivalent to LOE I studies, especially if the correlative studies address a specific use but are underpowered in a single study. It has been estimated that a clinical trial that has been appropriately powered to determine a clinical end point, such as progression-free survival, is underpowered for analysis of tumor marker-designated subgroups by one fourth, even if tissue from 100% of enrollees is available. Moreover, interpretation still requires judgment regarding the clinical importance of the finding. For example, from the prospective randomized clinical trials of adjuvant trastuzumab versus placebo, we anticipate combined analyses of the multiple underpowered LOE II studies evaluating novel markers for benefit from this drug. These pooled results should help focus trastuzumab therapy in the subgroups of Her-2/neu-positive patients most likely to benefit. The combined analyses of small LOE III studies that contain patients with variable clinical characteristics and treatments, however, are more likely useful to generate new hypotheses than to provide clinically useful and validated results. For all studies, whether prospective or retrospective, it is important to identify the appropriate patient population to be investigated. All patients should have a similar profile based on known prognostic factors. Importantly, the effects of systemic treatment are critical and must be considered. If the study is addressing the prognostic value of a marker, all patients should have been treated uniformly and without whatever treatment might be considered if the patients have a "poor prognosis." If the question is addressing the utility of adding any treatment, all patients should be untreated. If the question is whether more treatment should be given, then all patients should have received the same treatment. 
If the study is addressing predictive factors, a control group that has been treated identically to the study group, with the exception that they did not receive the treatment in question, is essential. Although the control group might be from a selected historical control, predictive factors are ideally studied in the context of prospective, randomized, controlled trials comparing the patients who received the treatment in question with those who did not receive that treatment. Tumor marker studies should be carefully designed, using the above criteria, to obtain clinically useful information. Researchers frequently have practical difficulties designing such studies, however, because of the need for significant numbers of patients with particular clinical characteristics in order to address a specific clinical question. As discussed above, this can sometimes be overcome by pooling the results of several well-done but underpowered studies. Another significant drawback is obtaining funding for tumor marker studies, as pharmaceutical companies and third-party payers derive relatively smaller financial benefit from the results compared to the enormous payoffs for a "blockbuster" therapeutic agent. Regardless, given the consequences, one has to question why it is acceptable for tumor marker studies to be performed with less scientific rigor than studies of new pharmaceutical agents. Reporting of tumor marker studies has also been historically haphazard. Recently, in order to standardize reporting of tumor marker study results, the National Cancer Institute-European Organization for Research and Treatment of Cancer (NCI-EORTC) Working Group on Cancer Diagnostics developed REporting recommendations for tumor MARKer prognostic studies (REMARK) [35]. 
The guidelines outline items that should be addressed by researchers when reporting the results of tumor marker studies, including prospectively defining the question the study is trying to address, identifying the appropriate patient population and controls, determining the end point, and identifying potential confounding factors. Explicit recommendations are given regarding which information must be contained in publications of tumor marker studies, including patient and treatment information, specimen characteristics, assay methods, study design, and statistical analysis methods.
In the preceding sections we identified the essential elements for establishing the usefulness, strength, and reliability of tumor markers. Let us now discuss the data supporting two currently used tumor markers, Her-2/neu and Oncotype DX™.
Her-2/neu
The potential uses for Her-2/neu are for prognosis and prediction, as outlined in Table 6.
More data exist to support the role of Her-2/neu status for prediction of response to standard therapies and trastuzumab. For selective estrogen receptor modulators (SERMs), such as tamoxifen, preclinical and clinical studies suggest that Her-2/neu positivity confers a relative resistance, with moderate magnitude, although the data are LOE III at best [39]. Data for aromatase inhibitors (AIs) are mixed, although in one pilot study of neoadjuvant endocrine therapy, Her-2/neu overexpression correlated with lower response to tamoxifen than to AIs [39]. At present, Her-2/neu status is not used to determine which endocrine therapy to use, because of the poor level of available evidence and conflicting data. Confirmation of these results may lead to preferential use of AIs in patients with Her-2/neu-positive disease. Patients with tumors that overexpress Her-2/neu appear to have relative resistance to some chemotherapy regimens, such as cyclophosphamide, methotrexate, and 5-fluorouracil (CMF), but not to others, such as anthracycline-containing regimens [37, 40, 41]. Her-2/neu status is not generally a consideration when choosing a chemotherapy regimen, however, because, as is the case with endocrine therapy, the level of available evidence does not support using Her-2/neu status to predict response to chemotherapy. In contrast, Her-2/neu status appears to be strongly predictive of response to trastuzumab. A patient with a tumor that overexpresses Her-2/neu is usually treated with trastuzumab in either the adjuvant or metastatic settings [8, 9, 42, 43] because the benefits outweigh the risks in the majority of cases. A patient with a tumor that fails to overexpress Her-2/neu appears not to respond to treatment with trastuzumab [10, 44] and therefore would not be treated with trastuzumab to avoid both unnecessary toxicity and cost. Thus, use of Her-2/neu to select trastuzumab is recommended based on "use" and "magnitude." 
However, as discussed above, there are substantial apparent difficulties with the technical reliability of all available assays for the marker. Nonetheless, despite the shortcomings of Her-2/neu studies outlined above, at present there is sufficient LOE II evidence to support the routine clinical use of Her-2/neu overexpression for selection of trastuzumab therapy, as indicated in the most recent ASCO tumor marker guidelines [4].
Multigene Expression: Oncotype DX™
Oncotype DX™ has also been evaluated as a predictive factor, although less rigorously. In the NSABP B-14 trial, the patient cohorts were treated with tamoxifen versus observation, and comparison of the cohorts suggested that the Oncotype DX™ assay is predictive for tamoxifen [45]. Similarly, analysis of NSABP B-20, in which patients were randomized to CMF and tamoxifen versus tamoxifen alone, permitted the investigators to determine that the assay is predictive for response to chemotherapy [45]. Are the available data sufficient to conclude that Oncotype DX™ has been validated to the extent that patient treatment decisions should be based on the results? Perhaps, but because these studies were all conducted using available samples from trials performed many years ago and represented only subsets of the overall population entered into the trials, concerns have been raised about wholesale clinical adoption of this assay. In that regard, the North American Breast Cancer Intergroup is developing the TAILORx clinical trial to further validate and extend the Oncotype DX™ results. The trial design assumes that the assay is prognostic, and will confirm the ability of the assay to predict response to chemotherapy. In the TAILORx trial, tumors of patients with ER-positive and lymph node-negative breast cancer will be tested using the Oncotype DX™ assay. Patients with low recurrence scores, who have good prognoses without chemotherapy, will receive hormonal therapy alone. At the other end of the spectrum, patients with high recurrence scores will receive chemotherapy in addition to hormonal therapy. Those patients whose scores fall in the intermediate range will all receive hormonal therapy and be randomly assigned to chemotherapy or not.
This trial design will permit validation of the Oncotype DX™ results in a similar patient population in a large prospective clinical trial, and will allow for generation of new data on which to base treatment recommendations for patients whose recurrence scores are intermediate. Given the substantial technical, analytical, and trial design problems with previously performed tumor marker studies, it is imperative to address these issues. It is especially important to standardize the assays for commonly used tumor markers. Otherwise, patients with false-positive test results for predictive factors will receive treatments that are not beneficial but which may cause significant toxicity, and those with false-negative results will not be offered potentially life-saving therapies. In addition, a better understanding of the potential pitfalls in tumor marker study design will allow for the development of new, potentially more useful assays.
Tumor markers, when well defined, can play a significant role in prediction and prognosis for breast cancer patients. Because of the abundance of poorly designed tumor marker studies to date, however, very few markers have been accepted for routine use by groups such as ASCO. When designing studies to establish a new tumor marker, or new use for an old marker, it is important to address the utility, magnitude, and reliability of the marker (Table 2).
Frameworks such as TMUGS can be useful when designing and conducting these studies to ensure that appropriate components are included, thereby leading to the establishment of new tumor markers for routine clinical use [11]. By progressively generating and refining a hypothesis, based on data derived from increasingly well-developed studies, tumor markers with clinical utility can be identified (Fig. 2).
Supported by National Institutes of Health grant 5R01CA092461-03 and Fashion Footwear Charitable Foundation of New York/QVC Presents Shoes-on-Sale™.
D.F.H. has served as an unpaid consultant for Genomic Health, Inc. in the past year and has received research funding from Immunicon. D.F.H. has been a consultant/advisory panel participant or held a lecture/honorarium position during the past year for Dendreon, Immunicon, Novartis, Pfizer, Precision Therapeutics, Inc., Oncotech, and Veridex. D.F.H. was the principal or coinvestigator during the past year on studies funded by Immunicon, Wyeth Ayerst-Genetics Institute, Pfizer, and Novartis.
Hayes DF, Bast RC, Desch CE et al. Tumor marker utility grading system: a framework to evaluate clinical utility of tumor markers. J Natl Cancer Inst 1996;88:1456-1466.
McShane LM, Altman DG, Sauerbrei W et al. REporting recommendations for tumour MARKer prognostic studies (REMARK). Br J Cancer 2005;93:387-391.