The Oncologist, Vol. 13, No. suppl_2, 32-40, April 2008; doi:10.1634/theoncologist.13-S2-32 © 2008 AlphaMed Press
Selection of Response Criteria for Clinical Trials of Sarcoma Treatment
aDepartment of Internal Medicine, Division of Hematology/Oncology, University of Michigan, Ann Arbor, Michigan, USA; bDepartment of Sarcoma Medical Oncology, University of Texas M.D. Anderson Cancer Center, Houston, Texas, USA; cBristol-Myers Squibb, Wallingford, Connecticut, USA
Key Words. Sarcoma • Clinical trials • Response • Imaging
Correspondence: Scott Schuetze, M.D., Ph.D., Department of Internal Medicine, Division of Hematology/Oncology, 1500 E. Medical Center Drive, C409 MIB, Ann Arbor, Michigan 48109-5843, USA. Telephone: 734-936-0453; Fax: 734-747-8792; e-mail: scotschu{at}umich
Received August 20, 2007; accepted for publication October 27, 2007.
Disclosure: R.B. has acted as a consultant to Novartis. R.C. is an employee of Bristol-Myers Squibb and owns stock in Bristol-Myers Squibb and Zimmer. S.S. has acted as a consultant to Sanofi-Aventis. No other potential conflicts of interest were reported by the authors, planners, reviewers, or staff managers of this article.
Soft tissue sarcomas are a heterogeneous group of malignancies arising from mesenchymal tissues. A large number of new therapies are being evaluated in patients with sarcomas, and consensus criteria defining treatment responses are essential for comparison of results from studies completed by different research groups. The 1979 World Health Organization (WHO) handbook set forth operationally defined criteria for response evaluation in solid tumors that were updated in 2000 with the publication of the Response Evaluation Criteria in Solid Tumors (RECIST). There have been significant advances in tumor imaging, however, that are not reflected in the RECIST. For example, computed tomography (CT) slice thickness has been reduced from 10 mm to 2.5 mm, allowing for more reproducible and accurate measurement of smaller lesions. Combination of imaging techniques, such as positron emission tomography with fluorine-18-fluorodeoxyglucose (18FDG-PET) and CT can provide investigators and clinicians with both anatomical and functional information regarding tumors, and there is now a large body of evidence demonstrating the effectiveness of PET/CT and other newer imaging methods for the detection and staging of tumors as well as early determination of responses to therapy. The application of newer imaging methods has the potential to decrease both the sample sizes required for, and duration of, clinical trials by providing an early indication of therapeutic response that is well correlated with clinical outcomes, such as time to tumor progression or overall survival. The results summarized in this review support the conclusion that the RECIST and the WHO criteria for evaluation of response in solid tumors need to be modernized. In addition, there is a current need for prospective trials to compare new response criteria with established endpoints and to validate imaging-based response rates as surrogate endpoints for clinical trials of new agents for sarcoma and other solid tumors.
Soft tissue sarcomas are a heterogeneous group of malignancies arising from mesenchymal tissues [1]. Based on the Surveillance, Epidemiology, and End Results (SEER) database, nearly 15,000 new cases of sarcoma, both bone and soft tissue, are diagnosed in the U.S. each year [2]. Anthracyclines and ifosfamide have been established as the most active drugs for the treatment of patients with advanced, soft tissue sarcomas of most histologic subtypes, with the exception of gastrointestinal stromal tumors (GISTs). However, after failure of these drugs, patients with advanced soft tissue sarcomas have few treatment options [3]. Limitations in current therapeutic options for these tumors have prompted the development and evaluation of a very large number of new chemotherapeutic and biologic agents for their treatment [3]. Assessment of new therapies for sarcomas requires agreement on, and consistent use of, endpoints sensitive to the effects of these treatments. Longer survival is the generally accepted gold standard for demonstrating clinical benefit of an oncologic therapy. However, a wide range of surrogate endpoints has been employed as the basis for approving new therapeutic agents [4], and considerable controversy exists regarding which endpoints may be most appropriate for specific tumors [4]. This review article analyzes endpoints in oncology clinical trials, with a focus on sarcomas. This issue is timely, because there has been considerable evolution in approaches for assessment of these tumors and endpoints employed in clinical studies.
The acceptance of new cancer treatments requires demonstration of their benefit in clinical trials. Results from these studies may be confounded or invalidated by a large number of factors, including invalid comparisons in nonrandomized studies because of patient selection or stage migration, inadequate power of the study (i.e., sample size) to detect clinically meaningful differences in appropriately controlled randomized trials, use of multiple comparisons and retrospective analyses to make inappropriate conclusions from randomized trials, and endpoints that are inappropriate for the cancer under study [5]. Thus, documenting benefit for a new cancer therapy requires attention to study design and data analysis, and particularly to selection of study endpoints. Selection of endpoints for studies of new cancer therapies may be especially challenging because many of them require long-term patient follow-up and thus demand long-term studies. This conflicts with pressure to provide patients with rapid access to new treatments [6]. It has also been noted that there is greater tolerance for agents with high toxicity profiles that demonstrate efficacy, particularly in cancers where treatment options are limited, and that this may result in acceptance of therapies with high risk-to-benefit ratios [7]. The objective of a phase III oncology trial is to determine the risk-to-benefit ratio of an experimental agent or regimen, compared with a standard treatment [8].
Overall Survival
Several additional problems may arise with the use of overall survival (OS) as the primary endpoint in an oncology study. The current availability of multiple effective lines of systemic or local therapy, as well as treatment switching, may obscure the impact of the agent under study upon survival in an intent-to-treat analysis. In addition, the interval from the end of recruitment to primary efficacy analysis for OS is protracted, such that subsequent studies taking the best therapy from the previous trial cannot begin until years after the previous trial has completed recruitment. Trials based on OS, which require a minimum of 5 years to complete, are inevitably lengthy and expensive [10, 11] and inhibit drug development, especially in uncommon cancers such as sarcoma.
Time to Progression and Progression-Free Survival
However, both progression-free survival (PFS) and time to progression (TTP) also have disadvantages as surrogate endpoints for OS in clinical trials. The clinical significance of small differences in TTP or PFS may be unclear (as with OS), especially when one is evaluating toxic treatments, and careful assessment of progression at frequent intervals can be costly and labor-intensive. There are also concerns about ascertainment bias in unblinded trials and questions about the reliability of the modest differences in TTP or PFS that are often observed in such studies. Because OS is not the primary endpoint in studies employing either PFS or TTP as surrogates, a design that allows patients to cross over to the investigational treatment from the standard or placebo arm when tumor progression occurs can also be considered. A crossover design has the potential to improve patient enrollment and patient benefit. However, crossover is also a weakness in that it can dilute any survival benefit attributable to the investigational treatment [10, 11].
Response Rates
The World Health Organization Criteria
Consensus criteria for defining a response to cancer therapy are essential for comparison of data from studies carried out by different research and clinical trial groups. In 1979, the World Health Organization (WHO) handbook set forth four specific operationally defined criteria for the codification of response evaluation in solid tumors. In this scheme, the lesions are measured bidimensionally and the product of the greatest tumor diameter and the greatest perpendicular distance, summed over all measured tumors, is used in determining responses. The four response categories were: complete response (CR), tumor disappearance confirmed at 4 weeks; partial response (PR), 50% or greater decrease in tumor size confirmed at 4 weeks; no change (NC), neither PR nor PD criteria met; and progressive disease (PD), 25% or greater increase in tumor size with no CR, PR, or stable disease (SD) documented before increased disease [13, 14]. Three major problems with these definitions gradually became apparent with their use in clinical trials [15, 16]. Methods of integrating the change in tumor size into response assessments varied among research groups, minimum lesion size and number of lesions documented varied from one study to the next, and what constituted PD was based on the change in size of a single lesion by some researchers and a change in the overall tumor load (including measurements of all lesions) by others. The advent of new technologies, particularly computed tomography (CT) and magnetic resonance imaging (MRI), further confused matters with respect to the relevance of volumetric and three-dimensional measurements versus bidimensional measures in response assessments. The combination of all these factors resulted in a situation in which response criteria were no longer comparable among research organizations. This was the circumstance that the original WHO publication had aimed to avoid.
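The WHO scheme described above can be illustrated with a minimal Python sketch. It sums the bidimensional products over all measured lesions and applies the 50% and 25% thresholds quoted in the text. The function names are hypothetical, the 4-week confirmation requirement and the appearance of new lesions are not modeled, and measuring PD against the baseline sum is an assumption made for illustration (as noted above, research groups differed on exactly this point).

```python
from typing import Sequence, Tuple

# Each lesion is (greatest diameter, greatest perpendicular diameter), in cm.
Lesion = Tuple[float, float]


def who_tumor_burden(lesions: Sequence[Lesion]) -> float:
    """Sum of bidimensional products over all measured lesions."""
    return sum(diameter * perpendicular for diameter, perpendicular in lesions)


def who_response(baseline: Sequence[Lesion], followup: Sequence[Lesion]) -> str:
    """Classify a follow-up scan using the WHO size thresholds only.

    Assumptions (not from the source): PD is measured against the baseline
    sum, confirmation at 4 weeks is ignored, and new lesions are not handled.
    """
    base = who_tumor_burden(baseline)
    if base <= 0:
        raise ValueError("baseline must contain measurable disease")
    current = who_tumor_burden(followup)
    if current == 0:
        return "CR"  # all measured tumor has disappeared
    change = (current - base) / base
    if change <= -0.50:
        return "PR"  # 50% or greater decrease
    if change >= 0.25:
        return "PD"  # 25% or greater increase
    return "NC"  # neither PR nor PD criteria met


baseline = [(4.0, 3.0), (2.0, 2.0)]  # burden = 12 + 4 = 16
print(who_response(baseline, [(3.0, 2.0), (1.0, 1.0)]))  # burden 7
print(who_response(baseline, [(4.0, 3.0), (3.0, 3.0)]))  # burden 21
```

In this toy example the first follow-up scan is a 56% decrease in the summed products and the second is a 31% increase, so the two calls fall in different categories. Changing whether PD is judged on the summed burden or on a single lesion would change the second result, which is the kind of inconsistency the text describes.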
Response Evaluation Criteria in Solid Tumors
Potential advantages of the RECIST over the WHO criteria [17] include the fact that the RECIST give specific size requirements for measurable lesions at baseline, distinguish target from nontarget lesions, allow the maximum number of target lesions to be followed up to a total of 10, and provide a baseline tumor burden (smallest sum of longest diameters from the start of treatment) for determining PD. The RECIST also state that all target lesions should be measured to determine PD instead of one or more measurable lesions [17]. The RECIST are predicated on unidimensional and bidimensional measurements being comparable and assume metastases are spherical and change proportionally. Application of the WHO criteria and RECIST to the same patients in 14 studies with a wide range of cancers indicated very similar results for all response categories. Results from this analysis indicated that 91.9% of patients evaluated had the same date of disease progression with the WHO criteria and RECIST; 7.3% had earlier disease progression with the WHO criteria and 0.9% had earlier disease progression with the RECIST (Table 2) [12]. This change is important to PFS since the PFS time by the RECIST will be longer than by the WHO criteria.
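The assumption that unidimensional and bidimensional measurements are comparable can be checked with simple geometry. The sketch below, for an idealized lesion that changes proportionally in all directions, converts a change in the longest diameter into the corresponding change in the bidimensional product. The 30% decrease and 20% increase used here are the PR and PD thresholds from the published RECIST guideline; they are not stated in this section, and the function names are hypothetical.

```python
import math


def product_change_from_diameter_change(diameter_change: float) -> float:
    """Fractional change in the bidimensional product when both diameters
    scale by the same factor (idealized, proportionally changing lesion)."""
    scale = 1.0 + diameter_change
    return scale ** 2 - 1.0


def diameter_change_from_product_change(product_change: float) -> float:
    """Inverse mapping: fractional diameter change for a given product change."""
    return math.sqrt(1.0 + product_change) - 1.0


# RECIST PR threshold (30% diameter decrease) expressed as a product change.
print(f"{product_change_from_diameter_change(-0.30):+.1%}")
# RECIST PD threshold (20% diameter increase) expressed as a product change.
print(f"{product_change_from_diameter_change(0.20):+.1%}")
# WHO PD threshold (25% product increase) expressed as a diameter change.
print(f"{diameter_change_from_product_change(0.25):+.1%}")
```

Under these assumptions the PR thresholds nearly coincide (a 30% diameter decrease is a 51% product decrease), whereas the PD thresholds do not: a 20% diameter increase is a 44% product increase, while the WHO 25% product increase corresponds to only about a 12% diameter increase. This is one plausible geometric reading of why progression tends to be called earlier by the WHO criteria than by the RECIST in the comparison above; real lesions are rarely spherical, so it is an illustration only.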
Problems with the RECIST in Clinical Trials
The RECIST have been used extensively in clinical trials and are generally viewed as an important advancement over the criteria provided by the WHO [18]. However, these guidelines and criteria do have significant limitations. The RECIST are intentionally terse, and this can lead to confusion in how to apply appropriate measurement techniques across centers [16]. In addition, unidimensional measurements of the type set forth in the RECIST may not be suitable for all tumor types, most notably those with nonspherical growth patterns (e.g., malignant pleural mesothelioma) [19–22]. Definition of target lesions may also limit the utility of the RECIST. Target lesions defined by the RECIST may not represent burden of disease [23], and changes in tumor characteristics may confound evaluation (e.g., pleural effusions in lung cancer [24]). There are also limitations of the RECIST with respect to determination of disease progression. Response assessment as measured by the RECIST has been shown to have some discrepancies with WHO-determined responses. These appear to occur most often at the PR–SD and SD–PD "borders." This difference may be problematic when new experimental therapies are compared with conventional agents whose response rates have been established in historical trials. The apparent lower rate of disease progression with the RECIST may mean that more patients remain on therapy, and the percentages of patients with SD thus need to be interpreted with caution [25]. The RECIST also ignore the fact that changes in tumor size may not be directly correlated with disease progression in all therapeutic situations. Qualitative changes in tumors (e.g., myxoid degeneration in GIST) may not be reflected in tumor measurements, and this can result in erroneous classification of the response to treatment.
Standard anatomic imaging techniques are often inadequate for evaluating malignancies, particularly when monitoring treatment responses for agents that do not cause tumor shrinkage (i.e., cytostatic agents) or for slow-progressing cancers or those malignancies that metastasize diffusely [26]. Thus, morphologic evaluation based solely on one- or two-dimensional measurements may not directly reflect biological changes in tumors associated with either the disease itself or its treatment [27]. Moreover, anatomical changes in the tumor as described by the RECIST may be detected later than functional changes in some circumstances (e.g., in GISTs treated with imatinib) [18]. The use of a primary tumor for response assessment, if the tumor is localized in a hollow organ (e.g., the esophagus), also makes measurements based on the RECIST difficult [18]. Finally, it is important to remember that the RECIST were developed on the basis of discussions carried out in the 1990s and published in 2000. As a result, they do not reflect many advances in imaging technology that have occurred over the past decade. Newer imaging and image-processing modalities may allow changes not considered in the RECIST to be included in revised response criteria [28]. For example, a comparison of relative values of manual unidimensional measurements and automated volumetry with multidetector-row computed tomography (MDCT) for longitudinal treatment response assessment in patients with pulmonary metastases indicated that MDCT provided better reproducibility of response evaluation and should be preferred over manual measurements in these patients [28]. The following section further explores the application of newer imaging technologies in assessing the efficacy of therapies for solid tumors.
Great advances in image acquisition and processing techniques are improving both staging of solid tumors for treatment planning and evaluation of new therapies. Surrogate markers of tumor response are also being developed with the use of functional imaging techniques that provide greater insight into tumor responses to therapy (e.g., changes in tumor perfusion, permeability, blood volume, and oxygenation). Combination of imaging techniques, such as positron emission tomography (PET) and computed tomography (CT), can provide investigators and clinicians with both anatomical and functional information. There is now substantial evidence that the use of PET with fluorine-18-fluorodeoxyglucose (18FDG-PET), particularly in conjunction with CT, can improve the accuracy of cancer staging with a high sensitivity for detecting small-volume disease [29]. The application of newer imaging methods that permit more rapid and precise evaluation of tumors has the potential to decrease both the sample sizes required for and duration of clinical trials by providing an early indication of therapeutic response that is well correlated with clinical outcomes (e.g., OS) for chemo- and radiotherapy [26]. The use of these new modalities and advances in transmission, storage, quality assurance, and analysis of images could streamline clinical trials of new treatments and accelerate new drug approvals [26].
CT
Choi and associates evaluated a series of 40 patients treated with imatinib for recurrent or metastatic GISTs who had undergone both PET and CT evaluation to determine the CT findings that could differentiate those who had a good response by PET and those who did not [30]. They found that a decrease in tumor size of [...]
Benjamin and associates confirmed the observations of Choi et al. [30] in a separate group of 58 patients and then evaluated all 98 patients by the RECIST and the Choi criteria. All patients had pretreatment and follow-up CT scans. Disease-specific survival (DSS) and TTP were analyzed by response category. There were 45 (46%) good responders and 53 (54%) poor responders by the RECIST. In contrast, there were 81 (83%) good responders and 17 (17%) poor responders by the Choi criteria [31]. Despite the almost doubling of the response rate when patients were assessed by the Choi criteria versus the RECIST, patients with good responses by the Choi criteria on CT at 8 weeks after the start of treatment had equivalent DSS to that of patients with a CR or PR at any time by the RECIST. In addition, TTP and DSS were significantly correlated with the Choi response group, but not with the response group by the RECIST [31]. These results support the conclusion that the Choi response criteria, which incorporate tumor density and small changes in tumor size on CT, are more sensitive and accurate than the RECIST in assessing the response of GISTs to imatinib treatment. These results have been reproduced at other institutions; however, further validation needs to be completed [32]. Advances in CT technology are likely to further increase its usefulness for the evaluation of cancer therapies. Greater numbers of detectors in CT scanners offer better three-dimensional reconstruction and volumetric measurement [33], but the lack of a sufficient number of centers with appropriate scanners to process data limits the organization of large-scale, multicenter clinical trials.
Automated collection and analysis of CT data are vital but not widely available, and manual data collection and analysis are expensive and time-consuming [34]. Another important limitation of CT methods is that heterogeneity of tumors (e.g., hypoxic regions) can confound volumetric measurements [35].
18FDG-PET
Effectiveness of 18FDG-PET
18FDG-PET scanning has also been shown to be a useful method for prediction of outcomes in patients with high-grade extremity soft tissue sarcomas treated with chemotherapy. Schuetze and colleagues evaluated 46 patients with high-grade localized sarcomas with 18FDG-PET. The maximum standardized uptake value (SUVmax) of tumors was measured before neoadjuvant chemotherapy and again prior to surgery. Resected specimens were examined for residual viable tumors. Patients with a baseline tumor SUVmax
Results from additional studies have demonstrated that 18FDG-PET is useful for the staging of solid tumors and for assessing responses to neoadjuvant therapy [41, 42]. Review of the results from 16 studies of patients with esophageal carcinoma indicated that the accuracy of 18FDG-PET in assessing response to treatment was similar to that for endoesophageal ultrasonography and significantly superior to that for CT. The staging value of 18FDG-PET was limited for the detection of locoregional metastases, but the technique was effective for the detection of distant lymphatic and hematogenous metastases [42]. 18FDG-PET has also been shown to be useful for monitoring results of therapy in patients with stage III non-small cell lung cancer [41] and for identification of progressing lesions and for detecting flares in tumor lesions that were previously under control [43, 44]. New positronic substrates will likely expand the utility of PET [44]. The most widely used PET tracer for osteosarcoma is 18FDG. The other clinical PET tracer with reported utility for osteosarcoma imaging is 18F-fluoride ion. 18F-labeled monoclonal antibodies, 18F-fluoromisonidazole, 18F-labeled arginine–glycine–aspartic acid (RGD)-containing glycopeptide, 3H-thymidine, 13N-methionine, and PET of p53 transcriptional activity in osteosarcoma are all being investigated [44].
18FDG-PET Issues
MRI
Magnetic resonance spectroscopy may predict response in a manner analogous to PET [48]. Both of these methods permit imaging of the entire body and combine functional and anatomical information. 18FDG-PET and MRI spectroscopy are valuable techniques for monitoring tumor response in patients undergoing chemo- and radiotherapy, particularly when evaluating early responses. In contrast, MRI is particularly useful for assessing metastasis and infiltration of bone marrow and the central nervous system. Dynamic contrast-enhanced MRI (DCE-MRI) is a new imaging method for assessing the physiological state of tumor vascularity in vivo. This method uses available imaging techniques and contrast agents and assays the kinetics of tumor enhancement during bolus i.v. contrast administration [49]. DCE-MRI has been shown to be useful for detecting microvascular changes in tumors in response to isolated limb perfusion within 24 hours of treatment in experimental animals [50] and for correctly predicting tumor responses to therapy in a small cohort of 12 patients with histologically proven high-grade soft tissue sarcoma [51]. DCE-MRI correctly predicted tumor response in 8 of 10 evaluable patients. Early rapidly progressive enhancement was correlated histologically with residual viable tumors, and late and gradual, or absence of, enhancement was correlated with necrosis, predominantly centrally located, or granulation tissue [51]. These preliminary results show that DCE-MRI offers the potential for noninvasive monitoring of responses to isolated limb perfusion in soft tissue sarcomas.
The results summarized in this review support the conclusion that the RECIST and the WHO criteria for evaluation of response in solid tumors need to be replaced by alternative endpoints. While these criteria have been useful, change is mandated in order to provide more rapid assessment of tumor responses and to reflect the advances in imaging technology that have occurred over the past decade. Prospective trials are needed to compare new response criteria with established endpoints and to validate imaging-based response rates as surrogates for traditional endpoints. New approaches developed for evaluation of sarcomas should subsequently be assessed in other tumor types. However, the underlying biology of tumors may impact the applicability of imaging-based criteria, and response kinetics and landmarks may vary among tumor types. In addition, alternative response criteria may have different prognostic values for cytotoxic versus cytostatic therapies.