Lung Cancer |
Department of Radiology, New York Presbyterian Hospital, Weill Medical College, New York, New York, USA
Key Words. Computed tomography • Lung cancer • Screening • Diagnostic testing
Correspondence: Claudia I. Henschke, Ph.D., M.D., Department of Radiology, New York Presbyterian Hospital-Weill Medical College, 525 East 68th Street, New York, NY 10065, USA. Telephone: 212-746-2529; Fax: 212-746-2811; e-mail: chensch{at}med.cornell.edu
Received August 21, 2007; accepted for publication November 14, 2007.
Disclosure: No potential conflicts of interest were reported by the authors, planners, reviewers, or staff managers of this article.
| ABSTRACT |
|---|
The goal of the Early Lung Cancer Action Project investigators was to develop an efficient methodology that would provide an ever-accumulating, continually updated body of evidence for evaluating emerging technologies for cancer screening. This methodology recognizes that screening is a sequential process that starts with the pursuit of the early diagnosis of cancer followed by early treatment. It also recognizes that diagnostic research is fundamentally different from treatment research. To fully understand the current discussions on the evidence for lung cancer screening, key definitions are provided, including the distinction between the first (baseline) round of screening and all subsequent rounds of repeat screening, and between baseline and repeat cancers and their distribution by cell type. These definitions are critical in analyzing the results of various screening reports because not all reports use them.
To provide optimal screening, a regimen for the diagnostic workup must be specified starting with the definition of the initial test, its positive result, and the workup for a positive result leading to a diagnosis of cancer. Assessment of diagnostic performance does not require a control group, but does require confirmation of the diagnosis.
For assessment of the effectiveness of early treatment, a comparison group is needed. The comparison group may be formed by randomly assigning people with screen-diagnosed lung cancer to immediate or delayed treatment, as has been done for prostate cancer. This provides a direct assessment of any potential overdiagnosis of the cancer resulting from screening. Alternatively, a quasiexperimental control group can be used consisting of participants diagnosed with the cancer who have refused or delayed their treatment even though they are candidates for it.
| KEY CONCEPTS AND DEFINITIONS |
|---|
Screening for a cancer should be considered when the cancer is significant in terms of incidence and mortality, treatment of early-stage disease is better than treatment of late-stage disease, and there is a test that provides an earlier diagnosis—lead time—rather than a later, symptom-prompted diagnosis [1, 2].
Lung cancer kills more people than any other cancer worldwide. In the U.S. it kills more people than colon, breast, and prostate cancer combined and more women than breast cancer (Fig. 1) [3].
|
10 mm [6] (Fig. 2), and this rate decreases as the tumor size increases [7, 8]. Although the curability rate is high for stage I lung cancer, <15% of lung cancers are diagnosed in that stage and so the overall curability rate for lung cancer for all stages combined is <10% [8].
The baseline round is inherently different from repeat rounds of screening because it is the first round and no prior results are available for comparison. One consequence of baseline screening is that cancers with a longer latent (asymptomatic) phase are more frequently identified. This has been called length bias [2] and exists for any screening program, regardless of the design of the study or the cancer [2, 9]. While this difference exists between baseline and repeated screening, it does not exist for repeat rounds and thus repeat rounds can be pooled. The other consequence is that cancers found in repeat rounds are found earlier in their latent phase than in the baseline round [9], a fact not usually stated. To address the consequences of this length bias, the baseline round should be reported separately from repeat rounds.
In repeat screenings, the frequency of all cancer diagnoses should reflect that found in usual care, or in the absence of screening, for people having the same risk for lung cancer. The proportion by cell type [10] should also reflect the proportion found in usual care [11]. For lung cancer, the proportion by cell type in repeat rounds of screening is similar to that found in clinical practice [11], and differs from the proportion found in the baseline round (Fig. 4). Cancers found by screening, however, are smaller than those found in clinical practice.
| THE EARLY LUNG CANCER ACTION PROJECT APPROACH |
|---|
Initially, a head-to-head comparison of the new screening regimen with the previous regimen (e.g., the initial test being CT instead of chest radiography [CXR]) is performed. To maximize the efficiency of the study, high-risk participants can be enrolled, but all should be free from recognized symptoms and signs of cancer. Such a study would require a baseline and at least one repeat round of screening. If this limited study shows that the diagnostic performance (that is, stage distribution, proportion of screen-to-interim diagnoses, and lead time) is promising, then the regimen is updated and this updated regimen is then provided to participants at different institutions. Following further confirmation of the diagnostic performance, screening can then be provided to an expanded group of participants who are at a lower risk for the cancer. In the course of these successive studies, new technologies (e.g., positron emission tomography [PET], PET/CT, computer-aided diagnostics) can be integrated into the regimen. The data from these studies can be pooled to determine the curability of those diagnosed early and treated, provided that a common protocol and the proper quality assurance procedures needed for such a collaboration are in place [17, 18].
For curability determination, a comparison group is needed. The comparison group may be formed by randomly assigning people with screen-diagnosed lung cancer to immediate or delayed treatment (Fig. 5) as was done for prostate cancer [13]. The randomization could be further stratified by clinical stage and CT appearance of the cancer, or perhaps even be based on the results of percutaneous needle biopsy. The latter approach provides a direct assessment of the extent of overdiagnosis of lung cancer resulting from screening. Alternatively, a quasiexperimental control group can be used, consisting of participants diagnosed with lung cancer who have refused or delayed their treatment even though they are candidates for it. This is a valid approach as long as the choice of the treatment or lack of it is independent of the cancer prognosis or other factors that might influence the ultimate outcome and these factors are documented at the time of enrollment into the screening program and not at the time of diagnosis or treatment [14]. A third alternative is to compare the mortality rate in the screening program once sufficient deaths have occurred with that in a nonscreened comparison group that has a similar risk profile for lung cancer. Finally, a fourth approach is to analyze the temporal pattern of deaths in the screened cohort after initiation of screening and compare deaths in the early years with deaths in the later years when the benefit of screening should become apparent if the screening is effective [9, 17, 18].
The ELCAP approach provides for further efficiencies. Follow-up of participants is required only for those diagnosed with lung cancer, usually some 1%–6%, depending on the risk of the participants. This markedly reduces follow-up requirements. To fully document all interim diagnoses occurring between the annual rounds of screening, the protocol requires that each person who has not returned for repeat screening be followed up to 18 months after their prior screening [9, 18]. If no cancer has been identified by either a symptom-prompted workup or any other reason, then the person is considered to have stopped participation in the screening program and no further follow-up is required.
| ELCAP TO NEW YORK ELCAP TO INTERNATIONAL ELCAP |
|---|
ELCAP enrolled 1,000 participants at two institutions in New York who were at high risk for lung cancer because of their age and smoking history (≥60 years of age with a history of at least 10 pack-years of cigarette smoking) [20]. In the baseline round, each participant received a CT scan and CXR, which were read independently. If the result was positive, workup proceeded according to a stated regimen. In the baseline round, 27 (2.7%) lung cancer cases were screen diagnosed and two (0.2%) were interim diagnosed among the 1,000 participants. CXR identified seven (0.7%) cancers, and also missed the same two (0.2%) interim-diagnosed cases. Thus, CXR missed 20 (74%) of the 27 screen-diagnosed cancers found by CT (Table 1, Fig. 6). More importantly, CXR missed 20 (80%) of the 25 clinical stage I cancers, so that CXR screening was stopped.
Annual repeat rounds of screening in ELCAP resulted in seven screen diagnoses of lung cancer and no interim diagnoses (7/1,184 = 0.59%) (Fig. 7) [23]. Of the seven, six (86%) were stage I (Table 1). The frequency of repeat diagnosis should be the same as that found in the absence of screening among people with the same risk characteristics. Given that the baseline frequency of lung cancer diagnosis for ELCAP using CXR was quite similar to that found in the MLP [21], it is to be expected that the frequency of diagnosis in the repeat rounds of ELCAP CT of 0.59% would be similar to that in the MLP of 0.55% [24].
The diagnostic performance measures are unaffected by the frequency of lung cancer diagnosis. The proportion diagnosed in stage I in all three studies remained around 85% for both the baseline and annual repeat rounds of screening in NY-ELCAP [26] and I-ELCAP [27]. Interim diagnoses were rare and found prior to the first annual repeat screening, but not between the repeat rounds of screening. The estimated lead time for CT screening, based on the ratio of baseline to repeat screening, also remained around 4.5 years in all three studies, much higher than the 1.5 years estimated using the same approach for CXR screening using the MLP [21, 24] or the 2.5 years estimated for mammography screening for breast cancer [25].
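The lead-time estimate described above can be reproduced from the ELCAP counts reported earlier. The sketch below is illustrative only: it assumes a steady state in which the baseline detection rate is roughly the annual repeat detection rate multiplied by the mean lead time, and the function names are hypothetical rather than taken from the ELCAP protocol.

```python
def detection_rate(cancers: int, screenings: int) -> float:
    """Fraction of screenings that led to a screen diagnosis of lung cancer."""
    if screenings <= 0:
        raise ValueError("screenings must be positive")
    return cancers / screenings


def lead_time_years(baseline_rate: float, annual_repeat_rate: float) -> float:
    """Estimate mean lead time as baseline rate divided by annual repeat rate.

    Assumption: steady state (prevalence ~ incidence x duration) and annual
    repeat rounds, so the ratio is expressed in years.
    """
    if annual_repeat_rate <= 0:
        raise ValueError("annual_repeat_rate must be positive")
    return baseline_rate / annual_repeat_rate


# ELCAP counts as quoted in the text.
baseline = detection_rate(27, 1_000)  # 27 screen diagnoses at baseline
repeat = detection_rate(7, 1_184)     # 7 screen diagnoses in annual repeat rounds

elcap_lead_time = lead_time_years(baseline, repeat)
print(f"baseline {baseline:.2%}, repeat {repeat:.2%}, "
      f"lead time ~{elcap_lead_time:.1f} years")
```

With the ELCAP counts this gives a ratio of about 4.6, in line with the figure of around 4.5 years cited for the three studies.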
Diagnostic performance depends on the imaging used in the regimen and on the regimen itself. The introduction, after ELCAP, of multidetector-row CT scanners and of reading images on computer monitors instead of film was expected to increase the proportion of stage I diagnoses and decrease the interim diagnoses. It did slightly increase the proportion diagnosed in stage I, but it also substantially increased the proportion of screen diagnoses in the baseline round; in other words, it decreased the proportion of baseline interim diagnoses (symptom prompted) from 7% (2/29) in ELCAP to 5% (3/104) in NY-ELCAP to 1% (5/405) in I-ELCAP.
These three studies provided important information for updating the regimen. The implications of finding nonsolid and part-solid nodules were recognized [28]. Their significance was also recognized [29], the workup of nodules improved [30, 31], the usefulness of growth as an indication for biopsy was tested [32–36], and the workup of other findings such as mediastinal masses [37], cardiac calcifications [38], and emphysema [39] was formulated.
Long-term follow-up of ELCAP, NY-ELCAP, and I-ELCAP provided the estimated curability rate of 82% for all diagnoses (screen and interim combined), regardless of stage and treatment (Fig. 8), essentially unchanged from that reported in 2006 [27]. For cancers diagnosed at an early stage and promptly treated, the estimated curability rate was 93%, whereas that of the comparison group with screen-diagnosed stage I lung cancer who refused treatment was 0%, because all died of lung cancer. The number of deaths prevented by early treatment of lung cancers diagnosed by CT screening was estimated by the proportion diagnosed in stage I multiplied by the curability rate in stage I (85% × 93% = 78%), or alternatively by the overall curability rate of 82%. This compares with some 7% of deaths that are currently prevented in the absence of screening, given by one minus the ratio of the number of deaths to the number of new cases of lung cancer each year (164,000/174,000, Fig. 1).
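The two estimates in the paragraph above are simple arithmetic and can be checked directly. The sketch below is an illustration, not the authors' analysis: the function names are hypothetical, and the inputs are the rounded figures quoted in the text, so the results can differ slightly from the published percentages, which may rest on unrounded values.

```python
def deaths_prevented_with_screening(stage1_share: float,
                                    stage1_cure_rate: float) -> float:
    """Share of lung cancer deaths prevented: P(stage I) x P(cure | stage I)."""
    return stage1_share * stage1_cure_rate


def deaths_prevented_without_screening(annual_deaths: int,
                                       annual_new_cases: int) -> float:
    """Share of cases that do not end in death: 1 - deaths / new cases."""
    if annual_new_cases <= 0:
        raise ValueError("annual_new_cases must be positive")
    return 1 - annual_deaths / annual_new_cases


# Rounded inputs as quoted in the text.
with_ct = deaths_prevented_with_screening(0.85, 0.93)
without_screening = deaths_prevented_without_screening(164_000, 174_000)

print(f"with CT screening: {with_ct:.1%}")
print(f"without screening: {without_screening:.1%}")
```

With these rounded inputs the product is about 79% and the no-screening figure is about 6%, the same order as the 78% and roughly 7% cited above.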
CT screening in ELCAP and NY-ELCAP had fewer late-stage lung cancers than did CXR screening in the MLP and Memorial Sloan-Kettering Lung Project (MSKLP) [40]. In this comparison, the focus must be on the repeat rounds of screening. The overall frequency of ELCAP lung cancer diagnosis is close to that from the MLP and that from NY-ELCAP is close to that from the MSKLP, but the absolute and proportional numbers of late-stage cancers are significantly less for ELCAP and NY-ELCAP (Fig. 9). Figure 9 shows the higher proportion of late-stage cancer diagnoses, 0.29% from the MLP compared with 0.08% from ELCAP, and 0.22% from the MSKLP compared with 0.05% from the NY-ELCAP. This significant stage shift to early-stage cancers is highly suggestive of a decrease in the mortality rate when CT screening is used compared with CXR screening. Such a shift to earlier stage cancers with a subsequent lower number of deaths has been demonstrated for cervical cancer screening [41, 42].
Figure 10 also illustrates one of the key problems in a recent paper by Bach et al. [46], which focused mainly on the early years (the first 3–4 years after baseline). However, even in that analysis, a decrease in the cumulative mortality rate in years 5 and 6 can already be seen. A focus on the appropriate time when the reduction in mortality can reasonably be expected has already been highlighted in breast cancer screening [47, 48] and in colorectal screening [49, 50], both of which illustrated the need for longer screening and follow-up.
| ADVANTAGES OF THE ELCAP APPROACH |
|---|
The ELCAP approach does not have the problems that have been recognized as occurring in randomized trials [14, 51–61]. Biases of randomized trial results include having: (a) an insensitive outcome measure, the cumulative mortality rate, which does not focus on the relevant time when the number of deaths prevented by screening is seen [47, 51–55]; (b) an inadequate number of rounds of screening, so that a decrease in the mortality rate cannot be seen [19, 45]; (c) protocol nonadherence [45, 51, 59, 61]; (d) delays in diagnosis and/or treatment or participants choosing not to be treated [58], without any analysis of whether this proportion was the same in both arms of the trial; and (e) reliance on death certificates [60]. By the time that randomized screening trials are completed, there may also have been considerable technology drift so that the results are no longer relevant. This is demonstrated by the PLCO trial [62], which started in 1993 and will report the results of CXR in the next decade, when CXR is no longer relevant. Having an inadequate number of rounds of screening was clearly illustrated by the Minnesota Colorectal Study, which required extension to 10 years of screening from the originally planned 5 years [50]. Items (c)–(e) are particularly troublesome if the frequency of occurrence is different in the two arms of the randomized trial [45, 61]. Although randomization is used to provide comparability of the two arms at enrollment, it does not ensure comparability in those diagnosed with lung cancer (only 1%–3% of all participants), nor does it ensure comparability as to whether all had a timely diagnosis or treatment. To help overcome these problems, a large number of participants is required, markedly increasing the cost and time required for such trials. 
An alternative that was suggested by the designers of some of these large randomized screening studies [61] was to perform a limited mortality analysis of these trials focusing only on the relevant cases in each arm of the randomized trials instead of all the cases.
Probably because of the considerable cost and time and inherent difficulties of these trials, only one randomized screening study for colorectal cancer (the Minnesota study) [50], one for breast cancer [48], and one for lung cancer (which evolved into three separate smaller studies) [63] have been completed in the U.S. Consequently, emerging, promising modalities are not evaluated scientifically. Often the tests are simply integrated into the medical care system. For colon cancer screening, new tests (e.g., colonoscopy and/or virtual colonoscopy with CT) have essentially replaced sigmoidoscopy, and the latter is used in the ongoing PLCO trial, but the efficacy of these new tests has not been tested using a randomized screening trial design. For lung cancer, CT was already available when the CXR and sputum trials were started in the mid 1980s, but CXR is still being tested in the PLCO trial [22, 62], and these results will not be reported until the next decade. For coronary artery calcification screening with CT, no randomized trial has ever been performed, although large national cohort studies have been started. Thus, frequently, scientific evidence for formulation of national policies is not available for these emerging technologies.
| CONCERNS ABOUT THE ELCAP APPROACH |
|---|
Length Bias
Length bias affects all screening programs, ELCAP as well as randomized screening trials. It is introduced by the very process of screening. The baseline round is inherently different from repeat rounds of screening because cancers with a longer latent (asymptomatic) phase are more frequently identified in the baseline round [2], but cancers found in repeat rounds are found earlier in their latent phase than in the baseline round [9]. While this difference exists between baseline and repeated screening, it does not exist for repeat rounds, and thus repeat rounds can be pooled. The solution is to report the results of the baseline round separately from the results of repeat rounds, as illustrated in Figure 4.
| LEAD-TIME BIAS |
|---|
Randomized screening trials also have lead-time bias when they do not provide for sufficient follow-up in their design, but in this case the lead-time bias exists because the mortality difference is underestimated. The period of screening and follow-up needs to be long enough (dependent on lead-time) to focus on the time period when a decrease in deaths as a result of screening can be anticipated and when there is no longer a lead-time bias [45, 47, 49].
Lead-time bias is also introduced when comparing a treatment that has a lead time with a treatment without a lead time. I-ELCAP did not do this. I-ELCAP evaluated the effectiveness of treatment contrasted with no treatment in screened patients, all of whom have the same lead time (Fig. 11).
Overdiagnosis occurs in two ways: (a) a "cancer" is detected by screening that would never have been life-threatening, even when not resected, but because of screening it was detected and the person is thus subjected to diagnostic tests and treatment when in fact that person was not at risk for dying from the cancer; (b) a genuine life-threatening cancer is diagnosed but the person dies of another disease or accident so that the screening has not saved the life. In other words the person dies of a competing cause [65, 66]. This concern needs to be addressed in the eligibility criteria so that the life expectancy of participants is sufficient to justify the screening. Of these two concerns, the main concern is that those with a slow-growing cancer may inflate the estimated curability rate and that these patients would have also undergone surgery for a non–life-threatening lesion.
For lung cancer, multiple reports have shown that identifying slow-growing cancers is not a significant concern and does not account for the survival differences reported in the three randomized screening trials for lung cancer. Almost all of those who were diagnosed with stage I lung cancer as a result of CXR screening and refused treatment died of their disease, as demonstrated by Flehinger et al. [58], Yankelevitz et al. [67], and Sobue et al. [68] in Japan. In the absence of screening, this was also found in the analyses by Henschke et al. [69] using data in the Surveillance, Epidemiology, and End Results (SEER) registry and by Raz et al. [70] based on California registry data.
The I-ELCAP protocol [18] directly addresses the issue in several ways: (a) In both baseline and repeat rounds of screening, the regimen requires demonstration of in vivo growth at a malignant rate prior to recommending biopsy. (b) All resected specimens are reviewed by an international panel of pathology experts who confirm that they are all genuine lung cancers and that 95% of them are already invasive [10]. (c) Those who delayed their diagnosis or treatment showed progression of their disease in NY-ELCAP [26] and I-ELCAP [27] and those who were in stage I who had no treatment all died from lung cancer [27] (Fig. 8).
In ELCAP, NY-ELCAP, and I-ELCAP, the lung cancer diagnosis rate in annual repeat rounds of screening was essentially the same as in prior screening trials performed in the 1970s (Figs. 6 and 7). Thus, there is no evidence of overdiagnosis in the repeat rounds of screening. Also, repeat cancers in ELCAP, by definition, were not seen in the prior screening 1 year earlier. Thus, the volume doubling time of the cancer must be 200 days or less, if the lower limit of nodule detectability is one with a diameter of 2 mm [65]. A cancer with a volume doubling time of 200 days is an aggressive cancer and does not fit the profile of an overdiagnosed one.
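The doubling-time argument above follows from the standard volume doubling time formula for a nodule assumed to be spherical and growing exponentially, VDT = t ln 2 / (3 ln(d_end / d_start)). The sketch below is an illustration under those assumptions. The detected diameters are hypothetical examples, not ELCAP data, and because a nodule missed a year earlier may have been smaller than the 2 mm detectability limit, each result is an upper bound on the true doubling time.

```python
import math


def volume_doubling_time(d_start_mm: float, d_end_mm: float,
                         interval_days: float) -> float:
    """Volume doubling time in days for a spherical, exponentially growing nodule.

    Volume scales with diameter cubed, so
    VDT = interval * ln(2) / (3 * ln(d_end / d_start)).
    """
    if d_start_mm <= 0 or d_end_mm <= d_start_mm:
        raise ValueError("diameters must be positive and show growth")
    return interval_days * math.log(2) / (3 * math.log(d_end_mm / d_start_mm))


DETECTABILITY_LIMIT_MM = 2.0   # lower limit of nodule detectability from the text
SCREENING_INTERVAL_DAYS = 365  # annual repeat screening

# Hypothetical diameters at the repeat screen (illustrative values only).
for detected_mm in (4.0, 6.0, 8.0):
    bound = volume_doubling_time(DETECTABILITY_LIMIT_MM, detected_mm,
                                 SCREENING_INTERVAL_DAYS)
    print(f"detected at {detected_mm:.0f} mm: doubling time <= {bound:.0f} days")
```

A nodule first seen at 4 mm, for example, has doubled in diameter and therefore undergone three volume doublings in a year, giving a doubling time of at most about 122 days.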
For baseline screening, we determined that 87% of clinical stage I patients had genuine, life-threatening cancers, defined as the cancer having a doubling time <400 days [29]. When considered separately by nodule consistency, it was 96% for cancers manifesting as solid nodules, 90% for cancers manifesting as part-solid nodules, and 67% for cancers manifesting as nonsolid nodules [29]. Only adenocarcinoma is diagnosed in nonsolid nodules; hence, adenocarcinoma diagnosed in a nonsolid nodule is the most suspect case for being an overdiagnosed cancer [10]. Thus, we identified that some of the adenocarcinomas manifesting as nonsolid nodules were slow growing, in addition to the already well-known cell types such as typical carcinoids, which manifest as solid nodules and which are known to be slow growing.
If these steps are not sufficient to address the concern about overdiagnosis then, ethically, a randomized treatment trial could be performed in which patients with potentially overdiagnosed screen-diagnosed lung cancers are randomly assigned to either immediate treatment or delayed treatment as was done for prostate cancer [13].
The second issue of competing causes of death is addressed by setting reasonable admissibility requirements for screening based on actual data obtained from I-ELCAP. It is suggested that, because the lead time provided by CT is about 4.5 years, the life expectancy of a person undergoing CT screening should be at least 10 years. ELCAP further addressed competing causes of death directly by performing a survival analysis focusing on non–lung cancer deaths, which showed that older smokers and former smokers have a high life expectancy if they do not die from lung cancer. Their 10-year survival rate for death other than lung cancer was 93% [66].
| OTHER CT SCREENING TRIALS |
|---|
Similarly, stage I disease should be clearly defined in each study and in the context of screening. Stage I diagnoses should include both non-small cell and small cell lung cancers without lymph node metastases and also identify multiple adenocarcinomas without lymph node metastases, because the latter should have the benefit of resection [65] (Vazquez M, Carter D, Brambilla E et al., unpublished data). All the studies showed a high proportion of stage I diagnoses, in the range of 71%–100%. The frequency of stage I diagnosis depends on the regimen of screening and adherence to it and on the definition of a stage I diagnosis. Few interim diagnoses of lung cancer were reported, and among the studies that performed sputum cytology, only a few cancers were detected by sputum cytology alone.
Stimulated by the demand for CT screening resulting from the ELCAP publication, the National Lung Screening Trial (NLST) was developed [43, 44]. It was designed by the PLCO trial investigators together with the American College of Radiology Imaging Network and used the same design as prior randomized trials for lung cancer in the U.S. [21, 40, 50, 63]. The control arm of the NLST is identical to the control arms of the MSKLP and Johns Hopkins Hospital Lung Project, in which all received annual CXR screening [63]. Individuals in the intervention arm of the NLST were provided annual CT. Three rounds of screening were provided: a baseline and two annual repeat rounds. Enrollment in the NLST started in 2002 and ended by mid-2004, and follow-up ends in 2008, so that when the results are reported in 2009 the median follow-up time after diagnosis will be only some 4 years. The regimen of screening was not well defined and not enforced or checked at the participating institutions. The traditional outcome measure of cumulative mortality rate is to be used, and there is no mention of performing a limited mortality analysis that would focus on the timeliness of diagnosis or treatment or on deaths during the relevant time interval for which the screening benefit is expected. This is an important limitation of the traditional analysis of such trials, one that was recognized by the designers of the PLCO trial and NLST as early as 1983 [61]. Reanalysis of the MLP, focusing only on protocol nonadherence, showed that CXR might have provided as much as a 43% reduction in the number of deaths from lung cancer [59]. A pilot study of 3,000 participants was performed prior to the start of the NLST [78] to demonstrate that randomization was feasible, but it also suggested lack of adherence, particularly for those randomized to the CXR arm. A regimen for the workup of screen-detected nodules was specifically not included in that study, and this resulted in a low proportion of stage I diagnoses in the CT arm.
Further, the proportion of cancer diagnoses among those having invasive procedures was low in comparison with ELCAP, NY-ELCAP, and I-ELCAP, suggesting that there was a poor understanding of the importance of a workup protocol and also poor compliance with any of the known recommended workup algorithms [18].
Different cost-effectiveness analyses have shown that CT screening for lung cancer is very cost-effective [79–83], with the exception of one theoretical study [84]. These analyses look at the overall cost per life-year saved, but ideally a cost-effectiveness assessment would be done on an individualized basis [85]. Such an individualized assessment would determine the life expectancy of the person (in light of the personal risk indicators) and the risk for competing causes of death in order to determine how many years of life might be saved by the screening round that is being contemplated.
| FOOTNOTES |
|---|
Administrative support: Claudia I. Henschke, David F. Yankelevitz
Provision of study materials or patients: Claudia I. Henschke, David F. Yankelevitz
Collection/assembly of data: Claudia I. Henschke, David F. Yankelevitz
Data analysis and interpretation: Claudia I. Henschke, David F. Yankelevitz
Manuscript writing: Claudia I. Henschke, David F. Yankelevitz
Final approval of manuscript: Claudia I. Henschke, David F. Yankelevitz
| REFERENCES |
|---|