The Oncologist, Vol. 13, No. 1, 65-78, January 2008; doi:10.1634/theoncologist.2007-0153 © 2008 AlphaMed Press
CT Screening for Lung Cancer: Update 2007
Department of Radiology, New York Presbyterian Hospital, Weill Medical College, New York, New York, USA
Key Words. Computed tomography • Lung cancer • Screening • Diagnostic testing
Correspondence: Claudia I. Henschke, Ph.D., M.D., Department of Radiology, New York Presbyterian Hospital-Weill Medical College, 525 East 68th Street, New York, NY 10065, USA. Telephone: 212-746-2529; Fax: 212-746-2811; e-mail: chensch{at}med.cornell.edu
Received August 21, 2007; accepted for publication November 14, 2007.
Disclosure: No potential conflicts of interest were reported by the authors, planners, reviewers, or staff managers of this article.
Screening is the pursuit of the early diagnosis of cancer before symptoms occur. The purpose of early diagnosis is to provide early treatment, which potentially prevents death from the cancer. The usefulness of screening depends on how early the cancer can be diagnosed and how many deaths can be prevented by early treatment as compared with later symptom-prompted diagnosis and treatment. The goal of the Early Lung Cancer Action Project investigators was to develop an efficient methodology that would provide an ever-accumulating, continually updated body of evidence for evaluation of emerging new technologies for screening for cancer. This methodology recognizes that screening is a sequential process that starts with the pursuit of the early diagnosis of cancer followed by early treatment. It also recognizes that diagnostic research is fundamentally different from treatment research. To fully understand the current discussions on the evidence for lung cancer screening, key definitions are provided, including the differentiation between the first, baseline round of screening and all subsequent rounds of repeat screening and baseline and repeat cancers and their distribution by cell type. These definitions are critical in analyzing the results of various screening reports as they are not used by all. To provide optimal screening, a regimen for the diagnostic workup must be specified starting with the definition of the initial test, its positive result, and the workup for a positive result leading to a diagnosis of cancer. Assessment of diagnostic performance does not require a control group, but does require confirmation of the diagnosis. For assessment of the effectiveness of early treatment, a comparison group is needed. The comparison group may be formed by randomly assigning people with screen-diagnosed lung cancer to immediate or delayed treatment, as has been done for prostate cancer. 
This provides a direct assessment of any potential overdiagnosis of the cancer resulting from screening. Alternatively, a quasiexperimental control group can be used consisting of participants diagnosed with the cancer who have refused or delayed their treatment even though they are candidates for it.
Screening has been defined as the pursuit of the early diagnosis of cancer before symptoms occur. The purpose of early diagnosis is to provide early treatment, which potentially prevents death from the cancer. The usefulness of screening depends on how early the cancer can be diagnosed and how many deaths can be prevented by early treatment as compared with later symptom-prompted diagnosis and treatment. To fully understand current discussions on the evidence for lung cancer screening, key concepts of screening and definitions are needed. Screening for a cancer should be considered when the cancer is significant in terms of incidence and mortality, treatment of early-stage disease is better than treatment of late-stage disease, and there is a test that provides an earlier diagnosis—lead time—rather than a later, symptom-prompted diagnosis [1, 2]. Lung cancer kills more people than any other cancer worldwide. In the U.S. it kills more people than colon, breast, and prostate cancer combined and more women than breast cancer (Fig. 1) [3].
For lung cancer, the staging system is based on differences in lung cancer survival [4, 5]. The curability rate, as estimated by the 10-year survival rate for stage I disease, is high, particularly when the cancer diameter is 10 mm [6] (Fig. 2), and this rate decreases as the tumor size increases [7, 8]. Although the curability rate is high for stage I lung cancer, <15% of lung cancers are diagnosed in that stage and so the overall curability rate for lung cancer for all stages combined is <10% [8].
Screening for a cancer is a repetitive process, starting with the baseline round followed by repeat rounds of screening (Fig. 3) at intervals defined by a regimen of screening. The regimen also defines the initial diagnostic test (e.g., Papanicolaou smear, mammogram, fecal occult blood, chest radiograph, computed tomography [CT] scan) and the sequence of tests to be performed leading to a rule-in diagnosis of the cancer. The regimen for the first, baseline round may be different from the regimen for the repeat rounds as no prior results are available for the former, and for other reasons discussed below.
A diagnosed case of the cancer is classified as a screen diagnosis if the diagnosis resulted from the workup of an abnormality identified on the initial test of the regimen (Fig. 3). A diagnosis of cancer resulting from findings identified in the baseline regimen is classified as a baseline cancer. Similarly, when the diagnosis is made as a result of findings identified in the repeat regimen, it is classified as a repeat cancer. If the diagnosis of cancer resulted because of a symptom-prompted workup before the next screening, it is classified as an interim diagnosis. These are standard definitions and it is critical to ensure that the same definitions are being used when comparing results of different studies. The baseline round is inherently different from repeat rounds of screening because it is the first round and no prior results are available for comparison. One consequence of baseline screening is that cancers with a longer latent (asymptomatic) phase are more frequently identified. This has been called length bias [2] and exists for any screening program, regardless of the design of the study or the cancer [2, 9]. While this difference exists between baseline and repeated screening, it does not exist for repeat rounds and thus repeat rounds can be pooled. The other consequence is that cancers found in repeat rounds are found earlier in their latent phase than in the baseline round [9], a fact not usually stated. To address the consequences of this length bias, the baseline round should be reported separately from repeat rounds. In repeat screenings, the frequency of all cancer diagnoses should reflect that found in usual care, or in the absence of screening, for people having the same risk for lung cancer. The proportion by cell type [10] should also reflect the proportion found in usual care [11]. 
For lung cancer, the proportion by cell type in repeat rounds of screening is similar to that found in clinical practice [11], and differs from the proportion found in the baseline round (Fig. 4). The size distribution, however, of cancers found by screening is smaller than in clinical practice.
As screening is the pursuit of an early diagnosis followed by early treatment, both the diagnostic and treatment performance need to be determined. Key diagnostic performance measures of the regimen are: (a) the proportion of screen diagnoses among all diagnoses, (b) the stage distribution of the diagnosed cancers, and (c) the estimated lead time given by the ratio of the number of diagnoses in the baseline round to those in a single repeat round. If the ratio is one, meaning that the screening regimen does not provide lead time, it will not provide an earlier diagnosis. Different from diagnosis, treatment potentially changes the natural course of the disease, and thus to determine its effectiveness, a comparison is needed [12–14].
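The lead-time measure in (c) can be sketched in a few lines of Python. This is a minimal illustration, not the investigators' own method: the function name `lead_time_ratio` and the counts are hypothetical, and it assumes that diagnosis frequencies (diagnoses per screening) rather than raw counts are compared, so that rounds of different size remain comparable.

```python
def lead_time_ratio(baseline_dx, baseline_n, repeat_dx, repeat_n):
    """Ratio of baseline-round to single-repeat-round diagnosis frequency.

    A ratio near 1 suggests the regimen provides no lead time; with annual
    repeat rounds, larger ratios suggest a longer lead time.
    """
    if baseline_n <= 0 or repeat_n <= 0:
        raise ValueError("screening counts must be positive")
    if repeat_dx <= 0:
        raise ValueError("need at least one repeat-round diagnosis")
    baseline_rate = baseline_dx / baseline_n  # diagnoses per baseline screening
    repeat_rate = repeat_dx / repeat_n        # diagnoses per repeat screening
    return baseline_rate / repeat_rate


# Hypothetical counts, for illustration only (not taken from any study).
ratio = lead_time_ratio(baseline_dx=27, baseline_n=1000,
                        repeat_dx=6, repeat_n=1000)
print(f"baseline/repeat ratio: {ratio:.1f}")
```

Using frequencies instead of counts is a design choice of this sketch; when the baseline and repeat rounds screen the same number of people, the two give the same ratio.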
In 1991, the Early Lung Cancer Action Project (ELCAP) investigators decided to develop an efficient methodology to provide an ever-accumulating, continually updated body of evidence for evaluation of emerging new technologies for screening for lung cancer [15–17]. This methodology recognizes that screening is a sequential process that starts with the pursuit of an early diagnosis of cancer followed by treatment (Fig. 5).
To provide optimal screening, a regimen for the diagnostic workup must be specified. It starts with the definition of the initial test, its positive result, and the workup for a positive result leading to a diagnosis of cancer. Once cancer is diagnosed, treatment is typically performed according to usual care standards and is documented. Screening, by definition, involves asymptomatic participants; however, minimum age, smoking exposure, and other such admissibility criteria are flexible and set by each participating institution because these criteria determine the frequency of cancer diagnosis but do not affect the performance measures. Initially, a head-to-head comparison of the new screening regimen with the previous regimen (e.g., the initial test being CT instead of chest radiography [CXR]) is performed. To maximize the efficiency of the study, high-risk participants can be enrolled, but all should be free from recognized symptoms and signs of cancer. Such a study would require a baseline and at least one repeat round of screening. If this limited study shows that the diagnostic performance—that is, stage distribution, proportion of screen-to-interim diagnoses, and lead time — is promising, then the regimen is updated and this updated regimen is then provided to participants at different institutions. Following further confirmation of the diagnostic performance, screening can then be provided to an expanded group of participants who are at a lower risk for the cancer. In the course of these successive studies, new technologies (e.g., positron emission tomography [PET], PET/CT, computer-aided diagnostics) can be integrated into the regimen. The data from these studies can be pooled to determine the curability of those diagnosed early and treated, provided that a common protocol and the proper quality assurance procedures needed for such a collaboration are in place [17, 18]. For curability determination, a comparison group is needed. 
The comparison group may be formed by randomly assigning people with screen-diagnosed lung cancer to immediate or delayed treatment (Fig. 5) as was done for prostate cancer [13]. The randomization could be further stratified by clinical stage and CT appearance of the cancer, or perhaps even be based on the results of percutaneous needle biopsy. The latter approach provides a direct assessment of the extent of overdiagnosis of lung cancer resulting from screening. Alternatively, a quasiexperimental control group can be used, consisting of participants diagnosed with lung cancer who have refused or delayed their treatment even though they are candidates for it. This is a valid approach as long as the choice of the treatment or lack of it is independent of the cancer prognosis or other factors that might influence the ultimate outcome and these factors are documented at the time of enrollment into the screening program and not at the time of diagnosis or treatment [14]. A third alternative is to compare the mortality rate in the screening program once sufficient deaths have occurred with that in a nonscreened comparison group that has a similar risk profile for lung cancer. Finally, a fourth approach is to analyze the temporal pattern of deaths in the screened cohort after initiation of screening and compare deaths in the early years with deaths in the later years when the benefit of screening should become apparent if the screening is effective [9, 17, 18]. The ELCAP approach provides for further efficiencies. Follow-up of participants is required only for those diagnosed with lung cancer, usually some 1%–6%, depending on the risk of the participants. This markedly reduces follow-up requirements. To fully document all interim diagnoses occurring between the annual rounds of screening, the protocol requires that each person who has not returned for repeat screening be followed up to 18 months after their prior screening [9, 18]. 
If no cancer has been identified by either a symptom-prompted workup or any other reason, then the person is considered to have stopped participation in the screening program and no further follow-up is required.
Prior to starting ELCAP, we estimated that enrollment of 1,000 very high-risk participants would yield some 200–300 people with nodules and some 15–30 cancers to address the diagnostic performance of CT [15]. We also asked Dr. Flehinger to use the model she and her coworkers had developed based on prior randomized screening trials [19] to estimate the potential benefit of CT screening, which suggested that CT screening might decrease the deaths from lung cancer by as much as 80%.
ELCAP enrolled 1,000 participants at two institutions in New York at high risk for lung cancer because of their age and smoking history (
The baseline ELCAP CXR results are similar to those found in the Mayo Lung Project (MLP) [21] and to the recently reported baseline results of the Prostate, Lung, Colorectal, and Ovarian (PLCO) trial [22] (Fig. 6), which implies that the three cohorts had similar risk characteristics for lung cancer. Interim diagnoses must have been present in the MLP and PLCO baseline round, that is, prior to the first repeat round, but they were not reported. Annual repeat rounds of screening in ELCAP resulted in seven screen diagnoses of lung cancer and no interim diagnoses (7/1,184 = 0.59%) (Fig. 7) [23]. Of the seven, six (86%) were stage I (Table 1). The frequency of repeat diagnosis should be the same as that found in the absence of screening among people with the same risk characteristics. Given that the baseline frequency of lung cancer diagnosis for ELCAP using CXR was quite similar to that found in the MLP [21], it is to be expected that the frequency of diagnosis in the repeat rounds of ELCAP CT of 0.59% would be similar to that in the MLP of 0.55% [24].
Given the high proportion of stage I diagnoses in the baseline round of ELCAP, a curability rate of 60%–80% with CT screening was predicted [20], as had originally been predicted by the model based on prior randomized trials [19]. ELCAP also showed that the regimen of screening minimized additional procedures to rates similar to those found in mammography screening for breast cancer [25]. As a result, screening was rapidly expanded to other institutions in New York (NY-ELCAP) [26], which confirmed the ELCAP results. In view of the demand for screening, other institutions joined international (I)-ELCAP [27], which enrolled younger individuals and also people who had never smoked but were exposed to carcinogens by their occupation and/or secondhand smoke. Figure 7 shows that the frequency of baseline and annual repeat diagnosis of lung cancer decreased as the risk characteristics (i.e., age and smoking history) of the screenees decreased (i.e., lower age, lower smoking history, or more ex-smokers) in the subsequent studies. The diagnostic performance measures are unaffected by the frequency of lung cancer diagnosis. The proportion diagnosed in stage I in all three studies remained around 85% for both the baseline and annual repeat rounds of screening in NY-ELCAP [26] and I-ELCAP [27]. Interim diagnoses were rare and found prior to the first annual repeat screening, but not between the repeat rounds of screening. The estimated lead time for CT screening, based on the ratio of baseline to repeat diagnoses, also remained around 4.5 years in all three studies, much higher than the 1.5 years estimated using the same approach for CXR screening using the MLP [21, 24] or the 2.5 years estimated for mammography screening for breast cancer [25]. Diagnostic performance is dependent on the imaging used in the regimen and the regimen itself. 
The introduction of multidetector-row CT scanners and the reading of images on computer monitors instead of film after ELCAP was expected to increase the proportion of stage I diagnoses and decrease the interim diagnoses. It did slightly increase the proportion diagnosed in stage I, but it also substantially increased the proportion of screen diagnoses in the baseline round; otherwise said, it decreased the proportion of baseline interim diagnoses (symptom prompted) from 7% (2/29) in ELCAP to 5% (3/104) in NY-ELCAP to 1% (5/405) in I-ELCAP. These three studies provided important information for updating the regimen. The implications of finding nonsolid and part-solid nodules were recognized [28]. Their significance was also recognized [29], the workup of nodules improved [30, 31], the usefulness of growth as an indication for biopsy was tested [32–36], and the workup of other findings such as mediastinal masses [37], cardiac calcifications [38], and emphysema [39] was formulated. Long-term follow-up of ELCAP, NY-ELCAP, and I-ELCAP provided the estimated curability rate of 82% for all diagnoses (screen and interim combined), regardless of stage and treatment (Fig. 8), essentially unchanged from that reported in 2006 [27]. For cancers diagnosed at an early stage and promptly treated, the estimated curability rate was 93%, whereas that of the comparison group with screen-diagnosed stage I lung cancer who refused treatment was 0%, because all died of lung cancer. The number of deaths prevented by early treatment of lung cancers diagnosed by CT screening was estimated by the proportion diagnosed in stage I multiplied by the curability rate in stage I (85% × 93% = 78%), or alternatively by the overall curability rate of 82%. This compares with some 7% of deaths that are currently prevented in the absence of screening, given by one minus the ratio of the number of deaths to the number of new cases of lung cancer each year (164,000/174,000, Fig. 1).
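The two estimates at the end of this paragraph are simple arithmetic and can be checked with a short Python sketch. The function names are hypothetical and the inputs are the rounded figures quoted in the text. With those rounded inputs the product comes to about 79%, marginally above the 78% printed, and the complement of the deaths-to-cases ratio comes to about 6%, slightly below the "some 7%" quoted.

```python
def deaths_prevented_fraction(stage1_share, stage1_cure_rate):
    """Share of all diagnosed cancers that are both stage I and cured."""
    return stage1_share * stage1_cure_rate


def unscreened_survivor_fraction(annual_deaths, annual_new_cases):
    """One minus the ratio of annual deaths to annual new cases."""
    return 1.0 - annual_deaths / annual_new_cases


# Rounded inputs as quoted in the text (85%, 93%; 164,000 and 174,000).
with_ct = deaths_prevented_fraction(0.85, 0.93)
without_screening = unscreened_survivor_fraction(164_000, 174_000)

print(f"with CT screening: {with_ct:.3f}")
print(f"without screening: {without_screening:.3f}")
```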
The initial rapid decline in the survival rate for all lung cancer patients in Figure 8 is a result of deaths from lung cancer that occurred mostly in the first 4 years after diagnosis. They primarily occurred in those asymptomatic people who already had late-stage lung cancer when it was diagnosed and thus the screening did not prevent their deaths. The people whose deaths from lung cancer are prevented are those found with early-stage cancer by screening who would have otherwise been diagnosed 5–6 years later (given the lead time provided by CT) and then died within 2–3 years of diagnosis, overall some 7–9 years later. Because these deaths are prevented by early treatment following a screen diagnosis provided by the CT screening, both the survival rate in Figure 8 and the cumulative mortality rate will ultimately reach a plateau. CT screening in ELCAP and NY-ELCAP had fewer late-stage lung cancers than did CXR screening in the MLP and Memorial Sloan-Kettering Lung Project (MSKLP) [40]. In this comparison, the focus must be on the repeat rounds of screening. The overall frequency of ELCAP lung cancer diagnosis is close to that from the MLP and that from NY-ELCAP is close to that from the MSKLP, but the absolute and proportional numbers of late-stage cancers are significantly less for ELCAP and NY-ELCAP (Fig. 9). Figure 9 shows the higher proportion of late-stage cancer diagnoses, 0.29% from the MLP compared with 0.08% from ELCAP, and 0.22% from the MSKLP compared with 0.05% from the NY-ELCAP. This significant stage shift to early-stage cancers is highly suggestive of a decrease in the mortality rate when CT screening is used compared with CXR screening. Such a shift to earlier stage cancers with a subsequent lower number of deaths has been demonstrated for cervical cancer screening [41, 42].
Consider the schematic graph shown in Figure 10. The cumulative number of deaths is shown on the y-axis and the number of years of screening relative to baseline screening is shown on the x-axis. As shown in Figure 8, deaths in years 2–4 occur in those asymptomatic individuals whose cancer is found at a late stage. This cumulative number of deaths in the first 4 years reflects the deaths that would be found in an unscreened cohort with the same risk for lung cancer. Projecting this same rate of deaths over time (dashed line), the number of deaths in the absence of screening can be compared with that actually observed in a screening program (solid line), and the difference between the dashed and solid lines shows the reduction in deaths from lung cancer that can be achieved by screening. Only after the fourth year of screening does the rate start to decrease. With increasing follow-up (as long as screening continues) there is a further reduction in deaths.
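The dashed-versus-solid comparison described for Figure 10 can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the cumulative death counts are invented, and the early-years rate is assumed to extend linearly, as the dashed line does.

```python
def projected_deaths(early_cumulative, early_years, horizon_years):
    """Extend the early-years death rate linearly (the dashed line)."""
    if early_years <= 0 or horizon_years <= 0:
        raise ValueError("year counts must be positive")
    rate_per_year = early_cumulative / early_years
    return [rate_per_year * year for year in range(1, horizon_years + 1)]


def deaths_averted(projected, observed):
    """Year-by-year gap between the dashed and solid lines."""
    return [p - o for p, o in zip(projected, observed)]


# Hypothetical cumulative lung cancer deaths per 1,000 screened, years 1-8.
observed = [2, 4, 6, 8, 9, 10, 10.5, 11]

# Project the rate seen over the first 4 years across the whole period.
projected = projected_deaths(observed[3], 4, len(observed))
averted = deaths_averted(projected, observed)

for year, gap in enumerate(averted, start=1):
    print(f"year {year}: {gap:.1f} deaths averted per 1,000")
```

In this invented series the gap is zero through year 4 and widens afterward, which is the pattern the schematic is meant to convey.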
Figure 10 shows why screening needs to continue for long enough to demonstrate the actual reduction in deaths that is provided by screening. If follow-up extends only to year 4, as in the National Lung Screening Trial (NLST) [43, 44] whose median follow-up time will be 4 years by 2009, a <20% reduction can be anticipated. To see such a small reduction would require protocol compliance (of the two arms) and no delay in diagnosis and treatment [45]. As the NLST requires a 20% mortality reduction to be able to reject the null hypothesis of no benefit from screening [43], this is highly unlikely to be reached until there is longer follow-up [45]. Figure 10 also illustrates one of the key problems in a recent paper by Bach et al. [46], which focused mainly on the early years (the first 3–4 years after baseline). However, even in that analysis, a decrease in the cumulative mortality rate in years 5 and 6 can already be seen. A focus on the appropriate time when the reduction in mortality can reasonably be expected has already been highlighted in breast cancer screening [47, 48] and in colorectal screening [49, 50], both of which illustrated the need for longer screening and follow-up.
The ELCAP approach was designed to use screenings performed either as part of a research project or as part of practice-oriented research, both for efficiency and cost considerations as well as for rapid translation into clinical guidelines. Its efficiency is illustrated by ELCAP, which provided information on the diagnostic performance of CT screening [20] and showed the prognostic potential so that it could be expanded to New York state [26] and other sites throughout the world [27]. This accumulating body of evidence has been accomplished with very modest initial funding for 1,000 baseline and repeat screenings in ELCAP and the added enrollment of 30,000 participants, much less than the initial funding of the NLST comparing CT with CXR [43, 44]. The ELCAP approach does not have the problems that have been recognized as occurring in randomized trials [14, 51–61]. Biases of randomized trial results include having: (a) an insensitive outcome measure, the cumulative mortality rate, which does not focus on the relevant time when the number of deaths prevented by screening is seen [47, 51–55]; (b) an inadequate number of rounds of screening, so that a decrease in the mortality rate cannot be seen [19, 45]; (c) protocol nonadherence [45, 51, 59, 61]; (d) delays in diagnosis and/or treatment or participants choosing not to be treated [58], without any analysis of whether this proportion was the same in both arms of the trial; and (e) reliance on death certificates [60]. By the time that randomized screening trials are completed, there may also have been considerable technology drift so that the results are no longer relevant. This is demonstrated by the PLCO trial [62], which started in 1993 and will report the results of CXR in the next decade, when CXR is no longer relevant. 
Having an inadequate number of rounds of screening was clearly illustrated by the Minnesota Colorectal Study, which required extension to 10 years of screening from the originally planned 5 years [50]. Items (c)–(e) are particularly troublesome if the frequency of occurrence is different in the two arms of the randomized trial [45, 61]. Although randomization is used to provide comparability of the two arms at enrollment, it does not ensure comparability in those diagnosed with lung cancer (only 1%–3% of all participants), nor does it ensure comparability as to whether all had a timely diagnosis or treatment. To help overcome these problems, a large number of participants is required, markedly increasing the cost and time required for such trials. An alternative that was suggested by the designers of some of these large randomized screening studies [61] was to perform a limited mortality analysis of these trials focusing only on the relevant cases in each arm of the randomized trials instead of all the cases. Probably because of the considerable cost and time and inherent difficulties of these trials, only one randomized screening study for colorectal cancer (the Minnesota study) [50], one for breast cancer [48], and one for lung cancer (which evolved into three separate smaller studies) [63] have been completed in the U.S. Consequently, emerging, promising modalities are not evaluated scientifically. Often the tests are simply integrated into the medical care system. For colon cancer screening, new tests (e.g., colonoscopy and/or virtual colonoscopy with CT) have essentially replaced sigmoidoscopy, and the latter is used in the ongoing PLCO trial, but the efficacy of these new tests has not been tested using a randomized screening trial design. For lung cancer, CT was already available when the CXR and sputum trials were started in the mid 1980s, but CXR is still being tested in the PLCO trial [22, 62], and these results will not be reported until the next decade. 
For coronary artery calcification screening with CT, no randomized trial has ever been performed, although large national cohort studies have been started. Thus, frequently, scientific evidence for formulation of national policies is not available for these emerging technologies.
Concerns about biases in the ELCAP approach have been raised and addressed previously [9]. The key bias of concern in the ELCAP design is overdiagnosis [9, 64], because the other two biases—length and lead time—affect both the ELCAP and the randomized screening trials.
Lead-Time Bias
Screening for cancer is done to provide earlier diagnosis and earlier treatment, when it is more effective. In the ELCAP design, a bias is introduced when there is insufficiently long follow-up. The ELCAP curability rate is subject to lead-time bias if the Kaplan-Meier survival curve has not yet reached its asymptote (the point where the curve reaches a plateau and no longer decreases). If there is lead-time bias, the estimated curability rate is higher than it will be when it reaches its asymptote. Kaplan-Meier survival analysis is the standard approach used in oncology trials. It adjusts for incomplete follow-up of those diagnosed with the cancer (i.e., the fact that not all patients have been followed for the same length of time). I-ELCAP waited to report 10-year survival rates so that the plateau was clearly reached, which was around 5–6 years after diagnosis (Fig. 8) [27]. Its curability estimate has no lead-time bias. Randomized screening trials also have lead-time bias when they do not provide for sufficient follow-up in their design, but in this case the lead-time bias exists because the mortality difference is underestimated. The period of screening and follow-up needs to be long enough (dependent on lead-time) to focus on the time period when a decrease in deaths as a result of screening can be anticipated and when there is no longer a lead-time bias [45, 47, 49]. Lead-time bias is also introduced when comparing a treatment that has a lead time with a treatment without a lead time. I-ELCAP did not do this. I-ELCAP evaluated the effectiveness of treatment contrasted with no treatment in screened patients, all of whom have the same lead time (Fig. 11).
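To make the role of incomplete follow-up concrete, the product-limit (Kaplan-Meier) estimate can be written out in a few lines of Python. This is a textbook sketch rather than the analysis used in I-ELCAP: the function name and the follow-up data are hypothetical, times are in years, and patients still alive at last contact are treated as censored.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each time with at least one death.

    times  -- follow-up duration for each patient
    events -- True if the patient died at that time, False if censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, died in zip(times, events) if ti == t and died)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored
        if deaths:
            # Patients censored at t still count as at risk at t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up of 10 patients (years); not data from any study.
times = [1, 2, 2, 3, 5, 6, 8, 10, 10, 10]
events = [True, True, False, True, False, False, False, False, False, False]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"year {t}: estimated survival {s:.3f}")
```

In this invented cohort no deaths occur after year 3, so the estimate stays flat from then on; a flat tail of this kind is what the text calls the asymptote, and it can only be recognized once follow-up extends well beyond the last observed death.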
Overdiagnosis
The other concern is that the curability rate might be inflated as overdiagnosed lung cancers might be included [2, 9, 64]. As randomized screening trials focus only on deaths, this bias is not of concern, but it is a concern for the ELCAP approach [9]. Overdiagnosis occurs in two ways: (a) a "cancer" is detected by screening that would never have been life-threatening, even when not resected, but because of screening it was detected and the person is thus subjected to diagnostic tests and treatment when in fact that person was not at risk for dying from the cancer; (b) a genuine life-threatening cancer is diagnosed but the person dies of another disease or accident so that the screening has not saved the life. In other words the person dies of a competing cause [65, 66]. This concern needs to be addressed in the eligibility criteria so that the life expectancy of participants is sufficient to justify the screening. Of these two concerns, the main concern is that those with a slow-growing cancer may inflate the estimated curability rate and that these patients would have also undergone surgery for a non–life-threatening lesion. For lung cancer, multiple reports have shown that identifying slow-growing cancers is not a significant concern and does not account for the survival differences reported in the three randomized screening trials for lung cancer. Almost all of those who were diagnosed with stage I lung cancer as a result of CXR screening and refused treatment died of their disease, as demonstrated by Flehinger et al. [58], Yankelevitz et al. [67], and Sobue et al. [68] in Japan. In the absence of screening, this was also found in the analyses by Henschke et al. [69] using data in the Surveillance, Epidemiology, and End Results (SEER) registry and by Raz et al. [70] based on California registry data. 
The I-ELCAP protocol [18] directly addresses the issue in several ways: (a) In both baseline and repeat rounds of screening, the regimen requires demonstration of in vivo growth at a malignant rate prior to recommending biopsy. (b) All resected specimens are reviewed by an international panel of pathology experts who confirm that they are all genuine lung cancers and that 95% of them are already invasive [10]. (c) Those who delayed their diagnosis or treatment showed progression of their disease in NY-ELCAP [26] and I-ELCAP [27] and those who were in stage I who had no treatment all died from lung cancer [27] (Fig. 8). In ELCAP, NY-ELCAP, and I-ELCAP, the lung cancer diagnosis rate in annual repeat rounds of screening was essentially the same as in prior screening trials performed in the 1970s (Figs. 6 and 7). Thus, there is no evidence of overdiagnosis in the repeat rounds of screening. Also, repeat cancers in ELCAP, by definition, were not seen in the prior screening 1 year earlier. Thus, the volume doubling time of the cancer must be 200 days or less, if the lower limit of nodule detectability is one with a diameter of 2 mm [65]. A cancer with a volume doubling time of 200 days is an aggressive cancer and does not fit the profile of an overdiagnosed one. For baseline screening, we determined that 87% of clinical stage I patients had genuine, life-threatening cancers, defined as the cancer having a doubling time <400 days [29]. When considered separately by nodule consistency, it was 96% for cancers manifesting as solid nodules, 90% for cancers manifesting as part-solid nodules, and 67% for cancers manifesting as nonsolid nodules [29]. Only adenocarcinoma is diagnosed in nonsolid nodules; hence, adenocarcinoma diagnosed in a nonsolid nodule is the most suspect case for being an overdiagnosed cancer [10]. 
Thus, we identified that some of the adenocarcinomas manifesting as nonsolid nodules were slow growing, in addition to the already well-known cell types such as typical carcinoids, which manifest as solid nodules and which are known to be slow growing. If these steps are not sufficient to address the concern about overdiagnosis then, ethically, a randomized treatment trial could be performed in which patients with potentially overdiagnosed screen-diagnosed lung cancers are randomly assigned to either immediate treatment or delayed treatment as was done for prostate cancer [13]. The second issue of competing causes of death is addressed by setting reasonable admissibility requirements for screening based on actual data obtained from I-ELCAP. It is suggested that, because the lead time provided by CT is about 4.5 years, the life expectancy of a person undergoing CT screening should be at least 10 years. ELCAP further addressed competing causes of death directly by performing a survival analysis focusing on non–lung cancer deaths, which showed that older smokers and former smokers have a high life expectancy if they do not die from lung cancer. Their 10-year survival rate for death other than lung cancer was 93% [66].
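The doubling-time reasoning used above for repeat cancers can be illustrated with a short Python sketch. It rests on two assumptions that are standard in the nodule-growth literature but are not spelled out in this text: the nodule is treated as a sphere, so volume scales with the cube of the diameter, and growth is exponential. The function names and the example diameters are hypothetical.

```python
import math


def volume_doubling_time(d1_mm, d2_mm, interval_days):
    """Doubling time in days from two diameters measured interval_days apart.

    Assumes a spherical nodule and exponential growth.
    """
    if d1_mm <= 0 or d2_mm <= d1_mm or interval_days <= 0:
        raise ValueError("need 0 < d1 < d2 and a positive interval")
    return interval_days * math.log(2) / (3 * math.log(d2_mm / d1_mm))


def diameter_after(d1_mm, doubling_time_days, interval_days):
    """Diameter reached after interval_days at a given volume doubling time."""
    doublings = interval_days / doubling_time_days
    return d1_mm * 2 ** (doublings / 3)  # volume doubles, diameter grows by 2^(1/3)


# A nodule at the 2 mm detectability limit, 200-day doubling time, 1 year later.
grown = diameter_after(2.0, 200, 365)
print(f"diameter after one year: {grown:.2f} mm")

# Going the other way: recover the doubling time from the two diameters.
print(f"doubling time: {volume_doubling_time(2.0, grown, 365):.0f} days")
```

Under these assumptions a slower-growing nodule would change even less in diameter over one year, which is why a cancer that appears between two annual screenings is taken to be growing quickly.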
In Japan, starting in 1993, CT was added to a long-standing practice of screening for lung cancer using CXR [71], presumably for reasons similar to those that led to the initiation of ELCAP. In another study in Nagano prefecture [72–74], which started in 1996, screening was performed using CT and CXR. These two studies have reported their long-term survival rates, which are similar to those of I-ELCAP. Table 2 gives the results of other CT studies as well [75–77].
These studies show that the lung cancer rate depends on risk characteristics (e.g., age and smoking history), and it is in the range of 0.1%–0.8% for those having annual repeat screening. To enhance the comparison, consistent definitions of baseline cancers, repeat cancers, and interim cancers need to be used, as stated at the beginning of this paper. Similarly, stage I disease should be clearly defined in each study and in the context of screening. Stage I diagnoses should include both non-small cell and small cell lung cancers without lymph node metastases and also identify multiple adenocarcinomas without lymph node metastases, because the latter should have the benefit of resection [65] (Vazquez M, Carter D, Brambilla E et al., unpublished data). All the studies showed a high proportion of stage I diagnoses, in the range of 71%–100%. The frequency of stage I diagnosis depends on the regimen of screening and adherence to it and on the definition of a stage I diagnosis. Not many interim diagnoses of lung cancer were reported, and among the studies that performed sputum cytology, only a few cancers were detected by sputum cytology alone.
Stimulated by the demand for CT screening resulting from the ELCAP publication, the NLST was developed [43, 44]. It was designed by the PLCO trial investigators together with the American College of Radiology Imaging Network and used the same design as prior randomized trials for lung cancer in the U.S. [21, 40, 50, 63]. The control arm of the NLST is identical to the control arms of the MSKLP and Johns Hopkins Hospital Lung Project, in which all received annual CXR screening [63]. Individuals in the intervention arm of the NLST were provided annual CT. Three rounds of screening were provided: a baseline and two annual repeat rounds of screening.
Enrollment in the NLST started in 2002 and ended by mid-2004. Because follow-up ends in 2008, the median follow-up time after diagnosis will be only about 4 years when the results are reported in 2009. The regimen of screening was not well defined, and it was neither enforced nor checked at the participating institutions. The traditional outcome measure of cumulative mortality rate is to be used, and there is no mention of performing a limited mortality analysis that would focus on the timeliness of diagnosis or treatment or on deaths during the relevant time interval for which the screening benefit is expected. This is an important limitation of the traditional analysis of such trials, one that was recognized by the designers of the PLCO trial and NLST as early as 1983 [61]. Reanalysis of the MLP, focusing only on protocol nonadherence, showed that CXR might have provided as much as a 43% reduction in the number of deaths from lung cancer [59].
A pilot study of 3,000 participants was performed prior to the start of the NLST [78] to demonstrate that randomization was feasible, but it also suggested lack of adherence, particularly for those randomized to the CXR arm. A regimen for the workup of screen-detected nodules was specifically not included in that study, which resulted in a low proportion of stage I diagnoses in the CT arm. Further, the proportion of cancer diagnoses among those having invasive procedures was low in comparison with ELCAP, NY-ELCAP, and I-ELCAP, suggesting that there was a poor understanding of the importance of a workup protocol and also poor compliance with any of the known recommended workup algorithms [18].
Different cost-effectiveness analyses have shown that CT screening for lung cancer is very cost-effective [79–83], with the exception of one theoretical study [84]. These analyses look at the overall cost per life-year saved, but ideally a cost-effectiveness assessment would be done on an individualized basis [85]. Such an individualized assessment would determine the life expectancy of the person (in light of the personal risk indicators) and the risk for competing causes of death in order to determine how many years of life might be saved by the screening round that is being contemplated.
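To make the idea of an individualized assessment concrete, the sketch below shows one very simplified way the arithmetic could be organized. It is an illustration under stated assumptions, not the model used in the cited analyses [79–85]: the class, the function names, and the multiplicative formula are hypothetical, and every number in the usage example is a made-up placeholder rather than study data. Discounting, lead time, and downstream workup costs are ignored.

```python
import math
from dataclasses import dataclass


@dataclass
class ScreeningCandidate:
    """Hypothetical per-person inputs; none of these fields come from the source."""
    p_cancer_detected: float      # probability that this round diagnoses a lung cancer
    cure_rate_screened: float     # assumed cure rate after screen-prompted treatment
    cure_rate_unscreened: float   # assumed cure rate after symptom-prompted treatment
    life_expectancy_years: float  # remaining years if cured, reflecting competing causes


def expected_life_years_saved(person: ScreeningCandidate) -> float:
    """Expected life-years gained from one round under the simplified model."""
    cure_gain = person.cure_rate_screened - person.cure_rate_unscreened
    return person.p_cancer_detected * cure_gain * person.life_expectancy_years


def cost_per_life_year_saved(cost_per_round: float, person: ScreeningCandidate) -> float:
    """Cost of the round divided by the expected life-years it saves."""
    saved = expected_life_years_saved(person)
    if saved <= 0:
        return math.inf  # no expected benefit under these inputs
    return cost_per_round / saved


if __name__ == "__main__":
    # Placeholder values chosen only to exercise the functions.
    person = ScreeningCandidate(0.005, 0.8, 0.1, 12.0)
    print(expected_life_years_saved(person))
    print(cost_per_life_year_saved(300.0, person))
```

In this sketch, a shorter remaining life expectancy (for example, because of competing causes of death) directly reduces the expected life-years saved and so raises the cost per life-year, which is the dependence the individualized assessment is meant to capture.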
Conception/design: Claudia I. Henschke, David F. Yankelevitz
Administrative support: Claudia I. Henschke, David F. Yankelevitz
Provision of study materials or patients: Claudia I. Henschke, David F. Yankelevitz
Collection/assembly of data: Claudia I. Henschke, David F. Yankelevitz
Data analysis and interpretation: Claudia I. Henschke, David F. Yankelevitz
Manuscript writing: Claudia I. Henschke, David F. Yankelevitz
Final approval of manuscript: Claudia I. Henschke, David F. Yankelevitz