American Journal of Epidemiology Advance Access originally published online on July 13, 2006
American Journal of Epidemiology 2006 164(7):689-696; doi:10.1093/aje/kwj243
Practice of Epidemiology
Up-to-date and Precise Estimates of Cancer Patient Survival: Model-based Period Analysis
1 Department of Epidemiology, German Centre for Research on Ageing, Heidelberg, Germany
2 Division of Clinical Epidemiology and Aging Research, German Cancer Research Center, Heidelberg, Germany
3 Finnish Cancer Registry, Institute for Statistical and Epidemiological Cancer Research, Helsinki, Finland
Correspondence to Dr. Hermann Brenner, Division of Clinical Epidemiology and Aging Research, German Cancer Research Center, Bergheimer Strasse 20, D-69115 Heidelberg, Germany (e-mail: h.brenner{at}dkfz-heidelberg.de).
Received for publication November 18, 2005. Accepted for publication March 20, 2006.
| ABSTRACT |
|---|
Monitoring of progress in cancer patient survival by cancer registries should be as up-to-date as possible. Period analysis has been shown to provide more up-to-date survival estimates than do traditional methods of survival analysis. However, there is a trade-off between up-to-dateness and the precision of period estimates, in that increasing the up-to-dateness of survival estimates by restricting the analysis to a relatively short, recent time period, such as the most recent calendar year for which cancer registry data are available, goes along with a loss of precision. The authors propose a model-based approach to maximize the up-to-dateness of period estimates at minimal loss of precision. The approach is illustrated for monitoring of 5-year relative survival of patients diagnosed with one of 20 common forms of cancer in Finland between 1953 and 2002 by use of data from the nationwide Finnish Cancer Registry. It is shown that the model-based approach provides survival estimates that are as up-to-date as the most up-to-date conventional period estimates and at the same time much more precise than the latter. The modeling approach may further enhance the use of period analysis for deriving up-to-date cancer survival rates.
epidemiologic methods; models, statistical; neoplasms; prognosis; registries; survival
| INTRODUCTION |
|---|
Monitoring cancer patient survival is an important task of both clinical and population-based cancer registries. To be informative, estimates of cancer patient survival should be as up-to-date as possible. Period analysis, a new method of survival analysis introduced a few years ago (1
There is, though, a trade-off between up-to-dateness and precision of period estimates. To be as up-to-date as possible, the period included in the analysis should ideally be restricted to a relatively short time span, such as the most recent calendar year for which cancer registry data are available (e.g., to the year 2002 in the aforementioned example). However, for cancer registries covering relatively small populations, or for rare forms of cancer, such period estimates for single-year periods may often lack reasonable precision. An obvious way to overcome the lack of precision, which has been followed in previous applications of period analysis (7–11), is to enlarge the period to, for example, 5 years (to the 1998–2002 period in the example given above). However, the gain in precision achieved that way comes at the price of a loss of up-to-dateness (in the example given above, this loss would be on average 2 years).
It would therefore be highly desirable to find ways to derive the most up-to-date period estimates of survival with enhanced precision. In this paper, we introduce and evaluate a model-based analytical approach aimed at achieving that goal.
| MATERIALS AND METHODS |
|---|
Database
Our analysis is based on data from the nationwide Finnish Cancer Registry, covering a population of about 5 million people, which is well known for its high levels of completeness and data quality (12
Statistical analysis
Throughout this paper, we present relative rather than absolute survival rates, as the former are most commonly reported by cancer registries. Relative survival rates reflect the probability of surviving the cancer of interest rather than the total survival probability (13, 14), taking expected deaths in the absence of cancer into account. For this analysis, the expected numbers of deaths were derived from age-, gender-, and calendar period-specific mortality values of the general population of Finland according to the Ederer II method (15).
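As a sketch only, the definition above can be written in the usual actuarial form. The symbols $p_i$, $p_i^*$, $r_i$, and $R$ are introduced here for illustration and are not the authors' notation; $l_i$, $d_i$, and $e_i$ denote the persons at risk, observed deaths, and expected deaths in follow-up year $i$.

```latex
% Assumed notation (not from the source): interval-specific observed survival p_i,
% expected survival p_i^*, relative survival r_i, cumulative 5-year relative survival R.
\begin{align}
  p_i   &= 1 - \frac{d_i}{l_i}, &
  p_i^* &= 1 - \frac{e_i}{l_i}, \\
  r_i   &= \frac{p_i}{p_i^*}, &
  R     &= \prod_{i=1}^{5} r_i .
\end{align}
```

Under this form, expected survival is evaluated interval by interval among the patients still at risk, which is the idea behind the Ederer II method.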
We first assessed the overall trends in survival during the past decades by comparing the 5-year relative survival of patients diagnosed in 1962 and of patients diagnosed in 1997.
Next, the 5-year relative survival actually observed for patients diagnosed in 1997 and followed through 2002 (figure 1, solid frame) was compared with the most up-to-date estimates of 5-year relative survival that would potentially have been available in 1997 (the year of diagnosis of this cohort) by the following methods of survival analysis: 1) a period analysis for the year 1997 only (figure 1, dotted frame); 2) a period analysis for the 1993–1997 period, that is, a period including 5 recent years (figure 1, dashed frame); and 3) a model-based period estimate for the year 1997.
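To make the three frames concrete, the following Python sketch lists which diagnosis years can contribute survival experience to each follow-up year of a period analysis. The function name is hypothetical, and the sketch assumes that follow-up year i covers the time from i-1 to i years after diagnosis and that periods consist of whole calendar years.

```python
def diagnosis_years_for_period(first_year, last_year, followup_year):
    """Diagnosis years that can contribute person-time to `followup_year`
    within the calendar period first_year..last_year (inclusive).

    Assumption: follow-up year i spans the time from i-1 to i years after
    diagnosis, so a patient can be in follow-up year i during calendar
    year c only if diagnosed in year c-i or c-i+1.
    """
    return list(range(first_year - followup_year, last_year - followup_year + 2))


# Compare the single-year period 1997 with the 5-year period 1993-1997.
for i in range(1, 6):
    print(i,
          diagnosis_years_for_period(1997, 1997, i),
          diagnosis_years_for_period(1993, 1997, i))
```

Under these assumptions, the 1997 period estimate draws on patients diagnosed in 1992 through 1997 and the 1993–1997 period estimate on patients diagnosed in 1988 through 1997, whereas the cohort estimate for patients diagnosed in 1997 requires follow-up through 2002.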
For derivation of the model-based period estimate for 1997, we used the same database as in conventional period analysis for 1993–1997 (figure 1, dashed frame). However, rather than simply pooling observations within that period, we modeled survival probabilities for each combination of calendar year and year of follow-up within that period. For that purpose, we first calculated the numbers of patients at risk and of deaths by year of follow-up for each single calendar year from 1993 to 1997, just as one would do in conventional period analyses for each of these calendar years. Next, we used a Poisson regression model for the total 1993–1997 period (16
The model-based approach provides a general framework that encompasses conventional cohort or period analyses as special cases of applications of saturated models. For example, a conventional period estimate of 5-year survival can be obtained from a saturated model, in which follow-up year-specific numbers of patients at risk and deaths are pooled over all calendar years included in the period of interest (or in which the period of interest includes just 1 calendar year). This way, only five observations are included in the regression model, from which five parameters are estimated (one for each year of follow-up, none for calendar year). To ensure perfect comparability of results for conventional and modeled period analysis, we derived the former as special cases from saturated models by the same computer programs used for the modeling approach as outlined above.
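The saturated special case can be sketched in Python as follows. This is an illustration, not the authors' SAS program: the function name and the counts are hypothetical, person-years and expected deaths follow the definitions given in the Appendix, and it is assumed that conditional relative survival in a 1-year interval equals exp(-excess rate).

```python
import math
from collections import defaultdict


def pooled_period_estimate(cells):
    """Conventional period estimate of cumulative relative survival.

    cells: iterable of (followup_year, calendar_year, l, d, e), where l is
    the effective number at risk, d the observed deaths, e the expected deaths.
    Counts are pooled over calendar years, so one value per follow-up year
    remains (the saturated model: five observations, five parameters).
    """
    pooled = defaultdict(lambda: [0.0, 0.0, 0.0])
    for i, _calendar_year, l, d, e in cells:
        pooled[i][0] += l
        pooled[i][1] += d
        pooled[i][2] += e

    cumulative = 1.0
    for i in sorted(pooled):
        l, d, e = pooled[i]
        person_years = l - d / 2.0
        expected_deaths = person_years * -math.log((l - e) / l)
        excess_rate = (d - expected_deaths) / person_years
        # Assumption: constant excess hazard within the 1-year interval.
        cumulative *= math.exp(-excess_rate)
    return cumulative


# Hypothetical counts: identical experience in each follow-up year.
one_year = [(i, 1997, 1000.0, 100.0, 20.0) for i in range(1, 6)]
two_years = one_year + [(i, 1996, 1000.0, 100.0, 20.0) for i in range(1, 6)]
print(pooled_period_estimate(one_year), pooled_period_estimate(two_years))
```

With these definitions, each interval-specific factor reduces to exp(-d/(l - d/2)) divided by the expected survival (l - e)/l, which is numerically close to the actuarial ratio (1 - d/l)/(1 - e/l).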
To address the performance of the various methods in a much broader range of settings, we repeated the analyses explained for the year 1997 in the preceding paragraphs for each single calendar year from 1962 to 1997 (which is the widest possible range of calendar years for which calculations could be carried out with the current database of the Finnish Cancer Registry). We calculated the following summary indicators of the performance of the various methods: the mean difference and the mean squared difference between the 5-year relative survival later observed for patients diagnosed in the respective year and the various estimates potentially available in that year. The mean differences reflect the average under- or overestimation of the 5-year relative survival rates. The mean squared differences, in addition, reflect the random variation in the various estimates.
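The two summary indicators can be sketched in a few lines of Python. The function name and the numbers are hypothetical; differences are taken as estimate minus later-observed value, so that negative mean differences correspond to underestimation.

```python
def performance(estimates, observed):
    """Return (mean difference, mean squared difference).

    Differences are estimate minus later-observed survival, so a negative
    mean difference indicates that the estimates were too pessimistic.
    """
    diffs = [est - obs for est, obs in zip(estimates, observed)]
    n = len(diffs)
    return sum(diffs) / n, sum(x * x for x in diffs) / n


# Hypothetical 5-year relative survival values (percent) for three calendar years.
estimated = [60.0, 62.0, 66.0]
observed = [61.0, 65.0, 68.0]
mean_diff, mean_sq_diff = performance(estimated, observed)
print(mean_diff, mean_sq_diff)
```

The mean difference captures systematic bias only, whereas the mean squared difference also grows with the scatter of the individual estimates around the later-observed values.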
Finally, we used the model-based approach to provide estimates of 5-year relative survival for the year 2002, the most recent year included in the database, and for estimating the trend in survival within 1998–2002, the most recent 5-year period included in the database.
The analyses were carried out by use of the SAS statistical software package (SAS Institute, Inc., Cary, North Carolina). For all survival analyses, the macro period was used to derive the numbers of patients at risk and of deaths by year of follow-up only (conventional cohort and period analysis) or by year of follow-up and by calendar year (modeled approach) (6, 17). Some minor formal modification of the output was made to facilitate the subsequent steps. Next, the procedure GENMOD (SAS Institute, Inc.) was used to carry out Poisson regression, and the output of the regression models was used to carry out the subsequent calculations as outlined in the Appendix.
| RESULTS |
|---|
Overall, 682,867 patients aged 15 years or older were reported to the Finnish Cancer Registry with a first diagnosis of cancer between 1953 and 2002. Of these, we excluded 2.3 percent reported by death certificate only, another 2.5 percent reported by autopsy only, and 0.1 percent because of missing information on month of diagnosis. The 20 forms of cancer specifically addressed in this paper include about 89.9 percent of the remaining cancer cases.
The numbers of patients, as well as 5-year relative survival rates of patients with these 20 forms of cancer, are shown for the years 1962 and 1997 in table 1. These are the earliest and the latest year for which all the calculations outlined above could be carried out with the current database of the Finnish Cancer Registry. In 1962, stomach cancer, followed by lung cancer, was by far the most common form of cancer in Finland. The annual numbers of cases strongly increased for most forms of cancer between 1962 and 1997. The increase was most pronounced for breast cancer and especially prostate cancer (an almost tenfold increase), which had become the leading cancer diagnoses by 1997. For a few forms of cancer (cancers of the esophagus, stomach, and cervix uteri), the numbers of incident cases declined over time.
Five-year relative survival strongly varied by cancer site for patients diagnosed in 1962, ranging from 64.3 percent for patients with endometrial cancer to 2.5 percent for patients with pancreatic cancer among the types of cancer included in this study. Five-year relative survival rates increased between 1962 and 1997, albeit to a strongly varying degree, for all of the assessed forms of cancer. The most pronounced increase, by about 50 percentage points, was seen for patients with prostate cancer and patients with thyroid cancer. At about 91 percent, the latter had the highest 5-year relative survival rates in 1997 among the cancer patients included in this analysis. On the other hand, only very small improvements could be achieved for patients with the most fatal forms of cancer, such as liver, pancreas, or lung cancer.
Table 2 shows the estimates of 5-year relative survival potentially available in 1997 by the three analytical approaches in comparison with the 5-year relative survival rates later observed for patients diagnosed in that year. To facilitate illustration of the major patterns, cancers are ordered according to the increase in survival over time from 1962 to 1997. The point estimates obtained by the conventional and the modeled period analyses for 1997 were in general quite similar, and the standard errors of the modeled period estimates were much lower, that is, at intermediate levels between the levels obtained in the conventional period analyses encompassing periods of 1 year (1997) and 5 years (1993–1997), respectively. Typically, the standard errors of modeled period estimates were 25–40 percent lower compared with the standard errors of the conventional period estimates (i.e., the variances were typically reduced by about 50 percent or even more).
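The step from standard errors to variances is simple arithmetic, spelled out here as a worked example. The subscripts "model" and "conv" are labels introduced for illustration.

```latex
% Variance ratio implied by a standard error that is 25 to 40 percent lower.
\begin{align}
  \frac{\mathrm{Var}_{\text{model}}}{\mathrm{Var}_{\text{conv}}}
    &= \left(\frac{\mathrm{SE}_{\text{model}}}{\mathrm{SE}_{\text{conv}}}\right)^{2}, \\
  (1 - 0.25)^{2} &= 0.5625, \qquad (1 - 0.40)^{2} = 0.36 .
\end{align}
```

A standard error that is 25 to 40 percent lower therefore corresponds to a variance that is roughly 44 to 64 percent lower.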
For 10 of 20 forms of cancer, the modeled period analysis for 1997 provided point estimates of 5-year relative survival that were closest to the 5-year relative survival rates later observed for patients diagnosed in 1997. For the conventional period analysis for the year 1997 or the period 1993–1997, this was true for five and six forms of cancer only. The modeled period analysis almost always performed best for those cancers for which the prognosis substantially increased over time, whereas the conventional period analysis for the 1993–1997 period performed best for the few cancers with very poor and hardly improving prognosis (cancers of the esophagus, lung, liver, and pancreas).
With few exceptions, all types of analyses provided, on average, somewhat too pessimistic estimates of the 5-year relative survival rates later observed for patients diagnosed in each single year between 1962 and 1997, which can be seen from the negative values of most of the mean differences shown in table 3. This underestimation was generally larger for the conventional period analyses for the most recent 5 years than for the conventional and the modeled period estimates for the most recent year, which performed best according to this criterion for eight and 10 forms of cancer, respectively.
Furthermore, the modeled period estimates showed the smallest mean squared difference for 13 of 20 forms of cancer, whereas this was true for only two and five forms of cancer for the conventional 1- and 5-year period estimates, respectively. The modeled period analysis almost always performed best for cancers with at least some moderate increase in survival according to this criterion as well. The advantages of conventional period analysis were again essentially confined to the few cancers with virtually no improvement over time.
Table 4 shows the results of a conventional period analysis and a modeled period analysis for the 1998–2002 period, the most recent period for which data were available at the time of this analysis. For the modeled period analysis, modeled 5-year relative survival estimates are shown for 1998 and 2002, the first and the last year of the period. This way, both the most up-to-date estimates for the year 2002 and also the trends within the 1998–2002 period can be seen, and two-sided p values for tests of linear trend derived from the models are also given. The modeled period analysis discloses major significant improvements in 1998–2002 for patients with breast, ovarian, prostate, and thyroid cancers. With an increase of 5.3, 7.4, 10.7, and 5.7 percent, respectively, between 1998 and 2002, the 5-year relative survival rates reached 88.8, 50.7, 87.8, and 92.4 percent, respectively, in 2002 for these four forms of cancer. These important trends would not have been disclosed with a conventional period analysis for the 1998–2002 period.
| DISCUSSION |
|---|
In this paper, we introduced an extension of period analysis of cancer patient survival that provides the most up-to-date period estimates of survival (restricted to the most recent calendar year for which data are available) at much higher levels of precision. More specifically, standard errors of 5-year survival rates were often reduced by 25–40 percent (i.e., variances of estimates were often reduced by about 50 percent or even more). In addition, as illustrated in our analysis for the 1998–2002 period, the modeling approach can also be used to disclose recent trends in survival in an efficient manner.
The modeling approach presented in this paper does not include any extrapolation of trends beyond the time periods under investigation, which is both a strength and a potential weakness. It is a strength as it prevents the occurrence of potentially misleading results that may arise when previously observed trends change after the period under investigation. It is a potential weakness, as extrapolation may provide even more up-to-date estimates if recent trends in survival are ongoing.
The strong reduction in the variance of survival estimates implies that typically only half as many patients or even less are needed to come up with similarly precise and up-to-date estimates of cumulative survival compared with conventional period analysis. This way, modeled period analyses focusing on the most recent year may often provide reasonably precise survival estimates even in small cancer registries, for rare forms of cancer, or for specific subgroups of cancer patients. However, because the estimates of linear trends may be less reliable in such situations, we carried out additional analyses to assess the performance of the modeling approach with smaller numbers of patients. For that purpose, we repeated the analyses presented in tables 2 and 3 for a randomly selected subsample of 40 percent of patients, thereby simulating a situation of a smaller cancer registry with a population base of around 2 million people as opposed to the population base of about 5 million people in Finland. As expected from theory, the standard errors of the modeled period estimates increased but were still in the same order of magnitude or only slightly higher than were the standard errors of conventional period estimates for the most recent single year in the analyses for the full database. As in the full database, the conventional period estimate and the modeled period estimate for the most recent single year performed much better than did the conventional period estimate for the most recent 5 years in terms of the mean difference from later-observed 5-year relative survival. Regarding the mean squared difference, the modeled period estimate again clearly performed best, and the conventional period estimate for the most recent year performed worst.
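The subsample result is in line with a back-of-the-envelope calculation, given here as a hedged illustration. It assumes that standard errors scale with $1/\sqrt{n}$ and takes the 25 to 40 percent reduction in standard error reported for the full database at face value; the subscripts are labels introduced for illustration.

```latex
% Assumption: SE is proportional to 1/sqrt(n); a 40 percent subsample inflates SE by 1/sqrt(0.4).
\begin{align}
  \frac{1}{\sqrt{0.4}} &\approx 1.58, \\
  \frac{\mathrm{SE}_{\text{model, 40\% sample}}}{\mathrm{SE}_{\text{conv, full data}}}
    &\approx 1.58 \times (0.60 \text{ to } 0.75) \approx 0.95 \text{ to } 1.19 .
\end{align}
```

Under these assumptions, the modeled estimates from the 40 percent subsample would be about as precise as, or slightly less precise than, the conventional single-year estimates from the full database.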
In our modeling approach, a linear trend for the conditional survival estimates within the 5-year periods used for modeling was assumed. Alternatively, the calendar years could also be entered as a categorical rather than as a numerical variable in the models. The former approach would be less susceptible to potential bias from violation of the linearity assumption. Obviously, however, the model could not be used to estimate a linear trend any more. Furthermore, use of a single numerical variable would be statistically more efficient in many situations. The difference in results obtained by both methods may often be small though. In our data set, we obtained very similar results when the analyses were repeated with the calendar year entered as a categorical variable in additional sensitivity analyses. In particular, the advantages of the modeled period analysis over conventional period analysis remained essentially unchanged. It therefore appears difficult to give a general recommendation on how the calendar year should be entered into the model. Possibly, flexible use based on preliminary diagnostics (both visual and model based) may be the most prudent recommendation.
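The two ways of entering the calendar year can be illustrated by the design-matrix rows they produce. The following Python sketch uses a hypothetical helper and coding (indicator columns for the five follow-up years, calendar index 0 for the first year of the period, and the first calendar year as reference category); it is not taken from the authors' programs.

```python
def design_row(followup_year, calendar_index, n_followup=5, n_years=5,
               categorical=False):
    """Design-matrix row for one (follow-up year, calendar year) cell.

    followup_year: 1..n_followup (entered as a categorical variable).
    calendar_index: 0..n_years-1, with 0 = first calendar year of the period.
    categorical=False enters the calendar year as one numerical column;
    categorical=True uses indicator columns, with year 0 as the reference.
    """
    row = [1.0 if k == followup_year else 0.0 for k in range(1, n_followup + 1)]
    if categorical:
        row += [1.0 if k == calendar_index else 0.0 for k in range(1, n_years)]
    else:
        row.append(float(calendar_index))
    return row


print(design_row(3, 4))                    # linear coding: 6 columns
print(design_row(3, 4, categorical=True))  # categorical coding: 9 columns
```

With 5 follow-up years and 5 calendar years there are 25 cells; the linear coding estimates 6 parameters from them and the categorical coding 9, which is one way to see why the single numerical variable tends to be statistically more efficient when the linearity assumption holds.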
In our analysis, we chose a period of 5 calendar years to be used for the modeling. One reason for this choice was to ensure that the differences of the modeled period estimates for the most recent year and the conventional period estimates for the most recent 5 years could be exclusively assigned to the different methods, not to differences in the database, as the same database was used for both approaches. Clearly, periods of different lengths might be used for both the modeling approach and for conventional period analyses. In general, the precision of survival estimates would increase with increasing length of periods for both types of analyses. In conventional period analysis, however, the increasing length of periods would go along with a loss of up-to-dateness. With the modeled period approach, this would not need to be the case, as the trends within the period are taken into account. However, use of a single trend parameter for a linear trend might become increasingly problematic with increasing lengths of periods.
In this paper, presentation was restricted to 5-year relative survival rates. Analogous calculations were carried out for 5-year absolute survival rates that required only one simple modification of the model definition (refer to the Appendix). Results were not shown separately, however, simply to save space, as the differences between the various methods were generally very similar. The modeling approach may also be used, particularly during the early years of follow-up, for more long-term survival rates, such as 10-year or 20-year survival rates, for which the application of period analysis compared with other analytical techniques may be particularly useful (2–4). Finally, the modeling approach may also provide a convenient framework to assess and to account for the role of additional covariates, such as age or stage at diagnosis, but this is beyond the scope of this paper.
Calculation of the model-based estimates is somewhat more complex than is calculation of the conventional period estimates, as it requires multiple program steps beyond mere application of previously published macros for period analysis (17). Nevertheless, for an experienced programmer, it is feasible with a reasonable amount of time and effort, and we will be glad to share our model programs on request with researchers interested in implementing this methodology. Once established for the specific data setting of a cancer registry, the method can be easily transferred into routine monitoring practice. The modeling approach has the additional advantage of providing a more general and flexible framework for survival analysis, which includes conventional cohort analysis or period analysis as special applications.
We therefore feel that, despite its limitations, model-based period analysis might be a useful tool to derive both up-to-date and precise estimates of cancer patient survival, and we would like to encourage its application in the analysis of data from both clinical and population-based cancer registries.
| APPENDIX |
|---|
Let
- $l_{ij}$ = the effective numbers of persons at risk (accounting for late entries and withdrawals as half persons),
- $d_{ij}$ = the observed numbers of deaths, and
- $e_{ij}$ = the expected numbers of deaths (from population life tables)
in follow-up year $i$ ($1 \le i \le 5$) and calendar year $j$. The calendar years are coded in such a way that $j = 0$ for the first calendar year of the calendar period included in the modeling.
Then, a generalized linear model, $d_{ij} = f(i, j)$, is fitted with outcome $d_{ij}$, Poisson error structure, predictor variables $i$ (categorical) and $j$ (linear), link $\ln(\mu_{ij} - d^*_{ij})$, and offset $\ln(l_{ij} - d_{ij}/2)$, where $\mu_{ij}$ = the model-based numbers of deaths and $d^*_{ij} = (l_{ij} - d_{ij}/2) \times [-\ln((l_{ij} - e_{ij})/l_{ij})]$.
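The model fit can be sketched in pure Python as a stand-in for a Poisson regression procedure. This is an illustration under stated assumptions, not the authors' SAS code: the function names are hypothetical, the coefficients are called alpha_i (follow-up year) and beta (calendar year), the link is read as ln(mu - d*) with d* the expected number of deaths (person-years times the negative log of expected survival), and the data are synthetic and noise-free.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [a[r][:] + [b[r]] for r in range(n)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= factor * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_model(cells, n_followup=5, max_iter=100, tol=1e-10):
    """Fit ln(mu - d*) = ln(y) + alpha_i + beta * j by Fisher scoring.

    cells: list of (i, j, l, d, e), i in 1..n_followup, j = 0 for the first year.
    Returns [alpha_1, ..., alpha_n, beta].
    """
    rows = []
    for i, j, l, d, e in cells:
        y = l - d / 2.0                      # person-years; the offset is ln(y)
        d_star = y * -math.log((l - e) / l)  # expected deaths
        x = [1.0 if k == i else 0.0 for k in range(1, n_followup + 1)] + [float(j)]
        rows.append((x, y, d, d_star))
    p = n_followup + 1
    # Starting values: pooled crude excess rate per follow-up year, no trend.
    theta = []
    for i in range(1, n_followup + 1):
        sel = [r for r in rows if r[0][i - 1] == 1.0]
        excess = sum(d - d_star for _, _, d, d_star in sel)
        theta.append(math.log(max(excess, 1e-8) / sum(y for _, y, _, _ in sel)))
    theta.append(0.0)
    for _ in range(max_iter):
        score = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for x, y, d, d_star in rows:
            excess = y * math.exp(sum(t * v for t, v in zip(theta, x)))
            mu = d_star + excess             # model-based number of deaths
            g = (d / mu - 1.0) * excess      # score contribution
            w = excess * excess / mu         # Fisher information weight
            for a in range(p):
                score[a] += g * x[a]
                for b in range(p):
                    info[a][b] += w * x[a] * x[b]
        step = solve(info, score)
        theta = [t + s for t, s in zip(theta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return theta


# Hypothetical, noise-free data generated from known coefficients.
true_alpha = [math.log(r) for r in (0.20, 0.12, 0.08, 0.06, 0.05)]
true_beta = -0.05
cells = []
for i in range(1, 6):
    for j in range(5):
        l, e = 1000.0, 20.0
        h = -math.log((l - e) / l) + math.exp(true_alpha[i - 1] + true_beta * j)
        d = l * h / (1.0 + h / 2.0)          # solves d = (l - d/2) * h
        cells.append((i, j, l, d, e))
theta = fit_model(cells)
print(theta)
```

Because the synthetic deaths equal the model-based deaths exactly, the fit should recover the coefficients used to generate them; with real registry counts, an established generalized linear model routine would be the more prudent choice.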
Let $\alpha_i$ and $\beta$ be the estimated regression coefficients for follow-up years $i$ ($1 \le i \le 5$) and for a 1-year increase in calendar year, and let var($\alpha_i$), var($\beta$), cov($\alpha_i$, $\alpha_k$), and cov($\alpha_i$, $\beta$) be the variances and covariances of $\alpha_i$ and $\beta$. Then, estimates of conditional relative survival for each combination of follow-up year $i$ and calendar year $j$ are given as
[Equations not reproduced in this copy.]
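The displayed formulas are not available in this copy. As a hedged sketch of what the stated model implies, and not necessarily the authors' exact expressions, the point estimates can be derived as follows. The symbols $\lambda_{ij}$, $r_{ij}$, and $R_j$ are introduced here for illustration, the coefficients are written as $\alpha_i$ and $\beta$, and a constant excess hazard within each 1-year follow-up interval is assumed.

```latex
% Assumptions: link ln(mu_ij - d*_ij), offset ln(l_ij - d_ij/2), linear predictor alpha_i + beta j,
% and a constant excess hazard lambda_ij within each 1-year interval.
\begin{align}
  \lambda_{ij} &= \frac{\mu_{ij} - d^*_{ij}}{l_{ij} - d_{ij}/2}
                = \exp(\alpha_i + \beta j), \\
  r_{ij} &= \exp(-\lambda_{ij}) = \exp\{-\exp(\alpha_i + \beta j)\}, \\
  R_j &= \prod_{i=1}^{5} r_{ij}
       = \exp\Bigl\{-\sum_{i=1}^{5} \exp(\alpha_i + \beta j)\Bigr\}.
\end{align}
```

Here $r_{ij}$ is the conditional relative survival in follow-up year $i$ and calendar year $j$, and $R_j$ is the modeled 5-year relative survival for calendar year $j$. Standard errors could be obtained from the variances and covariances of the coefficients, for example by the delta method; those expressions are not reconstructed here.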
| ACKNOWLEDGMENTS |
|---|
Conflict of interest: none declared.
| References |
|---|
1. Brenner H, Gefeller O. An alternative approach to monitoring cancer patient survival. Cancer 1996;78:2004–10.
2. Brenner H, Hakulinen T. Advanced detection of time trends in long-term cancer patient survival: experience from 50 years of cancer registration in Finland. Am J Epidemiol 2002;156:566–77.
3. Brenner H, Hakulinen T. Up to date survival curves of patients with cancer by period analysis. J Clin Oncol 2002;20:826–32.
4. Brenner H, Söderman B, Hakulinen T. Use of period analysis for providing more up-to-date estimates of long-term survival rates: empirical evaluation among 370,000 cancer patients in Finland. Int J Epidemiol 2002;31:456–62.
5. Talbäck M, Stenbeck M, Rosén M. Up-to-date long-term survival of cancer patients: an evaluation of period analysis on Swedish Cancer Registry data. Eur J Cancer 2004;40:1361–72.
6. Brenner H, Gefeller O, Hakulinen T. Period analysis for up-to-date cancer survival data: theory, empirical evaluation, computational realisation and applications. Eur J Cancer 2004;40:326–35.
7. Brenner H, Hakulinen T. Long term cancer patient survival achieved by the end of the 20th century. Most up-to-date estimates from the nationwide Finnish Cancer Registry. Br J Cancer 2001;85:367–71.
8. Aareleid T, Brenner H. Trends in cancer patient survival in Estonia before and after the transition from a Soviet republic to an open market economy. Int J Cancer 2002;102:45–50.
9. Coleman MP, Rachet B, Woods LM, et al. Trends and socioeconomic inequalities in cancer survival in England and Wales up to 2001. Br J Cancer 2004;90:1367–73.
10. Talbäck M, Rosén M, Stenbeck M, et al. Cancer patient survival in Sweden at the beginning of the third millenium: predictions using period analysis. Cancer Causes Control 2004;15:967–76.
11. Brenner H, Stegmaier C, Ziegler H. Long-term survival of cancer patients in Germany achieved by the beginning of the 3rd millennium. Ann Oncol 2005;16:981–6.
12. Teppo L, Pukkala E, Lehtonen M. Data quality and quality control of a population-based cancer registry. Experience in Finland. Acta Oncol 1994;33:365–9.
13. Ederer F, Axtell LM, Cutler SJ. The relative survival rate: a statistical methodology. Natl Cancer Inst Monogr 1961;6:101–21.
14. Henson DE, Ries LA. The relative survival rate. Cancer 1995;76:1687–8.
15. Ederer F, Heise H. Instructions to IBM 650 programmers in processing survival computations. Bethesda, MD: National Cancer Institute, 1959. (Methodological note no. 10, End Results section).
16. Dickman PW, Sloggett A, Hills M, et al. Regression models for relative survival. Stat Med 2004;23:51–64.
17. Arndt V, Talbäck M, Gefeller O, et al. Modification of SAS macros for more efficient analysis of relative survival rates. Eur J Cancer 2004;40:778–9.
18. Chiang CL. Introduction to stochastic processes in biostatistics. New York, NY: Wiley, 1968:18.