ORIGINAL RESEARCH
From the Department of Obstetrics and Gynecology and Department of Preventive Medicine and Biometrics, Uniformed Services University of the Health Sciences, Bethesda, Maryland; Division of Gynecologic Oncology, Department of Obstetrics and Gynecology, Walter Reed Army Medical Center, Washington, DC; Division of Gynecologic Oncology, Department of Obstetrics and Gynecology, Tripler Army Medical Center, Honolulu, Hawaii; and Clinical Pathology Associates, Louisville, Kentucky.
Address reprint requests to: Mary F. Parker, MD, 3 English Ivy Court, Rockville, MD 20854; E-mail: parker{at}tatrc.org.
| ABSTRACT |
|---|
METHODS: After routine histologic assessment within the hospital pathology department, 119 colposcopic cervical biopsies were interpreted by two subspecialty-trained gynecologic pathologists (GYN I and GYN II) blinded to each other's interpretations and to the interpretations of the hospital general pathologists (GEN). Biopsies were classified as normal (including cervicitis), low grade (LG, including CIN I and human papillomavirus changes), or high grade (HG, including CIN II/III). The interobserver agreement rates between GEN and GYN I, between GEN and GYN II, and between GYN I and GYN II were described using the κ statistic. The proportions of biopsies assigned to each biopsy class were compared using the McNemar test.
RESULTS: Interobserver agreement rates between GEN and GYN I were moderate for normal (κ = 0.53) and LG (κ = 0.46) and excellent for HG (κ = 0.76). There were no significant differences in the classifications between GEN and GYN I. Interobserver agreement rates between GEN and GYN II were moderate for normal (κ = 0.50) and LG (κ = 0.44) and excellent for HG (κ = 0.84). GYN II was significantly more likely than GEN to classify biopsies as normal (P < .001) and less likely to classify biopsies as LG (P < .001). The interobserver agreement rates between GYN I and GYN II were moderate for normal (κ = 0.61) and LG (κ = 0.41) and excellent for HG (κ = 0.84). GYN II was significantly more likely than GYN I to classify biopsies as normal (P < .001) and less likely to classify biopsies as LG (P = .01).
CONCLUSION: Interobserver agreement between two gynecologic pathologists was no better than that observed between general and gynecologic pathologists. Subspecialty review of cervical histology does not enhance diagnostic consensus of CIN.
Cervical biopsy slides from research protocols involving cervical intraepithelial neoplasia (CIN) are frequently sent to a panel of pathologists or a designated study pathologist for review to confirm or refute the initial diagnosis. An underlying assumption of establishing central pathologic review is that consensus of diagnosis will be enhanced when interpretation is rendered by a group of experienced or subspecialty-trained pathologists.
Available literature suggests the overall interobserver agreement rate for the interpretation of CIN among experienced general pathologists or among subspecialty-trained pathologists is moderate at best. Although good-to-excellent agreement is found for CIN III and invasive cancer, poor agreement is noted for low-grade (LG) categories such as borderline atypia, human papillomavirus changes, and CIN I.1–12 The overall agreement rates between reviewing panels, or a study pathologist, and general pathologists are also moderate, with the greatest degree of discrepancy again found for LG disease.13,14 There is little published information directly comparing the agreement rates among subspecialty-trained pathologists with those among general pathologists.14
We sought to determine if subspecialty review of cervical histology by gynecologic pathologists improves diagnostic consensus of CIN. We planned to measure and compare the interobserver agreement rates between two experienced, dually board-certified gynecologic pathologists and between each of the gynecologic pathologists and a group of general pathologists. The slides used for this study were obtained from a project in which a device using fluorescence spectroscopy is being developed to diagnose CIN. Biopsies were obtained from areas suspected to be normal as well as from colposcopic lesions suggestive of CIN.
| MATERIALS AND METHODS |
|---|
After routine histologic assessment by a general pathologist as part of his or her daily workload, 119 consecutive colposcopic cervical biopsies from two military medical centers were sent to two subspecialty-trained, geographically remote gynecologic pathologists (GYN I and GYN II) for independent interpretation. The biopsies were not reviewed by a second general pathologist locally. Both GYN I and GYN II were blinded to each other's interpretations and to the interpretations of the local general pathologists (GEN). Biopsies were classified as normal (including cervicitis), LG (including CIN I and human papillomavirus changes), or high grade (HG, including CIN II/III).
The GEN group consisted of 16 general pathologists who had completed a postgraduate training program in general pathology and were assigned to one of two military medical centers. Staff experience since completion of residency varied from less than 5 years to greater than 20 years.
Both GYN I and GYN II had completed 4 years of postgraduate training in obstetrics and gynecology, 2 years of postgraduate training in general pathology, and 1 year of fellowship training in gynecologic pathology. Both were board-certified by the American Board of Obstetrics and Gynecology and by the American Board of Pathology. Since completion of all postgraduate training, GYN I had 8 years, and GYN II had 14 years of experience in gynecologic pathology, working as both staff pathologists and obstetrician-gynecologists. Both were reviewers of gynecologic pathology for multiple multi-institutional research protocols.
The interobserver agreement rates were determined between GEN and GYN I, GEN and GYN II, and GYN I and GYN II. Because only one interpretation was obtained from a general pathologist for any given biopsy, interobserver agreement rates were not measured among the general pathologists. Weighted κ statistics with 95% confidence intervals (CIs) were calculated, using the linear disagreement method, to describe the overall interobserver agreement rates for all three biopsy classes combined.15 The weighted κ statistic was used to account for disagreements greater than one class.15,16 Unweighted κ statistics with 95% CIs were calculated to describe the interobserver agreement rates for each separate biopsy class. The three binary comparisons implicit in this approach included the classification of biopsies as normal versus dysplastic (LG or HG), LG versus non-LG (either normal or HG), and HG versus less than HG. κ values of more than 0.75 represented excellent agreement, between 0.4 and 0.75 moderate agreement, and less than 0.4 poor agreement. The proportions of biopsies assigned to each biopsy class were compared using the McNemar test, at a significance level of P < .05. A power analysis was performed to assess adequacy of sample sizes to detect differences greater than 12% at β = 0.20, using the McNemar test with a 0.05 two-sided significance level. Statistical analysis was performed using the SAS System for Windows 8, 1999–2000 (SAS Institute, Inc., Cary, NC) and SPSS for Windows, Release 10.0.5, 1999 (SPSS, Inc., Chicago, IL).
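The agreement statistics described above can be illustrated with a minimal sketch. It assumes the standard Cohen's κ formulation with linear disagreement weights for the three ordered classes and an unweighted κ for each binary collapse; it is not the SAS or SPSS procedure used in the study, and the function names and the ratings in the usage section are hypothetical, not study data.

```python
from typing import Hashable, List, Sequence, Set

# Ordered biopsy classes as defined in the Methods.
CLASSES = ["normal", "LG", "HG"]


def confusion_matrix(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable],
                     labels: Sequence[Hashable]) -> List[List[int]]:
    """Cross-tabulate paired ratings (rows: rater A, columns: rater B)."""
    index = {label: i for i, label in enumerate(labels)}
    k = len(labels)
    matrix = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        matrix[index[a]][index[b]] += 1
    return matrix


def kappa(matrix: List[List[int]], linear_weights: bool = False) -> float:
    """Cohen's kappa; with linear_weights=True, near misses earn partial credit.

    Undefined (division by zero) when chance agreement equals 1,
    for example when both raters use a single class for every case.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]

    def weight(i: int, j: int) -> float:
        if linear_weights:
            return 1.0 - abs(i - j) / (k - 1)  # 1 on the diagonal, 0 at maximum distance
        return 1.0 if i == j else 0.0

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(weight(i, j) * matrix[i][j] for i, j in cells) / n
    expected = sum(weight(i, j) * row_totals[i] * col_totals[j] for i, j in cells) / (n * n)
    return (observed - expected) / (1.0 - expected)


def binary_kappa(rater_a: Sequence[str], rater_b: Sequence[str],
                 positive: Set[str]) -> float:
    """Unweighted kappa after collapsing to one class (or set of classes) versus the rest."""
    a = [r in positive for r in rater_a]
    b = [r in positive for r in rater_b]
    return kappa(confusion_matrix(a, b, [False, True]))


def interpret(value: float) -> str:
    """Cut points stated in the Methods: >0.75 excellent, 0.4 to 0.75 moderate, <0.4 poor."""
    if value > 0.75:
        return "excellent"
    if value >= 0.4:
        return "moderate"
    return "poor"


if __name__ == "__main__":
    # Hypothetical paired ratings for illustration only.
    gen = ["normal", "normal", "LG", "LG", "HG", "HG", "LG", "normal"]
    gyn = ["normal", "LG", "LG", "normal", "HG", "HG", "HG", "normal"]
    overall = kappa(confusion_matrix(gen, gyn, CLASSES), linear_weights=True)
    print("overall weighted kappa:", round(overall, 2), interpret(overall))
    print("normal vs dysplastic:", round(binary_kappa(gen, gyn, {"LG", "HG"}), 2))
    print("LG vs non-LG:", round(binary_kappa(gen, gyn, {"LG"}), 2))
    print("HG vs less than HG:", round(binary_kappa(gen, gyn, {"HG"}), 2))
```

With linear weights, a normal-versus-HG disagreement counts fully against agreement while a one-class disagreement counts half, which is how the weighted statistic accounts for disagreements greater than one class.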
| RESULTS |
|---|
… (κ = 0.61, 95% CI 0.49, 0.74). The overall interobserver agreement rate between GEN and GYN II was 72% (κ = 0.62, 95% CI 0.49, 0.74). The overall interobserver agreement rate between GYN I and GYN II was 76% (κ = 0.69, 95% CI 0.58, 0.80).
The interobserver agreement rates, associated κ statistics, and McNemar test results for each separate biopsy class are provided in Table 2. The κ values for normal and LG interpretations were in the moderate range, whereas those for HG interpretations were excellent. Between GEN and GYN I, there were no significant differences in the proportion of biopsies classified as normal, LG, or HG. GYN II was significantly more likely than GEN and GYN I to classify biopsies as normal and less likely to classify biopsies as LG. There were no significant differences in the proportion of biopsies classified as HG between GYN II and either GEN or GYN I.
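The comparison of classification proportions between two readers of the same biopsies can be sketched as follows. This is a minimal illustration that assumes the exact (binomial) two-sided form of the McNemar test; the paper does not state whether the exact or the chi-square version was used, and the function names and example ratings are hypothetical.

```python
from math import comb
from typing import Sequence, Tuple


def discordant_counts(rater_a: Sequence[str], rater_b: Sequence[str],
                      target: str) -> Tuple[int, int]:
    """Count biopsies assigned to `target` by only one of the two raters."""
    only_a = sum(1 for a, b in zip(rater_a, rater_b) if a == target and b != target)
    only_b = sum(1 for a, b in zip(rater_a, rater_b) if b == target and a != target)
    return only_a, only_b


def mcnemar_exact(only_a: int, only_b: int) -> float:
    """Two-sided exact McNemar P value based on the discordant pairs only.

    Under the null hypothesis each discordant pair is equally likely to
    fall in either direction, so the smaller count follows Binomial(n, 0.5).
    """
    n = only_a + only_b
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    smaller = min(only_a, only_b)
    tail = sum(comb(n, i) for i in range(smaller + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


if __name__ == "__main__":
    # Hypothetical paired ratings for illustration only.
    gen = ["LG", "LG", "normal", "LG", "HG", "LG", "normal", "LG"]
    gyn2 = ["normal", "normal", "normal", "normal", "HG", "LG", "normal", "normal"]
    only_gen, only_gyn2 = discordant_counts(gen, gyn2, "normal")
    print("normal by GEN only:", only_gen, "normal by GYN II only:", only_gyn2)
    print("exact McNemar P:", mcnemar_exact(only_gen, only_gyn2))
```

Because the test uses only the discordant pairs, two readers can show moderate κ agreement and still differ significantly in how often they assign a given class, which is the pattern reported for GYN II.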
| DISCUSSION |
|---|
… κ values for the specialists and κ values in the poor range for nonspecialists. Cytologic results were then compared with 54 biopsies interpreted by a single gynecologic pathologist. The smear interpretations of the specialists were found to be in agreement with the biopsy grade more often than the interpretations of the nonspecialists. Their study did not, however, assess interobserver agreement rates for biopsy results.
The overall κ values in this study, although still in the moderate range, were higher than those reported in most other studies measuring interobserver agreement rates among pathologists.1–14 Our use of a two-tiered system for biopsy classification instead of the three-tiered CIN system may have contributed to this finding. Our finding of the highest level of agreement for the diagnosis of HG and the lowest level of agreement for LG is consistent with previously published literature.
We did not find consistent downgrading of the biopsy interpretations by our study pathologists, as was found in the Atypical Squamous Cells of Undetermined Significance/Low-Grade Squamous Intraepithelial Lesion Triage Study (ALTS) trial.13 The ALTS trial reported interobserver agreement rates between the original interpretations given by the clinical center pathologists and the interpretations given by the first quality control group reviewing pathologist. The overall κ value for the interpretation of 2237 biopsies was moderate. Significant downgrading by the quality control pathologists was found for the binary comparisons of negative versus ≥ atypical squamous cells of undetermined significance, and ≤ atypical squamous cells of undetermined significance versus ≥ low-grade squamous intraepithelial lesions. For the comparison of ≤ low-grade squamous intraepithelial lesions versus ≥ high-grade squamous intraepithelial lesions, a nonsignificant trend for the clinical center diagnosis to be more severe was noted. The κ values for all three binary comparisons were moderate. Because of the large number of biopsies studied, the ALTS trial provides compelling evidence that study pathologists do, as a group, downgrade interpretations given by nonstudy pathologists. Which diagnosis is truly more "correct" is unknown without additional clinical information, such as human papillomavirus status and clinical follow-up. The ALTS trial did not publish data regarding interobserver agreement rates among its four quality control pathologists.
The use of only two subspecialty pathologists in this study limits generalization of these findings to the total population of gynecologic pathologists. The interobserver agreement rates among a larger group of experienced, subspecialty-trained pathologists need to be determined before more valid conclusions can be made. Ideally, the interobserver agreement rates among specialists would be directly compared with the interobserver agreement rates among a group of general pathologists.
This study illustrates the subjectivity inherent in biopsy interpretation for the diagnosis of CIN, despite the extensive training and experience of our gynecologic pathologists. If diagnostic consensus were improved, possible benefits might include a more uniform approach to patient care, fewer follow-up visits for questionable cases of CIN I, and a decrease in financial, time, and emotional costs for the patient. Our findings, if validated by studies involving multiple subspecialty-trained pathologists, suggest that referral of slides for expert consultation might not be cost-effective. Research projects designing objective devices to interpret or replace biopsies, such as automated machine vision systems17 or fluorescence imaging devices,18 could be significantly affected by the subjectivity associated with interpretation of CIN I unless the diagnostic consensus among study pathologists were enhanced. Published literature suggests that interobserver agreement rates can be improved by use of consensus conferences and practice cases before interpretation of study cases.1,4 These strategies may become more important as objective alternatives to current histopathologic interpretation of CIN are sought.
| Footnotes |
|---|
Received December 5, 2001. Received in revised form March 11, 2002. Accepted March 21, 2002.
| REFERENCES |
|---|
2. McCluggage WG, Bharucha H, Caughley LM, Date A, Hamilton PW, Thornton CM, et al. Interobserver variation in the reporting of cervical colposcopic biopsy specimens: Comparison of grading systems. J Clin Pathol 1996;49:833–5.
3. Kato I, Santamaria M, De Ruiz PA, Aristizabal N, Bosch FX, De San Jose S, et al. Inter-observer variation in cytological and histological diagnoses of cervical neoplasia and its epidemiologic implication. J Clin Epidemiol 1995;48:1167–74.
4. DeVet HCW, Koudstaal J, Kwee W, Willebrand D, Arends JW. Efforts to improve interobserver agreement in histopathological grading. J Clin Epidemiol 1995;48:869–73.
5. Creagh T, Bridger JE, Kupek E, Fish DE, Martin-Bates E, Wilkins MJ. Pathologist variation in reporting cervical borderline epithelial abnormalities and cervical intraepithelial neoplasia. J Clin Pathol 1995;48:59–60.
6. Genest DR, Stein L, Cibas E, Sheets E, Zitz JC, Crum CP. A binary (Bethesda) system for classifying cervical cancer precursors: Criteria, reproducibility, and viral correlates. Hum Pathol 1993;24:730–6.
7. DeVet HCW, Knipschild PG, Schouten HJA, Koudstaal J, Kwee W, Willebrand D, et al. Sources of interobserver variation in histopathological grading of cervical dysplasia. J Clin Epidemiol 1992;45:785–90.
8. DeVet HCW, Knipschild PG, Schouten HJA, Koudstaal J, Kwee W, Willebrand D, et al. Interobserver variation in histopathological grading of cervical dysplasia. J Clin Epidemiol 1990;43:1395–8.
9. Ismail SM, Colclough AB, Dinnen JS, Eakins D, Evans DMD, Gradwell E, et al. Reporting cervical intra-epithelial neoplasia (CIN): Intra- and interpathologist variation and factors associated with disagreement. Histopathology 1990;16:371–6.
10. Ismail SM, Colclough AB, Dinnen JS, Eakins D, Evans DMD, Gradwell E, et al. Observer variation in histopathological diagnosis and grading of cervical intraepithelial neoplasia. Br Med J 1989;298:707–10.
11. Robertson AJ, Anderson JM, Beck JS, Burnett RA, Howatson SR, Lee FD, et al. Observer variability in histopathological reporting of cervical biopsy specimens. J Clin Pathol 1989;42:231–8.
12. Bellina JH, Dunlap WP, Riopelle MA. Reliability of histopathologic diagnosis of cervical intraepithelial neoplasia. South Med J 1982;75:6–8.
13. Stoler MH, Schiffman M. Interobserver reproducibility of cervical cytologic and histologic interpretations: Realistic estimates from the ASCUS-LSIL triage study. JAMA 2001;285:1500–5.
14. O'Sullivan JP, Ismail SM, Barnes WSF, Deery ARS, Gradwell E, Harvey JA, et al. Interobserver variation in the diagnosis and grading of dyskaryosis in cervical smears: Specialist cytopathologists compared with non-specialists. J Clin Pathol 1994;47:515–8.
15. Cicchetti DV, Allison T. A new procedure for assessing reliability of scoring EEG sleep recordings. Am J EEG Technol 1971;11:101–9.
16. Fleiss JL. Statistical methods for rates and proportions. 2nd ed. New York: Wiley and Sons, 1981.
17. Keenan SJ, Diamond J, McCluggage WG, Bharucha H, Thompson D, Bartels PH, et al. An automated machine vision system for the histological grading of cervical intra-epithelial neoplasia (CIN). J Pathol 2000;192:351–62.
18. Parker MF, Mooradian GC, Karins JP, O'Connor DM, Speer BA, Owensby PD, et al. Hyperspectral diagnostic imaging of the cervix: Report on a new investigational device. J Lower Genital Tract Dis 2000;4:119–24.