Obstetrics & Gynecology 2002;100:277-280
© 2002 by The American College of Obstetricians and Gynecologists

ORIGINAL RESEARCH

Discrepancy in the Interpretation of Cervical Histology by Gynecologic Pathologists

Mary F. Parker, MD, Christopher M. Zahn, MD, Kristina M. Vogel, MD, Cara H. Olsen, MS, Kunio Miyazawa, MD and Dennis M. O’Connor, MD

From the Department of Obstetrics and Gynecology and Department of Preventive Medicine and Biometrics, Uniformed Services University of the Health Sciences, Bethesda, Maryland; Division of Gynecologic Oncology, Department of Obstetrics and Gynecology, Walter Reed Army Medical Center, Washington, DC; Division of Gynecologic Oncology, Department of Obstetrics and Gynecology, Tripler Army Medical Center, Honolulu, Hawaii; and Clinical Pathology Associates, Louisville, Kentucky.

Address reprint requests to: Mary F. Parker, MD, 3 English Ivy Court, Rockville, MD 20854; E-mail: parker@tatrc.org.


    ABSTRACT
OBJECTIVE: To determine whether subspecialty review of cervical histology improves diagnostic consensus for cervical intraepithelial neoplasia (CIN).

METHODS: After routine histologic assessment within the hospital pathology department, 119 colposcopic cervical biopsies were interpreted by two subspecialty-trained gynecologic pathologists (GYN I and GYN II), each blinded to the other’s interpretations and to those of the hospital general pathologists (GEN). Biopsies were classified as normal (including cervicitis), low grade (LG, including CIN I and human papillomavirus changes), or high grade (HG, including CIN II/III). Interobserver agreement between GEN and GYN I, between GEN and GYN II, and between GYN I and GYN II was described using the κ statistic. The proportions of biopsies assigned to each class were compared using the McNemar test.

RESULTS: Interobserver agreement between GEN and GYN I was moderate for normal (κ = 0.53) and LG (κ = 0.46) and excellent for HG (κ = 0.76), and there were no significant differences in classification between GEN and GYN I. Agreement between GEN and GYN II was moderate for normal (κ = 0.50) and LG (κ = 0.44) and excellent for HG (κ = 0.84); GYN II was significantly more likely than GEN to classify biopsies as normal (P < .001) and less likely to classify them as LG (P < .001). Agreement between GYN I and GYN II was moderate for normal (κ = 0.61) and LG (κ = 0.41) and excellent for HG (κ = 0.84); GYN II was again significantly more likely than GYN I to classify biopsies as normal (P < .001) and less likely to classify them as LG (P = .01).

CONCLUSION: Interobserver agreement between two gynecologic pathologists was no better than that observed between general and gynecologic pathologists. Subspecialty review of cervical histology does not enhance diagnostic consensus of CIN.

Cervical biopsy slides from research protocols involving cervical intraepithelial neoplasia (CIN) are frequently sent to a panel of pathologists or a designated study pathologist for review to confirm or refute the initial diagnosis. An underlying assumption of central pathologic review is that diagnostic consensus will be enhanced when slides are interpreted by a group of experienced or subspecialty-trained pathologists.

Available literature suggests the overall interobserver agreement rate for the interpretation of CIN among experienced general pathologists or among subspecialty-trained pathologists is moderate at best. Although good-to-excellent agreement is found for CIN III and invasive cancer, poor agreement is noted for low-grade (LG) categories such as borderline atypia, human papillomavirus changes, and CIN I.1–12 The overall agreement rates between reviewing panels, or a study pathologist, and general pathologists are also moderate, with the greatest degree of discrepancy again found for LG disease.13,14 There is little published information directly comparing the agreement rates among subspecialty-trained pathologists with those among general pathologists.14

We sought to determine if subspecialty review of cervical histology by gynecologic pathologists improves diagnostic consensus of CIN. We planned to measure and compare the interobserver agreement rates between two experienced, dually board-certified gynecologic pathologists and between each of the gynecologic pathologists and a group of general pathologists. The slides used for this study were obtained from a project in which a device using fluorescence spectroscopy is being developed to diagnose CIN. Biopsies were obtained from areas suspected to be normal as well as from colposcopic lesions suggestive of CIN.


    MATERIALS AND METHODS
The Institutional Review Committees at Walter Reed Army Medical Center, Washington, DC, and Tripler Army Medical Center, Honolulu, Hawaii, granted approval for initiation of the study from which these data were collected.

After routine histologic assessment by a general pathologist as part of his or her daily workload, 119 consecutive colposcopic cervical biopsies from two military medical centers were sent to two subspecialty-trained, geographically remote gynecologic pathologists (GYN I and GYN II) for independent interpretation. The biopsies were not reviewed locally by a second general pathologist. GYN I and GYN II were each blinded to the other’s interpretations and to those of the local general pathologists (GEN). Biopsies were classified as normal (including cervicitis), LG (including CIN I and human papillomavirus changes), or high grade (HG, including CIN II/III).

The GEN group consisted of 16 general pathologists who had completed a postgraduate training program in general pathology and were assigned to one of two military medical centers. Staff experience since completion of residency varied from less than 5 years to greater than 20 years.

Both GYN I and GYN II had completed 4 years of postgraduate training in obstetrics and gynecology, 2 years of postgraduate training in general pathology, and 1 year of fellowship training in gynecologic pathology. Both were board-certified by the American Board of Obstetrics and Gynecology and by the American Board of Pathology. Since completion of all postgraduate training, GYN I had 8 years, and GYN II had 14 years of experience in gynecologic pathology, working as both staff pathologists and obstetrician-gynecologists. Both were reviewers of gynecologic pathology for multiple multi-institutional research protocols.

The interobserver agreement rates were determined between GEN and GYN I, GEN and GYN II, and GYN I and GYN II. Because only one general pathologist interpreted any given biopsy, interobserver agreement rates were not measured among the general pathologists. Weighted κ statistics with 95% confidence intervals (CIs) were calculated, using the linear disagreement method, to describe the overall interobserver agreement rates for all three biopsy classes combined.15 The weighted κ statistic was used so that disagreements spanning more than one class (normal versus HG) counted more heavily than disagreements between adjacent classes.15,16 Unweighted κ statistics with 95% CIs were calculated to describe the interobserver agreement rates for each separate biopsy class. The three binary comparisons implicit in this approach were normal versus dysplastic (LG or HG), LG versus non-LG (either normal or HG), and HG versus less than HG. Values of κ greater than 0.75 were taken to represent excellent agreement, values from 0.40 to 0.75 moderate agreement, and values less than 0.40 poor agreement. The proportions of biopsies assigned to each biopsy class were compared using the McNemar test at a significance level of P < .05. A power analysis was performed to assess whether the sample size was adequate to detect differences greater than 12% at β = 0.20 (80% power), using the McNemar test with a two-sided significance level of .05. Statistical analysis was performed using the SAS System for Windows, Release 8, 1999–2000 (SAS Institute, Inc., Cary, NC) and SPSS for Windows, Release 10.0.5, 1999 (SPSS, Inc., Chicago, IL).
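The weighted and unweighted κ statistics described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' SAS or SPSS procedure: it computes point estimates only (no 95% CIs), it assumes the "linear disagreement method" corresponds to Cicchetti and Allison linear weights, w_ij = 1 − |i − j|/(k − 1), and it runs on made-up ratings. The function names `cohen_kappa` and `agreement_label` are hypothetical.

```python
from collections import Counter
from typing import Sequence

# Ordinal biopsy classes from the study, ordered from least to most severe.
CLASSES = ("normal", "LG", "HG")


def cohen_kappa(a: Sequence[str], b: Sequence[str],
                classes: Sequence[str] = CLASSES,
                weighting: str = "linear") -> float:
    """Cohen's kappa for two raters who classified the same biopsies.

    weighting="linear": assumed Cicchetti-Allison weights w_ij = 1 - |i - j| / (k - 1),
    so an adjacent-class disagreement (e.g. normal vs LG) earns half credit and
    a two-class disagreement (normal vs HG) earns none.
    weighting="none": ordinary unweighted kappa (credit only for exact matches).
    """
    if len(a) != len(b) or not a:
        raise ValueError("both raters must classify the same non-empty set of biopsies")
    if weighting not in ("linear", "none"):
        raise ValueError(f"unknown weighting: {weighting!r}")
    k = len(classes)
    idx = {c: i for i, c in enumerate(classes)}
    n = len(a)
    joint = Counter((idx[x], idx[y]) for x, y in zip(a, b))  # cell counts n_ij
    row = Counter(idx[x] for x in a)  # rater A marginal counts
    col = Counter(idx[y] for y in b)  # rater B marginal counts

    def weight(i: int, j: int) -> float:
        if weighting == "none":
            return 1.0 if i == j else 0.0
        return 1.0 - abs(i - j) / (k - 1)

    cells = [(i, j) for i in range(k) for j in range(k)]
    p_obs = sum(weight(i, j) * joint[(i, j)] / n for i, j in cells)
    p_exp = sum(weight(i, j) * row[i] * col[j] / n**2 for i, j in cells)
    if p_exp >= 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_obs - p_exp) / (1.0 - p_exp)


def agreement_label(kappa: float) -> str:
    """Thresholds stated in the article: >0.75 excellent, 0.40-0.75 moderate, <0.40 poor."""
    if kappa > 0.75:
        return "excellent"
    if kappa >= 0.40:
        return "moderate"
    return "poor"


# Hypothetical ratings for 10 biopsies (NOT the study data).
rater_a = ["normal", "normal", "normal", "LG", "LG", "LG", "HG", "HG", "HG", "normal"]
rater_b = ["normal", "normal", "LG", "LG", "LG", "normal", "HG", "HG", "LG", "HG"]

if __name__ == "__main__":
    for w in ("none", "linear"):
        kappa = cohen_kappa(rater_a, rater_b, weighting=w)
        print(f"{w:>6}: kappa = {kappa:.3f} ({agreement_label(kappa)})")
```

On these toy ratings the unweighted κ is 0.27/0.67 (about 0.40) and the linear-weighted κ is 0.19/0.44 (about 0.43): the three adjacent-class disagreements earn partial credit under weighting, while the single normal versus HG disagreement earns none.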


    RESULTS
Table 1 lists the number of biopsies assigned to each category by each pathologist. The 119 biopsies were obtained from a total of 70 women, some of whom had more than one biopsy.


Table 1. Biopsy Classifications for Each Pathologist*
 
The overall interobserver agreement rate between GEN and GYN I was 72% (weighted κ = 0.61, 95% CI 0.49–0.74). The overall interobserver agreement rate between GEN and GYN II was 72% (weighted κ = 0.62, 95% CI 0.49–0.74). The overall interobserver agreement rate between GYN I and GYN II was 76% (weighted κ = 0.69, 95% CI 0.58–0.80).

The interobserver agreement rates, associated κ statistics, and McNemar test results for each separate biopsy class are provided in Table 2. The κ values for normal and LG interpretations were in the moderate range, whereas those for HG interpretations were excellent. Between GEN and GYN I, there were no significant differences in the proportion of biopsies classified as normal, LG, or HG. GYN II, however, was significantly more likely than GEN and GYN I to classify biopsies as normal and less likely to classify them as LG. There were no significant differences in the proportion of biopsies classified as HG between GYN II and either GEN or GYN I.
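The per-class comparisons reported in Table 2 rest on collapsing the three classes into "target class versus everything else," which yields exactly the three binary comparisons listed in the methods (normal versus dysplastic, LG versus non-LG, HG versus less than HG). The sketch below shows that collapse, the unweighted κ on the resulting 2 × 2 table, and a McNemar test on the discordant pairs. It is illustrative only: the article does not say whether a continuity correction or an exact McNemar test was used (this version uses the uncorrected χ² with 1 df), the ratings are hypothetical, and the names `PairedTable`, `collapse`, `binary_kappa`, and `mcnemar` are our own.

```python
import math
from typing import NamedTuple, Sequence, Tuple


class PairedTable(NamedTuple):
    """2x2 counts for one class ('target') versus all other classes."""
    both: int     # both raters assigned the target class
    only_a: int   # rater A assigned the target class, rater B did not
    only_b: int   # rater B assigned the target class, rater A did not
    neither: int  # neither rater assigned the target class


def collapse(a: Sequence[str], b: Sequence[str], target: str) -> PairedTable:
    """Collapse paired three-class ratings into target vs non-target counts."""
    if len(a) != len(b):
        raise ValueError("raters must classify the same biopsies")
    pairs = list(zip(a, b))
    both = sum(x == target and y == target for x, y in pairs)
    only_a = sum(x == target and y != target for x, y in pairs)
    only_b = sum(x != target and y == target for x, y in pairs)
    return PairedTable(both, only_a, only_b, len(pairs) - both - only_a - only_b)


def binary_kappa(t: PairedTable) -> float:
    """Unweighted kappa for the collapsed 2x2 table."""
    n = sum(t)
    p_obs = (t.both + t.neither) / n
    p_a = (t.both + t.only_a) / n  # proportion rater A called the target class
    p_b = (t.both + t.only_b) / n  # proportion rater B called the target class
    p_exp = p_a * p_b + (1 - p_a) * (1 - p_b)
    if p_exp >= 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_obs - p_exp) / (1 - p_exp)


def mcnemar(t: PairedTable) -> Tuple[float, float]:
    """McNemar chi-square (1 df, no continuity correction) and two-sided P value.

    Only discordant pairs enter: the test asks whether one rater assigns the
    target class more often than the other.
    """
    discordant = t.only_a + t.only_b
    if discordant == 0:
        return 0.0, 1.0
    stat = (t.only_a - t.only_b) ** 2 / discordant
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical ratings for 10 biopsies (NOT the study data).
rater_a = ["normal", "normal", "normal", "LG", "LG", "LG", "HG", "HG", "HG", "normal"]
rater_b = ["normal", "normal", "LG", "LG", "LG", "normal", "HG", "HG", "LG", "HG"]

if __name__ == "__main__":
    for cls in ("normal", "LG", "HG"):
        table = collapse(rater_a, rater_b, cls)
        stat, p = mcnemar(table)
        print(f"{cls:>6}: kappa = {binary_kappa(table):.2f}, McNemar chi2 = {stat:.2f}, P = {p:.3f}")
```

Note that κ and McNemar answer different questions: κ measures agreement on individual biopsies, whereas McNemar tests whether the two raters use a class with the same overall frequency. This is how GYN II could show moderate κ with GEN for normal biopsies yet still differ significantly in how often "normal" was assigned. With few discordant pairs, an exact binomial McNemar test would be the more cautious choice.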


Table 2. Interobserver Comparison for Each Biopsy Class
 
Power analysis demonstrated an adequate number in each biopsy class to detect differences greater than 12% with 80% power, using McNemar test with a .05 two-sided significance level.
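The article does not state which sample-size method was used or what proportion of discordant pairs was assumed. One standard approach for paired proportions is Connor's (1987) normal approximation for the McNemar test, sketched below as an assumption rather than as the authors' calculation. Here `delta` is the difference to detect (0.12 in this study) and `discordant` is the expected proportion of biopsies on which the two raters disagree about the class; the function name and the 20% discordance in the example are hypothetical.

```python
import math
from statistics import NormalDist


def mcnemar_sample_size(delta: float, discordant: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of paired biopsies for a two-sided McNemar test.

    Connor's (1987) approximation:
        n = (z_{1-alpha/2} * sqrt(psi) + z_{power} * sqrt(psi - delta**2))**2 / delta**2
    delta: expected difference between the two raters' proportions for a class.
    discordant (psi): expected proportion of discordant pairs; since
    psi = p10 + p01 and delta = p10 - p01, it must satisfy |delta| <= psi <= 1.
    """
    if not 0 < abs(delta) <= discordant <= 1:
        raise ValueError("need 0 < |delta| <= discordant <= 1")
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    z_beta = z.inv_cdf(power)           # about 0.84 for 80% power (beta = 0.20)
    root = z_alpha * math.sqrt(discordant) + z_beta * math.sqrt(discordant - delta**2)
    return math.ceil(root**2 / delta**2)


if __name__ == "__main__":
    # Hypothetical inputs: detect a 12% difference, assuming 20% discordant pairs.
    print(mcnemar_sample_size(delta=0.12, discordant=0.20))
```

With these hypothetical inputs the function returns 107 paired biopsies. The required number rises as the assumed discordance grows, so whether 119 biopsies suffice depends heavily on that assumption.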


    DISCUSSION
This study assessed whether extensive training and experience in gynecologic pathology improve diagnostic consensus in the histologic interpretation of CIN. The agreement rates between the two gynecologic pathologists in this study were not significantly higher than those found between either of the gynecologic pathologists and a group of general pathologists. In fact, one of the gynecologic pathologists demonstrated greater diagnostic consensus with the group of general pathologists for the diagnosis of LG and nondysplastic biopsies than with the other gynecologic pathologist. Although not directly comparable, these findings contrast with those of O’Sullivan et al,14 who compared interobserver agreement among specialists with that among nonspecialists in the interpretation of 110 Papanicolaou smears. They reported moderate κ values for the specialists and κ values in the poor range for the nonspecialists. Cytologic results were then compared with the histologic grades of 54 biopsies interpreted by a single gynecologic pathologist. The smear interpretations of the specialists agreed with the biopsy grade more often than those of the nonspecialists. Their study did not, however, assess interobserver agreement for biopsy results.

The overall κ values in this study, although still in the moderate range, were higher than those reported in most other studies measuring interobserver agreement among pathologists.1–14 Our use of a two-tiered system (LG and HG) for grading dysplasia, instead of the three-tiered CIN I/II/III system, may have contributed to this finding. Our finding of the highest level of agreement for HG and the lowest for LG is consistent with previously published literature.

We did not find consistent downgrading of biopsy interpretations by our study pathologists, as was found in the Atypical Squamous Cells of Undetermined Significance/Low-Grade Squamous Intraepithelial Lesion Triage Study (ALTS) trial.13 The ALTS trial reported interobserver agreement rates between the original interpretations given by the clinical center pathologists and the interpretations given by the first quality control group reviewing pathologist. The overall κ value for the interpretation of 2237 biopsies was moderate. Significant downgrading by the quality control pathologists was found for the binary comparisons of negative versus ≥atypical squamous cells of undetermined significance and ≤atypical squamous cells of undetermined significance versus ≥low-grade squamous intraepithelial lesions. For the comparison of ≤low-grade squamous intraepithelial lesions versus ≥high-grade squamous intraepithelial lesions, a nonsignificant trend toward a more severe clinical center diagnosis was noted. The κ values for all three binary comparisons were moderate. Because of the large number of biopsies studied, the ALTS trial provides compelling evidence that study pathologists, as a group, do downgrade interpretations given by nonstudy pathologists. Which diagnosis is truly more "correct" cannot be determined without additional clinical information, such as human papillomavirus status and clinical follow-up. The ALTS trial did not publish data regarding interobserver agreement rates among its four quality control pathologists.

The use of only two subspecialty pathologists in this study limits generalization of these findings to the total population of gynecologic pathologists. The interobserver agreement rates among a larger group of experienced, subspecialty-trained pathologists need to be determined before firmer conclusions can be drawn. Ideally, the interobserver agreement rates among specialists would be directly compared with the interobserver agreement rates among a group of general pathologists.

This study illustrates the subjectivity inherent in biopsy interpretation for the diagnosis of CIN, despite the extensive training and experience of our gynecologic pathologists. If diagnostic consensus were improved, possible benefits might include a more uniform approach to patient care, fewer follow-up visits for questionable cases of CIN I, and lower financial, time, and emotional costs for the patient. Our findings, if validated by studies involving multiple subspecialty-trained pathologists, suggest that referral of slides for expert consultation might not be cost-effective. Research projects developing objective devices to interpret or replace biopsies, such as automated machine vision systems17 or fluorescence imaging devices,18 could be significantly affected by the subjectivity associated with the interpretation of CIN I unless diagnostic consensus among study pathologists is enhanced. Published literature suggests that interobserver agreement rates can be improved by the use of consensus conferences and practice cases before interpretation of study cases.1,4 These strategies may become more important as objective alternatives to current histopathologic interpretation of CIN are sought.


    Footnotes
 
The opinions or assertions contained herein are the private views of the authors and are not to be construed as official or as reflecting the views of the Department of the Army or the Department of Defense.

PII S0029-7844(02)02058-6

Received December 5, 2001. Received in revised form March 11, 2002. Accepted March 21, 2002.


    REFERENCES
1. McCluggage WG, Walsh MY, Thornton CM, Hamilton PW, Date A, Caughley LM, et al. Inter- and intra-observer variation in the histopathological reporting of cervical squamous intraepithelial lesions using a modified Bethesda grading system. Br J Obstet Gynaecol 1998;105:206–10.

2. McCluggage WG, Bharucha H, Caughley LM, Date A, Hamilton PW, Thornton CM, et al. Interobserver variation in the reporting of cervical colposcopic biopsy specimens: Comparison of grading systems. J Clin Pathol 1996;49:833–5.

3. Kato I, Santamaria M, De Ruiz PA, Aristizabal N, Bosch FX, De San Jose S, et al. Inter-observer variation in cytological and histological diagnoses of cervical neoplasia and its epidemiologic implication. J Clin Epidemiol 1995;48:1167–74.

4. DeVet HCW, Koudstaal J, Kwee W, Willebrand D, Arends JW. Efforts to improve interobserver agreement in histopathological grading. J Clin Epidemiol 1995;48:869–73.

5. Creagh T, Bridger JE, Kupek E, Fish DE, Martin-Bates E, Wilkins MJ. Pathologist variation in reporting cervical borderline epithelial abnormalities and cervical intraepithelial neoplasia. J Clin Pathol 1995;48:59–60.

6. Genest DR, Stein L, Cibas E, Sheets E, Zitz JC, Crum CP. A binary (Bethesda) system for classifying cervical cancer precursors: Criteria, reproducibility, and viral correlates. Hum Pathol 1993;24:730–6.

7. DeVet HCW, Knipschild PG, Schouten HJA, Koudstaal J, Kwee W, Willebrand D, et al. Sources of interobserver variation in histopathological grading of cervical dysplasia. J Clin Epidemiol 1992;45:785–90.

8. DeVet HCW, Knipschild PG, Schouten HJA, Koudstaal J, Kwee W, Willebrand D, et al. Interobserver variation in histopathological grading of cervical dysplasia. J Clin Epidemiol 1990;43:1395–8.

9. Ismail SM, Colclough AB, Dinnen JS, Eakins D, Evans DMD, Gradwell E, et al. Reporting cervical intra-epithelial neoplasia (CIN): Intra- and interpathologist variation and factors associated with disagreement. Histopathology 1990;16:371–6.

10. Ismail SM, Colclough AB, Dinnen JS, Eakins D, Evans DMD, Gradwell E, et al. Observer variation in histopathological diagnosis and grading of cervical intraepithelial neoplasia. Br Med J 1989;298:707–10.

11. Robertson AJ, Anderson JM, Beck JS, Burnett RA, Howatson SR, Lee FD, et al. Observer variability in histopathological reporting of cervical biopsy specimens. J Clin Pathol 1989;42:231–8.

12. Bellina JH, Dunlap WP, Riopelle MA. Reliability of histopathologic diagnosis of cervical intraepithelial neoplasia. South Med J 1982;75:6–8.

13. Stoler MH, Schiffman M. Interobserver reproducibility of cervical cytologic and histologic interpretations: Realistic estimates from the ASCUS-LSIL triage study. JAMA 2001;285:1500–5.

14. O’Sullivan JP, Ismail SM, Barnes WSF, Deery ARS, Gradwell E, Harvey JA, et al. Interobserver variation in the diagnosis and grading of dyskaryosis in cervical smears: Specialist cytopathologists compared with non-specialists. J Clin Pathol 1994;47:515–8.

15. Cicchetti DV, Allison T. A new procedure for assessing reliability of scoring EEG sleep recordings. Am J EEG Technol 1971;11:101–9.

16. Fleiss JL. Statistical methods for rates and proportions. 2nd ed. New York: John Wiley & Sons, 1981.

17. Keenan SJ, Diamond J, McCluggage WG, Bharucha H, Thompson D, Bartels PH, et al. An automated machine vision system for the histological grading of cervical intra-epithelial neoplasia (CIN). J Pathol 2000;192:351–62.

18. Parker MF, Mooradian GC, Karins JP, O’Connor DM, Speer BA, Owensby PD, et al. Hyperspectral diagnostic imaging of the cervix: Report on a new investigational device. J Lower Genital Tract Dis 2000;4:119–24.




