ORIGINAL RESEARCH
From the Department of Obstetrics and Gynecology, University of Washington School of Medicine, Seattle, Washington; and University Fertility Consultants, Oregon Health Sciences University, Portland, Oregon.
Address reprint requests to: Barbara A. Goff, MD, Department of Obstetrics and Gynecology, University of Washington School of Medicine, Box 356460, Seattle, WA 98195-6460; E-mail: bgoff{at}u.washington.edu.
| ABSTRACT |
|---|
METHODS: A seven-station examination was administered to 24 residents. The tests included laparoscopic procedures (salpingostomy, intracorporeal knot tying, closure of port sites) and open abdominal procedures (subcuticular closure, bladder neck suspension, repair of enterotomy, abdominal wall closure). All tasks were performed using life-like surgical models. Residents were timed and assessed at each station using three methods of scoring: a task-specific checklist, a global rating scale, and a pass/fail grade.
RESULTS: Assessment of construct validity, the ability of the test to discriminate among residency levels, found significant differences on the checklist, global rating scale, time for procedures, and pass/fail grade by level of training. Reliability indices calculated with Cronbach's α were 0.77 for the checklists and 0.94 for the global rating scale. Overall interrater reliability indices were 0.91 for the global rating scale and 0.92 for the checklists. Total cost for replaceable parts and facilities was $1900.
CONCLUSION: The less costly and more portable bench station objective structured assessment of technical skills can reliably and validly assess the surgical skills of gynecology residents. This type of examination can be a useful tool to identify residents who need additional surgical instruction, provide remediation, and may become a mechanism to certify surgical skill competence.
Assessment of technical skills is an important part of ensuring that we have trained competent surgeons.1 In a recent survey of obstetrics and gynecology residency directors in the United States, we found that most programs rely on subjective faculty assessment of residents' technical skills.2 That evaluation is usually performed at the end of the rotation and is based on the recollection of events that occurred during that rotation. Very few programs use evaluations that occur while the resident is performing a surgical procedure. Studies that have evaluated retrospective assessments have shown poor reliability and unknown validity. For example, two surgeons evaluating the same resident without structured criteria will typically exhibit a low level of agreement.1 In our survey, we also found that 25% of programs did not even evaluate the surgical skills of their residents.2
To develop a better assessment tool for surgical skills, we designed an objective structured assessment of technical skills for obstetrics and gynecology residents based on the work of others.3–6 In that exam, we had all of the residents in our program perform seven surgical tasks in a pig: four laparoscopic and three open abdominal procedures. We found that objective structured assessment of technical skills can assess residents' surgical skills with high reliability and validity. Reliability refers to the consistency of the exam, the extent to which results are replicated each time the test is given. Validity is the extent to which the test measures what it is intended to measure. If a resident does well on an objective assessment of technical skills, it should mean that the resident is a competent surgeon. In reality, this is very difficult to measure, so we often settle for a proxy measure such as construct validity, which is the ability to distinguish between different levels of training.
Although preliminary results of our surgical skills assessment have been encouraging, the cost ($6000 for 24 residents) and the resources needed (animal surgical facilities, veterinary technician) have led us to pursue testing in surgical models. The purpose of our current study was to develop a less costly bench station, objective structured assessment of technical skills and to evaluate the feasibility, reliability, and validity of this exam.
| MATERIALS AND METHODS |
|---|
A seven-station examination was developed, which included laparoscopic (linear salpingostomy, intracorporeal knot, port closure) and open abdominal procedures (abdominal wall closure, repair of enterotomy, bladder neck suspension, subcuticular closure). Tasks were selected for a range of difficulty based upon our previous research and included three procedures (linear salpingostomy, intracorporeal knot, repair of enterotomy), which we had evaluated in a pig model.3
Each task was performed in a surgical model purchased from Limbs and Things, Inc. (Bristol, England, www.limbsandthings.com). Because of the need to evaluate laparoscopic skills, the testing was performed in the animal surgical facilities so that we would have access to laparoscopic equipment and cameras. For each procedure, the faculty member grading the resident acts as a qualified assistant but does only what is asked by the resident and provides no input on surgical management. The resident is responsible for choosing the appropriate instruments and suture and directing the assistant. For instruments and suture, appropriate choices as well as distractors are provided. For example, both traumatic and atraumatic forceps are available for closing the bowel. In addition, both permanent and absorbable sutures ranging in size from 4-0 to #1, on a variety of needles (cutting, GI, GS-21, etc) are available to choose from.
At each examination session, there were six faculty members. The faculty members evaluated residents in teams so that we could assess interrater reliability. Faculty members were assigned specific procedures, and the residents rotated through the tasks. The same faculty member evaluated the same task for each resident. A total of nine faculty participated (two gynecologic oncologists, two urogynecologists, two reproductive endocrinologists, and three generalists). Three of the faculty members had participated in the previous study using pigs and the other six volunteered to participate. Each volunteer was given a 15-minute orientation session on how to conduct the exam and how to grade. Each volunteer was paired with a more experienced examiner.
Three evaluation methods were used to score the residents, including a task-specific checklist, a global rating scale, and an overall pass/fail judgment.3 The total possible points for the global rating scale is 35, and the total possible points for the checklists ranged from 26 to 52, depending on the individual task. Residents could fail for two reasons: poor performance or inability to complete the task in 10 minutes. Reason for failure was recorded for each task. A checklist, global rating scale, pass/fail grade, and time to perform procedure were recorded for each task the resident performed.
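The record kept for each station can be sketched as a small data structure. This is an illustration only: the class and field names are hypothetical, the conversion of raw points to percentages is an assumption added so that checklists with different maxima (26 to 52) can be compared, and the order in which the two failure reasons are checked is likewise an assumption. Only the 35-point global maximum and the 10-minute limit come from the text.

```python
from dataclasses import dataclass
from typing import Optional

TIME_LIMIT_MIN = 10.0  # from the text: a task not completed in 10 minutes is a fail
GLOBAL_MAX = 35        # from the text: total possible points on the global rating scale


@dataclass
class StationResult:
    """One resident's result at one station (hypothetical field names)."""
    task: str
    checklist_points: int
    checklist_max: int    # 26 to 52, depending on the task
    global_points: int
    minutes: float
    examiner_pass: bool   # examiner's overall judgment of performance

    def checklist_percent(self) -> float:
        # Assumed normalization so tasks with different maxima are comparable.
        return 100.0 * self.checklist_points / self.checklist_max

    def global_percent(self) -> float:
        return 100.0 * self.global_points / GLOBAL_MAX

    def failure_reason(self) -> Optional[str]:
        # The text names two reasons for failure; checking time first is an assumption.
        if self.minutes > TIME_LIMIT_MIN:
            return "time"
        if not self.examiner_pass:
            return "performance"
        return None


# Hypothetical example values, not data from the study.
example = StationResult("repair of enterotomy", 39, 52, 28, 8.5, True)
```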
Statistical analyses were performed using SPSS for Windows, 8.0 (SPSS, Inc., Chicago, IL). Internal consistency of the examination, which is a measure of the reliability of the test, was calculated using Cronbach's α. Interrater reliability was calculated using intraclass correlation coefficients. Construct validity was assessed by analyzing resident performance with a one-way analysis of variance, with residency year as the independent variable. Post hoc contrasts were done with the Student-Newman-Keuls test. Pass/fail data were analyzed with nonparametric tests, χ², and Mann-Whitney U tests. Correlation between global and checklist scores was assessed with Pearson correlation.
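For readers unfamiliar with the internal consistency coefficient, the following is a minimal sketch of the standard Cronbach's α formula, α = k/(k − 1) × (1 − Σ item variances / variance of total scores), with the k stations treated as items. The analyses in this study were run in SPSS; the function name and the sample scores below are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of scores.

    scores: one row per resident, one column per station (item).
    Uses sample variances (n - 1 denominator) throughout.
    """
    k = len(scores[0])
    # Variance of each station's scores across residents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Variance of each resident's total score across stations.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical scores: three residents, three stations, perfectly consistent ranking.
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha_consistent = cronbach_alpha(consistent)
```

When every station ranks the residents identically, as in the made-up table above, the coefficient reaches its maximum of 1; disagreement among stations lowers it.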
| RESULTS |
|---|
The overall reliability was 0.94 for the global rating scale and 0.77 for the checklist. Reliability indices for the individual procedures ranged from 0.58 to 0.93. Interrater reliability was evaluated for all tasks and ranged from 0.66 to 0.98. Overall, interrater reliability was 0.91 for the global rating scale and 0.92 for the checklist. Correlations between checklists and global scores were moderately high, ranging from 0.67 to 0.96, which indicates that these two tools are measuring approximately the same thing. This is also a form of validity, concurrent validity, in which there is a good correlation between the skills checklist and the global scale, which had been previously validated.6
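The correlation between checklist and global scores is a Pearson coefficient, which can be sketched as follows. The function name and the sample values are hypothetical; the study's own coefficients were computed in SPSS.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical checklist and global scores for four residents on one task.
checklist = [30, 36, 41, 48]
global_scale = [18, 22, 27, 33]
r = pearson_r(checklist, global_scale)
```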
The cost of the examination was documented. We purchased a female trainer (for the Burch procedure), two laparoscopic trainers, an abdominal wall closure model, and a bowel model for approximately $5000 from Limbs and Things (Bristol, England). These life-like models can be reused an indefinite number of times. The replaceable parts included ectopic pregnancies, dissection pad, bladder suspension model, bowel, and abdominal wall, and cost approximately $1100. The ectopic model can be used only once; each of the others was used to test six residents. In addition, we purchased pigs' feet for subcuticular closure at a cost of under $10. The cost to use the surgical laboratory was $200 per session. The total cost for supplies and facilities was $1900. Each of the four testing sessions lasted 3 hours and required 4 hours of faculty time. So that interrater reliability could be evaluated, there were a total of six faculty at each session. Total faculty time was 96 hours. The faculty time does not include data entry.
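The cost and faculty-time totals can be checked with simple arithmetic. The sketch below uses only figures stated in the text; the variable names are hypothetical, and the pigs' feet are entered at their stated upper bound of $10, so the recurring total comes out slightly above the rounded $1900.

```python
# Figures taken from the text; variable names are illustrative only.
replaceable_parts = 1100     # dollars, approximate
pigs_feet = 10               # dollars, stated as "under $10"
lab_fee_per_session = 200    # dollars
sessions = 4
residents = 24
faculty_per_session = 6
faculty_hours_each = 4       # hours per faculty member per session

# Recurring cost excludes the one-time purchase of reusable models (about $5000).
recurring_total = replaceable_parts + pigs_feet + lab_fee_per_session * sessions
cost_per_resident = recurring_total / residents
faculty_hours = sessions * faculty_per_session * faculty_hours_each

print(f"Recurring cost: ${recurring_total}")
print(f"Cost per resident: ${cost_per_resident:.2f}")
print(f"Faculty hours: {faculty_hours}")
```

With the rounded total of $1900, the same division gives about $79 per resident.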
| DISCUSSION |
|---|
In our previous study of objective structured assessment of technical skills in a pig model,3 we found the test had overall reliability indices of 0.89 and overall interrater reliability of 0.87. In addition, we found significant differences in score among residency levels for the checklist and global rating scale but not for time or overall pass/fail judgment. These results were very similar to those of Reznick et al at the University of Toronto in an objective skills assessment of general surgery residents.4,5 Although the animals have provided us with a truly life-like model, it would be difficult for many other residencies to duplicate our results secondary to high costs ($250 per resident) and the need for animal surgery facilities and veterinary technicians. In addition, many have ethical issues about using live animals for medical training and research. Because of these concerns, we have developed a bench station objective assessment of technical skills.
The examination we designed used life-like surgical models. Over the past 5 years, there has been a significant improvement in the quality and availability of models that replicate many commonly performed gynecologic procedures. Although performing tasks in animals is more realistic, most animals, such as the pig, do not have reproductive anatomy that is similar to that of humans, and therefore, models for many gynecologic procedures provide a better surgical experience. We evaluated the feasibility of this bench examination format and found that the cost, $79 per resident, was significantly less than the cost required for animal testing, although the bench station examination did require an initial $5000 investment to buy the models. Because of our need for laparoscopic towers, the examination was conducted in the animal surgery facilities, which increased the cost of the exam. However, if a camera and video monitor can be obtained, the exam could easily be conducted in a classroom or other nonoperative facility. Another advantage is that the bench examination is completely portable. Recently, we were able to administer our bench station assessment of technical skills at a nearby institution. The faculty time required to administer the bench station exam is not significantly different from that required for the animal exam. Approximately 20–24 hours per six candidates tested are needed. The time commitment could, however, be reduced by eliminating a second examiner.
In addition to finding that the bench station format was more feasible to administer, we found it was as reliable as an animal examination. Reliability indices were quite good at 0.94 for the overall global scale and 0.77 for the overall checklist score. Studies have shown that examinations with reliability indices above 0.80 can be used for high-stakes purposes such as certification.5 The interrater reliability in this study was also quite good at 0.92 for the total checklist and 0.91 for the overall global rating scale. This indicates that it does not require a significant amount of time to train examiners, and one examiner per station is probably sufficient.
In our current study, we found that the bench station assessment of surgical skills had significant construct validity with ability to distinguish between resident levels. In this study, we found that the checklists, global rating scale, pass/fail judgment, and time to perform procedure all were significantly different among residency levels when we evaluated the examination as a whole (all seven tasks). This is different from our previous study where only the checklist and global rating scale showed significant differences among the residency levels and may result from having more experienced examiners, or tasks which are better discriminators. However, it is important to point out that time to complete a task is not necessarily a good surrogate for ability. For example, an inexperienced resident can do a procedure quickly but completely wrong. In other studies, which have evaluated time to complete a procedure, results have been mixed with some showing significant improvement in time with increased skill levels and some not.4,9,10
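The construct validity comparison rests on a one-way analysis of variance across residency years. A minimal sketch of the F statistic is given below; it omits the p value and the Student-Newman-Keuls post hoc contrasts, and the function name and scores are hypothetical rather than study data. It applies to the continuous measures (checklist, global rating scale, time); pass/fail data were analyzed with nonparametric tests.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA.

    groups: one list of scores per residency year.
    F = (between-group mean square) / (within-group mean square).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical global rating scores for three residency years.
scores_by_year = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = one_way_anova_f(scores_by_year)
```

A large F indicates that scores differ more between training levels than within them, which is the pattern expected of a test with construct validity.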
What we have been able to conclude from this study is that the use of bench models for objective structured assessment of technical skills appears to be equivalent to the same type of assessment performed in an animal model. Our results are similar to those of Martin et al,5 who conducted parallel examinations of operative skill, one in live animals and one using a bench station format. These investigators had 20 general surgery residents perform two identical surgical skills examinations on the same day using the different formats. They concluded, as we have, that the use of a bench station format is as reliable and valid as the examination conducted in an animal model but significantly less costly and more easily administered.
Although the results of this study are promising, additional research is needed to verify them. This type of examination now needs to be conducted in a significantly larger group of residents to see if the results can be replicated and to clearly establish the reliability and validity of this type of testing. In addition, in future studies, it will be very important for examiners to be blinded to information about the residents. In our current study, all of the examiners had worked with the residents previously, which could have biased the results. We are currently working with several other residency programs in the Northwest to perform surgical skills assessment in a completely blinded fashion.
Technical competence is essential for all surgeons and is expected by patients. However, the current certification process for any surgeon does not require "proof" of competency in performing a set of standardized skills before the surgeon is allowed to operate unsupervised. Compare this with the extensive and continually updated certification of commercial pilots, and it seems strange that surgical specialties do not have a formal process to certify surgical competence. If a valid and reliable test of surgical skills can be developed, there are many potential uses in addition to certification. Residents could be tested at the end of each year to be certain that the surgical education we are providing our residents is appropriate. Residents who fall behind could be identified early for additional instruction and additional exposure to cases. Testing can provide residents with the self-confidence that they can do a procedure with no attending input. Finally, an objective assessment of surgical skills allows surgical educators to be confident that we have trained high-quality surgeons.
| Footnotes |
|---|
Received December 27, 2000. Received in revised form April 16, 2001. Accepted May 24, 2001.
| REFERENCES |
|---|
2. Mandel LS, Lentz GM, Goff BA. Teaching and evaluating surgical skills. Obstet Gynecol 2000;95:783–5.
3. Goff BA, Lentz GM, Lee D, Houmard B, Mandel LS. Development of an objective structured assessment of technical skills for obstetric and gynecology residents. Obstet Gynecol 2000;96:146–50.
4. Reznick R, Regehr G, MacRae H, Martin J, McCulloch W. Testing technical skill via an innovative "bench station" examination. Am J Surg 1997;173:226–30.
5. Martin JA, Regehr G, Reznick R, MacRae H, Murnaghan J, Hutchison C, Brown M. Objective structured assessment of technical skill (OSATS) for surgical residents. Br J Surg 1997;84:273–8.
6. Winckel CP, Reznick RK, Cohen R, Taylor B. Reliability and construct validity of a structured technical skills assessment form. Am J Surg 1994;167:423–7.
7. Whitman N, Lawrence P. Chapter 5: Teaching procedures. In: Whitman N, Lawrence P. Surgical teaching: Practice makes perfect. Department of Family and Preventative Medicine, University of Utah School of Medicine, 1991:65–79.
8. McLeod PJ, Harden RM. Clinical teaching strategies for physicians. Med Teach 1985;7:173–89.
9. Goff BA, Lentz GM, Lee DM, Mandel LS. Formal teaching of surgical skills in an obstetric-gynecology residency. Obstet Gynecol 1999;93:785–90.
10. Chung JY, Sackier JM. A method of objectively evaluating improvements in laparoscopic skills. Surg Endosc 1998;12:1111–6.