Obstetrics & Gynecology 2001;98:412-416
© 2001 by The American College of Obstetricians and Gynecologists

ORIGINAL RESEARCH

Development of a Bench Station Objective Structured Assessment of Technical Skills

Barbara A. Goff, MD, Gretchen M. Lentz, MD, David Lee, MD, Dee Fenner, MD, Jamie Morris, MD and Lynn S. Mandel, PhD

From the Department of Obstetrics and Gynecology, University of Washington School of Medicine, Seattle, Washington; and University Fertility Consultants, Oregon Health Sciences University, Portland, Oregon.

Address reprint requests to: Barbara A. Goff, MD, Department of Obstetrics and Gynecology, University of Washington School of Medicine, Box 356460, Seattle, WA 98195-6460; E-mail: bgoff@u.washington.edu.


    ABSTRACT
 
OBJECTIVE: We have previously shown that objective structured assessment of technical skills performed in an animal model was an innovative, reliable, and valid method of assessing surgical skills. Our goal was to develop a less costly bench station objective structured assessment of technical skills and to evaluate the feasibility, reliability, and validity of this exam.

METHODS: A seven-station examination was administered to 24 residents. The tests included laparoscopic procedures (salpingostomy, intracorporeal knot tying, closure of port sites) and open abdominal procedures (subcuticular closure, bladder neck suspension, repair of enterotomy, abdominal wall closure). All tasks were performed using life-like surgical models. Residents were timed and assessed at each station using three methods of scoring: a task-specific checklist, a global rating scale, and a pass/fail grade.

RESULTS: Assessment of construct validity, the ability of the test to discriminate among residency levels, found significant differences on the checklist, global rating scale, time for procedures, and pass/fail grade by level of training. Reliability indices calculated with Cronbach’s α were 0.77 for the checklists and 0.94 for the global rating scale. Overall interrater reliability indices were 0.91 for the global rating scale and 0.92 for the checklists. Total cost for replaceable parts and facilities was $1900.

CONCLUSION: The less costly and more portable bench station objective structured assessment of technical skills can reliably and validly assess the surgical skills of gynecology residents. This type of examination can be a useful tool for identifying residents who need additional surgical instruction and for providing remediation, and it may become a mechanism to certify surgical skill competence.

Assessment of technical skills is an important part of ensuring that we have trained competent surgeons.1 In a recent survey of obstetrics and gynecology residency directors in the United States, we found that most programs rely on subjective faculty assessment of residents’ technical skills.2 That evaluation is usually performed at the end of the rotation and is based on the recollection of events that occurred during that rotation. Very few programs use evaluations that occur while the resident is performing a surgical procedure. Studies that have evaluated retrospective assessments have shown poor reliability and unknown validity. For example, two surgeons evaluating the same resident without structured criteria will typically exhibit a low level of agreement.1 In our survey, we also found that 25% of programs did not even evaluate the surgical skills of their residents.2

To develop a better assessment tool for surgical skills, we designed an objective structured assessment of technical skills for obstetrics and gynecology residents based on the work of others.3–6 In that exam, we had all of the residents in our program perform seven surgical tasks in a pig: four laparoscopic and three open abdominal procedures. We found that objective structured assessment of technical skills can assess residents’ surgical skills with high reliability and validity. Reliability refers to the consistency of the exam, the extent to which results are replicated each time the test is given. Validity is the extent to which the test measures what it is intended to measure. If a resident does well on an objective assessment of technical skills, it should mean that the resident is a competent surgeon. In reality, this is very difficult to measure, so we often settle for a proxy measure such as construct validity, which is the ability to distinguish between different levels of training.

Although preliminary results of our surgical skills assessment have been encouraging, the cost ($6000 for 24 residents) and the resources needed (animal surgical facilities, veterinary technician) have led us to pursue testing in surgical models. The purpose of our current study was to develop a less costly bench station objective structured assessment of technical skills and to evaluate the feasibility, reliability, and validity of this exam.


    MATERIALS AND METHODS
 
We administered a surgical assessment exam to all 24 obstetrics and gynecology residents in our program as previously described.3 Briefly, there were six residents in each postgraduate year, and the examinations were administered during the last two months of the residency year (May/June 2000). Testing was done from 4:00 to 7:00 PM. Six residents were tested each session.

A seven-station examination was developed, which included laparoscopic (linear salpingostomy, intracorporeal knot, port closure) and open abdominal procedures (abdominal wall closure, repair of enterotomy, bladder neck suspension, subcuticular closure). Tasks were selected for a range of difficulty based upon our previous research and included three procedures (linear salpingostomy, intracorporeal knot, repair of enterotomy), which we had evaluated in a pig model.3

Each task was performed in a surgical model purchased from Limbs and Things, Inc. (Bristol, England, www.limbsandthings.com). Because of the need to evaluate laparoscopic skills, the testing was performed in the animal surgical facilities so that we would have access to laparoscopic equipment and cameras. For each procedure, the faculty member grading the resident acted as a qualified assistant but did only what was asked by the resident and provided no input on surgical management. The resident was responsible for choosing the appropriate instruments and suture and for directing the assistant. For instruments and suture, appropriate choices as well as distractors were provided. For example, both traumatic and atraumatic forceps were available for closing the bowel. In addition, both permanent and absorbable sutures ranging in size from 4-0 to #1, on a variety of needles (cutting, GI, GS-21, etc), were available.

At each examination session, there were six faculty members. The faculty members evaluated residents in teams so that we could assess interrater reliability. Faculty members were assigned specific procedures, and the residents rotated through the tasks. The same faculty member evaluated the same task for each resident. A total of nine faculty participated (two gynecologic oncologists, two urogynecologists, two reproductive endocrinologists, and three generalists). Three of the faculty members had participated in the previous study using pigs and the other six volunteered to participate. Each volunteer was given a 15-minute orientation session on how to conduct the exam and how to grade. Each volunteer was paired with a more experienced examiner.

Three evaluation methods were used to score the residents: a task-specific checklist, a global rating scale, and an overall pass/fail judgment.3 The maximum possible score was 35 for the global rating scale and ranged from 26 to 52 for the checklists, depending on the individual task. Residents could fail for two reasons: poor performance or inability to complete the task in 10 minutes. The reason for failure was recorded for each task. A checklist score, global rating scale score, pass/fail grade, and time to perform the procedure were recorded for each task the resident performed.
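The scoring record described above can be sketched as a small data structure. This is only an illustration under stated assumptions: the class name, field names, and example values are hypothetical, and the rule that a time overrun is reported ahead of poor performance when both apply is an assumption, since the text does not say which reason takes precedence.

```python
from dataclasses import dataclass
from typing import Optional

TIME_LIMIT_MIN = 10.0   # residents failed if they could not finish in 10 minutes
GLOBAL_MAX = 35         # maximum global rating scale score


@dataclass
class StationResult:
    """One resident's result at one station (hypothetical record layout)."""
    task: str
    checklist: int        # task-specific checklist score
    checklist_max: int    # 26 to 52, depending on the task
    global_score: int     # global rating scale score, out of GLOBAL_MAX
    minutes: float        # time taken to perform the procedure
    adequate: bool        # examiner's judgment of technical performance

    def failure_reason(self) -> Optional[str]:
        """Return 'time', 'performance', or None if the resident passed.

        Assumption: a time overrun is reported first when both apply.
        """
        if self.minutes > TIME_LIMIT_MIN:
            return "time"
        if not self.adequate:
            return "performance"
        return None

    @property
    def passed(self) -> bool:
        return self.failure_reason() is None


# Hypothetical example: technically adequate but over the time limit.
example = StationResult("abdominal wall closure", 30, 40, 28, 11.5, True)
print(example.passed, example.failure_reason())
```

Keeping the failure reason separate from the pass/fail flag mirrors the study's practice of recording why a resident failed each task.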

Statistical analyses were performed using SPSS for Windows, 8.0 (SPSS, Inc., Chicago, IL). Internal consistency of the examination, which is a measure of the reliability of the test, was calculated using Cronbach’s α. Interrater reliability was calculated using intraclass correlation coefficients. Construct validity was assessed by analyzing resident performance with a one-way analysis of variance, with residency year as the independent variable. Post hoc contrasts were done with the Student-Newman-Keuls test. Pass/fail data were analyzed with nonparametric tests (χ² and Mann-Whitney U tests). Correlation between global and checklist scores was assessed with Pearson correlation.
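For readers unfamiliar with the internal consistency measure, the following is a minimal sketch of the standard Cronbach's α formula, α = k/(k−1) × (1 − Σ item variances / variance of total scores), where k is the number of stations. It is not the SPSS procedure used in the study; the function name and the example scores are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix.

    scores: list of rows, one row per resident, one column per station.
    Uses sample variances (n - 1 denominator) throughout.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two stations")
    # Variance of each station's scores across residents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Variance of each resident's total score across all stations.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical global rating scores for four residents at three stations.
example_scores = [
    [20, 22, 19],
    [25, 27, 24],
    [30, 29, 31],
    [33, 34, 32],
]
print(round(cronbach_alpha(example_scores), 3))
```

Values near 1 indicate that the stations rank residents consistently; the study reports overall values of 0.94 for the global rating scale and 0.77 for the checklists.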


    RESULTS
 
The mean total scores, time, and pass/fail data are provided in Table 1. The Burch and intracorporeal knot had significantly lower scores and pass rates compared with the other procedures. A large number of residents (50%) were unable to complete the abdominal wall closure within the 10-minute limit, although most were technically able to perform the procedure. The failure rate secondary to time for the other procedures ranged from 0 to 12.5% and is also shown in Table 1. We did not find any significant differences among the four examination sessions.


 
Table 1. Mean Total Scores, Time, and Pass Rate for Objective Structured Assessment of Technical Skills*
 
The one-way analysis of variance with Student-Newman-Keuls post hoc test was used to evaluate construct validity, the ability to distinguish between residency levels. The results for the mean total checklist score, global score, time, and pass/fail analysis for each residency level are shown in Table 2. All four evaluation methods found significant differences between PGY1,2 and PGY3,4. In addition, each task was evaluated individually for construct validity. Both the global rating scale and skills checklist found significant differences among the resident levels for all tasks except the laparoscopic port closure. The global rating scale was a more effective discriminator of levels than the checklist when we evaluated tasks individually. Time was a significant discriminator only for abdominal wall closure, laparoscopic salpingostomy, and subcuticular closure. Pass/fail analysis was a significant discriminator only for Burch and repair of enterotomy. However, the power of our observations is limited by small numbers when we look at each procedure individually.
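As an illustration of the construct validity analysis, the sketch below computes the F statistic of a one-way analysis of variance with residency year as the grouping factor. It is a plain-Python sketch of the textbook formula, not the SPSS analysis used in the study; the function name and scores are hypothetical, and the Student-Newman-Keuls post hoc step is not implemented.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the F statistic for a one-way ANOVA.

    groups: list of lists, one list of scores per residency year.
    """
    k = len(groups)                          # number of groups
    n = sum(len(g) for g in groups)          # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n

    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Hypothetical total global scores for three residents per year, PGY1 to PGY4.
scores_by_year = [
    [120, 131, 125],
    [140, 138, 150],
    [170, 165, 181],
    [190, 201, 185],
]
print(round(one_way_anova_f(scores_by_year), 2))
```

A large F value indicates that the spread between residency years is large relative to the spread within each year; a p value would then be read from the F distribution with (k − 1, n − k) degrees of freedom.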


 
Table 2. Total Exam Scores for All Seven Tasks per Residency Level*
 
Reliability indices are shown in Table 3. Internal consistency was calculated with Cronbach’s α. The overall reliability was 0.94 for the global rating scale and 0.77 for the checklist. Reliability indices for the individual procedures ranged from 0.58 to 0.93. Interrater reliability was evaluated for all tasks and ranged from 0.66 to 0.98. Overall, interrater reliability was 0.91 for the global rating scale and 0.92 for the checklist. Correlations between checklists and global scores were moderately high, ranging from 0.67 to 0.96, which indicates that these two tools are measuring approximately the same thing. This is also a form of validity (concurrent validity), in that there is a good correlation between the skills checklist and the global scale, which had been previously validated.6


 
Table 3. Reliability Indices
 
A comparison of scores between Examiner #1 (experienced) and Examiner #2 revealed no significant patterns of grading (higher or lower) based on experience. We did find that there was greater variability in Examiner #2 (less experienced) scores when compared with Examiner #1.

The cost of the examination was documented. We purchased a female trainer (for Burch), two laparoscopic trainers, an abdominal wall closure model, and a bowel model for approximately $5000 from Limbs and Things (Bristol, England). These life-like models can be reused an indefinite number of times. The replaceable parts included ectopic pregnancies, dissection pad, bladder suspension model, bowel, and abdominal wall, and cost approximately $1100. The ectopic model can be used only once; all others were used to test six residents. In addition, we purchased pigs’ feet for subcuticular closure at a cost of under $10. The cost to use the surgical laboratory was $200 per session. The total cost for supplies and facilities was $1900. Each of the four testing sessions lasted 3 hours and required 4 hours of each faculty member’s time. So that interrater reliability could be evaluated, there were a total of six faculty at each session. Total faculty time was 96 hours. The faculty time does not include data entry.
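The figures above can be cross-checked with a short calculation. The breakdown is a reading of how the reported numbers combine, not a statement from the study: it assumes the $1900 total is the replaceable parts plus the laboratory fee for four sessions, with the pigs' feet (under $10) absorbed by rounding, and it excludes the one-time $5000 model purchase. Variable names are hypothetical.

```python
RESIDENTS = 24
SESSIONS = 4
FACULTY_PER_SESSION = 6
HOURS_PER_FACULTY_PER_SESSION = 4

REPLACEABLE_PARTS_USD = 1100      # approximate, as reported
LAB_FEE_PER_SESSION_USD = 200
ANIMAL_EXAM_TOTAL_USD = 6000      # earlier pig-model exam, 24 residents

# Supplies and facilities for the bench station exam.
total_cost = REPLACEABLE_PARTS_USD + LAB_FEE_PER_SESSION_USD * SESSIONS
cost_per_resident = total_cost / RESIDENTS

# Faculty time across all sessions (data entry not included).
faculty_hours = SESSIONS * FACULTY_PER_SESSION * HOURS_PER_FACULTY_PER_SESSION

# Per-resident cost of the earlier animal-model exam, for comparison.
animal_cost_per_resident = ANIMAL_EXAM_TOTAL_USD / RESIDENTS

print(f"bench exam total:        ${total_cost}")
print(f"bench exam per resident: ${cost_per_resident:.2f}")
print(f"animal exam per resident: ${animal_cost_per_resident:.2f}")
print(f"faculty hours:           {faculty_hours}")
```

Under these assumptions the totals agree with the reported $1900 and 96 faculty hours, and the per-resident costs come to roughly $79 for the bench exam against $250 for the animal exam.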


    DISCUSSION
 
Research in learning psychomotor skills has identified three important conditions that influence skill learning: contiguity; practice; and evaluation and feedback.7 Contiguity is understanding the proper sequence and appropriate timing of motor response. Practice allows for rehearsing and fixation of the motor responses necessary for completion of the skill. The more closely the conditions of practice approach the conditions under which the skill is usually performed, the more effective the practice will be. The third, and what is thought by some to be the most important factor in acquiring skills, is evaluation and feedback. This step involves more than just telling residents whether they have performed a procedure correctly. It should be a process of providing learners information about their current performance so that they can improve in the future. Without objective evaluation and feedback, mistakes go uncorrected, good performance is not reinforced, and clinical competence is difficult to achieve.8 We have found that objective structured assessment of technical skills is an excellent method to evaluate surgical skills and provides a constructive format to give feedback about those skills. In addition, this type of testing allows the residents to assess their own knowledge and technical expertise in an arena where mistakes are permissible.

In our previous study of objective structured assessment of technical skills in a pig model,3 we found the test had overall reliability indices of 0.89 and overall interrater reliability of 0.87. In addition, we found significant differences in score among residency levels for the checklist and global rating scale but not for time or overall pass/fail judgment. These results were very similar to those of Reznick et al at the University of Toronto in an objective skills assessment of general surgery residents.4,5 Although the animals have provided us with a truly life-like model, it would be difficult for many other residencies to duplicate our results because of the high costs ($250 per resident) and the need for animal surgery facilities and veterinary technicians. In addition, many have ethical concerns about using live animals for medical training and research. Because of these concerns, we have developed a bench station objective assessment of technical skills.

The examination we designed used life-like surgical models. Over the past 5 years, there has been a significant improvement in quality and ability to obtain models that replicate many commonly performed gynecologic procedures. Although performing tasks in animals is more realistic, most animals, such as the pig, do not have reproductive anatomy that is similar to humans, and therefore, models for many gynecologic procedures provide a better surgical experience. We evaluated the feasibility of this bench examination format and found that the cost, $79 per resident, was significantly less than the cost required for animal testing although the bench station examination did require an initial $5000 investment to buy the models. Because of our need for laparoscopic towers, the examination was conducted in the animal surgery facilities, which increased the cost of the exam. However, if a camera and video monitor can be obtained, the exam could easily be conducted in a classroom or other nonoperative facility. Another advantage is that the bench examination is completely portable. Recently, we were able to administer our bench station assessment of technical skills at a nearby institution. The faculty time required to administer the bench station exam is not significantly different from the animal exam. Approximately 20–24 hours per six candidates tested are needed. The time commitment could, however, be reduced by eliminating a second examiner.

In addition to finding that the bench station format was more feasible to administer, we found it was as reliable as an animal examination. Reliability indices were quite good at 0.94 for the overall global scale and 0.77 for the overall checklist score. Studies have shown that examinations with reliability indices above 0.80 can be used for high-stakes purposes such as certification.5 The interrater reliability in this study was also quite good at 0.92 for the total checklist and 0.91 for the overall global rating scale. This indicates that it does not require a significant amount of time to train examiners, and one examiner per station is probably sufficient.

In our current study, we found that the bench station assessment of surgical skills had significant construct validity with ability to distinguish between resident levels. In this study, we found that the checklists, global rating scale, pass/fail judgment, and time to perform procedure all were significantly different among residency levels when we evaluated the examination as a whole (all seven tasks). This is different from our previous study where only the checklist and global rating scale showed significant differences among the residency levels and may result from having more experienced examiners, or tasks which are better discriminators. However, it is important to point out that time to complete a task is not necessarily a good surrogate for ability. For example, an inexperienced resident can do a procedure quickly but completely wrong. In other studies, which have evaluated time to complete a procedure, results have been mixed with some showing significant improvement in time with increased skill levels and some not.4,9,10

What we have been able to conclude from this study is that the use of bench models for objective structured assessment of technical skills appears to be equivalent to the same type of assessment performed in an animal model. Our results are similar to those of Martin et al,5 who conducted parallel examinations of operative skill, one in live animals and one using a bench station format. These investigators had 20 general surgery residents perform two identical surgical skills examinations on the same day using the different formats. They concluded, as we have, that the use of a bench station format is as reliable and valid as the examination conducted in an animal model but significantly less costly and more easily administered.

Although the results of this study are promising, additional research is still needed to verify them. This type of examination now needs to be conducted in a significantly larger group of residents to see if the results can be replicated and to clearly establish the reliability and validity of this type of testing. In addition, in future studies, it will be very important for examiners to be blinded to information about the residents. In our current study, all of the examiners had worked with the residents previously, which could have biased the results. We are currently working with several other residency programs in the Northwest to perform surgical skills assessment in a completely blinded fashion.

Technical competence is essential for all surgeons and is expected by patients. However, the current certification process for any surgeon does not require "proof" of competency in performing a set of standardized skills before the surgeon is allowed to operate unsupervised. Compare this with the extensive and continually updated certification of commercial pilots, and it seems strange that surgical specialties do not have a formal process to certify surgical competence. If a valid and reliable test of surgical skills can be developed, there are many potential uses in addition to certification. Residents could be tested at the end of each year to be certain that the surgical education we are providing our residents is appropriate. Residents who fall behind could be identified early for additional instruction and additional exposure to cases. Testing can provide residents with the self-confidence that they can do a procedure with no attending input. Finally, an objective assessment of surgical skills allows surgical educators to be confident that we have trained high-quality surgeons.


    Footnotes
 
Supported in part by a grant from the National Board of Medical Examiners (NBME) Medical Education Research Fund Grant. The project does not necessarily reflect NBME policy, and NBME support provides no official endorsement. Supported in part by a grant from United States Surgical Corporation, Norwalk, Connecticut.

PII S0029-7844(01)01473-9

Received December 27, 2000. Received in revised form April 16, 2001. Accepted May 24, 2001.


    REFERENCES
 
1. Reznick RK. Teaching and testing technical skills. Am J Surg 1993;165:358–61.

2. Mandel LS, Lentz GM, Goff BA. Teaching and evaluating surgical skills. Obstet Gynecol 2000;95:783–5.

3. Goff BA, Lentz GM, Lee D, Houmard B, Mandel LS. Development of an objective structured assessment of technical skills for obstetric and gynecology residents. Obstet Gynecol 2000;96:146–50.

4. Reznick R, Regehr G, MacRae H, Martin J, McCulloch W. Testing technical skill via an innovative "bench station" examination. Am J Surg 1997;173:226–30.

5. Martin JA, Regehr G, Reznick R, MacRae H, Murnaghan J, Hutchison C, Brown M. Objective structured assessment of technical skill (OSATS) for surgical residents. Br J Surg 1997;84:273–8.

6. Winckel CP, Reznick RK, Cohen R, Taylor B. Reliability and construct validity of a structured technical skills assessment form. Am J Surg 1994;167:423–7.

7. Whitman N, Lawrence P. Chapter 5: Teaching procedures. In: Whitman N, Lawrence P. Surgical teaching: Practice makes perfect. Department of Family and Preventative Medicine, University of Utah School of Medicine, 1991:65–79.

8. McLeod PJ, Harden RM. Clinical teaching strategies for physicians. Med Teach 1985;7:173–89.

9. Goff BA, Lentz GM, Lee DM, Mandel LS. Formal teaching of surgical skills in an obstetric-gynecology residency. Obstet Gynecol 1999;93:785–90.

10. Chung JY, Sackier JM. A method of objectively evaluating improvements in laparoscopic skills. Surg Endosc 1998;12:1111–6.



