This article originally appeared in The Bar Examiner print edition, Winter 2021–2022 (Vol. 90, No. 4), pp. 23–25.

By Rosemary Reshetar, EdD

In March 2021, I joined NCBE as its new Director of Assessment and Research. I am pleased to have the opportunity to introduce myself in this issue’s Testing Column.

I join NCBE at what is certainly a momentous time, as it begins developing the next generation of the bar exam. The Assessment and Research Department is engaged in many areas of work pertaining to development and implementation of the new exam—work I am excited to be involved with.

A bit on my background: I graduated from Drexel University with a bachelor’s degree in design and merchandising, and a year later I received a master’s degree in psychological services in education from the University of Pennsylvania. At that time, I knew nothing of careers in psychometrics. But after completing my doctorate in research and evaluation methods at the University of Massachusetts, my first position was as a psychometrician at the American Board of Internal Medicine (ABIM) working on its specialty and subspecialty exams for internal medicine certification.

Since my salad days at ABIM, I’ve worked in psychometrics on current operations, minor and major revisions, redesigns, and change implementation for a great variety of high-stakes testing programs. In addition to internal medicine licensure, these include Advanced Placement (AP) exams, the SAT Suite of Assessments (the SAT and the Preliminary SAT/National Merit Scholarship Qualifying Test), the GRE, the College Level Examination Program (CLEP, which gives students the opportunity to earn college credit by demonstrating knowledge of introductory college-level material), and the Praxis licensure exams (for those seeking to be certified and licensed educators). In addition to individual contributor assignments, I’ve served in management and leadership roles at ETS (the organization that develops and administers the SAT on behalf of College Board) and College Board.

Beyond these roles, what I’ve found most interesting and motivating in my career is the technical work required to create and implement feasible solutions for exam redesigns and major initiatives, with the challenges and rewards each brings. Although each program I staffed had been operating for some time, each had reasons to undertake major revisions and redesigns along the way. Some of the highlights follow.

About 20 years ago, the GRE added an analytical writing section to test critical thinking and analytical writing skills. The measure consists of two separately timed tasks: analyze an issue and analyze an argument. At that time, use of constructed response items of this type—items that require essay responses—was not widespread for large-scale computer-delivered assessments. As a psychometrician I worked closely with the Analytical Writing test developers to ensure that the procedures from pretesting through scoring supported the claim of interchangeable scores regardless of the essay prompts administered.

Around that time, CLEP sought to convert its paper-based, on-demand test delivery model to a computer-based one in which CLEP exams would be administered at national test centers located on college and university campuses. A new delivery infrastructure was built, incorporating changes in the psychometric models used for scoring and a move from an exam delivered as a single unit to a testlet-based exam (in which a packet of test items is administered together). One enhancement the change in psychometric models made possible was that examinees’ scores would be available immediately after testing.

In 2005, while I worked at ETS, the SAT began including the Writing test—consisting of multiple-choice items and a single essay item—as a third measure on the exam to better reflect the value of clear and effective writing. The total testing time and maximum overall and individual test scores increased, and some changes were made to the other tests as well. To support the scores’ fairness, reliability, and consistency along the entire score scale for each test, a good deal of test development and psychometric work was needed before the launch of the revised SAT. This included large-scale testing of prototype items, sections, and test forms with demographically diverse examinee samples. It was important to ensure that the test was fair across gender and racial/ethnic groups, that it was not speeded (i.e., that examinees’ scores were not affected by the test’s time limits), and that we had equating models that would support the claim of interchangeable scores regardless of the form administered.

The next major revision to the SAT was slated for 2016, while I was employed at College Board. As in 2005, the changes were large enough to require a good deal of psychometric research and development to support the quality of the reported scores. The new SAT included Math, Reading, and Writing and Language tests. All tests included multiple-choice questions, and the Math test also included student-produced response questions. An optional Essay Test was made available as well. The scoring model was changed, and subscores were added alongside the total and individual test scores; the enhanced score reports provided feedback for students and teachers. Psychometric efforts supported these changes, again ensuring that the test was fair, that it was not speeded, and that scores across time would be interchangeable.

When I had taken the SAT 25 years earlier, I could not have imagined that someday I would be leading the psychometric team at ETS working on a major revision of the exam—or that 10 years after that I’d be leading the College Board’s psychometric team responsible for transitioning the SAT operational psychometric work in-house and implementing the next major revision of the exam.

Another well-known testing program I staffed that underwent revisions is the AP program, which gives students the opportunity to take college-level courses in high school; by taking the AP exams, students can earn college credit and placement. The first round of revisions was a course and exam redesign effort begun in 2005. The redesign process was a collaboration among college faculty, AP teachers, and learning and assessment specialists to support the development of the knowledge and skills students need to succeed in subsequent courses in a given discipline at the college level. In addition to supporting the content changes and ensuring that the new item types and exam designs were reliable and valid, two notable changes in the psychometric procedures were introduced. First, the scoring model was modified. Second, based on psychometric and policy research and recommendations, the method of setting passing scores was changed to a panel-based standard-setting method. Going forward, a standard-setting study would be undertaken at the time a new exam was introduced, and then commonly used equating procedures would maintain the comparability of scores across administrations.

Most recently, the AP program needed to pivot quickly to provide solutions during the COVID-19 pandemic as schools closed in March 2020, when most AP students had completed 75% or more of their AP coursework and needed an exam score for college credit or placement. Through 2019, AP exams had been administered on paper at test sites in high schools; that administration model was not available in 2020. College Board opted to rapidly develop a digital, remote administration model for 2020 only. This led to the introduction of shorter tests, along with many unique forms for each subject. The psychometric work to implement this model included analyses aimed at making the fairest decisions possible about the passing scores across forms. Building on the work completed for the 2020 remote exam, the psychometric team worked closely with test development, program management, IT, and operations to build the capability for in-school digital test administration for the 2021 AP exams.

When I joined NCBE, I was assured that there would be plenty of opportunities for me to pursue my interest in the technical work involved in creating and implementing solutions for exam redesigns and major initiatives. And indeed, I have found this to be true. As NCBE embarks on development and implementation of the next generation of the bar exam, taking care with the entirety of the assessment design, research, and psychometric work supports the baseline validity argument for the exam and provides the foundation for NCBE’s exams to remain valid and reliable. I am excited to work with the Assessment and Research Department staff as we achieve this goal.


Rosemary Reshetar, EdD, is the Director of Assessment and Research for the National Conference of Bar Examiners.

Contact us to request a PDF file of the original article as it appeared in the print edition.
