This article originally appeared in The Bar Examiner print edition, Spring 2024 (Vol. 93, No. 1), pp. 69–72.

By Wendy Light; Rosemary Reshetar, EdD; Erica Shoemaker

The development of the NextGen bar exam provides an exciting opportunity to revisit how bar exams are graded, building on testing industry best practices and the deep legal knowledge base represented on NCBE’s bar exam drafting and jurisdictions’ grading teams. Courts, legal educators, and examinees can be confident that grading and scoring of the NextGen bar exam will be rigorous, objective, consistent, and fair.
Adhering to best practices in grading supports score reliability by ensuring that examinees’ pass/fail status would not change if different questions were administered, different graders were used, or examinees sat in a different jurisdiction or with a different examinee group. Well-established research and practices support the move to a national grading model that builds on the strong foundation of jurisdictions’ current grading practices.
Grading the MEE and MPT: Rubrics and Training Workshops
Today, each jurisdiction grades its examinees’ Multistate Essay Examination (MEE) and Multistate Performance Test (MPT) responses; grading for the NextGen exam will continue in that tradition but with several key enhancements. NCBE provides grading guidelines for each of the MEE and MPT questions and in-depth grader training via the MEE/MPT Grading Workshops, held virtually in February and in person in July. Hundreds of graders participate in the workshops, either in real time or by registering for on-demand videos that they watch after the workshops. Workshops consist of a discussion of high-stakes testing principles and grading fundamentals, and hands-on sessions dedicated to preparing participants to grade their jurisdictions’ written responses.
Each jurisdiction is responsible for establishing and maintaining its own MEE and MPT grading processes and procedures; the guidance provided in the NCBE grading materials and Grading Workshops supports jurisdictions in ensuring consistency in scores.1
NextGen Bar Exam Grading
The NextGen bar exam will feature a different mix of question types than is found on the current exam.2 NCBE will continue to score multiple-choice questions; jurisdiction graders will grade constructed-response (also called written-response) questions with the assistance of a new, centralized grading system that integrates industry-wide best practices into the system itself, further ensuring accurate and consistent grading both within and among jurisdictions.
Each constructed-response question will have a set scale and a clear definition of the knowledge and skills that an examinee must demonstrate to obtain each score level on that scale. Graders will have a scoring guide that includes an item-specific rubric for each question. This guide will also include the question’s source material, grading notes, and benchmark responses with annotations. Benchmark responses are exemplars that demonstrate the variety of ways an examinee can answer a question and receive points for their response. Annotations explain how each benchmark meets or does not meet the rubric criteria and why it is assigned a particular score.
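To make the structure of these materials concrete, here is a minimal sketch, in Python, of how the pieces of a scoring guide might fit together. The names, fields, and the 0–6 scale shown are illustrative assumptions for exposition, not NCBE’s actual specification.

```python
from dataclasses import dataclass, field

@dataclass
class Benchmark:
    """An exemplar response with a predetermined score and an annotation
    explaining how it meets (or misses) the rubric criteria."""
    response_text: str
    assigned_score: int
    annotation: str

@dataclass
class ScoringGuide:
    """One guide per constructed-response question (illustrative only)."""
    question_id: str
    scale: range                      # e.g., range(0, 7) for an assumed 0-6 scale
    rubric: dict[int, str]            # score level -> performance criteria
    source_materials: list[str]       # the materials the examinee received
    grading_notes: list[str]          # clarifications and examples for graders
    benchmarks: list[Benchmark] = field(default_factory=list)
```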
The most significant change to grading may be the switch from today’s relative grading model, which is applied within each jurisdiction independently, to an objective, absolute grading model applied across all jurisdictions. Both are fair, psychometrically valid approaches to grading,3 but absolute grading rubrics are significantly more straightforward for the grader and more easily understood by the examinee. (See the articles by NextGen field test graders.) The advantages of absolute grading will be felt in all jurisdictions, but especially in medium and large jurisdictions that today are tasked with maintaining relative grading standards across thousands of constructed responses.
NextGen Bar Exam Grading Terms
Adjudication: a process for resolving a disagreement between two graders’ scores on the same response.
Calibration: a short verification administered at the start of grading each question to ensure the grader is prepared to grade.
Double-Grading: the independent grading of every response by two graders, providing accuracy and fairness for examinees.
Scoring Guides: a collection of documents specific to grading each NextGen item that includes:
- Source Materials: the file and library each examinee receives for each question.
- Rubric: a set of rules that describe the criteria for performance at various levels and are used for grading examinee responses on a constructed-response question.
- Grading Notes: examples and/or clarifications that will assist the grader.
- Benchmarks with annotations: exemplars that demonstrate the variety of ways an examinee can answer a question.
SecureMarker Grading Platform: a secure platform provided by Surpass Assessment, the testing software vendor for the NextGen bar exam, that contains all scoring materials and responses, houses all the grades assigned, and provides analytical tools.
Validity Responses: responses that clearly represent each score point, interspersed into each grader’s assigned set of responses. This tool helps ensure graders’ scores align with the grading criteria in the rubrics throughout the grading process.
Grading Tools
In addition to enhanced rubrics and grading instructions, NCBE will provide tools that support consistency in grading. The tools include grader training, calibration, embedded validity responses, statistical monitoring, and best-practice double grading of written responses. We are all human, and grader drift may occur during the grading period.4 Using these tools will help combat variation in scores that is not the result of differing examinee performance.
Grader Training
NCBE is developing training modules that introduce graders to the various item types and provide hands-on grading practice. We have received valuable insight regarding the importance of continuing grader workshops for NextGen grading and are working to adapt these workshops to the new exam. Our goal is to provide comprehensive training while being mindful of graders’ time commitment. General training resources will be made available throughout the year, allowing graders to refresh their skills when they have the need and the time; question-specific modules will be released on a schedule similar to that for the current exam.
Calibration
Before grading live responses, graders will complete a short calibration verification (5–10 responses) for the item they are assigned in the grading platform. Calibration allows graders to warm up and focus on the relevant grading criteria before grading their jurisdiction’s examinee responses. This calibration activity will be piloted during the upcoming prototype exam, giving jurisdiction graders the opportunity to provide feedback before the first NextGen exam administration in 2026.
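As a rough illustration, a calibration gate might work something like the sketch below. The pass rule, tolerance, and threshold values are hypothetical assumptions, not NCBE’s actual criteria.

```python
def passes_calibration(grader_scores: list[int],
                       predetermined_scores: list[int],
                       tolerance: int = 1,
                       max_misses: int = 1) -> bool:
    """Check a grader's scores on the short set of pre-scored calibration
    responses (5-10 per item) against the predetermined scores. A 'miss'
    is a score more than `tolerance` points from the predetermined score;
    the grader proceeds to live grading only if misses stay within limits."""
    misses = sum(
        abs(given - expected) > tolerance
        for given, expected in zip(grader_scores, predetermined_scores, strict=True)
    )
    return misses <= max_misses

# Example: one score off by two points counts as a miss but still passes.
print(passes_calibration([4, 3, 5, 2, 4], [4, 3, 5, 4, 4]))  # True
```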
Validity Responses
Validity responses are used to help graders and jurisdiction-appointed grading leaders5 ensure that grades do not drift over time. These responses have predetermined grades that clearly represent each score range for the assigned item and will be interspersed into each grader’s assigned set of responses. Graders will not know which responses are validity responses and which are examinee responses from their jurisdiction, and both graders and grading leaders will be able to use the data from the validity responses to keep grading on track.
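One simple way to use the validity-response data, sketched below with made-up numbers and a made-up threshold, is to track each grader’s average deviation from the predetermined scores and flag sustained drift.

```python
def mean_drift(validity_results: list[tuple[int, int]]) -> float:
    """Average signed deviation of a grader's scores from the predetermined
    scores on validity responses: positive suggests lenient drift, negative
    suggests severe drift. `validity_results` holds (given, predetermined)."""
    if not validity_results:
        raise ValueError("no validity responses scored yet")
    return sum(given - expected
               for given, expected in validity_results) / len(validity_results)

# Hypothetical check: flag a grader whose drift exceeds half a score point.
results = [(4, 4), (5, 4), (3, 3), (5, 4), (4, 3)]
if abs(mean_drift(results)) > 0.5:
    print("Drift detected: review this grader's recent scores.")
```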
Statistical Monitoring
The centralized grading platform will also allow jurisdictions to view grading data in real time across all their graders and responses. This enables statistical monitoring and quality control throughout the grading window, giving jurisdictions the ability to check on grader progress and address any grading inconsistencies quickly, based on real data. It also allows relevant information to be shared quickly and consistently among jurisdiction grading teams.
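The statistics a jurisdiction might watch could be as simple as the agreement rates below. This is a sketch of the general idea only, not a description of SecureMarker’s actual analytics.

```python
def agreement_summary(double_grades: list[tuple[str, int, int]]) -> dict[str, float]:
    """Given (response_id, grade_1, grade_2) tuples from double grading,
    report how often the two independent grades match exactly and how
    often they fall within one point of each other."""
    total = len(double_grades)
    exact = sum(g1 == g2 for _, g1, g2 in double_grades)
    adjacent = sum(abs(g1 - g2) <= 1 for _, g1, g2 in double_grades)
    return {"exact_agreement": exact / total,
            "adjacent_agreement": adjacent / total}

# Example with made-up grades for three responses.
print(agreement_summary([("r1", 4, 4), ("r2", 3, 4), ("r3", 5, 2)]))
# {'exact_agreement': 0.333..., 'adjacent_agreement': 0.666...}
```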
Double Grading
As with the current exam, a significant portion of the NextGen exam score will consist of points earned through constructed-response questions, which will be graded by jurisdiction graders.
To deal with human inconsistency and potential grader drift, and to ensure that grades are not affected by outside factors (for example, grader fatigue, environmental distractions, compressed versus more extended grading timelines, or number of examinees), examinee responses will be randomized and then graded by two jurisdiction graders working independently. If the difference between the two grades on a given response falls outside set parameters, the response will be sent for resolution, either through discussion and agreement of the two graders or by a jurisdiction-appointed grading leader. Each jurisdiction has flexibility in how it resolves such disagreements about examinees’ responses.
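In code, that flow might look like the following sketch. The shuffling step and the two-point adjudication gap are illustrative assumptions, since the actual “set parameters” will be defined by NCBE and the jurisdictions.

```python
import random

def randomize_grading_order(response_ids: list[str]) -> list[str]:
    """Shuffle responses so that grading order is independent of the order
    in which examinees submitted (one guard against sequence effects)."""
    shuffled = response_ids[:]
    random.shuffle(shuffled)
    return shuffled

def needs_adjudication(grade_1: int, grade_2: int, max_gap: int = 2) -> bool:
    """Flag a response whose two independent grades differ by more than the
    allowed gap; it then goes to the two graders for discussion or to a
    jurisdiction-appointed grading leader for resolution."""
    return abs(grade_1 - grade_2) > max_gap
```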
Double grading is designed to enable jurisdictions that require regrading to complete that process meaningfully during the primary grading period rather than after it. As discussed below, our initial research suggests that jurisdictions will be able to double grade in the same amount of time, and with the same number of graders, as they use today.
Ongoing Research and Planning
These new grading procedures are undergoing testing of their own during NextGen field and prototype testing in 2024 and 2025. As part of the January 2024 field test, 61 volunteer graders from 28 jurisdictions graded over 37,000 responses, and NCBE staff is currently analyzing the data. Field test graders completed a survey after their grading experience, and a subset also participated in post-grading interviews. This feedback provided detailed insight that has already led to further refinement of the NextGen test development and grading process. (See pages 65 and 66 for two graders’ descriptions of their grading experiences.)
Grading Best Practices: Additional Reading from Educational Testing Service (ETS)
- “Best Practices for Constructed-Response Scoring” (2021), available at https://www-vantage-stg-publish.ets.org/pdfs/about/cr_best_practices.pdf.
- “ETS Standards for Quality and Fairness” (2015), available at https://www.ets.org/pdfs/about/standards-quality-fairness.pdf.
- Catherine A. McClellan, “Constructed-Response Scoring—Doing It Right,” 13 R&D Connections (February 2010) 1–7.
- “Guidelines for Constructed-Response and Other Performance Assessments” (2008), available at https://www.ets.org/content/dam/ets-org/pdfs/about/constructed-response-guidelines.pdf.
Timing Data
Graders were asked to track their time reviewing materials and grading responses. This timing data will enable NCBE to give jurisdictions an estimate of how long it will take to grade the NextGen exam and of how many graders they may need. Based on the submitted timing data, we anticipate that during the first few administrations, double grading the NextGen exam will take approximately the same number of grader hours as single grading the MEE and MPT does today. As we continue to refine our process and transition to the final grading platform, we anticipate this time will be reduced.
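To show how such timing data turns into staffing estimates, here is a back-of-the-envelope calculation. Every number in it is a hypothetical placeholder; actual figures will come from NCBE’s analysis of the field test data.

```python
# All inputs below are made-up placeholders, not field test results.
minutes_per_response = 6      # assumed average grading time per response
examinees = 1_000             # hypothetical jurisdiction size
questions_per_examinee = 6    # assumed number of constructed-response items
grading_passes = 2            # double grading: every response graded twice
hours_per_grader = 40         # assumed hours each grader can contribute

total_hours = (examinees * questions_per_examinee * grading_passes
               * minutes_per_response) / 60
graders_needed = -(-total_hours // hours_per_grader)  # ceiling division

print(f"{total_hours:.0f} grader hours, about {graders_needed:.0f} graders")
# -> 1200 grader hours, about 30 graders
```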
Prototype Testing
Prototype testing will take place in October 2024, and additional grader input and data will enable us to fine-tune the grading process further. Jurisdictions that intend to first administer the NextGen exam in July 2026 and those that host prototype exam locations will receive priority as we assemble the prototype grading team.
While some of the grading procedures put in place for the NextGen exam will be familiar to jurisdictions, the new grading platform and grader materials open up opportunities to build on the expertise of the jurisdictions’ grading teams. Robust training, combined with calibration, double grading, validity responses, adjudication, and absolute grading, makes for a grading system that produces valid and reliable scores without the expense of additional time. Initial research suggests that the new grading protocols will not increase the time dedicated to grading and will improve rigor and fairness to examinees. NCBE plans to make its staff grading content advisors available throughout the grading period to discuss any questions or concerns regarding grading materials. And with the centralized platform, all jurisdictions will have access to the materials, tools, guidance, and support needed to implement grading procedures that ensure fairness to all candidates.
Notes
- For the best practices covered by NCBE Grading Workshop facilitators, see Sonja Olsen, “13 Best Practices for Grading Essays and Performance Tests,” 88(4) The Bar Examiner (Winter 2019–2020) 8–14. In addition to the MEE and MPT components, which require human grading, the bar exam includes the multiple-choice Multistate Bar Examination (MBE). NCBE centrally scores the MBE Scantron sheets each examinee completes, with scoring verified by a third-party partner.
- For the mix of item types on the NextGen bar exam, see https://nextgenbarexam.ncbex.org/nextgen-sample-questions/.
- See Judith A. Gundersen, “It’s All Relative—MEE and MPT Grading, That Is,” 85(2) The Bar Examiner (June 2016) 37–45.
- See Zhihan (Helen) Wang, Jiaxin Pei, and Jun Li, “30 Million Canvas Grading Records Reveal Widespread Sequential Bias and System-Induced Surname Initial Disparity” (October 16, 2023), available at SSRN: https://ssrn.com/abstract=4603146.
- Many, but not all, jurisdictions currently manage the work of their graders through a team of “grading leaders,” who help ensure continuity and consistency in the grading of constructed-response answers. In some jurisdictions, this function is performed by Board members. Depending on the jurisdiction, they may be called grading monitors, supervising graders, team leaders, or similar.
Wendy Light is the Constructed Response Scoring Manager for the National Conference of Bar Examiners.
Rosemary Reshetar, EdD, is the Director of Assessment and Research for the National Conference of Bar Examiners.
Erica Shoemaker is the Constructed Response Scoring Specialist for the National Conference of Bar Examiners.
Contact us to request a PDF file of the original article as it appeared in the print edition.