This article originally appeared in The Bar Examiner print edition, Summer 2022 (Vol. 91, No. 2), pp. 38–41.
A column devoted to exploring topics and providing information especially for those new to bar admissions.

By Mengyao Zhang, PhD, and Keyu Chen, PhD
Welcome to the third “New to Bar Admissions?” column. The previous two columns briefly explained the role of the American Bar Association’s Section of Legal Education and Admissions to the Bar in bar admissions and the accreditation of law schools,1 and assembled some important testing concepts and terms those in the bar admissions community are likely to encounter in the course of their work.2 Here, we would like to introduce a publication that is well known in the testing industry as the “gold standard in guidance on testing in the United States and worldwide.”3
Many professions have articulated a code of ethics or statements of the norms of their professional practice. The testing industry also has developed a set of guidelines for the development, administration, and scoring of tests, as well as criteria for evaluating tests and testing practices. This set of guidelines is known as the Standards for Educational and Psychological Testing (hereafter the Standards). Below we discuss several basic testing concepts that are covered extensively in the Standards, as well as a bit about its mission and history.
What Are the Standards?
The Standards “provide criteria for the development and evaluation of tests and testing practices and provide guidelines for assessing the validity of interpretations of test scores for the intended test uses.”4 The Standards are not a list of legal requirements but instead criteria and guidelines for establishing sound testing practices that apply to different types of educational and psychological assessments, such as those in clinical psychology, employment, K–12 education, admissions, and licensure and certification.
The Standards were first developed jointly by the American Educational Research Association (AERA), the American Psychological Association (APA), and the National Council on Measurement in Education (NCME) in 1966. Several subsequent editions have been prepared by joint committees representing these sponsoring organizations. The latest edition, published in 2014, updated the 1999 edition. Each revision incorporates insights and developments from academia and the testing industry by involving a variety of stakeholders in the committee that drafts and deliberates over the updates. For example, Michael T. Kane, PhD, current Samuel J. Messick Chair in Test Validity at Educational Testing Service (ETS) and Director of Research at NCBE from 2001 to 2009, served on the joint committee in charge of developing the 2014 edition of the Standards.
The current Standards consist of three main parts: foundational concepts, test operations, and testing applications in different settings. The Standards are available in open access in English and Spanish.5 Below, we briefly review the three key foundational concepts contained in the Standards, explain how each relates to the bar exam, and suggest several Bar Examiner articles for interested readers, including the one below, which defines some of the terms mentioned in this article.
- “New to Bar Admissions? What You Might Like to Know About: Terms Often Used in Reference to the Bar Examination,” 90(2–3) The Bar Examiner 45–48 (Summer/Fall 2021).
Validity

The first and most fundamental issue in testing is validity. Simply put, validity concerns whether an exam adequately measures the construct it is designed to measure. For the bar exam, the “construct” refers to the knowledge, skills, and abilities (KSAs) required for competent entry-level legal practice. How do we know which KSAs are necessary for entry-level practice? For licensure exams, this question is typically answered by findings from a practice analysis. NCBE conducts practice analyses periodically to collect up-to-date information about the job-related activities and KSAs newly licensed lawyers need to carry out their work effectively. NCBE’s most recent practice analysis was completed in 2019; its findings were incorporated into the development of the next generation of the bar exam.6
Once the target KSAs are clearly identified, several sources of evidence might be gathered to evaluate the validity of intended interpretations and uses of test scores. For example, a test blueprint is an essential piece of validity evidence that ensures the alignment of the depth and breadth of the content being tested to the KSAs under consideration. It clarifies several test design issues, such as the number and format of test items used to measure each component of the KSAs, as well as the relative “weights” of these items toward an overall assessment of the minimum competence needed to practice law.
Another potential source of validity evidence concerns relationships between test scores and other measures of similar (or dissimilar) skills. For instance, a recent study found that bar exam scores correlate most strongly with law school grade point averages, followed by Law School Admission Test scores and undergraduate grade point averages.7
Reliability

A prerequisite for valid interpretations of test scores is reliability. When test scores are sufficiently reliable, the scores of a group of examinees will be consistent or stable over multiple (theoretical) occasions of testing unless a real change in the examinees’ KSAs has occurred. Many factors can affect the degree of consistency in test scores. For a high-stakes exam like the bar exam, however, it should not matter when examinees take the exam, whether in February or July, and regardless of year. Scores should likewise be minimally affected if examinees take different test forms or if their essays are scored by different graders.
The reliability coefficient is a statistic commonly used to report reliability. Its value ranges between 0 and 1, with higher values indicating higher reliability. The magnitude of the reliability coefficient is affected by, among other things, the length of a test. Generally, reliability tends to increase with test length, because a longer test is more likely to provide better content coverage, and thus more consistent information about examinees’ KSAs and more precise scores. For example, the reliability of the Multistate Bar Examination (MBE) is consistently above 0.90, which is quite good for a 200-question exam. Reliabilities for the Multistate Professional Responsibility Examination (MPRE) have been around 0.80 in recent years, a reasonably good value for a 60-question exam.
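The relationship between test length and reliability can be illustrated with the classical Spearman–Brown prophecy formula, a standard psychometric result. The sketch below is purely illustrative and is not drawn from NCBE data; the numbers are hypothetical, chosen only to echo the magnitudes mentioned above.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Project the reliability of a test whose length is multiplied by
    length_factor, using the Spearman-Brown prophecy formula."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Hypothetical example: if a 60-question exam had reliability 0.80,
# lengthening it to 200 comparable questions (factor 200/60) would be
# projected to raise reliability to roughly 0.93.
projected = spearman_brown(0.80, 200 / 60)
print(round(projected, 2))  # → 0.93
```

The formula assumes the added questions measure the same construct and are of comparable quality to the existing ones; in practice, other factors (item quality, examinee fatigue) also shape observed reliability.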
- Joanne E. Kane, PhD, and Andrew A. Mroch, PhD, “The Testing Column: Testing Basics: What You Cannot Afford Not to Know,” 86(3) The Bar Examiner 32–37 (September 2017).
- Susan M. Case, PhD, “The Testing Column: Back to Basic Principles: Validity and Reliability,” 75(3) The Bar Examiner 23–25 (August 2006).
Fairness

Another overarching issue in testing is fairness. The Standards indicate that the goal of fairness is to “maximize, to the extent possible, the opportunity for test takers to demonstrate their standing on the construct(s) the test is intended to measure.”8 Concerns for fairness and equity run through the entire testing process, and principles of universal design can help address them.9 Universal design is a concept borrowed from the field of architecture; it seeks to improve access to the test for all intended examinees, regardless of characteristics such as gender, ethnicity, socioeconomic status, culture, or disability, thereby maximizing fairness.
More specifically, in a practice analysis it is important to gather input from a broadly representative group of respondents and to define the KSAs precisely. Test items are written to avoid content that could be confusing, offensive, or emotionally disturbing to some examinees. On exam day, standardized procedures and security protocols are in place, and the testing environment is kept reasonably comfortable with minimal distractions. Appropriate test accommodations promote fairness by reducing barriers to the measurement of the target KSAs for examinees with disabilities. Fairness is also addressed when an exam is scored; detailed descriptions and examples of empirical analyses conducted by NCBE are provided in the following Bar Examiner article.
- Mark R. Raymond, PhD; April Southwick; and Mengyao Zhang, PhD, “The Testing Column: Ensuring Fairness in Assessment,” 90(1) The Bar Examiner 73–85 (Spring 2021).
Implications for Bar Admissions
Well-constructed bar exams provide substantial benefits for examinees and those who need to make decisions based upon the test results. The Standards provide a framework for organizing central issues in testing, such as reliability, validity, and fairness, and describe guidelines for evaluating components of test operations, such as test development, administration, scoring, standard setting, and score interpretations.
There are often many individuals and institutions involved in the testing process in addition to examinees: those who develop, publish, administer, and score the exam; those who interpret test results and make decisions based on them; those who sponsor tests; and those who select or review tests. The bar exam is no exception. An understanding of the foundational concepts highlighted in the Standards—validity, reliability, and fairness—helps all of us make sound decisions about how to develop and use the bar exam, ensure that it is equally accessible to all intended examinees regardless of characteristics unrelated to the intended KSAs, and ultimately protect the public by identifying individuals who are adequately prepared to enter the legal profession.
Notes

1. “New to Bar Admissions? What You Might Like to Know About: The ABA’s Connection to Bar Admissions,” 90(1) The Bar Examiner 86–88 (Spring 2021), https://thebarexaminer.ncbex.org/article/spring-2021/new-bar-admissions-aba-connections/.
2. “New to Bar Admissions? What You Might Like to Know About: Terms Often Used in Reference to the Bar Examination,” 90(2–3) The Bar Examiner 45–48 (Summer/Fall 2021), https://thebarexaminer.ncbex.org/article/bar-admissions/new-to-bar-admissions/.
3. National Council on Measurement in Education, Testing Standards, https://www.ncme.org/resources-publications/books/testing-standards.
4. American Educational Research Association (AERA), American Psychological Association (APA), and National Council on Measurement in Education (NCME), Standards for Educational and Psychological Testing (AERA, 2014), p. 1.
5. The Standards for Educational and Psychological Testing, Open Access Files, https://www.testingstandards.net/open-access-files.html.
6. NextGen Bar Exam of the Future, Phase 2 Report of the Testing Task Force, https://nextgenbarexam.ncbex.org/reports/phase-2-report/.
7. NCBE, “Executive Summary: Impact of Adoption of the Uniform Bar Examination in New York,” https://www.nybarexam.org/UBEReport/NY%20UBE%20Adoption%20Part%201%20Executive%20Summary.pdf.
8. The Standards, supra note 4, p. 50.
9. Sandra J. Thompson, Christopher J. Johnstone, and Martha L. Thurlow, “Synthesis Report 44: Universal Design Applied to Large Scale Assessments” (National Center on Educational Outcomes, June 2002), https://nceo.umn.edu/docs/onlinepubs/synth44.pdf.
Mengyao Zhang, PhD, is a Senior Research Psychometrician for the National Conference of Bar Examiners.
Keyu Chen, PhD, is a Research Psychometrician for the National Conference of Bar Examiners.
Contact us to request a PDF file of the original article as it appeared in the print edition.