This article originally appeared in The Bar Examiner print edition, Summer 2023 (Vol. 92, No. 2), pp. 32–34.

By Andreas Oranje, PhD, MBA

OpenAI, maker of the artificial intelligence (AI) chatbot ChatGPT, recently made a big claim. It announced that its latest GPT-4 model was able to obtain a score on the Multistate Bar Examination in the 90th percentile.1 This was up from a score in the 10th percentile achieved with GPT-3.5, the large language model underneath the freely accessible version of ChatGPT. Frankly, I was surprised and disappointed that it could not get a perfect score. I will tell you why and, in the process, unpack what generative AI is all about and how it might affect the legal profession generally, and the bar exam specifically.

How Does AI Generally Operate?

AI is essentially a three-step process (see the code sketch after this list):

  1. Obtain a lot of data and calculate relationships between all the variables in the data. This is a big part of the machine learning in AI. For example, suppose you have a large set of divorce settlements with different characteristics (assets, prenuptials, children, pets, employment, location, age, etc.) and outcomes (distribution of assets, custody, future awards). You then use that data to statistically model all the relationships between those characteristics and outcomes.
  2. Use that model to make predictions about new situations for which we do not yet know the outcomes. In the same example, we would likely predict that two comparably employed Californians in their early 30s, without children and with about $10,000 in combined assets, would each be awarded $5,000 with no further obligations in the divorce settlement.
  3. Use the predictions from the second step to automatically make and execute decisions, often incorporated in a software application with little to no human interference. Continuing the example, there could be a phone app where both spouses enter their information, and the judge is an algorithm that automatically creates and distributes all necessary documentation in a matter of seconds.
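
To make these three steps concrete, here is a minimal sketch in Python using scikit-learn. Everything in it is invented for illustration: the features, dollar amounts, and tiny training set are hypothetical, not real settlement data, and a real system would need far more data and validation.

```python
# Minimal sketch of the three-step process: learn from historical cases,
# predict a new one, and act on the prediction. All data is hypothetical.
from sklearn.tree import DecisionTreeRegressor

# Step 1: historical cases -> a statistical model of the relationships.
# Features per case: [combined assets ($), children, age A, age B]
X_train = [
    [10_000, 0, 31, 33],
    [250_000, 2, 45, 44],
    [80_000, 1, 38, 40],
]
y_train = [5_000, 90_000, 35_000]  # amount awarded to spouse A
model = DecisionTreeRegressor().fit(X_train, y_train)

# Step 2: predict the outcome of a new, unseen case.
new_case = [[10_000, 0, 31, 32]]
predicted_award = model.predict(new_case)[0]

# Step 3: act on the prediction automatically (here, drafting text).
print(f"Draft settlement: spouse A receives ${predicted_award:,.0f}.")
```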

Points to Keep in Mind

Several points about this process are immediately apparent:

  1. The data used to train complex models ideally comes from many expert judgments. In other words, AI is nothing more than combining and consulting a lot of expertise in a statistically consistent manner. Therefore, if you had access to all the judicial expertise in the world, shouldn’t you be able to figure out all the answers to questions on a bar exam?
  2. Expert data is not always used, and this can create problems. When more general data is used, as with ChatGPT, you are also relying on many nonexperts who provide inaccurate information, and your models will therefore include many inaccuracies. Large models are fun to work with but not necessarily accurate. They reflect the data used, the good parts and the bad, including when the data is collected from only a segment of the population (e.g., frequent users of the internet). Thus, a significant issue in AI is ensuring that the model both contains the expert knowledge and connects your problem appropriately with the experts who provided that information. Very often, small, targeted models built on specific data work better for a specific task than large language models do.
  3. AI models are historical and reactive. The training data were collected at some point in the past, likely under somewhat different circumstances than those in which the model is used to make predictions. When the environment changes rapidly, these models quickly lose their predictive power. There are mechanisms to continuously update and retrain models, but you are always at least one step behind.
  4. AI models are approximate. Even with datasets that are many terabytes in size, there are relationships that have not been observed directly and about which we want to make a decision. AI models explicitly make inferences about unobserved relationships between variables, and this creates fertile ground for bias and inaccuracies.
  5. Ownership of and responsibility for AI-based decisions is difficult to determine. First, the data often comes from many sources that each probably own their data. Second, the algorithms are likely built in general-use platforms that belong to the platform creator. Third, the way the algorithm and data are combined belongs to yet someone else (e.g., app creator, decision maker). The question then becomes: Who is responsible when something goes wrong?
  6. Data privacy is not a given. If the ownership of the models and the decisions derived from them is unclear, then what about the protection of the data that feeds into these models? Even without explicitly identifying variables such as name, date of birth, and Social Security number, the overall data is vast enough to trace back to specific individuals. Realizing this, many companies are instituting policies that prohibit employees from entering any company proprietary information into publicly available AI engines. As with a free email service, you pay with your data, which ultimately funds the provider through targeted advertising. In short, you pay for ChatGPT or Gmail by buying that 15th pair of shoes that you absolutely didn’t need but were advertised to you.

AI as the Next Evolution in Connecting People with Information

Another way of thinking about AI is as the democratization of access to expertise. Books, libraries, and education were the first steps toward connecting people, ideas, and knowledge more effectively. The internet, alongside its other uses (e.g., marketplaces), was the next evolutionary step in these connections. AI is simply a further evolution in this process, making information easier to access and use. This access to, and combining of, expertise often leads to new insights and new ways to respond to human needs. To be clear, AI may be generative in what it produces or helps produce. It does not, however, create a need independently. That is still the sole realm of humans.

Evaluative and Generative AI Use in Relation to the Bar Exam

AI is already used more widely than most people realize. From phone and airport security (face or thumbprint recognition) to marketing and advertising, AI is an integral part of our daily lives. There are several ways that AI either already exists within or will make its way into the legal profession and the bar exam. There are, of course, well-known identification uses by law enforcement and more general uses of AI to market and advertise law practices, but the examples below are more specific. One way to organize them is to distinguish evaluative from generative uses.

Evaluative uses are applications where AI is used to evaluate large amounts of information, extract features from data, or perform a selection, often in tandem with natural language processing. For example:

  • Automated access to and review of a body of cases and statutes to evaluate their use as precedent in an argument.
  • Evaluation of the language of procedural documents or arguments to find (and correct) errors, inaccuracies, or weaknesses.
  • Pattern extraction from large data sources that are material to a case (e.g., stock or real estate transactions, traffic patterns, surveillance footage).

Generative uses are applications where AI is used to create new artifacts. For example:

  • Draft procedural documents.
  • Develop arguments to build a case.
  • Create depictions or images required as part of an investigation or case.

For the bar exam, potential uses are related to these evaluative and generative applications of AI. For example, AI could aid in the scoring of case reports, essays, or other constructed response–type tasks on the exam. In some cases, there are enough expert judgments collected so that the remainder of responses can be consistently, accurately, and fairly evaluated by a comprehensive scoring model. In other cases, the model can accurately score clear-cut responses, while it routes edge cases to a human evaluator.
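
To make the routing idea concrete, here is a minimal sketch, assuming discrete rubric scores and using a TF-IDF plus logistic regression model as a stand-in for a real scoring model; the training responses, scores, and the 0.9 confidence threshold are all hypothetical.

```python
# Toy "score the clear-cut responses, route edge cases to a human" loop.
# The model and threshold are illustrative stand-ins, not a real system.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Responses already scored by expert human graders (hypothetical data).
responses = [
    "Negligence requires duty, breach, causation, and damages.",
    "The contract fails for lack of consideration.",
    "The defendant owed no duty of care, so the claim fails.",
    "The offer was revoked before acceptance, so no contract formed.",
]
expert_scores = [4, 2, 4, 2]  # rubric points assigned by human graders

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(responses, expert_scores)

def score_or_route(response: str, threshold: float = 0.9):
    """Auto-score confident predictions; route edge cases to a human."""
    probs = model.predict_proba([response])[0]
    if probs.max() >= threshold:
        return int(model.classes_[probs.argmax()]), "auto-scored"
    return None, "routed to human evaluator"

print(score_or_route("Duty, breach, and causation are all established."))
```

With so little training data, nearly everything falls below the threshold and is routed to a human, which is the safe default; in practice the threshold would be tuned against large sets of expert-scored responses.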

Another example would be in the generation of test material. AI models could be used to draft tasks or to build models from which multiple task variants can be generated. Auxiliary materials, such as case setups or supporting graphical materials, could also be generated quickly, cheaply, and without needing to obtain copyrights. In fact, this could open the door to expansion of test forms and, therefore, additional administrations and greater convenience for examinees. In the long run, one can imagine a fully personalized experience for each examinee based on generative AI, where the learning and assessment aspects become seamless and interchangeable.
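
One common way to build “models from which multiple task variants can be generated” is a template with constrained slots. The sketch below assumes that approach; the template and slot values are invented for illustration and are not actual exam content.

```python
# Minimal sketch of template-based item generation: one task "model"
# (a template plus constrained slots) yields many task variants.
import random

TEMPLATE = (
    "{buyer} agrees to buy a {item} from {seller} for ${price:,}. "
    "Before delivery, the {item} is destroyed by {event}. "
    "Who bears the loss, and why?"
)

SLOTS = {
    "buyer": ["Alice", "Ben"],
    "seller": ["a dealer", "a private owner"],
    "item": ["sailboat", "vintage car"],
    "price": [15_000, 40_000],
    "event": ["a fire", "a flood"],
}

def generate_variant(rng: random.Random) -> str:
    """Draw one task variant by filling each slot in the template."""
    return TEMPLATE.format(**{k: rng.choice(v) for k, v in SLOTS.items()})

rng = random.Random(42)  # fixed seed so a generated form can be reproduced
print(generate_variant(rng))
```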

Ethical Use of AI and Human Collaboration

As with all AI, a solid framework for ethical use, including data and algorithm governance, will have to be in place. Fortunately, several strong frameworks have been released recently2 that provide a good foundation. There are also general questions that anyone should ask about the use of AI in education and assessment.3

Lastly, in high-stakes decision making, it would be entirely inadvisable not to have a human (or humans) in the loop at critical junctures to ensure that the computer is not running away. The goal is to optimize the collaboration between humans and computers: computers handle what they are good at (mundane and repetitive tasks, mass information processing, and the analysis and representation of large amounts of data), while humans do what they do best (judging new situations, creative problem-solving and investigation, and differentially weighting uncommon circumstances).4

Notes

  1. Recent research has questioned OpenAI’s claim; see Eric Martínez, “Re-Evaluating GPT-4’s Bar Exam Performance” (June 12, 2023), available at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4441311.
  2. See OECD.AI Policy Observatory, OECD AI Principles Overview, https://oecd.ai/en/ai-principles; Google AI, Our Principles, https://ai.google/principles/.
  3. See Education Week Special Report, Michelle R. Davis, “Q&A: The Promise and Pitfalls of Artificial Intelligence and Personalized Learning” (November 5, 2019), https://www.edweek.org/technology/q-a-the-promise-and-pitfalls-of-artificial-intelligence-and-personalized-learning/2019/11; EdWeek Market Brief, Michele Molnar, “14 Questions Educators Should Ask about AI-Based Products” (July 4, 2019), https://marketbrief.edweek.org/marketplace-k-12/14-questions-educators-ask-ai-based-products/.
  4. R. M. Dawes, D. Faust, and P. E. Meehl, “Clinical Versus Actuarial Judgment,” 243(4899) Science 1668–1674 (1989).

Andreas Oranje, PhD, MBA, is the Managing Director of Assessment Programs for the National Conference of Bar Examiners.

