The California-based Smarter Balanced Assessment Consortium is a member-led public organization that provides assessment systems to educators working in K-12 and higher education. The organization, which was founded in 2010, partners with state education agencies to develop innovative, standards-aligned test assessment systems. Smarter Balanced supports educators with tools, lessons and resources including formative, interim and summative assessments, which help educators to identify learning opportunities and strengthen student learning.
Smarter Balanced is committed to evolution and innovation in an ever-changing educational landscape. Through a collaboration with IBM Consulting®, it aims to explore a principled approach for the use of artificial intelligence (AI) in educational assessments. The collaboration was announced in early 2024 and is ongoing.
Traditional skills assessments for K-12 students, including standardized tests and structured quizzes, are often criticized on equity grounds. If implemented responsibly, AI has transformative potential: personalized learning and evaluation experiences that enhance fairness in assessments across student populations, including marginalized groups. The central challenge, then, is to define what responsible implementation and governance of AI looks like in a school setting.
As a first step, Smarter Balanced and IBM Consulting created a multidisciplinary advisory panel that includes experts in educational measurement, artificial intelligence, AI ethics and policy, and educators. The panel’s goal is to develop guiding principles for embedding accuracy and fairness into the use of AI for educational measurement and learning resources. Some of the advisory panel’s considerations are outlined below.
Using design thinking frameworks helps organizations craft a human-centric approach to technology implementation. Three human-centered principles guide design thinking: a focus on user outcomes, restless reinvention and empowerment of diverse teams. This framework helps ensure that stakeholders are strategically aligned and responsive to functional and non-functional organizational governance requirements. Design thinking enables developers and stakeholders to deeply understand user needs, ideate innovative solutions and prototype iteratively.
This methodology is invaluable in identifying and assessing risks early in the development process, and facilitating the creation of AI models that are trustworthy and effective. By continuously engaging with diverse communities of domain experts and other stakeholders and incorporating their feedback, design thinking helps build AI solutions that are technologically sound, socially responsible and human-centered.
For the Smarter Balanced project, the combined teams established a think tank that included a diverse set of subject-matter experts and thought leaders. This group comprised experts in the fields of educational assessment and law, neurodivergent people, students, people with accessibility challenges and others.
“The Smarter Balanced AI think tank is about ensuring that AI is trustworthy and responsible and that our AI enhances learning experiences for students,” said think tank member Charlotte Dungan, Program Architect of AI Bootcamps for the Mark Cuban Foundation.
The goal of the think tank is not to incorporate its members’ expertise, viewpoints and lived experiences into the governance framework in a “one-and-done” way, but to do so iteratively. The approach mirrors a key principle of AI ethics at IBM: the purpose of AI is to augment human intelligence, not replace it. Systems that incorporate ongoing input, evaluation and review by diverse stakeholders can better foster trust and promote equitable outcomes, ultimately creating a more inclusive and effective educational environment.
These systems are crucial for creating fair and effective educational assessments in grade school settings. Diverse teams bring a wide array of perspectives, experiences and cultural insights essential to developing AI models that are representative of all students. This inclusivity helps to minimize bias and build AI systems that do not inadvertently perpetuate inequalities or overlook the unique needs of different demographic groups. This reflects another key principle of AI ethics at IBM: the importance of diversity in AI isn’t opinion, it’s math.
One of the first efforts that Smarter Balanced and IBM Consulting undertook as a group was to ascertain the human values that we want to see reflected in AI models. This is not a new ethical question, so rather than starting from scratch, we landed on a set of values and definitions that map to IBM’s AI pillars, or fundamental properties for trustworthy AI:
Operationalizing these values in any organization is a challenge. In an organization that assesses students’ skill sets, the bar is even higher. But the potential benefits of AI make this work worthwhile: “With generative AI, we have an opportunity to engage students better, assess them accurately with timely and actionable feedback, and build in 21st-century skills that are actively enhanced with AI tools, including creativity, critical thinking, communication strategies, social-emotional learning and growth mindset,” said Dungan. The next step, now underway, is to explore and define the values that will guide the use of AI in assessing children and young learners.
Questions the teams are grappling with include:
For this exercise, we undertook a design thinking framework called Layers of Effect, one of several frameworks IBM® Design for AI has donated to the open source community Design Ethically. The Layers of Effect framework asks stakeholders to consider primary, secondary and tertiary effects of their products or experiences.
For this use case, the primary (desired) effect of the AI-enhanced test assessment system is a more equitable, representative and effective tool that improves learning outcomes across the educational system.
The secondary effects might include boosting efficiencies and gathering relevant data to help with better resource allocation where it is most needed.
Tertiary effects are unintended and possibly unknown. This is where stakeholders must explore what potential unintended harm might look like.
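For teams who want to run the Layers of Effect exercise themselves, the three layers can be captured in a simple worksheet structure. The sketch below is illustrative only: the class name is hypothetical, and the entries paraphrase the effects described above rather than reproduce any artifact from the actual engagement.

```python
from dataclasses import dataclass, field

@dataclass
class LayersOfEffect:
    """Worksheet for the Layers of Effect exercise: each layer collects
    stakeholder-identified effects of a product or experience."""
    primary: list[str] = field(default_factory=list)    # intended, desired effects
    secondary: list[str] = field(default_factory=list)  # known side effects
    tertiary: list[str] = field(default_factory=list)   # unintended, possibly unknown

# Entries paraphrased from the AI-enhanced assessment use case
assessment = LayersOfEffect(
    primary=["A more equitable, representative and effective assessment tool"],
    secondary=["Boosted efficiencies",
               "Relevant data to support better resource allocation"],
    tertiary=["Potential unintended harms to student cohorts (to be explored)"],
)
```

Keeping the tertiary layer as an explicit, initially sparse list reflects the framework's point: unintended effects must be actively sought out rather than assumed absent.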
The teams identified five categories of potential high-level harm:
Initially applied in legal cases, disparate impact assessments help organizations identify potential biases. These assessments explore how seemingly neutral policies and practices can disproportionately affect individuals from protected classes, such as those susceptible to discrimination based on race, religion, gender and other characteristics. Such assessments have proven effective in the development of policies related to hiring, lending and healthcare. In our education use case, we sought to consider cohorts of students who might experience inequitable outcomes from assessments due to their circumstances.
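One widely used quantitative check in disparate impact assessments is the four-fifths (80%) rule from US employment-selection guidelines: a protected group's rate of favorable outcomes should be at least 80% of the reference group's rate. A minimal sketch is shown below; the pass counts are hypothetical and this is not part of the consortium's methodology.

```python
def selection_rate(passed: int, total: int) -> float:
    """Proportion of a group that received the favorable outcome."""
    return passed / total

def disparate_impact_ratio(protected_rate: float, reference_rate: float) -> float:
    """Ratio of the protected group's selection rate to the reference group's.
    Ratios below 0.8 are commonly flagged under the four-fifths rule."""
    return protected_rate / reference_rate

# Hypothetical pass counts on an assessment, by cohort
reference = selection_rate(passed=480, total=600)  # 0.80
protected = selection_rate(passed=270, total=450)  # 0.60

ratio = disparate_impact_ratio(protected, reference)  # ≈ 0.75
flagged = ratio < 0.8  # True: below the four-fifths threshold
print(f"Disparate impact ratio: {ratio:.2f}, flagged: {flagged}")
```

The four-fifths rule is a screening heuristic, not proof of bias; flagged results warrant deeper investigation into the assessment items and the circumstances of the affected cohort.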
The groups identified as most susceptible to potential harm included:
As a collective, our next set of exercises is to use more design thinking frameworks such as ethical hacking to explore how to mitigate these harms. We will also detail minimum requirements for organizations seeking to use AI in student assessments.
This is a bigger conversation than just IBM and Smarter Balanced. We are publicly publishing our process because we believe those experimenting with new uses for AI should consider the unintended effects of their models. We want to help ensure that AI models built for education serve the needs not just of a few, but of society in its entirety, with all its diversity.
“We see this as an opportunity to use a principled approach and develop student-centered values that will help the educational measurement community adopt trustworthy AI. By detailing the process that is being used by this initiative, we hope to help organizations that are considering AI-powered educational assessments have better, more granular conversations about the use of responsible AI in educational measurement.”
— Rochelle Michel, Deputy Executive Program Officer, Smarter Balanced.