Explainable AI Decision-Making in Human-AI Groups

A closed-loop machine teaching framework that uses explainable robot demonstrations and particle filters to model and adapt to individual and group beliefs, improving human understanding of robot decision-making in teams.

This project enhances transparency and collaboration in human-robot teams by developing explainable teaching frameworks. The goal is to help human collaborators understand how robots make decisions in complex, time-constrained, or resource-limited environments.

The framework combines counterfactual reasoning, particle filter-based belief modeling, and pedagogical scaffolding. From user test responses and robot demonstrations, it dynamically models and updates individual and team beliefs about the robot's decision-making policy, which is captured via Inverse Reinforcement Learning (IRL) in a Markov decision process (MDP) framework (Jayaraman et al., 2024). Demonstrations are selected to maximize expected information gain over simulated counterfactuals, and beliefs are updated with Bayesian filters (Jayaraman et al., 2024).
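As a rough illustration of this belief-modeling and demonstration-selection loop, the sketch below maintains a particle filter over candidate reward hypotheses, reweights it with Bayes' rule after each learner response, and scores candidate demonstrations by their expected information gain over simulated counterfactual responses. The function names, likelihood model, and response set are assumptions for illustration, not the project's implementation.

```python
import numpy as np

def bayesian_update(particles, weights, demo, response, likelihood_fn):
    """Reweight particles by the likelihood that a learner holding each
    candidate reward hypothesis would give `response` after seeing `demo`."""
    lik = np.array([likelihood_fn(p, demo, response) for p in particles])
    weights = weights * lik
    return weights / weights.sum()

def entropy(weights):
    """Shannon entropy of the current particle weights."""
    w = weights[weights > 0]
    return float(-np.sum(w * np.log(w)))

def expected_information_gain(particles, weights, demo, responses, likelihood_fn):
    """Expected drop in belief entropy from showing `demo`, averaged over
    simulated counterfactual learner responses."""
    prior_h, eig = entropy(weights), 0.0
    for r in responses:
        lik = np.array([likelihood_fn(p, demo, r) for p in particles])
        p_r = float(np.sum(weights * lik))        # marginal prob. of response r
        if p_r > 0:
            posterior = (weights * lik) / p_r     # Bayes' rule
            eig += p_r * (prior_h - entropy(posterior))
    return eig

def select_demonstration(candidate_demos, particles, weights, responses, likelihood_fn):
    """Pick the demonstration with the highest expected information gain."""
    return max(candidate_demos,
               key=lambda d: expected_information_gain(particles, weights, d,
                                                       responses, likelihood_fn))
```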

This illustration highlights the complexity of teaching human groups by modeling different belief states. The top-left panel shows three individuals with different beliefs about the robot's decision-making. These beliefs are used to generate the targeted or aggregated representations shown at the bottom: individual beliefs (each person's distinct understanding), the team common belief (the intersection of all individual beliefs), and the team joint belief (their union). The robot uses these representations to adapt its explanations and improve understanding across the team.
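The common and joint team beliefs described above can be approximated with simple set operations over the hypotheses each learner still considers plausible. The sketch below is illustrative only; the hypothesis labels, weight threshold, and helper names are assumptions, not the papers' implementation.

```python
# Each learner's belief is summarized as the set of reward hypotheses that
# still carry non-negligible particle weight.

def supported_hypotheses(particles, weights, threshold=1e-3):
    """Hypotheses a learner has not yet ruled out."""
    return {h for h, w in zip(particles, weights) if w > threshold}

def team_common_belief(individual_beliefs):
    """Intersection: hypotheses every team member still considers plausible."""
    return set.intersection(*individual_beliefs)

def team_joint_belief(individual_beliefs):
    """Union: hypotheses at least one team member still considers plausible."""
    return set.union(*individual_beliefs)

# Example with three learners P1-P3 over labeled reward hypotheses.
p1 = {"w_a", "w_b"}
p2 = {"w_b", "w_c"}
p3 = {"w_b", "w_c", "w_d"}
print(team_common_belief([p1, p2, p3]))  # {'w_b'}
print(team_joint_belief([p1, p2, p3]))   # {'w_a', 'w_b', 'w_c', 'w_d'}
```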

A closed-loop teaching framework leverages insights from the education literature to adaptively generate demonstrations based on individual and aggregated team beliefs. Human learners receive a sequence of scaffolded lessons covering concepts of increasing complexity. Each lesson includes demonstrations (examples) of robot behavior, check-in tests that evaluate the learner's understanding of the underlying concept, and feedback on test performance.
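A minimal outline of such a closed-loop teaching cycle might look like the following sketch, assuming placeholder callbacks for demonstration selection, testing, grading, presentation, and belief updating; all names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List

@dataclass
class Lesson:
    concept: str                       # concepts are ordered simple -> complex
    demos: List[Any] = field(default_factory=list)
    test: Any = None

def teach(lessons: List[Lesson],
          belief: Any,
          select_demo: Callable,       # e.g. information-gain-based selection
          show: Callable,              # present a demonstration or feedback
          run_test: Callable,          # administer the check-in test
          grade: Callable,             # returns (is_correct, correct_answer)
          update_belief: Callable) -> Any:
    """One pass through the scaffolded lessons, updating the learner's belief
    after each demonstration and each test response."""
    for lesson in lessons:
        demo = select_demo(lesson.demos, belief)
        show(demo)
        belief = update_belief(belief, ("demo", demo))

        answer = run_test(lesson.test)
        correct, expected = grade(lesson.test, answer)
        show("confirmatory \u2713" if correct else f"corrective \u2717: {expected}")
        belief = update_belief(belief, ("test", lesson.test, answer))
    return belief
```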

This figure shows how particle filter-based belief distributions evolve for three individuals (P1, P2, P3) and their aggregated team beliefs (common and joint) across the teaching stages: demonstrations, tests, and feedback. Feedback is either confirmatory (✓) or corrective (✗) and helps refine the learner's understanding of the robot's reward function, whose weights correspond to mud cost (w₀), recharge reward (w₁), and action cost (w₂).
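Assuming the reward function in the caption is linear in these three features, one candidate reward hypothesis could be sketched as follows; the feature encoding and weight values are illustrative only.

```python
import numpy as np

def reward(features, w):
    """R(s, a) = w0 * mud + w1 * recharge + w2 * action_cost."""
    return float(np.dot(w, features))

# One hypothesis about the robot's reward weights: [w0: mud, w1: recharge, w2: action].
w = np.array([-2.0, 5.0, -1.0])
step_through_mud = np.array([1.0, 0.0, 1.0])  # in mud, no recharge, one action
print(reward(step_through_mud, w))            # -3.0
```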

The research examined how teaching strategies tailored to group or individual beliefs benefit teams with different mixes of learner ability. The framework leverages education-inspired scaffolding, with demonstrations, concept tests, and targeted feedback. Results show that group-belief strategies benefit proficient teams, while individualized strategies work better for mixed-ability groups. These findings were validated in both simulation (Jayaraman et al., 2024) and empirical online studies (Jayaraman et al., 2025).

This work lays the groundwork for real-time adaptive explainable AI that supports group trust calibration, dynamic policy explanation, and collective behavior modeling, with implications for interactive AI systems, collaborative robotics, and autonomous decision-support tools.

References

2025

  1. Explaining Robot Behavior to Groups: Machine Teaching for Transparent Decision-Making
    Suresh Kumaar Jayaraman, Aaron Steinfeld, Henny Admoni, and 1 more author
    Manuscript in preparation, 2025

2024

  1. Modeling human learning of demonstration-based explanations for user-centric explainable AI
    Suresh Kumaar Jayaraman, Aaron Steinfeld, Reid Simmons, and 1 more author
    Presented at the Explainability for Human-Robot Collaboration workshop at the ACM/IEEE International Conference on Human-Robot Interaction, 2024
  2. Understanding Robot Minds: Leveraging Machine Teaching for Transparent Human-Robot Collaboration Across Diverse Groups
    Suresh Kumaar Jayaraman, Reid Simmons, Aaron Steinfeld, and 1 more author
    In 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024