Eval-LLMs

Innovative UIs for identifying and mitigating errors in AI-driven support interactions

Eval-LLMs explores innovative UI designs for evaluating LLM behavior in text-based support systems, improving how users detect and address errors in AI-generated responses, particularly in emotionally sensitive contexts such as peer support and mental health applications. The goal is to create interfaces that make it easier to spot inaccuracies, biases, or misleading outputs, allowing users to intervene and correct responses in real time. By empowering users to actively engage with LLM-generated content, this approach aims to reduce overreliance on LLMs while ensuring that the technology supports users responsibly and ethically.

This research focuses on developing feedback-driven UIs that highlight potential errors, enabling users to flag or adjust outputs when necessary. Key features include real-time error detection, visual cues, and feedback mechanisms, all designed to improve the transparency and reliability of AI systems. In contexts where trust and emotional safety are critical, these interfaces can build user confidence. The aim is to create systems that encourage collaborative human-AI interaction, where the AI acts as a supportive tool, not an unquestioned authority.
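To make the interaction model concrete, the sketch below shows one possible shape such a feedback mechanism could take. This is a minimal, hypothetical TypeScript example rather than the project's actual implementation: the ResponseFlag and SupportMessage types, the detectPotentialIssues heuristic, and the recordUserFeedback handler are all illustrative names, and a real system would replace the keyword heuristics with richer error-detection models and UI affordances.

```typescript
// Minimal sketch of a feedback-driven UI layer for LLM responses.
// All types and functions here are illustrative, not the project's actual API.

type FlagKind = "possible-inaccuracy" | "possible-bias" | "tone-risk";

interface ResponseFlag {
  kind: FlagKind;          // category shown to the user as a visual cue
  span: [number, number];  // character range in the response text to highlight
  note: string;            // short explanation of why this span was flagged
}

interface SupportMessage {
  id: string;
  text: string;            // the LLM-generated response
  flags: ResponseFlag[];   // automatically detected issues awaiting user review
}

// Rough heuristic stand-in for real-time error detection. A real system would
// use trained classifiers or a second model pass instead of keyword patterns.
function detectPotentialIssues(text: string): ResponseFlag[] {
  const flags: ResponseFlag[] = [];
  const riskyPhrases: Array<[RegExp, FlagKind, string]> = [
    [/you should just/i, "tone-risk", "Directive phrasing may feel dismissive in peer support."],
    [/always|never/i, "possible-inaccuracy", "Absolute claims are often overgeneralizations."],
  ];
  for (const [pattern, kind, note] of riskyPhrases) {
    const match = pattern.exec(text);
    if (match) {
      flags.push({ kind, span: [match.index, match.index + match[0].length], note });
    }
  }
  return flags;
}

// User feedback on a flagged response: accept it as-is, edit it, or dismiss a flag.
type FeedbackAction = "accept" | "edit" | "dismiss";

interface UserFeedback {
  messageId: string;
  flagIndex: number;
  action: FeedbackAction;
  revisedText?: string;    // present when the user edits the response directly
}

// In a full UI this would also update the rendered message and log the event for evaluation.
function recordUserFeedback(message: SupportMessage, feedback: UserFeedback): SupportMessage {
  if (feedback.action === "edit" && feedback.revisedText !== undefined) {
    // Re-run detection on the user's revision so any new issues surface immediately.
    return { ...message, text: feedback.revisedText, flags: detectPotentialIssues(feedback.revisedText) };
  }
  if (feedback.action === "dismiss") {
    return { ...message, flags: message.flags.filter((_, i) => i !== feedback.flagIndex) };
  }
  return message; // "accept" leaves the message unchanged
}

// Example: flag a generated response, then apply a user edit.
const draft: SupportMessage = {
  id: "msg-1",
  text: "You should just ignore it; things always get better.",
  flags: [],
};
draft.flags = detectPotentialIssues(draft.text);
const revised = recordUserFeedback(draft, {
  messageId: draft.id,
  flagIndex: 0,
  action: "edit",
  revisedText: "It sounds really hard. Would it help to talk through what happened?",
});
console.log(revised.flags); // fewer or no flags after the supportive rewrite
```

Under these assumptions, the key design point is that user feedback is first-class data: edits and dismissals flow back through the same detection step, so the interface can both surface errors and learn where its cues were wrong.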

While this research is still in progress, read our related work on the harms and limitations of AI in text-based support systems, where we evaluated state-of-the-art AI models and LLMs (Syed* et al., 2024; Iftikhar et al., 2024).

References

2024

  1. Machine and Human Understanding of Empathy in Online Peer Support: A Cognitive Behavioral Approach
    Sara Syed*, Zainab Iftikhar*, Amy Wei Xiao, and Jeff Huang
    In Proceedings of the CHI Conference on Human Factors in Computing Systems, 2024
  2. Therapy as an NLP Task: Psychologists’ Comparison of LLMs and Human Peers in CBT
    Zainab Iftikhar, Sean Ransom, Amy Xiao, and Jeff Huang
    arXiv preprint arXiv:2409.02244, 2024