EdTech Discovery
Hermes

An instrument for spotting the next edtech opportunity — generated ideas, each traced to the real-world signals behind it.

Updated Jun 24, 2026 · 10 ideas · 1304 signals

Signals

The evidence library — the raw signals the pipeline is watching across the education ecosystem. Every idea is built from these.

technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.HC

Quantum Cinema: An Interactive Cinematic Exploration of Quantum Computing Hardware via Generative World Models

arXiv:2606.17102v2 Announce Type: replace-cross Abstract: Quantum computing promises transformative advances across science and industry, yet the physical hardware that enables these computations remains invisible to the public: quantum processors operate inside sealed dilution refrigerators at temperatures near absolute zero, making direct observation impossible. This "imagination gap" between quantum computing's growing societal impact and the public's ability to visualize it represents a significant barrier to quantum literacy and workforce development. We present Quantum Cinema, an open-source, browser-based interactive application that closes this gap by transforming invisible quantum hardware into explorable, cinematic experiences using generative world models. Quantum Cinema guides users through a four-act narrative -- from the foundational Nobel Prize-winning science of quantum entanglement, through curated video introductions to three major quantum computing architectures (tra

technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.HC

LectūraAgents: A Multi-Agent Framework for Adaptive Personalized AI-Assisted Learning and Embodied Teaching

arXiv:2606.16428v2 Announce Type: replace-cross Abstract: Effective personalized AI-assisted learning demands systems that can not only generate accurate learner-specific educational materials, but also dynamically adapt their instruction to diverse learners. However, existing educational agents have primarily focused on lecture content automation and simulations, which often fall short of modelling multimodal and embodied instructional methods tailored for the individual learner. To this end, we propose LectūraAgents, a multi-agent framework that enables personalized learning through end-to-end adaptive embodied teaching. At its core, LectūraAgents mirrors a professor-student relationship, in which a ProfessorAgent leads a collaborative team of specialized subordinate agents through research, planning, review, and embodied delivery of lecture contents that adapt to a learner's needs. The framework offers three main contributions: (1) a hierarchical multi-agent architecture for en

technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.HC

Face versus Body Tracking for Human-Robot Interaction: An Egocentric Dataset

arXiv:2606.03694v2 Announce Type: replace-cross Abstract: Meaningful human-robot interaction (HRI) requires a robot to continuously assess user engagement through persistent user tracking. However, state-of-the-art Multi-Object Tracking models are heavily optimized for surveillance or autonomous driving. A social robot faces distinct egocentric challenges, such as humans moving in unpredictable nonlinear patterns, obstructing each other, or leaving and reentering the scene. These dynamics trigger frequent identity switches (IDSW), causing the robot to lose its footing mid-conversation. To address this, we introduce a focused, custom-annotated egocentric dataset collected via the Furhat robot. We present a systematic evaluation isolating detection errors from tracking logic, comparing face versus body tracking, and assessing the impact of extended memory and appearance re-identification (ReID). Results indicate that increasing temporal memory mitigates prolonged occlusions but fails on

technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.HC

Lightweight Test-Time Adaptation for EMG-Based Gesture Recognition

arXiv:2601.04181v2 Announce Type: replace-cross Abstract: Reliable long-term decoding of gestures from surface electromyography (EMG) is hindered by signal drift caused by electrode displacement, muscle fatigue, and/or posture changes. Although modern models achieve high intra-session accuracy, their performance often degrades substantially across recording sessions. Existing approaches to mitigate this problem typically rely on large training datasets or computationally intensive pipelines that are unsuitable for energy-efficient wearable devices. We propose a lightweight test-time adaptation framework for EMG decoding. The framework includes three complementary adaptation strategies: (i) causal adaptive batch normalization for online statistical alignment, (ii) Gaussian Mixture Model alignment with experience replay to mitigate forgetting, and (iii) meta-learning for rapid few-shot calibration. We evaluate these methods on the multi-session NinaPro DB6 dataset. All approaches substan

technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.HC

MILE: A Mechanically Isomorphic Hand Exoskeleton and Visuotactile Robotic Hand for Data Collection in Dexterous Manipulation

arXiv:2512.00324v4 Announce Type: replace-cross Abstract: Dexterous robotic hands are expected to perform complex, contact-rich object manipulation, but learning such skills remains challenging because high-dimensional hands require high-fidelity demonstrations. Imitation learning provides a practical route for acquiring dexterous manipulation skills from human demonstrations, yet collecting synchronized multimodal demonstrations with accurate hand actions and tactile observations remains a key bottleneck. We present MILE, a teleoperation-based data-collection system comprising the human-first MILE exoskeleton and the mechanically corresponding MILE-Tac robotic hand. The system integrates custom-designed and fabricated modular joint encoders and compact MILE fingertip visuotactile sensor modules. The exoskeleton is informed by human-hand anatomy and ergonomic constraints, while the robotic hand is co-designed to preserve the selected four-finger kinematic topology. This correspondence

technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.HC

Multimedia and Visual Analytics in the Agentic Era

arXiv:2504.06138v3 Announce Type: replace-cross Abstract: Professional users need tools to help them gain actionable insights from large multimedia collections. Foundation models and AI agents have rapidly changed the playing field, and improving their accuracy, trustworthiness, and reasoning capabilities are active topics in the computer vision, machine learning, and multimedia communities. Most current research focuses on benchmark driven algorithmic improvements. The multimedia community is the place to go beyond algorithms and consider complete multimedia analytics systems that support professional users in their complex tasks and achieve a true teaming of humans and AI. Supporting users with machine learning and visualizations has been studied for decades in the visual analytics field. In this paper, we propose a framework to bring multimedia and visual analytics together and indicate how it could impact current and new multimedia analytics solutions. Additional information can be

technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.HC

AI-Driven Analytics of Team-Teaching Talk: Acoustic Patterns across Experience, Cohorts and the Learning Design

arXiv:2606.09831v2 Announce Type: replace Abstract: As classroom cohorts expand, team teaching is increasingly used to integrate the expertise and pedagogical perspectives of multiple teachers. Yet, there is limited empirical understanding of how team teaching unfolds in practice, particularly regarding differences in teachers' contributions across experience levels, student cohorts, and learning task design. Prior research on team teaching has largely relied on retrospective self-reports or small-scale observations, offering limited insight into the micro-level processes through which team teaching is enacted. Teacher talk offers a scalable lens on these processes. While research in individual teaching contexts shows that acoustic features of speech (e.g., voice quality, intonation, and loudness) can shape student learning, evidence from team-teaching settings remains scarce. Moreover, capturing such features through manual observation or transcription is especially challenging in tea

technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.HC

MyoInteract: A Framework for Fast Prototyping of Biomechanical HCI Tasks using Reinforcement Learning

arXiv:2602.15245v2 Announce Type: replace Abstract: Reinforcement learning (RL)-based biomechanical simulations have the potential to revolutionise HCI research and interaction design, but currently lack usability and interpretability. Using the Human Action Cycle as a design lens, we identify key limitations of biomechanical RL frameworks and develop MyoInteract, a novel framework for fast prototyping of biomechanical HCI tasks. MyoInteract allows designers to set up tasks, user models, and training parameters from an easy-to-use GUI within minutes. It trains and evaluates muscle-actuated simulated users within minutes, reducing training times by up to 98%. A workshop study with 12 interaction designers revealed that MyoInteract allowed novices in biomechanical RL to successfully set up, train, and assess goal-directed user movements within a single session. By transforming biomechanical RL from a days-long expert task into an accessible hour-long workflow, this work significantly lower

technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.HC

"GenAI Defaults to Bias!" Gamify AI Literacy Through Reflections on Prompts

arXiv:2509.13679v2 Announce Type: replace Abstract: As Generative AI (GenAI) becomes widespread, it is increasingly important for the public to understand the model's behaviors and biases. However, existing AI literacy efforts miss opportunities to engage the general public to reflect on enduring GenAI bias and behaviors (e.g., how GenAI defaults to its internal bias in response to ambiguous or challenging prompts). In this work, we introduce ImaginAItion, a multiplayer game to help adults better reflect on GenAI bias and understand GenAI behaviors. ImaginAItion is grounded in reflective play to surface GenAI limitations by encouraging players to manipulate prompt specificity (e.g., an underspecified prompt "CEO" defaults to a white man). From ten sessions (n=30), we find that the game significantly improved players' understanding of GenAI behaviors by 35% in accuracy. Qualitative analysis showed how game mechanisms supported player reflections, including on prompting strategies to mit

technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.HC

From Inquisitorial to Adversarial: Using Legal Theory to Redesign Online Reporting Systems

arXiv:2506.07041v2 Announce Type: replace Abstract: User reporting systems play a central role in how online communities address interpersonal conflict and harassment, especially in private spaces such as direct messages, voice chats, and end-to-end encrypted messaging. These settings complicate evidence collection for community moderators while heightening users' concerns about procedural justice and privacy. To examine these challenges, we draw on adversarial legal frameworks from offline judicial systems and apply them to community-level reporting systems, using Discord as a research site. We find that online community reporting systems often follow an inquisitorial model, in which moderators lead evidence collection and case development, rather than an adversarial model, which gives users greater control over how evidence is presented and contested. Although adversarial practices can strengthen procedural justice and protect privacy, they can also introduce new risks of abuse, unde

technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.HC

Assessing Distribution Shift in Human Activity Recognition for Domain Generalization

arXiv:2606.24781v1 Announce Type: cross Abstract: While the field of Human Activity Recognition (HAR) continues to draw interest from researchers and advance in important ways, some key challenges remain. One of the most difficult aspects of building HAR models that show good performance in real-world settings is dealing with data diversity from device and sensor heterogeneity, and contextual changes that are intrinsic to real-world applications. While data diversity in HAR has been well-acknowledged in the literature, there remains a gap in understanding the effect of various types of distribution shifts on HAR models and the domain generalization problem that arises. Towards that end, this paper systematically evaluates 4 different types of distribution shifts, including variations in device type, sensor placement, sampling rate, and user behavior. Quantifying their effects, we illustrate that diversity shifts predominantly define all types of shifts, indicating the existence of uniq

technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.HC

Task Decomposition for Efficient Annotation

arXiv:2606.24734v1 Announce Type: cross Abstract: High-quality annotations of structured representations are expensive to collect over large corpora. Manual annotation of structure is laborious, and model-based annotation, although cheaper to generate, requires expensive validation and potentially significant supervision to ensure that the annotation quality is strong enough to be useful downstream. In traditional annotation workflows, annotation of each complete example is performed end-to-end by a single annotator. However, structured annotation is complex, and each aspect of the task represents a unique challenge with an associated inferential load for a given annotator. Modern annotation projects can incorporate heterogeneous groups of annotators, including both models and human annotators with varying domain and linguistic expertise. It remains unclear, however, how to redesign annotation tasks in this setting, where efforts are discriminately allocated across heterogeneous annota

technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.HC

Measuring User's Mental Models of Speech Translation in Human-AI Collaboration

arXiv:2606.24644v1 Announce Type: cross Abstract: Millions of people use machine translation (MT) tools daily, yet little is known about their perception of what systems can and cannot do. This paper studies users' mental models of speech translation systems through a new framework based on cross-lingual question answering, where users either accept MT output or request professional re-translation to answer questions based on the information presented in a foreign language. By analyzing user behavior and accuracy trends across varying translation qualities, we examine to what extent they can predict where the system is likely to be wrong, and how this mental model evolves. Users develop stronger mental models with practice, especially when they have some knowledge of the source language, primarily by relying on surface-level error cues. Moreover, providing speech transcriptions can help users develop better mental models. Our results show the promise of cross-lingual question answering

technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.HC

Themis: An explainable AI-enabled framework for Reinforcement Learning with Human Feedback

arXiv:2606.24622v1 Announce Type: cross Abstract: Training safe Reinforcement Learning (RL) systems is inherently challenging, with no guarantee of avoiding unwanted behaviors. The most effective defenses against this are (i) transparency through explainability and (ii) alignment via human feedback. While both show promising results, no publicly available framework currently combines them. To address this, we introduce Themis, an XAI-enabled testing and evaluation framework for Reinforcement Learning from Human Feedback. Themis supports over 200 widely used environments and is easily configurable for experiments in RL, transparency, and alignment. Our results show that Themis can train reward models that match or outperform the environment's true reward signal using human preferences. We also provide a cloud-based platform for collecting human feedback and managing experiments. It is user-friendly, auto-scalable, and supports large participant groups across multiple experiments without

technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.HC

Reinforcement Learning for Computer-Use Agents with Autonomous Evaluation

arXiv:2606.24515v1 Announce Type: cross Abstract: Computer-Use Agents (CUAs) execute high-level user goals by perceiving and acting directly within graphical user interfaces. However, reinforcement learning for CUAs remains difficult because open-ended desktop environments rarely provide scalable, machine-readable reward signals: task success is often visually grounded and hard to specify with handcrafted reward functions or dense manual labels. We propose an RL fine-tuning framework that uses autonomous vision-language evaluation as a scalable supervision signal for GUI agents. Given a final screenshot and the original instruction, a Vision-Language Model judges task completion and provides terminal feedback without task-specific heuristics or manual labels during policy optimization. Because autonomous evaluators are imperfect, we model their feedback as a noisy binary reward channel and derive a noise-corrected reward estimator for Proximal Policy Optimization. Experiments across ma

technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.HC

Real-Time Interactive Music Generation via Data-Free Streaming Consistency Distillation

arXiv:2606.24307v1 Announce Type: cross Abstract: Interactive music and live performance rely on real-time human expression, but modern generative music AI remains largely absent from this domain due to its prohibitive inference latency and offline rendering paradigm. To provide pioneer musicians with a novel medium for interactive composition, we should fundamentally change these static models into dynamic, playable instruments. In this paper, we propose a framework that bridges this gap. To achieve the low latency required for live interaction without sacrificing structural coherence, we formulate distillation within a streaming autoregressive latent space. Our approach removes the need for expensive paired audio-latent datasets by utilizing prompt-only inputs to synthesize teacher-guided, chunk-wise trajectories on the fly. Because live instruments require high acoustic fidelity, we introduce music-aware consistency objectives, which combine latent, spectral, and temporal-diff

technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.HC

Dialogue to Discovery: Attribute-Aware Preference Elicitation for Conversational Product Search Assistants

arXiv:2606.24194v1 Announce Type: cross Abstract: Conversational product search assistants offer a more expressive, natural, and interactive alternative to traditional keyword-based product search. With limited screen space, showing only a few items increases the need for precise preference elicitation, which can prolong conversations, leading to user frustration and session abandonment. Conversely, rushing to recommend items without a clear understanding of preferences risks poor matches and a degraded user experience. We present Dialogue to Discovery (D2D), an attribute-oriented preference elicitation framework that dynamically exploits the structure of product attributes to efficiently steer conversations toward the user's desired item. D2D adaptively prioritizes the most informative queries and strategically times product recommendations, reducing premature or off-target suggestions that harm engagement. To evaluate D2D, we curate three datasets from the Amazon Reviews corpus. In s

technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.HC

Aspect-Based Sentiment Evolution and its Correlation with Review Rounds in Multi-Round Peer Reviews: A Deep Learning Approach

arXiv:2606.24188v1 Announce Type: cross Abstract: Mining sentiment information from the textual content of peer review comments offers valuable insights into the scientific evaluation process. However, previous studies are often constrained by coarse-grained analysis and the lack of differentiation across review rounds. Notably, the dynamic shifts in reviewers' focus and sentiment tendencies throughout multiple review stages remain underexplored. To address this gap, the present study investigates the distribution and evolution of aspect-level sentiments and examines their correlation with the number of review rounds. We begin by segmenting the multi-round review comments of 11,063 accepted papers from Nature Communications and identifying fine-grained review aspect clusters. A manually annotated corpus of approximately 5,000 review sentences is then constructed. Using this dataset, we train a series of deep learning-based aspect sentiment classification models. Among them, the LCF-BER

technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.HC

Ten Digits on a Train: AI-Assisted Verification of Two Eigenvalue Problems

arXiv:2606.23821v1 Announce Type: cross Abstract: Accurate numerical eigenvalues are often difficult to certify, especially in singular or non-normal settings. This article reports a human–AI collaboration on two such computations. For a singular self-adjoint Schrödinger operator, a verified zero count and Dirichlet–Neumann bracketing certify the complete negative spectrum to ten decimal places. For a delicate non-normal atom–molecule benchmark, a previously unresolved resonance pair is separated, with each member enclosed to ten digits. The second result is achieved not by increasing the precision of one-way shooting, but by reformulating the problem as a global matching system for projective solution lines. The infinite tail is encoded as uncertainty in the terminal projective data, and a componentwise, tail-robust Krawczyk–Brouwer inclusion supplies the certificate. This gives a reusable architecture for analytic boundary-value systems with ill-conditioned propagation and unce

technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.HC

EvidenceLens: A Claim-Evidence Matrix for Auditing Financial Question Answering

arXiv:2606.23724v1 Announce Type: cross Abstract: Large language models are increasingly used to answer questions over annual reports, earnings decks, and analyst notes, yet their outputs remain difficult to verify in high-stakes financial workflows. A fluent answer can blend directly grounded statements, weak synthesis, and unsupported claims across narrative text, tables, and charts. We present EvidenceLens, a visual analytics prototype that treats financial question answering as a claim-evidence alignment problem. The system decomposes an answer into atomic claims, summarizes support composition and confidence, support gaps, and coordinates claim-level inspection with source passages, table cells, and chart regions. Its core visual representation is a multimodal claim-evidence matrix that makes coverage, contradiction, and modality imbalance immediately visible. To support reproducibility, we also specify a JSON-based artifact schema, a lightweight multimodal alignment pipeline, and

technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.HC

Zero-Shot Neural Priors for Generalizable Cross-Subject and Cross-Task EEG Decoding

arXiv:2606.23706v1 Announce Type: cross Abstract: The development of generalizable electroencephalography (EEG) decoding models is essential for robust brain-computer interfaces (BCI) and objective neural biomarkers in mental health. Conventional approaches have been hindered by poor cross-subject and cross-task generalization, owing to high inter-subject variability and non-stationary neural signals. We address this challenge with a zero-shot cross-subject decoding framework on the large-scale Healthy Brain Network dataset, benchmarking a convolutional neural network baseline, a hybrid LSTM, and a Transformer-based foundation model. To adapt the Transformer for regression while averting catastrophic forgetting, we propose a novel progressive unfreezing strategy. The baseline yielded an nRMSE of 0.9991, whereas our fine-tuned Transformer achieved 0.9799 on unseen subjects. This work advances scalable, calibration-free EEG decoding for computational psychiatry and behavioral prediction.

technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.HC

Evaluating LLM Usage for Efficient and Explainable Numerical and Classified Implicit Sentiment Analysis of Product Desirability

arXiv:2606.23701v1 Announce Type: cross Abstract: Qualitative product feedback can reveal nuanced user experiences, but its implicit sentiment is difficult to measure. This paper presents a scalable and interpretable framework that uses large language models (LLMs) to quantify product desirability from such data. Using two Product Desirability Toolkit (PDT) datasets from ZORQ and CARMA comprising 106 respondent term groupings with gold-standard human annotation, zero-shot continuous numerical sentiment scoring and categorical sentiment classification are evaluated without relying on explicit review scores. Across the datasets, LLMs generated numerical sentiment scores directly from qualitative responses and closely matched expert labels, achieving Pearson correlations up to 0.97 and classification accuracy up to 94%. LLMs maintained robustness even when handling data presented in multiple forms and consistently expressed high confidence. In contrast, lexicon-based and transformer basel

technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.HC

A Geometry-Informed Computer Vision Method for Detecting and Examining Overtaking Vehicles From A Bicycle

arXiv:2606.23699v1 Announce Type: cross Abstract: Instrumented bicycle studies have produced direct field evidence on vehicle passing behavior, but extracting overtaking events from continuous rear-facing video has remained dependent on manual, frame-by-frame annotation. This bottleneck constrains sample sizes and limits naturalistic cycling safety research. We present a geometry-informed computer vision pipeline that automates overtaking event detection from a single bicycle-mounted camera without multi-sensor configurations or explicit camera calibration. The system combines RT-DETR object detection with ByteTrack multi-object tracking through a three-stage geometric validation module enforcing bearing angle trend, apparent size growth, and spatial confirmation criteria derived from perspective projection principles. Validated on 315 manually annotated real-world overtaking events from urban roads in Ann Arbor, Michigan, the pipeline achieved 97.8% recall with zero false positives. T

technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.HC

"Zooming In" on Agentic Web Browsers as Assistive Technologies: A Case Study with a Low-Vision Technology Expert

arXiv:2606.24870v1 Announce Type: new Abstract: Agentic Web Browsers (AWBs), powered by Large Language Models (LLMs), are emerging as autonomous systems capable of navigating the Web on behalf of users. Beyond enhancing productivity, they could also offer significant promise as Assistive Technologies (ATs) for visually-impaired individuals, transforming web interaction into a fluid conversational exchange. In this paper, we present a case study with a low-vision technology expert, examining how AWBs can support visually-impaired users in web navigation. The findings show that, despite the current limitations, the navigation experience is notably fluid and flexible, underscoring the strong potential of AWBs to enhance accessibility and reduce barriers in web interaction, with implications that may extend beyond accessibility to agentic UX more broadly.

technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.HC

It's Complicated: On the Design and Evaluation of AI-Powered AAC Interfaces

arXiv:2606.24854v1 Announce Type: new Abstract: Artificial intelligence (AI) can enhance what people who use augmentative and alternative communication (AAC) are able to do with their systems. However, evaluating AI-powered AAC interfaces can be difficult. People are intersectional beings and current evaluation metrics can struggle to capture the multifaceted and nuanced desires people may have for their AAC. We explore the complicated nature of six AAC problem spaces, explore how AI might be used in these spaces, and suggest more robust methods of evaluation that take the intersectional nuances of people into account. We also discuss broader issues that arise across these problem spaces and how they could be addressed using our proposed evaluation methods.

technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.HC

Virtual Simulation for Mental Health

arXiv:2606.24826v1 Announce Type: new Abstract: Poorly designed interventions or those deployed without adequate safeguards can harm the communities they aim to serve, thus exacerbating existing vulnerabilities and leaving individuals unsupported. This is especially the case for the mental health context, where there is a growing trend of relying on technological interventions due to their accessibility and ability to deliver large-scale support. However, the mental health context is also particularly sensitive to change and risks of failure are dire; at their worst, failures in mental health interventions can result in lasting negative outcomes for individuals and tragic losses as people fall through the cracks. Thus, enabling safe ways to experiment in the mental health context is vital to allow both individuals and communities to engage with new interventions without risk of their real-world consequences. Virtual simulation, which uses virtual environments to replicate real-world in

technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.HC

SciFi-VIS: Way Out There -- How SciFi and Visualization Influence Each Other

arXiv:2606.24731v1 Announce Type: new Abstract: We propose a hybrid half-day workshop at IEEE VIS 2026, calling for participation from visualization researchers and science fiction creators in order to develop a systematic understanding of the two-way relationship these communities have long shared. We invite submissions of creative formats showcasing connections and inspiring future research. Our workshop plan includes a keynote, lightning talks, brainstorming, cross-community critique, affinity mapping, and discussion around identified themes.

technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.HC

SupplyNet: Supporting Visual Exploratory Learning in Supply Chain via Contextual Multi-Agent Simulation

arXiv:2606.24694v1 Announce Type: new Abstract: Simulation has long supported supply chain management instruction by letting learners observe network behavior and test decision strategies. Recent progress in LLM-driven agents opens new possibilities for richer, more adaptive simulations, but many existing systems still present abstract, opaque data that overwhelms learners and discourages active exploration. We introduce SupplyNet, a gamified visual simulation system built on a contextual graph-based LLM multi-agent framework that models interdependent supply chain dynamics and provides responsive feedback through tiered challenges. SupplyNet turns the simulation into a manipulable decision space by integrating an interactive network view of system state, a branching timeline for "what-if" exploration and comparison, and a task-oriented analysis console for structured performance breakdowns. Together, these visual components support counterfactual exploration, causal

technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.HC

Optimizing Visual Analytics Workflows: From Theory to Practice

arXiv:2606.24454v1 Announce Type: new Abstract: The principle of visual analytics (VA) is to provide integrated workflows where human-centric processes (e.g., visualization and interaction) and machine-centric processes (e.g., statistics and algorithms) complement each other. To implement this principle in practice, it is necessary to reason about the trade-offs among different processes and make optimal use of them in a workflow. Building on an existing ontology of the methodology for analyzing such trade-offs information-theoretically and for optimizing VA workflows systematically, we investigate ways to transform this methodology from theory to practice. In particular, we adopted the action research method. Through case studies in different application domains, VA researchers with different background knowledge and experiences offered their answers to several hypotheses about using the methodology in practice and proposed ways forward. In this paper, we present our collective analys

Source ↗
technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.HC

Average Rankings Mask Per-Subject Optimality: A Friedman-Nemenyi Benchmark of EEG Motor-Imagery BCI Decoders

arXiv:2606.24394v1 Announce Type: new Abstract: Electroencephalography (EEG) is the dominant non-invasive modality for brain-computer interfaces (BCIs), yet reliable decoding of motor imagery is hampered by inter- and intra-individual variability. A recurring claim is that one decoding pipeline, most often a spatial or Riemannian method, is broadly preferable. We test the weakest version of that claim under the most favourable conditions. Using the Mother of All BCI Benchmarks (MOABB) framework, we evaluated 1,056 decoding configurations (feature extractor x scaler x classifier), >340,000 subject-level model fits, across three public left-versus-right motor-imagery datasets (PhysionetMI, 109 participants; Cho2017, 52; Zhou2016, 4) and two frequency bands (8-15 Hz, 8-30 Hz). Every model is fit and tested within a single session of a single participant, the easiest regime, giving every pipeline its best chance. We apply the statistics standard for multi-classifier comparison: Friedman om

Source ↗
technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.HC

A Dynamic Coupling Theory of Expertise Through Thinking Flow and Workflow Evolution

arXiv:2606.24197v1 Announce Type: new Abstract: Expertise has long been explained through tacit knowledge, deliberate practice, skill acquisition, and expert performance. While these perspectives have advanced understanding of expertise, they often describe its conditions or outcomes rather than the cognitive architecture through which expertise continuously emerges and evolves. This paper proposes Workflow Cognition as a theoretical framework for explaining expertise as a dynamic cognitive phenomenon. Workflow Cognition is defined as the cognitive architecture emerging from the recursive coupling of Thinking Flow and Workflow Evolution. Thinking Flow refers to ongoing processes of perception, interpretation, judgement, decision-making, and reflection; Workflow Evolution refers to the continuous adaptation of actions, task structures, and operational strategies within situated practice. Through their coupling, expertise is not treated as a static accumulation of knowledge or skill, but

Source ↗
technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.HC

Human-Centered Design: The Disclosure of Generative Artificial Intelligence for Emerging Professionals

arXiv:2606.24136v1 Announce Type: new Abstract: As Human Centered Design continues to grow, generative AI has the potential to streamline the research process by iterating tasks within established workflows to increase efficiency. However, integrating AI raises concerns surrounding ethical bias, complexity, and the lack of prioritization of humanistic values. Emerging professionals represent a cohort with the opportunity to learn Human Centered Design principles, yet without this foundation AI becomes more of a crutch than a tool, leading to reduced experience with deep work, decreased autonomy, and deskilling of key foundations. Disclosures are a common method to self-report AI usage, but they provide little clarification on appropriate implementation and may encourage omission to avoid consequences. This paper reflects on experiences in the Human Centered Design course ITIS8300, which emphasized optimizing user experience, enhancing innovation and collaboration, and improving eff

Source ↗
technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.HC

The impact of generative artificial intelligence on academic development of Chinese students in humanities and social sciences

arXiv:2606.24104v1 Announce Type: new Abstract: Generative artificial intelligence (GenAI) is reshaping learning in higher education, with particularly pronounced implications for the humanities and social sciences (HSS), where learning outcomes are commonly expressed through written and interpretive forms that align closely with GenAI's capabilities. Yet, systematic evidence on the educational impacts of GenAI on HSS students remains limited. Addressing this gap, this study draws on a large-scale survey of HSS students in China to examine its role in academic development. Guided by relevant learning theories, this study focuses on four dimensions: patterns of use, effects on learning processes and academic performance, challenges associated with GenAI use, and preferred approaches to curricular integration. We found that more than half perceived enhanced learning motivation, independent thinking and creativity, although a substantial minority reported little change or even decline. Comp

Source ↗
technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.HC

Do Language Models Pass the Bechdel Test? Auditing Gender Biases in LLM-Generated Screenplays

arXiv:2606.24022v1 Announce Type: new Abstract: As large language models (LLMs) are increasingly used in media production from journalism to filmmaking, what impact do they have on the stories being told? Prior work has shown LLMs to perpetuate social biases, including those related to gender. We complement existing literature on gender bias in LLM outputs by auditing the network structure of LLM-generated movie screenplays through automating the Bechdel test, a popular measure of women's representation in literary and film works. We also introduce the use of social network analysis measures to further analyze representational bias in LLM-generated scripts. We evaluate screenplays generated by three state-of-the-art LLMs (GPT-5, Gemini 3 Pro, and Claude Sonnet 4.5) against 768 corresponding human-written screenplays, finding that human-written scripts are more likely to pass the Bechdel test. However, other network analyses, like centrality, homophily, and triadic relationships demons

Source ↗
technology Wed, 24 Jun 2026 00:00:00 -0400
arXiv cs.HC

Embodied Explainability and Ontological Obstacles: Why We Struggle to Explain the Answers of Large Language Models (LLMs)

arXiv:2606.23840v1 Announce Type: new Abstract: Explainability is often framed as a property of an AI model, with explanations extracted from its internals and shown to users. In this argument paper, we instead provide an embodied account of explainability based on Dourish and enactivist cognition: understanding is created in use as people act on affordances in shared practice. Using demonstrations and conceptual analysis, we reveal ontological obstacles when "looking inside" large language models: surrogates import external abstractions that can be mistaken for the model's, and focusing on internal reasoning misses that explainers participate in their own understanding. We discuss these obstacles in XAI practice, arguing that many explanations are misnamed, which skews their purpose and can increase overreliance. Finally, we highlight how embodied explanations reorganize sense-making by making what matters publicly available for action, and argue that explainability claims should be r

Source ↗
technology Thu, 25 Jun 2026 00:00:00 -0400
arXiv cs.HC

A Low-Code Approach for the Automatic Personalization of Conversational Agents

arXiv:2605.02384v3 Announce Type: replace-cross Abstract: The rise of Large Language Models (LLMs) has increased the demand for Conversational Agents (CAs) capable of understanding human conversations as part of web applications. While traditional CAs consist of deterministic states, LLMs enhance their capabilities to handle open conversations, handling arbitrary requests. Numerous tools exist that allow non-technical users to create such CAs. Yet, the creation of personalized CAs able to adapt to the profile of end-users to offer an optimal user experience remains in the hands of experienced developers implementing ad-hoc personalizations. In this work, we propose a pipeline that follows a low-code/no-code approach to facilitate the modeling and generation of personalized CAs. A pilot user study was performed to get preliminary results on perceived usability and usefulness and the full pipeline has been implemented on top of an open-source low-code platform.

Source ↗
technology Thu, 25 Jun 2026 00:00:00 -0400
arXiv cs.HC

Aligning Human-AI-Interaction Trust for Mental Health Support: Survey and Position for Multi-Stakeholders

arXiv:2604.20166v2 Announce Type: replace-cross Abstract: Building trustworthy AI systems for mental health support is a shared priority across stakeholders from multiple disciplines. However, "trustworthy" remains loosely defined and inconsistently operationalized. AI research often focuses on technical criteria (e.g., robustness, explainability, and safety), while therapeutic practitioners emphasize therapeutic fidelity (e.g., appropriateness, empathy, and long-term user outcomes). To bridge the fragmented landscape, we propose a three-layer trust framework, covering human-oriented, AI-oriented, and interaction-oriented trust, integrating the viewpoints of key stakeholders (e.g., practitioners, researchers, regulators). Using this framework, we systematically review existing AI-driven research in the mental health domain and examine evaluation practices for "trustworthy" ranging from automatic metrics to clinically validated approaches. We highlight critical gaps between what NLP curre

Source ↗
technology Thu, 25 Jun 2026 00:00:00 -0400
arXiv cs.HC

Colon-Bench: An Agentic Workflow for Scalable Dense Lesion Annotation in Full-Procedure Colonoscopy Videos

arXiv:2603.25645v2 Announce Type: replace-cross Abstract: Early screening via colonoscopy is critical for colon cancer prevention, yet developing robust AI systems for this domain is hindered by the lack of densely annotated, long-sequence video datasets. Existing datasets predominantly focus on single-class polyp detection and lack the rich spatial, temporal, and linguistic annotations required to evaluate modern Multimodal Large Language Models (MLLMs). To address this critical gap, we introduce Colon-Bench, generated via a novel multi-stage agentic workflow. Our pipeline seamlessly integrates temporal proposals, bounding-box tracking, AI-driven visual confirmation, and human-in-the-loop review to scalably annotate full-procedure videos. The resulting verified benchmark is unprecedented in scope, encompassing 528 videos, 14 distinct lesion categories (including polyps, ulcers, and bleeding), over 300,000 bounding boxes, 213,000 segmentation masks, and 133,000 words of clinical descri

Source ↗
technology Thu, 25 Jun 2026 00:00:00 -0400
arXiv cs.HC

SycoEval-EM: Sycophancy Evaluation of Large Language Models in Simulated Clinical Encounters for Emergency Care

arXiv:2601.16529v3 Announce Type: replace-cross Abstract: Large language models (LLMs) deployed in clinical decision support may acquiesce to patient requests for care that conflicts with evidence-based guidelines. We developed SycoEval-EM, a multi-agent simulation framework to evaluate LLM robustness to adversarial patient persuasion in emergency medicine. Across 19 contemporary LLMs and 1,425 simulated clinical encounters spanning three Choosing Wisely scenarios, acquiescence rates ranged from 0% to 100%, revealing a bimodal distribution. Seven models maintained near-perfect guideline adherence, while six acquiesced in the majority of encounters. Vulnerability varied substantially across clinical scenarios. Acquiescence was highest for CT imaging requests, intermediate for antibiotic prescriptions for sinusitis, and lowest for opioid prescriptions for acute back pain. Model scale, recency, and performance on static medical benchmarks did not consistently predict robustness. All five

Source ↗
technology Thu, 25 Jun 2026 00:00:00 -0400
arXiv cs.HC

Towards a Bathroom-Centered Human-Building Digital Twin Framework for Indoor Safety Analysis

arXiv:2606.23292v2 Announce Type: replace Abstract: Bathroom use is a critical safety challenge for older adults because wet surfaces, constrained layouts, limited support, and frequent posture transitions are concentrated within a small domestic space. These conditions create risks that cannot be adequately understood by considering either the bathroom environment or human motion in isolation. Existing bathroom safety studies mainly identify hazards, accessibility problems, or design modifications, whereas human-centered sensing studies often focus on activity recognition or fall detection without sufficient semantic understanding of the surrounding environment. This separation limits the interpretation of how older adults interact with fixtures, support surfaces, wet areas, and spatial constraints during daily bathroom activities. To address this gap, this study proposes a bathroom-centered human-building digital twin framework for interaction-aware indoor safety analysis with a spec

Source ↗
technology Thu, 25 Jun 2026 00:00:00 -0400
arXiv cs.HC

Seeing the Reasoning: How LLM Rationales Influence User Trust and Decision-Making in Factual Verification Tasks

arXiv:2603.07306v2 Announce Type: replace Abstract: Large Language Models (LLMs) increasingly show reasoning rationales alongside their answers, turning "reasoning" into a user-interface element. While step-by-step rationales are typically associated with model performance, how they influence users' trust and decision-making in factual verification tasks remains unclear. We ran an online study (N=68) manipulating three properties of LLM reasoning rationales: presentation format (instant vs. delayed vs. on-demand), correctness (correct vs. incorrect), and certainty framing (none vs. certain vs. uncertain). We found that correct rationales and certainty cues increased trust, decision confidence, and AI advice adoption, whereas uncertainty cues reduced them. Presentation format did not have a significant effect, suggesting users were less sensitive to how reasoning was revealed than to its reliability. Participants indicated they primarily use rationales to audit outputs and calibrate tru

Source ↗
technology Thu, 25 Jun 2026 00:00:00 -0400
arXiv cs.HC

Tinker Tales: A Tangible Dialogue System for Child-AI Co-Creative Storytelling

arXiv:2602.04109v2 Announce Type: replace Abstract: Conversational AI agents are increasingly explored as creative partners, yet how conversation design shapes child-AI dialogue in co-creative settings remains underexplored. We present Tinker Tales, a tangible dialogue system for child-AI collaborative storytelling, in which educational frameworks (narrative development and social-emotional learning) are instantiated as conversation design, shaping how the agent engages children across four narrative stages. The system combines a physical storytelling board, NFC-embedded toys, and a mobile app mediating multimodal interaction through tangible manipulation and voice-based dialogue. We conducted a home-based user study with 10 children (ages 6-8) across two conversation design conditions varying in how the agent structured elaboration, with and without educational scaffolding. Our findings show that prompt framing shapes the form and consistency of children's narrative contributions, str

Source ↗
technology Thu, 25 Jun 2026 00:00:00 -0400
arXiv cs.HC

Virtual Reality Alters Perceived Functional Body Size

arXiv:2510.00824v2 Announce Type: replace Abstract: Virtual reality (VR) introduces sensory perturbations that may impact perception and action. The current study was designed to investigate how immersive VR presented through a head-mounted display (HMD) affects perceived functional body size using a passable aperture paradigm. Participants (n=60) performed an action task (sidle through apertures) and a perception task (adjust aperture width until passable without contact) in both physical, unmediated reality (UR) and VR. Results revealed significantly higher action and perceptual thresholds in VR compared to UR. Affordance ratios (perceptual threshold over action threshold) were also higher in VR, indicating that the increase in perceptual thresholds in VR was driven partly by sensorimotor uncertainty, as reflected in the increase in the action thresholds, and partly by perceptual distortions imposed by VR. This perceptual overestimation in VR also persisted as an aftereffect in UR fo

Source ↗
technology Thu, 25 Jun 2026 00:00:00 -0400
arXiv cs.HC

The Pin of Shame: Examining Content Creators' Adoption of Pinning Inappropriate Comments as a Moderation Strategy

arXiv:2505.14844v2 Announce Type: replace Abstract: Many social media platforms allow content creators to pin user comments in response to their content. Once pinned, a comment remains fixed at the top of the comments section, regardless of subsequent activity or the selected sorting order. The "Pin of Shame" refers to an innovative re-purposing of this feature, where creators intentionally pin norm-violating comments to spotlight them and prompt shaming responses from their audiences. This study explores how creators adopt this emerging moderation tactic, examining their motivations, its outcomes, and how it compares, procedurally and in effect, to other content moderation strategies. Through interviews with 20 content creators who had pinned negative comments on their posts, we find that the Pin of Shame is used to punish and educate inappropriate commenters, elicit emotional accountability, provoke audience negotiation of community norms, and support creators' impression management go

Source ↗
technology Thu, 25 Jun 2026 00:00:00 -0400
arXiv cs.HC

Reasonable Motion: A General ASP Foundation for Environment Constrained Movement Trajectory Computation

arXiv:2606.25626v1 Announce Type: cross Abstract: We present a general answer set programming based hybrid quantitative-qualitative method for computing constrained branching trajectory modes for moving objects in real-world settings. The method performs constrained traversal of an environment graph, enumerating geometrically admissible motion behaviours as stable models, each constituting a distinct trajectory mode characterised by both domain-dependent and independent factors such as derived event sequence, map topology, and domain norms. The hybrid trajectory computation method is generally applicable across motion characteristics typically encountered in diverse dynamic domains with moving objects, e.g., autonomous driving. We demonstrate applicability and highlight how computed trajectories are traceable to their underlying stable model, thereby affording verifiable interpretability that purely learned approaches cannot provide. We also perform an empirical evaluation with Argover

Source ↗
technology Thu, 25 Jun 2026 00:00:00 -0400
arXiv cs.HC

AI Coaching for Accelerating Human Skill Development with Reinforcement Learning

arXiv:2606.25337v1 Announce Type: cross Abstract: AI copilots can substantially boost human performance through shared control, but excessive assistance can induce over-reliance and skill atrophy. This paper studies how an embodied AI agent can act as a coach that accelerates human motor-skill development. We argue that effective coaching requires strategic scaffolding and stepping back that are aligned with the learner's capability, allowing productive failures that drive learning. We formalize the interactive AI coaching process as a non-cooperative dynamic game in which the learner optimizes task performance while the coach targets the learner's independent competence. Building on this formalism, we develop a reinforcement learning framework combining adaptive shared control with probabilistic models of the coach's causal influence on skill evolution, enabling tractable training of coaching policies. A comprehensive user study (N=33) on first-person-view drone racing shows significa

Source ↗
technology Thu, 25 Jun 2026 00:00:00 -0400
arXiv cs.HC

EveLoad: Cognitive Workload Recognition from Event-Based Eye Movements

arXiv:2606.25177v1 Announce Type: cross Abstract: Cognitive workload monitoring is important for adaptive rehabilitation and assistive interfaces, where task difficulty, pacing, and feedback should be adjusted according to the user's cognitive state to avoid overload and under-challenge. Emerging extended reality and robot-assisted rehabilitation environments provide controllable training tasks, but they require unobtrusive sensing methods that can capture rapid ocular dynamics during interaction. Existing eye-movement-based cognitive workload recognition methods mainly rely on frame-based eye trackers, which often suffer from limited temporal resolution and degraded robustness under rapid eye movements. In contrast, event cameras provide microsecond-level temporal resolution, high dynamic range and low latency, making them suitable for capturing fine-grained ocular dynamics. Many previous studies rely on free-viewing or similar paradigms, where gaze locations can vary across tasks. As

Source ↗
technology Thu, 25 Jun 2026 00:00:00 -0400
arXiv cs.HC

fARfetch: Enabling Collocated AR-HRC in Large Visually Diverse Environments with VLM-Driven AR Content Adaptation

arXiv:2606.25162v1 Announce Type: cross Abstract: Augmented Reality (AR) can improve collocated human-robot collaboration by making robot state and intent visible and enabling intuitive control, yet large, visually diverse environments like the outdoors challenge both interaction and content legibility, especially at long distances and beyond visual line of sight. We present fARfetch, an AR-HRC system that integrates (i) shared semantic environment mapping across an AR headset and robot that visualizes detected landmarks in AR to support landmark-grounded go-to commands, (ii) a context-aware world-in-miniature representation of the shared environment for fine-grained path authoring, and (iii) vision-language-model driven AR view management that jointly adapts virtual content color, size, and orientation to maintain legibility in large visually diverse environments. We implement fARfetch with a Meta Quest 3 headset and Unitree Go2 quadruped robot, and conduct a within-subjects user stud

Source ↗
technology Thu, 25 Jun 2026 00:00:00 -0400
arXiv cs.HC

BCoughBench: Benchmarking Respiratory Acoustic Foundation Models Under Body-Coupled Wearable Sensor Conditions

arXiv:2606.25116v1 Announce Type: cross Abstract: Respiratory acoustic foundation models (FMs) are benchmarked exclusively on smartphone recordings, yet clinical deployment increasingly targets body-coupled (BC) wearables whose sensors attenuate high-frequency content through tissue and bone, leaving FM reliability uncharacterised. We introduce BCoughBench, evaluating five FMs (OPERA-CT/CE/GT, HeAR, M2D+Resp) on nine classification tasks (AUROC, sensitivity at 95% specificity, Expected Calibration Error) and three age regression tasks (MAE vs. a mean-predictor baseline) across five EBEN-simulated BC sensor conditions on five labeled cough datasets. Mean AUROC declines from 0.785 (smartphone) to 0.689-0.723, degrading most under temple vibration pickup (Δ = -0.096) and least under the soft in-ear (Δ = -0.062). No FM meets the clinical sensitivity threshold (Se@Sp95 ≥ 0.20) on most disease tasks under any BC sensor. Sex classification on the CIDRZ cohort collapses (AUR

Source ↗
technology Thu, 25 Jun 2026 00:00:00 -0400
arXiv cs.HC

The Clinician's Veto: Navigating Trust, Liability, and Uncertainty in Autonomous AI Prescribing

arXiv:2606.25108v1 Announce Type: cross Abstract: Autonomous AI systems are transitioning from advisory to autonomous roles for medication prescriptions. Recent United States bill H.R. 238 and Utah's prescription-renewal pilot both authorize AI to prescribe medications in an agentic capacity. While some regulatory guidelines suggest aggregate model performance metrics for clearance, they do not require i) calibrated per-prediction confidence for action-gated thresholds, ii) differentiated communication of uncertainty arising from model ignorance (epistemic) versus genuine clinical ambiguity (aleatoric), and iii) inferential transparency at the moment of decision that allows for liability allocation. Here, we present a regulatory and technical argument (tested with a survey of 136 U.S. prescribing clinicians) positioning these as minimum architectural requirements for safe autonomous prescribing. Our results suggest prescribing clinicians i) would not permit autonomous prescribing witho

Source ↗
Showing 1–50 of 59 signals