EdTech Discovery
Hermes

An instrument for spotting the next edtech opportunity — generated ideas, each traced to the real-world signals behind it.

Updated Jun 24, 2026 · 10 ideas · 1624 signals

Signals

The evidence library — the raw signals the pipeline is watching across the education ecosystem. Every idea is built from these.

technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.HC

Same Scrutiny, More Time: Eye Tracking Insights into Reviewing LLM-Labelled Code

arXiv:2606.26505v1 Announce Type: cross Abstract: Modern software development increasingly involves the use of large language models (LLMs) to generate code. Despite their rapid advancement, LLMs remain prone to errors and hallucinations, emphasizing the importance of careful code inspection. However, in practice, developers' trust in LLM-generated code and their willingness to review it thoroughly may differ from these recommendations. How developers actually behave when reviewing LLM-generated code remains largely unexplored. In this study, we conduct a Wizard-of-Oz experiment to examine how software engineers behave when code is explicitly labeled as LLM-generated during a code review task. We collect both behavioral data and participant feedback through eye-tracking and exit interviews. Combining Bayesian data analysis with qualitative analysis, we found that while the thoroughness of code review did not change for participants, they spent more time fixating on LLM-labelled code, i

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.HC

Utilizing Cognitive Signals Generated during Human Reading to Enhance Keyphrase Extraction from Microblogs

arXiv:2606.26485v1 Announce Type: cross Abstract: Microblogging platforms generate massive amounts of short, noisy, and dispersed user content, making automatic keyphrase extraction (AKE) an important but challenging task. Prior studies have used eye-tracking signals to improve microblog-based AKE because such signals reflect readers' attention to salient words. However, eye tracking alone is limited by physiological, acquisition, and feature-decoding constraints. To address this issue, we investigate whether electroencephalogram (EEG) signals can complement eye-tracking signals for AKE. Using the ZuCo cognitive language processing corpus, we select 8 EEG features and 17 eye-tracking features and incorporate them into microblog-based AKE models. To reduce possible distortion of cognitive signals by model structures, we inject these features into the input of the soft-attention layer and the query vectors of the self-attention layer. We then evaluate different combinations of cognitive

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.HC

Charting the Growth of Social-Physical HRI (spHRI): A Systematic Review Pipeline Augmented by Small Language Models

arXiv:2606.26382v1 Announce Type: cross Abstract: Social-physical human-robot interaction (spHRI) has grown rapidly across robotics, human-computer interaction, human-robot interaction, and haptics. Yet, fragmented terminology and inconsistent methodologies make systematic synthesis difficult. To support scalable review practices, we evaluated the extent to which small language models (SLMs; < 1.5B parameters) can assist with title and abstract screening for a large spHRI systematic review. While no SLMs matched human reviewers' performance, the models operated locally and screened papers orders of magnitude faster. The combined SLM ensemble identified 39 papers reviewers missed, representing 10.29% of the final relevant dataset. These results demonstrate that SLMs can augment, rather than replace, expert reviewers and make large-scale literature reviews accessible and sustainable.

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.HC

AI Healthcare Chatbots as Information Infrastructure: A Large-Scale Study of User-Reported Breakdowns

arXiv:2606.27302v1 Announce Type: new Abstract: AI healthcare chatbots are increasingly used to support health information seeking and self-management, yet their performance and impact on users remain to be studied. This study examines over 15,000 user reviews from 59 AI healthcare chatbot apps to explore how these systems function in everyday informational and emotional contexts. Topic modeling and interpretive analysis identify three recurring breakdowns: access barriers and service unreliability, user experience and interaction quality, and billing and customer support issues. Privacy and security concerns are associated with the most negative experiences. By framing AI healthcare chatbots as information infrastructures, our findings highlight how failures in access, usability, and trust affect users, offering actionable insights for designers, policymakers, and information professionals aiming to improve digital health systems.

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.HC

Reading the Same Data Differently: Interpretive Labor Across System Boundaries in Electronic Monitoring

arXiv:2606.27301v1 Announce Type: new Abstract: Electronic monitoring (EM) systems are increasingly used in community corrections to enforce spatial, temporal, and behavioral rules through continuous sensing. While prior work has examined EM as a criminal justice tool or as a mechanism for compliance, less is known about how sensed data become meaningful in everyday practice. This poster examines EM as a dual-sided sensing system in which supervised individuals and authorities reason about the same data stream from different positions. Based on semi-structured interviews with 26 supervised individuals and 12 authorities in China's community corrections system, we show that supervised individuals infer system logic from outcomes with limited visibility into how data are interpreted, while authorities reconstruct behavior from ambiguous traces using contextual knowledge, professional experience, and institutional procedures. We call this structural divergence interpretive misalignment. I

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.HC

"Everyone Says Them": Deception Typologies, Probabilistic Trust, and Grassroots Safety Knowledge Among Gay Dating App Users in China

arXiv:2606.27284v1 Announce Type: new Abstract: Gay dating applications have become critical platforms for sexual minority men to seek relationships and community, yet they also expose users to deceptive interactions that remain underexplored in HCI and CSCW research. This study examines how gay male users in China experience, identify, and respond to deception on dating applications. Through semi-structured interviews with 22 participants across platforms including Blued, Aloha, Fanka, and Soul, we make three contributions. First, we identify a typology of deceptive practices extending beyond profile misrepresentation to encompass relational, emotional, financial, and commercial forms of deception. Second, we document the layered, probabilistic verification strategies users develop through long-term platform use, showing that trust assessment operates as a multi-signal, provisional process rather than a binary judgment. Third, we demonstrate that risk recognition is a collaborative pr

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.HC

Behind the Mask: A Taxonomic Analysis of Activities in Online Social Networks

arXiv:2606.27111v1 Announce Type: new Abstract: The broadcast of disinformation in online social networks (OSN) is a growing concern examined across several disciplines, including human-computer interaction (HCI). The pervasive issue has been prompting novel approaches to identify the malicious actors behind the dissemination of deceptive and fabricated content. Analyzing the characteristics and activities of these actors, we designed a taxonomy informed by collaboration with subject matter experts (SMEs) and a review of the academic literature. Our study explores how to distinguish the characteristics, activities, and strategies of malicious actors on OSN and examines how they contribute to the spread of disinformation. We describe the design process and the application of the taxonomy in a case study analyzing anti-migration discourse in social media channels, and reflect on its potential to aid researchers and practitioners in the responsible design of network systems.

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.HC

Urban Context and Travel Experience Events: An Exploratory Comparison of Two German Cities

arXiv:2606.27077v1 Announce Type: new Abstract: The presented study investigates events influencing public transportation experience in both urban (Hamburg) and rural (Tuttlingen) areas in Germany, with the aim of identifying events that affect travel experience and, as a result, travel behavior. Using a mobile application, 21 participants in Tuttlingen and 70 participants in Hamburg tracked everyday trips, providing real-time evaluations of travel experiences along with situational data. Multi-level regression analyses were applied to assess the impact of events such as punctuality, capacity offer, information about public transportation, and others on the on-trip experience. Results indicate that a sufficient public transportation capacity offer has the strongest positive effect in Tuttlingen, whereas a lack of punctuality and low personal well-being have the strongest negative effects. In Hamburg, a lack of punctuality and a negative information event have the largest impacts. These ide

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.HC

Floor Raiser or Ceiling Limiter? Differential Storytelling Outcomes with a Child-Centric GenAI System Across Individual Differences

arXiv:2606.27067v1 Announce Type: new Abstract: Generative AI (GenAI) holds promise for democratizing creative literacy, yet whether it benefits all children equally remains unclear. Using a child-centric GenAI storytelling system for children aged 7-12, we conducted a mixed-methods within-subjects experiment (N = 40, Grades 2-6) comparing GenAI-assisted and traditional storyboard conditions. Three findings emerged. First, the GenAI-assisted condition was associated with a floor-raising convergence pattern, with the quality gap narrowing by 83.5%, driven by lower-end support and upper-end constraint mechanisms. This convergence was dimension-selective, improving creativity and richness while leaving coherence and narrative structure tied to baseline performance. Second, younger children more often selected semantically distant keywords while older children preferred semantically closer ones, although engagement orientation varied across individuals regardless of age. Third, image regen

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.HC

What Holds Back Brain-Computer Interfaces? Uncovering Challenges and Opportunities in BCI-controlled Games for Cerebral Palsy Rehabilitation

arXiv:2606.26951v1 Announce Type: new Abstract: Brain-computer interfaces (BCIs) offer promising avenues for cerebral palsy (CP) rehabilitation at home and in the clinic, using games that promote engagement and sustained training effort. Nonetheless, the design constraints of BCI-based CP rehabilitation remain unclear, especially how individuals with CP experience a sense of control through BCI, and how they experience computer-mediated game assistance. To address this gap, we present preliminary clinical and user perspectives on BCI-based CP rehabilitation, drawing on in-clinic insights from a CP therapist and experiential accounts from ten individuals with CP engaging with BCI game prototypes. Sporadic help in BCI games eased monotony, but also fostered doubts regarding agency. The therapist saw BCI rehabilitation as complementary to traditional training, facilitating the transition from playful exercises to autonomous, self-managed training. We outline key challenges and opportuniti

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.HC

Continuous Behavioral Synthesis for Adaptive Health Dashboards: An LLM-Mediated Architecture Integrating Explicit Preference, Spatial Reorganization, and Attention Allocation Signals

arXiv:2606.26937v1 Announce Type: new Abstract: The engineering of adaptive user interfaces has traditionally relied on either rule-based systems encoding designer intuitions about user needs or machine learning approaches requiring substantial historical data before achieving effective personalization. We present a technical architecture that leverages Large Language Models as behavioral synthesis engines to enable immediate adaptation from sparse, heterogeneous user signals. Our system integrates three distinct behavioral channels: i) explicit micro-feedback on individual interface elements, ii) spatial priority inferred from manual widget reorganization through drag-and-drop interaction, and iii) attentional investment measured through dwell time during hover events, within a structured prompt engineering framework that continuously regenerates dashboard layouts while maintaining explanatory coherence. The architecture addresses the technical challenge of translating low-level inter

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.HC

Game Changers: Designing and Measuring Dynamic Feedback To Help Users Self-Regulate in a VR Pointing Game

arXiv:2606.26925v1 Announce Type: new Abstract: The way games dynamically convey information through feedback is critical to players' ability to perform, learn, and improve. However, it is poorly understood how performance metrics impact player performance and perception in core game tasks like pointing or steering. With a virtual reality pointing task, we systematically explored how three performance metrics driving the feedback affected players when rewarding short completion times, straight movements, or high peak speed across different points in time: continuously, at end-of-action, or at end-of-task. On average the dynamic feedback helped people point straighter and faster, while for others it had a small or opposite effect. The study quantitatively compared dynamic feedback across three forms with the metrics driving the form as the intended locus of quantitative comparison. Our work improves game designers' basis for crafting dynamic feedback by helping them know when to employ

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.HC

Optimizing Human-Machine Interface for Real-Time AI Support in the Operating Room: the CVS Copilot

arXiv:2606.26886v1 Announce Type: new Abstract: Artificial intelligence (AI) systems for automated Critical View of Safety (CVS) assessment in laparoscopic cholecystectomy are nearing clinical translation. Beyond algorithmic performance, clinical safety and effectiveness depend on the quality of the human-machine interface (HMI). This work examines how AI-generated predictions should be presented and controlled intraoperatively. Seventeen surgeons, including residents, attending surgeons, and professors, took part in a mixed-methods, user-centered design study to optimize an intraoperative HMI for AI-assisted safe laparoscopic cholecystectomy. Interviews explored interaction modalities, timing of assistance, visualization strategies, and control mechanisms across surgical roles, and were analyzed using reflexive thematic analysis and human-factors heuristics. Most surgeons (16/17) supported the use of AI for intraoperative decision support while rejecting autonomous decision-making. At

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.HC

MedSWFlow: An Open-Source LLM Workflow for Drafting Medical Social Work Case Plans

arXiv:2606.26884v1 Announce Type: new Abstract: We present MedSWFlow, an open-source, model-agnostic LLM workflow for drafting medical social work case plans. The framework translates professional case-planning tasks into six stages: assessment, problem analysis, goal setting, intervention planning, risk anticipation, and planned effect evaluation. Drawing on established social work and behavioral frameworks, MedSWFlow standardizes case inputs, builds structured case profiles, and generates reviewable assessment forms and service plans through staged prompting. The system is released as an open-source research framework for reproducible case-plan generation across LLM providers. Outputs are intended as practitioner-reviewed drafts rather than final service decisions. Source code: https://github.com/santhiyacw-droid/MedSWFlow/tree/main.

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.HC

'A bit of chaos and madness': The AI Assessment Scale and the work of assessment reform

arXiv:2606.26729v1 Announce Type: new Abstract: Generative artificial intelligence (GenAI) has intensified pressure on universities to redesign assessment while maintaining integrity, equity, and validity. Structured frameworks such as the Artificial Intelligence Assessment Scale (AIAS) offer one response, but evidence of how staff experience their implementation remains limited. This qualitative study examines AIAS implementation at a private international university in Vietnam and a public university in the United Kingdom. Data from five focus groups with 30 academic staff were analysed using hybrid thematic analysis, with Critical AI Literacy used as a sensitising concept. Six themes were developed: recognising and integrating AI, facilitating conditions, building capacity, pathways to adoption, ethics in practice, and reframing pedagogy. Staff valued the AIAS as a shared language for legitimising GenAI use, clarifying boundaries, and prompting reflection on assessment design. Howev

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.HC

Modeling Adaptive Visual Search in Semantically Hierarchical Layouts

arXiv:2606.26725v1 Announce Type: new Abstract: This paper introduces a computational cognitive model to investigate how information grouping impacts visual search, a key consideration in user interface design. The model uses computational rationality to view user behavior as an adaptation to cognitive and task constraints. Our work highlights that humans use hierarchical task representations, exploiting semantic and visual structures to improve search efficiency within the constraints of the visual system. We validate this model with data from two human studies focused on visual search and semantic categorization, demonstrating that semantic grouping improves search performance when it aligns with spatial grouping. Our model replicates task durations and eye movement patterns. By improving understanding of how hierarchical memory structures are utilized in human cognition, the model extends previous visual search models. We showcase our model in the rapid prototyping and evaluation of

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.HC

From Content to Strategy: Understanding the Motivations, Processes, and Impacts of AI-Guided Communication

arXiv:2606.26672v1 Announce Type: new Abstract: Artificial intelligence-mediated communication (AI-MC) is conceptualized as applying AI to augment or generate message content (Hancock et al., 2020). However, advances in generative AI have expanded its use beyond generating content to guiding individuals' communication strategies, that is, AI-guided communication, yet theoretical and empirical understandings of this emerging use pattern and its consequences remain limited. To address this gap, this study conducted 26 in-depth interviews with individuals who have used AI to develop their communication strategies. Findings suggest participants strongly preferred using AI to analyze challenging scenarios in close relationships, because it fostered self-reflection, eased emotions, prevented conflict escalation, offered multiple perspectives, and provided a safe, nonjudgmental space for self-disclosure. Participants also stated that AI-guided communication enhanced their empathy and communi

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.HC

Invisible Impact of Empathy on Behavioral Change: Isolating the Effect of Empathy in Long-term Physical Activity Coaching Chatbot Interactions

arXiv:2606.26641v1 Announce Type: new Abstract: Current dialogue systems, powered by large language models, often treat empathy as essential without assessing its true impact, especially in behavior change, where motivation and adherence often depend on subtle user-chatbot dynamics. We examine this assumption by building three WhatsApp physical-activity (PA) coaching chatbots that differ only in empathy level and evaluating them in a six-week within-subject study (N = 13). Participants struggled to distinguish between the empathy conditions, and the non-empathetic version was often rated as more engaging and useful. However, higher-empathy variants were still associated with a larger overall average increase in step counts and faster improvement in intention to follow advice. These results suggest empathy's role is nuanced: it may be hard for lay users to identify explicitly, but it can still shape motivation and trust that support sustained change. We interpret this pattern through th

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.HC

Reviving Reflection-in-Action: Instilling Designerly Thinking in AI-Supported Ideation through Multimodal Prompting

arXiv:2606.26626v1 Announce Type: new Abstract: Current AI-powered creativity support tools (AI-CSTs) primarily use text prompting to generate solution-oriented outputs. However, the potential value of multimodal prompting in designer-AI interaction, specifically the introduction of productive friction to encourage iteration and reflection, has not been fully explored. To address this, we developed SketchifAI, a prototype AI-CST, and evaluated it with design students. In a mixed-methods, within-participants study, we examined how different input modalities (text, sketch, and sketch-plus-tags) affected design students' perceived ability to express their intent, their perception of creativity support, and their divergent thinking performance. Our preliminary findings suggest that the sketch modality tended to enhance fluency, with inconclusive evidence for differences in variety, originality, or quality compared to text modality. Yet, paradoxically, participants showed a strong preferenc

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.HC

HiLSVA: Design and Evaluation of a Human-in-the-Loop Agentic System for Scientific Visualization

arXiv:2606.26614v1 Announce Type: new Abstract: Large language model (LLM) agents enable natural language interaction for scientific visualization (SciVis). Still, prior systems have essentially prioritized autonomy over human analytical control, thereby limiting transparency and human oversight. We present HiLSVA, a human-in-the-loop agentic system that supports mixed-initiative SciVis workflows. HiLSVA integrates a plan-first multi-agent architecture with explicit human oversight, stepwise provenance tracking, and learn-at-test-time adaptation from user feedback. The system supports fluid handoff between humans and agents through both natural language and direct manipulation of visualizations, while sandboxed execution ensures safe, reproducible workflows. In doing so, HiLSVA reframes agentic SciVis as a collaborative process that augments, rather than replaces, human analytical reasoning. We evaluate HiLSVA through representative case studies and a controlled user study with twelve

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.HC

Co-Designing Community-Centered AI Education for Adults: A Midwestern Case Study

arXiv:2606.26565v1 Announce Type: new Abstract: Artificial Intelligence (AI) education is increasingly important, yet adults outside higher education receive less attention. We report a case study of an AI education session with 54 adults (48 in-person and 6 virtual) in a predominantly African American community on the east side of a major Midwestern city. We ask: "What does AI education for adults outside formal educational systems look like in practice?" and "What does this AI education session reveal about AI literacy at the community level?" Through a co-designed session developed with community partners, we found that concerns about AI persisted but shifted to specific, locally grounded questions about AI design and deployment. We also discuss AI literacy from a community capacity perspective and argue for AI literacy frameworks grounded in local community contexts that strengthen community capacity.

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.HC

Budget-Aware Keyboardless Interaction

arXiv:2606.26508v1 Announce Type: new Abstract: Interacting with computers typically relies on traditional input devices such as keyboards, mice, and monitors, which can be cumbersome for users seeking greater mobility. Virtual keyboards have been explored to address these limitations, but they often involve complex setups or expensive equipment. This paper proposes a novel virtual keyboard system that leverages only a standard camera and a paper with a printed keyboard layout. Unlike previous methods requiring complex calibration or special lighting conditions, our approach can work in a standard environment using modern computer vision technologies. Combining modern segmentation and detection models with traditional image processing algorithms, we efficiently identify the keyboard region. Touch detection is performed using an algorithm analyzing the color of the user's fingernail. Experiments demonstrated promising results for our proposed solution of keyboard and keystroke detection for

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.HC

DanceDuo: Bridging Human Movement and AI Choreography

arXiv:2606.26507v1 Announce Type: new Abstract: In recent years, advancements in deep learning and generative models have revolutionized music-driven dance generation. This paper introduces a novel platform, namely DanceDuo, leveraging diffusion models to generate AI-choreographed dance sequences synchronized with a variety of music genres, to encourage dancing practice. The system allows users to interact with AI by selecting music tracks, humanoid models, and importing personal dance videos for comparison, fostering a rich and engaging user experience. DanceDuo not only offers dance generation but also integrates human pose estimation models to provide users with insightful comparisons of their own performances with AI-generated sequences. We conducted a comprehensive user study, revealing that users found the interface intuitive, with particular praise for the dance comparison feature. Our DanceDuo contributes significantly to the integration of AI in dance choreography, offering no

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.HC

TinyCNNDeep: Lightweight Attention-Based CNN for EEG Classification of Eye States and Sleep Deprivation

arXiv:2606.26506v1 Announce Type: new Abstract: Sleep deprivation impairs vigilance and cognitive function, yet jointly identifying the sleep condition (normal vs deprived) and the eye state (open vs closed) from electroencephalography (EEG) remains underexplored. We address this four-class problem with TinyCNNDeep, a lightweight convolutional neural network that combines residual learning with a Squeeze-and-Excitation (SE) attention module. We convert short multi-channel EEG segments from five physiologically relevant channels (Fp1, Fp2, O1, Oz, O2) into 224x224 grayscale images through per-channel Z-score normalization, min-max scaling, and center padding, enabling 2D convolutions to jointly model inter-channel and temporal structure. On a 35-subject dataset recorded under normal-sleep and sleep-deprivation sessions, TinyCNNDeep attains a subject-wise mean accuracy of 83.69%, outperforming the strongest baseline (Random Forest with combined time-frequency features, 47.66%) by 36.03 p

Source ↗
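The preprocessing named in the TinyCNNDeep abstract above (per-channel Z-score normalization, min-max scaling, center padding to a 224x224 grayscale image) can be sketched roughly as follows. This is an illustrative reading, not the authors' code: the function names, the choice to place channels as rows and time samples as columns, the global [0, 1] min-max range, and the zero padding value are all assumptions that the abstract does not specify.

```python
from statistics import mean, pstdev

SIZE = 224  # target image side length, taken from the abstract


def zscore(channel):
    """Z-score one channel; a constant channel maps to zeros (assumption)."""
    mu = mean(channel)
    sigma = pstdev(channel)
    if sigma == 0:
        return [0.0 for _ in channel]
    return [(x - mu) / sigma for x in channel]


def min_max(rows):
    """Scale all values into [0, 1] using the global min and max (assumption)."""
    flat = [x for row in rows for x in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in rows]
    return [[(x - lo) / (hi - lo) for x in row] for row in rows]


def center_pad(rows, size=SIZE, fill=0.0):
    """Place the (channels x samples) block in the middle of a size x size grid."""
    height, width = len(rows), len(rows[0])
    if height > size or width > size:
        raise ValueError("segment is larger than the target image")
    top = (size - height) // 2
    left = (size - width) // 2
    image = [[fill] * size for _ in range(size)]
    for r, row in enumerate(rows):
        image[top + r][left:left + width] = row
    return image


def segment_to_image(segment):
    """segment: list of channels, each a list of float samples of equal length."""
    normalized = [zscore(channel) for channel in segment]
    return center_pad(min_max(normalized))
```

A five-channel segment (for example Fp1, Fp2, O1, Oz, O2) with up to 224 samples per channel would pass through segment_to_image unchanged in shape apart from the padding; longer segments would need windowing or resizing, which this sketch does not attempt.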
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.HC

Assistive Visual Cues for Visual Neglect Patients

arXiv:2606.26407v1 Announce Type: new Abstract: Previous research on exogenous and endogenous cues has shown how they direct attention and improve interaction speed and error rate in applications. However, most studies focus on people with normal sight. People suffering from visual neglect have difficulties attending to parts of the visual field. One treatment method calls for the use of strong visual cues to remind patients of their neglected area and help guide their attention to it. Therefore, we examine the effects of endogenous and exogenous cues on visual neglect patients. Our results showed that visual neglect patients perform better with endogenous cues when targets are within their neglected area. In some cases, combining exogenous and endogenous cues improves performance further. However, the performance varies greatly between patients. Using one neglect patient as an example, we saw that the best endogenous cue had an average acquisition time of 3.5 seconds compared to 6.5 f

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.HC

Having Dog Ears "for Real": Effects of Active and Passive Haptics on Embodying Non-Human Body Parts in VR

arXiv:2606.26364v1 Announce Type: new Abstract: Embodying non-human body parts in VR is a prevalent practice among certain subcultures and is a personally important creative outlet to many individuals. However, the discrepant morphology between real and virtual bodies can decrease Sense of Embodiment (SoE). Haptic feedback can compensate by increasing SoE felt towards non-human body parts, but there is a literature gap in comparing the effects of different haptic modalities, and their combinations, on SoE. Through an online survey sent out to social VR communities (n = 63), we determined that animal ears are a commonly embodied and ecologically valid non-human body part to study. We then ran a 2x2 within-subjects user study (n = 28) with two independent variables: active haptics, delivered through vibrotactile gloves, and passive haptics, delivered through a physical headband, for when participants reach up to touch virtual dog ears appended to their avatar in VR. Our findings show tha

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.CY

Post-Training Recipe, More Than Model Family, Shapes Multi-Agent LLM Conversational Behavior

arXiv:2606.20632v2 Announce Type: replace-cross Abstract: Multi-LLM systems use multiple language models to deliberate, judge each other's outputs, or coordinate as agents. Their value depends on the models producing measurably different conversational behaviors when given the same input. Prior offline studies recommend drawing one model per family for behavioral diversity, because LLMs prefer outputs from their own family when rating one another in isolation. Whether the same family label predicts behavior in interactive multi-LLM systems, the setting that real deployed systems use, has not been tested. We study this with a 940,000-chain 11-checkpoint corpus and a 1.6M-chain same-base Llama factorial. On our validated headline metric, hedging, a reasoning-distilled Llama checkpoint shifts by 18% depending on which same-base partner it replies to, more than any cross-family hedging gap in the controlled subset. Qwen, closed-API, and runtime checks suggest the pattern is not isolated, w

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.CY

Organizing in the Digital Age: Understanding Community, Challenges, and Consequences in Digitally-facilitated Labor Organizing

arXiv:2606.20375v2 Announce Type: replace-cross Abstract: The contemporary American labor force is highly dispersed, necessitating the use of digital communication tools to bridge spatial and temporal gaps in union organizing. This study provides an in-depth analysis of how workers within various labor unions utilize digital, text-based communication platforms -- including Discord, WhatsApp, and Slack -- for labor organizing. Through 17 qualitative interviews, we examine the challenges and opportunities presented by digital organizing, identifying both technical and social obstacles. Our findings reveal that although digital tools are integral to contemporary labor successes, they also introduce new complexities, such as navigating technical security, managing information overload, and building trust and consensus. Based on these insights, we draw connections to broader understandings of digital organizing and the role of digital tools in unions.

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.CY

Delegation and Verification Under AI

arXiv:2603.02961v2 Announce Type: replace-cross Abstract: As AI systems enter institutional workflows, workers must decide whether to delegate task execution to AI and how much effort to invest in verifying AI outputs, while institutions evaluate workers using outcome-based standards that may misalign with workers' private costs. We model delegation and verification as the solution to a rational worker's optimization problem, and define worker quality by evaluating an institution-centered utility (distinct from the worker's objective) at the resulting optimal action. We formally characterize optimal worker workflows and show that AI induces *phase transitions*, where arbitrarily small differences in verification ability lead to sharply different behaviors. As a result, AI can amplify workers with strong verification reliability while degrading institutional worker quality for others who rationally over-delegate and reduce oversight, even when baseline task success improves and no behav

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.CY

Somatic in the East, Psychological in the West?: Investigating Clinically-Grounded Cross-Cultural Depression Symptom Expression in LLMs

arXiv:2508.03247v2 Announce Type: replace-cross Abstract: Prior clinical psychology research shows that Western individuals with depression tend to report psychological symptoms, while Eastern individuals report somatic ones. We test whether Large Language Models (LLMs), which are increasingly used in mental health, reproduce these cultural patterns by prompting them with Western or Eastern personas. Results show that LLMs largely fail to replicate the patterns when prompted in English, though prompting in major Eastern languages (i.e., Chinese, Japanese, and Hindi) improves alignment in several configurations. Our analysis pinpoints two key reasons for this failure: the models' low sensitivity to cultural personas and a strong, culturally invariant symptom hierarchy that overrides cultural cues. These findings reveal that while prompt language is important, current general-purpose LLMs lack the robust, culture-aware capabilities essential for safe and effective mental health applicati

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.CY

Wearable Device-Based Real-Time Monitoring of Physiological Signals: Evaluating Cognitive Load Across Different Tasks

arXiv:2406.07147v3 Announce Type: replace-cross Abstract: This study employs cutting-edge wearable monitoring technology to conduct high-precision, high-temporal-resolution (1-second interval) cognitive load assessment on electroencephalogram (EEG) data from the FP1 channel and heart rate variability (HRV) data of secondary vocational students. By jointly analyzing these two critical physiological indicators, the research delves into their application value in assessing cognitive load among secondary vocational students and their utility across various tasks. The study designed two experiments to validate the efficacy of the proposed approach: Initially, a random forest classification model, developed using the N-BACK task, enabled the precise decoding of physiological signal characteristics in secondary vocational students under different levels of cognitive load, achieving a classification accuracy of 97%. Subsequently, this classification model was applied in a cross-task experiment

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.CY

Trust in Generative AI for Health Information Consumption and the Effect of Learned Dependency: An Experimental Investigation

arXiv:2606.20605v2 Announce Type: replace Abstract: Background: Generative artificial intelligence (GenAI) is increasingly used for health information, yet its influence on users' trust calibration remains unclear. Objective: This study examines whether learned dependency on GenAI influences trust in AI-generated health information and whether text highlighting reduces overreliance on incorrect outputs. Methods: Two randomized controlled experiments were conducted with 338 college students and 563 Amazon Mechanical Turk participants. Both experiments used a 2 by 2 between-subjects design manipulating information accuracy (correct versus incorrect) and text highlighting (highlight versus no highlight). Trust and learned dependency were measured using validated scales, and linear regression models tested main and interaction effects. Results: In both experiments, information accuracy significantly increased trust (p < 0.001), while learned dependency was positively associated with trust

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.CY

Position: Align AI to Our Aspirations, Not Our Flaws

arXiv:2606.13755v2 Announce Type: replace Abstract: We argue that aligning AI to aggregated human preferences is the wrong target. With current technology, one can train AIs to share the values of a Silicon Valley techno-optimist, a degrowth environmentalist, a national-conservative culture warrior, a single-party state cadre, or a devout religious traditionalist. We should not. Human values produce societies that thrive or fail on the merits of those values - from failed states and extreme inequality to declining happiness, political polarization, and government dysfunction in the world's wealthiest democracies. The pluralistic-alignment program correctly diagnoses that there is no single "humanity" to align with, but is dangerous if taken as the main directive. We argue that AI should be trained to a non-negotiable floor of objective alignment goals - competence, bounded by the constraints of factual accuracy, honesty, and lawfulness and that pluralism belongs at the surface (languag

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.CY

Power Couple? AI Growth and Renewable Energy Investment

arXiv:2603.26678v2 Announce Type: replace Abstract: AI and renewable energy are increasingly framed as a "power couple," on the premise that surging AI demand will accelerate clean-energy investment, yet concerns persist that AI will entrench fossil-fuel carbon lock-in. We reconcile these views by modeling the equilibrium between AI growth and renewable investment. In a parsimonious game, a policymaker designs policies that guide investment in renewable capacity for AI, while an AI developer chooses its model's capability. The equilibrium depends on scaling regimes and market incentives. When the market payoff to capability is supermodular and performance gains are near-linear in compute (so the market rewards capability at least as fast as scaling raises its energy cost), developers push toward frontier scale even when the marginal megawatt-hour is fossil-based. In this regime, renewable expansion mainly relaxes scaling constraints rather than displacing fossil generation; clean capac

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.CY

The Journal of Prompt-Engineered (Moral) Philosophy Or: Why AI-Assisted Ethics Research Requires Process Transparency

arXiv:2511.08639v5 Announce Type: replace Abstract: Existing AI disclosure mandates in scholarship require that AI assistance be reported but leave transparency philosophically unspecified: they fix the duty without explaining what the duty serves. We argue that ethical inquiry is essentially contested at two independent levels -- about what it is, and about what it demands of the inquirer -- defeating output-only evaluation and welfare-economic dismissal of the transparency question, and, by extension, reproducibility framings imported from the empirical sciences. The transparency duty is grounded instead in agent-integrity: the legibility, before a community of inquiry, of the identity-constituting commitments that the author's mode of philosophising expresses. Because the standards for evaluating such work are not communally settled, the achievable goal for transparency is not evaluation against agreed criteria but tracking -- accumulating the evidentiary record that lets each tradi

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.CY

Advanced Applications of Generative AI in Actuarial Science: Case Studies Beyond ChatGPT

arXiv:2506.18942v3 Announce Type: replace Abstract: This article explores the potential of generative AI (GenAI) to support actuarial practice through four implemented case studies. It situates these case studies within the broader evolution of artificial intelligence in actuarial science, from early neural networks and machine learning to modern transformer-based GenAI systems. The first case study illustrates how large language models (LLMs) can improve claim cost prediction by extracting informative features from unstructured text for use in the underlying supervised learning task. The second case study demonstrates the automation of market comparisons using Retrieval-Augmented Generation to identify, extract, and structure relevant information from insurers' annual reports. The third case study highlights the capabilities of fine-tuned vision-enabled LLMs in classifying car damage types and extracting contextual information from images. The fourth case study presents a multi-agent

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.CY

Application of Large Language Models in Automated Question Generation: A Case Study on ChatGLM's Structured Questions for National Teacher Certification Exams

arXiv:2408.09982v3 Announce Type: replace Abstract: This study delves into the application potential of the large language models (LLMs) ChatGLM in the automatic generation of structured questions for National Teacher Certification Exams (NTCE). Through meticulously designed prompt engineering, we guided ChatGLM to generate a series of simulated questions and conducted a comprehensive comparison with questions recollected from past examinees. To ensure the objectivity and professionalism of the evaluation, we invited experts in the field of education to assess these questions and their scoring criteria. The research results indicate that the questions generated by ChatGLM exhibit a high level of rationality, scientificity, and practicality similar to those of the real exam questions across most evaluation criteria, demonstrating the model's accuracy and reliability in question generation. Nevertheless, the study also reveals limitations in the model's consideration of various rating cr

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.CY

Fortress and Gatekeeper: Theorizing Transitive Trust in Third-Party Cybersecurity Risk Governance

arXiv:2606.26866v1 Announce Type: cross Abstract: Third-party vendors, such as analytics platforms, cloud services, identity providers, and software suppliers, are increasingly embedded in digital service delivery. While these arrangements enable scale and specialization, they also move customer data and security-relevant practices into environments that customers rarely see, select, or evaluate. This paper examines this problem through a document analysis of the November 2025 OpenAI-Mixpanel security incident. The incident serves as an illustrative case for showing how a security event in a vendor environment can become a governance and accountability problem for the focal organization that maintains the customer relationship. Drawing on organizational trust research and agency theory, the paper argues that third-party cybersecurity risk is both a trust relationship and a delegation problem. Customers trust the visible service provider, while the provider relies on vendors whose secur

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.CY

The Fungible Reserve Standard: A Deterministic Framework for Encoding Carrying Costs in Asset-Backed Tokens

arXiv:2606.26704v1 Announce Type: cross Abstract: The tokenization of real-world assets (RWAs) has emerged as a transformative application of blockchain technology, with market projections estimating trillions of dollars in tokenized assets within the coming decade. However, a fundamental challenge remains unaddressed: physical assets such as precious metals, stored commodities, and warehoused goods incur structural negative carry -- custody, insurance, and audit costs that accumulate over time. While existing tokenization models have successfully established the market for digital gold and treasuries, they typically manage operational costs at the issuer level. The FRS introduces a framework to bring these economics directly on-chain, avoiding mechanisms such as token rebasing that compromise fungibility and composability with decentralized finance (DeFi) protocols. This paper proposes the Fungible Reserve Standard (FRS), a deterministic token design framework that encodes carrying co

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.CY

An exploratory behavioral and electroencephalographic study of artificial intelligence-assisted learning modes in high school students

arXiv:2606.26579v1 Announce Type: cross Abstract: As artificial intelligence (AI) is rapidly integrating into education, concerns have emerged regarding its potential implications on cognitive engagement and problem-solving behavior. However, existing research largely treats AI exposure as a binary condition (AI vs. no-AI), with limited differentiation between interaction modalities and post-exposure effects. This study investigates whether distinct AI interaction modes (Tutor, Collaborator, Solver) influence frontal EEG spectral activity. Electroencephalography (EEG) data and quantified behavioral metrics were recorded from 48 study participants (24 males, 24 females; ages 14-18) across two counterbalanced quizzes in a within-subject design. Statistical analyses included Friedman tests, repeated-measures ANOVA, paired t-tests, and effect size calculations. Behavioral changes were mathematically analyzed in an observation matrix of three characteristics -Initiation, Processing, and Str

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.CY

Can Large Language Models Reliably Code Qualitative Humanitarian Data? A Benchmark Study Against Human Expert Adjudication

arXiv:2606.26541v1 Announce Type: cross Abstract: Data from affected populations are crucial for informing humanitarian response, but their value depends on timely and consistent interpretation of nuanced accounts of need. Humanitarian organizations often lack the staff, time, and specialist expertise required to analyze this information at scale. Large language models (LLMs) may expand this capacity, but their reliability for coding qualitative humanitarian data has not been directly established. This benchmark study compares 46 LLMs to a human Gold Standard using 150 high-fidelity synthetic humanitarian transcripts. Evaluation combined inter-rater reliability testing with Krippendorff's alpha, discrepancy analysis distinguishing correct, near-correct, and incorrect codes, and qualitative assessment across humanitarian-specific criteria including discrimination, complex needs hierarchies, and non-standard communication styles. The authors find that multiple LLMs can perform deductive

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.CY

Scoring Is Not Enough: Addressing Gaps in Utility-fairness Trade-offs for Ranking

arXiv:2606.26369v1 Announce Type: cross Abstract: Scoring functions are used to represent the relevance of individual documents. In modern information retrieval or recommendation systems, they are often learned from data and play a pivotal role in ranking sets of documents or items in a way that maximizes utility to a query or user. With the recent interest in algorithmic fairness, the success of scoring has naturally led to methods that learn scores that simultaneously trade off fairness and utility. In this work, we show that in stark contrast with utility-centric objectives, scoring is sub-optimal in achieving all utility-fairness trade-offs. We establish this with a series of counter-examples with a generic fairness formulation. We show that the issue persists whether we have a deterministic scoring function or a randomized one, or whether we measure fairness at the scope of a single query or across multiple queries. On the positive side, we empirically demonstrate that semi-greedy

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.CY

Narration-of-Thought: Inference-Time Scaffolding for Defeasible Ethical Reasoning in Large Language Models

arXiv:2606.26366v1 Announce Type: cross Abstract: Standard chain-of-thought on moral dilemmas exhibits two failure modes: stakeholder collapse (the trace names at most one party with a stake in the outcome) and uncertainty suppression (no explicit unknowns or hedges before committing to an action). We introduce narration-of-thought (NoT), a system prompt that structures chain-of-thought into five sections: protagonist, stakeholders, two-step consequences, uncertainty, then commitment. NoT adds no training, parameters, or fine-tuning. On 100 DailyDilemmas scenarios across four generators from three vendors, NoT cuts stakeholder collapse from up to 31% to under 1% and uncertainty suppression from up to 72% to 1-24% on every model. A matched-budget verbose-CoT control rules out token spend as the active ingredient; NoT retains Cliff's delta advantages of +0.79 to +0.90 on stakeholder count and +0.65 to +0.93 on uncertainty score for three of four generators, and a section ablation attribu

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.CY

RoboTales: ROBOTic Anthropomorphic LEarning Systems

arXiv:2606.26213v1 Announce Type: cross Abstract: RoboTales is a low-cost robotic storytelling system that animates narratives using expressive sock puppetry. Implemented autonomously on a Baxter robot as a test case, RoboTales synchronizes narration, gestures, and mouth movements to perform character-driven stories. In a pilot study, puppet-based storytelling outperformed a gesture-only mode, producing higher HRIES ratings and improved story recall, suggesting that embodied puppetry enhances engagement and narrative comprehension. Designed to be modular and platform-agnostic, RoboTales can be adapted to other manipulators and offers a screen-free alternative to passive media, supporting future deployment in child-centered learning environments.

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.CY

Helpfulness Hurts: Domain-Dependent Degradation of Mid-Trained Compassion Values Under Post-Training

arXiv:2606.26102v1 Announce Type: cross Abstract: Standard post-training pipelines apply supervised fine-tuning (SFT) and reinforcement learning (RL) to make language models helpful, but these processes may inadvertently degrade values instilled during pre-training. We investigate whether the domain of post-training data differentially affects the retention of animal compassion values in a Llama 3.1 8B model mid-trained on compassion-oriented synthetic data, using both SFT (helpfulness via Dolly-15k vs. coding via Magicoder-110K) and GRPO (helpfulness via RLHFlow vs. coding via Magicoder), evaluated on the Animal Harm Benchmark (AHB 2.2) and MORU benchmark (Moral Reasoning Under Uncertainty). Helpfulness training significantly degrades animal compassion relative to coding training on AHB (SFT: 35.7% vs. 65.2%; GRPO: 18.7% vs. 32.0%), replicating across two independent helpfulness datasets and two training paradigms. On English MORU items, helpfulness training degrades general moral rea

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.CY

From Celebrities to Anyone: Characterizing AI Nudification Content, Technology, and Community Dynamics on 4chan

arXiv:2606.27234v1 Announce Type: new Abstract: AI nudification uses generative models to create synthetic non-consensual sexually explicit imagery (SNEACI) of real individuals. Prior work has examined dedicated nudification platforms and model repositories, finding that most targets are female celebrities. However, the anonymous content community, where SNEACI is actively requested, generated, and exchanged, remains unexplored. In this work, we present a large-scale study of AI nudification in the wild, identifying 24,105 SNEACI items. We find a significant shift in target demographics: non-celebrity individuals now account for 55.8% of targets, compared to only 4.7% in prior studies, indicating that AI nudification has expanded from targeting public figures to increasingly harming individuals within users' own social circles. Meanwhile, open-source models dominate production, with Stable Diffusion family generating 42.7% of images and Wan generating 66.5% of videos, all driven by

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.CY

Human-LLM Collaboration Is Transforming Complexity Metrics in Scientific Texts

arXiv:2606.27052v1 Announce Type: new Abstract: While human language has long been studied as a complex system, Large Language Models (LLMs) are rapidly becoming contributors to its dynamics. Because LLMs are trained on human language use, their effects on the broader human-AI linguistic ecosystem are likely subtle at first. As their use becomes more widespread, however, LLMs may alter emergent properties of language, particularly as models increasingly train on mixed human-LLM textual data. Here, we draw on complexity science to look for subtle LLM effects in millions of arXiv abstracts from 2010 to 2025. The year 2023, when LLMs rapidly became widely used, serves as a landmark in a natural experiment. While we find a sharp increase in a composite LLM-associated style index after early 2023, we observe only subtle changes in the exponents of Zipf's law and Heaps' law. More compelling, however, are two subtle changes in complexity metrics that emerge from 2023 onward. First, turnover a

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.CY

Pingquanqi (Equalizer): A Cross-Domain Sociotechnical Framework for Human-Agent Interaction Governance

arXiv:2606.26573v1 Announce Type: new Abstract: LLM agents are transitioning from experimental tools to permanent infrastructure -- a computational layer as enduring as the electrical grid. Like any infrastructure, they carry a cost chain from physical capital through enterprise investment to user consumption, ending at the user's most irreplaceable resource: lifetime. When unoptimized, this chain leaks, consuming user lifetime without adequate compensation. This paper proposes Pingquanqi (Equalizer), a cross-domain sociotechnical framework for Human-Agent Interaction Governance (HAIGF). Its product form is an Agent framework-level embedded design specification, analogous to WCAG for web accessibility, whose goal is not to be purchased but adopted as a standard. Pingquanqi consists of four integrated components deployable as native middleware: (1) a user-state discrimination model enabling proactive knowledge leveling, (2) a Bayesian progressive stop-loss rule capping per-session inter

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.CY

Do more heads imply better performance? An empirical study of team thought leaders' impact on scientific team performance

arXiv:2606.26483v1 Announce Type: new Abstract: Thought leadership plays a crucial role in boosting team performance; thus, teams with more thought leaders may perform better. However, the impact of the number of thought leaders on team performance in a scientific context remains understudied. In this study, we consider the authors of a publication as a scientific team and define authors responsible for conceptual tasks, such as conceived and designed the experiments in the PLOS contribution statement classification system, as thought leaders. Leveraging more than 140,000 papers from PLOS journals, we examine the relationship between the number of thought leaders and two aspects of team performance, namely team impact and team disruptiveness, from both correlational and causal perspectives. The results show that (1) an inverted U-shaped relationship exists between the number of thought leaders and team impact, and (2) teams with more thought leaders tend to produce less disruptive idea

Source ↗
technology Fri, 26 Jun 2026 00:00:00 -0400
arXiv cs.CY

The Tilted Playing Field for Women in Science

arXiv:2606.26469v1 Announce Type: new Abstract: Institutional prestige shapes access to resources, visibility, and collaboration opportunities in science. Yet whether prestige benefits researchers equally, and how it relates to differences in scientific productivity and collaboration, remains unclear. Here, we quantify prestige advantage as the relative likelihood that researchers at higher-ranked institutions have more collaborators and produce more high-impact papers compared to their lower-ranked peers. Analyzing nearly 5 million papers by 6.5 million authors across more than 65,000 institutions, we present a distributional, tail-sensitive framework to compare prestige advantage across groups. We find that the association between prestige and scientific achievement differs systematically by gender. While both men and women benefit from prestige, the returns are not gender-neutral: women experience comparable advantages only at the most elite institutions, whereas men retain persiste

Source ↗
Showing 601–650 of 681 signals