Named after the Greek god of messengers, Hermes watches the education landscape: spotting new opportunities, pressure-testing the ventures we're building, and tracing every read back to the real-world signals behind it.
The evidence library: the raw signals the pipeline is watching across the education ecosystem. Every idea is built from these.
arXiv:2606.26508v1 Announce Type: new Abstract: Interacting with computers typically relies on traditional input devices such as keyboards, mice, and monitors, which can be cumbersome for users seeking greater mobility. Virtual keyboards have been explored to address these limitations, but they often involve complex setups or expensive equipment. This paper proposes a novel virtual keyboard system that leverages only a standard camera and a paper with a printed keyboard layout. Unlike previous methods requiring complex calibration or special lighting conditions, our approach works in standard environments using modern computer vision technologies. Combining modern segmentation and detection models with traditional image processing algorithms, we efficiently identify the keyboard region. Touch detection is performed using an algorithm analyzing the color of the user's fingernail. Experiments demonstrated promising results for our proposed solution of keyboard and keystroke detection for
arXiv:2606.26507v1 Announce Type: new Abstract: In recent years, advancements in deep learning and generative models have revolutionized music-driven dance generation. This paper introduces a novel platform, namely DanceDuo, leveraging diffusion models to generate AI-choreographed dance sequences synchronized with a variety of music genres, to encourage dancing practice. The system allows users to interact with AI by selecting music tracks, humanoid models, and importing personal dance videos for comparison, fostering a rich and engaging user experience. DanceDuo not only offers dance generation but also integrates human pose estimation models to provide users with insightful comparisons of their own performances with AI-generated sequences. We conducted a comprehensive user study, revealing that users found the interface intuitive, with particular praise for the dance comparison feature. Our DanceDuo contributes significantly to the integration of AI in dance choreography, offering no
arXiv:2606.26506v1 Announce Type: new Abstract: Sleep deprivation impairs vigilance and cognitive function, yet jointly identifying the sleep condition (normal vs deprived) and the eye state (open vs closed) from electroencephalography (EEG) remains underexplored. We address this four-class problem with TinyCNNDeep, a lightweight convolutional neural network that combines residual learning with a Squeeze-and-Excitation (SE) attention module. We convert short multi-channel EEG segments from five physiologically relevant channels (Fp1, Fp2, O1, Oz, O2) into 224x224 grayscale images through per-channel Z-score normalization, min-max scaling, and center padding, enabling 2D convolutions to jointly model inter-channel and temporal structure. On a 35-subject dataset recorded under normal-sleep and sleep-deprivation sessions, TinyCNNDeep attains a subject-wise mean accuracy of 83.69%, outperforming the strongest baseline (Random Forest with combined time-frequency features, 47.66%) by 36.03 p
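The preprocessing steps named in the abstract above (per-channel Z-score normalization, min-max scaling, and center padding to 224x224) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' pipeline: the function names, the choice to min-max scale over the whole segment, the [0, 1] output range, and the zero padding value are all assumptions.

```python
from statistics import mean, pstdev


def zscore(channel):
    """Z-score one channel; a constant channel maps to all zeros."""
    mu = mean(channel)
    sigma = pstdev(channel)
    if sigma == 0:
        return [0.0 for _ in channel]
    return [(x - mu) / sigma for x in channel]


def minmax(rows):
    """Scale all values jointly into [0, 1] (assumed range)."""
    flat = [x for row in rows for x in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in rows]
    return [[(x - lo) / (hi - lo) for x in row] for row in rows]


def center_pad(rows, size=224, fill=0.0):
    """Place a channels-by-samples block in the middle of a size x size canvas."""
    height, width = len(rows), len(rows[0])
    if height > size or width > size:
        raise ValueError("segment does not fit in the target image")
    top = (size - height) // 2
    left = (size - width) // 2
    canvas = [[fill] * size for _ in range(size)]
    for r, row in enumerate(rows):
        canvas[top + r][left:left + width] = row
    return canvas


def segment_to_image(segment, size=224):
    """Hypothetical helper: channels x samples EEG segment -> grayscale image."""
    normalized = [zscore(channel) for channel in segment]
    return center_pad(minmax(normalized), size)
```

In this sketch a 5-channel segment occupies five rows in the middle of the image, with the remaining pixels left at the padding value.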
arXiv:2606.26407v1 Announce Type: new Abstract: Previous research on exogenous and endogenous cues has shown how they direct attention and improve interaction speed and error rate in applications. However, most studies focus on people with normal sight. People suffering from visual neglect have difficulties attending to parts of the visual field. One treatment method calls for the use of strong visual cues to remind patients of their neglected area and help guide their attention to it. Therefore, we examine the effects of endogenous and exogenous cues on visual neglect patients. Our results showed that visual neglect patients perform better with endogenous cues when targets are within their neglected area. In some cases, combining exogenous and endogenous cues improves performance further. However, the performance varies greatly between patients. Using one neglect patient as an example, we saw that the best endogenous cue had an average acquisition time of 3.5 seconds compared to 6.5 f
arXiv:2606.26364v1 Announce Type: new Abstract: Embodying non-human body parts in VR is a prevalent practice among certain subcultures and is a personally important creative outlet to many individuals. However, the discrepant morphology between real and virtual bodies can decrease Sense of Embodiment (SoE). Haptic feedback can compensate by increasing SoE felt towards non-human body parts, but there is a literature gap in comparing the effects of different haptic modalities, and their combinations, on SoE. Through an online survey sent out to social VR communities (n = 63), we determined that animal ears are a commonly embodied and ecologically valid non-human body part to study. We then ran a 2x2 within-subjects user study (n = 28) with two independent variables: active haptics, delivered through vibrotactile gloves, and passive haptics, delivered through a physical headband, for when participants reach up to touch virtual dog ears appended to their avatar in VR. Our findings show tha
arXiv:2606.20632v2 Announce Type: replace-cross Abstract: Multi-LLM systems use multiple language models to deliberate, judge each other's outputs, or coordinate as agents. Their value depends on the models producing measurably different conversational behaviors when given the same input. Prior offline studies recommend drawing one model per family for behavioral diversity, because LLMs prefer outputs from their own family when rating one another in isolation. Whether the same family label predicts behavior in interactive multi-LLM systems, the setting that real deployed systems use, has not been tested. We study this with a 940,000-chain 11-checkpoint corpus and a 1.6M-chain same-base Llama factorial. On our validated headline metric, hedging, a reasoning-distilled Llama checkpoint shifts by 18% depending on which same-base partner it replies to, more than any cross-family hedging gap in the controlled subset. Qwen, closed-API, and runtime checks suggest the pattern is not isolated, w
arXiv:2606.20375v2 Announce Type: replace-cross Abstract: The contemporary American labor force is highly dispersed, necessitating the use of digital communication tools to bridge spatial and temporal gaps in union organizing. This study provides an in-depth analysis of how workers within various labor unions utilize digital, text-based communication platforms -- including Discord, WhatsApp, and Slack -- for labor organizing. Through 17 qualitative interviews, we examine the challenges and opportunities presented by digital organizing, identifying both technical and social obstacles. Our findings reveal that although digital tools are integral to contemporary labor successes, they also introduce new complexities, such as navigating technical security, managing information overload, and building trust and consensus. Based on these insights, we draw connections to broader understandings of digital organizing and the role of digital tools in unions.
arXiv:2603.02961v2 Announce Type: replace-cross Abstract: As AI systems enter institutional workflows, workers must decide whether to delegate task execution to AI and how much effort to invest in verifying AI outputs, while institutions evaluate workers using outcome-based standards that may misalign with workers' private costs. We model delegation and verification as the solution to a rational worker's optimization problem, and define worker quality by evaluating an institution-centered utility (distinct from the worker's objective) at the resulting optimal action. We formally characterize optimal worker workflows and show that AI induces *phase transitions*, where arbitrarily small differences in verification ability lead to sharply different behaviors. As a result, AI can amplify workers with strong verification reliability while degrading institutional worker quality for others who rationally over-delegate and reduce oversight, even when baseline task success improves and no behav
arXiv:2508.03247v2 Announce Type: replace-cross Abstract: Prior clinical psychology research shows that Western individuals with depression tend to report psychological symptoms, while Eastern individuals report somatic ones. We test whether Large Language Models (LLMs), which are increasingly used in mental health, reproduce these cultural patterns by prompting them with Western or Eastern personas. Results show that LLMs largely fail to replicate the patterns when prompted in English, though prompting in major Eastern languages (i.e., Chinese, Japanese, and Hindi) improves alignment in several configurations. Our analysis pinpoints two key reasons for this failure: the models' low sensitivity to cultural personas and a strong, culturally invariant symptom hierarchy that overrides cultural cues. These findings reveal that while prompt language is important, current general-purpose LLMs lack the robust, culture-aware capabilities essential for safe and effective mental health applicati
arXiv:2406.07147v3 Announce Type: replace-cross Abstract: This study employs cutting-edge wearable monitoring technology to conduct high-precision, high-temporal-resolution (1-second interval) cognitive load assessment on electroencephalogram (EEG) data from the FP1 channel and heart rate variability (HRV) data of secondary vocational students. By jointly analyzing these two critical physiological indicators, the research delves into their application value in assessing cognitive load among secondary vocational students and their utility across various tasks. The study designed two experiments to validate the efficacy of the proposed approach: Initially, a random forest classification model, developed using the N-BACK task, enabled the precise decoding of physiological signal characteristics in secondary vocational students under different levels of cognitive load, achieving a classification accuracy of 97%. Subsequently, this classification model was applied in a cross-task experiment
arXiv:2606.20605v2 Announce Type: replace Abstract: Background: Generative artificial intelligence (GenAI) is increasingly used for health information, yet its influence on users' trust calibration remains unclear. Objective: This study examines whether learned dependency on GenAI influences trust in AI-generated health information and whether text highlighting reduces overreliance on incorrect outputs. Methods: Two randomized controlled experiments were conducted with 338 college students and 563 Amazon Mechanical Turk participants. Both experiments used a 2 by 2 between-subjects design manipulating information accuracy (correct versus incorrect) and text highlighting (highlight versus no highlight). Trust and learned dependency were measured using validated scales, and linear regression models tested main and interaction effects. Results: In both experiments, information accuracy significantly increased trust (p < 0.001), while learned dependency was positively associated with trust
arXiv:2606.13755v2 Announce Type: replace Abstract: We argue that aligning AI to aggregated human preferences is the wrong target. With current technology, one can train AIs to share the values of a Silicon Valley techno-optimist, a degrowth environmentalist, a national-conservative culture warrior, a single-party state cadre, or a devout religious traditionalist. We should not. Human values produce societies that thrive or fail on the merits of those values - from failed states and extreme inequality to declining happiness, political polarization, and government dysfunction in the world's wealthiest democracies. The pluralistic-alignment program correctly diagnoses that there is no single "humanity" to align with, but is dangerous if taken as the main directive. We argue that AI should be trained to a non-negotiable floor of objective alignment goals - competence, bounded by the constraints of factual accuracy, honesty, and lawfulness - and that pluralism belongs at the surface (languag
arXiv:2603.26678v2 Announce Type: replace Abstract: AI and renewable energy are increasingly framed as a "power couple," on the premise that surging AI demand will accelerate clean-energy investment, yet concerns persist that AI will entrench fossil-fuel carbon lock-in. We reconcile these views by modeling the equilibrium between AI growth and renewable investment. In a parsimonious game, a policymaker designs policies that guide investment in renewable capacity for AI, while an AI developer chooses its model's capability. The equilibrium depends on scaling regimes and market incentives. When the market payoff to capability is supermodular and performance gains are near-linear in compute (so the market rewards capability at least as fast as scaling raises its energy cost), developers push toward frontier scale even when the marginal megawatt-hour is fossil-based. In this regime, renewable expansion mainly relaxes scaling constraints rather than displacing fossil generation; clean capac
arXiv:2511.08639v5 Announce Type: replace Abstract: Existing AI disclosure mandates in scholarship require that AI assistance be reported but leave transparency philosophically unspecified: they fix the duty without explaining what the duty serves. We argue that ethical inquiry is essentially contested at two independent levels -- about what it is, and about what it demands of the inquirer -- defeating output-only evaluation and welfare-economic dismissal of the transparency question, and, by extension, reproducibility framings imported from the empirical sciences. The transparency duty is grounded instead in agent-integrity: the legibility, before a community of inquiry, of the identity-constituting commitments that the author's mode of philosophising expresses. Because the standards for evaluating such work are not communally settled, the achievable goal for transparency is not evaluation against agreed criteria but tracking -- accumulating the evidentiary record that lets each tradi
arXiv:2506.18942v3 Announce Type: replace Abstract: This article explores the potential of generative AI (GenAI) to support actuarial practice through four implemented case studies. It situates these case studies within the broader evolution of artificial intelligence in actuarial science, from early neural networks and machine learning to modern transformer-based GenAI systems. The first case study illustrates how large language models (LLMs) can improve claim cost prediction by extracting informative features from unstructured text for use in the underlying supervised learning task. The second case study demonstrates the automation of market comparisons using Retrieval-Augmented Generation to identify, extract, and structure relevant information from insurers' annual reports. The third case study highlights the capabilities of fine-tuned vision-enabled LLMs in classifying car damage types and extracting contextual information from images. The fourth case study presents a multi-agent
arXiv:2408.09982v3 Announce Type: replace Abstract: This study delves into the application potential of the large language model (LLM) ChatGLM in the automatic generation of structured questions for National Teacher Certification Exams (NTCE). Through meticulously designed prompt engineering, we guided ChatGLM to generate a series of simulated questions and conducted a comprehensive comparison with questions recollected from past examinees. To ensure the objectivity and professionalism of the evaluation, we invited experts in the field of education to assess these questions and their scoring criteria. The research results indicate that the questions generated by ChatGLM exhibit a high level of rationality, scientificity, and practicality similar to those of the real exam questions across most evaluation criteria, demonstrating the model's accuracy and reliability in question generation. Nevertheless, the study also reveals limitations in the model's consideration of various rating cr
arXiv:2606.26866v1 Announce Type: cross Abstract: Third-party vendors, such as analytics platforms, cloud services, identity providers, and software suppliers, are increasingly embedded in digital service delivery. While these arrangements enable scale and specialization, they also move customer data and security-relevant practices into environments that customers rarely see, select, or evaluate. This paper examines this problem through a document analysis of the November 2025 OpenAI-Mixpanel security incident. The incident serves as an illustrative case for showing how a security event in a vendor environment can become a governance and accountability problem for the focal organization that maintains the customer relationship. Drawing on organizational trust research and agency theory, the paper argues that third-party cybersecurity risk is both a trust relationship and a delegation problem. Customers trust the visible service provider, while the provider relies on vendors whose secur
arXiv:2606.26704v1 Announce Type: cross Abstract: The tokenization of real-world assets (RWAs) has emerged as a transformative application of blockchain technology, with market projections estimating trillions of dollars in tokenized assets within the coming decade. However, a fundamental challenge remains unaddressed: physical assets such as precious metals, stored commodities, and warehoused goods incur structural negative carry -- custody, insurance, and audit costs that accumulate over time. While existing tokenization models have successfully established the market for digital gold and treasuries, they typically manage operational costs at the issuer level. The FRS introduces a framework to bring these economics directly on-chain, avoiding mechanisms such as token rebasing that compromise fungibility and composability with decentralized finance (DeFi) protocols. This paper proposes the Fungible Reserve Standard (FRS), a deterministic token design framework that encodes carrying co
arXiv:2606.26579v1 Announce Type: cross Abstract: As artificial intelligence (AI) is rapidly integrating into education, concerns have emerged regarding its potential implications on cognitive engagement and problem-solving behavior. However, existing research largely treats AI exposure as a binary condition (AI vs. no-AI), with limited differentiation between interaction modalities and post-exposure effects. This study investigates whether distinct AI interaction modes (Tutor, Collaborator, Solver) influence frontal EEG spectral activity. Electroencephalography (EEG) data and quantified behavioral metrics were recorded from 48 study participants (24 males, 24 females; ages 14-18) across two counterbalanced quizzes in a within-subject design. Statistical analyses included Friedman tests, repeated-measures ANOVA, paired t-tests, and effect size calculations. Behavioral changes were mathematically analyzed in an observation matrix of three characteristics -Initiation, Processing, and Str
arXiv:2606.26541v1 Announce Type: cross Abstract: Data from affected populations are crucial for informing humanitarian response, but their value depends on timely and consistent interpretation of nuanced accounts of need. Humanitarian organizations often lack the staff, time, and specialist expertise required to analyze this information at scale. Large language models (LLMs) may expand this capacity, but their reliability for coding qualitative humanitarian data has not been directly established. This benchmark study compares 46 LLMs to a human Gold Standard using 150 high-fidelity synthetic humanitarian transcripts. Evaluation combined inter-rater reliability testing with Krippendorff's alpha, discrepancy analysis distinguishing correct, near-correct, and incorrect codes, and qualitative assessment across humanitarian-specific criteria including discrimination, complex needs hierarchies, and non-standard communication styles. The authors find that multiple LLMs can perform deductive
arXiv:2606.26369v1 Announce Type: cross Abstract: Scoring functions are used to represent the relevance of individual documents. In modern information retrieval or recommendation systems, they are often learned from data and play a pivotal role in ranking sets of documents or items in a way that maximizes utility to a query or user. With the recent interest in algorithmic fairness, the success of scoring has naturally led to methods that learn scores that simultaneously trade off fairness and utility. In this work, we show that in stark contrast with utility-centric objectives, scoring is sub-optimal in achieving all utility-fairness trade-offs. We establish this with a series of counter-examples with a generic fairness formulation. We show that the issue persists whether we have a deterministic scoring function or a randomized one, or whether we measure fairness at the scope of a single query or across multiple queries. On the positive side, we empirically demonstrate that semi-greedy
arXiv:2606.26366v1 Announce Type: cross Abstract: Standard chain-of-thought on moral dilemmas exhibits two failure modes: stakeholder collapse (the trace names at most one party with a stake in the outcome) and uncertainty suppression (no explicit unknowns or hedges before committing to an action). We introduce narration-of-thought (NoT), a system prompt that structures chain-of-thought into five sections: protagonist, stakeholders, two-step consequences, uncertainty, then commitment. NoT adds no training, parameters, or fine-tuning. On 100 DailyDilemmas scenarios across four generators from three vendors, NoT cuts stakeholder collapse from up to 31% to under 1% and uncertainty suppression from up to 72% to 1-24% on every model. A matched-budget verbose-CoT control rules out token spend as the active ingredient; NoT retains Cliff's delta advantages of +0.79 to +0.90 on stakeholder count and +0.65 to +0.93 on uncertainty score for three of four generators, and a section ablation attribu
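The abstract above reports effect sizes as Cliff's delta. For readers unfamiliar with the statistic, a minimal sketch of its standard definition follows: the share of cross-group pairs in which the first group's value is larger, minus the share in which it is smaller. The function name and sample scores are hypothetical and are not taken from the paper.

```python
def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all cross-group pairs."""
    if not xs or not ys:
        raise ValueError("both groups need at least one observation")
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


# Hypothetical per-trace scores for two prompting conditions.
structured_scores = [3, 4, 5]
baseline_scores = [1, 2, 3]
delta = cliffs_delta(structured_scores, baseline_scores)
```

The value ranges from -1 to +1, with 0 meaning neither group tends to score higher than the other; ties contribute to neither count.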
arXiv:2606.26213v1 Announce Type: cross Abstract: RoboTales is a low-cost robotic storytelling system that animates narratives using expressive sock puppetry. Implemented autonomously on a Baxter robot as a test case, RoboTales synchronizes narration, gestures, and mouth movements to perform character-driven stories. In a pilot study, puppet-based storytelling outperformed a gesture-only mode, producing higher HRIES ratings and improved story recall, suggesting that embodied puppetry enhances engagement and narrative comprehension. Designed to be modular and platform-agnostic, RoboTales can be adapted to other manipulators and offers a screen-free alternative to passive media, supporting future deployment in child-centered learning environments.
arXiv:2606.26102v1 Announce Type: cross Abstract: Standard post-training pipelines apply supervised fine-tuning (SFT) and reinforcement learning (RL) to make language models helpful, but these processes may inadvertently degrade values instilled during pre-training. We investigate whether the domain of post-training data differentially affects the retention of animal compassion values in a Llama 3.1 8B model mid-trained on compassion-oriented synthetic data, using both SFT (helpfulness via Dolly-15k vs. coding via Magicoder-110K) and GRPO (helpfulness via RLHFlow vs. coding via Magicoder), evaluated on the Animal Harm Benchmark (AHB 2.2) and MORU benchmark (Moral Reasoning Under Uncertainty). Helpfulness training significantly degrades animal compassion relative to coding training on AHB (SFT: 35.7% vs. 65.2%; GRPO: 18.7% vs. 32.0%), replicating across two independent helpfulness datasets and two training paradigms. On English MORU items, helpfulness training degrades general moral rea
arXiv:2606.27234v1 Announce Type: new Abstract: AI nudification uses generative models to create synthetic non-consensual sexually explicit imagery (SNEACI) of real individuals. Prior work has examined dedicated nudification platforms and model repositories, finding that most targets are female celebrities. However, the anonymous content community, where SNEACI is actively requested, generated, and exchanged, remains unexplored. In this work, we present a large-scale study of AI nudification in the wild, identifying 24,105 SNEACI items. We find a significant shift in target demographics: non-celebrity individuals now account for 55.8% of targets, compared to only 4.7% in prior studies, indicating that AI nudification has expanded from targeting public figures to increasingly harming individuals within users' own social circles. Meanwhile, open-source models dominate production, with Stable Diffusion family generating 42.7% of images and Wan generating 66.5% of videos, all driven by
arXiv:2606.27052v1 Announce Type: new Abstract: While human language has long been studied as a complex system, Large Language Models (LLMs) are rapidly becoming contributors to its dynamics. Because LLMs are trained on human language use, their effects on the broader human-AI linguistic ecosystem are likely subtle at first. As their use becomes more widespread, however, LLMs may alter emergent properties of language, particularly as models increasingly train on mixed human-LLM textual data. Here, we draw on complexity science to look for subtle LLM effects in millions of arXiv abstracts from 2010 to 2025. The year 2023, when LLMs rapidly became widely used, serves as a landmark in a natural experiment. While we find a sharp increase in a composite LLM-associated style index after early 2023, we observe only subtle changes in the exponents of Zipf's law and Heaps' law. More compelling, however, are two subtle changes in complexity metrics that emerge from 2023 onward. First, turnover a
arXiv:2606.26573v1 Announce Type: new Abstract: LLM agents are transitioning from experimental tools to permanent infrastructure -- a computational layer as enduring as the electrical grid. Like any infrastructure, they carry a cost chain from physical capital through enterprise investment to user consumption, ending at the user's most irreplaceable resource: lifetime. When unoptimized, this chain leaks, consuming user lifetime without adequate compensation. This paper proposes Pingquanqi (Equalizer), a cross-domain sociotechnical framework for Human-Agent Interaction Governance (HAIGF). Its product form is an Agent framework-level embedded design specification, analogous to WCAG for web accessibility, whose goal is not to be purchased but adopted as a standard. Pingquanqi consists of four integrated components deployable as native middleware: (1) a user-state discrimination model enabling proactive knowledge leveling, (2) a Bayesian progressive stop-loss rule capping per-session inter
arXiv:2606.26483v1 Announce Type: new Abstract: Thought leadership plays a crucial role in boosting team performance; thus, teams with more thought leaders may perform better. However, the impact of the number of thought leaders on team performance in a scientific context remains understudied. In this study, we consider the authors of a publication as a scientific team and define authors responsible for conceptual tasks, such as "conceived and designed the experiments" in the PLOS contribution statement classification system, as thought leaders. Leveraging more than 140,000 papers from PLOS journals, we examine the relationship between the number of thought leaders and two aspects of team performance, namely team impact and team disruptiveness, from both correlational and causal perspectives. The results show that (1) an inverted U-shaped relationship exists between the number of thought leaders and team impact, and (2) teams with more thought leaders tend to produce less disruptive idea
arXiv:2606.26469v1 Announce Type: new Abstract: Institutional prestige shapes access to resources, visibility, and collaboration opportunities in science. Yet whether prestige benefits researchers equally, and how it relates to differences in scientific productivity and collaboration, remains unclear. Here, we quantify prestige advantage as the relative likelihood that researchers at higher-ranked institutions have more collaborators and produce more high-impact papers compared to their lower-ranked peers. Analyzing nearly 5 million papers by 6.5 million authors across more than 65,000 institutions, we present a distributional, tail-sensitive framework to compare prestige advantage across groups. We find that the association between prestige and scientific achievement differs systematically by gender. While both men and women benefit from prestige, the returns are not gender-neutral: women experience comparable advantages only at the most elite institutions, whereas men retain persiste
arXiv:2606.26186v1 Announce Type: new Abstract: Motivated by the limited standardization of enterprise data asset quality evaluation and the unclear relationship between assessment outcomes and value realization, this study develops a three-dimensional framework comprising Data Asset Management Capability, Data Quality Standard Conformity, and Data Asset Benefit Realization Capability, based on grounded theory and LDA topic modeling. To examine the formation mechanisms of data asset quality, this study adopts a multi-method approach combining PLS-SEM, Necessary Condition Analysis (NCA), and fuzzy-set Qualitative Comparative Analysis (fsQCA), to capture net effects, capability thresholds, and configurational paths. The results show that significant positive relationships exist among the three dimensions, with Data Asset Management Capability exerting the strongest effect on Data Quality Standard Conformity and further promoting Data Asset Benefit Realization Capability, forming a chain
arXiv:2606.26181v1 Announce Type: new Abstract: With AI advancing fast, educators face a dilemma: allow the tool or ban it. Conflicting evidence that it both helps and hurts learning only deepens the confusion. The allow-or-ban framing is a false dichotomy; the relevant design question is placement. Used well, AI can scale feedback, examples, practice, and individualized support. Used poorly, it replaces the cognitive work that learning requires and leaves an illusion of learning: a confident sense of mastery that collapses on the unaided task. The strongest causal evidence shows the outcome flips on design: an unguarded AI helper left high-school students about 17% worse on an unaided exam than peers with no tool at all, while the same model rebuilt to withhold answers erased the harm, and a well-engineered tutor roughly doubled learning. We give educators one graspable frame for placing the tool. A new idea is learned through six moves, in order: Prime, Probe, Point, Attach, Strength
arXiv:2606.26118v1 Announce Type: new Abstract: We work towards measuring both AI adoption and the capability of AI to perform discrete labor tasks across various occupations. To measure adoption, we develop an open-source economic index that uses publicly available user-LLM chat data and O*NET tasks to replicate studies produced by frontier AI labs, finding that occupations in the finance, computer science, and arts sectors are those with the highest adoption rates. To measure capabilities, we build a system that generates benchmark scenarios grounded in O*NET occupations, tasks, and model-context-protocol (MCP) servers. We test Kimi-k2.5 with an OpenAI agents SDK harness on scenarios across 9 occupations that appear frequently in our index, finding that AI correctly executes high-level workflows but often errs in the granular details (such as specific tool calls used).
arXiv:2606.26117v1 Announce Type: new Abstract: This paper introduces the Governance Inversion Hypothesis (GIH) to explain a growing paradox in artificial intelligence (AI) governance: under conditions of increasing regulatory expansion and technological complexity, organisations may become more formally governed while simultaneously experiencing a decline in operational control over AI systems. Existing AI governance frameworks generally assume that stronger regulation improves accountability, oversight, and organisational control. This paper challenges that assumption by arguing that governance formalisation itself may contribute to the erosion of control in AI-intensive environments. Drawing on institutional theory, organisational governance research, accountability scholarship, and emerging AI governance literature, the paper develops a conceptual framework explaining how regulatory expansion may weaken operational authority through four interconnected mechanisms: authority fragmen
arXiv:2606.26116v1 Announce Type: new Abstract: A brand whose customers use both ChatGPT and Claude for product recommendations faces a strategic choice: a single optimization playbook, or one per provider? Across 215 commercially-framed prompts in four measurement batches, the two providers disagree on which brands they recommend roughly two-thirds of the time (cross-provider recommendation Jaccard 0.35, below the 0.50-0.61 same-prompt rerun baseline). The picks diverge. But when neither provider recommends a brand, we classify the failure into one of three modes -- discoverability (the brand never reaches the model), compellingness (it reaches the model but isn't mentioned), or positioning (it's mentioned but not recommended) -- and on 7,763 such joint failures, both providers diagnose the same failure mode 95.1% of the time (clustered 95% CI [94.3%, 95.7%]). Agreement rises monotonically with falling brand prominence, from 81% [78.2%, 84.0%] on category leaders to 99.6% [99.3%, 99.9
arXiv:2606.26115v1 Announce Type: new Abstract: This paper proposes a multi-layer AI framework for information landscape analysis in the context of information disorder. Rather than treating misinformation detection as a binary fact-checking task, the framework analyzes political and media content across multiple dimensions, including source reliability, factual structure, framing, bias, emotional activation, manipulation patterns, and propagation dynamics. The goal is to move beyond isolated claim verification toward a structured representation of the informational environment surrounding an event, entity, or narrative. We argue that AI systems for media analysis should support epistemic mapping: a transparent, multi-dimensional account of how facts, interpretations, actors, and narratives interact over time. The paper presents the conceptual architecture, analytical layers, and methodological rationale of the framework, with the aim of supporting more nuanced, explainable, and critic
arXiv:2606.26114v1 Announce Type: new Abstract: We examine the structural transformation of creative industries under generative artificial intelligence, drawing on 374 primary sources spanning policy documents, industry data, creator surveys, and platform analytics. Beginning with the December 2024 release of OpenAI's Sora video model as a watershed event, we trace the historical pattern of creative resistance to technological disruption, then develop an analytical framework, the Human-AI Agency Continuum, for mapping the spectrum of human and machine collaboration in creative work. We present evidence for the "slop ceiling," an audience-imposed quality threshold that constrains AI-generated content to approximately 1-3% of platform streams despite comprising 44% of uploads. Analysis of the UK Government's 2025 consultation on AI and copyright (over 11,500 responses, 88% opposing expanded AI training rights) reveals deep structural tensions between technology firms and creative work
arXiv:2606.26111v1 Announce Type: new Abstract: Generative artificial intelligence (GenAI) has enabled users to synthesize music with text prompts, combining copyrighted lyrics, AI-composed melodies, and synthetic vocals that imitate real artists. This paper examines the legal and technical dimensions of AI-based music creation (e.g., Google Gemini's music tools) under U.S. copyright law. We analyze whether a user who inputs one artist's protected lyrics into a GenAI system, directs it to use another artist's voice or style, publishes the resulting song, and monetizes it violates 17 U.S.C. Section 106's exclusive rights [3]. The analysis integrates Title 17 doctrine (rights of reproduction, derivative works, distribution), 17 U.S.C. Section 114's narrow sound recording protection [4], and the new voice-cloning laws emerging at the state level [20]. We argue that unauthorized lyric copying poses a high risk of infringement of the musical composition, whereas mere AI-generated voice imit
arXiv:2606.26109v1 Announce Type: new Abstract: Large language model (LLM)-based simulations of clinical patients are increasingly used for research and training, yet their validity requires persona stability: coherent maintenance of an assigned psychological profile across and within conversations. We evaluate this prerequisite using eating disorder personas grounded in five published case vignettes, a dual-assessment framework (self-report + independent observer ratings), and validated psychometric instruments (EDE-Q) with known ground-truth scores. Across six LLMs and two experiments (between-conversation stability (Exp. I) and within-conversation stability (Exp. II)), we find that LLMs are paradoxically too stable and too inaccurate: variability is negligible, yet all models systematically overshoot ground-truth severity by 12-30% of the scale range (0.7-1.8 points on a 0-6 scale). The mechanism is selective stereotyping: models differentiate cases on behavioural items (dietary res
arXiv:2606.26099v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly deployed in artificial intelligence (AI) governance analysis across national and international organisations. There is, however, growing evidence that such models produce significantly less accurate responses for countries that are underrepresented in their training data, a pattern described in existing literature as geographic bias. Existing studies examining this phenomenon are subject to three methodological limitations that together undermine their findings: (1) reliance on proprietary systems whose weights are not publicly released, which prevents independent replication; (2) evaluation of model knowledge about years that fall after data collection for model training had concluded, leading to geographic ignorance in addition to the natural limits of each model's knowledge; and (3) use of coarse binary response classification that cannot distinguish models' confident fabrication (HF) from t
How much longer will we keep trying to solve our nation’s dismal math proficiency problem by writing new math problems? Clearly, if that were the answer, it would have worked by now--but it hasn’t.
Article URL: https://sqlfiddle.com Comments URL: https://news.ycombinator.com/item?id=43792831 Points: 2 # Comments: 0
Article URL: https://qtype.vercel.app/ Comments URL: https://news.ycombinator.com/item?id=43790238 Points: 1 # Comments: 0
Recent policy shifts have caused significant uncertainty in K-12 education funding, especially for technology initiatives. It’s no longer business as usual. Schools can’t rely on the same federal operating funds they’ve traditionally used to purchase technology or support innovation.
AI is here, and it’s moving fast. For schools, that speed is both an opportunity and a risk: The right tools can transform learning, but the wrong ones can compromise data, equity, and instructional goals.
Article URL: https://restofworld.org/2026/edtech-funding-collapse-k12-startups-ai-workforce/ Comments URL: https://news.ycombinator.com/item?id=47887985 Points: 2 # Comments: 0
Flint offers personalized learning, using AI, across a range of subjects.
Imagine students who understand how government works and who see themselves as vital contributors to their communities. That’s what happens when students are given opportunities to play a role in their school, district, and community.
Article URL: http://startupworks.co/blog/2015/01/23/will-2015-be-year-medical-and-children-education-apps/ Comments URL: https://news.ycombinator.com/item?id=8934536 Points: 2 # Comments: 0
Summer is full of learning opportunities that many children miss. When back-to-school season begins, some kids are already starting behind. That's all due to a lack of access to high-quality programs and resources.
By proactively handling negative individuals, school leaders can create a psychologically safe learning environment for everyone.