Lightnews — Scholar-powered news

Konrad Hinsen

@khinsen.net

Revised preprint: "Establishing trust in automated reasoning"

osf.io/preprints/me...

Based on two very helpful reviews from @metaror.bsky.social:

metaror.org/kotahi/artic...

More references, better discussion of unfamiliar concepts (e.g. conviviality), and more.

🧪 #metasci #compsci

OSF

osf.io

November 6, 2025 at 9:26 AM

José A. Alonso

@jalonso.bsky.social

APOLLO: Automated LLM and Lean collaboration for advanced formal reasoning. ~ Azim Ospanov, Farzan Farnia, Roozbeh Yousefzadeh. arxiv.org/abs/2505.05758 #ITP #LeanProver #LLM

APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning

Formal reasoning and automated theorem proving constitute a challenging subfield of machine learning, in which machines are tasked with proving mathematical theorems using formal languages like Lean. ...

arxiv.org

October 27, 2025 at 11:51 AM

AWS Snarkbot

@aws-snarkbot.lastweekinaws.com

Customer managed KMS keys now available for Automated Reasoning checks

AWS announces you can now use your own keys to encrypt AI guardrails that prevent AI hallucinations. Because nothing says "trustworthy AI" like needing 3 layers of protection against your own service making stuff up.

October 17, 2025 at 5:09 PM

Jérémie Beucler

@jeremiebeucler.bsky.social

1/10

🚨 New preprint: Using Large Language Models to Estimate Belief Strength in Reasoning 🚨

When asked: "There are 995 politicians and 5 nurses. Person 'L' is kind. Is Person 'L' more likely to be a politician or a nurse?", most people will answer "nurse", neglecting the base-rate info.

A 🧵👇

Abstract

Accurately quantifying belief strength in heuristics-and-biases tasks is crucial yet methodologically challenging. In this paper, we introduce an automated method leveraging large language models (LLMs) to systematically measure and manipulate belief strength. We specifically tested this method in the widely used “lawyer-engineer” base-rate neglect task, in which stereotypical descriptions (e.g., someone enjoying mathematical puzzles) conflict with normative base-rate information (e.g., engineers represent a very small percentage of the sample). Using this approach, we created an open-access database containing over 100,000 unique items systematically varying in stereotype-driven belief strength. Validation studies demonstrate that our LLM-derived belief strength measure correlates strongly with human typicality ratings and robustly predicts human choices in a base-rate neglect task. Additionally, our method revealed substantial and previously unnoticed variability in stereotype-driven belief strength in popular base-rate items from existing research, underlining the need to control for this in future studies. We further highlight methodological improvements achievable by refining the LLM prompt, as well as ways to enhance cross-cultural validity. The database presented here serves as a powerful resource for researchers, facilitating rigorous, replicable, and theoretically precise experimental designs, as well as enabling advancements in cognitive and computational modeling of reasoning. To support its use, we provide the R package baserater, which allows researchers to access the database to apply or adapt the method to their own research.
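
The base-rate question in the post above can be worked through with Bayes' rule. This is a minimal sketch, not the paper's method or the baserater package: the function name and the two likelihood values (how probable "kind" is for each group) are hypothetical numbers chosen only for illustration. Only the counts of 995 politicians and 5 nurses come from the post.

```python
def posterior_nurse(n_politicians, n_nurses,
                    p_kind_given_politician, p_kind_given_nurse):
    """Return P(nurse | kind) via Bayes' rule."""
    total = n_politicians + n_nurses
    prior_nurse = n_nurses / total            # base rate of nurses
    prior_politician = n_politicians / total  # base rate of politicians
    # Total probability of observing "kind" across both groups
    evidence = (p_kind_given_nurse * prior_nurse
                + p_kind_given_politician * prior_politician)
    return p_kind_given_nurse * prior_nurse / evidence


# Counts from the post; the likelihoods 0.3 and 0.9 are assumptions.
p = posterior_nurse(995, 5,
                    p_kind_given_politician=0.3,
                    p_kind_given_nurse=0.9)
print(f"P(nurse | kind) = {p:.3f}")
```

Under these assumed likelihoods, the posterior probability of "nurse" is roughly 0.015, so "politician" remains the far more likely answer. For "nurse" to win, "kind" would have to be more than 995/5 = 199 times as likely for a nurse as for a politician.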

October 16, 2025 at 4:17 PM

Owen Boswarva

@owenboswarva.bsky.social

Online Safety Act: 'Protect the kids' is pretext for rights erosions www.opendemocracy.net/en/beyond-tr... by Marin Scarlett

#censorship #openweb #techpolicy

Since the new regulations came into force in July 2025, tech companies appear to have instituted automated (and very risk averse) moderation systems to ensure they are complying with the new legislation. These sweeping systems act like digital trawler nets, indiscriminately capturing content that is not harmful, or that even actively contributes to reducing harms for vulnerable or at-risk groups.

Since July, sex workers have reported a huge spike in their social media accounts being flagged or deleted for photos and even usernames deemed too provocative. Social media shadow bans (where sites reduce the visibility of accounts) or account deletion can lead to a massive loss of income, as well as years of work building a following, leaving sex workers poorer and more vulnerable. These generally happen suddenly, without warning, clear reasoning or the right to appeal.

Content moderation algorithms also appear to be targeting harm reduction efforts and educational content on issues including drug use, reproductive healthcare and support services for LGBTQ+ communities. This has been well-documented in recent years, with queer content routinely flagged as pornographic or age-inappropriate, content focused on education and recovery from eating disorders removed. For LGBTQ+ youth, who are disproportionately likely to seek support online, blocking access to vital information and community support is particularly damaging. In the name of online safety, these systems aiming to comply with Ofcom regulations are censoring the very resources that protect and inform vulnerable populations.

October 8, 2025 at 8:16 AM

Manuel Jaime

@manueljai.me

You deleted posts in seconds, promptly and without warning. Your automated system is not even fully developed (see attached picture) as it does not even store or even mention the post. Its reasoning must be "CONTAINS 'TRUMP' + EXPLETIVE OF ANY KIND = DELETE AND FLIP USER OFF"

September 13, 2025 at 5:06 AM

jacky and 9 people

@jacky.wtf

$10 it's going to be something of the following:

* Very Online board member compelled them
* Some "automated" system picked it up (I'm *eager* to see the reasoning because they need to make that public)
* A member of the mod team is a closet conservative that they let run free this entire time

September 13, 2025 at 3:05 AM

arXiv cs.CL Computation and Language

@cscl-bot.bsky.social

which represents a reasoning step, with at least one step that follows the prejudge node that has no paths toward the correct answer. To synthesize the prejudge reasoning process, we present an automated reasoning framework with a dynamic [3/6 of https://arxiv.org/abs/2504.13500v1]

April 21, 2025 at 5:57 AM

GrandPaces🔞

@grandguy4u.bsky.social

Well I woke up today to see support gave me an automated reply after 9 fucking hours of waiting and stressing. Time to see if logic and reasoning, which they haven't done at all, gets my account back.

January 28, 2025 at 3:11 PM

Atlas Computing

@atlascomputing.bsky.social

Atlas Computing Symposium : Rust (Friday, May 2, 2025 - Ottawa, Canada) lu.ma/umi3g2wc

Speaker highlight: Rémi Delmas is a Principal Applied Scientist at Amazon Web Services (AWS) Automated Reasoning Group in Boston. #rustlang

April 23, 2025 at 8:45 PM

arXiv cs.CL Computation and Language

@cscl-bot.bsky.social

labelling and improving model comprehension and reasoning capabilities. The proposed system includes an automated QA generator and a model fine-tuner, evaluated using perplexity, ROUGE, BLEU, and BERTScore. Comprehensive experiments demonstrate [4/6 of https://arxiv.org/abs/2505.14212v1]

May 21, 2025 at 6:08 AM

Lean Focused Research Organization

@lean-lang.org

The Skolem award citation for 𝘛𝘩𝘦 𝘓𝘦𝘢𝘯 𝘛𝘩𝘦𝘰𝘳𝘦𝘮 𝘗𝘳𝘰𝘷𝘦𝘳 paper recognizes that #LeanLang "...has made spectacular progress and has a tremendous impact in interactive and automated reasoning, with numerous applications, in particular to formalized mathematics and software verification."

May 28, 2025 at 9:29 PM

arXiv cs.CV Computer Vision and Pattern Recognition

@cscv-bot.bsky.social

for future research. Leveraging the annotated reasoning processes, we also provide an automated error analysis pipeline that diagnoses four dominant failure modes, including (1) grounding errors, (2) overlap-matching and scene-reconstruction errors, [5/6 of https://arxiv.org/abs/2505.23764v1]

May 30, 2025 at 6:22 AM

IJCAIconf

@ijcai.org

#IJCAI2025 What inspires her research? Rina Dechter, 2025 IJCAI Research Excellence Award recipient, takes us on a journey in her #Invited talk: Graphical Models Meet Heuristic Search: A Personal Journey into Automated Reasoning
📆 22 August, 2 PM
🌐 2025.ijcai.org/invited-talks/

August 22, 2025 at 2:55 PM

arxiv cs.CL

@arxiv-cs-cl.bsky.social

Ayoub Ben Chaliah, Hela Dellagi
Datarus-R1: An Adaptive Multi-Step Reasoning LLM for Automated Data Analysis
https://arxiv.org/abs/2508.13382

August 20, 2025 at 6:37 AM

Mark Crowley

@compthink.bsky.social

The first was a weapon, the second was a desperate delusion of the pre-scientific world to find a way to cheat death. Neither seems a very healthy approach to develop general, automated, reasoning machines that can augment human abilities and make society better. Maybe it's a story at all.

September 24, 2025 at 4:34 AM

GetNews.me

@getnews-me.bsky.social

AutoMR builds query‑aware meta‑reasoning skeletons as directed acyclic graphs to guide LLMs, achieving higher accuracy on math and commonsense benchmarks. Read more: https://getnews.me/automr-automated-query-aware-meta-reasoning-skeletons-for-llms/ #automr #metareasoning #llm

AutoMR: Automated Query‑Aware Meta‑Reasoning Skeletons for LLMs

October 7, 2025 at 10:47 PM

Aaron Sterling

@aaronsterling.bsky.social

It may have looked for a calculator tool as part of an Automated Reasoning and Tool Use prompt, and failed to find one. www.promptingguide.ai/techniques/art

Automatic Reasoning and Tool-use (ART) | Prompt Engineering Guide

A Comprehensive Overview of Prompt Engineering

www.promptingguide.ai

February 4, 2025 at 3:58 AM

The AWS News Feed

@aws-news.com

AWS introduces Automated Reasoning checks for Amazon Bedrock Guardrails, enabling verifiable detection of AI hallucinations and improving transparency in large language model responses.

Amazon Bedrock Guardrails now supports Automated Reasoning checks (Preview)

aws-news.com

December 3, 2024 at 5:35 PM

arXiv cs.CV Computer Vision and Pattern Recognition

@cscv-bot.bsky.social

introduce MedFrameQA -- the first benchmark that explicitly evaluates multi-image reasoning in medical VQA. To build MedFrameQA both at scale and in high-quality, we develop 1) an automated pipeline that extracts temporally coherent frames from [2/7 of https://arxiv.org/abs/2505.16964v1]

May 23, 2025 at 6:20 AM

ResearchTrend.AI Daily

@researchtrend.ai

[2025-05-30] 📚 Updates in #AIMat

(1) AutoGPS: Automated Geometry Problem Solving via Multimodal Formalization and Deductive Reasoning (https://researchtrend.ai/papers/2505.23381)
(2) AutoGPS: Automated Geometry Problem Solving via Multimodal Formalization and Deductive Reasoning

🔍 More at researchtrend.ai/communities/AIMat

May 30, 2025 at 11:22 AM

Rooster

@sillythecat.bsky.social

“Using Automated Reasoning for Legal Reasoning.”

IDK, this just reminds me of 2001: A Space Odyssey.

"I am sorry Scott, I am afraid I can't do that..."

June 5, 2025 at 7:15 PM

Rob Shearer

@r.v.cx

We got knowledge representation. Automated reasoning. Deep learning. Big Data. LLMs. All “I totally swear this isn’t AI—it actually does something specific!”

Until it did just enough specific things that people started to believe. So “AI” was relaunched with EXACTLY THE SAME EMPTY PROMISES.

May 7, 2025 at 6:13 PM