Aaron Tay
@aarontay.bsky.social
3.2K followers 330 following 2K posts
I'm a librarian + blogger from Singapore Management University. Social media, bibliometrics, analytics, academic discovery tech.
Pinned
aarontay.bsky.social
I'm an academic librarian who has been blogging at Musings about librarianship since 2009.
To get a taste of what I blog about, see "Best of..." Musings about librarianship - posts on discovery, open access, bibliometrics, social media & more. Have you read them?
musingsaboutlibrarianship.blogspot.com/p/best-of.html
aarontay.bsky.social
Note that this analysis applies to data from before the latest OpenAlex "Walden" rebuild, which is still in beta. A quick check shows quite different results.
aarontay.bsky.social
As I learn more about the nuts and bolts of IR, e.g. HNSW and IVF/PQ, it's interesting, but for most end users it isn't useful, except maybe that it helps you understand why it's somewhat tricky to implement pre-filtering + dense embeddings, particularly if the index wasn't initially set up for it. (5)
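A minimal sketch of that pre-filter vs post-filter issue, using brute-force cosine search as a stand-in for an HNSW/IVF-PQ index; the embeddings, the `year` metadata field, and all numbers below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
docs = rng.normal(size=(1000, 64))           # hypothetical document embeddings
docs /= np.linalg.norm(docs, axis=1, keepdims=True)
year = rng.integers(2000, 2026, size=1000)   # hypothetical metadata field

query = rng.normal(size=64)
query /= np.linalg.norm(query)
scores = docs @ query
k = 10

# Post-filter: take the top-k first, then apply the metadata filter.
# If few of the top-k pass the filter, you return fewer than k hits.
topk = np.argsort(-scores)[:k]
post = topk[year[topk] >= 2020]

# Pre-filter: restrict the candidate set first, then rank within it.
# Always yields k hits, but it requires the index to skip candidates,
# which a vanilla ANN graph/index can't easily do after the fact.
cand = np.flatnonzero(year >= 2020)
pre = cand[np.argsort(-scores[cand])[:k]]

print(f"post-filter: {len(post)} hits, pre-filter: {len(pre)} hits")
```

Post-filtering can silently return fewer than k results; pre-filtering guarantees k but is hard to retrofit onto an index that wasn't built to support it.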
aarontay.bsky.social
It also makes a subtle distinction between a sparse vector and a sparse "representation". A sparse vector is, as you'd expect, one where most values are zero, and it's usually high-dimensional. The sparse representation, according to the book, refers to the way you store the vector, e.g. inverted index / COO / CSR formats. (4)
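A quick sketch of that vector-vs-representation distinction (my toy example, not the book's): one sparse vector, stored two ways with scipy.

```python
import numpy as np
from scipy import sparse

# A "sparse vector": high-dimensional, mostly zeros (e.g. TF-IDF weights).
dense = np.zeros(10_000)
dense[[3, 871, 9402]] = [0.5, 1.2, 0.8]

# COO representation: parallel arrays of (row, col, value) triples.
coo = sparse.coo_matrix(dense.reshape(1, -1))
print(coo.row, coo.col, coo.data)

# CSR representation: same content, compressed row pointers,
# efficient for dot products.
csr = coo.tocsr()
print(csr.indptr, csr.indices, csr.data)

# An inverted index is the same idea at collection scale:
# term -> postings list of (doc_id, weight).
```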
aarontay.bsky.social
Also a very nice way of decomposing user intent: the system needs (a) content understanding, (b) domain understanding and (c) user understanding. (3)
aarontay.bsky.social
For example, I was always somewhat confused when it comes to search vs recommendations, but the book frames them as a spectrum, which is a very nice way to look at it. (2)
aarontay.bsky.social
Finished the first 3 chapters on lexical search and the last 3 on LLM embeddings + RAG. Mostly covering things I knew, but I like some of the overall conceptual framework. (1)
aarontay.bsky.social
Next piece: things I still don't quite fully grasp about the topic.
aarontay.bsky.social
Really curious about the new natural language search in Primo NDE (not Primo Research Assistant). Hopefully they account for the fact that a large proportion of queries in Primo are known-item searches, not subject searches.
aarontay.bsky.social
It's ironic to see 2025 publications on academic AI search engines saying things like "Elicit uses GPT-3" and "Undermind.ai uses arXiv". (They might want to check for more up-to-date sources.)
aarontay.bsky.social
Sorry, all virtual seats for Mike's session are now taken. But we still have seats for other events in this series. eventregistration.smu.edu.sg/event/TTT202...
aarontay.bsky.social
I realise I am very uncomfortable with agents, and I've been thinking about why. (1)
aarontay.bsky.social
Want to hear more from @mikecaulfield.bsky.social? Mike, the creator of SIFT and co-author of Verified, is doing a free online class on using search + AI for verification. 24 October 2025 (Friday), 10:00–11:30 am SGT (UTC+8). eventregistration.smu.edu.sg/event/TTT2025/
aarontay.bsky.social
It definitely can't do things like find the citations/references of paper X and... the "thinking" pretends it can, but it can't. To be fair, neither can Undermind.ai etc. So far, I haven't found an academic deep research tool that is agentic enough to do that, but modern general LLMs like ChatGPT CAN do such things. (2)
aarontay.bsky.social
Playing more with Scopus Deep Research. Looking at the "thinking" & testing, it looks like Scopus Deep Research doesn't have citation searching as a tool, unlike Undermind, Consensus Deep Search, etc. It looks more like it generates various questions & chooses keywords to try to find content. (1)
aarontay.bsky.social
Tried with the new Walden OpenAlex rebuild & the % of Elsevier records with abstracts is higher, but still below last year's. I wonder: OpenAlex stores abstracts as an inverted index; is that a way to bypass the issue? Technically the record does not contain the abstract, but you can still match text in the abstract?
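For context, this is the idea behind OpenAlex's abstract_inverted_index field (word -> list of positions): you can match terms, or rebuild the plain text, without the record storing the abstract as a string. A sketch with a made-up tiny index (real ones come from api.openalex.org):

```python
def rebuild_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Reconstruct plain text from a word -> positions mapping."""
    positions = {}
    for word, idxs in inverted_index.items():
        for i in idxs:
            positions[i] = word
    return " ".join(positions[i] for i in sorted(positions))

example = {"Google": [0], "Scholar": [1], "is": [2], "large": [3]}
print(rebuild_abstract(example))   # "Google Scholar is large"
print("Scholar" in example)        # term matching without the full text
```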
aarontay.bsky.social
That's interesting
aarontay.bsky.social
Hmm, yeah, I may misunderstand what this feature is meant to do. Will ask at our official webinar today.
aarontay.bsky.social
Another favourite question: can you use GS alone for a systematic review? Like Scite Assistant and a few others, ScienceDirect AI is tripped up by Gehanno (2023), because the first sentence of the abstract is "it is said that...", though the paper's findings actually say it "could be used alone for SR". (6)
aarontay.bsky.social
Maybe I misunderstand how or what the "compare experiment" feature is for, but it makes little sense to compare papers that are totally different in method and/or objective?!?! (5)
aarontay.bsky.social
Some of it can be explained. E.g. ScienceDirect AI picks up a secondary citation in X and hence "thinks" X is relevant, when actually X is on a totally different topic (it just happens to mention results from the truly relevant Y). But for some results I really can't explain why they appear in the compare experiment table. (4)
aarontay.bsky.social
The biggest disappointment is the "compare experiment" feature. Leaving aside that you can't control the headers for comparison, in many tests the top few results are totally unrelated to the question. E.g. in this one, the first two are not studies estimating the size of GS! Why is this so? (3)
aarontay.bsky.social
The generated answer here looks OK. That said, the fact that it searches full text instead of just abstracts means it is more likely to pick up secondary citations, similar to Scite Assistant.
E.g. (Momodu, Okunade & Adepoju, 2022) is cited because it mentions the result from (Gusenbauer, 2019). (2)
aarontay.bsky.social
Kicking the tires of ScienceDirect AI. (1)
aarontay.bsky.social
In other words, I like the granularity of ASJC Subject Areas, but I don't want to assign it by using the journal the article is in...