Aaron Tay
aarontay.bsky.social
I'm a librarian + blogger from Singapore Management University. Social media, bibliometrics, analytics, academic discovery tech.
Pinned
I'm an academic librarian blogger at Musings about librarianship since 2009.
To get a taste of what I blog, see "Best of..." Musings about librarianship - Posts on discovery, open access, bibliometrics, social media & more. Have you read them?
musingsaboutlibrarianship.blogspot.com/p/best-of.html
Was testing out how good Gemini 3 Pro, GPT5.1 and Opus 4.5 were at crafting Boolean for systematic review in PubMed & asking to compare, & I suddenly realised that because Claude has access to PubMed MCP it actually uses that to TEST and evaluate search strings just like humans! (1)
November 25, 2025 at 12:45 AM
Watching a webinar & a "comment, not a question" comes in.. a lecture abt the diff between ML, DL vs GPT/LLMs & says tech that generates content does not improve evidence retrieval - Do these people think they're the only ones who know the diff? (2)
November 23, 2025 at 6:49 PM
Thinking about the business case for using specialised academic deep research tools, e.g. Elicit, Undermind, Consensus, Scispace, Scopus DR etc., vs general deep research (OpenAI, Gemini, Claude Research etc.). (1)
November 23, 2025 at 8:47 AM
Google NotebookLM is the most impressive Google product in years (the last one was Google Photos). If you're still one of those who think "AI" is all hype, please try Google NotebookLM
November 22, 2025 at 5:58 PM
Reposted by Aaron Tay
MajinBook is a badly-needed catalog for shadow libraries. It provides metadata (e.g., date of first publication, popularity on Goodreads) for over half a million English-language books. arxiv.org/abs/2511.11412
MajinBook: An open catalogue of digital world literature with likes
This data paper introduces MajinBook, an open catalogue designed to facilitate the use of shadow libraries--such as Library Genesis and Z-Library--for computational social science and cultural analyti...
arxiv.org
November 21, 2025 at 2:24 PM
This sounds crazy. So because paper mills are using AI to mass-produce papers on some specific open datasets, journals are desk-rejecting all work based on those open datasets???
Update. In response to this problem (previous post, this thread), some publishers are desk-rejecting papers based on open health datasets. The problem is not the quality of the data, but the absence of additional work to validate the findings.

Two reports:

1. "Journals and publishers crack […]
Original post on fediscience.org
fediscience.org
November 21, 2025 at 3:14 PM
Reposted by Aaron Tay
Update. In response to this problem (previous post, this thread), some publishers are desk-rejecting papers based on open health datasets. The problem is not the quality of the data, but the absence of additional work to validate the findings.

Two reports:

1. "Journals and publishers crack […]
Original post on fediscience.org
fediscience.org
October 19, 2025 at 3:58 PM
Reposted by Aaron Tay
Update. #socarxiv (@socarxiv) is dealing with a similar problem by requiring submitters to have #orcids and tightening its focus on the social sciences.
https://socopen.org/2025/11/19/socarxiv-submission-rule-changes/

#greenOA #preprints #repositories
SocArXiv submission rule changes
**Context**

SocArXiv is experiencing record high submission rates. In addition, now that we have paper versioning – which is great – our moderators have to approve every paper revision. As a result, our volunteer workload is increasing. In addition we are receiving many non-research, spam, and AI-generated submissions. We do not have a technological way of identifying these, and it is time-consuming to read and assess them according to our moderation rules. We also don’t have moderation workflow tools that allow us to, for example, sort incoming papers by subject, to get them to specific expert moderators. So all our moderators look at all papers as they come in. That encourages us to think about narrowing the range of subjects we accept. The two rule changes below are intended to help manage the increased moderator burden. More policy changes may follow if the volume keeps increasing.

**1. ORCID requirement**

We require the submitting author to have a publicly accessible ORCID linked from the OSF profile page, with a name that matches that on the paper and the OSF account. In the case of non-bibliographic submittors (e.g., a research assistant submitting for a supervisor), the first author must have an ORCID. We can make exceptions for institutional submitters upon request, such as journals that upload their papers for authors.

At present we are not requiring additional verification or specific trust markers on the ORCID (such as email or employer verification), just the existence of an account that lists the author’s name. It’s not a foolproof identity verification, obviously, but it adds a step for scammers, and also helps identify pseudonymous authors, which we do not permit. We may take advantage of ORCID’s trust markers program in the future and require additional elements on the ORCID record.

We are happy to host papers by independent scholars, but a disproportionate share of non-research, spam, and AI-generated submissions come from independent scholars, many of whom do not have ORCIDs. For those scholars with institutional affiliations, we urge you to get an ORCID. This is a good practice that we should all endorse.

**2. Focus on social sciences**

At its founding, SocArXiv did not want to maintain disciplinary boundaries. It was our intention to be the big paper server for all of social sciences, and we couldn’t draw an easy line between social sciences and some humanities subjects, especially history, philosophy, religious studies, and some area studies, which are humanities in the taxonomy we use, but have significant overlap with social sciences. It was more logical just to accept them all.

As the volume has increased, this has become less practical. In addition, a lot of junk and AI submissions are in the areas of religion, philosophy, and various language studies. We also don’t have moderators working in arts and humanities, and our moderators trained in social sciences are not expert at reviewing these papers. Finally, there is an excellent, open humanities archive: Knowledge Commons (KC Works), which is freely available for humanities scholars. With approval from that service, we will now direct authors to their site for papers we are rejecting in arts and humanities subjects. We continue to accept papers in education and law, which are also generally adjacent to social science. For a limited time we will accept revisions of papers we already host in arts and humanities, but urge those authors to include links to Knowledge Commons or somewhere else that can host their work in the future.

We will assess papers that include arts/humanities as well as social science subject identifiers, and if we determine they are principally in art/humanities, reject them. We will continue to host all work we have already accepted.
socopen.org
November 21, 2025 at 3:06 PM
I actually have quite a few posts queued up on the latest trends in "AI academic search", e.g. MCP/connectors etc., and sure, all these features will make literature review more efficient, but I can't help but feel we're only scratching the surface, remaining at the Horseless Carriage Syndrome stage (1)
November 21, 2025 at 11:58 AM
I wrote up an EARLY but still lengthy review of the hot new AI enhanced Google Scholar offering - Scholar Labs service aarontay.substack.com/p/scholar-la... - Image is just generated by Nano Banana Pro from my text....
November 21, 2025 at 10:49 AM
First conclusion. It runs a search, then evaluates the results, likely with Gemini 3? At certain points you can make it go deeper, but it seems to always stop when it has found 50 relevant results OR looked at the top 300 results! (1)
Google Scholar gets into "AI powered" space Assuming this can use all the full-text they have indexed this might be a game changer. The timing of this release maybe suggests Gemini 3 is being used? scholar.googleblog.com/2025/11/scho... . Apparently some hit a waitlist, I have access though (1)
Scholar Labs: An AI Powered Scholar Search
Research questions are often detailed. Answering them can require looking at a topic from multiple angles. Today, we are introducing Scholar...
scholar.googleblog.com
November 19, 2025 at 11:09 AM
Google Scholar gets into "AI powered" space Assuming this can use all the full-text they have indexed this might be a game changer. The timing of this release maybe suggests Gemini 3 is being used? scholar.googleblog.com/2025/11/scho... . Apparently some hit a waitlist, I have access though (1)
Scholar Labs: An AI Powered Scholar Search
Research questions are often detailed. Answering them can require looking at a topic from multiple angles. Today, we are introducing Scholar...
scholar.googleblog.com
November 19, 2025 at 10:18 AM
This looks interesting. As a librarian, I'm always trying to see if we can use this to show impact.
Our new preprint! The Acknowledgments problem, and what to do about it.
With @martonkovacs.bsky.social, @pietropollo.bsky.social, @philobolobstime.bsky.social, Losia Lagisz, & Mohammad Hosseini.
November 18, 2025 at 7:57 AM
[Read] Content-aware rankings: a new approach to rankings in scholarship arxiv.org/abs/2504.05206 - essentially using scite citation typing to rank journals, institutions. Ranking can be found here scite.ai/rankings - interesting idea but there are many issues (1)
Content-aware rankings: a new approach to rankings in scholarship
Entity rankings (e.g., institutions, journals) are a core component of academia and related industries. Existing approaches to institutional rankings have relied on a variety of data sources, and appr...
arxiv.org
November 18, 2025 at 7:22 AM
Great series of articles on query understanding and measuring query granularity, e.g. shoes vs Nike men's shoes dtunkelang.medium.com/search-as-tr...
Search as Translation
The core challenge of search has always been communicating meaning across representations.
dtunkelang.medium.com
November 18, 2025 at 12:24 AM
Uploading my blog post into Google NotebookLM and asking it to critique just gives me mostly irrelevant comments. It wants me to go in directions I don't want to go. Kind of like the legendary peer reviewer two, ha. But I guess LLMs have learnt librarian = wanting to talk about IL/pedagogical approach
November 17, 2025 at 4:31 AM
Bumper crop of submissions for FORCE 2026 to be held in Singapore 3-5 June 2026! Thank you for your support!

www.linkedin.com/posts/force1...
#force2026 #researcher #librarian #practitioner #innovator #force11 #scholarlycommunication | FORCE11
The #FORCE2026 Call for Proposals is now closed – and what a response! Over 100 submissions from 28 countries worldwide. Thank you to every #researcher, #librarian, #practitioner, and #innovator who ...
www.linkedin.com
November 15, 2025 at 9:47 AM
When I wrote this, I was worried I was too harsh aarontay.substack.com/p/were-good-.... To be fair, another reason why librarians don't focus on information retrieval mechanics and metrics is that most of these systems are black boxes anyway. Vendors don't disclose anything.
“We’re Good at Search”… Just Not the Kind That the AI era Demands - a Provocation
I might be exaggerating slightly, but if you look at the few new evaluation matrices for AI-powered search circulating, “relevancy” is often just one of several categories, evaluated in a highly subje...
aarontay.substack.com
November 15, 2025 at 5:29 AM
interesting read on business models.
November 13, 2025 at 2:57 AM
Reposted by Aaron Tay
Bug in Springer Nature metadata may be causing ‘significant, systemic’ citation inflation
Millions of researchers could be affected by a “dramatic distortion of citation counts” likely caused by flaws in how the academic publishing giant Springer Nature handles article metadata, accordi…
retractionwatch.com
November 11, 2025 at 5:54 PM
Reposted by Aaron Tay
Bug in Springer Nature metadata may be causing ‘significant, systemic’ citation inflation retractionwatch.com/2025/11/11/b...
Bug in Springer Nature metadata may be causing ‘significant, systemic’ citation inflation
Millions of researchers could be affected by a “dramatic distortion of citation counts” likely caused by flaws in how the academic publishing giant Springer Nature handles article metadata, accordi…
retractionwatch.com
November 12, 2025 at 12:42 AM
Ok officially closed. Hit historic high number of submissions... Nice headache to have to select from I guess
Amazed by the volume of last minute submissions... Still a little birdie tells me unofficially the form will be open for a few more days...
📣 Registration OPEN for #FORCE2026 (3–5 Jun, Singapore). A conference on the future of research communication & open science.

Early-bird: by 28 Feb 2026.
Call for Proposal: Ends 9 Nov 2025. Authors get a special rate. Details & CFP force11.org/force2026/

#FORCE2026 #ScholarlyCommunication
November 12, 2025 at 9:46 AM
Interesting new Google Scholar PDF button feature scholar.googleblog.com/2025/11/mark...
November 12, 2025 at 6:06 AM