Aaron Tay
@aarontay.bsky.social
3.2K followers 330 following 2K posts
I'm a librarian + blogger from Singapore Management University. Social media, bibliometrics, analytics, academic discovery tech.
Pinned
aarontay.bsky.social
I'm an academic librarian who has been blogging at Musings about librarianship since 2009.
To get a taste of what I blog about, see "Best of..." Musings about librarianship - posts on discovery, open access, bibliometrics, social media & more. Have you read them?
musingsaboutlibrarianship.blogspot.com/p/best-of.html
aarontay.bsky.social
Note that this analysis applies to data from before the latest OpenAlex "Walden" rebuild, which is still in beta. A quick check shows quite different results.
aarontay.bsky.social
As I learn more about the nuts and bolts of IR, e.g. HNSW and IVF/PQ, it's interesting, but for most end users it isn't useful, except maybe that it helps you understand why it's somewhat tricky to implement pre-filtering + dense embeddings, particularly if the index wasn't initially set up for it. (5)
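A minimal sketch of that pre-filter vs post-filter issue, using brute-force cosine search as a stand-in for an HNSW/IVF-PQ index; the embeddings, the `year` metadata field, and all numbers below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
docs = rng.normal(size=(1000, 64))           # hypothetical document embeddings
docs /= np.linalg.norm(docs, axis=1, keepdims=True)
year = rng.integers(2000, 2026, size=1000)   # hypothetical metadata field

query = rng.normal(size=64)
query /= np.linalg.norm(query)
scores = docs @ query
k = 10

# Post-filter: take the top-k first, then apply the metadata filter.
# If few of the top-k pass the filter, you return fewer than k hits.
topk = np.argsort(-scores)[:k]
post = topk[year[topk] >= 2020]

# Pre-filter: restrict the candidate set first, then rank within it.
# Always yields k hits, but it requires the index to skip candidates,
# which a vanilla ANN graph/index can't easily do after the fact.
cand = np.flatnonzero(year >= 2020)
pre = cand[np.argsort(-scores[cand])[:k]]

print(f"post-filter: {len(post)} hits, pre-filter: {len(pre)} hits")
```

Post-filtering can silently return fewer than k results; pre-filtering guarantees k but is hard to retrofit onto an index that wasn't built to support it.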
aarontay.bsky.social
It also makes a subtle distinction between a sparse vector and a sparse "representation". A sparse vector is, as you'd expect, one where most values are zero, and it's usually high-dimensional. The sparse representation, according to the book, refers to the way you store the vector, e.g. inverted index / COO / CSR formats. (4)
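A quick sketch of that vector-vs-representation distinction (my toy example, not the book's): one sparse vector, stored two ways with scipy.

```python
import numpy as np
from scipy import sparse

# A "sparse vector": high-dimensional, mostly zeros (e.g. TF-IDF weights).
dense = np.zeros(10_000)
dense[[3, 871, 9402]] = [0.5, 1.2, 0.8]

# COO representation: parallel arrays of (row, col, value) triples.
coo = sparse.coo_matrix(dense.reshape(1, -1))
print(coo.row, coo.col, coo.data)

# CSR representation: same content, compressed row pointers,
# efficient for dot products.
csr = coo.tocsr()
print(csr.indptr, csr.indices, csr.data)

# An inverted index is the same idea at collection scale:
# term -> postings list of (doc_id, weight).
```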
aarontay.bsky.social
Also a very nice way of decomposing user intent: the system needs (a) content understanding, (b) domain understanding and (c) user understanding. (3)
aarontay.bsky.social
For example, I was always somewhat confused when it comes to search vs recommendations, but the book frames them as a spectrum, which is a very nice way to look at it. (2)
aarontay.bsky.social
Finished the first 3 chapters on lexical search and the last 3 on LLM embeddings + RAG. Mostly covering things I knew, but I like some of the overall conceptual framework. (1)
aarontay.bsky.social
Next piece: things I still don't quite fully grasp about the topic.
aarontay.bsky.social
Really curious about the new natural language search in Primo NDE (not Primo Research Assistant). Hopefully they account for the fact that a large proportion of queries in Primo are known-item searches, not subject searches.
aarontay.bsky.social
It's ironic to see 2025 publications on academic AI search engines saying things like "Elicit uses GPT-3" and "Undermind.ai uses arXiv". (They might want to check for more up-to-date sources.)
aarontay.bsky.social
Sorry, all virtual seats for Mike's session are now taken. But we still have seats for other events in this series. eventregistration.smu.edu.sg/event/TTT202...
aarontay.bsky.social
I realise I am very uncomfortable with agents, and I've been thinking about why. (1)
aarontay.bsky.social
Want to hear more from @mikecaulfield.bsky.social? Mike, the creator of SIFT and co-author of Verified, is doing a free online class on using search + AI for verification. 24 October 2025 (Friday), 10:00–11:30 am SGT (UTC+8). eventregistration.smu.edu.sg/event/TTT2025/
aarontay.bsky.social
It definitely can't do things like find the citations/references of paper X and... the "thinking" pretends it can, but it can't. To be fair, neither can Undermind.ai etc. So far, I haven't found an academic deep research tool that is agentic enough to do that, but modern general LLMs like ChatGPT CAN do such things. (2)
aarontay.bsky.social
Playing more with Scopus Deep Research. Looking at the "thinking" & testing, it looks like Scopus Deep Research doesn't have citation searching as a tool, unlike Undermind, Consensus Deep Search, etc. It looks more like it generates various questions & chooses keywords to try to find content. (1)
aarontay.bsky.social
Tried with the new Walden OpenAlex rebuild & the % of Elsevier records with abstracts is higher, but still below last year's. I wonder: OpenAlex stores abstracts as an inverted index; is that a way to bypass the issue? Technically the record does not contain the abstract, but you can still match text in the abstract?
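For context, this is the idea behind OpenAlex's abstract_inverted_index field (word -> list of positions): you can match terms, or rebuild the plain text, without the record storing the abstract as a string. A sketch with a made-up tiny index (real ones come from api.openalex.org):

```python
def rebuild_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Reconstruct plain text from a word -> positions mapping."""
    positions = {}
    for word, idxs in inverted_index.items():
        for i in idxs:
            positions[i] = word
    return " ".join(positions[i] for i in sorted(positions))

example = {"Google": [0], "Scholar": [1], "is": [2], "large": [3]}
print(rebuild_abstract(example))   # "Google Scholar is large"
print("Scholar" in example)        # term matching without the full text
```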
aarontay.bsky.social
That's interesting
aarontay.bsky.social
Hmm, yeah, I may misunderstand what this feature is meant to do. Will ask at our official webinar today.
aarontay.bsky.social
Another favourite question: can you use GS alone for a systematic review? Like Scite Assistant and a few others, ScienceDirect AI is tripped up by Gehanno (2023), because the first sentence of the abstract is "it is said that...", though the paper's findings actually say it "could be used alone for SR". (6)
aarontay.bsky.social
Maybe I misunderstand how or what the "compare experiment" feature is for, but it makes little sense to compare papers that are totally different in method and/or objective?!?! (5)
aarontay.bsky.social
Some of it can be explained. E.g. ScienceDirect AI picks up a secondary citation in X and hence "thinks" X is relevant, when actually X is on a totally different topic (it just happens to mention results from the truly relevant Y). But for some results I really can't explain why they appear in the compare experiment table. (4)
aarontay.bsky.social
The biggest disappointment is the "compare experiment" feature. Leaving aside that you can't control the headers for comparison, in many tests the top few results are totally unrelated to the question. E.g. in this one, the first two are not studies estimating the size of GS! Why is this so? (3)
aarontay.bsky.social
The generated answer here looks OK. That said, the fact that it searches full text instead of just abstracts means it is more likely to pick up secondary citations, similar to Scite Assistant.
E.g. (Momodu, Okunade & Adepoju, 2022) is cited because it mentions the result from (Gusenbauer, 2019). (2)
aarontay.bsky.social
Kicking the tires of ScienceDirect AI. (1)
aarontay.bsky.social
In other words, I like the granularity of ASJC Subject Areas, but I don't want to assign it by using the journal the article is in...