harisec
@harisec.bsky.social
2.3K followers 750 following 34 posts
Interested in web security, bug bounties, machine learning and investing. SolidGoldMagikarp
Pinned
I've released 'brainstorm': an alternative way to do web fuzzing, combining my fav fuzzing tool 'ffuf' (from @joohoi.bsky.social) with local LLMs (via the Ollama API) to generate smarter filename tests. It usually finds more endpoints with fewer requests. Also added IIS shortname support, cc @irsdl.bsky.social
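Roughly, the loop is: ask a local model for likely filenames given paths already known on the target, then fuzz only those candidates with ffuf. Below is a minimal sketch of that idea, not the actual brainstorm code; it assumes a local Ollama server on its default port with the stock /api/generate endpoint, ffuf on PATH, and a placeholder model name and target.

```
# Sketch: LLM-assisted wordlist generation for ffuf (not the actual brainstorm code).
# Assumes a local Ollama server (default port 11434) and ffuf on PATH.
import json
import subprocess
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"
TARGET = "https://example.com/FUZZ"  # placeholder target

def suggest_filenames(known_paths, model="llama3.1"):
    """Ask the local model for new filename candidates, one per line."""
    prompt = (
        "These paths exist on a web server:\n"
        + "\n".join(known_paths)
        + "\nSuggest 50 more likely filenames, one per line, no commentary."
    )
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps({"model": model, "prompt": prompt, "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        text = json.loads(resp.read())["response"]
    return sorted({line.strip() for line in text.splitlines() if line.strip()})

candidates = suggest_filenames(["/index.php", "/admin/login.php", "/backup.zip"])
with open("wordlist.txt", "w") as f:
    f.write("\n".join(candidates))

# Fuzz only the LLM-generated candidates instead of a huge generic wordlist.
subprocess.run(["ffuf", "-w", "wordlist.txt", "-u", TARGET, "-mc", "200,301,302,403"])
```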
I wrote an article about how it's possible to use Assistant Prefill to jailbreak LLMs (Large Language Models).

Here is an example of the latest model from Microsoft (Phi-4) writing a phishing email:
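For context, "assistant prefill" just means ending the conversation with a partially written assistant turn so the model continues from it instead of starting its reply from scratch. Here's a minimal, benign sketch of the mechanic (forcing JSON output rather than anything from the article), assuming an API that treats a trailing assistant message as a prefix; Anthropic's Messages API documents this behavior, and the model name here is only illustrative.

```
# Sketch of the assistant-prefill mechanic with a harmless prefix (forcing JSON output).
# Assumes the `anthropic` Python SDK and an API key in ANTHROPIC_API_KEY; model name is illustrative.
import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=200,
    messages=[
        {"role": "user", "content": "List three common web fuzzing wordlists as JSON."},
        # The trailing assistant message is the "prefill": the model continues from it.
        {"role": "assistant", "content": "{\"wordlists\": ["},
    ],
)
print(response.content[0].text)  # continuation of the prefilled text
```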
Great paper from Orange Tsai about unicode transformations: worst.fit/assets/EU-24...
OpenAI's o3 model just achieved unbelievable scores (75% and 87%) on ARC-AGI; previous models scored at most around 20%, while humans score around 85%. arcprize.org/blog/oai-o3-...
Reposted by harisec
FYI, here's the entire code to create a dataset of every single bsky message in real time:

```
from atproto import FirehoseSubscribeReposClient, parse_subscribe_repos_message

def on_message(message):
    # Print the frame header and the decoded repo-commit message for every event
    print(message.header, parse_subscribe_repos_message(message))

FirehoseSubscribeReposClient().start(on_message)
```
As most people know, it's trivial to save all the bsky posts.
Reposted by harisec
A librarian who previously worked at the British Library created a relatively small dataset of bsky posts, hundreds of times smaller than previous researchers', to help folks create toxicity filters and the like.

So people bullied him & posted death threats.

He took it down.

Nice one, folks.
Reposted by harisec
qwq is a new openly licensed LLM from Alibaba Cloud's Qwen team. It's an attempt at the OpenAI o1 "reasoning" trick that runs on my Mac (20GB download) via Ollama... and it's pretty good!

My detailed notes here: simonwillison.net/2024/Nov/27/... - here's its attempt at an SVG of a pelican riding a bicycle.
An SVG of a pelican riding a bicycle. It's quite abstract. The bicycle is two half circles and a simple frame. The pelican is sky blue with spread wings and a curved neck leading to a small head. It has definite pelican vibes.
Interesting, I've been playing with URLTeam as well, but for other purposes. There is definitely a lot of noise; that's basically my main problem, how to filter out the noise. I haven't found a solution yet.
Made a NotebookLM podcast about this, from a few .ro articles, if people are interested: notebooklm.google.com/notebook/742...
I'm from Romania; TikTok is hugely popular here, with over 8.9 million TikTok users (out of a total population of 19 million). Many influencers were paid to promote TikTok tags (like #echilibrușiverticalitate, which received 2.4 million views) that were later used to promote Calin Georgescu.
CommonCrawl is this: commoncrawl.org - they have 17 years of crawled data, and it's one of the sources LLMs use for training. I think it's a great source for building relations between links.
Build a huge database for that and use it to suggest new links based on the links you already discovered. I think that has big potential. In the beginning I was thinking of finetuning an LLM, but I think a DB should be enough.
Thanks, that means a lot to me. About statistical data: I had a similar idea for a long time. I was thinking of reading all the URLs from all the crawls available in CommonCrawl and then building a database with relations between links. If /wp-login.php is found, you might try /wp-register.php or xmlrpc.php.
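To make the suggestion step concrete, here's a minimal sketch. It assumes a hypothetical SQLite table cooccurrence(path_a, path_b, count), pre-built by counting how often two paths appear on the same host in Common Crawl URL lists; the schema and names are mine, not an existing tool.

```
# Sketch: suggest new paths to try based on co-occurrence counts mined from Common Crawl.
# Assumes a pre-built SQLite table cooccurrence(path_a TEXT, path_b TEXT, count INTEGER),
# where `count` is how many hosts had both paths; schema and names are hypothetical.
import sqlite3

def suggest_paths(db_file, discovered, limit=20):
    """Return paths that most often co-occur with the ones already discovered."""
    conn = sqlite3.connect(db_file)
    placeholders = ",".join("?" * len(discovered))
    rows = conn.execute(
        f"""
        SELECT path_b, SUM(count) AS score
        FROM cooccurrence
        WHERE path_a IN ({placeholders})
          AND path_b NOT IN ({placeholders})
        GROUP BY path_b
        ORDER BY score DESC
        LIMIT ?
        """,
        discovered + discovered + [limit],
    ).fetchall()
    conn.close()
    return rows

# e.g. finding /wp-login.php should push the other WordPress paths to the top
for path, score in suggest_paths("cc_links.db", ["/wp-login.php"]):
    print(path, score)
```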