1. Which elements are clickable, and their exact bounds
2. Which elements are scrollable, and in which direction
So it's sometimes necessary to augment your screenshot with distilled metadata from the DOM/a11y tree.
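
A minimal sketch of what "distilled metadata" could look like. The `Node` class is a hypothetical, simplified stand-in for a DOM/a11y node (not any real browser API), and the overflow check (content size larger than the box) is an assumed heuristic for scrollability. The idea is to keep only clickable elements with their bounds and scrollable elements with their direction, and drop everything else.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """Hypothetical, simplified stand-in for a DOM / a11y tree node."""
    tag: str
    bounds: Tuple[int, int, int, int]  # x, y, width, height in screenshot pixels
    label: str = ""
    clickable: bool = False
    scroll_width: int = 0   # content width; larger than the box means overflow
    scroll_height: int = 0  # content height; larger than the box means overflow
    children: List["Node"] = field(default_factory=list)


def scroll_directions(node: Node) -> List[str]:
    """Assumed heuristic: a node scrolls in a direction if its content overflows its box."""
    _, _, width, height = node.bounds
    directions = []
    if node.scroll_height > height:
        directions.append("vertical")
    if node.scroll_width > width:
        directions.append("horizontal")
    return directions


def distill(root: Node) -> Dict[str, list]:
    """Walk the tree in document order, keeping only clickable and scrollable nodes."""
    out: Dict[str, list] = {"clickable": [], "scrollable": []}
    stack = [root]
    while stack:
        node = stack.pop()
        if node.clickable:
            out["clickable"].append(
                {"tag": node.tag, "label": node.label, "bounds": list(node.bounds)}
            )
        directions = scroll_directions(node)
        if directions:
            out["scrollable"].append(
                {"tag": node.tag, "bounds": list(node.bounds), "directions": directions}
            )
        # Reverse so the first child is popped (visited) first.
        stack.extend(reversed(node.children))
    return out


# Made-up example page: a tall body, a labeled button, and a wide carousel
# containing an unlabeled icon link.
page = Node("body", (0, 0, 1280, 720), scroll_height=2400, children=[
    Node("button", (24, 16, 96, 32), label="Sign in", clickable=True),
    Node("div", (0, 80, 1280, 200), scroll_width=3000, children=[
        Node("a", (40, 120, 180, 24), clickable=True),
    ]),
])

meta = distill(page)
print(json.dumps(meta, indent=2))
```

The resulting JSON is small enough to send alongside the screenshot. Note that the unlabeled link still shows up with its bounds, so the model can pair the box with what it sees in the pixels.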
2. Devs suck at accessibility: unlabeled images and icon buttons mean there's info in the pixels that's not in the DOM text
3. Image: 200-300 tokens. DOM JSON: 1K-10K tokens