Lightnews — Scholar-powered news

Explosion 💥

@explosion.ai

🔥 New case study: How GitLab built scalable spaCy pipelines to process a year's worth of support tickets and create actionable insights to better support their community.

explosion.ai/blog/gitlab-...

September 16, 2024 at 2:30 PM

Explosion 💥

@explosion.ai

📝 Out now: How S&P Global shipped NLP pipelines for real-time commodities trading insights in a high-security environment with LLMs in the loop.

A 10× speed-up of their data workflows and up to 99% accuracy at 6 MB!

explosion.ai/blog/sp-glob...

June 21, 2024 at 5:42 PM

Explosion 💥

@explosion.ai

Out now: Thinc v9.0! 🔮

This release is the foundation of the upcoming spaCy v4 release and adds support for more powerful learning rate schedules.

We have also merged thinc-apple-ops into Thinc, so Apple AMX is supported out-of-the-box.

Details & release notes: github.com/explosion/th...

April 22, 2024 at 6:12 AM

Explosion 💥

@explosion.ai

5️⃣ Error analysis
To maximize ROI from your data engineering, evaluation metrics should be paired with qualitative error analysis. Our latest example error analysis recipe iterates through false positives/negatives and lets you record the reasons to inform your improvement plan.

April 11, 2024 at 11:42 AM
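
A minimal sketch of the idea in the post above, in plain Python rather than as a Prodigy recipe: compare gold and predicted spans, collect the false positives and false negatives, and tally the reasons a reviewer records. The span tuples, the `find_errors` helper and the reason strings are all hypothetical and only illustrate the workflow.

```python
from collections import Counter


def find_errors(gold, predicted):
    """Return (false_positives, false_negatives) for (start, end, label) spans."""
    gold_set, pred_set = set(gold), set(predicted)
    false_positives = sorted(pred_set - gold_set)  # predicted but not in gold
    false_negatives = sorted(gold_set - pred_set)  # in gold but not predicted
    return false_positives, false_negatives


# Toy annotations for one text (hypothetical data).
gold = [(0, 5, "ORG"), (10, 16, "PERSON"), (20, 26, "GPE")]
predicted = [(0, 5, "ORG"), (10, 16, "ORG"), (30, 34, "GPE")]

false_positives, false_negatives = find_errors(gold, predicted)

# Reasons a reviewer might record while stepping through the errors.
reasons = {
    (10, 16, "ORG"): "wrong label",
    (10, 16, "PERSON"): "wrong label",
    (30, 34, "GPE"): "spurious entity",
    (20, 26, "GPE"): "missed entity",
}
reason_counts = Counter(
    reasons.get(span, "unreviewed") for span in false_positives + false_negatives
)
print(reason_counts.most_common())
```

Sorting the reasons by frequency gives a rough priority list for what to fix first, for example the label scheme or the guidelines.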

Explosion 💥

@explosion.ai

3️⃣ Model training
During the training process, we recommend running Prodigy's train-curve command, which is a great way to quickly see whether more data of similar quality to the current dataset would improve the model.

April 11, 2024 at 11:41 AM

Explosion 💥

@explosion.ai

2️⃣ Review
Quantitative measurements of disagreements should always be accompanied by a qualitative analysis. Prodigy's review recipe is an excellent tool for that.

We use it in all our consulting projects to inform and illustrate data model discussions: explosion.ai/tailored-sol...

April 11, 2024 at 11:41 AM

Explosion 💥

@explosion.ai

Prodigy provides built-in inter-annotator agreement commands (for tokens and text-level annotations) that you can run directly on your annotated dataset. It also lets you configure custom overlap by specifying the expected number of annotations per example.

April 11, 2024 at 11:41 AM
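
To make "inter-annotator agreement" concrete, here is a small sketch that computes Cohen's kappa for two annotators on text-level labels. Cohen's kappa is chosen here as a common textbook measure; it is an assumption and not necessarily the metric Prodigy's built-in commands report. The label lists are made-up data.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same examples."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set.")
    n = len(labels_a)
    # Observed agreement: share of examples with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label distribution.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


annotator_1 = ["POS", "POS", "NEG", "NEG", "POS", "NEG"]
annotator_2 = ["POS", "NEG", "NEG", "NEG", "POS", "POS"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

Here the annotators agree on 4 of 6 examples, chance agreement is 0.5, and kappa comes out at about 0.33.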

Explosion 💥

@explosion.ai

1️⃣ Dataset development
Data development is an iterative process. It’s good practice to test your initial annotation scheme and guidelines during the pilot phase and measure the inter-annotator agreement.

April 11, 2024 at 11:40 AM

Explosion 💥

@explosion.ai

New Prodigy plugin: prodigy-evaluate!

📈 confusion matrix and per-label stats
🔎 explore examples your model struggles with most
🍬 entity-level insights for NER with MantisNLP's nervaluate library

github.com/explosion/pr...

March 27, 2024 at 11:46 AM
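
As a rough illustration of what a confusion matrix and per-label stats contain, the sketch below computes both for a toy text classification run in plain Python. It does not use the prodigy-evaluate plugin or reproduce its output format; the `per_label_stats` helper and the labels are hypothetical.

```python
from collections import Counter


def per_label_stats(gold, predicted):
    """Return a confusion Counter and per-label precision/recall."""
    confusion = Counter(zip(gold, predicted))  # keys are (gold, predicted) pairs
    stats = {}
    for label in sorted(set(gold) | set(predicted)):
        tp = confusion[(label, label)]
        fp = sum(c for (g, p), c in confusion.items() if p == label and g != label)
        fn = sum(c for (g, p), c in confusion.items() if g == label and p != label)
        stats[label] = {
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
        }
    return confusion, stats


gold_labels = ["A", "A", "B", "B", "B"]
predicted_labels = ["A", "B", "B", "B", "A"]
confusion, stats = per_label_stats(gold_labels, predicted_labels)
print(dict(confusion))
print(stats)
```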

Explosion 💥

@explosion.ai

The SSO plugin is compatible with Prodigy >=1.15.0 and is part of our expanded company license offering, which also includes priority community and email support.

February 19, 2024 at 10:27 AM

Explosion 💥

@explosion.ai

Some data development and annotation projects need top-notch security.

🔒 Introducing the Prodigy Single Sign-On (SSO) plugin
It's the first in a series of premium Prodigy plugins for company licenses.

February 19, 2024 at 10:27 AM

Explosion 💥

@explosion.ai

🗺️ Custom mapping: Instead of using a large skills taxonomy as the NER label scheme, generic skill entities are mapped onto a taxonomy using semantic similarity.

Great work from the Nesta team & thanks to ESCoE for funding!

February 5, 2024 at 2:21 PM
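
The mapping step can be sketched as a nearest-neighbour lookup by cosine similarity. This is only an illustration under stated assumptions: the three-dimensional vectors, the taxonomy labels and the 0.5 threshold are invented, and Nesta's actual embeddings and matching rules are not described in the post.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def map_to_taxonomy(entity_vector, taxonomy_vectors, threshold=0.5):
    """Return (label, score) of the most similar taxonomy entry, or (None, score)."""
    label, score = max(
        ((name, cosine(entity_vector, vec)) for name, vec in taxonomy_vectors.items()),
        key=lambda pair: pair[1],
    )
    return (label, score) if score >= threshold else (None, score)


# Hypothetical toy embeddings; real ones would come from a language model.
taxonomy = {
    "software development": [0.9, 0.1, 0.0],
    "data visualisation": [0.1, 0.9, 0.1],
    "project management": [0.0, 0.1, 0.9],
}
skill_vector = [0.8, 0.2, 0.1]  # stands in for the entity "developing apps"

label, score = map_to_taxonomy(skill_vector, taxonomy)
print(label, round(score, 3))
```

Keeping the NER label generic and doing the taxonomy lookup afterwards means the taxonomy can change without re-annotating or retraining the entity recognizer.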

Explosion 💥

@explosion.ai

✂️ Multiskill splitting: Nesta uses spaCy's dependency parsing to split up multi-skill phrases, like "developing apps and visualizations" into individual skills "developing apps" and "developing visualizations".

February 5, 2024 at 2:20 PM
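
The splitting idea can be sketched without loading a model by writing the dependency parse out by hand. The `(text, dep, head_index)` tuples below imitate the labels a spaCy English parser would typically assign to this phrase; they are an assumption, as is the `split_multiskill` helper, and Nesta's actual rules are likely more involved.

```python
def split_multiskill(tokens):
    """Split a verb phrase with coordinated objects into one skill per object.

    `tokens` is a list of (text, dep, head_index) tuples. Only conjuncts
    attached directly to the direct object are handled in this sketch.
    """
    skills = []
    for index, (text, dep, head) in enumerate(tokens):
        if dep != "dobj":
            continue
        verb = tokens[head][0]
        # The direct object plus any tokens coordinated with it.
        objects = [text] + [t for t, d, h in tokens if d == "conj" and h == index]
        skills.extend(f"{verb} {obj}" for obj in objects)
    return skills


# Hand-written parse of "developing apps and visualizations".
phrase = [
    ("developing", "ROOT", 0),
    ("apps", "dobj", 0),
    ("and", "cc", 1),
    ("visualizations", "conj", 1),
]
skills = split_multiskill(phrase)
print(skills)
```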

Explosion 💥

@explosion.ai

New case study: How the Nesta data science team extracts skills from millions of online job ads to better understand UK skill demand, using spaCy and Prodigy.

A few project highlights in this thread 🧵✨

explosion.ai/blog/nesta-s...

February 5, 2024 at 2:20 PM

Explosion 💥

@explosion.ai

The new entity linking functionality lets you specify a KB and candidate selector. The LLM will then pick the most likely candidate, given the context.

spacy.io/api/large-la...

January 25, 2024 at 10:32 AM

Explosion 💥

@explosion.ai

Out now: spacy-llm v0.7.0!

🔗 Built-in entity linking support
💬 New task for translation from/to arbitrary languages
❓ Use the Doc as prompt for question answering
🧩 Arbitrarily long docs via sharding

github.com/explosion/sp...

January 25, 2024 at 10:31 AM
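
Sharding, in general terms, means splitting a long document into pieces that each fit a size budget, processing the pieces separately and merging the results. The sketch below shows that pattern with whitespace tokens; it is not spacy-llm's implementation, and `shard_text`, `max_tokens` and the word-count "task" are hypothetical stand-ins.

```python
def shard_text(text, max_tokens):
    """Split text into consecutive shards of at most `max_tokens` whitespace tokens."""
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    words = text.split()
    return [
        " ".join(words[start:start + max_tokens])
        for start in range(0, len(words), max_tokens)
    ]


def count_words(shard):
    """Stand-in for a per-shard task such as an LLM call."""
    return len(shard.split())


document = "one two three four five"
shards = shard_text(document, max_tokens=2)
total_words = sum(count_words(shard) for shard in shards)  # merge step
print(shards, total_words)
```

A real pipeline would budget by model tokens rather than whitespace tokens and would need a task-specific merge step, for example combining entity lists from each shard.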

Explosion 💥

@explosion.ai

💌 OUT NOW: The latest edition of our spaCy newsletter, featuring our new Merch Store, the spaCy 3.7 and spacy-llm 0.6 releases, links to our latest talks, Nesta's Skills Extractor library, and a new Prodi.gy blog post on 2023 updates! 🚀

Read & sign up: us12.campaign-archive.com?u=83b0498b1e...

November 30, 2023 at 8:45 PM

Explosion 💥

@explosion.ai

Dealing with a huge bucket of images that you want to annotate? The new image retrieval features in Prodigy-ANN might help!

To help explain this new feature, @koaning.bsky.social made a small demo 👀

youtu.be/vhbyekSsG8o

Prodigy interface with bounding boxes highlighting a laptop and a phone.

October 30, 2023 at 9:44 PM

Explosion 💥

@explosion.ai

🛠️ Improved DX of working with custom CSS/JS by supporting loading from local dirs and remote URLs

Incorporate frameworks like HTMX for a dynamic interface using our latest Custom Events.

prodi.gy/docs/custom-....

October 26, 2023 at 12:50 PM

Explosion 💥

@explosion.ai

💥 Prodigy 1.14.5 is out! 💥 We've focused on the front end 💅
prodi.gy/docs/changelog

☑️ A new toggle between token-based and character-based highlighting in the NER and span UIs: speedy token-based annotations and precise character highlighting! 🚀

October 26, 2023 at 12:50 PM

Explosion 💥

@explosion.ai

That means that Prodigy now has 5 official plugins! The Prodigy Docs have also been updated to reflect this change.

You can see all the details here:
prodi.gy/docs/plugins

October 25, 2023 at 1:53 PM

Explosion 💥

@explosion.ai

Announcing ✨Prodigy-HF ✨

It's a new plugin that allows you to train @huggingface.bsky.social NER models directly on annotated data in Prodigy. It also provides a recipe to upload annotations to the Hugging Face Hub!

October 25, 2023 at 1:52 PM

Explosion 💥

@explosion.ai

For more details on the Prodigy-PDF plugin and other Prodigy plugins, check our docs page: prodi.gy/docs/plugins

October 24, 2023 at 3:08 PM

Explosion 💥

@explosion.ai

Want to annotate PDF files with OCR? Our new Prodigy-PDF plugin can help with that!

To help explain how to use PDF segmentation and OCR, @koaning.bsky.social made a small demo video 👀 www.youtube.com/watch?v=rwyz...

October 24, 2023 at 3:07 PM

Explosion 💥

@explosion.ai

We recently released ✨ Prodigy-ANN ✨, which allows you to use contextual search to find relevant subsets of data to annotate first.

To help explain this new feature, @koaning.bsky.social made a small demo 👀

youtu.be/jyu2nbjwfXw

October 20, 2023 at 1:56 PM
