Peter Gao
April 24, 2023

Introducing Performance Cluster AutoSegments

Understanding why a machine learning model is underperforming can be difficult. We built Performance Cluster AutoSegments to make this easier by:

  • Automatically surfacing top failure cases in the labeled dataset,
  • Automatically clustering the frames and labels that contribute to a given failure case.

Performance Cluster AutoSegments are a new type of AutoSegment in Aquarium. By combining traditional model performance metrics with advanced embedding analysis, Aquarium can automatically surface the top problems in your dataset and help you prioritize which actions to take to get the most improvement to your model performance.

Connect model performance with clusters of related data in embedding space.

Solving for Model Failure Modes with Targeted Data Collection

Aquarium provides an end-to-end workflow for evaluating your models' performance and performing targeted data collection to solve for specific failure modes or edge cases.

This cycle of evaluating a model, identifying failure modes, collecting relevant data, and retraining the model with this additional context is a robust, repeatable way to improve the model's performance on the scenarios that matter to your business.

Aquarium supports the targeted data collection cycle with a number of capabilities:

Fine-tuned Embedding Models for Any Domain

Aquarium will train a custom embedding model for each of your ML tasks. We also support customer-provided embeddings if you'd prefer to bring your own. High-quality embeddings are a key enabler for similarity search and other embedding-based dataset analysis.

Analytical Tools for Understanding Models and Datasets

Aquarium offers a number of analytical tools to explore your imagery, labels, metadata and models via a no-code-required web app.

Now, with AutoSegments, we also support automation for certain high-impact workflows, empowering data teams and reducing the time required from domain or ML experts.

High Performance Similarity Search

Aquarium's Collection Campaigns enable embedding-based similarity search either within customers' labeled datasets, or across their entire unlabeled data corpus.
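
As a rough illustration of what embedding-based similarity search involves, the sketch below ranks items in a small corpus by cosine similarity to a query embedding. It is a minimal sketch under stated assumptions, not Aquarium's implementation: the function names, the frame IDs, and the two-dimensional vectors are all hypothetical, and a production system would typically use an approximate nearest-neighbor index rather than a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query, corpus, k=3):
    """Return the IDs of the k corpus items whose embeddings are closest to the query."""
    scored = [(cosine_similarity(query, vec), item_id) for item_id, vec in corpus.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [item_id for _, item_id in scored[:k]]


# Hypothetical toy embeddings keyed by frame ID.
corpus = {
    "frame_a": [1.0, 0.0],
    "frame_b": [0.9, 0.1],
    "frame_c": [0.0, 1.0],
}
nearest = most_similar([1.0, 0.0], corpus, k=2)
print(nearest)
```

Here the query points in the same direction as `frame_a`, so that frame ranks first, followed by the nearly parallel `frame_b`.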

Bi-directional Integration Into Your Data Stack

Aquarium's Python client, webhook integrations, and structured data exports allow customers to integrate Aquarium directly into their data management systems, labeling providers, model training stack, or anything else. Whether you use Aquarium or something else as your system of record for your ML data, bi-directional integration tools make it easy to keep everything in sync.

AutoSegments Speed Up the Data Collection Cycle

With AutoSegments, teams can run through the full cycle much more quickly. By automatically identifying the top failure modes and their associated data, Aquarium frees up ML engineers' and domain experts' time to work on other important problems.

Running this cycle faster means shipping more performant models, more often.

See the diagrams below for where AutoSegments fit into the process and how they save time.

Before AutoSegments, ML Engineers and domain experts spend significant time identifying patterns of failures.
With AutoSegments, Aquarium automatically identifies the top failure patterns for the model tied to specific clusters of data.

Understanding Failure Modes In The RarePlanes Dataset

Here's a practical example: using Performance Clusters to identify a consistent model failure mode on the boundary between two classes, then sending data from that class boundary for curation via a Collection Campaign.

Dataset Context

The open-source RarePlanes dataset, provided by CosmiQ Works, includes satellite imagery of aircraft with annotations classifying them by role.

Aquarium generated fine-tuned image and object-level embeddings on the dataset and trained a detection model for the aircraft.

In the example below:

  • We identify an issue with the model's performance using the confusion matrix - it's incorrectly classifying certain small aircraft as medium aircraft.
  • We condense the problem into a specific cluster of data representing an edge case between two classes - light jet aircraft on the class boundary between small and medium aircraft make up the majority of the confusions.
  • We queue that cluster of the dataset for data collection, setting up similarity search into our unlabeled set to find more representative examples on the class boundary.
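
The first step above, spotting the dominant confusion for a class, can be sketched in a few lines. This is a minimal illustration and not how Aquarium computes its confusion matrix: the `top_confusion` helper is hypothetical, and the counts are made-up toy numbers rather than results from RarePlanes.

```python
from collections import Counter


def top_confusion(pairs, true_class):
    """Return ((ground_truth, prediction), count) for the most frequent
    misclassification of `true_class`, or None if it is never confused."""
    counts = Counter(
        (gt, pred) for gt, pred in pairs if gt == true_class and pred != gt
    )
    return counts.most_common(1)[0] if counts else None


SMALL = "Small Civil Transport"
MEDIUM = "Medium Civil Transport"

# Toy (ground truth, prediction) pairs; the counts are illustrative only.
pairs = (
    [(SMALL, SMALL)] * 50
    + [(SMALL, MEDIUM)] * 12
    + [(MEDIUM, SMALL)] * 3
    + [(MEDIUM, MEDIUM)] * 40
)

worst = top_confusion(pairs, SMALL)
print(worst)
```

The returned pair identifies which off-diagonal cell of the confusion matrix to drill into next.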

Retraining with the newly collected and labeled data should help differentiate small and medium aircraft and reduce this failure mode in the next iteration of the model.

Identifying and Actioning a Class Boundary Issue in Less Than 20 Seconds - Step by Step

  1. Using the confusion matrix, select the most significant confusion for the Small Civil Transport label class.
  2. Aquarium automatically clusters these confusions in embedding space and ranks the clusters by severity.
  3. Choose the most severe cluster of failures (42 of the 112 elements in the cluster represent the confusion).
  4. Look at the specific labels that make up the selected cluster(s).
  5. In this case, the Small Civil Transport aircraft that the model incorrectly classifies as Medium Civil Transport are primarily light jets. Because of their size and role, these aircraft often sit on the boundary between the Small and Medium Civil Transport classes.
  6. Transfer all elements in the cluster to a Collection Campaign, where they can be used to search for similar unlabeled data. Adding more labeled examples to the training dataset and retraining the model is one possible way to improve the detector's performance on the light jet aircraft subset of the Small Civil Transport class.

Root-cause which clusters of related data drive model confusions.
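
To make the ranking in steps 2 and 3 concrete, the sketch below scores clusters by the share of their elements that exhibit the selected confusion. The severity definition is an assumption made for illustration, since the metric Aquarium actually uses is not described here. The `rank_clusters_by_severity` helper and the cluster IDs are hypothetical, and the cluster assignments are taken as given (they could come from any clustering of the embeddings). The first toy cluster mirrors the 42-of-112 example above.

```python
from collections import defaultdict


def rank_clusters_by_severity(records, gt_class, pred_class):
    """records: iterable of (cluster_id, ground_truth, prediction).
    Returns (cluster_id, confused, total, severity) tuples, most severe first,
    where severity = confused / total (an assumed definition)."""
    totals = defaultdict(int)
    confused = defaultdict(int)
    for cluster_id, gt, pred in records:
        totals[cluster_id] += 1
        if gt == gt_class and pred == pred_class:
            confused[cluster_id] += 1
    ranked = [
        (cid, confused[cid], totals[cid], confused[cid] / totals[cid])
        for cid in totals
    ]
    # Sort by severity, breaking ties by the raw number of confusions.
    ranked.sort(key=lambda row: (row[3], row[1]), reverse=True)
    return ranked


SMALL = "Small Civil Transport"
MEDIUM = "Medium Civil Transport"

# Toy data: cluster "a" has 42 confusions among 112 elements, cluster "b" has 5 among 100.
records = (
    [("a", SMALL, MEDIUM)] * 42
    + [("a", SMALL, SMALL)] * 70
    + [("b", SMALL, MEDIUM)] * 5
    + [("b", SMALL, SMALL)] * 95
)

ranked = rank_clusters_by_severity(records, SMALL, MEDIUM)
for cluster_id, n_confused, n_total, severity in ranked:
    print(f"{cluster_id}: {n_confused}/{n_total} = {severity:.3f}")
```

With this definition, cluster "a" scores 42/112 = 0.375 and ranks ahead of cluster "b" at 0.05, so it would be the first candidate to send to a Collection Campaign.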

Getting Started

Targeted data sampling is one of the most effective ways to solve for edge cases and other complex failure modes with ML models. Now, with AutoSegments, Aquarium makes the entire collection process faster than ever.

If you'd like to get started, reach out to your Aquarium Customer Support representative or contact us.