Data Collection And Sampling

Collect the right data faster

Trawl through massive unlabeled datasets with embedding search to quickly bootstrap new classes and improve performance on edge cases.

Find the best data to label next

  • Target your data collection at the datapoints most likely to improve your model.
  • Find more examples of rare scenarios or situations where your model struggles.
  • Submit datapoints to your labeling system so your model accuracy improves on the next training run.
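
One common way to decide which datapoints are "most helpful" is uncertainty sampling: label first the examples where the model is least sure of its answer. The sketch below is an illustration under that assumption, not a description of how Aquarium ranks data. The names `margin`, `select_for_labeling`, and the `dp_*` IDs are hypothetical, and each datapoint is assumed to carry at least two class probabilities.

```python
from typing import Dict, List, Sequence


def margin(probs: Sequence[float]) -> float:
    """Gap between the two highest class probabilities (small gap = uncertain)."""
    top_two = sorted(probs, reverse=True)[:2]
    return top_two[0] - top_two[1]


def select_for_labeling(predictions: Dict[str, Sequence[float]], budget: int) -> List[str]:
    """Return up to `budget` datapoint IDs, most uncertain first."""
    ranked = sorted(predictions, key=lambda dp_id: margin(predictions[dp_id]))
    return ranked[:budget]


# Hypothetical model outputs: datapoint ID -> class probabilities.
predictions = {
    "dp_1": [0.98, 0.01, 0.01],  # confident prediction
    "dp_2": [0.40, 0.35, 0.25],  # top two classes nearly tied
    "dp_3": [0.55, 0.44, 0.01],  # also close
    "dp_4": [0.70, 0.20, 0.10],
}

to_label = select_for_labeling(predictions, budget=2)
print(to_label)
```

The resulting ID list is what you would hand to your labeling system; how that hand-off happens depends on the system and is left out here.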

Get value quickly

  • Bootstrap new subclasses with only a handful of examples using Aquarium’s few-shot learning technology.
  • Aquarium handles embedding infrastructure for you so you can focus on improving your models.
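
To make the few-shot idea concrete, here is a minimal sketch of one generic approach: average the embeddings of a handful of seed examples into a prototype, then propose unlabeled datapoints whose embeddings sit close to it. This is a stand-in technique (nearest-prototype matching), not Aquarium's actual implementation. The function names, the `img_*` IDs, the 2-dimensional toy embeddings, and the 0.9 threshold are all assumptions chosen for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise average of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def propose_candidates(
    seed_embeddings: Sequence[Sequence[float]],
    pool: Dict[str, Sequence[float]],
    threshold: float = 0.9,
) -> List[Tuple[str, float]]:
    """Rank pool items by similarity to the seed prototype, keeping those above threshold."""
    prototype = mean_vector(seed_embeddings)
    scored = [(dp_id, cosine(prototype, emb)) for dp_id, emb in pool.items()]
    hits = [item for item in scored if item[1] >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Two hand-picked examples of the new subclass (toy embeddings).
seeds = [[1.0, 0.0], [0.8, 0.2]]

# Unlabeled pool: datapoint ID -> embedding.
pool = {
    "img_a": [0.95, 0.05],
    "img_b": [0.10, 0.90],
    "img_c": [0.70, 0.30],
}

candidates = propose_candidates(seeds, pool)
print([dp_id for dp_id, _ in candidates])
```

In practice a person would review the proposed candidates before they are treated as labeled examples of the new subclass.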

Search your unlabeled datasets

  • Approach data mining as a search problem. Index hundreds of millions of unlabeled datapoints from your production environment for targeted data collection.
  • Run fine-grained search on your data. Embedding similarity lets you find specific datapoints of interest in large pools of data.
  • Reduce the operational time spent collecting data in the field.
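
The search described above boils down to nearest-neighbor lookup in embedding space. The sketch below shows the idea with a brute-force cosine-similarity scan over a tiny in-memory pool. The function names, `frame_*` IDs, and 3-dimensional embeddings are hypothetical, and nothing here reflects Aquarium's own API. At the scale of hundreds of millions of datapoints, a linear scan like this would normally be replaced by an approximate nearest-neighbor index.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(
    query: Sequence[float],
    pool: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k pool entries most similar to the query, best match first."""
    scored = [(dp_id, cosine_similarity(query, emb)) for dp_id, emb in pool.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy unlabeled pool: datapoint ID -> embedding.
unlabeled_pool = {
    "frame_001": [0.9, 0.1, 0.0],
    "frame_002": [0.0, 1.0, 0.0],
    "frame_003": [0.8, 0.2, 0.1],
    "frame_004": [0.0, 0.1, 0.9],
}

# Embedding of a datapoint of interest, for example a known failure case.
query_embedding = [1.0, 0.0, 0.0]

results = top_k_similar(query_embedding, unlabeled_pool, k=2)
print([dp_id for dp_id, _ in results])
```

Querying with the embedding of a rare scenario or a known model failure returns the unlabeled datapoints that look most like it, which is what makes the search targeted.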

Collaborate on data collection

  • Smart data collection: Use data quality analysis and model evaluation to drive data collection, and improve your model 10x faster.
  • Close the feedback loop between your model outputs and data inputs. Align your modeling and data teams around improving model performance.

Get in touch

Schedule time to get started with Aquarium