Data Collection And Sampling

Collect the best data to improve your models

By focusing data collection and labeling on your highest-value data, you can improve your model more, in less time and at lower labeling cost, than with random sampling.

Fix bad labels

  • Export bad data to labeling providers for correction. Aquarium's integrations make this as easy as clicking a button.
  • Have confidence in your data quality. Be sure that your model is being trained and evaluated on clean data.

Figure: Exporting data to a labeling provider

Find the best data to label next

  • Target your data collection toward the datapoints that do the most to improve your model.
  • Find more examples of rare scenarios or situations where your model struggles.
  • Submit datapoints to your labeling system so your model accuracy improves on the next training run.
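
One simple way to picture "find where your model struggles" is to rank unlabeled datapoints by how unsure the model is about them and send the least confident ones to labeling first. The sketch below is an illustration only, not Aquarium's method or API: the function `select_for_labeling`, the `predictions` mapping, and the datapoint ids are all hypothetical, and it assumes each datapoint comes with a single top-class confidence score between 0 and 1.

```python
def select_for_labeling(predictions, budget):
    """Return the ids of the `budget` least-confident datapoints.

    `predictions` maps a datapoint id to the model's top-class confidence
    (a float in [0, 1]). Hypothetical helper, not part of any real API.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    # Sort ascending by confidence so the datapoints the model is least
    # sure about come first; ties are broken by id for determinism.
    ranked = sorted(predictions.items(), key=lambda item: (item[1], item[0]))
    return [datapoint_id for datapoint_id, _ in ranked[:budget]]


# Made-up confidences for four unlabeled images.
predictions = {"img_001": 0.98, "img_002": 0.41, "img_003": 0.77, "img_004": 0.35}
queue = select_for_labeling(predictions, budget=2)
print(queue)  # ['img_004', 'img_002']
```

Low confidence is only one possible signal. Disagreement between model versions or errors on a labeled validation set could be used to rank datapoints in the same way.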

Search your unlabeled datasets

  • Approach data sampling as a search problem. Index millions of unlabeled datapoints from your production environment for targeted data collection.
  • Reduce the operational time spent collecting data in the field.
  • Run fine-grained searches on your data. Embedding similarity lets you find specific datapoints of interest in large pools of data.
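
To make the idea of embedding similarity search concrete, here is a minimal sketch that ranks a small pool of unlabeled datapoints by cosine similarity to a query embedding. It is an illustration under stated assumptions, not Aquarium's implementation: the functions `cosine_similarity` and `most_similar`, the three-dimensional vectors, and the frame ids are all hypothetical, and real embeddings would come from your own model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def most_similar(query, pool, k):
    """Return the ids of the k pool embeddings closest to the query."""
    scored = [(cosine_similarity(query, emb), dp_id) for dp_id, emb in pool.items()]
    # Highest similarity first; ties are broken by id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [dp_id for _, dp_id in scored[:k]]


# Made-up embeddings for four unlabeled frames.
pool = {
    "frame_a": [0.9, 0.1, 0.0],
    "frame_b": [0.0, 1.0, 0.0],
    "frame_c": [0.8, 0.2, 0.1],
    "frame_d": [0.0, 0.0, 1.0],
}
query = [1.0, 0.0, 0.0]  # embedding of an example you want more of
print(most_similar(query, pool, k=2))  # ['frame_a', 'frame_c']
```

This sketch scans every datapoint, which is fine for a small pool. At the scale of millions of datapoints, an approximate nearest-neighbor index is the usual substitute for the linear scan.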

Collaborate on data collection

  • Smart data collection: Use data quality analysis and model evaluation work to drive data collection. Get more model improvement for less labeling cost than random sampling.
  • Close the feedback loop between your model outputs and data inputs. Align your modeling team and your data team in improving model performance.

Get in touch

Schedule time to get started with Aquarium