EAR-MP-2025

Title

Synthetic Training Data Generation

Motivation

Developing high-performance AI requires large amounts of training data. However, such data is hard to obtain for researchers and engineers outside a few large platform companies. In this challenge, participants compete to develop methods for artificially generating training data that can be used for machine learning.

Goal & Methodologies

Generate synthetic (non-replica) versions of datasets that are used as machine learning benchmarks (e.g. adult, census, covtype) and published on Kaggle, the UCI KDD Archive, etc.

Criteria for synthetic data

・The difference in accuracy between an AI model trained on generated synthetic data and an AI model trained on the original data

・The extent to which the generated synthetic data protects the privacy of the original data
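
The two criteria above could be measured in many ways, and the challenge does not prescribe specific metrics. The following is only a minimal sketch under stated assumptions: the accuracy criterion is approximated by training the same model once on original data and once on synthetic data and comparing both on held-out original data, and the privacy criterion is approximated by the distance from each synthetic record to its closest original record, which is a simple heuristic and not a formal privacy guarantee. The toy two-class data, the 1-nearest-neighbour stand-in model, and all function names (`utility_gap`, `copy_rate`, `naive_synthesize`, etc.) are hypothetical and chosen for illustration only.

```python
import math
import random


def predict_1nn(train_X, train_y, x):
    """Stand-in model: label of the training row closest to x."""
    nearest = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return train_y[nearest]


def accuracy(train_X, train_y, test_X, test_y):
    """Fraction of test rows the 1-NN model labels correctly."""
    hits = sum(predict_1nn(train_X, train_y, x) == y for x, y in zip(test_X, test_y))
    return hits / len(test_X)


def utility_gap(real_X, real_y, synth_X, synth_y, test_X, test_y):
    """Accuracy when trained on original data minus accuracy when trained on synthetic data."""
    return accuracy(real_X, real_y, test_X, test_y) - accuracy(synth_X, synth_y, test_X, test_y)


def distance_to_closest_record(synth_X, real_X):
    """For each synthetic row, the Euclidean distance to its nearest original row."""
    return [min(math.dist(s, r) for r in real_X) for s in synth_X]


def copy_rate(synth_X, real_X, tol=1e-9):
    """Share of synthetic rows that (almost) exactly replicate an original row."""
    distances = distance_to_closest_record(synth_X, real_X)
    return sum(d <= tol for d in distances) / len(distances)


def make_blobs(rng, n_per_class):
    """Toy 'original' data: class 0 around (0, 0), class 1 around (3, 3)."""
    X, y = [], []
    for label, center in enumerate([(0.0, 0.0), (3.0, 3.0)]):
        for _ in range(n_per_class):
            X.append((rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)))
            y.append(label)
    return X, y


def naive_synthesize(rng, X, y, n_per_class):
    """Hypothetical generator: sample each feature from a per-class Gaussian fit."""
    synth_X, synth_y = [], []
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) for c, m in zip(cols, means)]
        for _ in range(n_per_class):
            synth_X.append(tuple(rng.gauss(m, s) for m, s in zip(means, stds)))
            synth_y.append(label)
    return synth_X, synth_y


rng = random.Random(0)                      # fixed seed so runs are repeatable
real_X, real_y = make_blobs(rng, 100)       # "original" training data
test_X, test_y = make_blobs(rng, 50)        # held-out original data
synth_X, synth_y = naive_synthesize(rng, real_X, real_y, 100)

gap = utility_gap(real_X, real_y, synth_X, synth_y, test_X, test_y)
synth_copy_rate = copy_rate(synth_X, real_X)
print(f"accuracy gap (original - synthetic): {gap:.3f}")
print(f"share of synthetic rows replicating an original row: {synth_copy_rate:.3f}")
```

A gap close to zero suggests the synthetic data preserves what the model needs, while a copy rate close to zero suggests the generator is not simply replicating original records. For real benchmark datasets, a team would likely replace the stand-in model with the classifier of their choice and complement the distance check with stronger privacy analyses.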

Deliverables

Poster, Presentation Materials

Expected number of team members

2-4 students

Expected duration in months

4-6 months