EAR-MP-2025

Title

Gourmet Plate Transformer (GPT): Food photograph-based recipe generation and Q&A with VLMs

Summary

(L1) Train an accurate food photo classification model that identifies the food class and retrieves a matching recipe from a large dataset.

(L2) Train a classifier as in (L1), then use a RAG (Retrieval-Augmented Generation)-based approach to prompt an LLM to generate the food recipe.

(L3) Using the LLM pipeline from (L2), construct a chatbot that can answer your questions while cooking.

(L4) Using VLMs, perform supervised finetuning (SFT) on a large-scale food photograph and recipe dataset to construct an end-to-end recipe generation model.

(L5) Treating step-by-step recipe instructions as the reasoning process for cooking a meal, try a chain-of-thought (CoT)-based approach for "zero-shot" recipe generation from images.
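
To make the retrieval and prompting flow in (L1) and (L2) more concrete, here is a minimal sketch in Python. It is not the project's implementation: the classifier is replaced by a hard-coded label, the recipe store is a made-up three-item list rather than Recipe1M or RecipeNLG, and retrieval uses simple token overlap instead of a learned embedding. The names `Recipe`, `retrieve_recipes`, and `build_rag_prompt` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Recipe:
    title: str
    ingredients: list[str]
    steps: list[str]


# Toy stand-in for a large recipe corpus; entries are made up for illustration.
RECIPES = [
    Recipe("tomato basil pasta", ["pasta", "tomato", "basil"],
           ["Boil pasta.", "Toss with sauce."]),
    Recipe("chicken curry", ["chicken", "curry paste", "coconut milk"],
           ["Brown chicken.", "Simmer in sauce."]),
    Recipe("creamy mushroom pasta", ["pasta", "mushroom", "cream"],
           ["Boil pasta.", "Cook mushrooms in cream."]),
]


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-overlap similarity in [0, 1]."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve_recipes(label: str, store: list[Recipe], k: int = 2) -> list[Recipe]:
    """Return the k recipes whose titles best match the predicted food class."""
    query = set(label.lower().split())
    ranked = sorted(
        store,
        key=lambda r: jaccard(query, set(r.title.split())),
        reverse=True,
    )
    return ranked[:k]


def build_rag_prompt(label: str, retrieved: list[Recipe]) -> str:
    """Assemble the retrieved recipes into context for an LLM prompt."""
    blocks = []
    for i, r in enumerate(retrieved, start=1):
        blocks.append(
            f"[{i}] {r.title}\n"
            f"Ingredients: {', '.join(r.ingredients)}\n"
            f"Steps: {' '.join(r.steps)}"
        )
    context = "\n\n".join(blocks)
    return (
        f"The photo was classified as: {label}\n\n"
        f"Reference recipes:\n{context}\n\n"
        "Write a step-by-step recipe for this dish, "
        "using the references where relevant."
    )


predicted_label = "pasta"  # assumed classifier output from (L1)
top = retrieve_recipes(predicted_label, RECIPES)
prompt = build_rag_prompt(predicted_label, top)
print(prompt)
```

The resulting `prompt` string would be passed to whichever LLM the team selects. For (L5), the final instruction line could be swapped for one that asks the model to reason through the cooking steps before giving the recipe.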

Deliverables

Poster; a demo if time permits

Expected number of team members

2-4 students

Expected duration in months

4-6 months

Data sets

Recipe1M, RecipeNLG, and, if possible, additional food photographs and recipes collected from the web