Apr 15

CSC Workshop: Build Your Own Data Factory: AI Agents That Generate and Validate Data

April 15, 2026, 5:00 PM - 7:00 PM (America/New_York)
Barnard College

Most of us use ChatGPT to generate text. But large language models can also produce structured, typed outputs, such as JSON with defined fields and constraints, which makes them far more useful for building real systems.

In this workshop, participants build a simple two-agent pipeline: one agent generates synthetic data records, and another reviews them for quality. Along the way, we explore structured LLM outputs, generator–validator loops, and multi-agent design patterns that are quickly becoming core building blocks of production AI.
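For a rough sense of what a generator–validator loop looks like, here is a minimal Python sketch. It is not the workshop's actual code: the `PatientRecord` schema, its field limits, and all function names are hypothetical, and the generator is a random stub standing in for an LLM call so the example runs without any API key.

```python
import json
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical schema: the fields and limits below are assumptions for illustration.
ALLOWED_TRIAGE = {"low", "medium", "high"}


@dataclass
class PatientRecord:
    age: int
    triage: str


def generate_record(rng: random.Random) -> str:
    """Stand-in for the generator agent (a real one would call an LLM).

    Returns a JSON string and deliberately produces some invalid records.
    """
    age = rng.choice([34, 71, -5, 240])
    triage = rng.choice(["low", "medium", "high", "unknown"])
    return json.dumps({"age": age, "triage": triage})


def validate_record(raw: str) -> Tuple[Optional[PatientRecord], List[str]]:
    """Stand-in for the validator agent: parse the JSON and check constraints."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None, ["not valid JSON"]
    if not isinstance(data, dict):
        return None, ["top-level value must be an object"]

    errors: List[str] = []
    age = data.get("age")
    triage = data.get("triage")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(age, int) or isinstance(age, bool) or not 0 <= age <= 120:
        errors.append("age must be an integer in [0, 120]")
    if not isinstance(triage, str) or triage not in ALLOWED_TRIAGE:
        errors.append("triage must be one of low, medium, high")
    if errors:
        return None, errors
    return PatientRecord(age=age, triage=triage), []


def build_dataset(n: int, seed: int = 0, max_attempts: int = 1000) -> List[PatientRecord]:
    """Keep generating until n records pass validation or attempts run out."""
    rng = random.Random(seed)
    accepted: List[PatientRecord] = []
    attempts = 0
    while len(accepted) < n and attempts < max_attempts:
        attempts += 1
        record, _errors = validate_record(generate_record(rng))
        if record is not None:
            accepted.append(record)
    return accepted


if __name__ == "__main__":
    for rec in build_dataset(3):
        print(rec)
```

This sketch simply discards rejected records. A fuller loop might pass the validator's error messages back to the generator so it can retry with that feedback.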

Why synthetic data? Realistic datasets are often paywalled, privacy-restricted, expensive to annotate, or unavailable in emerging domains. In fields like clinical AI, synthetic data offers a practical alternative, and in some cases models trained on it can even outperform models trained on real data.

We begin with templates from healthcare, civic tech, and humanitarian aid, then invite you to design your own schema for any domain.

Open to all; basic familiarity with Python and Jupyter or Colab is recommended.

About Shayan:
Shayan Chowdhury (Columbia '26, CS & Policy) has worked on medical AI research at Harvard Med, disaster relief coordination in 38+ countries through his nonprofit Reach4Help in partnership with the UN and Google, and COVID-19 data infrastructure for the Bangladesh government. The common thread is using data and AI to make systems work for people who usually don't get a seat at the table. He'll kick off with a 30-minute talk on his journey and on how synthetic data and multi-agent systems show up in real research and production, before we get into hands-on coding. In his free time, he plays guitar and sings jazz-pop mashups of Frank Sinatra and The Weeknd that absolutely no one asked for.


This workshop is planned to take place in person only (Milstein 516).

We look forward to seeing you there!

For more information about the Barnard CSC, go to https://www.csc.barnard.edu or follow us on Instagram and X (@barnard_csc).