Walk into any market research conference today, and you’ll hear it—the question that’s dividing our industry down the middle:
“Are surveys dead?”
Some experts are declaring that synthetic data will revolutionize everything we know about consumer research. Others are calling it overhyped nonsense that threatens the very foundation of authentic insights. And honestly? Both sides have compelling arguments.
If you’re a market researcher, brand manager, or insights professional feeling confused about where synthetic data fits into your toolkit, you’re not alone. Let’s cut through the noise and examine what’s really happening in this heated debate.
The Statement That Started a War
The 2024 conference circuit exploded with provocative declarations. “Synthetic data is as good as real!” proclaimed some vendors. “Traditional surveys are obsolete!” shouted others.
The buzz was impossible to ignore. Here’s what sparked the controversy:
69% of market research professionals used synthetic data in their research last year. That’s not a small pilot group—that’s the majority of the industry experimenting with something fundamentally new.
But here’s where it gets interesting: When asked about their experience, only 31% rated the results as “great” when synthetic data was used on its own.
So what’s the truth? Is synthetic data the future of research, or is it fool’s gold?
What Exactly Is Synthetic Data, Anyway?
Before we dive deeper into the debate, let’s make sure we’re all speaking the same language.
Synthetic data is artificially generated information created by AI models to mimic real-world survey responses and consumer behavior. Think of it as AI creating “digital twins” of your target audience—virtual respondents who answer questions based on patterns learned from actual human data.
Here’s how it works in practice:
- Training Phase: AI models analyze vast amounts of real consumer data—past surveys, behavioral data, demographic information, purchase patterns
- Generation Phase: The AI creates synthetic respondents that statistically resemble real people
- Application Phase: These synthetic respondents “answer” your survey questions based on learned patterns
Sounds futuristic, right? It is. And it’s already happening at scale.
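To make those three phases concrete, here's a deliberately simplified sketch in Python. Production systems use large language models or deep generative models rather than simple frequency tables, and every question name and response below is hypothetical, but the train/generate/apply loop has the same shape:

```python
import random
from collections import Counter

# --- Training phase: learn response patterns from real survey data ---
# (Hypothetical toy data; real systems train on far richer inputs.)
real_responses = {
    "would_buy": ["yes", "no", "yes", "maybe", "yes", "no"],
    "price_feel": ["fair", "too_high", "fair", "fair", "too_high", "fair"],
}

def train(real: dict) -> dict:
    """Learn each question's answer distribution from real respondents."""
    model = {}
    for question, answers in real.items():
        counts = Counter(answers)
        total = sum(counts.values())
        model[question] = {ans: n / total for ans, n in counts.items()}
    return model

# --- Generation phase: create synthetic respondents from those patterns ---
def generate(model: dict, n: int) -> list[dict]:
    """Sample n synthetic respondents, question by question."""
    respondents = []
    for _ in range(n):
        respondents.append({
            q: random.choices(list(dist), weights=list(dist.values()))[0]
            for q, dist in model.items()
        })
    return respondents

# --- Application phase: "ask" the synthetic sample your question ---
model = train(real_responses)
synthetic_sample = generate(model, n=1000)
share_yes = sum(r["would_buy"] == "yes" for r in synthetic_sample) / 1000
print(f"Synthetic purchase intent: {share_yes:.0%}")
```

The biggest simplification here: this toy samples each question independently, while real systems have to model the correlations between answers that make a synthetic respondent feel like a coherent person.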
The Case FOR Synthetic Data: Why Believers Are All In
Let’s be fair to both sides. The advocates of synthetic data aren’t crazy—they’re responding to very real pain points in traditional research.
1. Speed That Seems Impossible
Remember when getting survey results took 4-6 weeks? With synthetic data, you can have initial findings in days or even hours.
Real-world example: When EY tested synthetic data against their actual annual brand survey (which typically surveyed CEOs of US companies with over $1 billion in revenue), they got results in days instead of months. The synthetic approach also came in at a fraction of the cost of traditional methods.
That’s not incremental improvement—that’s transformation.
2. The Cost Equation Changes Everything
Traditional surveys are expensive. Between panel recruitment, incentives, data collection, cleaning, and analysis, costs add up quickly. For many companies, this means research becomes a luxury reserved for major decisions only.
Synthetic data flips this model. Once you’ve trained the model, the marginal cost of additional research approaches zero. Suddenly, you can test ten product concepts instead of two. You can run monthly brand trackers instead of annual ones.
For startups and smaller companies especially, this democratizes access to insights that were previously out of reach.
3. Reaching the Unreachable
Try surveying 500 C-suite executives at Fortune 500 companies. Good luck with that—and your budget better have six figures allocated.
Synthetic data can help augment samples from hard-to-reach populations, boosting representation without the nightmare logistics of actually recruiting rare audiences.
4. Privacy Compliance Made Easier
With GDPR enforcement hitting €1.6 billion in fines in 2023 alone, privacy isn’t just a nice-to-have—it’s business-critical.
Synthetic data offers a privacy-preserving alternative. You can analyze patterns and behaviors without handling actual personally identifiable information. For regulated industries like healthcare and finance, this is huge.
5. The EY Study That Turned Heads
Here’s the stat that made even skeptics pause:
When EY compared results from one thousand synthetic personas to their actual survey results, they found a 95% correlation.
Let that sink in. That’s not “close enough”—that’s nearly identical results at a fraction of the time and cost.
This wasn’t some softball comparison either. This was their annual brand survey targeting CEOs—exactly the kind of strategic research where accuracy matters most.
The Case AGAINST Synthetic Data: Why Critics Are Sounding Alarms
Now, before you rush to cancel all your survey panels, let’s hear from the other side. Because the critics aren’t Luddites resisting change—they’re raising legitimate concerns about accuracy, ethics, and what we might lose.
1. The “Overhyped” Reality Check
The Market Research Society released a comprehensive report that didn’t mince words: Claims that synthetic data will completely replace primary data collection are unlikely, and the technology is “currently over-hyped.”
Why the harsh assessment?
Synthetic data is fundamentally retrospective. It can only replicate patterns that already exist in the training data. It cannot capture emerging trends, shifting cultural dynamics, or genuinely novel consumer behaviors.
When the pandemic hit in 2020, no amount of synthetic data could have predicted how dramatically consumer behavior would shift. You needed real people, experiencing real situations, sharing real reactions.
2. The Bias Amplification Problem
Here’s a scary truth: If your training data has biases (and it probably does), your synthetic data will amplify them.
Remember Amazon’s infamous recruiting AI that accidentally learned to prefer male candidates? That’s what happens when bias lurks in training data.
One researcher studying synthetic data outputs discovered that AI models consistently replicate and sometimes magnify the biases present in their source material. An ethical nightmare waiting to happen.
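The amplification mechanism is easy to demonstrate. The toy simulation below (plain Python, hypothetical numbers) starts with a training set where one group is underrepresented, then repeatedly fits a model to synthetic output and regenerates from it, the kind of feedback loop that creeps in whenever synthetic data gets reused as training data:

```python
import random

# A hypothetical training set where one group makes up only 30% of
# respondents. Each "generation" fits the group share to the current
# cohort, then samples a fresh synthetic cohort from that fitted share.
# Because each cohort is finite, the minority share random-walks instead
# of holding steady at 0.30 -- a crude stand-in for bias drift in
# synthetic-data feedback loops.
random.seed(7)
share_minority = 0.30
for generation in range(1, 11):
    cohort = [random.random() < share_minority for _ in range(200)]
    share_minority = sum(cohort) / len(cohort)  # refit on synthetic output
    print(f"generation {generation:2d}: minority share = {share_minority:.2f}")
```

Run it a few times without the seed and you'll see the share wander, and occasionally hit zero. Once it does, the minority group has vanished from every future synthetic cohort.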
3. The Emotion Problem
Synthetic data struggles with something humans excel at: emotional nuance.
As one associate professor studying AI outcomes notes: “Our brains and emotions are highly complex. AI can provide good analysis of what someone has said, but it’s less effective at understanding the emotions that underpin people’s responses.”
When a consumer says they “like” a product, are they:
- Genuinely enthusiastic?
- Politely indifferent?
- Conflicted but hopeful?
- Settling because nothing better exists?
Real human researchers pick up on these subtleties. AI? Not so much.
4. The Missing Context
Here’s what synthetic data can’t capture: The lived experience of being human.
One research agency shared a powerful example: They conducted qualitative research for a global bank in the UAE and discovered that the use of English in campaigns significantly impacted effectiveness across different ethnic groups.
This insight, which led to a successful multilingual, multi-ethnic campaign, came from direct interaction with real people in their real cultural context. No synthetic model trained on historical data would have surfaced this nuanced, market-specific finding.
5. The “Convincing vs. Accurate” Problem
Perhaps the most insidious issue: AI is exceptionally good at producing results that sound right, even when they’re wrong.
As industry experts warn, producing convincing answers is different from providing accurate ones—especially when making business decisions that rely on data integrity.
It’s the difference between a confident liar and a cautious truth-teller. Synthetic data delivers answers with certainty, whether or not those answers reflect reality.
The Data Quality Paradox
Here’s where things get really interesting—and complicated.
When synthetic data performs well, it’s usually because it was trained on high-quality real data. In other words, synthetic data is only as good as the real research that feeds it.
Studies show that when generative AI is trained on primary research results, synthetic datasets perform remarkably well. But when used in isolation, without that real-world foundation, quality plummets.
This creates a paradox: To get good synthetic data, you first need good real data. So synthetic data doesn’t replace traditional research—it augments it.
Or, as one research director put it: “Synthetic data used to be trained on something real. The output is only as good as the input.”
What Industry Leaders Are Actually Doing
So what’s the practical reality on the ground? Most sophisticated research teams aren’t choosing sides—they’re developing hybrid approaches.
The Smart Money Strategy:
- Use traditional research for:
  - Exploring new, undefined territories
  - Understanding emotional drivers and motivations
  - Capturing emerging trends and cultural shifts
  - Strategic decisions with high stakes
  - Initial model training and validation
- Use synthetic data for:
  - Rapid concept testing and iteration
  - Questionnaire testing before field deployment
  - Sample augmentation for rare audiences
  - Scenario simulation and "what-if" modeling
  - High-volume, repetitive tracking studies
- Never use synthetic data alone for:
  - Novel market situations
  - High-stakes business decisions
  - Understanding complex human emotions
  - Detecting emerging trends
  - Final validation before major investments
The Research Role Is Evolving, Not Disappearing
Here’s the thing nobody’s saying loudly enough: Even if synthetic data becomes more accurate, the researcher’s role becomes MORE important, not less.
In the age of synthetic data, researchers must transform from being primarily “data collectors” and “report generators” to becoming:
- Insight Validators: Determining what synthetic findings are trustworthy
- Governance Experts: Setting standards for when and how to use synthetic data
- Quality Controllers: Ensuring outputs meet rigorous standards
- Strategic Interpreters: Providing context AI cannot
Think of it like calculators in mathematics. When calculators arrived, math teachers didn’t become obsolete. They stopped spending time on arithmetic and started teaching higher-order thinking.
The same evolution is happening in market research.
The Validation Non-Negotiable
If there’s one point of universal agreement in this debate, it’s this: Synthetic data findings must be validated before making major decisions.
The most responsible practitioners treat synthetic research outputs as “well-informed hypotheses” rather than ground truth. Before strategic or financial commitments, these hypotheses need validation through:
- Cross-referencing with existing real-world data
- Small-scale traditional research with human participants
- Pilot testing in limited markets
- A/B testing against control groups using conventional methods
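What does that validation gate look like in practice? Here's a minimal sketch, assuming you've measured the same metrics on both a synthetic panel and a small real-world check survey (all names and numbers below are hypothetical). The pattern mirrors the EY-style comparison: compute the correlation between synthetic and real estimates, and refuse to proceed if it misses a preset bar:

```python
import statistics

def validate_against_real(synthetic: list[float], real: list[float],
                          min_corr: float = 0.9) -> bool:
    """Compare synthetic metric estimates to real-world benchmarks.

    Returns True only when the Pearson correlation clears a preset bar,
    in the spirit of EY's 95%-correlation comparison.
    """
    corr = statistics.correlation(synthetic, real)  # Python 3.10+
    print(f"Pearson correlation: {corr:.2f}")
    return corr >= min_corr

# Hypothetical brand-metric scores from a synthetic panel vs. a small
# real-world validation survey (same five questions, same scale).
synthetic_scores = [4.1, 3.8, 2.9, 4.5, 3.2]
real_scores      = [4.0, 3.6, 3.1, 4.4, 3.0]
if not validate_against_real(synthetic_scores, real_scores):
    print("Do not act on these findings without more primary research.")
```

The threshold itself is a judgment call; the point is that the gate is automatic and non-negotiable, not that 0.9 is the magic number.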
Companies that skip this validation step are playing Russian roulette with their business decisions.
The Regulatory Gray Zone
Here’s something that should make everyone nervous: Some universities and research institutions are waiving ethical review requirements for studies using synthetic data instead of human data.
The logic? Since synthetic data doesn’t involve actual humans, traditional protections around participant rights, safety, and dignity don’t apply.
But this creates a troubling precedent. What happens when synthetic data claims to represent vulnerable populations but perpetuates harmful stereotypes? Who’s accountable?
Industry experts are calling for reporting standards for synthetic data, similar to those that exist for data and code availability. We need guardrails before problems multiply.
The Financial Stakes Are Enormous
Let’s talk money for a moment, because that’s ultimately what’s driving much of this debate.
The synthetic data generation market is exploding:
- Market size projected to reach $9.3 billion by 2032
- Growing from just $0.4 billion in 2024
- Driven by both legitimate innovation and aggressive vendor hype
Traditional research companies see synthetic data as both an opportunity and an existential threat. DIY research platforms have already surpassed $3.5 billion with “no signs of saturation.”
The companies that figure out how to integrate synthetic capabilities while maintaining quality standards will thrive. Those that resist all change—or those that sacrifice accuracy for speed—will struggle.
The Gartner Prediction Everyone’s Watching
Perhaps the most talked-about forecast: Gartner estimates synthetic data will overshadow real data for training AI models by 2030.
That’s just five years away.
But note what Gartner actually said: “for training AI models.” They didn’t say “for all market research” or “for strategic business decisions.” The nuance matters.
Synthetic data will likely dominate certain applications while remaining inappropriate for others. The winners will be those who understand the difference.
So… Are Surveys Actually Dead?
After examining both sides, here’s my take:
No, surveys aren’t dead. But they’re evolving.
The provocative “Are surveys dead?” question is designed to generate clicks and conference attendance (guilty as charged). The reality is far more nuanced.
What’s actually happening:
- Long, boring surveys are dying (and good riddance)
- One-size-fits-all research is dying (we won’t miss it)
- Expensive, slow, inefficient research processes are dying (overdue)
But asking people questions to understand their needs, behaviors, and motivations? That’s not dying. It’s being augmented, enhanced, and made more efficient.
The Hybrid Future We’re Already Living
The real answer isn’t “synthetic OR traditional.” It’s “synthetic AND traditional, applied strategically.”
Here’s what best-in-class research looks like in 2025:
Phase 1: Use synthetic data for rapid ideation and concept screening
Phase 2: Validate promising concepts with small-scale human research
Phase 3: Use synthetic data to optimize and refine based on human feedback
Phase 4: Launch with traditional tracking to monitor real-world performance
Phase 5: Feed real performance data back into synthetic models for continuous improvement
It’s a virtuous cycle where synthetic speed and traditional depth reinforce each other.
What You Should Do Right Now
Whether you’re a research professional, brand manager, or insights leader, here’s my practical advice:
1. Experiment, But Don't Bet the Farm: Run synthetic data pilots on low-stakes projects. Learn its strengths and limitations firsthand. But don't base your company's strategic direction on synthetic insights alone.
2. Build Validation Into Your Process: Make validation mandatory for synthetic findings before major decisions. No exceptions. Ever.
3. Invest in Training: Your team needs to understand both traditional research methods AND new AI capabilities. The future belongs to bilingual researchers who speak both languages fluently.
4. Question the Hype: When vendors promise synthetic data will solve all your problems, ask for evidence. Request case studies. Demand transparency about accuracy rates and limitations.
5. Prioritize Quality Over Speed: Yes, synthetic data is fast. But fast, wrong answers are worse than slow, right ones. Always.
6. Champion Ethics: Push for industry standards around synthetic data use. We need guardrails before we have disasters.
The Bottom Line
The synthetic data debate isn’t really about whether technology can replace humans in market research. It’s about how we adapt our methods as capabilities evolve.
The truth both sides can agree on:
- Synthetic data has legitimate, valuable use cases
- It also has serious limitations that shouldn’t be ignored
- The future involves both synthetic and traditional methods
- Researchers who adapt will thrive; those who don’t will struggle
The industry isn’t dying—it’s being reborn. And that’s uncomfortable, messy, and ultimately exciting.
Are surveys dead? No. But the survey you conducted in 2015 definitely is. And it should be.
The question isn’t whether to embrace synthetic data or reject it. The question is: How do we use it responsibly to deliver better insights while maintaining the rigor and accuracy that makes research valuable in the first place?
That’s the debate we should be having.