Cogentix Research

EY’s Shocking Discovery: 95% Correlation Between Synthetic and Real Data – What This Means for Market Research

When Carrie Clayton-Hine, EY’s Chief Marketing Officer, first saw the results, she couldn’t believe her eyes.

Her team had just completed EY’s annual brand survey—a comprehensive study targeting CEOs of companies with over $1 billion in revenue. The kind of research that takes months to execute, costs six figures, and forms the foundation of strategic marketing decisions.

Then, a synthetic data company called Evidenza approached with an audacious claim: “Give us your survey questions. We’ll create artificial respondents and give you the same results in days, not months.”

Clayton-Hine was skeptical. Who wouldn’t be? But what happened next sent shockwaves through the market research industry.

“It was astounding that the matches were so similar,” she told Adweek. “I mean, it was 95% correlation.”

Let that sink in. Not 70%. Not 80%. Ninety-five percent.

For a fraction of the cost. In days instead of months. Using synthetic respondents who never actually existed.

This single study has become the most cited evidence in the synthetic data debate—and it’s forcing every research professional to confront an uncomfortable question: If AI can replicate real human insights with 95% accuracy, what does that mean for the future of our industry?

The Study That Changed Everything

Before we dive into the implications, let’s understand exactly what happened in this groundbreaking test.

The Setup

EY runs an annual brand survey that’s strategically critical to their marketing efforts. The survey targets one of the most difficult audiences to reach: CEOs at large companies (those with over $1 billion in revenue).

The traditional approach involved:

  • Identifying and recruiting CEOs from the target profile
  • Conducting both qualitative and quantitative research
  • Managing complex logistics and busy executive schedules
  • Processing and analyzing responses
  • Timeline: Several months
  • Cost: Six figures

When Evidenza proposed replicating this research using synthetic respondents, EY had nothing to lose. They’d already completed the real survey, so it became the perfect controlled experiment.

The Process

Here’s how Evidenza approached it:

  1. Profile Matching: They created synthetic customers (in this case, synthetic CEOs) to match the exact profile and sample size of EY’s actual survey
  2. Question Administration: They asked the synthetic respondents the same qualitative and quantitative questions EY had used
  3. Data Analysis: They analyzed the synthetic responses using the same methodology
  4. Results Delivery: They sent back findings in just a few days

No panel recruitment. No scheduling conflicts. No respondent fatigue. Just AI-generated insights based on learned patterns from similar populations.

The Stunning Results

When the teams compared the synthetic results to the actual CEO responses, the correlation was 95%.

The synthetic data didn’t just come close—it nearly perfectly replicated what real CEOs said about EY’s brand, positioning, and market presence.

Why This Number Is Controversial (And Misunderstood)

Before the synthetic data evangelists start celebrating and traditional researchers start panicking, we need to talk about what 95% correlation actually means—and what it doesn’t.

What It Means:

The patterns matched. On aggregate measures—brand perception, attribute ratings, preference rankings—the synthetic data showed the same statistical patterns as the real responses.

The methodology worked. For this specific type of research (brand perception among a well-defined audience), synthetic data proved capable of replicating real-world findings.

Speed and cost advantages are real. Days versus months. Five figures versus six. These aren’t marginal improvements—they’re transformational.
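To make the headline number concrete: a figure like this is most naturally read as a Pearson correlation computed across matched aggregate measures (attribute means, preference shares, and so on) from the real and synthetic samples. Here is a minimal sketch; the ratings below are invented for illustration and are not EY's actual data.

```python
# Hypothetical illustration: Pearson correlation between real and synthetic
# aggregate survey scores. All numbers are invented, NOT EY's results.

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Mean ratings on a handful of brand attributes (1-10 scale), real vs synthetic.
real_panel      = [7.2, 6.8, 8.1, 5.9, 7.5, 6.4]
synthetic_panel = [7.0, 6.9, 8.3, 5.7, 7.6, 6.1]

r = pearson_r(real_panel, synthetic_panel)
print(f"correlation r = {r:.2f}")
print(f"variance explained r^2 = {r * r:.2f}")
```

One caveat worth noting: correlation measures whether the *patterns* move together, not whether every value matches, and an r of 0.95 corresponds to roughly 90% of variance explained (0.95² ≈ 0.90), which is part of why the remaining gap is not trivial.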

What It Doesn’t Mean:

It’s not perfect. That 5% gap isn’t trivial. On critical decisions, even small differences can matter significantly.

It’s not universal. This study tested brand perception among CEOs. That’s different from understanding emerging consumer trends, exploring emotional drivers, or capturing nuanced cultural insights.

The synthetic data still needed real data to train on. This is crucial: The AI models that generated these synthetic CEOs were trained on vast amounts of real executive data. Without that foundation, the 95% correlation would be impossible.

The Provocative Argument: Were the Humans Wrong?

Here’s where things get really interesting—and controversial.

One observer made a striking point about the EY study: “The 95% correlation does not necessarily mean that synthetic data has 5% of ground left to cover. It’s more likely that the inherent sampling errors, subject distractions and signalling biases mean it is the human subjects who were off the pace.”

Think about that for a moment.

Real executives get bored. Survey research shows that CEOs lose patience with questionnaires longer than 20 questions. Their attention wanders. They rush through sections.

Real respondents have bad days. They answer surveys when they’re stressed, distracted, or multitasking. Their mood affects their responses.

Real people give socially desirable answers. They want to appear smart, fair, forward-thinking. They signal rather than reveal their authentic views.

Synthetic customers never falter. They don’t get tired. They don’t misread questions. They don’t give inconsistent answers because they’re thinking about their next meeting.

Could it be that synthetic data isn’t just close to human accuracy—it might actually be more consistent and reliable in certain contexts?

That’s a mind-bending question that challenges everything we’ve assumed about research quality.

Beyond Correlation: What Synthetic Data Can Do That Humans Can’t

The EY study proved synthetic data could match real data. But the story doesn’t end there.

Synthetic data doesn’t just replicate traditional research—it opens possibilities that simply aren’t feasible with human respondents.

The “Helen” Moment

Mark Ritson, the marketing professor and columnist, shared a jaw-dropping experience with Evidenza’s synthetic data technology.

After receiving synthetic research for his Mini MBA product—including category entry points across 10 countries—the team asked if he’d like to chat with “Helen” about one of the findings.

A synthetic persona appeared on screen. An attractive woman in her 40s looked back at him patiently.

“Who is Helen?” Ritson asked, suddenly realizing what he was looking at—a conversational AI persona representing a segment of his target market.

He could ask Helen questions. Probe deeper on her responses. Explore her motivations and concerns. All in real-time.

This isn’t just faster research. This is research that was previously impossible.

Limitless Scenario Testing

With synthetic data, you can:

  • Test 100 product concepts instead of 5, without budget constraints
  • Run simulations across 50 markets simultaneously
  • Ask follow-up questions to synthetic segments any time
  • Model “what-if” scenarios instantly—what if we changed the price? The positioning? The packaging?

Traditional research requires choosing priorities because of time and cost constraints. Synthetic data removes those constraints.

The Real-World Business Impact

Let’s move from theory to practice. What did this 95% correlation actually enable for EY and others?

Strategic Speed

In a traditional research timeline:

  • Month 1: Finalize questionnaire and recruit panel
  • Month 2: Field research and collect responses
  • Month 3: Clean data and conduct analysis
  • Month 4: Present findings and recommendations

By the time you have insights, market conditions may have shifted. Competitors have moved. Opportunities have passed.

With synthetic data, that four-month process compresses to days.

This isn’t just convenient—it’s strategically transformative. You can iterate on positioning, test multiple campaign concepts, and refine messaging while your competitors are still waiting for their survey results.

Cost Democratization

That six-figure price tag for CEO research? It meant that only the biggest companies with the largest budgets could afford strategic insights.

Synthetic data has reduced that cost to a fraction.

Suddenly, mid-market companies can access the same quality of insights. Startups can validate their positioning against established competitors. Smaller research teams can increase their output by 10x.

This is democratization of insights in the truest sense.

Risk Reduction

Here’s something that doesn’t get talked about enough: Synthetic data reduces research risk.

When you’re testing a new product concept with real consumers, you risk:

  • Competitive intelligence leaks
  • Negative word-of-mouth from poor concepts
  • Alerting the market to your plans before you’re ready

With synthetic testing, you can fail privately, iterate rapidly, and only expose validated concepts to real consumers.

The Critical Limitations Nobody Wants to Talk About

Now let’s pump the brakes. Because for all the excitement around the EY study, synthetic data has serious limitations that get glossed over in the hype.

1. It’s Retrospective, Not Predictive

Synthetic data learns from what has been. It cannot predict what will be.

When the pandemic hit in 2020, no synthetic model could have predicted:

  • The sudden shift to remote work and what that meant for consumer priorities
  • The explosion of home fitness equipment demand
  • The psychological shift in how people valued experiences versus things

You needed real humans, experiencing unprecedented change, to capture those insights.

2. It Misses Emotional Nuance

The EY study measured brand perception and attributes. That’s relatively straightforward—rational, cognitive assessment.

But what about:

  • The emotional resonance of a brand story?
  • The unspoken discomfort with a product category?
  • The conflicted feelings about luxury purchases during economic uncertainty?

Human emotions are messy, contradictory, and context-dependent. Synthetic data smooths out the messiness—which means it can miss the signals hidden in those contradictions.

3. It Amplifies Existing Biases

Remember: Synthetic data is trained on historical data. If that historical data contains biases (and it almost certainly does), synthetic data will amplify them.

Research has shown that AI models consistently replicate and magnify biases present in training data. In market research, that could mean:

  • Underrepresenting emerging consumer segments
  • Reinforcing outdated stereotypes
  • Missing cultural shifts happening in real-time

4. The “Convincing Wrong Answer” Problem

Here’s the most dangerous limitation: AI is exceptionally good at producing answers that sound right, even when they’re wrong.

The confidence of AI-generated insights can be misleading. Synthetic data delivers clean, consistent results with no messy contradictions or unexplained anomalies.

But real human behavior is full of contradictions and anomalies—and sometimes those “messy” findings are where the most valuable insights hide.

5. It Still Requires Real Data

This is the paradox nobody wants to acknowledge: To get good synthetic data, you first need good real data.

The Evidenza models that achieved 95% correlation with EY’s study were trained on vast amounts of actual executive data. Without that foundation, the accuracy would collapse.

So synthetic data doesn’t eliminate the need for traditional research—it augments it. You still need high-quality human insights to train the models.

What Smart Organizations Are Actually Doing

Despite all the hype, the most sophisticated research teams aren’t going all-in on synthetic data. They’re being strategic.

The Hybrid Approach

Here’s the practical framework emerging from industry leaders:

Use Synthetic Data For:

  • Initial concept screening (test 50 ideas, identify 5 winners)
  • Rapid iteration and optimization
  • Scenario modeling and “what-if” analysis
  • Augmenting sample sizes for rare populations
  • Questionnaire testing before field deployment
  • Tracking studies with established baselines

Use Traditional Research For:

  • Exploring new, undefined territories
  • Understanding emotional and cultural drivers
  • Capturing emerging trends and behaviors
  • High-stakes strategic decisions
  • Initial model training and validation
  • Qualitative depth and context

Always Validate Synthetic Findings Before:

  • Major product launches
  • Significant marketing investments
  • Strategic pivots
  • Entering new markets
  • Repositioning established brands

The EY Blueprint

What’s particularly telling is what EY did after discovering the 95% correlation: They didn’t abandon traditional research.

Instead, they developed a hybrid approach:

  1. Use traditional research to establish baselines and train models
  2. Deploy synthetic data for rapid testing and iteration
  3. Validate critical findings with targeted traditional research
  4. Feed new real-world data back into models continuously

This creates a virtuous cycle where synthetic speed and traditional depth reinforce each other.

The Question Nobody’s Asking

Here’s what fascinates me most about the EY study: Everyone’s debating whether synthetic data is “good enough.”

But maybe we’re asking the wrong question.

Instead of asking “Can synthetic data replace real data?” maybe we should ask:

“What becomes possible when we combine synthetic speed with human insight?”

The EY study showed that certain types of research—particularly structured brand perception studies with well-defined audiences—can be replicated with remarkable accuracy.

But the real opportunity isn’t replacement. It’s augmentation.

Imagine this workflow:

  1. Use synthetic data to rapidly test 100 positioning concepts
  2. Identify the top 10 performers
  3. Use traditional research to deeply explore why those 10 resonate
  4. Use synthetic data to optimize based on human insights
  5. Launch with confidence

You get synthetic speed AND human depth. That’s not either/or—that’s multiplicative value.
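The funnel above can be expressed as a simple pipeline. This is a hypothetical sketch only: `synthetic_score` is a placeholder for whatever synthetic-panel test a team actually runs, and the human-validation stage is real research, not code.

```python
# Hypothetical sketch of the hybrid screening funnel described above.
# `synthetic_score` stands in for a synthetic-respondent test; the
# shortlist it returns is what goes on to traditional qualitative research.

def synthetic_score(concept: str) -> float:
    """Placeholder score from a synthetic-panel test (deterministic stand-in)."""
    return (sum(map(ord, concept)) % 100) / 100.0

def screen_concepts(concepts, top_k=10):
    """Steps 1-2: score every concept synthetically, keep the top_k."""
    ranked = sorted(concepts, key=synthetic_score, reverse=True)
    return ranked[:top_k]

# Step 1: 100 candidate positioning concepts.
concepts = [f"positioning-{i:03d}" for i in range(100)]

# Step 2: synthetic screening narrows the field.
shortlist = screen_concepts(concepts, top_k=10)

# Steps 3-5 happen outside this sketch: traditional research explores why
# the shortlist resonates, and those insights feed back into synthetic
# re-testing before launch.
print(f"{len(concepts)} concepts screened -> {len(shortlist)} for human validation")
```

The design point is the division of labor: the cheap, fast synthetic stage absorbs the combinatorial breadth, while expensive human research is spent only on the handful of survivors.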

What This Means for Market Researchers

If you’re a research professional, the EY study should be neither cause for panic nor complacency.

Your job isn’t disappearing. But it is transforming.

From Data Collector to Insight Architect

The researchers who thrive in this new landscape won’t be those who resist synthetic data or embrace it uncritically.

They’ll be the ones who:

  • Understand both traditional and AI-powered methodologies
  • Know when each approach is appropriate
  • Can validate AI-generated insights critically
  • Translate data into strategic recommendations
  • Maintain ethical guardrails and quality standards

Think of yourself less as a data collector and more as an insight architect—someone who designs the optimal research approach for each question, leveraging the best tools available.

The New Core Skills

To stay relevant, researchers need to develop:

1. AI Literacy: Understanding how models work, their limitations, and when to trust them

2. Critical Validation: The ability to interrogate synthetic findings and identify red flags

3. Ethical Judgment: Knowing when synthetic approaches cross ethical lines

4. Strategic Design: Architecting hybrid research approaches that leverage both synthetic and traditional methods

5. Storytelling: Translating complex findings into compelling narratives that drive action

The Uncomfortable Truth

Here’s what the EY study really proves: For certain types of research, with specific audiences, asking specific questions, synthetic data is shockingly accurate.

But notice all those qualifiers: certain types, specific audiences, specific questions.

The danger isn’t that synthetic data doesn’t work. The danger is overgeneralizing from limited success.

Yes, synthetic data achieved 95% correlation with EY’s CEO brand survey. That’s remarkable.

But that doesn’t mean it will achieve 95% correlation when:

  • Exploring emerging consumer trends
  • Understanding emotional decision-making
  • Capturing cultural nuances across markets
  • Predicting response to truly innovative products
  • Detecting weak signals of market shifts

The researchers who succeed will be those who understand these distinctions and deploy tools accordingly.

Five Years from Now

Let me make some predictions about where this is heading:

By 2030:

  1. Synthetic data will be standard for concept screening, tracking studies, and rapid iteration—just as online surveys replaced mail surveys
  2. Traditional research will be more valuable, not less—reserved for high-stakes strategic questions where depth matters more than speed
  3. Hybrid methodologies will be the norm—with most projects using some combination of synthetic and traditional approaches
  4. Regulatory frameworks will emerge—setting standards for synthetic data use, validation requirements, and disclosure
  5. The industry will split—with commodity research firms struggling while strategic insights consultancies thrive

The question isn’t whether synthetic data will have a place in market research. The EY study settled that.

The question is: How do we use it responsibly to deliver better insights?

The Bottom Line

The EY study’s 95% correlation is impressive. Legitimately game-changing for certain applications.

But let’s be clear about what it actually demonstrated:

  • Synthetic data can accurately replicate specific types of structured research
  • Speed and cost advantages are transformational
  • New research possibilities emerge when you remove traditional constraints

  • Synthetic data cannot replace all traditional research
  • It still requires real data for training
  • Critical strategic decisions still need human validation

The researchers and organizations that will thrive are those who:

  • Embrace synthetic data for what it does well
  • Recognize its limitations clearly
  • Maintain rigorous validation standards
  • Continue investing in high-quality traditional research
  • Develop hybrid approaches that leverage both

The future isn’t synthetic OR traditional. It’s synthetic AND traditional, deployed strategically.

And that future? It’s already here.
