People, Not Proxies: Truthset CEO Scott McKinley on Fixing Data Quality in Ad Tech

In this episode of Big Brains, Media Cartographer Evan Shapiro sat down with Scott McKinley, founder and CEO of Truthset, for a candid conversation about the state of data in advertising. In a world flooded with device IDs, phantom profiles, and flawed identity graphs, McKinley is on a mission to bring clarity and accountability to audience data. Their discussion covers everything from the misuse of IP addresses to why the current digital ad ecosystem suffers from a compound error problem, and how Truthset aims to change that. If you're in the business of data, media, or marketing, this conversation is a must-watch.

Evan Shapiro: Scott, it feels like Truthset is dedicated to creating a higher standard of data. Is that the mission you’ve taken on?

Scott McKinley: Yeah, that is the mission. It's about recognizing that not all data is created equal. There's great data, middling data, and not-so-great data. We're trying to expose that to buyers and sellers so that—just like octane ratings at a gas pump—you can choose the level of accuracy you want for a given operation. We're doing this across all audience datasets.

ES: People nod when I say not all data is created equal, but they don’t necessarily understand what that means. How does bad data get into the market, and how do we know what’s good versus bad?

SM: That’s really two answers. First, it’s just hard to be precise about who a person is—their demographics, behaviors, all of it. Companies have been at it for decades, and even within data collectives, they still don’t get everyone right. If I ask 22 different providers about you, I’ll get 22 different answers. It’s partly because no one can serve everyone and get it right every time. There’s always error. That’s what we’re trying to identify—so buyers and sellers can parse it out.

Second, there’s a perpetual incentive to put more identities into more targeting buckets. The incentive is to sell your record as many times as possible. And before Truthset, there was no accountability—someone could put you in both male and female segments. And they do it all the time.

ES: And it’s not just that they’re overusing an identity. Sometimes the identity doesn’t even exist—it’s a phantom.

SM: Exactly. That’s what we’re solving. Because we have so many high-quality data providers in our cooperative, we can get it down to the human signal that actually exists. We pin everything to census data.

I once worked at a company where the head of data science claimed he could deliver six million left-handed soccer moms in Kansas City. There are only 1.2 million people in Kansas City! It breaks down fast. We’re trying to shift the industry from noisy, imprecise data to a focus on real people. There are 20 billion device IDs out there—but a device ID is just a hexadecimal value. It doesn’t have a pantry, it doesn’t drive a car. It’s the person behind it that you want to reach. We're all about people—not proxies.

ES: I worked on a project with Go Addressable and your team on a study with SIM. The topline data showed that the average data provider attributes two to four times as many IP addresses to a household as actually exist. How is that possible?

SM: Picture a monsoon of device IDs: they’re constantly coming and going. They get refreshed and deprecated. About 1% of IPs refresh daily, which means all of them refresh roughly every three months. How can you build a reliable profile on something that transient?
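The turnover arithmetic behind that 1% figure can be sketched in a few lines. This is only an illustration under two simplifying assumptions that the conversation does not specify: either a distinct 1% slice of addresses refreshes each day, or every address independently has a 1% chance of refreshing each day. The function names are hypothetical.

```python
def days_until_full_turnover(daily_rate: float) -> float:
    """Days for every address to refresh once, assuming a distinct
    slice of the pool (daily_rate of it) refreshes each day."""
    return 1.0 / daily_rate


def share_refreshed(daily_rate: float, days: int) -> float:
    """Share of addresses refreshed at least once after `days`, assuming
    each address refreshes independently with probability daily_rate per day."""
    return 1.0 - (1.0 - daily_rate) ** days


# Assumed input taken from the conversation: about 1% of IPs refresh daily.
full_turnover_days = days_until_full_turnover(0.01)
refreshed_after_90_days = share_refreshed(0.01, 90)

print(f"Full turnover after about {full_turnover_days:.0f} days")
print(f"Share refreshed within 90 days: {refreshed_after_90_days:.1%}")
```

Under the first assumption the whole pool turns over in about 100 days, which is in line with the three-month figure; under the second, a bit more than half of the addresses would have changed within 90 days. Either way, a profile keyed to an IP address goes stale quickly.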

IP addresses shouldn’t replace cookies. Did we not learn the lesson with cookies? They were never designed for tracking—just to remember you when you log into your bank. They’ve been abused. IPs might even be worse.

ES: Right. We’re replacing IPs continuously to protect user privacy, and then trying to use them to identify users? That’s absurd.

SM: Exactly. Some IPs represent 40 or 50 households—especially in multi-dwelling units. How can you possibly tell one apartment from another with that?

To put demographics on an IP, you need to:

  1. Associate it with a household (50% accurate),
  2. Associate that household with individuals (51% accurate),
  3. Assign attributes to those individuals (30–60% wrong on average).

That’s a compound accuracy problem. You’re down to 7% real accuracy, or looking at 150% potential error. Why is that acceptable?
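The compounding across the three steps above can be sketched numerically. This is a minimal illustration, not Truthset's methodology: it assumes the steps fail independently, so their accuracies multiply, and it reads "30–60% wrong" as 40–70% accurate. The function and variable names are hypothetical.

```python
from math import prod


def compound_accuracy(stage_accuracies):
    """End-to-end accuracy when every stage must be right and
    stages are assumed to be independent (accuracies multiply)."""
    return prod(stage_accuracies)


# Figures quoted in the conversation, expressed as fractions.
ip_to_household = 0.50       # step 1: IP address -> household
household_to_person = 0.51   # step 2: household -> individual
attribute_low = 0.40         # step 3: attributes, 60% wrong (assumed reading)
attribute_high = 0.70        # step 3: attributes, 30% wrong (assumed reading)

worst_case = compound_accuracy([ip_to_household, household_to_person, attribute_low])
best_case = compound_accuracy([ip_to_household, household_to_person, attribute_high])

print(f"End-to-end accuracy: {worst_case:.1%} to {best_case:.1%}")
```

Under these assumptions the end-to-end accuracy lands at roughly 10% to 18%. A lower attribute accuracy than assumed here would push the result toward the single-digit figure McKinley cites. The point of the sketch is the shape of the problem: three individually mediocre steps multiply into a result that is wrong most of the time.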

ES: It’s not. So the answer is to get data from high-quality providers. Who qualifies?

SM: Our model includes most major U.S. data providers—Epsilon, Experian, TransUnion, and others—who’ve agreed to let us analyze their data, even against competitors. You shouldn’t have 20 different versions of who someone is. You’re objectively a real person—I can see you.

That kind of inconsistency is a huge problem for open programmatic, and it will be for television too. Walled gardens have near-perfect identity and household resolution. That’s a massive advantage.

ES: Exactly. I tell people all the time—we can't compete with that level of identity precision. We need a shared data lake to level the playing field.

SM: It’s not just the identity graph. Walled gardens don’t have the crazy data supply chain we do. Data can start at 90% accuracy and drop to below 30% by the time it reaches a screen, thanks to all the hops and transformation. That’s a massive problem.

ES: I did a study this spring that showed 80% of CTV campaigns are missing their target—most of it is wasted.

SM: So let’s turn this positive. First, let’s stop pretending it’s fine—it’s not. This is existential for anyone not inside a walled garden. The answer has to be collaboration. We’re trying to build a universal standard that everyone can use to determine if a segment makes sense.

It’s good for consumers, who get more relevant ads. Good for brands, who convert. Good for publishers, who can justify better CPMs instead of racing to the bottom.

ES: You know how much I believe in this. If we can collaborate and create a data lake as large and cared-for as the walled gardens—especially with first-party and zero-party data—then we can provide a real alternative to companies grading their own homework.

SM: Exactly. And we’re excited to keep building that alternative.