How the Personal Data Extraction Industry Ends

Doc Searls
Aug 28, 2017


We’ve passed Peak Data.

Meaning the maximum rate of personal data extraction can no longer be maintained. Not even by Google and Facebook, the data extraction and refining equivalents of Standard Oil in 1910.

Predictably, there is much demand that these companies be broken up, much as Standard Oil was in 1911. But history moves a lot faster in the digital world, and the stink downwind of the data extraction business (a metaphor I’m borrowing from Jonathan Taplin’s Move Fast and Break Things — How Facebook, Google, and Amazon Cornered Culture and Undermined Democracy) is growing past the tolerance of pretty much everybody.

Google and Facebook are also not alone in fracking, refining, selling and re-selling personal crude. Take a look at Scott Brinker’s 2017 Marketing Technology Landscape, which is packed with over 5000 companies:

In his 2015 Idlewords essay on the advertising bubble, Maciej Ceglowski puts earlier generations of this same graphic to helpful use. Like Hugh at the top and me as far back as 2008, Maciej was ahead of his time. Which will come.

I would guess that most of those either practice or rely on data extracted from people who would rather see the drilling stopped.

So far, however, we can’t stop it. Not personally. That’s because all the controls are proprietary valves buried deep in the settings panels and pages of all the different sites and apps we rely on: one for every login and password we (or our machines) need to remember.

We have no privacy panel of our own, no dashboard for controlling what we reveal and conceal as we interact with the connected world. Hell, we don’t even have the rudimentary equivalents of clothing and shelter.

In the absence of those, the connected world is one big egology: a monoculture of companies that all think they are in charge of the “experiences” of their “users” or “consumers.”

This can’t last, for the simple reason that permission for extraction is in limited supply from you, me and regulators who are waking up to the problem. Together we’ll find ways to stop it, and to help every business in the online world deal with fully empowered human beings.

What we’ll go through in the meantime was nicely illustrated by Hugh MacLeod (aka @gapingvoid) in 2004:

Egology is how big business has been thinking and working ever since industry won the industrial revolution. Ecology in our digital age will inevitably include fully empowered individuals. It has to, because that’s what the Internet supports, by design. (For more on that, see The Giant Zero.)

In the absence of clothing, shelter and standard easy-to-use dashboards on the vehicles we drive around the Net (which are already in the works, and we’ll get to those), it is only sensible to look for policy solutions.

There are some good ones already. The largest and most leveraged of these is the General Data Protection Regulation (GDPR) in Europe, which pretty much makes the personal data extraction business illegal.

Sure, there are workarounds, but both the letter and spirit of the GDPR are clear: don’t collect personal data from or about anybody without their clearly expressed permission.

Go to business conferences in Europe and you’ll find the letters GDPR displayed prominently in nearly every booth’s displays, and featured in seemingly every panel and keynote talk. The GDPR isn’t there because it’s interesting, but because it is threatening.

The penalties for violating the GDPR can be huge (“up to 4% of the annual worldwide turnover of the preceding financial year”), and the EU has already made an example of Google with a recent fine of $2.7 billion. (It was for an antitrust violation, but that’s beside the point, which is how eager the EU is to enforce laws against giant American data extraction companies. It’s also instructive to notice how easily Google paid the fine, I suppose with money it found under the cushions of some couch.)

Penalties for GDPR violations start on the 25th of next May. That’s the effective deadline for getting into compliance. That’s surely also why there’s an upward trend in searches for “gdpr”:

Interesting thing: look “GDPR” up on Google and you’ll find the results topped by ads from some of the same suppliers that sold big companies on getting into the personal data extraction business in the first place.

It should help therefore to look back on how Big Data became a Big Thing. For that, start here:

As I wrote in this November 2015 Linux Journal column,

What happened in 2011? Did Big Data spontaneously combust? Was there a campaign of some kind? A coordinated set of campaigns?

Though I can’t prove it (at least not in the time I have), I believe the main cause was Big data: The next frontier for innovation, competition and productivity, published by McKinsey in May, 2011, to much fanfare. That report and following ones by McKinsey drove publicity in Forbes, The Economist, various O’Reilly pubs, Financial Times and many others — while providing ample sales fodder for every big vendor selling Big Data products and services.

Among those big vendors, none did a better job of leveraging and generating buzz than IBM. Here is a search for IBM+“Big Data”, for calendar years 2010–2011. Note that the first publication in that search, “Bringing big data to the Enterprise,” is dated May 16, 2011, the same month as the McKinsey report. The next, “IBM Big Data — Where do I start?” is dated November 23, 2011.

Here is a Google Trends graph for “McKinsey big data” and “IBM big data”:

Coincidence? Here’s more:

See that bump for IBM in late 2010? That was due to a lot of push on IBM’s part, which you can see in a search for IBM and big data just in 2010 — and a search just for big data. So there was clearly something in the water already. But searches, as we see, didn’t pick up until 2011. That’s when the craze hit the marketplace, as we see in this search for IBM and another four big data vendors:

So, while we may not have a clear enough answer for the cause, we do have clear evidence of the effects.

Next question: to whom do those companies sell their Big Data stuff? At the very least, it’s the CMO, or Chief Marketing Officer — a title that didn’t come into common use until the dot-com boom, and got huge after that, as marketing’s share of corporate overhead went up and up. On February 12, 2012, for example, Forbes ran a story titled Five Years From Now, CMOs Will Spend More on IT Than CIOs Do. It begins,

“Marketing is now a fundamental driver of IT purchasing, and that trend shows no signs of stopping –or even slowing down –any time soon. In fact, Gartner analyst Laura McLellan recently predicted that by 2017, CMOs will spend more on IT than their counterpart CIOs.

“At first, that prediction may sound a bit over the top. (In just five years from now, CMOs are going to be spending more on IT than CIOs do?) But, consider this: 1) As we all know, marketing is becoming increasingly technology-based, 2) Harnessing and mastering Big Data is now key to achieving competitive advantage, and 3) Many marketing budgets already are larger –and faster growing –than IT budgets.”

In June, 2012, IBM’s index page was headlined, “Meet the new Chief Executive Customer. That’s who’s driving the new science of marketing.” The copy was directly addressed to the CMO. In response, I wrote Yes, please meet the Chief Executive Customer, which challenged some of IBM’s pitch at the time. (I’m glad I quoted what I did in that post, because all but one of the links now go nowhere. The one that works redirects from the original page to “Emerging trends, tools and tech guidance for the data-driven CMO.”)

According to Wikibon, IBM was the top Big Data vendor by 2013, raking in $1.368 billion in revenue. In February of this year, Reuters reported that IBM “is targeting $40 billion in annual revenue from the cloud, big data, security and other growth areas by 2018”, and that this “would represent about 44 percent of $90 billion in total revenue that analysts expect from IBM in 2018”.

And now IBM is selling compliance services and tech in ads above results of Google searches for GDPR. Egology at work. (Here’s one, which for yuks I include with tracking cruft appended to the URL.)

Even if surveillance marketers (the alpha egologists) find ways around the GDPR (which some will), advertisers themselves are starting to realize two things:

  1. Tracking people like animals fails outright. As Procter & Gamble is now discovering. (Bonus fact: no company sees an advertiser when they look in the mirror. That label is applied from the outside. Inside a business, advertising is a line item on the expense side of the balance sheet. They can cut it in an instant. I know this well, having worked in the advertising business for much of my life.)
  2. The human beings who constitute the actual marketplace have mounted the biggest boycott in world history against it, in the form of ad blocking and tracking protection. (If you think this is a problem, you’re in the personal data extraction industry. If you think this is a solution, you’re among the 1.7 billion people who would rather not be fracked, and are doing something about it.)

It should help to consider how fast an egology can change.

In the history of Silicon Valley, companies like Facebook and Google have often proved epiphenomenal: meaning temporary, though not apparently so at the heights of their power. Large and all-powerful though they may be today, companies with single threatened sources of income tend to be shallow and temporary effects rather than deep and enduring causes.

This is especially the case for companies whose main business is selling their B2C consumers to their B2B customers, which is what both Facebook and Google do.

For a real world example of how this split can deafen an entire industry to what’s actually going on, consider commercial broadcasting. It too sells B2C eyeballs to B2B advertisers, and remains loath to admit that what they call “over the top” (e.g. Netflix and subscription Internet access) is already the new bottom. (By Q1 of 2017, there were more subscribers to Netflix than to cable TV.)

I can name many long-gone companies that once occupied Google’s and Facebook’s locations in Silicon Valley. I am also sure many new companies will occupy the same spaces in a fullness of time that will surely include at least one Next Big Thing that obsolesces advertising as we know it today online.

For example, what happens if we discover we don’t need advertising at all? (Guess what Zara, Trader Joe’s, Tesla and Krispy Kreme all have in common.)

None of today’s personal data extraction companies are utilities on the scale or the importance of power and water distribution (which we need to live), or the extraction industries behind both. Nor have today’s data extraction companies benefitted from the corrective influence of fully empowered individuals and societies: voices that can be heard directly, consciously and personally, C2B, rather than existing as mere data flows observed by machines. (Examples of successes at C2B listening: Apple and Amazon.)

Direct customer influence will be far more helpful than anything today’s data extractors can learn just by following our shadows and sniffing our exhaust, mostly against our wishes.

(If you think people’s apparent tolerance for spying and absent privacy makes that stuff okay, read The Tradeoff Fallacy: How Marketers are Misrepresenting American Consumers and Opening Them Up to Exploitation, a report by Joseph Turow, Michael Hennessy and Nora Draper of the Annenberg School for Communication at the University of Pennsylvania.)

Our influence will be most corrective when personal data extraction companies become what lawyers call second parties. That’s when they agree to our terms as first parties. These terms are in development today at Customer Commons, Kantara and elsewhere. They will prevail once they get deployed in our browsers and apps, and companies start agreeing (which they will in many cases because doing so gives them instant GDPR compliance).

We’ll start with publishers and the advertisers that support them.

From there we’ll move on to develop a full portfolio of customertech. That’s tech the customer has, starting with the online equivalents of clothing and shelter.

In Customertech Will Turn the Online Marketplace Into a Marvel-Like Universe in Which All of Us are Enhanced, I list many developments that are sure to arrive once it’s clear to the business that free customers are more valuable than passive or captive ones, and that data given voluntarily, and for reasons that make sense to both parties, is more valuable than the involuntarily extracted kind.

To spare you the labor of clicking on that last link, and to encourage you to help develop the customertech we’ll need, here’s a partial list of what we’ll be talking about tomorrow at the Computer History Museum in Silicon Valley:

It’s free. Register here.

Since the museum is across the street from Google and down the road from Facebook, I invite people from both companies to show up too. There is far more opportunity in boundless markets that will grow naturally around full-powered customers than in markets limited to guesswork based on data extracted from mere consumers.

Let’s make those markets, and finish moving past Peak Data.


The original version of this essay was published at Doc Searls Weblog on 27 August, 2017.



Doc Searls

Author of The Intention Economy, co-author of The Cluetrain Manifesto, Fellow of CITS at UCSB, alumnus Fellow of the Berkman Klein Center at Harvard.