sample this. ................. 10/26/2000

The Wall Street Journal ran an article saying PriceGrabber is "off the radar". Some digging determined that this crack was based on data from Media Metrix, which attempts to rate overall website traffic via carefully (and, hopefully, randomly) selected panels of live surfers. After much arm-twisting and ass-kissing, we extracted from Media Metrix what we really wanted to know: our site has a 0.3 share, with a confidence interval of ±0.13 share points. In relative terms, that's 0.3 ± 43%.

Forty-three percent error. In either direction.

And that number was for this month. The WSJ article was based on data for June, which would have shown an even smaller share -- and the smaller the share, the larger the relative error.

This is both infuriating and amusing. The former, obviously, because it's directly affecting us, but amusing because (1) it shows how little homework the Journal did when it picked these numbers as authoritative, and (2) it implies something interesting: in a market with lots of small players, statistical sampling mechanisms a la A. C. Nielsen fail badly if they don't use a large enough sample size. The consequences for TV are obvious: get a small share, lose your audience in the noise -- even if the show is otherwise profitable. This could lead to unjustifiable cancellations. But in increasingly fragmented markets, how can samplers excuse this level of sloppiness? The ability to measure more and smaller audiences will be a requirement, not an option.
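To put rough numbers on that, here is a minimal back-of-the-envelope sketch in Python. It assumes a simple binomial sampling model and a 95% confidence level; the panel sizes are purely illustrative, since Media Metrix's actual panel size and methodology aren't given here.

    import math

    def margin_of_error(share, panel_size, z=1.96):
        """Half-width of an approximate 95% confidence interval for an
        audience share, under a simple binomial sampling model.
        'share' is in share points (0.3 means 0.3% of the audience)."""
        p = share / 100.0
        half_width = z * math.sqrt(p * (1.0 - p) / panel_size)
        return half_width * 100.0   # convert back to share points

    # Hypothetical panel sizes, for illustration only.
    for n in (5_000, 10_000, 50_000, 200_000):
        moe = margin_of_error(0.3, n)
        print(f"panel {n:>7,}: 0.3 +/- {moe:.2f} share points "
              f"({moe / 0.3:.0%} relative)")

Under those assumptions, a 0.3 share comes back at roughly ±50% relative error from a panel of 5,000 and still around ±16% from a panel of 50,000; the error only drops to single digits with panels in the hundreds of thousands. Small audiences are expensive to measure, which is exactly why panel-based ratings lose them in the noise.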

On TV, TiVo and its direct-reporting kin are the way to fix this problem. But what about the Web? Nobody can rely on logs -- unless the logs come from an independent source, like DoubleClick or L90. Auditing software may also play a role. But regardless of the technological solution, current panel-based methods of sampling are failing, and badly. It may not be possible to profitably estimate website usage with larger samples -- look at the ongoing failure of AllAdvantage for just one such example. In a world of a bazillion channels, Media Metrix will have some 'splainin' to do.

There is, however, another possibility -- that of a massively consolidated web, with the equivalent of a Big Three or Big Twelve of sites. While scary to contemplate, sampling pushes in that direction: it rewards big sites with small confidence intervals. However, I just don't see this happening, not unless the samplers turn all the incoming data into pea soup by reducing every click to --

Yahoo.

AOL.

Lycos.

cnn.com.

That may give teevee types a warm glow inside, but it won't reflect reality. And the advertisers will figure that out pretty quickly. Yahoo may get a lot of hits -- but on which pages?

