Thursday, April 20, 2006

Further quantifying PDF

In my April column (Jim Lyons Observations) I referred to some installed-base data that helped make my case for PDF's solid position as an entrenched file interchange standard. But then a dedicated reader wrote to say that, while he might buy my argument, he questioned my numbers, especially the "20 million PDF files" figure, which he remarked seemed "way low".

I remembered having similar feelings when putting the column together. As on so many other occasions in my business career, I faced the dilemma of quickly developing a gut feel for raw numbers: what's a lot, what's not, and what's altogether laughable and not even in the ballpark? So I revisited the topic with a little back-of-the-envelope math in a blog posting on April 14, taking as my "null hypothesis" that the figure of 20 million PDF files on the Internet, the main bone of contention, was correct.

But now my original source of the PDF data has come back with some further clarification, which I really appreciate. It turns out the 20,000,000 figure is not only old (four years old) but also apparently applies only to .gov web sites. (My friend who disputed my data was barking up the same tree, actually, wondering whether there were some definitional problems with "public Internet" versus intranet, etc.)

So what's the real number? According to my data source, there are more like 613,000,000 PDF files out there on the "surface web". Included in that total are 129,000,000 .gov PDF files, which means the .gov count alone has grown more than sixfold (about 6.45x) since the 20,000,000 figure reported four years ago.

Wow, all interesting stuff. Going further (is this wise?), the ratio of Acrobat readers to files is virtually 1:1. Of course, this is still an apples-to-oranges comparison, because readers are used to view lots of files, at least in my case, that are not on the "surface web", e.g. email attachments.

Enough for now, except to thank my challenging reader and my generous data-supplier. And what's next? I see a Wikipedia entry...
