
Island of Sanity


More Data Every 2 Days

I've heard a statistic going around the Internet that today we produce as much data every 2 days as was produced in all of human history up to the year 2003. Apparently this factoid comes from Google CEO Eric Schmidt.

I'd be interested to see the calculations behind this claim, because it sounds wildly implausible to me. Even if it's technically true, it's highly misleading.

I don't know how he calculated how much data was produced by all the people who ever lived up to 2003. To the best of my knowledge, all the server logs of the ancient Sumerians were deleted long ago. But let me do my own rough ballpark estimate.

No one knows just how many people were alive at any given time before around the 1700s. Estimates vary wildly. I've seen estimates of around 200 million for circa AD 1. Of course population has grown over time, so I think it's conservative to suppose an average world population of 300 million from AD 1 to 2003.

Let's be yet more conservative and only count data produced since AD 1. Suppose that the average person produced just 1 page of information per week -- at least a shopping list or notes on which slaves needed a beating or whatever. Then the total comes to 300 million people x 1 page per week x 52 weeks per year x 2000 years = 31 TRILLION pages of information.

For us to produce the same amount of information every 2 days, we'd have to produce 31 trillion pages divided by 7 billion people = over 4400 pages per person every 2 days, or about 2200 pages per person per day. Do you produce 2200 pages of information per day? I certainly don't.
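The ballpark arithmetic in the last two paragraphs can be checked with a few lines of Python. This is only a sketch of the same back-of-the-envelope estimate: the constants (300 million average population, 1 page per week, 2000 years, 7 billion people today) are the assumptions stated above, not measured data, and the names are my own.

```python
# Back-of-the-envelope estimate, using the assumptions stated in the text.
AVG_POPULATION = 300_000_000       # assumed average world population, AD 1 to ~2003
PAGES_PER_WEEK = 1                 # assumed output per person
WEEKS_PER_YEAR = 52
YEARS = 2000
MODERN_POPULATION = 7_000_000_000  # approximate world population today


def historical_pages() -> int:
    """Total pages produced from AD 1 to ~2003 under the assumptions above."""
    return AVG_POPULATION * PAGES_PER_WEEK * WEEKS_PER_YEAR * YEARS


def pages_per_person_per_day(total_pages: int, days: int = 2) -> float:
    """Pages each living person must produce per day to match total_pages in `days` days."""
    return total_pages / MODERN_POPULATION / days


total = historical_pages()
print(f"{total:,} pages")                                          # 31,200,000,000,000 pages
print(f"{total / MODERN_POPULATION:,.0f} pages per person per 2 days")  # 4,457
print(f"{pages_per_person_per_day(total):,.0f} pages per person per day")  # 2,229
```

Changing any one assumption (say, a lower average population) scales the result proportionally, so the conclusion survives fairly large errors in the inputs.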

Now granted, most people back then were illiterate. But they could draw pictures and make scratch marks, and for this statistic to be remotely plausible Mr Schmidt must be including image files, so to be fair we have to include pictures drawn by ancient people, too. If a modern person's selfie with their cat counts, then so does an ancient person's drawing of their pet saber tooth tiger. A page of text is about 3 kB. A modest-sized JPEG image might run around 30 kB, or as much storage space as 10 pages of text. So if the average illiterate ancient person drew one picture every 10 weeks, he'd be meeting my 1 page per week estimate.
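The picture-for-pages trade is simple division; here is a minimal sketch, assuming the rough sizes given above (3 kB per page of text, 30 kB per modest JPEG). The constant names are hypothetical.

```python
PAGE_KB = 3   # rough size of one page of plain text, per the text
JPEG_KB = 30  # rough size of a modest-sized JPEG, per the text

# How many pages of text one picture is "worth" in bytes.
pages_per_picture = JPEG_KB / PAGE_KB


def kb_per_week(weeks_between_pictures: float) -> float:
    """Average storage produced per week by someone who draws one picture every N weeks."""
    return JPEG_KB / weeks_between_pictures


print(pages_per_picture)  # 10.0
print(kb_per_week(10))    # 3.0, the same as 1 page of text per week
```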

Of course we don't have 31 trillion pages of surviving data from past times. Most of the data produced in the past is long since lost. But counting only surviving data is surely not a fair comparison. How much of the data produced today will still be available 100 or 1000 years from now?

It's possible that we're producing this much data if you include video. Video consumes huge amounts of storage space. I just checked a couple of videos I have on my hard drive and they take over 1 MB per second, and they're 640x480, pretty low resolution. But if that's where it comes from, the statistic is wildly misleading. The text "1, 2, 3, testing" takes 16 bytes. A video of someone standing in front of a blank wall saying "1, 2, 3, testing" could easily take 5 megabytes. The video takes hundreds of thousands of times more disk space than the text, but it is surely not hundreds of thousands of times more real information.
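The "hundreds of thousands of times" figure can be verified directly. This sketch assumes the text is stored as plain ASCII (1 byte per character) and that "5 megabytes" means 5,000,000 bytes; using 5 MiB (5,242,880 bytes) instead gives about 327,680, which is still in the hundreds of thousands.

```python
text = "1, 2, 3, testing"
text_bytes = len(text.encode("ascii"))  # 1 byte per ASCII character
video_bytes = 5 * 1_000_000             # assumption: 5 MB in decimal units

ratio = video_bytes / text_bytes
print(text_bytes)       # 16
print(f"{ratio:,.0f}")  # 312,500
```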

If you consider video, comparison to the past becomes difficult. If I watch a 90-minute movie at the theater, that's surely the information equivalent of an ancient person going to the theater and watching a 90-minute play. If I listen to the candidates debate on TV, that's surely the information equivalent of an ancient person listening to candidates debate in the forum. But how would you begin to measure that? Does it not count because it wasn't recorded?

If we're counting the total number of bytes stored, image files totally swamp text and video swamps still images. Assuming one page = 3 kB, 2200 pages would be about 7 MB, or maybe 7 seconds of video. Does the average person today take more than 7 seconds of video per day? Maybe.
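The conversion from pages to seconds of video is a two-step unit calculation. A minimal sketch, assuming 3 kB per page, decimal units (1 MB = 1000 kB), and a video rate of about 1 MB per second like the 640x480 clips mentioned earlier. Because those clips were "over" 1 MB per second, the result is an upper bound on the seconds needed.

```python
PAGES_PER_DAY = 2200    # per-person daily output needed to match the claim
PAGE_KB = 3             # rough size of one page of plain text
VIDEO_MB_PER_SEC = 1.0  # assumption: roughly the low-resolution video rate above

text_mb = PAGES_PER_DAY * PAGE_KB / 1000     # 6,600 kB is 6.6 MB
video_seconds = text_mb / VIDEO_MB_PER_SEC   # seconds of video with the same byte count

print(f"{text_mb:.1f} MB ~ {video_seconds:.1f} s of video")  # 6.6 MB ~ 6.6 s of video
```

Higher-resolution video would shrink the number of seconds further, which only strengthens the point that byte counts are dominated by video.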

The only way I see that Mr Schmidt's statistic could possibly be true is if he is counting the number of bytes required to store information, and so counting a frame of video as the equivalent of several pages of printed text. But if so, the statistic means a whole lot less than it sounds like it means. It certainly does not mean that we are producing more real information, more knowledge, every 2 days than the world produced up to the year 2003. Yes, a 10-minute video of a baby making cute noises or a cat falling off the sofa might take more bytes than the ASCII text of the complete works of Shakespeare, all of Einstein's books and papers, and the Babylonian Talmud combined. But sorry, it is not more "information" in any real sense.

© 2015 by Jay Johansen

