Thursday, December 29, 2005

Windscreen wipers in the probabilistic age

The windscreen wiper, patented by Mary Anderson, 1905

You've most likely seen those "did you know?" factoids on the packaging of countless disposable consumer items: those often useless pieces of information printed on matchboxes, milk cartons, beermats, lolly wrappers... Well, one such factoid series appears on the packaging for certain feminine products and, on these, the "did you know?" takes on a bit of a feminist spin. As a result, I recently learnt that the inventors of fire escapes, bulletproof vests, laser printers and windscreen wipers were all women.

Considering that these factoids are bound to be correct, given their source, I thought I would give Wikipedia a little test. The "fire escape" and "bulletproof vest" articles identified no inventors. The "laser printer" article identified a rather male-sounding Gary Starkweather as the first person to create a laser printer. Finally, the "windscreen wiper" article made me happy by identifying Mary Anderson as the inventor and holder of the original patent for windscreen wipers. Thank you, Mary! What would we do without you?

What of Wikipedia? Like many before me, I'm forced to concede that, as wonderful as Wikipedia is, it is not a particularly authoritative source of information and probably never will be. Someone (probably Jimbo himself) once said "Wikipedia is not an experiment in democracy", but to me that's really what it is, and it's what makes it so much fun to both read and edit. And, yes, of course it's a nice resource with which to begin one's search for information on a given topic, but not the best place to end it.

At this point I have three things to say about Wikipedia, partly in response to some top-notch writing about Wikipedia that I've recently come across.

1. Wikipedia articles may have a high probability of being correct: First, one of Mary's descendants*, Chris Anderson, recently wrote a post on his book blog titled "The Probabilistic Age". Because it explains and articulates some of the excitement around Wikipedia and blogging, it has gained a lot of attention. I'm just going to quote the start of his post, but Anderson goes a long way with this idea, and it's well worth reading the whole post plus the trackbacks.
Q: Why are people so uncomfortable with Wikipedia? And Google? And, well, that whole blog thing? A: Because these systems operate on the alien logic of probabilistic statistics, which sacrifices perfection at the microscale for optimization at the macroscale.
In essence, this idea is pretty easy to grasp: every item on the Internet about a given topic (let's say cheese) has a certain probability of being correct. Some items are more likely to be correct than others, but if ten items on the net state that Camembert is a creamy cheese, then it becomes much more likely that this is true.

This relies heavily on the artificial intelligence built into various mechanisms of the web (search engines, page rankings, keyword and page-hit statistics, which we see on Google and Technorati among other sites), as well as on the assumption that a huge number of people are continually consuming, evaluating and contributing to the web. In a sense, this is a triumph for those who insist that online communities are self-regulating when individuals in the community are given a good set of tools with which to do the regulating.

Of course, finding information according to probabilistic models is NOT what we were taught at school. We were told to look for primary or secondary resources to discover the truth, and preferably to back up our statements with more than one reputable resource. Unfortunately, much of what we find on the net is tertiary or quaternary material, although, ten years on from the WWW's big bang, the ubiquity of the web means this is changing, and we see more and more original work duplicated online or published solely online.

This seemingly old-fashioned model is still quite valid, however, and should still be applied in schools today: read widely, ensure the source is reputable, verify the primary and secondary sources, and so on. But special attention should be given to explaining how this process applies to web resources. Students should be told that even if an article has a million edits from people all over the world and links from thousands of websites, and thus a 99% probability of being correct, the original academic resources listed at the end are the ones they should read. The question of the validity of Internet-based resources is simple: the Internet is just another medium, and its content should be judged on its individual merits. If the source is Wikipedia, then read the article, but go to the article's references before putting the information into your own work.

Of course, at the same time I really do take Anderson's point. Personal and collaborative blogs are certainly a great way to find out what's making world society tick at any given moment, especially given that many of them are indeed primary resources, but their real value lies in the collective ideas and knowledge derived from reading many blogs. Hopefully this new and inexact science will find its way into school curricula too!

2. When Wikipedia is incorrect, the error propagates: Now to my second point. Something I'm afraid of is the way Wikipedia can be just the blind leading the blind. Google and Wikipedia have some kind of agreement (I should find out the details, but suffice to say they DO have one, and Google has been a big source of funds for Wikipedia), so Google places Wikipedia matches for a search term very high in its results, regardless of how well developed the article may be. Some of the tertiary or quaternary resources we find on the net turn out to be derived from Wikipedia, and this is especially true of misspellings. I did it once: in my early days of contributing to Wikipedia, I wrote something incorrect about a word's etymology. Only later, when I tried to check my facts by looking around the web, did I discover that the disinformation, whether it came from the Wikipedia page or from my original web-based source, had propagated to several other pages on the web.

The problem is that Wikipedia, being so well designed, with good stylesheets and apparent accountability and traceability for every character it contains, appears quite authoritative. The temptation to lift quotes from Wikipedia (whether citing the source page or not) no doubt overwhelms many students and editors of special-interest websites. What's worse, because of the arrangement between Google and Wikipedia, and because these new pages either link to or duplicate the information in the Wikipedia article, the disinformation spreads exponentially if no counter-information exists or is created on the web.

3. Wikipedism: Which brings me to my third point. Wikipedism is the new religion of the new society called Web 2.0, but I confess I'm a skeptic. A trackback to Chris Anderson's post led me to a post by Nicholas Carr, "Have faith". Again, it's worth reading all of Carr's recent posts.
Maybe it's just the Christmas season, but all this talk of omniscience and inscrutability and the insufficiency of our mammalian brains brings to mind the classic explanation for why God's ways remain mysterious to mere mortals: "Man's finite mind is incapable of comprehending the infinite mind of God." Chris presents the web's alien intelligence as something of a secular godhead, a higher power beyond human understanding. Noting that "the weave of statistical mechanics" is "the only logic that such really large systems understand," he concludes on a prayerful note: "Perhaps someday we will, too." In the meantime, we must have faith.
Indeed, I can see how defending Wikipedia has become something like a religious devotion for some. Every religion needs its prophet, and Jimbo Wales does a fantastic job of spreading the good news that Wikipedia will one day be the authoritative source of all knowledge. I'd rather we didn't take this blind approach to Wikipedia. It is certainly a fascinating experiment in democracy, but in Wikipedia's case, the question of whether God exists comes down to probabilistic statistics.

* By the way, I just made that up about Chris Anderson being Mary's descendant, so don't quote me on Wikipedia!

Wikipedism image derived from the Wikipedia logo, 2005


Nicholas Gruen said...

This post is an illustration of the fact that some of the best posts get the fewest comments. Thx for the post, Lisa.

January 12, 2006 12:39 pm  
geoff said...

Ditto - good post Lisa. But you've got me reaching for my pack of Wiki-panadol!

January 12, 2006 2:33 pm  

