Friday, December 30, 2005

Wikipedia: Tool for the classroom or boardroom?

Image: Wikipedia dollars

What is the monetary value of Wikipedia? Yesterday's post about Wikipedia and windscreen wipers was largely inspired by Chris Anderson's statement that the value of Wikipedia is in its contribution as a probabilistic system (that is, being used as a whole, not accessing just a handful of articles) and how this might not necessarily make Wikipedia useful in the classroom. I want to add to this, though, because later in the day Lisa Lynch pointed out how the same phenomenal characteristics of Wikipedia make it valuable to parties which specifically need tools to organise and categorise information according to probabilistic statistics.
Ultimately, the question of whether or not Wikipedia should be used in the classroom might be less important than whether — or how — it is used in the boardroom, by companies whose function is to repackage, reorganize and return "the people's knowledge" back to the people at a tidy profit.
To put this statement in context, Lynch has quoted Dan Cohen, who provided an example of the way in which Wikipedia could contribute to microscale information organisation and retrieval:
... Google and Yahoo have additional reasons for supporting Wikipedia that have more to do with the methodologies behind complex search and data-mining algorithms, algorithms that need full, free access to fairly reliable (though not necessarily perfect) encyclopedia entries.

Let me provide a brief example that I hope will show the value of having such a free resource when you are trying to scan, sort, and mine enormous corpora of text. Let's say you have a billion unstructured, untagged, unsorted documents related to the American presidency in the last twenty years. How would you differentiate between documents that were about George H. W. Bush (Sr.) and George W. Bush (Jr.)? This is a tough information retrieval problem because both presidents are often referred to as just "George Bush" or "Bush." Using data-mining algorithms such as Yahoo's remarkable Term Extraction service, you could pull out of the Wikipedia entries for the two Bushes the most common words and phrases that were likely to show up in documents about each (e.g., "Berlin Wall" and "Barbara" vs. "September 11" and "Laura"). You would still run into some disambiguation problems ("Saddam Hussein," "Iraq," "Dick Cheney" would show up a lot for both), but this method is actually quite a powerful start to document categorization.
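To make Cohen's example a little more concrete, here is a minimal sketch in Python. It does not call Yahoo's Term Extraction service or read Wikipedia. The term lists are simply the phrases mentioned in the quote, and the function names and the plain counting rule are my own assumptions for illustration.

```python
from collections import Counter

# Hypothetical distinguishing terms, copied from the phrases in the quote.
# A real system would extract these from the two Wikipedia entries.
TERMS = {
    "George H. W. Bush": ["berlin wall", "barbara"],
    "George W. Bush": ["september 11", "laura"],
}


def score_document(text, terms=TERMS):
    """Count how often each candidate's distinguishing terms occur in text."""
    lowered = text.lower()
    scores = Counter()
    for label, phrases in terms.items():
        scores[label] = sum(lowered.count(phrase) for phrase in phrases)
    return scores


def classify(text, terms=TERMS):
    """Return the best-scoring label, or None when there is no clear winner."""
    ranked = score_document(text, terms).most_common()
    best_label, best_score = ranked[0]
    runner_up_score = ranked[1][1]
    if best_score == 0 or best_score == runner_up_score:
        return None  # no evidence, or a tie between the two Bushes
    return best_label
```

Terms that fit both presidents ("Iraq", "Dick Cheney") are deliberately left out of the lists, so a document that mentions only those comes back as None. That is the disambiguation problem Cohen describes.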
To me this makes a lot of sense, and no doubt there are other reasons why having a large, free, organically authored space is useful to all of us. At the same time, I wonder how this benefit translates into dollars for big companies that are able to derive uses from a free resource like Wikipedia. I wonder if Wikipedians will be as keen to contribute if and when this dollar value becomes more evident. I wonder if Wikipedia will still receive as many private donations when it does.

Thursday, December 29, 2005

Windscreen wipers in the probabilistic age

Image: The windscreen wiper, patented by Mary Anderson, 1905

You've most likely seen those "did you know?" factoids on packaging for countless disposable consumer items; those often useless pieces of information printed on matchboxes, milk cartons, beermats, lolly wrappers... Well, one such factoid series can be seen on the packaging for certain feminine products and, for these, the "did you know" takes on a bit of a feminist spin. As a result, I recently learnt that the inventors of fire escapes, bulletproof vests, laser printers and windscreen wipers were all women.

Considering these factoids are bound to be correct, based on their source, I thought I would give Wikipedia a little test. In the "fire escape" and "bulletproof vest" articles, no inventors were identified. In the "laser printer" article, a rather male-sounding Gary Starkweather was identified as the first person to create a laser printer. Finally, the "windscreen wiper" article made me happy by identifying Mary Anderson as the inventor and holder of the original patent for windscreen wipers. Thank you Mary! What would we do without you?

What of Wikipedia? Like many before me, I'm forced to concede that as wonderful as Wikipedia is, it is not a particularly authoritative source of information and probably never will be. Someone (probably Jimbo himself) once said "Wikipedia is not an experiment in democracy", but to me that's really what it is and what makes it so much fun to both read and edit. And, yes, of course it's a nice resource to begin one's search for information on a given topic, but not the best place to end it.

At this point I have three things to say about Wikipedia, partly in response to some top-notch writing about it that I've recently come across.

1. Wikipedia articles may have a great probability of being correct: First, one of Mary's descendants*, Chris Anderson, recently made a post on his book blog titled The Probabilistic Age. Because it explains and articulates some of the excitement around Wikipedia and blogging, it's gained a lot of attention. I'm just going to quote the start of his post, but Anderson goes a long way with this, and it's well worth reading the whole post plus the trackbacks.
Q: Why are people so uncomfortable with Wikipedia? And Google? And, well, that whole blog thing? A: Because these systems operate on the alien logic of probabilistic statistics, which sacrifices perfection at the microscale for optimization at the macroscale.
In essence this idea is pretty easy to grasp: every item about a given topic (let's say cheese) on the Internet has a certain probability of being correct, and some are more likely to be correct than others. But if ten items on the net state that Camembert is a creamy cheese, then it becomes much more likely that this is true.
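As a rough worked example of the Camembert case, here is a small Python sketch. It is not Anderson's model. It assumes every source is independent and is right with the same probability, and the 0.7 reliability and 0.5 prior are made-up numbers.

```python
def probability_true(n_sources, reliability=0.7, prior=0.5):
    """Posterior probability that a claim is true after n independent
    sources, each correct with probability `reliability`, all assert it.

    Both default values are illustrative assumptions, not measured figures.
    """
    # Start from the prior odds, then multiply in one likelihood ratio
    # per agreeing source (Bayes' rule in odds form).
    odds = prior / (1 - prior)
    odds *= (reliability / (1 - reliability)) ** n_sources
    return odds / (1 + odds)


one_source = probability_true(1)    # stays at the source's own reliability, 0.7
ten_sources = probability_true(10)  # climbs above 0.999
```

The independence assumption does all the work here. If the ten items copied each other, they count for far less than ten separate sources.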

This relies heavily on the artificial intelligence of various mechanisms of the web (search engines, page rankings, keyword and page hit statistics, which we see on Google and Technorati among other sites) as well as the assumption that a huge number of people are continually consuming, evaluating and contributing to the web. In a sense, this is a triumph for those who insist that online communities are self-regulating when a good set of tools is provided to individuals in the community to do the regulating.

Of course, finding out information according to probabilistic models is NOT what we were taught at school. We were told we must look for primary or secondary resources to discover the truth and preferably back up our statements with more than one reputable resource. Unfortunately much of what we find on the net consists of tertiary or quaternary resources, although, ten years on from the WWW's big bang, the ubiquity of the web means this trend is changing and we see more and more original work duplicated or published solely online.

This seemingly old-fashioned model is still quite valid, however, and should still be applied in schools today: read widely, ensure the source is reputable, verify the primary and secondary sources, and so on. But special attention should be given to explaining how this process applies to web resources. It should be explained that even if some article has a million edits from people all over the world and links from thousands of websites, and thus has a 99% probability of being correct, the original academic resources are listed at the end and are the ones you should read. The question of the validity of Internet-based resources is simple. The Internet is just another medium. The content itself should be judged for its value individually. If the source is Wikipedia, then read the article but go to the article references before putting the information into your own work.

Of course, at the same time I do really take Anderson's point. Personal and collaborative blogs are certainly a great source to find out what's making world society tick at any given moment, especially given many of them are indeed primary resources, but their real value is in collective ideas and knowledge derived by reading many blogs. Hopefully this new and inexact science might find its way into school curricula too!

2. When Wikipedia is incorrect the error propagates: Now to my second point. Something I'm afraid of is the way Wikipedia can be just the blind leading the blind. Google and Wikipedia have some agreement (I should find out the details, but suffice to say they DO have some kind of agreement and Google has been a big source of funds for Wikipedia), so Google places Wikipedia matches to a search term very high on the list of results, regardless of how well developed the article may be. Some of these tertiary or quaternary resources we find on the net turn out to be referenced from Wikipedia, misspellings in particular. I did it once. I wrote something incorrect about a word etymology in my early days of contributing to Wikipedia. Only later did I discover (when I tried to check my facts by looking around the web) that the misinformation, whether sourced from the Wikipedia page or from my original web-based source, had propagated to several other pages on the web.

The problem is that Wikipedia, being so well designed, with good stylesheets and apparent accountability and traceability for every character it contains, appears quite authoritative. The temptation to lift quotes from Wikipedia (whether citing the source page or not) no doubt overwhelms many students and special interest website editors. What's worse, the arrangement between Google and Wikipedia means that since these new pages either link to or duplicate the information on the Wikipedia article, the misinformation spreads exponentially if no counter-information exists or is created on the web.

3. Wikipedism: Which brings me to my third point. Wikipedism is the new religion of the new society called Web 2.0, but I confess I'm a skeptic. A trackback to Chris Anderson's post led me to a post, "Have faith", by Nicholas Carr. Again, it's worth reading all of Carr's recent posts.
Maybe it's just the Christmas season, but all this talk of omniscience and inscrutability and the insufficiency of our mammalian brains brings to mind the classic explanation for why God's ways remain mysterious to mere mortals: "Man's finite mind is incapable of comprehending the infinite mind of God." Chris presents the web's alien intelligence as something of a secular godhead, a higher power beyond human understanding. Noting that "the weave of statistical mechanics" is "the only logic that such really large systems understand," he concludes on a prayerful note: "Perhaps someday we will, too." In the meantime, we must have faith.
Indeed, I can see how defending Wikipedia has become like a religious devotion to some. Every religion needs its prophet, and Jimbo Wales does a fantastic job spreading the good news that Wikipedia will one day be the authoritative source of all knowledge. I'd rather we didn't take this blind approach to Wikipedia. Certainly it is a fascinating experiment in democracy, but in Wikipedia's case, the question of whether God exists comes down to probabilistic statistics.

* By the way, I just made that up about Chris Anderson being Mary's descendant, so don't quote me on Wikipedia!

Tuesday, December 13, 2005

Cronulla: Overt racism feeds on underlying racism

Tim, on his blog, says:
Racism exists in all countries. Australia is no worse than any other. The question isn't whether there is underlying racism somewhere, but how any given country deals with it. And the fact is, by world standards, Australia has dealt with it better than most. Multiculturalism has been part of this success, but it wouldn't have taken as well as it has without the general willingness of most Australians to embrace the notion of democratic diversity and tolerance.
I agree with his assessment that there is racism underlying every country and culture, but if talkback radio host Alan Jones is allowed to incite racism publicly and openly, then that's no longer 'underlying' in my estimation, and I think this public airing of racist comments is what made the difference on the weekend.

Howard has clearly been irresponsible too, in trying to separate these two issues, because they are certainly connected.

Regardless of the fact that there is a small minority of thoughtless morons on talkback radio in Australia who are incapable of careful consideration, there are going to be some people who do listen to them and hear things that add credibility to the idea that Lebanese people should be taught a lesson and not be allowed to share Cronulla beach!

The SMS call to arms for "skips to come to Cronulla and give the Lebs a hiding" is obviously overt racism, but Howard buries his head in the sand and pretends it's completely separate from everyone else in Australia. At its grassroots, overt racism is supported by the very underlying racism he denies exists. The 'underlying racism' certainly exists and let's not fool ourselves into believing otherwise.

Monday, December 12, 2005

Cronulla: From all the lands on earth we come

Image: Cronulla riots

So we had to whip up a bit of hysteria to pass the anti-terror laws and this is what we get as the result?

I feel sorry for the thinking residents of the Sydney beach suburb, Cronulla, who will have to bear the ignominy of yesterday's events. Unfortunately for them, the word Cronulla will now become synonymous with racial violence in Australia.

This is the first time I've observed such a blatantly vile use of Australian icons, songs and images, and it hits like a sickening punch in the gut because I see it not just in the newspapers and on TV but vented all over the blogosphere too! I hate to do it, but I have to show you the twisted song I just read, which is disturbing not just at an intellectual level but also at an emotional one. It is upsetting because "We are Australian" used to be a song about unity and shared opportunity in Australia, regardless of which of "all the lands on earth" we come from.

I'm also astounded at how slow both Howard and Beazley have been to condemn the violence on both sides as racism and an abhorrent use of Australian icons. Obviously neither is able to care that in Sydney this weekend something happened that looked like it was out of pre-WW2 Germany. They had to check first with their spin-doctors to find out what level of condemnation the general populace would be comfortable with; how they can spin it so all us Anglo-Sax Aussies won't feel upset... There is no question here though. The violence on either side is equally unjustifiable. What happened in Cronulla yesterday was a truly shameful thing and let's not ignore it or forget it.

Addendum: I and many others have noticed 'Cronulla' has been the top search term at Technorati for at least the last 24 hours. Compared to when I wrote this post, I can now see the posts responding to the riots even further outnumbering the posts of those involved in the action themselves (perhaps by a factor of 100). Amid the fallout from the riots I can't help but wonder what this means as a communications phenomenon and what the role of blogs and the Internet in general might be in such conflicts and the understanding of the issues. In the recent French riots, the rioters' blogs received bad press, but not here. Apparently the Cronulla action was largely arranged with the ancient technology of SMS, widely disseminated on talkback radio, thanks to Alan Jones, and by word of mouth.

Tuesday, December 06, 2005

Terrorists born out of "internal jihad"?

Image: Abu Ghraib 'Scream'

Trying to blog on an emotive topic, especially one that involves religion and politics, is always going to be difficult. This post explores further the concept of 'root causes' of terrorism.

I already raised the point that assuming 'root causes' of terrorism could be worrying because it might fail to take into account both sides of a conflict, but Anthony Daniels (AKA Theodore Dalrymple), in an article in City Journal (a US-based urban policy magazine), makes a strong case for the 'root causes' theory. He suggests that Islamic suicide bombers carry out attacks to resolve their "internal jihad". That is, they accept martyrdom as a way to remove very real conflicts in their psyche (at least in the case of the London bombings).
...the term "jihad" has two meanings: inner struggle and holy war. While the political meaning connotes violence, though with such supposed justifications as the defense of Islam and the spread of the faith among the heathen, the personal meaning generally suggests something peaceful and inward-looking. The struggle this kind of jihad entails is spiritual; it is the effort to overcome the internal obstacles—above all, forbidden desires—that prevent the good Muslim from achieving complete submission to God's will...

...however, these two forms of jihad have coalesced in a most murderous fashion. Those who died in the London bombings were sacrificial victims to the need of four young men to resolve a conflict deep within themselves (and within many young Muslims), and they imagined they could do so only by the most extreme possible interpretation of their ancestral religion.
Dalrymple goes on to describe this internal conflict and how it is formed when British male Muslims embrace both their traditional values and contemporary Western "popular culture" values in London. This inevitably leads to a conflict, especially concerning the role of women, as Dalrymple points out:
However similar young Muslim men might be in their tastes to young white men, they would be horrified, and indeed turn extremely violent, if their sisters comported themselves as young white women [of British ethnicity] do.
If this internal conflict is as widespread in young male British Muslims as Dalrymple purports, then it gives credibility to the idea that there is a 'root cause' for the suicide-bomber mentality.
Muslims who reject the West are therefore engaged in a losing and impossible inner jihad, or struggle, to expunge everything that is not Muslim from their breasts. It can't be done: for their technological and scientific dependence is necessarily also a cultural one. You can't believe in a return to seventh-century Arabia as being all-sufficient for human requirements, and at the same time drive around in a brand-new red Mercedes, as one of the London bombers did shortly before his murderous suicide. An awareness of the contradiction must gnaw in even the dullest fundamentalist brain.
I'm interested to know what other people think of this idea. Your thoughts?

Image (top) derived from The Scream by Edvard Munch, 1893, and images of the Abu Ghraib tortures from 2003-2005.
