Wikipedia: Tool for the classroom or boardroom?
Yesterday's post about Wikipedia and windscreen wipers was largely inspired by Chris Anderson's statement that the value of Wikipedia is in its contribution as a probabilistic system (that is, being used as a whole, not accessing just a handful of articles) and how this might not necessarily make Wikipedia useful in the classroom. I want to add to this, though, because later in the day Lisa Lynch pointed out how the same phenomenal characteristics of Wikipedia make it valuable to parties that specifically need tools to organise and categorise information according to probabilistic statistics.
Technorati Tags: Wikipedia, probabilistic systems, Google, Yahoo, data mining, smart queries
Wikipedia dollars image derived from the Wikipedia logo, 2005
Ultimately, the question of whether or not Wikipedia should be used in the classroom might be less important than whether, or how, it is used in the boardroom, by companies whose function is to repackage, reorganize and return "the people's knowledge" back to the people at a tidy profit.
To put this statement in context, Lynch has quoted Dan Cohen, who provided an example of the way in which Wikipedia could contribute to microscale information organisation and retrieval:
... Google and Yahoo have additional reasons for supporting Wikipedia that have more to do with the methodologies behind complex search and data-mining algorithms, algorithms that need full, free access to fairly reliable (though not necessarily perfect) encyclopedia entries.
To me this makes a lot of sense, and no doubt there are other reasons why having a large, free, organically authored space is useful to all of us. At the same time, I wonder how this benefit translates into dollars for big companies that are able to derive uses from a free resource like Wikipedia. I wonder if Wikipedians will be as keen to contribute if and when this dollar value becomes more evident. I wonder if Wikipedia will still receive as many private donations when that value becomes more evident.
Let me provide a brief example that I hope will show the value of having such a free resource when you are trying to scan, sort, and mine enormous corpora of text. Let's say you have a billion unstructured, untagged, unsorted documents related to the American presidency in the last twenty years. How would you differentiate between documents that were about George H. W. Bush (Sr.) and George W. Bush (Jr.)? This is a tough information retrieval problem because both presidents are often referred to as just "George Bush" or "Bush." Using data-mining algorithms such as Yahoo's remarkable Term Extraction service, you could pull out of the Wikipedia entries for the two Bushes the most common words and phrases that were likely to show up in documents about each (e.g., "Berlin Wall" and "Barbara" vs. "September 11" and "Laura"). You would still run into some disambiguation problems ("Saddam Hussein," "Iraq," "Dick Cheney" would show up a lot for both), but this method is actually quite a powerful start to document categorization.
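To make the idea in the example above a little more concrete, here is a rough Python sketch of the same approach. It is not Yahoo's Term Extraction service, which I have not tried to reproduce. Instead it swaps in plain word and two-word phrase counting. The function names (`terms`, `distinctive_terms`, `classify`), the stopword list, and the two tiny reference texts are all made up for illustration; the reference texts simply reuse the phrases from the example and stand in for the real Wikipedia entries.

```python
import re
from collections import Counter

# Small, hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "was", "is",
             "he", "his", "on", "with", "for", "as", "by", "at"}


def terms(text):
    """Return lowercase words and two-word phrases, skipping stopwords."""
    words = [w for w in re.findall(r"[a-z0-9]+", text.lower())
             if w not in STOPWORDS]
    phrases = [" ".join(pair) for pair in zip(words, words[1:])]
    return words + phrases


def distinctive_terms(reference_texts):
    """Map each label to the set of terms found in its reference text only."""
    vocab = {label: set(terms(text)) for label, text in reference_texts.items()}
    result = {}
    for label, own in vocab.items():
        shared = set()
        for other_label, other in vocab.items():
            if other_label != label:
                shared |= other
        result[label] = own - shared
    return result


def classify(document, distinctive):
    """Return the best-matching label, or None if there is no clear winner."""
    counts = Counter(terms(document))
    scores = {label: sum(counts[t] for t in vocab)
              for label, vocab in distinctive.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    top_label, top_score = ranked[0]
    if top_score == 0:
        return None  # only shared terms (or none at all) were found
    if len(ranked) > 1 and ranked[1][1] == top_score:
        return None  # tie: the document stays ambiguous
    return top_label


# Made-up stand-ins for the two Wikipedia entries (assumption, not real text).
REFERENCE_TEXTS = {
    "George H. W. Bush": "George Bush Barbara Berlin Wall "
                         "Saddam Hussein Iraq Dick Cheney",
    "George W. Bush": "George Bush Laura September 11 "
                      "Saddam Hussein Iraq Dick Cheney",
}
DISTINCTIVE = distinctive_terms(REFERENCE_TEXTS)

if __name__ == "__main__":
    print(classify("President Bush spoke about the fall of the Berlin Wall.",
                   DISTINCTIVE))
```

A document that mentions only the shared terms ("Saddam Hussein," "Iraq," "Dick Cheney") comes back as `None`, which is the disambiguation problem Cohen mentions. A more serious version would weight terms by how strongly they favour one entry rather than discarding every shared term outright.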