how to search online: March 2010

How to Evaluate a Clustering Search Engine
Many enterprise search vendors have announced that clustering of search results is now part of their product and user experience. The most recent case is Google (press center, blog post, blogosphere reaction). Microsoft researchers have also experimented with clustering, without these experiments finding their way yet into Microsoft’s products.

By definition, a clustering engine analyzes the top (say 200-500) search results from a query and displays the main themes, typically as folders that may consist of subfolders.
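To make the definition concrete, here is a deliberately crude sketch of the idea: take the snippets of the top results, find terms shared by several of them, and present those terms as folders. This is an illustration only, not Vivisimo's algorithm or any vendor's. The function names, the stopword list, the choice to drop query terms from labels, and the sample snippets are all assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a real linguistic resource).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "on", "is", "with"}

def tokenize(text):
    """Lowercase the text and return its set of non-stopword terms."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def cluster_results(snippets, query, max_clusters=3, min_size=2):
    """Group result indices under the terms shared by the most snippets.

    Query terms are skipped as labels because every result contains them,
    which is a simplification chosen for this sketch.
    """
    query_terms = tokenize(query)
    token_sets = [tokenize(s) - query_terms for s in snippets]
    doc_freq = Counter(term for tokens in token_sets for term in tokens)

    clusters = {}
    # Sort by frequency, then alphabetically, so the output is deterministic.
    for term, count in sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0])):
        if len(clusters) >= max_clusters or count < min_size:
            break
        clusters[term] = [i for i, tokens in enumerate(token_sets) if term in tokens]

    clustered = {i for members in clusters.values() for i in members}
    clusters["Other"] = [i for i in range(len(snippets)) if i not in clustered]
    return clusters

# Made-up snippets standing in for the top results of the query "jaguar".
snippets = [
    "Jaguar car dealers and prices",
    "Used Jaguar car reviews",
    "Jaguar cat habitat in the rainforest",
    "The jaguar is a big cat",
    "Jaguar car parts catalogue",
    "Jaguar operating system release",
]
clusters = cluster_results(snippets, "jaguar")
print(clusters)
```

On this toy input the folders are "car", "cat", and "Other". A real engine works on hundreds of results and needs far better labels and language handling than single shared words.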

The spread of clustering engines is gratifying, since Vivisimo was founded on a breakthrough clustering algorithm, has been refining the approach and educating and selling into the search market since 2001, and has evolved into a complete enterprise search provider.

Just as with search results, or as with any other designed product, judging the quality of a clustering engine requires some skill. Before judging quality, let’s first explore clustering’s end user value, that is, how it enhances knowledge worker productivity.

Clustering enhances end user productivity in at least three ways:

At a glance, users gain an easy overview of the main distinct themes that are present in the top search results.
By clicking on clusters that satisfy their needs or interests, users can quickly arrive at search results that are valuable but low ranked, say, #73 or even #429 in the results list, and so would never be noticed otherwise. The user’s visibility into the content is greatly enhanced.
After arriving at a cluster or sub-cluster, users see related results placed together (“clustered”) rather than scattered throughout the ranked list. This makes it faster to find related content or the best content.
In short, clustering lets users overview, find, discover, and compare information more productively. How does clustering quality enhance or detract from this productivity gain?

To grasp an overview of the main themes, the cluster labels should be concise and natural-looking. Also, the clusters shouldn’t overlap too much in their contents, otherwise the user will be overloaded with too many clusters expressing overly related themes. If the clusters don’t overlap too much, so that on average a search result appears in only 1.2 to 1.5 clusters, then the main distinct themes will be shown, rather than similar/duplicate themes on overlapping content. Also, the cluster labels shouldn’t be artificially limited to labels that contain the query word, or labels that have two or more words in them, or some other artifact of an inferior clustering approach. Finally, the underlying search engine snippets (aka excerpts or dynamic summaries) should be full enough so that clustering has enough input text to work with.
To arrive at low-ranked but valuable results, the clustering engine should be fast enough so that 200-500 results or more can be clustered within an acceptable response time. If user authentication is an issue (note discussion here), then the response time should include the time for the search engine to verify that the user can view these documents. Also, each cluster label should accurately express the contents of its cluster, otherwise the user wastes time on wild goose chases.
In order for similar results to be placed together in the same clusters, the clustering software should possess the linguistic knowledge needed to correctly handle cases like the following:
sort out the meanings in middle ages, middle aged, and medieval, and in news release, new release, and press release.
realize that king and kingship are closely related, unlike gun and gunship.
realize that unfearful and fearless are synonymous, but not unhelpful and helpless.
plus many thousands of other linguistic relationships that take time and background knowledge to learn, whether by humans in school or by computers.
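A short sketch shows why these cases need linguistic knowledge rather than string rules. The two functions below are hypothetical, purely orthographic rules written for this example; they are not taken from any clustering product. Both rules accept the good pair and the bad pair alike, so spelling alone cannot separate them.

```python
def naive_related(a, b, suffix="ship"):
    """Orthographic rule: treat b as related to a if b is a plus a suffix."""
    return b == a + suffix

def naive_synonyms(a, b):
    """Orthographic rule: treat 'un' + stem + 'ful' as equal to stem + 'less'."""
    if a.startswith("un") and a.endswith("ful") and b.endswith("less"):
        return a[2:-3] == b[:-4]
    return False

# The rule is right for the first pair and wrong for the second,
# yet it returns True for both.
print(naive_related("king", "kingship"), naive_related("gun", "gunship"))
print(naive_synonyms("unfearful", "fearless"), naive_synonyms("unhelpful", "helpless"))
```

Telling the pairs apart takes knowledge about meaning, which is what the list above is asking of the clustering software.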
There are endless other subtleties, but enough: what’s the bottom line? Here are some questions to ask about the quality of a clustering search engine:

Are the cluster folders determined by analyzing the top search results? If not, then no overview of the major themes is being given. Instead, the “folders” are probably based on query logs.
Does clicking on a cluster cause a new search to be done? If so, it’s not clustering but something else, likely query refinement, which leads to a discontinuous user experience.
Is the clustering engine able to handle 200-500 results or even more?
Are the cluster names concise and natural-looking, and do they correctly handle numbers, punctuation, diacritics, foreign words, etc.?
Is there evidence that the clustering engine possesses considerable linguistic knowledge? And in other languages besides English, if needed? For example, are many (30-40% or more) of the search results left unclustered in an Other category? If so, this suggests a deficiency in detecting related meanings.
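Two of the checks in this post are numeric: the average number of clusters a result appears in (1.2 to 1.5 was given as a healthy range) and the share of results left in Other (30-40% or more was given as a warning sign). The sketch below computes both from a folder listing. The function name, the data layout (a dict mapping label to result indices), and the decision to average only over results that landed in at least one named cluster are assumptions made for this example.

```python
def clustering_metrics(clusters, total_results, other_label="Other"):
    """Return (average clusters per clustered result, fraction left unclustered).

    `clusters` maps a folder label to the list of result indices it contains.
    """
    if total_results <= 0:
        raise ValueError("total_results must be positive")
    named = {label: members for label, members in clusters.items()
             if label != other_label}
    # Total folder memberships, counting a result once per folder it appears in.
    memberships = sum(len(members) for members in named.values())
    # Distinct results that appear in at least one named folder.
    clustered = set().union(*named.values()) if named else set()
    avg_per_result = memberships / len(clustered) if clustered else 0.0
    other_fraction = (total_results - len(clustered)) / total_results
    return avg_per_result, other_fraction

# Made-up folder listing for 10 results.
example = {
    "Car": [0, 1, 4, 6],
    "Dealers": [0, 6],
    "Cat": [2, 3],
    "Other": [5, 7, 8, 9],
}
avg, other = clustering_metrics(example, total_results=10)
print(f"average clusters per result: {avg:.2f}, unclustered: {other:.0%}")
```

In this made-up listing the overlap falls inside the suggested range, while the unclustered share sits at the level the last question flags as a concern.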

http://searchdoneright.com/2007/03/how-to-evaluate-a-clustering-search-engine/

The post above shows how to evaluate clustering search engines, which is another way of deciding which sites are appropriate.

Lateral multidirectional literacy means searching (surfing) for information while staying on one level of the sources. It enhances your learning and reading power, gives you better ideas, and builds the creativity to see the bigger picture of a changing situation. Branching literacy has changed today’s world. Many people cannot count how many times they have been required or encouraged to use the internet or different databases to collect information. This shows how dependent people are on websites. Gathering information in a nonlinear manner often requires researchers to use numerous sources in order to find complete information. This allows everyone to gather information and combine it to create knowledge. Using your past experience and your senses, you can decide which website is the right one for your needs.

