What if web searches used the relationships between words as part of the search criteria?
A web search typically treats the search request as a bag of words. When you search for ‘yard debris’, you will get essentially the same results as for ‘debris yard’. Google returns about 1.7 million hits in either case.
Searching for ‘blog review’ or ‘review blog’ returns the same basic results of about 95 million hits.
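To make the bag-of-words idea concrete, here is a minimal sketch in Python. It is only an illustration of the concept, not how any real search engine is implemented, and the function name bag_of_words is my own invention.

```python
from collections import Counter


def bag_of_words(query: str) -> Counter:
    """Lowercase the query, split on whitespace, and count the words.

    Word order is thrown away, which is the whole point of the illustration.
    """
    return Counter(query.lower().split())


# The two queries produce identical bags even though the word order differs.
same_bag = bag_of_words("yard debris") == bag_of_words("debris yard")

# Comparing the ordered word lists shows the information the bag discards.
same_order = "yard debris".split() == "debris yard".split()

print(same_bag, same_order)
```

Once the query is reduced to a bag, nothing is left to tell the search which word modifies which.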
As always, I am trying to find ways to improve how computers work for us. Imagine asking the computer, “Where do I take my yard debris?” Ideally, the result would be 3 to 5 hits for locations close to your home that can take your leaves and branches. (We are having a wind storm today, so I’ll be picking up the yard debris tomorrow.)
What would it take for the search to return 5 pages, instead of 1.7 million pages?
Could word clusters help with this improvement? There are many research groups looking at word clusters for ways to extract semantic information. An example of a word cluster is “push the spring”. This is a sentence fragment that has spring as the object of the verb push. The word cluster in this case could be:
‘spring’->object_of->‘push’
Imagine all of the verbs that could have ‘spring’ (in the sense of a coil) as an object. Now, imagine all of the verbs that could have ‘spring’ (in the sense of a season) as an object. These two lists of verbs will be different. There will be some overlap, but there will also be many verbs that appear in only one of the two lists.
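Here is a rough sketch, in Python, of how two such verb lists could pick a sense. The verb lists are tiny and hand-picked by me for illustration; they are not taken from any corpus or from the papers listed at the end of this post, and the names VERBS_BY_SENSE and senses_for are hypothetical.

```python
# Toy verb lists, invented for illustration only (not derived from real data).
VERBS_BY_SENSE = {
    "coil": {"push", "compress", "stretch", "install", "see"},
    "season": {"enjoy", "spend", "await", "celebrate", "see"},
}


def senses_for(triple: tuple[str, str, str]) -> list[str]:
    """Return the senses of 'spring' whose verb list contains the triple's verb.

    A triple has the shape (noun, relation, verb), for example
    ("spring", "object_of", "push").
    """
    noun, relation, verb = triple
    if noun != "spring" or relation != "object_of":
        return []
    return sorted(s for s, verbs in VERBS_BY_SENSE.items() if verb in verbs)


# Verbs found in both lists cannot tell the senses apart on their own.
shared_verbs = VERBS_BY_SENSE["coil"] & VERBS_BY_SENSE["season"]

print(senses_for(("spring", "object_of", "push")))   # only the coil list has 'push'
print(senses_for(("spring", "object_of", "enjoy")))  # only the season list has 'enjoy'
print(senses_for(("spring", "object_of", "see")))    # ambiguous: in both lists
```

A verb that appears in only one list settles the sense immediately, while a verb from the overlap leaves the question open and some other clue would be needed.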
These two distinct lists of verbs that select between different senses of ‘spring’ are an example of how semantics might be used to improve how computers interact with people. Many researchers are digging into various facets of word clustering and semantic relationships. Here are a few references.
Gamallo, Agustini, Lopes. 2005. Clustering Syntactic Positions with Similar Semantic Requirements. Computational Linguistics, Vol. 31, No. 1, pp. 107-145.
Lin. 1998. Automatic Retrieval and Clustering of Similar Words. COLING-ACL’98, pp. 768-774, Montreal.
http://www.cs.ualberta.ca/~lindek/papers/acl98.pdf
Green, Rebecca, Bonnie J. Dorr, and Philip Resnik, “Inducing Frame Semantic Verb Classes from WordNet and LDOCE”, in Proceedings of the Association for Computational Linguistics, Barcelona, Spain, 2004.
ftp://ftp.umiacs.umd.edu/pub/bonnie/green-dorr-resnik.pdf
Let me hear your comments on this subject.