Archive for February, 2006

Inside-Out Puzzles

Monday, February 20th, 2006

Imagine a picture puzzle that is inside-out.  The edges of the puzzle start in a small square with an empty space in the middle.  The smooth edges face in to the center.  As pieces are added to the puzzle, the area gets larger, but there is no finish to it.  Each row that is added around the outside has larger total area than the previous row.

This is how I see computational lingusitics.  We are learning some of the basics of language and how computers can interact with humans using it.  But we are still at the middle of the picture.

There is lots of research occuring in areas that are developing the pieces:

  • Voice recognition and generation.
  • Word semantics – finding relationships between words.
  • Document extraction and summarization.
  • Question Answering.
  • Information retrieval.
  • Analysis of syntax, discourse.

Much of this research is providing some amount of useful result.  There are real world applications that are being created based on these results that are helping users get their job done.

But we have not yet arrived at the part of the puzzle solving process where large sections fall together and a whirlwind of activity suddenly explodes out of the pieces.  We have not yet achieved the level of mastery where suddenly ‘aha’ occurs and we see how syntax, semantics and a theory of meaning form the basis for a system that learns meaning from the words it encounters without being supervised.

Eventually we will get computers to the point of the Enterprise mainframe.

  • We will be able to interact with the computer by voice with ambiguity of context, sentence structure, and word sense. 
  • The computer will acknowledge our individuality and adjust its responses to match.
  • The computer will have access to a large body of knowledge to use for responses to questions.
  • Answers to questions will be generated in context from the deep structure of the underlying knowledge.  Answers will be generated in the language of the questioner even though the knowledge may have been extracted from a source in a different language.
  • When you ask a question, the computer will answer with a single result.  You won’t need to sift through 198,235 pages to find what you are looking for.

We are not there yet.  :-)  

In the mean time, the small advances being made toward that vision with today’s real world applications are the places where we are learning.   Each researcher that attempts to solve the puzzle in a different way adds to our total understanding.  It is the incremental steps that eventually get us to the larger results.

Gary Larson Effect

Monday, February 20th, 2006

The Gary Larson effect is from a cartoon he drew regarding pets.

The pet-owner says: “Okay, Ginger! I’ve had it!  You stay out of the garbage!  Understand, Ginger?  Stay out of the garbage, or else!”

The pet hears: “Blah blah Ginger blah blah blah blah blah blah blah blah blah blah Ginger  blah blah blah blah blah blah blah.”

In other words, pets understand a few words and ignore everything else.  Most pets learn a few more words besides their name.  Our kittens (see Search Engines) understand ‘tuna’ for any food treat and are starting to learn ‘down’.

At about 12 months, children can hear and speak one word at a time.  At about 18 months, they start hearing and using words in pairs.   The interesting thing about this that was reported by Roger Brown is that the word pairs usually appeared in the order that matches ‘agent action recipient object location’ order.  In other words they are acquiring the word order of the language of their parents.  Children learning words in another language will begin using the word order differently.

How Nouns are similar

Saturday, February 18th, 2006

I found this quote in Hindle (1990) (Reference Page).

“So for example, a car is an object that you can have and sell, like wine and beer, but you do not typically drink a car.”

In other words, car, wine and beer can be owned, bought and sold.  But beer and wine can be drank as beverages.  This means the association between beer and wine is different that the association between car and beer.

Comparing Word Mutual Information Scores

Saturday, February 18th, 2006

=== See Entry for April 1, 2006.  ===

=== I found the problem with the computation. ===

Below is a table that shows the computed mutual information for noun-verb pairs. The table displays 5 different ways of numerically sorting the noun-verb pairs.

The first column gives the noun that is used for the pairs. The number under the noun is the number of noun-verb pairs that were detected by the parser and clustering algorithm. (See entry for Feb 11, 2006 – Streaming Clusters).

The rest of the columns show the verbs having high mutual information with the noun. These words are sorted, so the top verb in each list has the highest mutual information value with the noun. The numbers in column 2 to 6 are the instance count for the given noun-verb pair.

The definition for the mutual information was taken from Hindle (1990), Lin (1998) and Gamallo, et. al. (2005) (See the reference page). Hindle and Lin each use the same definition. Gamallo, et. al., use a slightly different definition.

Columns 4 and 5 show a variation on the Hindle definition. In column 4, after computing the value according to Lin, the result is then multiplied by the number of instances of the given word pair (wrw). The term ‘wrw’ comes from Lin.

‘Door opened’ was found as a pair 526 times, so the result is

MI_Modified = MI * 526.

In column 5, the Hindle definition is multiplied by the log of the number of instances,

MI_Modified = MI * log(526).

The final column 6 shows the instance count for the noun-verb pairs.

 

 

Hindle
or Lin
Gamallo
Lin
With *wrw
Lin
With *log(wrw)
Instance
Count
Door
3090 
2 creaminess 1 reflectors
1 musquitoes
1 azeglio
1 atlantis’
26 slammed
5 flinched
14 banged
2 hughes
2 thornie
2 unbarred
1 munich
2 burby
Musquitos Azeglio
Atlantis
Slammed
Fliched
Banged
Hughes
Thornie
Unbarred
Munich
Burby
opened
526 opened 207 open
127 closed
511 was
80 locked
87 shut
54 leading
39 swung
26 slammed
36 bell
26 flung
42 stood
14 banged
22 opening
526 opened 80 locked
207 open
127 closed
26 slammed
87 shut
54 leading
39 swung
14 banged
36 bell
26 flung
22 opening
11 neighbor
11 unlocked
10 barred
8 creaked
526 opened 511 was
207 open
127 clsoed
87 shut
80 locked
77 had
72 is
54 leading
42 stood
39 swung
36 bell
26 slammed
26 flung
25 be
22 opening
21 being
Death
1736
2 deowe 2 distilling
2 furens
1 supervenes
1 titmouse
1 fouras
6 tristram
1 copperplate
2 usurp
1 resplendently
1 suffe
1 bonnemains
deowe distilling
furens
supervenes
titmouse
fouras
tristram
copperplate
96 bed 88 rate
27 blow
127 is
159 was
15 warrant
15 wound
10 rates
13 struggle
37 like
96 had
6 stistram
9 sentence
13 comes
62 be
6 beds
8 occurred
96 bed 88 rate
27 blow
15 warrant
10 rates
15 wound
6 tristram
13 struggle
9 sentence
6 beds
4 maximus
6 trap
4 overtake
8 occurred
13 comes
6 brings
3 overweight
159 was 127 is
96 bed
96 had
88 rate
62 be
37 like
35 will
34 have
27 blow
27 are
24 were
22 has
17 come
15 warrant
15 wound
15 came
Child
2682
1 topknots 1 humouring
2 coughs
1 develish
1 stiddy
2 sneezed
1 cons
1 antagonized
5 stope
1 playhouses
1 ashiel
1 riposte
1 praisest
humouring coughs
develish
stiddy
sneezed
cons
antagonized
stope
playhouses
84 born 269 is
233 was
98 said
69 has
103 be
44 like
11 bearing
10 birth
10 skillful
7 learns
5 stope
12 sitting
7 asks
52 will
84 born 5 stope
7 learns
10 birth
11 bearing
10 skillful
7 asks
6 dies
269 is
12 sitting
3 drogo
6 feels
9 learn
69 has
7 study
269 is 233 was
103 be
98 said
93 had
84 born
69 has
52 will
44 like
40 have
27 can
15 up
15 do
12 sitting
12 looked

 

There is something amiss with columns 2 and 3. Some of the verbs selected by the Hindle and Gamallo definitions are intuitively associated with the noun, but many of the words do not seem to be particularly related. And when compared to columns 4 and 5, columns 2 and 3 do not seem very close to target.

I suspect that this is a result of steps in the process that occur earlier in the process. The parsing algorithm that I use is substantially different from that desribed by Hindle, Lin or Gamallo. Similarly, the cluster gathering algorithm is different.

Another potential cause for these results to be different from the works referenced is the interpretation of the word pairs. Keeping track of which instances represent a cluster starting with a given noun and ending with a given verb is a difficult task. I have been optimizing the storage of the instances to speed up the computation. I need to do another pass on the code to see if I am correctly managing everything.

 

Synonyms and Symmetry

Wednesday, February 15th, 2006

One of the means of judging the similarity between words is how well they can be substituted for each other.  When word A can be substituted for word B in a sentence without changing the meaning, then they are considered synonyms.  For example, house can sometimes be substituted for dwelling.

There are several issues that complicate matters.

  • Part of speech.  House can be either a noun or a verb.  So can dwelling.  But when considered as verbs, house and dwell do not substitute as well.   ‘He houses the dogs in a kennel’ does not seem correct when changed to ’He dwells the dogs in a kennel’.
  • Word sense.  House as a noun can mean different things.  Dwelling, building, assembly of a legislature, social unit or a building for theatrical performances are all examples of different meanings house.
  • Scope.  Although two words may mean the same thing in a given sentence, one of the words may be more general than the other.  For example, in WordNet, House is defined as a specialization of dwelling.  Dwelling can mean more things than house.  A dwelling could be a cave, but typically house implies a structure.
  • Frequency.  In the corpus I am working with, house is used 104,000 times.  Dwelling is used 3738 times.

When using a corpus to measure similarity between words, the term distributional similarity is applied (Weeds, Weir, 2005).  For word A ‘house’ and word B ‘dwelling’, a group of features is measured (or counted).   House will be found with one group of words, dwelling in a different group.

Here are some co-occurrences for the word dwelling.

  • erected dwelling
  • corner dwelling
  • occupied dwelling
  • log dwelling

Here are some co-occurrences for the word house.

  • boarding house
  • banking house
  • log house
  • lodging house
  • summer house
  • corner house
  • guard house

These are just examples that were picked manually, but they show how some of the co-occurrences happen for both words, and some are unique.  When the lists of co-occurences have overlap as displayed by these two lists, then the target words are said to have distributional similarity.

References

Saturday, February 11th, 2006

A list of references that are used on this site.

  • Copestake, Ann.  2002.  Implementing Type Feature Structure Grammers.  CSLI Publications, Stanford.
  • Gamallo, Pablo, Alexandre Agustini, and Gabriel P. Lopes.  2005.  Clustering Syntactic Positions with Similar Semantic Requirements.   Computational Linguistics, Vol 31,1  pp. 107-145.
  • Green, Rebecca, Bonnie J. Dorr, and Philip Resnik.  2004. “Inducing Frame Semantic Verb Classes from WordNet and LDOCE”, in Proceedings of the Association for Computational Linguistics, Barcelona, Spain, 2004.
    ftp://ftp.umiacs.umd.edu/pub/bonnie/green-dorr-resnik.pdf
  • Hindle, Donald. 1990. Noun Classification from predicate-argument structures.  in Proceedings of the Association for Computational Linguistics, Pittsburg, Pennsylvania, 1990.  http://www.cs.mu.oz.au/acl/P/P90/P90-1034.pdf
  • Lin, Dekang. 1998.  Automatic Retrieval and clustering of similar words.  COLING-ACL’98, pp. 768-774, Montreal.
    http://www.cs.ualberta.ca/~lindek/papers/acl98.pdf
  • O’Grady, William, John Archibald, Mark Aronoff, Janie Rees-Miller. 2005.  Contemporary Linguistics  An Introduction.  Bedford/St. Martin’s, New York. 
  • Sag, Ivan A., Thomas Wasow, Emily M. Bender.  Syntactic Theory.  A Formal Introduction.  CSLI Publications.  Stanford.
  • Weeds, Julie, and David Weir.  2005.  “Co-occurrence Retrieval: A Flexible Framework for Lexical Distributional Similarity”, Computational Linguistics, Vol 31,4 pp. 439-475.

Streaming Clusters

Saturday, February 11th, 2006

As part of my work with a corpus, I have been extracting word clusters from the sentences.  The goal is to collect word clusters that occur in sentences and then use those clusters to ‘learn’ the relationships between words.  For example, the word pair ‘abandoned hope’ occurs 12 times in the corpus I am using.

These are not necessarily bi-grams.  Bi-grams are words that occur next to each other.  Clusters are word pairs that appear as phrase heads.  So ‘abandoned all hope’ is collected as the cluster ‘abandoned hope’.  These phrase heads are extracted using a context free grammar (CFG) that picks out the phrases and identifies the heads of each phrase.

The steps used to collect the clusters are:

  • Read the sentences one at a time.
  • Parse the sentences into their phrase constituents.
  • Select the head words of successive phrases.
  • For each pair of head words, create a cluster entry.
  • Write each cluster to a file in serial fashion.
  • After all sentences are processed, reread the cluster file and collect instances of identical clusters.

A sentence with three phrases (p1, p2, p3) and three head words (w1, w2, w3) creates two clusters: w1-w2 and w2-w3.  Clusters are categorized into groups depending on the part of speech (noun, verb, preposition, adjective).   A sentence with three phrases and three head words which are [noun verb noun] creates two clusters noun-verb and verb-noun.

For each cluster, a number is computed that represents how important that cluster is compared to all other clusters.  This means that when that cluster is found in a sentence, is it likely to be significant to the meaning of the sentence.  For example ‘abandoned him’ occurs 17 times, but gets a lower score of importance than ‘abandoned hope’.  The word ‘him’ occurs many more times in the corpus, so the fact that it occurs with ‘abandoned’ is less significant than when ‘abandoned hope’ is discovered.

My corpus has a total of 140 million words (7.4 million sentences).  From these sentences, about 95 million clusters are collected.  After all identical clusters were grouped, there are about 35 million unique clusters.

 

How do children do it?

Sunday, February 5th, 2006

Just how is it that a child can do what cannot yet be done on a computer? 

Here is a time table for an average child learning its first language:

  • 12- months – Understand single words.
  • 12+ months – Produce single words.
  • 18 months – Learn a new word every two hours (on average).  Combine words into pairs – most of the time the words are in the correct order.
  • 24-36 months – Start understanding and using complex sentences.

Source: The Language Instinct, Steven Pinker

Children are capable of learning the language of their parents without any overt training.  Even if the child is totally ignored and never spoken to directly, he/she will still learn to understand and speak.

As Steven Pinker and many other people have expressed, this learning ability probably implies the existence of structures in the brain that are prepared for this steep learning curve.  By the age of 12 months, a child’s brain is prepped to begin associating words with things, people and actions.  Something in the brain is searching for recurring patterns in words heard and the things meant by those words.

Clearly, a child learns about his environment before it learns words for it.  Children recognize their caretakers, they learn routine, they gain knowledge of objects.  They learn how they are constrained by the laws of physics, for example, they learn how objects falling when dropped.

Is learning these environmental items a prerequisite for learning a language?  Before a child can learn the word ball, is it a requirement to have previously experienced a ball?  Is there a knowledge structure in the brain that represents the ball in the child’s thoughts that is underlying the ability to speak the word ball and eventually to express ideas such as ‘drop the ball’?

 

Searching For Relationships

Saturday, February 4th, 2006

What if web searches used the relationships between words as part of the search criteria?

A web search typically treats the search request as a bag of words.  When you search for ‘yard debris’, your will get essentially the same results as for ‘debris yard’.  Google returns about 1.7 million hits in either case.

Searching for ‘blog review’ or ‘review blog’ returns the same basic results of about 95 million hits.

As always I am trying to find ways to improve how computers works for us.  Imagine asking the computer, “Where do I take my yard debris?”  Ideally, the result would be 3 to 5 hits for locations close to your home that can take your leaves and branches.  (We are having a wind storm today, so I’ll be picking up the yard debris tomorrow).

What would it take for the search to return 5 pages, instead of 1.7 million pages?

Could word clusters help with this improvement?  There are many research groups looking a word clusters for ways to extract semantic information.  An example of a word cluster is “push the spring”.  This is a sentence fragment that has spring as the object of the verb push.  The word cluster in this case could be:

’spring’->object_of->’push’

Imagine all of the verbs that could have a ’spring’ (in the sense of a coil) as an object.  Now, imagine all of the verbs that could have ’spring’ (in the sense of a season) as an object.  These two lists of verbs will be different.  There will be some overlap, but there will also be many verbs that are unique to the two sentences.

These two distinct lists of verbs that select between different senses of ’spring’ are an example of how semantics might be used to improve how computers interact with people.  Many researchers are digging into various facets of word clustering and semantic relationships.   Here are a few references.

Gamallo, Agustini, Lopes.  2005.  Clustering Syntactic Positions with Similar Semantic Requirements.   Computational Linguistics, Vol 31,1  pp. 107-145

Lin. 1998.  Automatic Retrieval and clustering of similar words.  COLING-ACL’98, pp. 768-774, Montreal.
http://www.cs.ualberta.ca/~lindek/papers/acl98.pdf

Green, Rebecca, Bonnie J. Dorr, and Philip Resnik, “Inducing Frame Semantic Verb Classes from WordNet and LDOCE”, in Proceedings of the Association for Computational Linguistics, Barcelona, Spain, 2004.
ftp://ftp.umiacs.umd.edu/pub/bonnie/green-dorr-resnik.pdf

Let me hear your comments on this subject.

 

Search Engines

Wednesday, February 1st, 2006

We have two kittens, brother and sister that were rescued from living in the wild.  We named them Yahoo and Google.  Yahoo is the sister, she is reserved and cautious.  Google is the brother, he is ready for anything.

They are both very good search engines.

Our Search Engines