This paper by Yoav Seginer is very exciting. It covers a method of learning a language grammar that resonates with my mental model of how a young child learns language. These are some aspects of the Seginer algorithm.
Incremental Learning. The system adds information to the grammar with each new sentence. In other approaches (CCM by Klein and Manning, U-DOP by Bod), the entire corpus is processed as a block to collect the parameters of the grammar, and the whole corpus is repeatedly processed until the learning converges on a final result. In Seginer's approach, the corpus is processed one sentence at a time. After each sentence, the grammar weights are updated, and that sentence is never revisited for further training.
Simpler Math. Seginer's approach uses much simpler math for computing the parameters of the resulting grammar. There are a few ratios, some accumulation of values into others, and some comparisons of weights to choose which one to apply at a given step. There are no long chains of probabilities to compute the best parse. I don't have any principled reason why this makes more sense as a model of how the brain works – it just fits my gut feeling better.
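To make the two points above concrete, here is a minimal sketch of one-pass learning with ratio-based weights. This is an invented toy, not Seginer's actual update rules: it just shows the shape of the idea, that each sentence updates simple counts exactly once and weights are plain ratios over those counts.

```python
# Toy one-pass learner (illustrative only, not Seginer's algorithm):
# each sentence updates co-occurrence counts once and is never revisited;
# link weights are simple ratios over those counts.
from collections import defaultdict

class IncrementalLexicon:
    def __init__(self):
        self.word_count = defaultdict(int)   # how often each word is seen
        self.pair_count = defaultdict(int)   # adjacent (left, right) pairs

    def learn(self, sentence):
        """Update counts from one sentence; the sentence is never revisited."""
        words = sentence.split()
        for w in words:
            self.word_count[w] += 1
        for left, right in zip(words, words[1:]):
            self.pair_count[(left, right)] += 1

    def weight(self, left, right):
        """A simple ratio: how often 'right' follows 'left'."""
        if self.word_count[left] == 0:
            return 0.0
        return self.pair_count[(left, right)] / self.word_count[left]

lex = IncrementalLexicon()
for s in ["the boy sleeps", "the boy runs", "a boy sleeps"]:
    lex.learn(s)
print(lex.weight("the", "boy"))   # 1.0: 'boy' always follows 'the' here
```

No iteration over the corpus is needed: the counts after the last sentence are the final parameters.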
Not Restricted to Binary Trees. The CCM approach and the other approaches referenced in Seginer's paper all give binary parse trees as their result. But natural languages aren't limited to binary branching. Although it is possible to represent any non-binary tree as a binary tree, forcing a binary representation onto natural language adds nothing to the value of the result. Seginer's process gives parse trees that include constructions of more than two nodes.
Exocentric Constructions. A phrase like 'the boy' has links going in both directions between the two words. Either word can be considered the head of the phrase – which in some way matches the disagreement among linguists about which word is actually the head of the phrase.
Link Depths. The algorithm's results include a depth value on each dependency link. This value can be used to distinguish between internal and external arguments of a phrase.
Incremental Parsing. For each word read into the sentence, links can only be added to or from the new word.
I know the boy (sleeps)
When the algorithm encounters the word sleeps, it can only add a link to or from this word. No links that were generated at earlier steps can be affected. This reduces the search space and contributes to the overall speed of the parser. Another advantage is that the links in 'I know the boy' are an exact subset of the links in 'I know the boy sleeps'. No decision is necessary about dropping links that were previously discovered.
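The incremental constraint can be sketched as follows. This is an illustrative toy, not Seginer's parser: the link_score function and the determiner rule below are invented for the example, but the structural point holds – every candidate link touches the newest word, so the parse of a prefix is always a subset of the parse of the full sentence.

```python
# Sketch of the incremental-parsing constraint (illustrative only):
# when a new word arrives, candidate links may only attach to or from
# that word, so earlier links are never revised.
def parse_incrementally(words, link_score):
    """link_score(a, b) -> float; a link is added when the score is positive.
    Returns a list of (i, j) index pairs over the words."""
    links = []
    for j, new_word in enumerate(words):
        # Only links touching position j (the new word) are considered.
        for i in range(j):
            if link_score(words[i], new_word) > 0:
                links.append((i, j))
    return links

# Toy scoring rule (hypothetical): link a determiner to a following 'boy'.
score = lambda a, b: 1.0 if (a in {"the", "a"} and b == "boy") else 0.0

prefix = parse_incrementally("I know the boy".split(), score)
full = parse_incrementally("I know the boy sleeps".split(), score)
assert set(prefix) <= set(full)   # prefix links are an exact subset
```

Because earlier links are frozen, extending the sentence by one word only ever adds links; nothing already built has to be reconsidered.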
No Clustering. Seginer's approach groups words with class labels that in some respects resemble part-of-speech classes. However, the approach does not require clustering words into specific classes. It finds similarities between 'the' and 'a', but it does not require that all determiners be grouped together in a well-defined class. This makes the approach better able to deal with noise in the training data.
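One way to picture graded similarity without hard classes is a context-overlap measure. This is a generic sketch, not Seginer's actual representation: each word keeps a bag of its neighbors, and similarity is the overlap between bags, so 'the' and 'a' come out similar without ever being assigned to a shared determiner class.

```python
# Sketch of word similarity without hard clustering (illustrative only):
# similarity is a graded overlap of neighbor bags, never a class label.
from collections import Counter

def context_counts(sentences):
    """Collect, for each word, a bag of its left and right neighbors."""
    ctx = {}
    for s in sentences:
        words = s.split()
        for i, w in enumerate(words):
            bag = ctx.setdefault(w, Counter())
            if i > 0:
                bag[words[i - 1]] += 1
            if i + 1 < len(words):
                bag[words[i + 1]] += 1
    return ctx

def similarity(ctx, w1, w2):
    """Graded overlap of context bags (Jaccard over counts)."""
    a, b = ctx.get(w1, Counter()), ctx.get(w2, Counter())
    inter = sum((a & b).values())
    union = sum((a | b).values())
    return inter / union if union else 0.0

ctx = context_counts(["the boy sleeps", "a boy runs", "the girl runs"])
# 'the' and 'a' share contexts; 'the' and 'boy' do not.
print(similarity(ctx, "the", "a") > similarity(ctx, "the", "boy"))  # True
```

A noisy word simply ends up with low similarity to everything; no cluster boundary has to be drawn around it.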
Homophony. When a word has more than one meaning, it can confuse any learning algorithm. Seginer's algorithm deals with homophony by comparing labels between target pairs of words. In the examples that follow, the words in brackets are possible labels for the target words. When an underscore appears in a label, it means look at the label of the preceding or following word.
This[the] year[_the] (this is a determiner)
This[is_] was[is] (this is a pronoun)
In the example given above, in one case this is a determiner. One of its labels is [the] because this and the occur in similar places. In the second line a label for this is [is_] which means that this is frequently followed by is. The algorithm finds pairs of labels such as [the] and [_the] or [is_] and [is] to decide where to put links.
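The label-matching idea can be sketched like this. It is an illustration of the pairing described above, not Seginer's exact mechanism: a label like [the] means "appears where 'the' appears", [_the] means "appears after 'the'", and [is_] means "appears before 'is'"; a link between adjacent words is supported when their labels fit together.

```python
# Sketch of label matching for ambiguous words (illustrative only).
def labels_match(left_labels, right_labels):
    """Check whether any label pair licenses a link between adjacent words."""
    for l in left_labels:
        # [the] on the left pairs with [_the] on the right:
        # the right word often follows words like 'the'.
        if not l.startswith("_") and not l.endswith("_"):
            if "_" + l in right_labels:
                return True
    for r in right_labels:
        # [is_] on the left pairs with [is] on the right:
        # the left word often precedes words like 'is'.
        if r + "_" in left_labels:
            return True
    return False

# 'This year': determiner-like reading, [the] / [_the] match.
print(labels_match({"the"}, {"_the"}))   # True
# 'This was': pronoun-like reading, [is_] / [is] match.
print(labels_match({"is_"}, {"is"}))     # True
# Mixed readings do not match.
print(labels_match({"the"}, {"is"}))     # False
```

Because the comparison is per word pair, the two readings of this never have to be merged into one class; whichever labels match in context decide the link.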
I think Seginer’s approach has some real merit. It is a greedy algorithm – it learns as it goes and does not need to revisit previous sentences. It uses simple math. It has results that seem to match psycholinguistic models. It would be great to see this approach extended to take advantage of other linguistic phenomena such as morphology.
A link to the paper can be found here: Paper
A link to a video presentation on the approach can be found here: Video