More Ideas from Yoav Seginer

Yoav Seginer wrote his dissertation on the Incremental Parser. The paper is accessible and easy to read, and its introduction is an especially well-written overview of unsupervised grammar induction.

I was surprised to read what he had to say about substitutability. Substitutability is the capacity to replace a phrase with another phrase of the same type. For example, ‘the dog’ in ‘the dog ran to town’ can be replaced with ‘it’, so in some sense the phrases ‘the dog’ and ‘it’ are interchangeable. This is one of the cornerstones of linguistic theory and is used as the basis of many parsing techniques. A PCFG parser, for example, assigns probabilities to the ways each phrase type can be expanded, so any phrase of a given type can be swapped in wherever that type is expected.

However, Seginer makes the claim that substitutability is not required for his incremental parser.

Substitutability, the essential idea of the Harris method, which has been seen as a starting point for the induction process for so long, turns out to be unnecessary in unsupervised parsing.

…unlabeled parsing which only requires the parser to identify the constituents (or dependency links) but does not require them to be labeled, is purely syntagmatic (by definition). A parser induction algorithm can therefore focus on learning to detect syntactic units while ignoring substitutability. (p20)
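
To make "unlabeled" concrete, here is a minimal sketch of what an unlabeled parse amounts to: just a set of word spans, with no NP or VP tags attached. The nested-tuple tree and the function name `spans` are my own illustration and are not taken from the dissertation.

```python
def spans(tree, start=0):
    """Return (end_position, set of (start, end) spans) for a nested-tuple tree.

    A leaf is a plain string (one word). Every tuple is a constituent,
    recorded only by the word positions it covers, with no label.
    """
    if isinstance(tree, str):
        return start + 1, set()
    found = set()
    pos = start
    for child in tree:
        pos, child_spans = spans(child, pos)
        found |= child_spans
    found.add((start, pos))
    return pos, found


# Hypothetical bracketing of the example sentence from earlier in this post.
tree = (("the", "dog"), ("ran", ("to", "town")))
_, brackets = spans(tree)
print(sorted(brackets))  # [(0, 2), (0, 5), (2, 5), (3, 5)]
```

Nothing in these brackets says that ‘the dog’ and ‘it’ belong to the same category, which is the sense in which the task can ignore substitutability.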

Another fresh idea (to me) from Seginer’s paper is the skewness of language structure.

The syntactic structure of natural language is skewed. This simply means that when the syntactic structure of an utterance is represented by a tree, each node in the tree has at least one short branch. The shorter the shortest branch is, the greater the skewness.  (p22)
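
As a rough illustration of the definition, the sketch below measures the shortest branch under each node of a tree and reports the largest such value, so a smaller number means a more skewed tree. The nested-tuple representation, the function names, and the choice of counting depth down to the nearest word are my assumptions; the dissertation may define the measure differently.

```python
def shortest_branch(node):
    """Number of steps from this node down to its nearest word (0 for a word)."""
    if isinstance(node, str):
        return 0
    return 1 + min(shortest_branch(child) for child in node)


def max_shortest_branch(node):
    """Largest shortest-branch length over all nodes; smaller means more skewed."""
    if isinstance(node, str):
        return 0
    return max(shortest_branch(node),
               *(max_shortest_branch(child) for child in node))


# A fully right-branching chain: every node has a word as a direct child.
chain = ("a", ("b", ("c", ("d", "e"))))
# A hypothetical bracketing of the example sentence from earlier in this post.
sentence = (("the", "dog"), ("ran", ("to", "town")))
# A perfectly balanced tree over eight words.
balanced = ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h")))

print(max_shortest_branch(chain))     # 1
print(max_shortest_branch(sentence))  # 2
print(max_shortest_branch(balanced))  # 3
```

The balanced tree is the kind of structure a skewness-aware parser would not need to consider.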

Essentially, the incremental parser takes advantage of skewness by expecting it in the parse result. This reduces the search space and thereby makes the parsing process more efficient.

…context free grammars, allow (a-priori) any tree structure and, therefore, a learning algorithm for such representations must discover by itself the skewness property of syntactic trees. However, if this property is indeed universal, there is no need to burden the learning algorithm with its discovery and it is possible to code skewness directly into the parser.  (p23)

He claims that ‘coding the skewness’ into the syntactic representation and the parser, i.e., expecting branches to be of mixed depths, does not detract from the accuracy of the parse result.

Here is a link to Yoav’s dissertation.  Dissertation

Since graduating in 2007, Yoav Seginer appears to have been working at a small company in Amsterdam, Mondria Technologies Ltd (according to LinkedIn.com). The company website doesn’t say anything yet. I wonder if they are working on a project that uses the incremental parser.
