After spending some time with (Sag, Wasow, Bender, 2003) and (Copestake, 2002), I wanted to see just what is involved in implementing a processor for feature structures and the grammar for parsing sentences. The two texts mentioned helped me know where this was going, and having LKB (http://wiki.delph-in.net/moin/LkbTop) as a model meant I didn’t have to venture into uncharted territory.
So, I wrote a program that works with feature structures based on the descriptions in the two texts listed. There are several pieces involved in this program:
- TDL parser. Read in a file that contains Type Definitions. The syntax of these definitions are taken from Copestake.
- Feature Structure processor. Unification is the main task for managing feature structures. This means combining two or more feature structures into a combined structure while ensuring that rules for unification are not broken.
- Earley style Chart Parser. The Earley algorithm (http://en.wikipedia.org/wiki/Earley_parser) manages a list of potential partial solutions to the parse and continues to expand these partial solutions until one or more complete solutions are found.
- Viewer. The feature structures are displayed in an Attribute Value Matrix (AVM) viewer that allows the user to selectively minimize portions of a structure to maximize screen usage. These tend to be large images, so the selective minimization is a necessity even for relatively short sentences.
So, the program reads in the type definition file, manages a list of feature structures for grammar rules as well as word definitions, parses sentences, and displays the feature structure results for the sentence parses.
This is an image of a sentence that was parsed using feature structures. This is the minimized form.
This image shows the structure of the sentence. ‘the dogs’ is a noun phrase, as is ‘the cats’. ‘chased’ is a verb with ‘the dogs’ as its specifier and ‘the cat’ as its complement. ’start’ is the term used here for the top level container of the sentence.
Here is the same sentence expanded to show the ‘feature structures’ of each part of the sentence.

The feature structures are the various portions shown in square brackets and angle brackets. It is a nested structure which means there are pieces inside other pieces. So ‘the’ and ‘dogs’ are inside the np-rule. The np-rule and the vp-rule-intrans are inside ’start’ which is the sentence.
The features are the items in square brackets. For example ‘CATEG’, ‘NUMAGR’ and ‘ORTH’ are all features in the structure for ‘dogs’. ‘CATEG N’ means that this stucture is a noun. ‘NUMAGR pl’ means the structure requires agreement for number which in this case is ‘pl’ (plural).
The numbers inside squares are coreferences. These mean that whereever a number is repeated, those features must have the same value. So there are 6 places that show a ‘3′ inside a square. Each of these places must ‘allow’ the same value. So for example ‘the’ and ‘dogs’ both allow plural. Whereas “a dogs” would not have worked because ‘a’ can only be used with singular nouns. Similarly ‘ran’ allows plural, whereas ’sleeps’ doesn’t allow plural – you don’t say “the dogs sleeps”.
This has been a pretty involved project. I am glad to have made it this far. I plan to describe a few of the components of the program in a little more detail. As for future work, I am currently working on making the parser more efficient. Making the parser run faster is required for it to be useful in future work. After that, I want to see how to apply this to solve the entailment problem (see earlier posts).
