------------------------------------------------------------------------------- * NOTE LOCATION * * NOTE LOCATION * ------------------------------------------------------------------------------- B O S T O N U N I V E R S I T Y Computer Science Department P H D D E F E N S E Wednesday June 14, 1995 3:00pm Stone Science Building, Room B50 675 Commonwealth Avenue Major Advisor: Prof. Joyce Friedman ------------------------------------------------------------------------------- PRESERVATION OF STRUCTURE SHARING IN PARSE FORESTS Marwan Shaban Abstract Some context-free parsing algorithms produce parse forests as a compact representation of the union of a set of parse trees. Although the generation of parse forests by such algorithms has been investigated, not much work has been done on the practical use of features in parse forests. Several techniques are utilized in order to preserve structure sharing in a parse forest. Among the techniques this work develops are: 1. An algorithm for splitting an arbitrary set of virtual trees away from the rest of a shared-packed forest. 2. A mechanism for delayed execution of changes to a forest by syntactic modules. 3. A unique and general method of encoding long-distance pointers from some subnodes to others. 4. The use of relative addressing for short-distance pointers from subnodes to others (rather than using hard-pointers). 5. The output of packed nodes from "local generators". To implement the above ideas, a GB parser of the generate-and-test variety is used. The structure-recovery component of the parser is a variant of Tomita's Generalized LR parser (Tomita 1986). The goal is to enable the parser (through the use of the above techniques) to preserve structure sharing throughout the GB parse process. We compare the parser's performance with that of a version that does not preserve structure sharing. The results indicate that the structure-preserving version of the parser does in fact produce substantial space savings over the non-structure-preserving version. This work has implications not just for GB parsing, but for any generate-and-test process that suffers from the classic "overgeneration problem". ------------------------------------------------------------------------------- For colloquium info, including directions, see http://cs-www.bu.edu/colloquium For more information contact Prof. Mark Crovella -------------------------------------------------------------------------------