next up previous
Next: Constraining Terminal Length Consistency. Up: Probabilistic parsing Previous: Extensions to uncertain terminals

Spurious Symbol Deletion.

Figure 3: Example of the lattice parse input. The dashed line shows a possible parse. The two rectangles drawn around samples 2 and 4 show the ``spurious symbols'' for this parse and which need to be ignored for the derivation to be contiguous. We can see that if the spurious symbols are simply removed from the stream, an alternative derivation for the sequence, shown by a dotted line, will be interrupted. Sample 3 contains two concurrent symbols which are handled by the lattice parsing.
\begin{figure}
\psfig{figure=Figures/deletionfig_2.ps,width=6in}\end{figure}

Parsing needs to be performed on a lattice, where the symbols which we need to consider for inclusion in the string, come at random times. This results in appearance of ``spurious'' (i.e. ungrammatical) symbols in the input stream. We need to be able to consider these inputs separately, at different time steps and disregard them if their appearance in the input stream for some derivation is found to be ungrammatical. At the same time, we need to preserve the symbol in the stream for considering it in other possible derivations, perhaps even of a completely different string. The problem is illustrated by figure 3.

One simple approach is to convert the grammar G to a robust grammar $\hat{G}$ by first introducing the SKIP non-terminal which is allowed to match any input symbol 4, and then by making the following modifications:

.
Each terminal in productions of G is replaced by a pre-terminal in $\hat{G}$:

     G: $\Rightarrow$ $\hat{G}:$
     $A \rightarrow bC$   $A \rightarrow \hat{B} C$

.
For each pre-terminal of $\hat{G}$ a skip rule is formed:
     $\hat{G}:$  
     $\hat{B} \quad$ $\rightarrow \quad b \quad \vert \quad SKIP \quad
b \quad SKIP$

This process can be performed automatically as a pre-processing step, when the grammar is read in by the parser, so that no modifications to the grammar are explicitly written.


next up previous
Next: Constraining Terminal Length Consistency. Up: Probabilistic parsing Previous: Extensions to uncertain terminals
yuri ivanov
1999-02-06