next up previous
Next: Prediction Up: Probabilistic Parsing in Action Previous: Stochastic Context-Free Grammars

Earley-Stolcke Parsing Algorithm

The method most generally and conveniently used in stochastic parsing is based on an Earley parser ([11]), extended in such a way as to accept probabilities.

In parsing stochastic sentences we adopt a slightly modified notation of [20]. The notion of a state is an important part of the Earley parsing algorithm. A state is denoted as:


\begin{displaymath}
i: X_{k} \rightarrow \lambda . Y \mu
\end{displaymath}

where '.' is the marker of the current position in the input stream, i is the index of the marker, and k is the starting index of the substring denoted by nonterminal X. Nonterminal X is said to dominate substring wk...wi...wl, where, in the case of the above state, wl is the last terminal of substring $\mu$.

In cases where the position of the dot and structure of the state is not important, for compactness we will denote a state as:


\begin{displaymath}
S_{k}^{i} \equiv i: X_{k} \rightarrow \lambda . Y \mu
\end{displaymath}

Parsing proceeds as an iterative process sequentially switching between three steps - prediction, scanning and completion. For each position of the input stream, an Earley parser keeps a set of states, which denote all pending derivations. States produced by each of the parsing steps are called, respectively, predicted, scanned and completed. A state is called complete (not to be confused with completed), if the dot is located in the rightmost position of the state. A complete state is the one that ``passed'' the grammaticality check and can now be used as a ``ground truth'' for further abstraction. A state ``explains'' a string that it dominates as a possible interpretation of symbols wk...wi, ``suggesting'' a possible continuation of the string if the state is not complete.




next up previous
Next: Prediction Up: Probabilistic Parsing in Action Previous: Stochastic Context-Free Grammars
yuri ivanov
1999-02-06