next up previous
Next: Extensions to uncertain terminals Up: Probabilistic parsing Previous: Probabilistic parsing

Notation and algorithms

The notion of a State, is an important part of the Earley parsing algorithm. A state is denoted as:


\begin{displaymath}
i: X_{k} \rightarrow \lambda . Y \mu
\end{displaymath}

where '.' is the marker of the current position in the corresponding production, i is the position in the input stream, and k is the starting index of the substring to which the above production is being applied. Nonterminal X is said to dominate substring wk...wi...wl, where, in the case of the above state, wl is the last terminal of substring $\mu$. The parser generates a state set for each position in the input. This also means that in our notation all the states with the same index i belong to the same i-th state set corresponding to i-th position in the input stream. A state ``explains'' a string which it dominates giving a possible interpretation of symbols wk...wi.

In our further discussion we will use expressions $A \Rightarrow B$ in context of state generation to read ``A generates B''.

Scanning

Scanning step simply reads the input symbol and matches it against all pending states for the next iteration:

$\begin{array}{l}
i: X_{k} \rightarrow \lambda . a \mu \quad [\alpha, \gamma] \\
\end{array}$ $\Longrightarrow$ $i+1: X_{k} \rightarrow \lambda a . \mu \quad [\alpha, \gamma] $

where $\alpha$ and $\gamma$ are forward and inner probabilities.

Note the increase in i index. This signifies the fact that scanning step inserts the states into the new state set for the next iteration of the algorithm.

Completion

Completion uses the results of scanning to advance positions in the parse tree. In its simplified probabilistic form 2, completion step generates following states:

$\left\{
\begin{array}{l}
j: X_{k} \rightarrow \lambda . Y \mu \quad [\alpha, \gamma]\\
i: Y_{j} \rightarrow \nu . \quad [\alpha'', \gamma'']
\end{array}\right.$ $\Longrightarrow$ $i: X_{k} \rightarrow \lambda Y . \mu \quad [\alpha', \gamma']$
$\begin{array}{rcl}
\alpha' & = & \sum_{\forall \lambda, \mu}
\alpha(i: X_{k} ...
...ightarrow \lambda . Y \mu)
\gamma''(i: Y_{j} \rightarrow \nu .)
\end{array} $

Prediction

Prediction step is used to hypothesize the possible continuation of the input based on current position in the parse tree:

$\left\{
\begin{array}{l}
i: X_{k} \rightarrow \lambda . Y \mu \quad [\alpha, \gamma]\\
Y \rightarrow \nu
\end{array}\right.$ $\Longrightarrow$ $i: Y \rightarrow .\nu \quad [\alpha', \gamma'] $
$\begin{array}{rcl}
\alpha' & = & \sum_{\forall \lambda, \mu}\alpha(i: X_{k} \r...
...Y \mu)P(Y \rightarrow \nu) \\
\gamma' & = & P(Y \rightarrow \nu)
\end{array}$

The prediction step introduces the rule probabilities $P(Y \rightarrow
\nu)$ associated with productions $Y \rightarrow \nu$ into the parsing process 3.


next up previous
Next: Extensions to uncertain terminals Up: Probabilistic parsing Previous: Probabilistic parsing
yuri ivanov
1999-02-06