The notion of a State, is an important part of the Earley parsing algorithm. A state is denoted as:
where '.' is the marker of the current position in the
corresponding production, i is the position in the input stream, and
k is the starting index of the substring to which the above
production is being applied. Nonterminal X is said
to dominate substring
wk...wi...wl, where, in the case of the
above state, wl is the last terminal of substring
.
The parser
generates a state set for each position in the input. This also
means that in our notation all the states with the same index i belong to the
same i-th state set corresponding to i-th position in the input stream.
A state ``explains'' a string which it dominates giving a possible
interpretation of symbols wk...wi.
In our further discussion we will use expressions
in
context of state generation to read ``A generates B''.
Scanning
Scanning step simply reads the input symbol and matches it against all pending states for the next iteration:
|
|
where
and
are forward and inner probabilities.
Note the increase in i index. This signifies the fact that scanning step inserts the states into the new state set for the next iteration of the algorithm.
Completion
Completion uses the results of scanning to advance positions in the parse tree. In its simplified probabilistic form 2, completion step generates following states:
|
![]() |
Prediction
Prediction step is used to hypothesize the possible continuation of the input based on current position in the parse tree:
|
|
The prediction step introduces the rule probabilities
associated with productions
into the parsing
process 3.