com.ibm.icu.text

Class RuleBasedBreakIterator_Old.Builder

protected class RuleBasedBreakIterator_Old.Builder extends Object

The Builder class has the job of constructing a RuleBasedBreakIterator_Old from a textual description. A Builder is constructed by RuleBasedBreakIterator_Old's constructor, which uses it to construct the iterator itself and then throws it away.

The construction logic is separated out into its own class for two primary reasons:

It would be preferable for Builder to be an independent class rather than an inner class, because that would shorten the source file considerably. However, making Builder an inner class of RuleBasedBreakIterator_Old gives it direct access to RuleBasedBreakIterator_Old's private members, which saves us from having to provide some kind of "back door" to the Builder class that could then also be used by other classes.

Field Summary
protected static int ALL_FLAGS
A bit mask representing the union of the other flag masks in this class.
protected Vector categories
A temporary holding place used for calculating the character categories.
protected boolean clearLoopingStates
A flag that is used to indicate when the list of looping states can be reset.
protected Vector decisionPointList
A list of all the states that have to be filled in with transitions to the next state that is created.
protected Stack decisionPointStack
A stack for holding decision point lists.
protected static int DONT_LOOP_FLAG
A bit mask used to indicate a bit in the table's flags column that marks a state as one the builder should not connect to any looping states.
protected Hashtable expressions
A table used to map parts of regexp text to lists of character categories, so that they do not have to be recomputed from scratch each time.
protected static int END_STATE_FLAG
A bit mask used to indicate a bit in the table's flags column that marks a state as an accepting state.
protected UnicodeSet ignoreChars
A temporary holding place for the list of ignore characters.
protected Vector loopingStates
A list of states that loop back on themselves.
protected static int LOOKAHEAD_STATE_FLAG
A bit mask used to indicate a bit in the table's flags column that marks a state as a lookahead state.
protected Vector mergeList
A list mapping pairs of state numbers for states that are to be combined to the state number of the state representing their combination.
protected Vector statesToBackfill
Looping states actually have to be backfilled later in the process than everything else.
protected Vector tempStateTable
A temporary holding place where the forward state table is built.
Constructor Summary
Builder()
No special construction is required for the Builder.
Method Summary
void buildBreakIterator()
This is the main function for setting up the BreakIterator's tables.
protected void buildCharCategories(Vector tempRuleList)
This function builds the character category table.
protected void debugPrintTempStateTable()
protected void debugPrintVector(String label, Vector v)
protected void debugPrintVectorOfVectors(String label1, String label2, Vector v)
protected void error(String message, int position, String context)
Throws an IllegalArgumentException representing a syntax error in the rule description.
protected void handleSpecialSubstitution(String replace, String replaceWith, int startPos, String description)
This function defines a protocol for handling substitution names that are "special," i.e., that have some property beyond just being substitutions.
protected void mungeExpressionList(Hashtable expressions)
protected String processSubstitution(String substitutionRule, String description, int startPos)
This function performs variable-name substitutions.

Field Detail

ALL_FLAGS

protected static final int ALL_FLAGS
A bit mask representing the union of the other flag masks in this class (END_STATE_FLAG, LOOKAHEAD_STATE_FLAG, and DONT_LOOP_FLAG). Used for clearing or masking off the flag bits.
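Because all of the flag masks share one flags column, a state's flags can be combined with bitwise OR, tested with AND, and stripped by ANDing with the complement of ALL_FLAGS. The sketch below assumes the flags occupy the high bits of a 16-bit entry; the concrete values 0x8000, 0x4000, and 0x2000 and the helper names are illustrative assumptions, not documented values of this class.

```java
// Illustrative sketch only: the flag values are assumptions, not the
// documented values of RuleBasedBreakIterator_Old.Builder.
final class StateFlagsSketch {
    static final int END_STATE_FLAG = 0x8000;       // assumed: accepting state
    static final int DONT_LOOP_FLAG = 0x4000;       // assumed: don't connect to looping states
    static final int LOOKAHEAD_STATE_FLAG = 0x2000; // assumed: lookahead state

    // Union of all flag bits, used to clear or mask them off.
    static final int ALL_FLAGS = END_STATE_FLAG | DONT_LOOP_FLAG | LOOKAHEAD_STATE_FLAG;

    // Hypothetical helper: true if the flags mark an accepting state.
    static boolean isEndState(int flags) {
        return (flags & END_STATE_FLAG) != 0;
    }

    // Hypothetical helper: strips every flag bit, leaving the remaining bits intact.
    static int clearFlags(int value) {
        return value & ~ALL_FLAGS;
    }
}
```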

categories

protected Vector categories
A temporary holding place used for calculating the character categories. This object contains UnicodeSet objects.

clearLoopingStates

protected boolean clearLoopingStates
A flag that is used to indicate when the list of looping states can be reset.

decisionPointList

protected Vector decisionPointList
A list of all the states that have to be filled in with transitions to the next state that is created. Used when building the state table from the regular expressions.

decisionPointStack

protected Stack decisionPointStack
A stack for holding decision point lists. This is used to handle nested parentheses and braces in regexps.
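The following is a simplified, hypothetical model of how a stack of decision-point lists can handle nested groups; it is not the Builder's actual parsing code. Every plain character becomes a "state" numbered by its position (with -1 as the start state), and the decision points are the states still waiting for a transition to the next state created. Entering '(' saves the current list on the stack; each '|' finishes one alternative and restarts from the saved list; ')' makes the ends of all alternatives the new decision points. Braces and the other rule operators are left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

// Hypothetical model: states are character positions, -1 is the start state.
final class DecisionPointSketch {
    // One stack frame per open group: the decision points in effect when the
    // group was entered, plus the ends of the alternatives seen so far.
    private static final class Frame {
        final List<Integer> entry;
        final List<Integer> exits = new ArrayList<>();
        Frame(List<Integer> entry) { this.entry = entry; }
    }

    // Returns the transitions created, as "from->to" strings.
    static List<String> link(String rule) {
        List<String> edges = new ArrayList<>();
        Deque<Frame> stack = new ArrayDeque<>();
        List<Integer> decisionPoints = new ArrayList<>(Collections.singletonList(-1));
        for (int i = 0; i < rule.length(); i++) {
            char c = rule.charAt(i);
            if (c == '(') {
                // Save the current decision points; every alternative starts from them.
                stack.push(new Frame(new ArrayList<>(decisionPoints)));
            } else if (c == '|') {
                if (stack.isEmpty()) throw new IllegalArgumentException("'|' outside a group at " + i);
                Frame f = stack.peek();
                f.exits.addAll(decisionPoints);            // this alternative is finished
                decisionPoints = new ArrayList<>(f.entry); // next one restarts at the group entry
            } else if (c == ')') {
                if (stack.isEmpty()) throw new IllegalArgumentException("Unbalanced ')' at " + i);
                Frame f = stack.pop();
                f.exits.addAll(decisionPoints);
                decisionPoints = f.exits;                  // all alternatives feed the next state
            } else {
                // A new state: fill in a transition from every pending decision point.
                for (int from : decisionPoints) edges.add(from + "->" + i);
                decisionPoints = new ArrayList<>(Collections.singletonList(i));
            }
        }
        if (!stack.isEmpty()) throw new IllegalArgumentException("Unclosed '(' in rule");
        return edges;
    }
}
```

For example, "a(b|c)d" yields the transitions -1->0, 0->2, 0->4, 2->6, and 4->6: both alternatives branch from 'a' and both lead into 'd'.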

DONT_LOOP_FLAG

protected static final int DONT_LOOP_FLAG
A bit mask used to indicate a bit in the table's flags column that marks a state as one the builder should not connect to any looping states.

expressions

protected Hashtable expressions
A table used to map parts of regexp text to lists of character categories, so that they do not have to be recomputed from scratch each time.
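The expressions table acts as a memoization cache: the first time a piece of regexp text is resolved, the result is stored under that text, and later occurrences are looked up instead of recomputed. The minimal sketch below uses java.util.Hashtable, the field's declared type; the resolver function and the String used to stand in for a list of categories are hypothetical.

```java
import java.util.Hashtable;
import java.util.function.Function;

// Hypothetical cache from regexp text to its categories (a String stand-in here).
final class ExpressionCacheSketch {
    private final Hashtable<String, String> expressions = new Hashtable<>();
    // Stand-in for the real computation; must not return null,
    // because Hashtable rejects null values.
    private final Function<String, String> resolver;
    int computations = 0; // counts cache misses, for illustration

    ExpressionCacheSketch(Function<String, String> resolver) {
        this.resolver = resolver;
    }

    String categoriesFor(String regexpText) {
        String cached = expressions.get(regexpText);
        if (cached == null) {
            // First occurrence: compute once and remember the result.
            computations++;
            cached = resolver.apply(regexpText);
            expressions.put(regexpText, cached);
        }
        return cached;
    }
}
```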

END_STATE_FLAG

protected static final int END_STATE_FLAG
A bit mask used to indicate a bit in the table's flags column that marks a state as an accepting state.

ignoreChars

protected UnicodeSet ignoreChars
A temporary holding place for the list of ignore characters.

loopingStates

protected Vector loopingStates
A list of states that loop back on themselves. Used to handle the .*? construct.

LOOKAHEAD_STATE_FLAG

protected static final int LOOKAHEAD_STATE_FLAG
A bit mask used to indicate a bit in the table's flags column that marks a state as a lookahead state.

mergeList

protected Vector mergeList
A list mapping pairs of state numbers for states that are to be combined to the state number of the state representing their combination. Used in the process of making the state table deterministic to prevent infinite recursion.
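A minimal sketch of why recording merges prevents infinite recursion: merging two states recursively merges their successors, and when states form a cycle, the same pair comes up again. If the combined state is recorded before recursing, the repeated pair simply reuses it. Here a HashMap keyed by the unordered pair stands in for the Vector-based list, and each toy state has exactly one successor; both are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntUnaryOperator;

// Toy model: each state has exactly one successor, given by `next`.
final class MergeListSketch {
    private final Map<Long, Integer> mergeList = new HashMap<>();     // pair -> combined state
    private final Map<Integer, Integer> mergedNext = new HashMap<>(); // successor of each combined state
    private final IntUnaryOperator next;
    private int nextStateNumber;

    MergeListSketch(IntUnaryOperator next, int firstFreeState) {
        this.next = next;
        this.nextStateNumber = firstFreeState;
    }

    // Order-independent key for the pair (a, b).
    private static long key(int a, int b) {
        int lo = Math.min(a, b), hi = Math.max(a, b);
        return ((long) lo << 32) | (hi & 0xFFFFFFFFL);
    }

    int merge(int a, int b) {
        if (a == b) return a;
        Long k = key(a, b);
        Integer existing = mergeList.get(k);
        if (existing != null) return existing; // already merged (or in progress): reuse it
        int combined = nextStateNumber++;
        mergeList.put(k, combined);            // record BEFORE recursing
        mergedNext.put(combined, merge(next.applyAsInt(a), next.applyAsInt(b)));
        return combined;
    }

    int mergedCount() { return mergeList.size(); }

    int successorOf(int combined) { return mergedNext.get(combined); }
}
```

With states 0 and 1 forming one cycle and 2 and 3 another, merging 0 with 2 creates two combined states that point at each other, and the recursion stops when the pair (0, 2) is seen a second time.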

statesToBackfill

protected Vector statesToBackfill
Looping states actually have to be backfilled later in the process than everything else. This is where the list of states to backfill is accumulated. This is also used to handle the .*? construct.

tempStateTable

protected Vector tempStateTable
A temporary holding place where the forward state table is built.

Constructor Detail

Builder

public Builder()
No special construction is required for the Builder.

Method Detail

buildBreakIterator

public void buildBreakIterator()
This is the main function for setting up the BreakIterator's tables. It delegates the different parts of the job to other functions.

buildCharCategories

protected void buildCharCategories(Vector tempRuleList)
This function builds the character category table. On entry, tempRuleList is a vector of break rules in which variable names have already been substituted. On exit, the charCategoryTable data member has been initialized to hold the character category table, and the rules in tempRuleList have been rewritten to contain character category numbers wherever a literal character or a [] expression originally occurred.
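The idea behind a character category table is to partition characters into classes that no rule can tell apart: two characters share a category exactly when they belong to the same subset of the literals and [] expressions used in the rules. The sketch below computes such a partition over a small, hypothetical alphabet, using java.util.BitSet in place of UnicodeSet. It illustrates the partitioning technique only and does not reproduce the Builder's algorithm or its category numbering.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CharCategorySketch {
    // Assigns each character in [0, alphabetSize) a category number such that two
    // characters share a number iff they are members of exactly the same input sets.
    static int[] categorize(List<BitSet> sets, int alphabetSize) {
        Map<BitSet, Integer> signatureToCategory = new LinkedHashMap<>();
        int[] category = new int[alphabetSize];
        for (int c = 0; c < alphabetSize; c++) {
            // Signature: bit i is set iff character c belongs to sets.get(i).
            BitSet signature = new BitSet(sets.size());
            for (int i = 0; i < sets.size(); i++) {
                if (sets.get(i).get(c)) signature.set(i);
            }
            Integer cat = signatureToCategory.get(signature);
            if (cat == null) {
                // New combination of memberships: allocate the next category number.
                cat = signatureToCategory.size();
                signatureToCategory.put(signature, cat);
            }
            category[c] = cat;
        }
        return category;
    }
}
```

With the sets {1, 2, 3} and {3, 4} over the alphabet 0 through 9, characters 1 and 2 share a category, 3 and 4 each get their own, and every character in neither set falls into one shared category.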

debugPrintTempStateTable

protected void debugPrintTempStateTable()

debugPrintVector

protected void debugPrintVector(String label, Vector v)

debugPrintVectorOfVectors

protected void debugPrintVectorOfVectors(String label1, String label2, Vector v)

error

protected void error(String message, int position, String context)
Throws an IllegalArgumentException representing a syntax error in the rule description. The exception's message contains some debugging information.

Parameters:
message - A message describing the problem
position - The position in the description where the problem was discovered
context - The string containing the error
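A sketch of how a method with this signature might assemble its exception, assuming it reports the message, the position, and the context with a marker at the failing position. The message layout and the "-here-" marker are hypothetical; the actual format used by RuleBasedBreakIterator_Old is not described on this page.

```java
// Hypothetical stand-in for Builder.error(); the message layout is an assumption.
final class RuleErrorSketch {
    static void error(String message, int position, String context) {
        // Clamp so a bad position cannot cause a second exception while formatting.
        int p = Math.max(0, Math.min(position, context.length()));
        throw new IllegalArgumentException(
                "Parse error at position " + position + ": " + message + "\n"
                + context.substring(0, p) + " -here- " + context.substring(p));
    }
}
```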

handleSpecialSubstitution

protected void handleSpecialSubstitution(String replace, String replaceWith, int startPos, String description)
This function defines a protocol for handling substitution names that are "special," i.e., that have some property beyond just being substitutions. At the RuleBasedBreakIterator_Old level, we have one special substitution name, IGNORE_VAR. Subclasses can override this function to add more. Any special processing that has to go on beyond that which is done by the normal substitution-processing code is done here.
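The override protocol can be sketched as a protected hook: the base class recognizes its one special name, and a subclass calls super to keep that behavior before handling names of its own. All class names, the "$ignore" spelling of IGNORE_VAR, and the extra "$dictionary" name are hypothetical; ignoreChars is a String here rather than a UnicodeSet.

```java
// Hypothetical classes illustrating the override protocol; names and the
// IGNORE_VAR spelling are assumptions, not the real ICU definitions.
class BaseBuilderSketch {
    static final String IGNORE_VAR = "$ignore"; // assumed spelling
    String ignoreChars = null;                  // String stand-in for a UnicodeSet

    protected void handleSpecialSubstitution(String replace, String replaceWith,
                                             int startPos, String description) {
        // The base level knows only one special name.
        if (replace.equals(IGNORE_VAR)) {
            ignoreChars = replaceWith;
        }
    }
}

class ExtendedBuilderSketch extends BaseBuilderSketch {
    static final String DICTIONARY_VAR = "$dictionary"; // hypothetical extra special name
    String dictionaryChars = null;

    @Override
    protected void handleSpecialSubstitution(String replace, String replaceWith,
                                             int startPos, String description) {
        // Keep the base class's handling, then add this subclass's own name.
        super.handleSpecialSubstitution(replace, replaceWith, startPos, description);
        if (replace.equals(DICTIONARY_VAR)) {
            dictionaryChars = replaceWith;
        }
    }
}
```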

mungeExpressionList

protected void mungeExpressionList(Hashtable expressions)

processSubstitution

protected String processSubstitution(String substitutionRule, String description, int startPos)
This function performs variable-name substitutions. First it does syntax checking on the variable-name definition. If it's syntactically valid, it then goes through the remainder of the description and does a simple find-and-replace of the variable name with its text. (The variable text must be enclosed in either [] or () for this to work.)
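A simplified sketch of the two steps described above: check the definition's syntax, then do a plain find-and-replace of the name over the rest of the description. The "name=text" definition syntax and the method name are assumptions for illustration; the real method's handling of details such as terminators or whitespace is not described here.

```java
// Simplified sketch; the "name=text" definition syntax is an assumption.
final class SubstitutionSketch {
    static String substitute(String definition, String description) {
        int eq = definition.indexOf('=');
        if (eq <= 0) {
            throw new IllegalArgumentException("Missing variable name or '=' in: " + definition);
        }
        String name = definition.substring(0, eq);
        String text = definition.substring(eq + 1);
        // Syntax check: the variable text must be enclosed in [] or ().
        boolean enclosed = text.length() >= 2
                && ((text.charAt(0) == '[' && text.endsWith("]"))
                    || (text.charAt(0) == '(' && text.endsWith(")")));
        if (!enclosed) {
            throw new IllegalArgumentException("Variable text must be enclosed in [] or (): " + text);
        }
        // Plain find-and-replace of every occurrence of the name.
        return description.replace(name, text);
    }
}
```

Because this is a purely textual replace, a name that is a prefix of a longer name (for example, $digit inside $digits) would also match.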
