reglibcpp
1.3.0
(Naïve) C++ implementation of models for regular languages
|
Parses regular expressions. More...
Classes | |
struct | tree |
Represents the table entries as binary trees. More... | |
Public Types | |
enum | token : unsigned char { A, B, C, K, E, F, Σ, P, L, R, S, END } |
Tokens the grammar deals with. More... | |
typedef bitset< token::END > | tokens |
Tokens don't usually come alone. More... | |
Public Member Functions | |
parser (u32string const &re, literals const &lits) | |
Initializes with a string to parse and literals to parse for. More... | |
exptr | operator() (bool optimized, bool aggressive) |
Gives the RE resulting from parsing. More... | |
Static Public Member Functions | |
static tokens | getUClosure (tokens const &m) |
Constructs the reflexive-transitive closure of the inverse unit relation for a given set of symbols. More... | |
static bool | canDerive (token symbol, tokens const &left, tokens const &right) |
Checks if a token could derive a pair of tokens from two other entries. More... | |
static void | compileTableEntry (size_t row, size_t diag, vector< vector< tokens >> &table) |
Fills a table entry. More... | |
Public Attributes | |
literals | lits |
Stores interpretations for characters encountered in the parsed string. More... | |
vector< char32_t > | symbolMapping |
Stores the actual symbols encountered in the RE while parsing. More... | |
size_t | symbolMappingIndex |
Index for when symbols have to be extracted from the mapping. More... | |
vector< vector< tokens > > | table |
The table of sets of symbols that derive a subsentence. More... | |
bool | badRegularExpression = false |
Signifies that the RE used to initialize this object was invalid. More... | |
Static Public Attributes | |
static array< token, token::END > const | inverseUnitGraph |
Maps symbols that may be derived in-place to their predecessing symbols. More... | |
static array< array< tokens, token::END >, token::END > const | inverseBinaryRules |
Maps pairs of symbols to the symbols that derive them. More... | |
Parses regular expressions.
Recognizes REs generated by the following context-free grammar, where P
stands for the alternative symbol (literal "+
" or "|
"), L
for a left parenthesis (literal "(
"), R
for a right parenthesis (literal ")
") and S
for the Kleene star (literal "*
"). Σ
is a placeholder for any symbol that makes up the regular language described by the RE:
GRE=( {A, B, C, K, E, F}, {Σ, P, L, R, S}, A, { (A,C), (A,AB), (B,PC), (C,K), (C,CK), (K,E), (K,ES), (E,Σ), (E,LF), (F,AR) } )
This parser is based on the article To CNF or not to CNF? An Efficient Yet Presentable Version of the CYK Algorithm by Martin Lange and Hans Leiß, published in informatica didactica No. 8. Conventions will be based on this article.
Definition at line 319 of file expression.cpp.
typedef bitset<token::END> reg::expression::parser::tokens |
Tokens don't usually come alone.
Definition at line 337 of file expression.cpp.
enum reg::expression::parser::token : unsigned char |
Tokens the grammar deals with.
Definition at line 321 of file expression.cpp.
|
inline |
Initializes with a string to parse and literals to parse for.
Definition at line 429 of file expression.cpp.
|
inlinestatic |
Checks if a token could derive a pair of tokens from two other entries.
symbol | the original token |
left | the set of tokens allowed for the first half of the result pair |
right | the set of tokens allowed for the second half of the result pair |
true
if the original token may lead to any such pair, false
else Definition at line 373 of file expression.cpp.
|
inlinestatic |
Fills a table entry.
The entry will be filled with the set of symbols deriving the subsentence beginning at the diagonal entry in the same row and ending at the diagonal entry in the same column.
row | the row of the entry to fill |
diag | the diagonal of the entry to fill |
table | the table to operate on |
Definition at line 406 of file expression.cpp.
Constructs the reflexive-transitive closure of the inverse unit relation for a given set of symbols.
See Lemma 5 in the article.
m | the set of symbols that may be derived in-place |
Definition at line 354 of file expression.cpp.
|
inline |
Gives the RE resulting from parsing.
exptr
to the RE represented by the initialization string Definition at line 540 of file expression.cpp.
bool reg::expression::parser::badRegularExpression = false |
Signifies that the RE used to initialize this object was invalid.
Definition at line 426 of file expression.cpp.
|
static |
Maps pairs of symbols to the symbols that derive them.
Definition at line 343 of file expression.cpp.
|
static |
Maps symbols that may be derived in-place to their predecessing symbols.
See Definition 2 and Lemma 4 in the article.
Definition at line 339 of file expression.cpp.
literals reg::expression::parser::lits |
Stores interpretations for characters encountered in the parsed string.
Definition at line 422 of file expression.cpp.
vector<char32_t> reg::expression::parser::symbolMapping |
Stores the actual symbols encountered in the RE while parsing.
Definition at line 423 of file expression.cpp.
|
mutable |
Index for when symbols have to be extracted from the mapping.
Definition at line 424 of file expression.cpp.
vector<vector<tokens> > reg::expression::parser::table |
The table of sets of symbols that derive a subsentence.
Definition at line 425 of file expression.cpp.