reglibcpp  1.3.0
(Naïve) C++ implementation of models for regular languages
Classes | Public Types | Public Member Functions | Static Public Member Functions | List of all members
reg::expression Class Reference

Represents formal regular expressions. More...

#include <expression.h>

Classes

struct  literals
 Token literals as used in Introduction to Automata Theory, Languages, and Computation by Hopcroft, Motwani and Ullman. More...
 
struct  parser
 Parses regular expressions. More...
 

Public Types

enum  operation {
  empty, symbol, kleene, concatenation,
  alternation
}
 The different purposes an RE may fulfill. More...
 
typedef std::shared_ptr< expression const > exptr
 This is the type used to handle regular expressions. More...
 

Public Member Functions

size_t size () const
 Reports the size of this RE's tree representation. More...
 
operation getOperation () const
 Reports this RE's function. More...
 
bool operator== (expression const &r) const
 Checks whether this RE is semantically equivalent to another one. More...
 
bool operator!= (expression const &r) const
 Checks whether this RE is semantically different from another one. More...
 
char32_t extractSymbol () const
 Reports this symbol expression's UTF-32-encoded symbol. More...
 
std::string extractUtf8Symbol () const
 Reports this symbol expression's UTF-8-encoded symbol. More...
 
std::u32string to_u32string () const
 Describes this RE in UTF-32-encoded human-readable form. More...
 
std::string to_string () const
 Describes this RE in UTF-8-encoded human-readable form. More...
 
std::vector< exptr >::const_iterator begin () const
 Returns an iterator pointing to this RE's first subexpression. More...
 
std::vector< exptr >::const_iterator end () const
 Returns an iterator pointing behind this RE's last subexpression. More...
 

Static Public Member Functions

static exptr const & spawnEmptySet ()
 Gives an RE representing the empty set ∅. More...
 
static exptr const & spawnEmptyString ()
 Gives an RE representing the empty string ε. More...
 
static exptr const & spawnSymbol (char32_t symbol)
 Gives an RE representing the given UTF-32-encoded symbol. More...
 
static exptr const & spawnSymbol (std::string const &utf8Symbol)
 Same as above for a UTF-8-encoded symbol. More...
 
static exptr spawnKleene (exptr const &b, bool optimized=true, bool aggressive=false)
 Gives an RE representing the Kleene closure of a given RE. More...
 
static exptr spawnConcatenation (exptr const &l, exptr const &r, bool optimized=true, bool aggressive=false)
 Gives an RE representing the concatenation of two given REs. More...
 
static exptr spawnAlternation (exptr const &l, exptr const &r, bool optimized=true, bool aggressive=false)
 Gives an RE representing the alternation of two given REs. More...
 
static exptr spawnFromString (std::u32string const &re, literals lits=literals(), bool optimized=false, bool aggressive=false)
 Gives an RE encoded in a given string. More...
 
static exptr spawnFromString (std::string const &utf8Re, literals lits=literals(), bool optimized=false, bool aggressive=false)
 Same as above for a UTF-8-encoded string. More...
 

Detailed Description

Represents formal regular expressions.

One should never need to handle such an object directly, however, much less copy or move it and therefore copy and move constructors are deleted.

To work with regular expressions, one should use expression::exptr, which aliases a shared_ptr to an actual object and can be copied and moved to one's heart's content. To access member functions, one might dereference exptrs temporarily or, better yet, use the arrow -> operator.

See also
expression::exptr

Definition at line 27 of file expression.h.

Member Typedef Documentation

◆ exptr

typedef std::shared_ptr<expression const> reg::expression::exptr

This is the type used to handle regular expressions.

Every method works on shared_ptrs to the actual regular expressions, to help with basic comparisons and to save memory.

For example, every symbol's (and the empty string's and the empty set's) regular expression is only instantiated once and then pointed to by as many exptrs as one likes.

Definition at line 39 of file expression.h.

Member Enumeration Documentation

◆ operation

The different purposes an RE may fulfill.

See also
spawnEmptySet
spawnEmptyString
spawnSymbol
spawnKleene
spawnConcatenation
spawnAlternation

Definition at line 79 of file expression.h.

79 { empty, symbol, kleene, concatenation, alternation };

Member Function Documentation

◆ begin()

vector< expression::exptr >::const_iterator reg::expression::begin ( ) const

Returns an iterator pointing to this RE's first subexpression.

Definition at line 284 of file expression.cpp.

284  {
285  return subExpressions.cbegin();
286 }

◆ end()

vector< expression::exptr >::const_iterator reg::expression::end ( ) const

Returns an iterator pointing behind this RE's last subexpression.

Definition at line 289 of file expression.cpp.

289  {
290  return subExpressions.cend();
291 }

◆ extractSymbol()

char32_t reg::expression::extractSymbol ( ) const

Reports this symbol expression's UTF-32-encoded symbol.

This method should only be called on an object whose function is confirmed to be that of a symbol!

Returns
the char encoded within this symbol expression

Definition at line 212 of file expression.cpp.

212  {
213  auto it = std::find_if(
214  expression::symbols.begin(),
215  expression::symbols.end(),
216  [&](pair<char32_t,exptr> const& entry)->bool{return entry.second.get() == this;}
217  );
218  #ifdef __EXCEPTIONS
219  if (it == expression::symbols.end()) {
220  throw std::logic_error("This RE does not seem to be a valid symbol expression.");
221  }
222  #endif
223  return it->first;
224 }
std::vector< exptr >::const_iterator begin() const
Returns an iterator pointing to this RE's first subexpression.
Definition: expression.cpp:284
std::vector< exptr >::const_iterator end() const
Returns an iterator pointing behind this RE's last subexpression.
Definition: expression.cpp:289

◆ extractUtf8Symbol()

string reg::expression::extractUtf8Symbol ( ) const

Reports this symbol expression's UTF-8-encoded symbol.

This method should only be called on an object whose function is confirmed to be that of a symbol!

Returns
the char encoded within this symbol expression

Definition at line 231 of file expression.cpp.

231  {
232  char32_t symbol(extractSymbol());
233  if (symbol == U'\0') {
234  return string();
235  } else {
236  return converter.to_bytes(extractSymbol());
237  }
238 }
char32_t extractSymbol() const
Reports this symbol expression's UTF-32-encoded symbol.
Definition: expression.cpp:212
std::wstring_convert< std::codecvt_utf8< char32_t >, char32_t > converter
Converts between UTF-8-encoded and UTF-32-encoded strings.
Definition: dfa.cpp:1001

◆ getOperation()

expression::operation reg::expression::getOperation ( ) const

Reports this RE's function.

Note that the empty string's function is technically that of a symbol.

Returns
the expression::operation best describing this RE's purpose

Definition at line 181 of file expression.cpp.

181  {
182  return op;
183 }

◆ operator!=()

bool reg::expression::operator!= ( expression const &  r) const

Checks whether this RE is semantically different from another one.

Returns
false if this RE's language is exactly the same as the other one's, true else

Definition at line 203 of file expression.cpp.

203  {
204  return !operator==(r);
205 }
bool operator==(expression const &r) const
Checks whether this RE is semantically equivalent to another one.
Definition: expression.cpp:189

◆ operator==()

bool reg::expression::operator== ( expression const &  r) const

Checks whether this RE is semantically equivalent to another one.

Returns
true if this RE's language is exactly the same as the other one's, false else

Definition at line 189 of file expression.cpp.

189  {
190  if (!acceptingDfa) {
191  acceptingDfa = make_unique<dfa const>(dfa::builder(gnfa(*this).splitAllTransitions()).minimize());
192  }
193  if (!r.acceptingDfa) {
194  r.acceptingDfa = make_unique<dfa const>(dfa::builder(gnfa(r).splitAllTransitions()).minimize());
195  }
196  return *acceptingDfa == *r.acceptingDfa;
197 }

◆ size()

size_t reg::expression::size ( ) const

Reports the size of this RE's tree representation.

In this context, an RE's size will be defined recursively as follows:

  • .size() = 0
  • ε.size() = 1
  • <symbol>.size() = 1
  • (l+r).size() = 1 + l.size() + r.size()
  • (lr).size() = 1 + l.size() + r.size()
  • (b*).size() = 1 + b.size()
    Returns
    a measure of how many subexpressions this RE consists of

Definition at line 168 of file expression.cpp.

168  {
169  size_t s(op != operation::empty);
170  for (exptr const& re : subExpressions) {
171  s += re->size();
172  }
173  return s;
174 }
std::shared_ptr< expression const > exptr
This is the type used to handle regular expressions.
Definition: expression.h:39

◆ spawnAlternation()

expression::exptr reg::expression::spawnAlternation ( expression::exptr const &  l,
expression::exptr const &  r,
bool  optimized = true,
bool  aggressive = false 
)
static

Gives an RE representing the alternation of two given REs.

More formally, the RE's language will be L(l+r) = L(l) ∪ L(r).

Parameters
lexptr to one of the REs
rexptr to the other RE
optimizedwhether simplifications on the syntax level should be applied
aggressivewhether the simplifications should check the semantic level
Returns
exptr to the RE representing the alternation of l and r

Definition at line 106 of file expression.cpp.

106  {
107  if (optimized) {
108  if (l == expression::spawnEmptySet()) {
109  return r;
110  }
111  if (r == expression::spawnEmptySet()) {
112  return l;
113  }
114  if (l == r) {
115  return l;
116  }
117  if (aggressive) {
118  if (*l == *expression::spawnEmptySet()) {
119  return r;
120  }
121  if (*r == *expression::spawnEmptySet()) {
122  return l;
123  }
124  if (*l == *r) {
125  return l->size() > r->size() ? r : l;
126  }
127  }
128  }
129  return expression::exptr(new expression(l, r, operation::alternation));
130 }
std::shared_ptr< expression const > exptr
This is the type used to handle regular expressions.
Definition: expression.h:39
static exptr const & spawnEmptySet()
Gives an RE representing the empty set ∅.
Definition: expression.cpp:33

◆ spawnConcatenation()

expression::exptr reg::expression::spawnConcatenation ( expression::exptr const &  l,
expression::exptr const &  r,
bool  optimized = true,
bool  aggressive = false 
)
static

Gives an RE representing the concatenation of two given REs.

More formally, the RE's language will be L(lr) = L(l) • L(r).

Parameters
lexptr to the first RE
rexptr to the second RE
optimizedwhether simplifications on the syntax level should be applied
aggressivewhether the simplifications should check the semantic level
Returns
exptr to the RE representing the concatenation of l and r

Definition at line 71 of file expression.cpp.

71  {
72  if (optimized) {
73  if (l == expression::spawnEmptyString()) {
74  return r;
75  }
76  if (r == expression::spawnEmptyString()) {
77  return l;
78  }
81  }
82  if (aggressive) {
83  if (*l == *expression::spawnEmptyString()) {
84  return r;
85  }
86  if (*r == *expression::spawnEmptyString()) {
87  return l;
88  }
89  if (*r == *expression::spawnEmptySet() || *l == *expression::spawnEmptySet()) {
91  }
92  }
93  }
94  return expression::exptr(new expression(l, r, operation::concatenation));
95 }
static exptr const & spawnEmptyString()
Gives an RE representing the empty string ε.
Definition: expression.cpp:42
std::shared_ptr< expression const > exptr
This is the type used to handle regular expressions.
Definition: expression.h:39
static exptr const & spawnEmptySet()
Gives an RE representing the empty set ∅.
Definition: expression.cpp:33

◆ spawnEmptySet()

expression::exptr const & reg::expression::spawnEmptySet ( )
static

Gives an RE representing the empty set ∅.

More formally, the RE's language will be {}.

Returns
exptr to the RE representing the empty set ∅

Definition at line 33 of file expression.cpp.

33  {
34  return expression::empty;
35 }

◆ spawnEmptyString()

expression::exptr const & reg::expression::spawnEmptyString ( )
static

Gives an RE representing the empty string ε.

More formally, the RE's language will be {ε}.

Returns
exptr to the RE representing the empty string ε

Definition at line 42 of file expression.cpp.

42  {
43  return expression::spawnSymbol(U'\0');
44 }
static exptr const & spawnSymbol(char32_t symbol)
Gives an RE representing the given UTF-32-encoded symbol.
Definition: expression.cpp:52

◆ spawnFromString() [1/2]

expression::exptr reg::expression::spawnFromString ( std::u32string const &  re,
literals  lits = literals(),
bool  optimized = false,
bool  aggressive = false 
)
static

Gives an RE encoded in a given string.

Parameters
rethe RE in text form
litsliterals of operators in re
optimizedwhether simplifications on the syntax level should be applied
aggressivewhether the simplifications should check the semantic level
Returns
exptr to the RE represented by the given string or to nullptr if it is invalid

Definition at line 587 of file expression.cpp.

587  {
588  parser stringParser(re, lits);
589  #ifdef __EXCEPTIONS
590  try {
591  return stringParser(optimized, aggressive);
592  } catch (std::bad_function_call e) {
593  throw std::invalid_argument("Invalid regular expression.");
594  }
595  #else
596  if (stringParser.badRegularExpression) {
597  return exptr();
598  }
599  return stringParser(optimized, aggressive);
600  #endif
601 }
std::shared_ptr< expression const > exptr
This is the type used to handle regular expressions.
Definition: expression.h:39

◆ spawnFromString() [2/2]

expression::exptr reg::expression::spawnFromString ( std::string const &  utf8Re,
literals  lits = literals(),
bool  optimized = false,
bool  aggressive = false 
)
static

Same as above for a UTF-8-encoded string.

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.

Definition at line 605 of file expression.cpp.

605  {
606  return spawnFromString(converter.from_bytes(utf8Re), lits, optimized, aggressive);
607 }
static exptr spawnFromString(std::u32string const &re, literals lits=literals(), bool optimized=false, bool aggressive=false)
Gives an RE encoded in a given string.
Definition: expression.cpp:587
std::wstring_convert< std::codecvt_utf8< char32_t >, char32_t > converter
Converts between UTF-8-encoded and UTF-32-encoded strings.
Definition: dfa.cpp:1001

◆ spawnKleene()

expression::exptr reg::expression::spawnKleene ( expression::exptr const &  b,
bool  optimized = true,
bool  aggressive = false 
)
static

Gives an RE representing the Kleene closure of a given RE.

More formally, the RE's language will be L(b*) = L(b)*.

Parameters
bexptr to the RE
optimizedwhether simplifications on the syntax level should be applied
aggressivewhether the simplifications should check the semantic level
Returns
exptr to the RE representing the Kleene closure of l

Definition at line 140 of file expression.cpp.

140  {
141  if (optimized) {
144  }
145  if (aggressive) {
146  if (*b == *expression::spawnEmptySet() || *b == *expression::spawnEmptyString()) {
148  }
149  if (*b == expression(b)) {
150  return b;
151  }
152  }
153  }
154  return expression::exptr(new expression(b));
155 }
static exptr const & spawnEmptyString()
Gives an RE representing the empty string ε.
Definition: expression.cpp:42
std::shared_ptr< expression const > exptr
This is the type used to handle regular expressions.
Definition: expression.h:39
static exptr const & spawnEmptySet()
Gives an RE representing the empty set ∅.
Definition: expression.cpp:33

◆ spawnSymbol() [1/2]

expression::exptr const & reg::expression::spawnSymbol ( char32_t  symbol)
static

Gives an RE representing the given UTF-32-encoded symbol.

More formally, the RE's language will be {<symbol>}.

Parameters
symbolthe symbol the RE should represent or "" for the empty string ε
Returns
exptr to the RE representing the symbol

Definition at line 52 of file expression.cpp.

52  {
53  return expression::symbols.insert(make_pair(symbol, exptr(new expression(symbol)))).first->second;
54 }
std::shared_ptr< expression const > exptr
This is the type used to handle regular expressions.
Definition: expression.h:39

◆ spawnSymbol() [2/2]

expression::exptr const & reg::expression::spawnSymbol ( std::string const &  utf8Symbol)
static

Same as above for a UTF-8-encoded symbol.

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.

Definition at line 57 of file expression.cpp.

57  {
58  char32_t u32Symbol(converter.from_bytes(utf8Symbol)[0]);
59  return expression::symbols.insert(make_pair(u32Symbol, exptr(new expression(u32Symbol)))).first->second;
60 }
std::shared_ptr< expression const > exptr
This is the type used to handle regular expressions.
Definition: expression.h:39
std::wstring_convert< std::codecvt_utf8< char32_t >, char32_t > converter
Converts between UTF-8-encoded and UTF-32-encoded strings.
Definition: dfa.cpp:1001

◆ to_string()

string reg::expression::to_string ( ) const

Describes this RE in UTF-8-encoded human-readable form.

Definition at line 279 of file expression.cpp.

279  {
280  return converter.to_bytes(to_u32string());
281 }
std::u32string to_u32string() const
Describes this RE in UTF-32-encoded human-readable form.
Definition: expression.cpp:241
std::wstring_convert< std::codecvt_utf8< char32_t >, char32_t > converter
Converts between UTF-8-encoded and UTF-32-encoded strings.
Definition: dfa.cpp:1001

◆ to_u32string()

u32string reg::expression::to_u32string ( ) const

Describes this RE in UTF-32-encoded human-readable form.

Definition at line 241 of file expression.cpp.

241  {
242  switch (op) {
243  case operation::alternation : return subExpressions[0]->to_u32string().append(U"+").append(subExpressions[1]->to_u32string());
244  case operation::concatenation : {
245  u32string concat;
246  if (subExpressions[0]->op >= operation::alternation) {
247  concat.append(1, '(');
248  concat.append(subExpressions[0]->to_u32string());
249  concat.append(1, ')');
250  } else {
251  concat.append(subExpressions[0]->to_u32string());
252  }
253  if (subExpressions[1]->op >= operation::alternation) {
254  concat.append(1, '(');
255  concat.append(subExpressions[1]->to_u32string());
256  concat.append(1, ')');
257  } else {
258  concat.append(subExpressions[1]->to_u32string());
259  }
260  return concat;
261  }
262  case operation::kleene : {
263  if (subExpressions[0]->op >= operation::concatenation) {
264  return u32string(1, U'(').append(subExpressions[0]->to_u32string()).append(U")*");
265  } else {
266  return subExpressions[0]->to_u32string().append(1, '*');
267  }
268  }
269  case operation::symbol : {
270  char32_t symbol = extractSymbol();
271  return symbol == U'\0' ? u32string(U"ε") : u32string(1, symbol);
272  }
273  case operation::empty : return u32string(U"∅");
274  default : return u32string();
275  }
276 }
std::u32string to_u32string() const
Describes this RE in UTF-32-encoded human-readable form.
Definition: expression.cpp:241
char32_t extractSymbol() const
Reports this symbol expression's UTF-32-encoded symbol.
Definition: expression.cpp:212

The documentation for this class was generated from the following files: