reglibcpp  1.0.0
(Naïve) C++ implementation of models for regular languages
Classes | Public Types | Public Member Functions | Static Public Member Functions | Static Public Attributes | List of all members
reg::expression Class Reference

Represents formal regular expressions. More...

#include <expression.h>

Classes

struct  literals
 Token literals as used in Introduction to Automata Theory, Languages, and Computation by Hopcroft, Motwani and Ullman. More...
 
struct  parser
 Parses regular expressions. More...
 

Public Types

enum  operation {
  empty, symbol, kleene, concatenation,
  alternation
}
 The different purposes an RE may fulfill. More...
 
typedef std::shared_ptr< expression const > exptr
 This is the type used to handle regular expressions. More...
 

Public Member Functions

size_t size () const
 Reports the size of this RE's tree representation. More...
 
operation getOperation () const
 Reports this RE's function. More...
 
bool operator== (expression const &r) const
 Checks whether this RE is semantically equivalent to another one. More...
 
bool operator!= (expression const &r) const
 Checks whether this RE is semantically different from another one. More...
 
char32_t extractSymbol () const
 Reports this symbol expression's UTF-32-encoded symbol. More...
 
std::string extractUtf8Symbol () const
 Reports this symbol expression's UTF-8-encoded symbol. More...
 
std::u32string to_u32string () const
 Describes this RE in UTF-32-encoded human-readable form. More...
 
std::string to_string () const
 Describes this RE in UTF-32-encoded human-readable form. More...
 
std::vector< exptr >::const_iterator begin () const
 Returns an iterator pointing to this RE's first subexpression. More...
 
std::vector< exptr >::const_iterator end () const
 Returns an iterator pointing behind this RE's last subexpression. More...
 

Static Public Member Functions

static exptr const & spawnEmptySet ()
 Gives an RE representing the empty set ∅. More...
 
static exptr const & spawnEmptyString ()
 Gives an RE representing the empty string ε. More...
 
static exptr const & spawnSymbol (char32_t symbol)
 Gives an RE representing the given UTF-32-encoded symbol. More...
 
static exptr const & spawnSymbol (std::string utf8Symbol)
 Same as above for a UTF-8-encoded symbol. More...
 
static exptr spawnKleene (exptr const &b, bool optimized=true, bool aggressive=false)
 Gives an RE representing the Kleene closure of a given RE. More...
 
static exptr spawnConcatenation (exptr const &l, exptr const &r, bool optimized=true, bool aggressive=false)
 Gives an RE representing the concatenation of two given REs. More...
 
static exptr spawnAlternation (exptr const &l, exptr const &r, bool optimized=true, bool aggressive=false)
 Gives an RE representing the alternation of two given REs. More...
 
static exptr spawnFromString (std::u32string const &re, literals lits=literals(), bool optimized=false, bool aggressive=false)
 Gives an RE encoded in a given string. More...
 
static exptr spawnFromString (std::string const &utf8Re, literals lits=literals(), bool optimized=false, bool aggressive=false)
 Same as above for a UTF-8-encoded string. More...
 

Static Public Attributes

static std::unique_ptr< std::wstring_convert< std::codecvt_utf8< char32_t >, char32_t > > const converter
 Converts between UTF-8-encoded and UTF-32-encoded strings. More...
 

Detailed Description

Represents formal regular expressions.

One should never need to handle such an object directly, however, much less copy or move it and therefore copy and move constructors are deleted.

To work with regular expressions, one should use expression::exptr, which aliases a shared_ptr to an actual object and can be copied and moved to one's heart's content. To access member functions, one might dereference exptrs temporarily or, better yet, use the arrow -> operator.

See also
expression::exptr

Definition at line 31 of file expression.h.

Member Typedef Documentation

◆ exptr

typedef std::shared_ptr<expression const> reg::expression::exptr

This is the type used to handle regular expressions.

Every method works on shared_ptrs to the actual regular expressions, to help with basic comparisons and to save memory.

For example, every symbol's (and the empty string's and the empty set's) regular expression is only instantiated once and then pointed to by as many exptrs as one likes.

Definition at line 43 of file expression.h.

Member Enumeration Documentation

◆ operation

The different purposes an RE may fulfill.

See also
spawnEmptySet
spawnEmptyString
spawnSymbol
spawnKleene
spawnConcatenation
spawnAlternation

Definition at line 84 of file expression.h.

84 { empty, symbol, kleene, concatenation, alternation };

Member Function Documentation

◆ begin()

vector< expression::exptr >::const_iterator reg::expression::begin ( ) const

Returns an iterator pointing to this RE's first subexpression.

Definition at line 302 of file expression.cpp.

302  {
303  return subExpressions.cbegin();
304 }

◆ end()

vector< expression::exptr >::const_iterator reg::expression::end ( ) const

Returns an iterator pointing behind this RE's last subexpression.

Definition at line 307 of file expression.cpp.

307  {
308  return subExpressions.cend();
309 }

◆ extractSymbol()

char32_t reg::expression::extractSymbol ( ) const

Reports this symbol expression's UTF-32-encoded symbol.

This method should only be called on an object whose function is confirmed to be that of a symbol!

Returns
the char encoded within this symbol expression

Definition at line 229 of file expression.cpp.

229  {
230  auto it = find_if(
231  expression::symbols.begin(),
232  expression::symbols.end(),
233  [&](pair<char32_t,exptr> const& entry)->bool{return entry.second.get() == this;}
234  );
235  #ifdef __EXCEPTIONS
236  if (it == expression::symbols.end()) {
237  throw std::logic_error("This RE does not seem to be a valid symbol expression.");
238  }
239  #endif
240  return it->first;
241 }
std::vector< exptr >::const_iterator begin() const
Returns an iterator pointing to this RE&#39;s first subexpression.
Definition: expression.cpp:302
std::vector< exptr >::const_iterator end() const
Returns an iterator pointing behind this RE&#39;s last subexpression.
Definition: expression.cpp:307

◆ extractUtf8Symbol()

string reg::expression::extractUtf8Symbol ( ) const

Reports this symbol expression's UTF-8-encoded symbol.

This method should only be called on an object whose function is confirmed to be that of a symbol!

Returns
the char encoded within this symbol expression

Definition at line 249 of file expression.cpp.

249  {
250  char32_t symbol(extractSymbol());
251  if (symbol == U'\0') {
252  return string();
253  } else {
254  return converter->to_bytes(extractSymbol());
255  }
256 }
static std::unique_ptr< std::wstring_convert< std::codecvt_utf8< char32_t >, char32_t > > const converter
Converts between UTF-8-encoded and UTF-32-encoded strings.
Definition: expression.h:97
char32_t extractSymbol() const
Reports this symbol expression&#39;s UTF-32-encoded symbol.
Definition: expression.cpp:229

◆ getOperation()

expression::operation reg::expression::getOperation ( ) const

Reports this RE's function.

Note that the empty string's function is technically that of a symbol.

Returns
the expression::operation best describing this RE's purpose

Definition at line 195 of file expression.cpp.

195  {
196  return op;
197 }

◆ operator!=()

bool reg::expression::operator!= ( expression const &  r) const

Checks whether this RE is semantically different from another one.

Returns
false if this RE's language is exactly the same as the other one's, true else

Definition at line 219 of file expression.cpp.

219  {
220  return !operator==(r);
221 }
bool operator==(expression const &r) const
Checks whether this RE is semantically equivalent to another one.
Definition: expression.cpp:204

◆ operator==()

bool reg::expression::operator== ( expression const &  r) const

Checks whether this RE is semantically equivalent to another one.

Returns
true if this RE's language is exactly the same as the other one's, false else

Definition at line 204 of file expression.cpp.

204  {
205  if (!acceptingDfa) {
206  acceptingDfa = unique_ptr<dfa const>(new dfa(dfa::builder(gnfa(*this).splitAllTransitions()).purge().merge()));
207  }
208  if (!r.acceptingDfa) {
209  r.acceptingDfa = unique_ptr<dfa const>(new dfa(dfa::builder(gnfa(r).splitAllTransitions()).purge().merge()));
210  }
211  return *acceptingDfa == *r.acceptingDfa;
212 }

◆ size()

size_t reg::expression::size ( ) const

Reports the size of this RE's tree representation.

In this context, an RE's size will be defined recursively as follows:

  • .size() = 0
  • ε.size() = 1
  • <symbol>.size() = 1
  • (l+r).size() = 1 + l.size() + r.size()
  • (lr).size() = 1 + l.size() + r.size()
  • (b*).size() = 1 + b.size()
    Returns
    a measure of how many subexpressions this RE consists of

Definition at line 181 of file expression.cpp.

181  {
182  size_t s(op != operation::empty);
183  for (exptr const& re : subExpressions) {
184  s += re->size();
185  }
186  return s;
187 }
std::shared_ptr< expression const > exptr
This is the type used to handle regular expressions.
Definition: expression.h:43

◆ spawnAlternation()

expression::exptr reg::expression::spawnAlternation ( expression::exptr const &  l,
expression::exptr const &  r,
bool  optimized = true,
bool  aggressive = false 
)
static

Gives an RE representing the alternation of two given REs.

More formally, the RE's language will be L(l+r) = L(l) ∪ L(r).

Parameters
lexptr to one of the REs
rexptr to the other RE
optimizedwhether simplifications on the syntax level should be applied
aggressivewhether the simplifications should check the semantic level
Returns
exptr to the RE representing the alternation of l and r

Definition at line 117 of file expression.cpp.

117  {
118  if (optimized) {
119  if (l == expression::spawnEmptySet()) {
120  return r;
121  }
122  if (r == expression::spawnEmptySet()) {
123  return l;
124  }
125  if (l == r) {
126  return l;
127  }
128  if (aggressive) {
129  if (*l == *expression::spawnEmptySet()) {
130  return r;
131  }
132  if (*r == *expression::spawnEmptySet()) {
133  return l;
134  }
135  if (*l == *r) {
136  return l->size() > r->size() ? r : l;
137  }
138  }
139  }
140  return expression::exptr(new expression(l, r, operation::alternation));
141 }
std::shared_ptr< expression const > exptr
This is the type used to handle regular expressions.
Definition: expression.h:43
static exptr const & spawnEmptySet()
Gives an RE representing the empty set ∅.
Definition: expression.cpp:40

◆ spawnConcatenation()

expression::exptr reg::expression::spawnConcatenation ( expression::exptr const &  l,
expression::exptr const &  r,
bool  optimized = true,
bool  aggressive = false 
)
static

Gives an RE representing the concatenation of two given REs.

More formally, the RE's language will be L(lr) = L(l) • L(r).

Parameters
lexptr to the first RE
rexptr to the second RE
optimizedwhether simplifications on the syntax level should be applied
aggressivewhether the simplifications should check the semantic level
Returns
exptr to the RE representing the concatenation of l and r

Definition at line 81 of file expression.cpp.

81  {
82  if (optimized) {
83  if (l == expression::spawnEmptyString()) {
84  return r;
85  }
86  if (r == expression::spawnEmptyString()) {
87  return l;
88  }
91  }
92  if (aggressive) {
93  if (*l == *expression::spawnEmptyString()) {
94  return r;
95  }
96  if (*r == *expression::spawnEmptyString()) {
97  return l;
98  }
99  if (*r == *expression::spawnEmptySet() || *l == *expression::spawnEmptySet()) {
100  return expression::spawnEmptySet();
101  }
102  }
103  }
104  return expression::exptr(new expression(l, r, operation::concatenation));
105 }
static exptr const & spawnEmptyString()
Gives an RE representing the empty string ε.
Definition: expression.cpp:50
std::shared_ptr< expression const > exptr
This is the type used to handle regular expressions.
Definition: expression.h:43
static exptr const & spawnEmptySet()
Gives an RE representing the empty set ∅.
Definition: expression.cpp:40

◆ spawnEmptySet()

expression::exptr const & reg::expression::spawnEmptySet ( )
static

Gives an RE representing the empty set ∅.

More formally, the RE's language will be {}.

Returns
exptr to the RE representing the empty set ∅

Definition at line 40 of file expression.cpp.

40  {
41  return expression::empty;
42 }

◆ spawnEmptyString()

expression::exptr const & reg::expression::spawnEmptyString ( )
static

Gives an RE representing the empty string ε.

More formally, the RE's language will be {ε}.

Returns
exptr to the RE representing the empty string ε

Definition at line 50 of file expression.cpp.

50  {
51  return expression::spawnSymbol(U'\0');
52 }
static exptr const & spawnSymbol(char32_t symbol)
Gives an RE representing the given UTF-32-encoded symbol.
Definition: expression.cpp:61

◆ spawnFromString() [1/2]

expression::exptr reg::expression::spawnFromString ( std::u32string const &  re,
literals  lits = literals(),
bool  optimized = false,
bool  aggressive = false 
)
static

Gives an RE encoded in a given string.

Parameters
rethe RE in text form
litsliterals of operators in re
optimizedwhether simplifications on the syntax level should be applied
aggressivewhether the simplifications should check the semantic level
Returns
exptr to the RE represented by the given string or to nullptr if it is invalid

Definition at line 613 of file expression.cpp.

613  {
614  parser stringParser(re, lits);
615  #ifdef __EXCEPTIONS
616  try {
617  return stringParser(optimized, aggressive);
618  } catch (std::bad_function_call e) {
619  throw std::invalid_argument("Invalid regular expression.");
620  }
621  #else
622  if (stringParser.badRegularExpression) {
623  return exptr();
624  }
625  return stringParser(optimized, aggressive);
626  #endif
627 }
std::shared_ptr< expression const > exptr
This is the type used to handle regular expressions.
Definition: expression.h:43

◆ spawnFromString() [2/2]

expression::exptr reg::expression::spawnFromString ( std::string const &  utf8Re,
literals  lits = literals(),
bool  optimized = false,
bool  aggressive = false 
)
static

Same as above for a UTF-8-encoded string.

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.

Definition at line 631 of file expression.cpp.

631  {
632  return spawnFromString(converter->from_bytes(utf8Re), lits, optimized, aggressive);
633 }
static std::unique_ptr< std::wstring_convert< std::codecvt_utf8< char32_t >, char32_t > > const converter
Converts between UTF-8-encoded and UTF-32-encoded strings.
Definition: expression.h:97
static exptr spawnFromString(std::u32string const &re, literals lits=literals(), bool optimized=false, bool aggressive=false)
Gives an RE encoded in a given string.
Definition: expression.cpp:613

◆ spawnKleene()

expression::exptr reg::expression::spawnKleene ( expression::exptr const &  b,
bool  optimized = true,
bool  aggressive = false 
)
static

Gives an RE representing the Kleene closure of a given RE.

More formally, the RE's language will be L(b*) = L(b)*.

Parameters
bexptr to the RE
optimizedwhether simplifications on the syntax level should be applied
aggressivewhether the simplifications should check the semantic level
Returns
exptr to the RE representing the Kleene closure of l

Definition at line 152 of file expression.cpp.

152  {
153  if (optimized) {
156  }
157  if (aggressive) {
158  if (*b == *expression::spawnEmptySet() || *b == *expression::spawnEmptyString()) {
160  }
161  if (*b == expression(b)) {
162  return b;
163  }
164  }
165  }
166  return expression::exptr(new expression(b));
167 }
static exptr const & spawnEmptyString()
Gives an RE representing the empty string ε.
Definition: expression.cpp:50
std::shared_ptr< expression const > exptr
This is the type used to handle regular expressions.
Definition: expression.h:43
static exptr const & spawnEmptySet()
Gives an RE representing the empty set ∅.
Definition: expression.cpp:40

◆ spawnSymbol() [1/2]

expression::exptr const & reg::expression::spawnSymbol ( char32_t  symbol)
static

Gives an RE representing the given UTF-32-encoded symbol.

More formally, the RE's language will be {<symbol>}.

Parameters
symbolthe symbol the RE should represent or "" for the empty string ε
Returns
exptr to the RE representing the symbol

Definition at line 61 of file expression.cpp.

61  {
62  return expression::symbols.insert(pair<char32_t,exptr>(symbol, exptr(new expression(symbol)))).first->second;
63 }
std::shared_ptr< expression const > exptr
This is the type used to handle regular expressions.
Definition: expression.h:43

◆ spawnSymbol() [2/2]

expression::exptr const & reg::expression::spawnSymbol ( std::string  utf8Symbol)
static

Same as above for a UTF-8-encoded symbol.

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.

Definition at line 66 of file expression.cpp.

66  {
67  char32_t u32Symbol(converter->from_bytes(utf8Symbol)[0]);
68  return expression::symbols.insert(pair<char32_t,exptr>(u32Symbol, exptr(new expression(u32Symbol)))).first->second;
69 }
static std::unique_ptr< std::wstring_convert< std::codecvt_utf8< char32_t >, char32_t > > const converter
Converts between UTF-8-encoded and UTF-32-encoded strings.
Definition: expression.h:97
std::shared_ptr< expression const > exptr
This is the type used to handle regular expressions.
Definition: expression.h:43

◆ to_string()

string reg::expression::to_string ( ) const

Describes this RE in UTF-32-encoded human-readable form.

Definition at line 297 of file expression.cpp.

297  {
298  return converter->to_bytes(to_u32string());
299 }
static std::unique_ptr< std::wstring_convert< std::codecvt_utf8< char32_t >, char32_t > > const converter
Converts between UTF-8-encoded and UTF-32-encoded strings.
Definition: expression.h:97
std::u32string to_u32string() const
Describes this RE in UTF-32-encoded human-readable form.
Definition: expression.cpp:259

◆ to_u32string()

u32string reg::expression::to_u32string ( ) const

Describes this RE in UTF-32-encoded human-readable form.

Definition at line 259 of file expression.cpp.

259  {
260  switch (op) {
261  case operation::alternation : return subExpressions[0]->to_u32string().append(U"+").append(subExpressions[1]->to_u32string());
262  case operation::concatenation : {
263  u32string concat;
264  if (subExpressions[0]->op >= operation::alternation) {
265  concat.append(1, '(');
266  concat.append(subExpressions[0]->to_u32string());
267  concat.append(1, ')');
268  } else {
269  concat.append(subExpressions[0]->to_u32string());
270  }
271  if (subExpressions[1]->op >= operation::alternation) {
272  concat.append(1, '(');
273  concat.append(subExpressions[1]->to_u32string());
274  concat.append(1, ')');
275  } else {
276  concat.append(subExpressions[1]->to_u32string());
277  }
278  return concat;
279  }
280  case operation::kleene : {
281  if (subExpressions[0]->op >= operation::concatenation) {
282  return u32string(1, U'(').append(subExpressions[0]->to_u32string()).append(U")*");
283  } else {
284  return subExpressions[0]->to_u32string().append(1, '*');
285  }
286  }
287  case operation::symbol : {
288  char32_t symbol = extractSymbol();
289  return symbol == U'\0' ? u32string(U"ε") : u32string(1, symbol);
290  }
291  case operation::empty : return u32string(U"∅");
292  default : return u32string();
293  }
294 }
std::u32string to_u32string() const
Describes this RE in UTF-32-encoded human-readable form.
Definition: expression.cpp:259
char32_t extractSymbol() const
Reports this symbol expression&#39;s UTF-32-encoded symbol.
Definition: expression.cpp:229

Member Data Documentation

◆ converter

auto const reg::expression::converter
static
Initial value:
= unique_ptr<std::wstring_convert<std::codecvt_utf8<char32_t>,char32_t>>(
new std::wstring_convert<std::codecvt_utf8<char32_t>,char32_t>()
)

Converts between UTF-8-encoded and UTF-32-encoded strings.

Definition at line 97 of file expression.h.


The documentation for this class was generated from the following files: