ada/VSS

mirror of https://github.com/AdaCore/VSS.git synced 2026-02-12 13:06:25 -08:00

Files

Maxim Reznik e85309344c RegExp performance optimization

* Move Ada.Containers.Vectors instances out of local scope to avoid
  instance initialization on each Match call
* Replace State_Vectors with an array to avoid overheads on container
  operations
* Create a copy of VSS.Implementation.Strings.Cursor without
  initialization to avoid initializing a possibly big state vector
* Replace Instruction_Vectors with arrays in Match to avoid overheads
  on container operations
* Move Character_Iterator and Pos to outer scope in Match to avoid
  overheads on class-wide casts
* Unroll Append_State manually because the compiler can't eliminate tail calls
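The last bullets describe a common NFA-simulation pattern. The TypeScript sketch below assumes a Pike-VM style design like the one in the swtch.com articles; the instruction set and the names `Instr`, `ThreadList`, `addThread` and `fullMatch` are hypothetical and are not VSS's Ada code. It shows the epsilon closure (the job `Append_State` presumably does) unrolled from recursion into a loop over an explicit stack, with thread lists held in preallocated typed arrays instead of growable containers.

```typescript
// Sketch only: hypothetical instruction set and names, not VSS's Ada types.
type Instr =
  | { op: "char"; c: string }             // consume one code point equal to c
  | { op: "split"; x: number; y: number } // fork: continue at x and at y
  | { op: "jmp"; x: number }              // continue at x
  | { op: "match" };                      // accept

// Thread list kept in preallocated typed arrays instead of a growable vector.
class ThreadList {
  readonly pcs: Int32Array;  // pcs of consuming/accepting states, in order
  readonly seen: Uint8Array; // 1 if the pc was already visited in this step
  size = 0;
  constructor(n: number) {
    this.pcs = new Int32Array(n);
    this.seen = new Uint8Array(n);
  }
  clear(): void {
    this.seen.fill(0);
    this.size = 0;
  }
}

// Epsilon closure with an explicit stack: the recursive "add state" call is
// unrolled into a loop, so jmp/split chains cost no call depth.
function addThread(prog: Instr[], list: ThreadList, start: number, stack: Int32Array): void {
  let top = 0;
  stack[top++] = start;
  while (top > 0) {
    const pc = stack[--top];
    if (list.seen[pc]) continue; // also stops epsilon cycles
    list.seen[pc] = 1;
    const ins = prog[pc];
    if (ins.op === "jmp") {
      stack[top++] = ins.x;
    } else if (ins.op === "split") {
      stack[top++] = ins.y; // pushed first, so x is explored first
      stack[top++] = ins.x;
    } else {
      list.pcs[list.size++] = pc; // "char" or "match"
    }
  }
}

// Anchored whole-input match (no captures), one code point per step.
function fullMatch(prog: Instr[], input: string): boolean {
  const n = prog.length;
  let clist = new ThreadList(n);
  let nlist = new ThreadList(n);
  const stack = new Int32Array(2 * n + 1); // each pc is expanded once, pushing <= 2
  addThread(prog, clist, 0, stack);
  for (const ch of input) { // string iteration yields code points
    for (let i = 0; i < clist.size; i++) {
      const pc = clist.pcs[i];
      const ins = prog[pc];
      if (ins.op === "char" && ins.c === ch) addThread(prog, nlist, pc + 1, stack);
    }
    [clist, nlist] = [nlist, clist];
    nlist.clear();
  }
  for (let i = 0; i < clist.size; i++) {
    if (prog[clist.pcs[i]].op === "match") return true;
  }
  return false;
}

// a(?:b|c)*d, compiled by hand
const prog: Instr[] = [
  { op: "char", c: "a" },      // 0
  { op: "split", x: 2, y: 7 }, // 1
  { op: "split", x: 3, y: 5 }, // 2
  { op: "char", c: "b" },      // 3
  { op: "jmp", x: 1 },         // 4
  { op: "char", c: "c" },      // 5
  { op: "jmp", x: 1 },         // 6
  { op: "char", c: "d" },      // 7
  { op: "match" },             // 8
];
```

With this program, `fullMatch(prog, "abcbd")` succeeds and `fullMatch(prog, "abx")` fails. Clearing `seen` with `fill` is O(program size) per step, which keeps the sketch simple; a sparse-set layout would avoid that cost.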

2023-05-16 13:08:23 +03:00

implementation

RegExp performance optimization

2023-05-16 13:08:23 +03:00

README.md

RegExp: implement \d, \s and \w escapes.

2023-03-27 17:55:45 +03:00

vss-regular_expressions-utilities.ads

W216-024: Change file headers to Apache 2.0 with LLVM exceptions

2023-03-10 09:36:11 +01:00

vss-regular_expressions.ads

Fix compiler warnings.

2023-05-11 19:07:52 +04:00

README.md

RegExp engine

This regexp engine aims to implement ECMAScript regular expressions (Unicode mode), but currently only part of the specification is implemented.
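Unicode mode is what JavaScript enables with the `u` flag: the pattern and the input are processed as code points rather than UTF-16 code units. A small TypeScript illustration using JavaScript's built-in RegExp (ECMAScript reference behavior, not the VSS API):

```typescript
// U+1F600 GRINNING FACE: one code point, two UTF-16 code units.
const emoji = "\uD83D\uDE00";

const unitCount = emoji.length;                              // 2
const unicodeDot = new RegExp("^.$", "u").test(emoji);       // true: "." matches a whole code point
const legacyDot = new RegExp("^.$").test(emoji);             // false: "." matches one code unit
const braced = new RegExp("^\\u{1F600}$", "u").test(emoji);  // true: \u{...} needs Unicode mode
```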

For now, the following syntax is supported:

RegExp	Description
x y	Match x, then y
x | y	Match either x or y
x *	Match x zero or more times
x +	Match x one or more times
x ?	Match x zero or one time
(?: x )	Non-capturing group
( x )	Capturing group
\p{ N }	Character of general category N
\P{ N }	Character not of general category N
[ x ]	Character class x
[^ x ]	Character not in class x
[ x - y ]	Character in the range x..y
[\p{ N }]	Character of general category N, inside a class
[\P{ N }]	Character not of general category N, inside a class
^	Start of line assertion
$	End of line assertion
\b	Word boundary assertion
\B	Not a word boundary assertion
\d	A digit (like `[0-9]`)
\D	Not a digit (like `[^0-9]`)
\s	A whitespace character (like `[\p{Z}\r\n\t\f\v]`)
\S	Not a whitespace character
\w	A word character (like `[A-Za-z0-9_]`)
\W	Not a word character (like `[^A-Za-z0-9_]`)
x	Character literal x, where x is not in `^$.*+?]{}`
\ x	Escaped literal x, where x is in `^$.*+?]{}`
\n \r \t	Line feed, carriage return, tab and other control escapes
[\n\r]	The same escapes inside a character class
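Because the engine targets the ECMAScript grammar, JavaScript's built-in RegExp with the `u` flag is a convenient reference for the rows above. The TypeScript snippet below exercises several of them; it is a comparison aid that assumes the engine follows the specification for these constructs, not a description of the VSS API. Note that ECMAScript's `\s` also matches U+FEFF, so the class given for `\s` in the table is an approximation.

```typescript
// ECMAScript reference behavior via JavaScript's RegExp with the "u" flag.
// For comparison only: this is not the VSS API.
const re = (src: string): RegExp => new RegExp(src, "u");

// Concatenation, alternation and quantifiers
const plural = re("^(?:cat|dog)s?$").test("dogs"); // true

// Capturing vs non-capturing groups
const cap = re("(ab)+c").exec("ababc")!;      // cap[0] === "ababc", cap[1] === "ab"
const nonCap = re("(?:ab)+c").exec("ababc")!; // nonCap.length === 1: no group 1

// Character classes, ranges and negation
const hex = re("^[a-f0-9]+$").test("deadbeef"); // true
const notAf = re("^[^a-f]$").test("g");         // true

// General category escapes
const upper = re("^\\p{Lu}+$").test("\u00C0B"); // true: "À" and "B" are Lu
const notLetter = re("^\\P{L}$").test("7");     // true: "7" is Nd

// \d and \w are ASCII-only here (no "i" flag)
const digits = re("^\\d+$").test("2023");       // true
const arabicDigit = re("^\\d$").test("\u0663"); // false: ARABIC-INDIC DIGIT THREE
const word = re("^\\w+$").test("snake_case_9"); // true
const accented = re("^\\w$").test("\u00E9");    // false: "é" is not in [A-Za-z0-9_]

// \s also covers Unicode separators
const nbsp = re("^\\s$").test("\u00A0");    // true: NO-BREAK SPACE
const lineSep = re("^\\s$").test("\u2028"); // true: LINE SEPARATOR

// Word boundaries
const at = "foo bar".search(re("\\bbar\\b")); // 4
const inner = re("\\Bar").test("bar");        // true: "b" and "a" are both word chars
```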

Useful articles

https://swtch.com/~rsc/regexp/