40 lines
2.5 KiB
Plaintext
40 lines
2.5 KiB
Plaintext
|
TODO:
|
||
|
|
||
|
* Need to go through everything and square it with RightToLeft matching.
|
||
|
The support for this was built into an early version, and lots of things built
|
||
|
afterwards are not savvy about bi-directional matching. Things that spring to
|
||
|
mind: Regex match methods should start at 0 or text.Length depending on
|
||
|
direction. Do split and replace need changes? Match should be aware of its
|
||
|
direction (already applied some of this to NextMatch logic). The interpreter
|
||
|
needs to check left and right bounds. Anchoring and substring discovery need
|
||
|
to be reworked. RTL matches are going to have anchors on the right - ie $, \Z
|
||
|
and \z. This should be added to the anchor logic. QuickSearch needs to work in
|
||
|
reverse. There may be other stuff.... work through the code.
|
||
|
|
||
|
* Add ECMAScript support to the parser. For example, [.\w\s\d] map to ECMA
|
||
|
categories instead of canonical ones [DONE]. There's different behaviour on
|
||
|
backreference/octal disambiguation. Find out what the runtime behavioural
|
||
|
difference is for cyclic backreferences eg (?(1)abc\1) - this is only briefly
|
||
|
mentioned in the spec. I couldn't find much on this in the ECMAScript
|
||
|
specification either.
|
||
|
|
||
|
* Octal/backreference parsing needs a big fix. The rules are ridiculously complex.
|
||
|
|
||
|
* Improve the perl test suite. Run under MS runtime to generate checksums for
|
||
|
each trial. Checksums should incorporate: all captures (index, length) for all
|
||
|
groups; names of explicit capturing groups, and the numbers they map to. Any
|
||
|
other state? RegexTrial.Execute() will then compare result and checksum.
|
||
|
|
||
|
* The pattern (?(1?)a|b). It should fail: Perl fails, the MS implementation
|
||
|
fails, but I pass. The documentation says that the construct (?(X)...) can be
|
||
|
processed in two ways. If X is a valid group number, or a valid group name,
|
||
|
then the expression becomes a capture conditional - the (...) part is
|
||
|
executed only if X has been captured. If X is not a group number or name, then
|
||
|
it is treated as a positive lookahead., and (...) is only executed if the
|
||
|
lookahead succeeds. My code does the latter, but on further investigation it
|
||
|
appears that both Perl and MS fail to recognize an expression assertion if the
|
||
|
first character of the assertion is a number - which instead suggests a
|
||
|
capture conditional. The exception raised is something like "invalid group
|
||
|
number". I get the feeling the my behaviour seems more correct, but it's not
|
||
|
consistent with the other implementations, so it should probably be changed.
|