Jo Shields a575963da9 Imported Upstream version 3.6.0
Former-commit-id: da6be194a6b1221998fc28233f2503bd61dd9d14
2014-08-13 10:39:27 +01:00

TODO:
* Need to go through everything and square it with RightToLeft matching.
The support for this was built into an early version, and lots of things built
afterwards are not aware of bidirectional matching. Things that spring to
mind: Regex match methods should start at 0 or text.Length depending on
direction. Do Split and Replace need changes? Match should be aware of its
direction (some of this has already been applied to the NextMatch logic). The
interpreter needs to check both left and right bounds. Anchoring and substring
discovery need to be reworked: RTL matches are going to have anchors on the
right, i.e. $, \Z and \z, and these should be added to the anchor logic.
QuickSearch needs to work in reverse. There may be other things too, so work
through the code.
* Add ECMAScript support to the parser. For example, ., \w, \s and \d map to
ECMA categories instead of the canonical ones [DONE]. Backreference/octal
disambiguation also behaves differently. Find out what the runtime behavioural
difference is for cyclic backreferences, e.g. (?(1)abc\1). This is only briefly
mentioned in the spec, and I couldn't find much on it in the ECMAScript
specification either.
* Octal/backreference parsing needs a big fix. The rules are ridiculously complex.
* Improve the Perl test suite. Run it under the MS runtime to generate checksums
for each trial. Checksums should incorporate all captures (index, length) for
all groups, plus the names of explicit capturing groups and the numbers they
map to. Is there any other state to include? RegexTrial.Execute() will then
compare the result against the checksum.
* The pattern (?(1?)a|b) should be rejected. Perl rejects it and so does the MS
implementation, but mine accepts it. The documentation says that the construct
(?(X)...) can be processed in two ways. If X is a valid group number or group
name, the expression becomes a capture conditional: the (...) part is executed
only if X has been captured. If X is not a group number or name, it is treated
as a positive lookahead, and (...) is executed only if the lookahead succeeds.
My code does the latter. On further investigation, however, it appears that
both Perl and MS refuse to treat X as an expression assertion if its first
character is a digit, and treat it as a capture conditional instead. The
exception raised is something like "invalid group number". I feel my behaviour
is more correct, but it's not consistent with the other implementations, so it
should probably be changed.
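The RightToLeft item asks for a direction-dependent start position and a QuickSearch that works in reverse. A minimal Python sketch of both follows. It assumes a Sunday-style QuickSearch over a plain literal. The function name quick_search and its signature are hypothetical; this is not Mono's QuickSearch class.

```python
def quick_search(text: str, pattern: str, right_to_left: bool = False) -> int:
    """Return the start index of the first match in scan direction, or -1.

    Hypothetical Sunday-style QuickSearch sketch; not Mono's implementation.
    """
    n, m = len(text), len(pattern)
    if right_to_left:
        # Shift is keyed on the character just LEFT of the window. The first
        # occurrence in the pattern gives the smallest safe leftward shift.
        shift = {}
        for k in range(m - 1, -1, -1):
            shift[pattern[k]] = k + 1
        end = n  # RTL scanning starts at text.Length
        while end - m >= 0:
            start = end - m
            if text[start:end] == pattern:
                return start
            if start == 0:  # left bound reached, nothing left to inspect
                break
            end -= shift.get(text[start - 1], m + 1)
        return -1

    # Forward table: keyed on the character just RIGHT of the window, using
    # the last occurrence in the pattern (later entries overwrite earlier ones).
    shift = {c: m - k for k, c in enumerate(pattern)}
    start = 0  # LTR scanning starts at 0
    while start + m <= n:
        if text[start:start + m] == pattern:
            return start
        if start + m == n:  # right bound reached
            break
        start += shift.get(text[start + m], m + 1)
    return -1
```

The two tables mirror each other. The forward one looks one character past the right edge of the window, and the reverse one looks one character past the left edge. Both sides need their own bound check, which is the interpreter point in the same item. For example, quick_search("abcabc", "abc") finds index 0, while the right-to-left call finds 3.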
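The ECMAScript item says \w, \s and \d map to ECMA categories rather than the canonical ones. A small Python sketch contrasting the two interpretations follows. The class definitions are my reading of the .NET documentation and should be checked against it:

* ECMAScript: [a-zA-Z_0-9] for \w, [0-9] for \d, and [ \f\n\r\t\v] for \s.
* Canonical: Unicode categories L, Mn, Nd and Pc for \w, Nd for \d, and [\f\n\r\t\v\x85\p{Z}] for \s.

The "." construct is not covered, and the names in_class and ECMA_CLASSES are hypothetical.

```python
import string
import unicodedata

# ECMAScript-mode classes (assumed from the .NET docs): ASCII only.
ECMA_CLASSES = {
    "w": frozenset(string.ascii_letters + string.digits + "_"),
    "d": frozenset(string.digits),
    "s": frozenset(" \f\n\r\t\v"),
}


def in_class(ch: str, cls: str, ecma: bool) -> bool:
    """Return True if the single character ch matches \\w, \\d or \\s.

    cls is "w", "d" or "s"; ecma selects ECMAScript instead of canonical rules.
    """
    if ecma:
        return ch in ECMA_CLASSES[cls]
    category = unicodedata.category(ch)
    if cls == "w":
        # Letters (L*), non-spacing marks, decimal digits, connector punctuation.
        return category.startswith("L") or category in ("Mn", "Nd", "Pc")
    if cls == "d":
        return category == "Nd"
    if cls == "s":
        # [\f\n\r\t\v\x85] plus every separator category (Zs, Zl, Zp).
        return ch in "\f\n\r\t\v\x85" or category.startswith("Z")
    raise ValueError(f"unsupported class: {cls!r}")
```

The practical difference shows up on non-ASCII input. Canonical \w matches "é", canonical \d matches an Arabic-Indic digit, and canonical \s matches a no-break space. None of them match in ECMAScript mode.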
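For the Perl test-suite item, a Python sketch of one possible checksum format. It hashes the name-to-number map of the explicit capturing groups and the (index, length) of each group. The string layout fed to SHA-1 and the name trial_checksum are assumptions. The same layout would have to be produced on the MS runtime for RegexTrial.Execute() to compare anything meaningful. Python's re keeps only the last capture of a repeated group, whereas .NET exposes every capture through Group.Captures, so a real checksum would iterate those instead.

```python
import hashlib
import re


def trial_checksum(pattern: str, text: str) -> str:
    """Hypothetical checksum over group names/numbers and capture positions."""
    regex = re.compile(pattern)
    h = hashlib.sha1()
    # Names of explicit capturing groups and the numbers they map to,
    # sorted so the checksum does not depend on dictionary ordering.
    for name, number in sorted(regex.groupindex.items()):
        h.update(f"name:{name}={number};".encode())
    m = regex.search(text)
    if m is None:
        h.update(b"nomatch;")
    else:
        # Group 0 is the whole match; unmatched groups report span (-1, -1).
        for g in range(regex.groups + 1):
            start, end = m.span(g)
            h.update(f"group:{g}@{start}+{end - start};".encode())
    return h.hexdigest()
```

Because only positions are hashed, two inputs that capture different text at the same offsets produce the same checksum. That matches the (index, length) wording in the item, but it may be worth deciding whether captured values count as "other state".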
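For the (?(1?)a|b) item, a Python sketch of the classification rule as described there:

* A known group name or number gives a capture conditional.
* Anything else starting with a digit is an "invalid group number" error.
* Everything else is treated as a positive lookahead.

The function and exception names are hypothetical. The rule is my summary of the Perl/MS behaviour observed above, not a quote from either specification.

```python
DIGITS = "0123456789"


class InvalidGroupNumber(ValueError):
    """Raised when a condition starts with a digit but is not a valid group number."""


def classify_condition(x: str, group_numbers: set[int], group_names: set[str]) -> str:
    """Classify the X in (?(X)yes|no) as "capture" or "lookahead"."""
    if x in group_names:
        return "capture"
    if x and x[0] in DIGITS:
        # A leading digit commits to a capture conditional, as Perl and MS appear to do.
        if all(c in DIGITS for c in x) and int(x) in group_numbers:
            return "capture"
        raise InvalidGroupNumber(f"invalid group number: (?({x})...)")
    # Otherwise X is parsed as an expression and tested as a positive lookahead.
    return "lookahead"
```

Under this rule, (?(1?)a|b) raises InvalidGroupNumber instead of compiling 1? as a lookahead, which would bring the parser in line with the other implementations.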