However if regex is case insensitive alphabetic characters do have a
special meaning of "this character or same character in opposite case"
and such special meaning is not ignored in \x sequences.
You're reading too much into what constitutes "special meaning." In this case, it refers to single characters that direct interpretation of the input, such as `.` and `*`. Escaping those characters in any form treats them as literals. In other words, `.` means match any character, but `\.` or `\x2e` means match a period. (There are exceptions, such as `\d`, which is fine because there's never a need to escape a lowercase `d`.)
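As a rough illustration of the literal-versus-metacharacter distinction, here is a small sketch using Python's `re` module. That is an assumption made for convenience: it is not PCRE, though it treats `.`, `\.` and `\x2e` the same way for this purpose.

```python
import re

# An unescaped dot is a metacharacter: it matches any single character.
dot_matches_letter = re.fullmatch(r".", "a") is not None

# Escaped either way, the dot is a literal period and nothing else.
backslash_matches_letter = re.fullmatch(r"\.", "a") is not None
hex_matches_letter = re.fullmatch(r"\x2e", "a") is not None
hex_matches_period = re.fullmatch(r"\x2e", ".") is not None

print(dot_matches_letter, backslash_matches_letter,
      hex_matches_letter, hex_matches_period)
```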
Special meaning does not include behavior brought on by using modifiers. Using `i` tells the engine to use a different algorithm for comparing single characters, which happens long after the expression itself has been parsed. The `\x` escape will have been applied when the expression itself was processed, meaning that `\x41` will already have been interpreted as `A`.
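The effect is easy to observe. The sketch below again uses Python's `re` module as a stand-in (an assumption; PCRE and Perl are described above as behaving the same way): the escape yields a plain `A`, and the case-insensitive flag then applies to it like any other letter.

```python
import re

# \x41 is resolved to the literal "A" when the pattern is compiled.
# IGNORECASE then governs how that literal is compared at match time.
sensitive = re.fullmatch(r"\x41", "a")
insensitive = re.fullmatch(r"\x41", "a", re.IGNORECASE)

print(sensitive is None)        # no match without the flag
print(insensitive is not None)  # match with the flag
```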
Escape sequences have "raw byte" semantics, but not always. Does it seem to you like a design awkwardness?
Escapes originally came to exist as a way to embed characters in files that could not otherwise be represented using printable characters. If embedded as-is, these characters might be interpreted as control sequences or would simply be invisible when looking at them in an editor or on a printed page. Regular expressions co-opted this concept and added all sorts of additional escapes that had semantic rather than literal meaning, such as the `\(...\)` construct with which you're probably familiar.
How awkward that is would be a matter of opinion, but there really isn't any way to denote special meaning in strings without selecting one of the otherwise-valid characters to do it.
Is there a way to cause pcre to treat \x sequences as raw bytes that ignore case sensitivity?
No, there isn't. Perl doesn't allow it, so PCRE won't, either.
Perl regular expressions do have a way to enable or disable modifiers within spans of characters (see the Extended Patterns section of perlre):
/(?i:case-insensitive)case-sensitive/
/(?-i:case-sensitive)case-insensitive/i
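To see the scoped modifiers in action, here is a minimal sketch in Python, whose `re` module accepts the same `(?i:...)` and `(?-i:...)` syntax from version 3.6 onward. The patterns and test strings are made up for illustration.

```python
import re

# Case-insensitive span inside an otherwise case-sensitive pattern.
on_span = re.compile(r"(?i:abc)def")
on_hit = on_span.fullmatch("ABCdef")    # span ignores case
on_miss = on_span.fullmatch("ABCDEF")   # "def" must still match exactly

# Case-sensitive span inside an otherwise case-insensitive pattern.
off_span = re.compile(r"(?-i:abc)def", re.IGNORECASE)
off_hit = off_span.fullmatch("abcDEF")  # "def" ignores case
off_miss = off_span.fullmatch("ABCdef") # span must match exactly
```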
If you have your regular expression available as a string, you could develop a function that does the equivalent of `s/((?<!\\)\\x[[:xdigit:]]{2})/(?-i:$1)/g`, which would spare you having to modify PCRE.
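One possible shape for such a function, sketched in Python rather than Perl. The name `protect_hex_escapes` is hypothetical, and `[0-9A-Fa-f]` stands in for `[[:xdigit:]]`, which Python's `re` does not support. Like the substitution above, it is naive: it would misfire on an escaped backslash followed by `\x41`, and it would break `\x` escapes that sit inside a character class.

```python
import re

# Matches \xHH when it is not itself preceded by a backslash.
_HEX_ESCAPE = re.compile(r"((?<!\\)\\x[0-9A-Fa-f]{2})")

def protect_hex_escapes(pattern: str) -> str:
    """Wrap each \\xHH escape in (?-i:...) so it stays case-sensitive."""
    return _HEX_ESCAPE.sub(r"(?-i:\1)", pattern)

wrapped = protect_hex_escapes(r"foo\x41bar")
print(wrapped)

# With the wrapper, only an uppercase "A" satisfies \x41,
# while the rest of the pattern still ignores case.
lower = re.search(wrapped, "fooabar", re.IGNORECASE)
upper = re.search(wrapped, "FOOAbar", re.IGNORECASE)
```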
As non standard patch? If no how if at all pcre engine code can be
modified to provide such functionality?
Anything's possible, but I'm not going to tear into PCRE and figure it out for you. You'd have to alter the RE parser so that it stores literals specified with `\x` as a case-sensitive span instead of a plain literal. You would run the risk of breaking expressions that depend on the standard behavior, so you'd also have to add a modifier that explicitly enables it.
By the way, if you're simply searching for one string inside another, using `m//` is overkill. Calling `index()` would be simpler and much more efficient.
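For comparison, the same idea in Python, where `str.find` plays roughly the role of Perl's `index()`. The strings are invented for the example, and this is only an analogy to the Perl call, not a statement about its internals.

```python
haystack = "raw bytes: A and a"
needle = "A"

# A plain substring search: no pattern compilation, no metacharacters,
# and always case-sensitive.
position = haystack.find(needle)   # -1 would mean "not found"
present = needle in haystack

print(position, present)
```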