
It often happens that after designing my regexp (on regex101.com) I want to paste it into my program. Consider this regexp, which matches numbers and strings (but keep in mind that this is a general question!):

^(\"(?:[^\"]|\\\")*\"|\-?[0-9]+(?:\.[0-9]+)?)$

Every `"` and every `\` in it has to be escaped before the pattern can be pasted into a language that uses `"` for strings.
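For illustration, this is the pattern above written by hand as a Java string literal, with every `\` doubled and every `"` preceded by a backslash. It is only a sketch; the class and field names are made up.

```java
import java.util.regex.Pattern;

public final class EscapedLiteralDemo {
    // The regex101.com pattern, hand-escaped for a Java "..." literal:
    // each \ became \\ and each " became \".
    static final Pattern NUMBER_OR_STRING =
        Pattern.compile("^(\\\"(?:[^\\\"]|\\\\\\\")*\\\"|\\-?[0-9]+(?:\\.[0-9]+)?)$");

    public static void main(String[] args) {
        // A quoted string, a quoted string with escaped quotes, two numbers, and a non-match.
        String[] inputs = {"\"hello\"", "\"say \\\"hi\\\"\"", "-3.14", "42", "1."};
        for (String s : inputs) {
            System.out.println(s + " -> " + NUMBER_OR_STRING.matcher(s).matches());
        }
    }
}
```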

Needless to say, doing this manually drives me crazy. I face this problem both at work, on a C++ project, and at home, on Java and JavaScript projects.

How can I deal with this efficiently?

Tomáš Zato
  • Many regexp engines have `\Q` ... `\E` for exactly this purpose. Look at the documentation of the one you're using. – Kilian Foth May 25 '16 at 11:54
  • Call an escaping function compatible with the language you're using? Or use a raw-string-literal feature if your language offers it. – CodesInChaos May 25 '16 at 11:54
  • @KilianFoth I think this is about the language in which you embed the regex as a string literal requiring further escapes, not about the escape sequences regex needs. – CodesInChaos May 25 '16 at 11:56
  • Sure, it's about the language. I will specify that I am asking about languages that use double quotes (`"`) for strings and backslashes (`\`) for escaping. – Tomáš Zato May 25 '16 at 11:59
  • Many IDEs (e.g. Eclipse) have a setting "add escapes when pasting into literals" that does exactly this. Just write `""` and then paste your regex into that. – Kilian Foth May 25 '16 at 12:04
  • regex101.com has a "code generator" button, which generates PHP, Javascript and Python and escapes the regex. There's no C++ or Java but you could probably just copy the regex from the PHP string. – kapex May 25 '16 at 16:43
  • @kapep I think that's the answer I need. – Tomáš Zato May 25 '16 at 17:39
  • Snarky: Use a good language like Python, which supports both. Or use an editor that offers to escape on pasting. I was unaware that an editor without that feature exists... – Christian Sauer Oct 22 '18 at 09:25
  • On a side note, a lot of regex engines support `\d` which is equivalent to `[0-9]`. This can remove a bit of noise from your pattern. – JimmyJames May 20 '19 at 14:09
  • Search for "verbatim strings" and/or "here strings" for the language at hand. That should make your life easier. – Martin Maat Mar 10 '21 at 18:10

4 Answers


If you think it is worth it, write a small DSL (or maybe one already exists) so you can write something like this (Java):

// ^(\"(?:[^\"]|\\\")*\"|\-?[0-9]+(?:\.[0-9]+)?)$
// @formatter:off
Pattern pattern = Patterning.start() // ^
    .group()
    .lookahead()
        ...
        .set("0-9").plus()
        .string("E=m.c^2") // \Q ... \E
    .lookaheadEnd()
    .groupEnd()
    .end()                               // $
    .build();
// @formatter:on

class Patterning { ... }
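A minimal sketch of what the elided `Patterning` class could look like, covering only the methods named above. The class stays hypothetical, as in the answer; the one real library call is `Pattern.quote`, which wraps literal text in `\Q`...`\E` so `string()` never needs manual escaping.

```java
import java.util.regex.Pattern;

// Hypothetical minimal builder: each method appends one regex fragment.
final class Patterning {
    private final StringBuilder regex = new StringBuilder();

    private Patterning() {}

    /** Starts a pattern anchored at the beginning of the input (^). */
    static Patterning start() {
        Patterning p = new Patterning();
        p.regex.append('^');
        return p;
    }

    Patterning group()        { return append("(");   } // capturing group
    Patterning groupEnd()     { return append(")");   }
    Patterning lookahead()    { return append("(?="); } // positive lookahead
    Patterning lookaheadEnd() { return append(")");   }
    Patterning plus()         { return append("+");   } // one or more of the previous element

    /** Character class; the ranges (e.g. "0-9") are inserted as-is. */
    Patterning set(String ranges) { return append("[" + ranges + "]"); }

    /** Literal text, quoted by Pattern.quote so metacharacters like . and ^ match literally. */
    Patterning string(String literal) { return append(Pattern.quote(literal)); }

    /** Anchors at the end of the input ($). */
    Patterning end() { return append("$"); }

    Pattern build() { return Pattern.compile(regex.toString()); }

    private Patterning append(String fragment) {
        regex.append(fragment);
        return this;
    }
}

public class PatterningDemo {
    public static void main(String[] args) {
        // Digits followed by the literal text "E=m.c^2".
        Pattern p = Patterning.start()
            .group()
                .set("0-9").plus()
                .string("E=m.c^2")
            .groupEnd()
            .end()
            .build();
        System.out.println(p.pattern());                      // ^([0-9]+\QE=m.c^2\E)$
        System.out.println(p.matcher("42E=m.c^2").matches()); // true
        System.out.println(p.matcher("42EXmYc^2").matches()); // false
    }
}
```

The trade-off is the one the answer hints at: the pattern becomes longer but readable without knowing regex syntax, and no fragment ever sits inside a hand-escaped string.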

That said, most people know regex, and it is worth learning anyway, if only to do powerful search-and-replace in an editor.

Joop Eggen

In C++, use raw string literals (added in C++11). Nothing between the delimiter sequences is treated as an escape:

const char *regex = R"-regexp-(^(\"(?:[^\"]|\\\")*\"|\-?[0-9]+(?:\.[0-9]+)?)$)-regexp-";

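In this case the delimiters are `R"-regexp-(` and `)-regexp-"`. The `-regexp-` tag is optional (plain `R"(` ... `)"` works too); it only matters when the pattern itself contains `)"`.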

Useless
  • This feature is getting very popular, also available in Swift and Java and probably coming soon to other languages. – gnasher729 Jul 09 '21 at 19:15

Use Unicode character escapes instead of literals. For example:

  • Java

    // The backslash is doubled so javac leaves the escape for the regex engine
    boolean b = Pattern.matches("\\u0022", "\"");
    
  • JavaScript

    /\u0022/.test('"');
    
  • Perl

    '"' =~ /\N{U+0022}/;
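As a sketch of the same idea applied to the question's pattern in Java (class and field names are made up): each regex-level `\"` becomes `\u0022`, but every backslash must still be doubled, because javac expands a bare `\u0022` in source code into a real `"` before the string literal is parsed.

```java
import java.util.regex.Pattern;

public final class UnicodeEscapeDemo {
    // The question's pattern with each regex-level \" written as the regex
    // escape \\u0022; backslashes are still doubled for the Java literal.
    static final Pattern NUMBER_OR_STRING =
        Pattern.compile("^(\\u0022(?:[^\\u0022]|\\\\\\u0022)*\\u0022|\\-?[0-9]+(?:\\.[0-9]+)?)$");

    public static void main(String[] args) {
        // A quoted string containing an escaped quote, a number, and a non-match.
        for (String s : new String[] {"\"a\\\"b\"", "-12.5", "abc"}) {
            System.out.println(s + " -> " + NUMBER_OR_STRING.matcher(s).matches());
        }
    }
}
```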
    

In addition, a pattern assembled from string literals can be split across several concatenated lines, each with its own comment, for added clarity:

  • Java

    boolean phone_mask = Pattern.matches("^[^0-9]*"/* Optional non-numeric characters */ +
                            "\\+9{3}" /* Followed by a plus sign and three nines */ +
                            "\\s9"    /* Followed by a space and one nine */  +
                            "\\s9{3}" /* Followed by a space and three nines */ +
                            "\\s9{4}" /* Followed by a space and four nines */ +
                            "$", "Phone: +999 9 999 9999");
    
  • JavaScript

    var phone_mask = RegExp("^[^0-9]*"/* Optional non-numeric characters */ +
                            "\\+9{3}" /* Followed by a plus sign and three nines */ +
                            "\\s9"    /* Followed by a space and one nine */  +
                            "\\s9{3}" /* Followed by a space and three nines */ +
                            "\\s9{4}" /* Followed by a space and four nines */ +
                            "$").test("Phone: +999 9 999 9999");
    


Deduplicator
Paul Sweatte
  • While this does solve the problem at hand, it leaves behind the problem that it makes the regular expression even more cryptic than it was to start with. Of course, [you already have two problems anyway](https://blog.codinghorror.com/regular-expressions-now-you-have-two-problems/), so maybe this third isn't a huge issue... – Jules Aug 18 '18 at 10:59

Write a program that interprets and escapes your regexps for you. You can either use it to generate the code to paste into your source, or have it work on the fly, reading your regexp from a separate file.

For the file version, a big downside is that your logic no longer lives with your source.
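A minimal sketch of the code-generating variant, assuming the output only needs to be a valid `"..."` literal in Java, C++ and JavaScript (all three accept the escapes `\\`, `\"`, `\t`, `\n` and `\r`). The class name `RegexToLiteral` is made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reads one regex per line from stdin and prints each
// one as a double-quoted string literal ready to paste into source code.
public final class RegexToLiteral {
    private RegexToLiteral() {}

    /** Escapes backslashes, double quotes and control characters, then adds the quotes. */
    static String toStringLiteral(String regex) {
        StringBuilder out = new StringBuilder(regex.length() + 8).append('"');
        for (int i = 0; i < regex.length(); i++) {
            char c = regex.charAt(i);
            switch (c) {
                case '\\': out.append("\\\\"); break;
                case '"':  out.append("\\\""); break;
                case '\t': out.append("\\t");  break;
                case '\n': out.append("\\n");  break;
                case '\r': out.append("\\r");  break;
                default:   out.append(c);
            }
        }
        return out.append('"').toString();
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(
            new InputStreamReader(System.in, StandardCharsets.UTF_8));
        // One pattern per line, exactly as tested on regex101.com.
        for (String line = in.readLine(); line != null; line = in.readLine()) {
            System.out.println(toStringLiteral(line));
        }
    }
}
```

Paste a pattern into standard input and copy the printed line into your source. Only literal-level escapes are added, so the regex that reaches the engine is exactly the one you tested.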

Pieter B
  • [Now you have three problems...](https://blog.codinghorror.com/regular-expressions-now-you-have-two-problems/) – user949300 Nov 10 '20 at 18:07