## Extended Regular Expressions: Succinctness and Decidability

(Dominik D. Freydenberger)

### Abstract

Most modern implementations of regular expression engines allow the use of variables (also called backreferences). The resulting extended regular expressions (which, in the literature, are also called practical regular expressions, rewbr, or regex) are able to express non-regular languages.

The present paper demonstrates that extended regular-expressions cannot be minimized effectively (neither with respect to length, nor number of variables), and that the tradeoff in size between extended and “classical” regular expressions is not bounded by any recursive function. In addition to this, we prove the undecidability of several decision problems (universality, regularity, and cofiniteness) for extended regular expressions. Furthermore, we show that all these results hold even if the ex- tended regular expressions contain only a single variable.

### Versions of the Paper

`^(((ba{`n-1`}b[ab]*)|(ba{`n+1`}[ab]*)|(a[ab]*)|([ab]*a)|([ab]*bb[ab]*)|(b)|([ab]*aab)|([ab]*b(aa)*ab(aaaaaa)*(()|(a|aa|aaa|aaaaa))b[ab]*))|([ab]*b(?P<x>(a*))(((?P=x)b(?P=x)a)|((?P=x)ab(?P=x){6}aaaaa)|((?P=x)(aa)+b(?P=x)b)|((?P=x)(aa)+ab(?P=x){6}aaaab))[ab]*))\$`
By choosing a value for n (and replacing n-1 and n+1 appropriately), we obtain a regular expression that is equivalent to `^[ab]*\$` if and only if iterating the Collatz function on starting value n leads to the cycle 4, 2, 1. This is quite easy to see after reading both papers — but considering the length of those, I would be willing to provide a more detailed explanation, if someone asks nicely.