sore-inf

Introduction

sore-inf is a tool that learns deterministic single-occurrence regular expressions (SOREs), i.e., regular expressions in which every alphabet symbol occurs at most once, from positive examples. It implements the learning algorithms from the paper that Timo Kötzing and I published at ICDT 2013, including the bugfixes from the journal version (see there for links to both versions). For further explanation, see the paper. If you want to learn DTDs, use the dtd-inf tool.
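
To make the single-occurrence condition concrete, here is a minimal Python sketch that checks whether a hand-written candidate expression uses each letter at most once and accepts every word of a sample. The helpers is_single_occurrence and covers_sample are hypothetical and written only for this illustration; they are not part of sore-inf. They assume that terminals are single letters written in Python's re syntax, and they do not check determinism.

```python
import re
from typing import Iterable


def is_single_occurrence(expr: str) -> bool:
    """Return True if no letter occurs more than once in expr.

    Assumption: terminals are single letters; everything else
    (?, *, +, |, parentheses) is treated as an operator. Character
    ranges such as [a-c] are NOT expanded by this simple check.
    """
    letters = [ch for ch in expr if ch.isalpha()]
    return len(letters) == len(set(letters))


def covers_sample(expr: str, sample: Iterable[str]) -> bool:
    """Return True if every word in the sample matches expr completely."""
    pattern = re.compile(expr)
    return all(pattern.fullmatch(word) is not None for word in sample)


sample = ["abc", "acb", "c"]
candidate = "a?(b|c)+"  # hand-written candidate, not the tool's output
print(is_single_occurrence(candidate))        # True
print(covers_sample(candidate, sample))       # True
print(is_single_occurrence("a?(bc|cb)?c?"))   # False: b and c repeat
```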

Installation

You need Python 3 on your computer (I do not know or care whether Python 2 will work). Download the package and unpack it. You can then run python3 sore-inf.py --help. (Depending on your system, you can also give sore-inf.py executable rights and run it directly.)

Example usage

./sore-inf.py abc acb c
Computes a deterministic SORE for the sample consisting of the words abc, acb, and c. Note that the prettification algorithm uses character classes; e.g., instead of (a|b|c), it writes [a-c].
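
The character-class rewriting can be sketched as follows. to_char_class is a hypothetical helper written for illustration, not the M.O.D.O.D. prettifier: it assumes that a disjunction of single characters is given as a list, turns runs of three or more consecutive characters into ranges, and lists shorter runs individually. The actual tool may follow a different policy.

```python
def to_char_class(symbols):
    """Rewrite a disjunction of single characters as a character class.

    Assumption: runs of three or more consecutive code points become a
    range (a, b, c -> a-c); shorter runs are listed character by character.
    A single symbol is returned without brackets.
    """
    chars = sorted(set(symbols))
    if len(chars) == 1:
        return chars[0]
    parts = []
    start = prev = chars[0]
    # The trailing None sentinel forces the last run to be closed.
    for ch in chars[1:] + [None]:
        if ch is not None and ord(ch) == ord(prev) + 1:
            prev = ch  # still inside a run of consecutive characters
            continue
        # Close the run start..prev.
        if ord(prev) - ord(start) + 1 >= 3:
            parts.append(f"{start}-{prev}")
        else:
            parts.append("".join(chr(c) for c in range(ord(start), ord(prev) + 1)))
        if ch is not None:
            start = prev = ch  # begin a new run
    return "[" + "".join(parts) + "]"


print(to_char_class(["a", "b", "c"]))                 # [a-c]
print(to_char_class(["a", "b", "c", "e", "x", "y"]))  # [a-cexy]
```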

Implementation notes

Authors and license

The core inference algorithm was implemented by Dominik D. Freydenberger and uses an implementation of Tarjan's algorithm by Dries Verdegem (which, to our knowledge, is in the public domain). The prettification algorithm is part of the M.O.D.O.D. library, which was designed (only for DREs) by Dominik D. Freydenberger and implemented by Christoph Burschka. The creation of the M.O.D.O.D. library was generously supported by the program "Nachwuchswissenschaftler/innen im Fokus" (Goethe University). We release all of this under the MIT License, and the source code is included.
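
Tarjan's algorithm computes the strongly connected components of a directed graph. This README does not say which graph sore-inf runs it on. The following minimal sketch assumes a graph with an edge x -> y whenever symbol y directly follows symbol x in some sample word. That is our assumption for illustration only. successor_graph and tarjan_scc are hypothetical names, and this is neither Dries Verdegem's implementation nor the code shipped with sore-inf.

```python
def successor_graph(sample):
    """Edge x -> y whenever symbol y directly follows symbol x in some word."""
    graph = {}
    for word in sample:
        for ch in word:
            graph.setdefault(ch, set())  # every symbol becomes a node
        for x, y in zip(word, word[1:]):
            graph[x].add(y)
    return graph


def tarjan_scc(graph):
    """Tarjan's algorithm: return the strongly connected components as sets."""
    index = {}      # DFS discovery number of each node
    lowlink = {}    # smallest discovery number reachable from the subtree
    stack, on_stack = [], set()
    components = []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = lowlink[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:  # v is the root of a component
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(component)

    for v in list(graph):
        if v not in index:
            visit(v)
    return components


g = successor_graph(["abc", "acb", "c"])
print(sorted(sorted(c) for c in tarjan_scc(g)))  # [['a'], ['b', 'c']]
```

In this sample, b and c follow each other in both orders, so they end up in one component, while a only ever appears at the start.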