
So I have to write a filtering system that applies a preprocessed ruleset to data, triggering actions defined in the ruleset while continuing its evaluation.

<ruleset name="1">
 <rule>
  <conditions>
   <and>
    <or>
     <match image="aceofspades" level="70" />
     <match image="aceofdiamonds" level="66" />
    </or>
    <not>
     <match image="aceofclubs" level="60" />
    </not>
   </and>
  </conditions>
  <action name="apply_ruleset" ruleset="2" />
 </rule>
 <rule>
  ...
 </rule>
</ruleset>
<ruleset name="2">
</ruleset>
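The condition part of a rule like the one above can be represented as a small tree of nodes evaluated against each packet. Here is a minimal C++ sketch under stated assumptions: `Packet`, `match`, and `node` are made-up names, and I'm assuming a `<match>` passes when the packet's level for that image meets the threshold (the real semantics may differ):

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical shape of one data packet: image name -> observed level.
struct Packet {
    std::map<std::string, int> levels;
};

// One node of the condition tree, built once when the XML is preprocessed,
// so no XML parsing happens on the per-packet fast path.
struct Cond {
    enum Kind { Match, And, Or, Not } kind;
    std::string image;   // Match nodes only
    int level = 0;       // Match nodes only
    std::vector<std::unique_ptr<Cond>> kids;

    bool eval(const Packet& p) const {
        switch (kind) {
        case Match: {
            auto it = p.levels.find(image);
            // Assumption: level is a minimum threshold.
            return it != p.levels.end() && it->second >= level;
        }
        case And:  // short-circuits, so no and/or state needs to be kept
            for (const auto& k : kids)
                if (!k->eval(p)) return false;
            return true;
        case Or:
            for (const auto& k : kids)
                if (k->eval(p)) return true;
            return false;
        case Not:
            return !kids.front()->eval(p);
        }
        return false;
    }
};

// Small helpers for building trees in code; the real builder would
// walk the parsed XML instead.
std::unique_ptr<Cond> match(std::string image, int level) {
    auto c = std::make_unique<Cond>();
    c->kind = Cond::Match;
    c->image = std::move(image);
    c->level = level;
    return c;
}

std::unique_ptr<Cond> node(Cond::Kind kind,
                           std::vector<std::unique_ptr<Cond>> kids) {
    auto c = std::make_unique<Cond>();
    c->kind = kind;
    c->kids = std::move(kids);
    return c;
}
```

Because `eval` short-circuits, there is no and/or state to track between nodes; each packet is a fresh recursive walk over an immutable tree.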

(Good examples of what this feels like: sendmail configs; some Apache configs; iptables rulesets, with multiple tables, JUMP/ACCEPT/DROP, etc.)

What bothers me with every current design I can think of:

  • keeping state of and-or nodes
  • being able to jump between main and sub- rulesets easily
  • tracking circular references
  • keeping a vast set of possible conditions
  • runtime speed: parsing/hashing may take time, but each data packet must be passed through the ruleset as fast as possible.
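The circular-reference concern above can be handled at preprocessing time rather than per packet. A hedged sketch (the `Graph` shape and names are mine, assuming `apply_ruleset` targets are known once the config is parsed): a depth-first walk that refuses to re-enter a ruleset already on the current path, which is essentially how iptables rejects chain loops.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical preprocessed view: each ruleset name maps to the
// sub-rulesets its apply_ruleset actions can jump to.
using Graph = std::map<std::string, std::vector<std::string>>;

// Returns true if any path starting at 'start' revisits a ruleset
// that is already on the current evaluation path.
bool hasCycle(const Graph& g, const std::string& start,
              std::set<std::string>& onPath) {
    if (!onPath.insert(start).second)
        return true;  // 'start' is already on the path: a cycle
    auto it = g.find(start);
    if (it != g.end())
        for (const auto& next : it->second)
            if (hasCycle(g, next, onPath))
                return true;
    onPath.erase(start);  // backtrack
    return false;
}
```

Running this once after parsing means the per-packet evaluator can jump between rulesets freely, without any runtime cycle checks.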

What should I read or think about to be sure that my parser will work well with large rule sets without sacrificing too much speed or flexibility? (Not to mention the iptables code.)


The target language is C++ (though I would like to implement a similar system in Perl later). I was reading through this, but I still don't see a good solution for switching rulesets or handling the boolean logic within rules.

kagali-san
    Isn't the speed of parsing irrelevant? Parse the config once into a data structure that can be executed efficiently (effectively some kind of opcodes), don't literally interpret the config. – amon Feb 27 '14 at 22:34
  • Why do you need to keep state of and/or nodes? – Jonno Mar 06 '14 at 14:41

1 Answer


This is why they used to recommend taking the Artificial Intelligence class.

Back in the 1970s and 1980s, there was a HUMONGOUS amount of work done on Rule-Based Systems, mostly for Expert Systems work. That included huge amounts of work done on rules engines. A fair amount of this work found its way into undergraduate AI textbooks.

Take a look at Winston's AI textbook. Winston & Horn "LISP" has good information. Buchanan and Shortliffe "Rule-Based Expert Systems" should have useful information.

Now, much of what is out there focuses on LISP. There are any number of small LISP and Scheme interpreters that can be tacked onto your C++ code. If you're working in Free Software (as in GPL), GNU GUILE would be a really good place to start. There's also Tiny Scheme, which is BSD license.

And there's CLIPS. From the Wikipedia article: "CLIPS is probably the most widely used expert system tool. CLIPS incorporates a complete object-oriented language COOL for writing expert systems. CLIPS itself is written in C, extensions can be written in C, and CLIPS can be called from C. Its user interface closely resembles that of the programming language Lisp. COOL combines the programming paradigms of procedural, object oriented and logical (theorem proving) languages."

---added later---

Something else you might find interesting. Don Batory did some work 20 years ago, on compiling rule bases into efficient C. It was a reimplementation of some earlier work. The paper is "The LEAPS Algorithms". From the first paragraph of the paper: "OPS5 is a forward-chaining expert system [McD78, For81]. LEAPS (Lazy Evaluation Algorithm for Production Systems) is a state-of-the-art production system compiler for OPS5 rule sets [Mir90].2 Experimental results have shown that LEAPS produces the fastest sequential executables of OPS5 rule sets; execution times can be over two orders of magnitude faster than OPS5 interpreters. This phenomenal speedup is due to the reliance of LEAPS on complex data structures and search algorithms to greatly increase rule firing rates. It is well-known that LEAPS data structures and algorithms are difficult to comprehend; this is due, in part, to the inability of relational database concepts (i.e., relations and select-project-join operators) to capture critical lazy-evaluation features of LEAPS algorithms."

John R. Strohm
  • /me ain't a big fan of applying large complex theories to small projects; however, thanks for mentioning the Winston's book. – kagali-san Mar 08 '14 at 01:08