8

Note: Yes, I know that storing functions in databases should be punishable by law.

We are developing a financial web application using PostgreSQL, Sinatra and AngularJS. As you may have guessed, a financial application may need some calculations.

There is a certain database model (called 'Hypothesis') that has a certain function that needs to be executed overnight. The problem is that this function is different for each instance of the model. That being said, whenever a new hypothesis is added a corresponding function also needs to be added.

Because of the nature of financial applications, the function can either do simple arithmetic operations or integrate over an area. This leaves us with the following options:

  1. Implement a DSL (domain-specific language): This would be ideal, but adding branching and looping logic to a DSL seems more like creating a new programming language.

  2. STDIO pipe: Simply allow the execution of a program in a sandbox on a server. This is most flexible, but the gods of software security would not be merciful. Docker I guess?

  3. Pure sandboxed ruby code: Create a sandbox that allows only functionally pure Ruby code to be executed in it. Then simply evaluate the function inside the sandbox whenever required. This would be a security flaw as well, but not half as much as options 2. For now this option seems to be implementable using trusted-sandbox and pure.

  4. Use the octave-ruby gem: Beautiful idea. But it seems that nobody cares much about the project and we will probably need to fork it and work on it. This seem like the best option, since writing complex math expression is trivial in Octave.

  5. ???

  6. Profit

Another concern would be implementing the ability to test the functions with mock data in the admin panel, but this is not mandatory.

Can somebody suggest a better/more flexible/more secure option? Or at least one that doesn't store a function string in the database? I hope that I have explained the problem well enough.

Ianis Vasilev
  • 107
  • 1
  • 1
  • 4
  • The use of functions in DB is one of those holy wars that will run and run. Yes, there can be a hit on the database if they're not used correctly but they can make the code far more concise and easier to read. – Robbie Dee Feb 18 '15 at 12:43
  • I'm far from being a security freak, but making good software design decisions is very important (at least for me) – Ianis Vasilev Feb 18 '15 at 13:00
  • Could you implement the function in PgSQL and run it as a DB task? – glenatron Feb 18 '15 at 14:07
  • The idea is NOT to limit the possibilities (implementing numerical integration in PgSQL is simply too much of a pain). – Ianis Vasilev Feb 18 '15 at 14:39
  • How do those functions get created and enter your system? Who decides what the functions should compute? How do you asses their fitness for purpose? Do they need versioning? Do they have to run overnight for business reasons or would instant results be preferred? How do you handle functions that don't terminate? What is the data size of the inputs and outputs? Is the structure of the input and/or output identical for all functions? – Patrick Sep 04 '15 at 18:27

4 Answers4

2

I don't know how different the functions you want to store are, but if you can isolate few different "types of functions" you could do something like this :

  • Create several generic methods in your model which can make every special calculus needed with the help of some arguments :

    def integrate_over_an_area(args={})
      #Do your calculus with your args.
    end
    
    def simple_arithmetics(args={})
      #Do your calculus with your args.
    end
    
  • Create - for each model instance - a hash describing which generic function to use and the parameters :

    { 
     function: "integrate_over_an_area",
     args: {
       arg1: "value",
       arg2: "value",
       ...
      }
    }
    
  • Store the hashes in a column named "special_function_hash" or something like this, in your model's table

  • Create a "special_function" method in your model which calls the function described in the hash, with the arguments described in the hash
Caillou
  • 256
  • 1
  • 7
  • This is actually the path we have chosen. While I appreciate the answer, I would prefer a more elegant approach in order to accept it. – Ianis Vasilev Sep 04 '15 at 17:18
  • I think that if you want something more elegant, you don't have a choice and have to sandbox the execution of a stored function, as you wrote in your post. Maybe it's the most elegant way to do it. I'll follow this post with great interest, if someone comes up with something better I'm interested. – Caillou Sep 06 '15 at 09:30
1

Let's note that how the code is stored is inconsequential; how it is executed is essential.

Most probably your functions have a limited repertoire. Implement them properly, without storing in the DB. (I leave alone the question if a numeric integration implemented in Ruby a good idea. Let's assume you use something like Numpy of the Ruby world.)

Now you can have proper code revisions, tests, etc.

Then you can store a very limited form of a program in your database: which functions to call with which field values. The choice is limited by your blessed set of functions.

What if you need to store simple but unpredictable formulas?

I'd limit this feature to formulas that do simple arithmetics with no side effects, no loops, and no function definitions. It's pretty easy to write a parser for the basic arithmetic expressions with parentheses; most probably ready-made gems exist.

Then I'd store these expressions in the DB as a text. I'd only let in the expressions that can be parsed (no code constructs beyond the tiny allowed subset), and only execute the expressions that again can be parsed (so that putting exploit code directly into the DB won't work). This is apparently what you call the DSL approach.

If the above is not enough in your case, I'd pick a well-known embeddable language that can be stripped of unneeded parts and only allow functions you say it should allow. Lua is an obvious choice, Python is a slightly less obvious choice. Be prepared: it would take quite some work.

Docker could be a simpler approach, though: you can hide anything not strictly needed, like network and disk access, from the process.

But learn first if you are going to need these complexities. Chances are high that you don't.

9000
  • 24,162
  • 4
  • 51
  • 79
  • You basically repeated everything I said in the question and added a suggestion to have a limited repertoire of functions (which is actually the approach we have taken). While I appreciate the answer, I would prefer a more elegant approach in order to accept it. – Ianis Vasilev Sep 04 '15 at 17:25
  • 1
    Unfortunately I don't know about a nice, elegant solution to your problem: sandboxing code is always a set of trade-offs. I don't exactly understand your requirements and limitations you can and cannot impose on your code. What I tried to do is to order possible approaches from easiest (and hopefully more likely) to hardest (and hopefully not required). – 9000 Sep 04 '15 at 18:36
0

You could embed an interpreter (like Lua or GNU guile) into the program using the database.

(I am not sure that coding your entire software in Ruby is best; coding it in C++ might be worthwhile)

You would either store in the database expressions (that is string containing Guile or Lua source) in Lua or Guile, or you might store some bytecode or some serialized closures.

Basile Starynkevitch
  • 32,434
  • 6
  • 84
  • 125
0

I would use Jupyter as my backend execution engine & you then can write your functions in several languages e.g. Julia, python or octave. https://github.com/ipython/ipython/wiki/IPython-kernels-for-other-languages

Bon Ami
  • 317
  • 2
  • 7