Best practices for execution of untrusted code

Question

I have a project where I need to allow users to run arbitrary, untrusted Python code (a bit like this) against my server. I'm fairly new to Python and I'd like to avoid making any mistakes that introduce security holes or other vulnerabilities into the system. Are there any best practices, recommended reading, or other pointers you can give me to make my service usable but not abusable?

Here's what I've considered so far:

Remove __builtins__ from the exec context to prohibit use of potentially dangerous packages like os. Users will only be able to use packages I provide to them.
Use threads to enforce a reasonable timeout.
I'd like to limit the total amount of memory that can be allocated within the exec context, but I'm not sure if it's even possible.

There are some alternatives to a straight exec, but I'm not sure which of these would be helpful here:

Using an ast.NodeVisitor to catch any attempt to access unsafe objects. But what objects should I prohibit?
Searching for any double-underscores in the input. (less graceful than the above option).
Using PyPy or something similar to sandbox the code.

NOTE: I'm aware that there is at least one JavaScript-based interpreter. That will not work in my scenario.

Some starting points for study: http://blog.delroth.net/2013/03/escaping-a-python-sandbox-ndh-2013-quals-writeup/, http://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html, http://nedbatchelder.com/blog/201302/looking_for_python_3_builtins.html and http://nedbatchelder.com/blog/201302/finding_python_3_builtins.html about breaking out of sandboxes. — Martijn Pieters, Mar 22 '13 at 21:08
@MartijnPieters: Excellent. Probably worthy of an answer, if you summarize each one. — Robert Harvey, Mar 22 '13 at 21:14
Consider also: garbage left on the disk, network (do not let them send spam or whatever), permissions to other files (reading your files). Even eject in a while loop can destroy CD mechanics... I would go for virtualization (jails or some KVM, you name it) or at least a user with almost no privileges. Set a reasonable nice value and memory limit to favour your own programs. — kyticka, Mar 22 '13 at 21:27
["A strange game. The only winning move is not to play."](https://en.wikiquote.org/wiki/WarGames) — Ross Patterson, Sep 08 '13 at 14:43
Try [PyPy](http://pypy.org/): > Sandboxing: PyPy provides the ability to run untrusted code in a fully secure way. — Vorac, Sep 12 '13 at 09:30
You could consider using VirtualBox: you could run the Python code in an unprivileged shell in a virtual OS which itself is restricted. — shaunc, Dec 04 '13 at 07:44
The same question for Linux, with no language specified: http://stackoverflow.com/questions/792764/secure-way-to-run-other-people-code-sandbox-on-my-server — Ciro Santilli OurBigBook.com, Aug 05 '14 at 21:45

score 33 · Accepted Answer · edited Jun 16 '20 at 10:01

Python sandboxing is hard. Python is inherently introspectable, at multiple levels.

This also means that you can find the factory methods for specific types from those types themselves, and construct new low-level objects, which will be run directly by the interpreter without limitation.

Here are some examples of finding creative ways to break out of Python sandboxes:

Ned Batchelder starts with a demonstration of how dangerous eval() really is; eval() is often used to execute Python expressions as a primitive and naive sandbox for one-liners.

He then continued to try and apply the same principles to Python 3, eventually succeeding in breaking out with some helpful pointers.
Pierre Bourdon uses similar techniques to hack a Python system at a hack-a-thon.

The basic idea is always to find a way to create base Python types (functions and classes) and break out of the shell by getting the Python interpreter to execute arbitrary (unchecked!) bytecode.
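As a minimal illustration of the first step (a harmless sketch, not one of the exploits from the posts above): even with `__builtins__` emptied, an expression can walk from a tuple literal up to `object` and list its direct subclasses. The variable names here are made up for the example.

```python
# A naive "sandbox": evaluate an expression with no builtins available.
restricted_globals = {"__builtins__": {}}

# No names are needed at all: start from a literal and follow attributes.
# () -> tuple -> object -> every direct subclass of object currently loaded.
expression = "().__class__.__base__.__subclasses__()"
subclasses = eval(expression, restricted_globals)

# Collect the class names to show what has become reachable.
names = {cls.__name__ for cls in subclasses}
print(len(subclasses), "classes reachable; 'type' among them:", "type" in names)
```

Removing builtins takes away names, not objects; everything reachable through attribute access is still there.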

The same and more applies to the exec statement (exec() function in Python 3).

So, you want to:

Strictly control the byte compilation of the Python code, or at least post-process the bytecode to remove any access to names starting with underscores.

This requires intimate knowledge of how the Python interpreter works and how Python bytecode is structured. Code objects are nested: a module's bytecode covers only the top-level statements, while each function and class has its own bytecode sequence plus metadata, which in turn contains further code objects for any nested functions and classes.
You need to whitelist modules that can be used. Carefully.

A Python module contains references to other modules. If you import os, there is a local name os in your module namespace that refers to the os module. This can lead a determined attacker to modules that can help them break out of the sandbox. The pickle module, for example, lets you load arbitrary code objects, so if any path through whitelisted modules leads to the pickle module, you still have a problem.
You need to strictly limit the time quotas. Even the most neutered code can still attempt to run forever, tying up your resources.
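One way to enforce such a quota, sketched under the assumption that each submission runs in its own interpreter process: Python has no supported way to forcibly stop a thread, whereas a child process can be killed. `run_with_timeout` is a hypothetical helper name. This limits wall-clock time only; it is not a sandbox by itself.

```python
import subprocess
import sys


def run_with_timeout(source, seconds):
    """Run `source` in a separate interpreter; return (status, stdout)."""
    try:
        result = subprocess.run(
            [sys.executable, "-I", "-c", source],  # -I: isolated mode
            capture_output=True,
            text=True,
            timeout=seconds,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return "timeout", ""
    status = "ok" if result.returncode == 0 else "error"
    return status, result.stdout


# Usage: a well-behaved snippet finishes; an infinite loop is cut off.
status, output = run_with_timeout("print(1 + 1)", seconds=10)
print(status, output)
```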

Take a look at RestrictedPython, which attempts to give you the strict bytecode control. RestrictedPython transforms Python code into something that lets you control what names, modules and objects are permissible in Python 2.3 through to 2.7.
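To illustrate why control at this level has to recurse, here is a small standard-library sketch (not RestrictedPython's implementation; `underscore_names` is a hypothetical helper) that collects underscore-prefixed global and attribute names from a compiled module and from every code object nested inside it.

```python
import types


def underscore_names(code):
    """Collect names starting with '_' from a code object and its children."""
    found = {name for name in code.co_names if name.startswith("_")}
    # Nested functions and classes are stored as constants of their parent.
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            found |= underscore_names(const)
    return found


source = """
def outer():
    def inner(x):
        return x.__class__
    return inner
"""

top = compile(source, "<untrusted>", "exec")
print(top.co_names)           # names used by the top level only
print(underscore_names(top))  # names found after walking nested code objects
```

A static scan like this misses names built at run time, such as `getattr(x, "__cla" + "ss__")`, which is one reason a scan alone is not sufficient.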

Whether RestrictedPython is secure enough for your purposes depends on the policies you implement. Not allowing access to names starting with an underscore and strictly whitelisting the modules would be a start.
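Whitelisting has to be strict because a module's namespace exposes whatever that module imported. A small sketch (the helper name is made up for illustration, and `os` is used only because its imports are well known) listing the modules that are one attribute access away:

```python
import os
import types


def directly_reachable_modules(module):
    """Return names of module-level attributes that are themselves modules."""
    return sorted(
        name
        for name, value in vars(module).items()
        if isinstance(value, types.ModuleType)
    )


reachable = directly_reachable_modules(os)
print(reachable)
```

Handing untrusted code a module also hands over everything reachable from it, for example `os.sys`, so a whitelist needs to account for these transitive references and not just the top-level names.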

In my opinion, the only truly robust option is to use a separate Virtual Machine, one with no network access to the outside world which you destroy after each run. Each new script is given a fresh VM instead. That way even if the code manages to break out of your Python sandbox (which is not unlikely) all the attacker gets access to is short-lived and without value.

dietbuddha · Answer 2 · 2014-04-12T21:16:05.513

TL;DR Use a chroot/jail and run as a custom user without any privileges.

The best practice for executing untrusted code is to segregate it via a system sandbox. For the most security:

create a container with only Python and its dependencies and the container's dependencies
create a container without all devices that aren't absolutely necessary (i.e. network and storage)
create a container with restrictions on memory and process usage
recreate the container with every run (or at the very least with each unique user and max time period)
run as a user with the least privilege needed
run as a user that doesn't have the permissions to write files
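Where a full container is not available, per-process limits give a rough approximation of the memory and CPU restrictions listed above. The following is a POSIX-only sketch using the standard `resource` module; the helper names and the specific limits (1 GiB, 5 CPU seconds) are assumptions chosen for illustration, and this complements rather than replaces the container and unprivileged user.

```python
import resource
import subprocess
import sys


def _cap(kind, value):
    """Lower both the soft and hard limit for `kind` to at most `value`."""
    _, hard = resource.getrlimit(kind)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)  # never try to raise an existing hard limit
    resource.setrlimit(kind, (value, value))


def apply_limits():
    # Runs in the child process just before the new interpreter starts.
    _cap(resource.RLIMIT_AS, 1024 * 1024 * 1024)  # address space: 1 GiB
    _cap(resource.RLIMIT_CPU, 5)                  # CPU time: 5 seconds


def run_limited(source, wall_seconds=10):
    """Run `source` in a fresh interpreter under the limits above."""
    return subprocess.run(
        [sys.executable, "-I", "-c", source],
        preexec_fn=apply_limits,  # documented as unsafe if the parent uses threads
        capture_output=True,
        text=True,
        timeout=wall_seconds,
    )
```

With these limits, a call such as `run_limited("x = bytearray(2 * 1024**3)")` should end with a non-zero return code, because the allocation exceeds the address-space cap.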

You should also follow standard practices for running things securely in a chroot. You can rebuild the chroot's filesystem with each call as well if you are particularly paranoid. Typically you just make the user unable to modify the filesystem the chroot runs in.

This is the only thing where you are going to be even remotely sure you've got it right - give it its own process. — Michael Kohne, Sep 08 '13 at 18:16

score 3 · Answer 3 · answered Mar 22 '13 at 21:29

There is no way you can do this safely.

If you wanted to do something like this safely, you'd have to start by having your own implementation of Python which runs in a completely controlled environment, preferably one that runs in the user's browser instead of on your system. You might start with Jython (Python for Java) and package it as a Java applet. Since it would be running in the Java sandbox, on the user's machine, your system would be reasonably safe.

ddyer

The question of safety was for his server, not for the client's machine. Java's potential security hazards, like those for any other web technology, are that the server could be used to deploy programs hazardous to the client. – ddyer Mar 25 '13 at 00:48
@grasGendarme Much like news stories about plane crashes actually tell you a lot about how rare those are, stories about Java security holes tell you that Java is comparatively secure. You'd never get such a story about C because the response you'd get would be "well duh; if you run it it'll do whatever it wants" – Richard Tingle Sep 08 '13 at 22:10

score 0 · Answer 4 · answered Jun 09 '16 at 13:58

As long as performance isn't massively important to you, you could always run it in Brython, which effectively puts it in the JavaScript sandbox.

Big Ian
