Is it a good idea to call shell commands from within C?

Question

There's a unix shell command (udevadm info -q path -n /dev/ttyUSB2) that I want to call from a C program. With probably about a week of struggle, I could re-implement it myself, but I don't want to do that.

Is it widely accepted good practice for me to just call popen("my_command", "r");, or will that introduce unacceptable security problems and forwards compatibility issues? It feels wrong to me to do something like this, but I can't put my finger on why it would be bad.

I'm only calling on a string constant. How would that happen? Doesn't the MMU detect tampering with read-only memory? — John M, Jun 19 '17 at 15:27
I was mainly thinking of the case where you supplied user input as shell command arguments. — MetaFight, Jun 19 '17 at 15:28
woooahhh no that sounds like a bad idea. idk if I would ever do that. — John M, Jun 19 '17 at 15:29
Could someone put a malicious `udevadm` executable in the same directory as your program and use it to escalate privileges? I don't know much about unix security, but will your program be run with `setuid root`? — Andrew, Jun 19 '17 at 18:01
@AndrewPiliser Great point. they totally could, but I don't suspect that it will be run as root. Anyways, this software is not going to be widely used and that's an awfully specific attack. — John M, Jun 19 '17 at 18:08
@AndrewPiliser If the current directory isn't on the `PATH`, then no - it would have to be run with `./udevadm`. IIRC Windows would run same-directory executables, though. — Izkata, Jun 19 '17 at 18:57
@Izkata Right, which is why `Program Files` in Windows is protected. — Andrew, Jun 19 '17 at 19:12
Whenever possible invoke the desired program directly without going through the shell. — CodesInChaos, Jun 20 '17 at 08:02
`popen` invokes a shell. If running with different privileges than the invoking user, it's better to use `pipe` + `fork` + `execve`. Cf: shellshock. — ninjalj, Jun 20 '17 at 09:45
@CodesInChaos, a hundred times, **no!** If you need to start a program from within your program, use a shell script, that is what they are used for. Starting programs directly will mess up your `PATH` and all your other environment variables. — sleblanc, Jun 21 '17 at 03:11
@sleblanc: a hundred times, **yes!**, at least when running with privileges on behalf of a less privileged user. A shell can be subverted through many environment variables: IFS, PATH, SHELLOPTS=xtrace + PS4, any variable before shellshock was fixed, ... — ninjalj, Jun 21 '17 at 09:34
@ninjalj, now, if your shell is vulnerable, that is another can of worms. But I would rather a program use the `gs` executable that is found in my `PATH` rather than it being hardcoded to `/usr/bin/gs`. This is the Unix philosophy. An attacker with sufficient privileges will be able to use other means, such as replacing the hard-coded path or wrapping the `exec*` calls with a compromised version. — sleblanc, Jun 22 '17 at 23:06
An attacker who can wrap the `exec*` calls or replace a hard-coded path has already all the privileges. You should never depend on an unsanitized `PATH` on a program running with different privileges than the invoking user. `exec*p` on such a program implies that you need to sanitize `PATH`, preferably setting it yourself to a sensible value. `system` and `popen` should be avoided in such a program. — ninjalj, Jun 23 '17 at 10:02

Brian Agnew · Accepted Answer · 2017-06-20T09:22:20.717

59

It's not particularly bad, but there are some caveats.

how portable will your solution be? Will your chosen binary operate the same everywhere, output the results in the same format etc.? Will it output differently on settings of LANG etc.?
how much extra load does this add on your process? Forking a binary results in a lot more load and requires more resources than executing library calls (generally speaking). Is this acceptable in your scenario?
Are there security issues? Can someone substitute your chosen binary with another, and perform nefarious deeds thereafter? Do you use user-supplied args for your binary, and could they provide ;rm -rf / (for example) (note that some APIs will allow you to specify args more securely than just providing them on the command line)

I'm generally happy executing binaries when I'm in a known environment that I can predict, when the binary output is easy to parse (if required - you may just need an exit code) and I don't need to do it too often.

As you've noted, the other issue is how much work is it to replicate what the binary does? Does it use a library you can also leverage off?

edited Jun 20 '17 at 09:22

answered Jun 19 '17 at 15:05

Brian Agnew

4,676
24
19

15

`Forking a binary results in a lot more load and requires more resources than executing library calls (generally speaking).` If this were on Windows, I would agree. However, the OP specifies Unix where forking is low enough cost that it is hard to say whether dynamically loading a library is worse than forking. – wallyk Jun 20 '17 at 00:25
WRT "how portable" I would add specifically that output in different languages might scupper your assumptions on output format. – Muzer Jun 20 '17 at 09:02
1

I'd normally expect a library to be loaded *once*, whereas the binary may be executed multiple times. As ever, it's dependent upon usage scenarios, but needs to be considered (if subsequently dismissed) – Brian Agnew Jun 20 '17 at 09:23
3

There is no "forking" on Windows, so the quotation *has* to be Unix-specific. – Cody Gray - on strike Jun 20 '17 at 10:03
5

NT kernel does support forking, although it's not exposed through Win32. Anyway, I would believe the cost of running an external program would come more from the `exec` than from the `fork`. Loading sections, linking in shared objects, running init code for each. And then having to convert all data into text to be parsed on the other side, both for inputs and outputs. – spectras Jun 20 '17 at 10:37
8

To be fair, `udevadm info -q path -n /dev/ttyUSB2` isn't going to work on Windows anyway so the whole forking discussion is a bit theoretic. – MSalters Jun 20 '17 at 11:19
Despite his specific example, I've tried to address the question in a reasonably generic fashion – Brian Agnew Jun 20 '17 at 11:20
On the topic of forking a new process, it's also noteworthy that there's a variety of things that you can in fact do faster with command line utilities than other options. These tools tend to be highly optimized for what they are meant for. I'm also reminded of this popular article that I've seen linked many times: https://aadrake.com/command-line-tools-can-be-235x-faster-than-your-hadoop-cluster.html. It's a bit on a different vein, but has interesting points anyway. – Kat Jun 20 '17 at 18:52
@CodyGray: There *is* subprocess creation on Windows, but it's much easier to do the equivalent of a `fork` + `exec` combination than a `fork` by itself. – dan04 Jun 20 '17 at 23:11
1

I don't understand what your comment means, @dan04. Yes, as has been pointed out, the NT kernel supports forking, exposed in the subsystem for Unix, but that's not part of *Windows*, which everyone understands to be the Win32 layer running on top of the NT kernel. In Win32, there is literally no equivalent of `fork`. *At best*, you can do a non-COW simulation of `fork`, which is what Cygwin does on Windows, but that loses all the advantages of `fork`, and basically becomes, as you said, `fork`+`exec`. The purpose is solely Unix-compatibility. So, what is this "subprocess creation"? – Cody Gray - on strike Jun 21 '17 at 02:40
re: security: if you use an API that doesn't use a shell, but instead uses `pipe() + fork() + execve()` directly, you have a LOT more to worry about as far as shell metacharacters. You still have to worry about `$PATH` if you directly or indirectly use `udevadm` with `execlp(3)` instead of `/usr/sbin/udevadm` with `execve(2)`, though. And of course having your args as a `static const char *argv[]` instead of a string that needs to be parsed removes the possibility of problems there. But if the string is a compile-time constant, then it's fine (and a flat string is slightly more compact). – Peter Cordes Jun 21 '17 at 03:04
TL:DR: `popen` is convenient, but it introduces so many layers of parsing the command (including the shell) that it's pretty horrible. Unfortunately there isn't a libc equivalent that bypasses the shell and takes a `char *argv[]` array instead of a string, so you'd have write one yourself. – Peter Cordes Jun 21 '17 at 03:13

score 37 · Answer 2 · answered Jun 19 '17 at 23:31

37

It takes extreme care to guard against injection vulnerabilities once you've introduced a potential vector. It's in the forefront of your mind now, but later you may need the ability to select ttyUSB0-3, then that list will be used in other places so it will get factored out to follow the single responsibility principle, then a customer will have a requirement to put an arbitrary device in the list, and the developer making that change will have no idea that the list eventually gets used in an unsafe way.

In other words, code as if the most careless developer you know is making an unsafe change in a seemingly unrelated part of the code.

Second, the output of CLI tools aren't generally considered to be stable interfaces unless the documentation specifically marks them as such. You might be okay counting on them for a script you run that you can troubleshoot yourself, but not for something you deploy to a customer.

Third, if you want an easy way to extract a value from your software, chances are someone else wants it too. Look for a library that already does what you want. libudev was already installed on my system:

#include <libudev.h>
#include <sys/stat.h>
#include <stdio.h>

int main(int argc, char* argv[]) {
    struct stat statbuf;

    if (stat("dev/ttyUSB2", &statbuf) < 0)
        return -1;
    struct udev* udev = udev_new();
    struct udev_device *dev = udev_device_new_from_devnum(udev, 'c', statbuf.st_rdev);

    printf("%s\n", udev_device_get_devpath(dev));

    udev_device_unref(dev);
    udev_unref(udev);
    return 0;
}

There is other useful functionality in that library. My guess is if you needed this, some of the related functions might come in handy too.

answered Jun 19 '17 at 23:31

Karl Bielefeldt

146,727
38
279
479

2

THANK YOU. I spent so long looking for libudev. I just couldn't find it! – John M Jun 20 '17 at 00:18
2

I don't see how "injection vulnerabilities" are an issue here unless OP's program is invoked when another user has control over its environment, which is primarily the case in the event of suid (also possibly, but to a lesser extent, things like cgi and ssh forced-command). There's no reason normal programs need to be hardened against the user they're running as. – R.. GitHub STOP HELPING ICE Jun 20 '17 at 02:37
2

@R.. : The "injection vulnerability" is when you generalize the `ttyUSB2` and use `sprintf(buf, "udevadm info -q path -n /dev/%s", argv[1])`. Now that appears fairly safe, but what if `argv[1]` is `"nul || echo OOPS"` ? – MSalters Jun 20 '17 at 11:23
argv[] is provided by the invoking user. Also I see no indication that OP's code constructs the popen command string from argv[] or any input; as written it's a string literal. – R.. GitHub STOP HELPING ICE Jun 20 '17 at 11:35
3

@R..: you just need more imagination. First it's a fixed string, then it's an argument provided by the user, then it's an argument provided by some other program run by the user but that they don't understand, then it's an argument which their browser has extracted off a web page. If things keep going "downhill", then eventually someone has to take responsibility for sanitising the input, and Karl is saying that it takes extreme care to identify that point, wherever it might come. – Steve Jessop Jun 20 '17 at 13:42
6

Although if the precautionary principle really was "assume the most careless developer you know is making an unsafe change", and I had to write code now to guarantee that cannot result in an exploit, then personally I couldn't get out of bed. The most careless developers I know make Machievelli look like he isn't even *trying*. – Steve Jessop Jun 20 '17 at 13:43
Even though your answer solved my underlying problem, Karl, I think should accept Brian Agnew's answer, as his answered the question I asked. Thank you very much for your help though. – John M Jun 20 '17 at 17:34
1

@johnny_boy: This answer also has some key points, e.g. that parsing the output of shell commands is not guaranteed to be a stable API, so future versions of the tools you invoke could produce different output that confuses your code. Or their cmdline args or defaults could change. – Peter Cordes Jun 21 '17 at 03:08

score 16 · Answer 3 · answered Jun 19 '17 at 19:54

16

In your specific case, where you want to invoke udevadm, I'd suspect you could pull in udev as a library and make the appropriate function calls as an alternative?

e.g., you could take a look at what udevadm itself is doing when invoke in "info" mode: https://github.com/gentoo/eudev/blob/master/src/udev/udevadm-info.c and make equiv calls as to those udevadm is making.

This would avoid a lot of the downside-tradeoffs enumerated in Brian Agnew's excellent answer -- e.g., not relying on the binary existing at a certain path, avoiding the expense of forking, etc.

answered Jun 19 '17 at 19:54

Adam Krouskop

169
2

Thanks, I did find that exact repo, and I ended up looking through it for a bit. – John M Jun 20 '17 at 00:32
1

…and libudev is not particularly hard to use. Its error handling is a bit lacking, but enumerating, querying and monitoring APIs are pretty straightforward. The struggle your fear might turn up being less than 2 hours if you give it a try. – spectras Jun 20 '17 at 10:40

Omnifarious · Answer 4 · 2017-06-20T16:22:28.993

Your question seemed to call for a forest answer, and the answers here seem like tree answers, so I thought I'd give you a forest answer.

This is very rarely how C programs are written. It is always how shell scripts are written, and sometimes how Python, perl or Ruby programs are written.

People typically write in C for easy use of system libraries and direct low-level access to OS system calls as well as for speed. And C is a difficult language to write in, so if people don't need those things, then they don't use C. Also C programs are typically expected to only have dependencies on shared libraries and configuration files.

Shelling out to a sub-process isn't particularly fast, and it doesn't require fine-grained and controlled access to low-level system facilities, and it introduces a possibly surprising dependency on an external executable, so it is uncommon to see in C programs.

There are some additional concerns. The security and portability concerns people mention are completely valid. They are equally valid for shell scripts of course, but people are expecting those kinds of issues in shell scripts. But C programs are not typically expected to have this class of security concern, which makes it more dangerous.

But, in my opinion, the biggest concerns have to do with the way popen will interact with the rest of your program. popen has to create a child process, read its output and collect its exit status. In the meantime, that process' stderr will be connected to the same stderr as your program, which may cause confusing output, and its stdin will be the same as your program, which might cause other interesting issues. You can solve that by including </dev/null 2>/dev/null in the string you pass to popen since it's interpreted by the shell.

And popen creates a child process. If you do anything with signal handling or forking processes yourself you may end up getting odd SIGCHLD signals. Your calls to wait may interact oddly with popen and possibly create strange race conditions.

The security and portability concerns are there of course. As they are for shell scripts or anything that starts up other executables on the system. And you have to be careful that people using your program aren't able to get shell meta-charcaters into the string you pass into popen because that string is given directly to sh with sh -c <string from popen as a single argument>.

But I do not think they are why it is strange to see a C program using popen. The reason it is strange is because C is typically a low level language, and popen is not low level. And because use of popen places design constraints on your program because it will interact strangely with your program's standard input and output and make it a pain to do your own process management or signal handling. And because C programs are not typically expected to have dependencies on external executables.

Dave · Answer 5 · 2017-06-20T13:51:53.853

0

Your program may be subject to hacking etc. One way to protect yourself from this type of activity is to create a copy of your current machine envronment and run your program using a system called chroot.

From viewpoint of your program its executing in a normal envronment, from a security aspect, if somebody breaks your program it only has access to the elements you provided when you made the copy.

Such a setup is called a chroot jail for more details see chroot jail.

Its normally used for setting up publicly accessible servers etc.

edited Jun 20 '17 at 13:51

answered Jun 20 '17 at 10:54

Dave

127
2

3

The OP is writing a program for other people to run, not playing with un-trusted binaries or source on his own system. – Peter Cordes Jun 21 '17 at 03:14

Is it a good idea to call shell commands from within C?

5 Answers5