Compiling Shell Scripts

For the last few weeks I’ve been playing with the BeagleBone, an inexpensive little Linux box aimed at embedded systems. Naturally, you can program it as you would any similar board using C, C++, or a variety of other languages available on Linux. However, I’ve been probing into how useful the shell (especially bash) might be for some simple embedded tasks.

Although many of the tasks I’ve tackled have been pretty straightforward, bash is actually quite powerful and makes for quick prototyping and even development (yes, there is a bash debugger available). There is at least one objection to using a script, though. Since the program is just a text file and is fairly easy to understand, users may be tempted to modify it, with good intentions or bad.

I had this exact problem a number of years ago on a UNIX system. A very complex build system would break and the culprit would be some developer who had to “fix” something in one of the scripts. After a few late nights figuring that sort of thing out, I decided I had to do something.

Ideally, what I wanted was a shell script compiler. That’s a pretty big task, though. You have to track changes in the shell — lots of extra work if you want to support more than one shell or (as I did) more than one platform. I decided to take a shortcut, and a program I called scbind was born.

The idea behind scbind is simple. At a very high level, it takes a script as input and outputs an executable file, just like a compiler. However, it isn’t really a compiler. Instead, it simply embeds an encoded version of the script in the executable. The executable launches the interpreter and feeds it a named pipe in place of a script file (named pipes are something I talked about in a different way last time). As you might guess, the executable then decodes the script and feeds it to the interpreter via the pipe.
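To make the mechanism concrete, here is a minimal sketch of the launch-and-feed step on a POSIX system. It is not scbind's source: the function name run_via_fifo, the /tmp directory template, and the polling loop are all assumptions made for illustration, and the script is passed in already decoded.

```c
#define _XOPEN_SOURCE 700   /* request POSIX.1-2008 declarations */
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper (not scbind's code): run `interp` on an in-memory
   script by handing it a named pipe instead of a file on disk.
   Returns the interpreter's exit status, or -1 on failure. */
int run_via_fifo(const char *interp, const char *script, size_t len)
{
    char dir[] = "/tmp/fifodemoXXXXXX";            /* assumed location */
    char fifo[sizeof dir + 8];
    const struct timespec tick = { 0, 1000000 };   /* 1 ms between polls */
    int status = 0, fd = -1, reaped = 0, result = -1;
    pid_t pid;

    if (mkdtemp(dir) == NULL)                      /* private directory for the pipe */
        return -1;
    snprintf(fifo, sizeof fifo, "%s/script", dir);
    if (mkfifo(fifo, 0600) != 0) {
        rmdir(dir);
        return -1;
    }

    pid = fork();
    if (pid == 0) {                                /* child: becomes the interpreter */
        execlp(interp, interp, fifo, (char *)NULL);
        _exit(127);                                /* exec failed */
    }
    if (pid > 0) {
        /* If the interpreter quits early, get a write error, not a fatal signal. */
        signal(SIGPIPE, SIG_IGN);

        /* A non-blocking open fails with ENXIO until a reader has the pipe
           open. Polling means a failed exec cannot leave us stuck. */
        while ((fd = open(fifo, O_WRONLY | O_NONBLOCK)) < 0 && errno == ENXIO) {
            if (waitpid(pid, &status, WNOHANG) == pid) {
                reaped = 1;
                break;
            }
            nanosleep(&tick, NULL);
        }
        if (fd >= 0) {
            /* Switch back to blocking writes, then feed the whole script. */
            fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) & ~O_NONBLOCK);
            while (len > 0) {
                ssize_t n = write(fd, script, len);
                if (n <= 0)
                    break;
                script += n;
                len -= (size_t)n;
            }
            close(fd);                             /* interpreter sees end of file */
        } else if (!reaped) {
            kill(pid, SIGTERM);                    /* unexpected open error */
        }
        if (reaped || waitpid(pid, &status, 0) == pid)
            if (WIFEXITED(status))
                result = WEXITSTATUS(status);
    }
    unlink(fifo);
    rmdir(dir);
    return result;
}
```

Calling run_via_fifo("sh", text, strlen(text)) behaves much like running sh on a script file, except that the script text never sits in a regular file. Anyone who can trace the process can still recover it, which fits the goal of discouraging casual edits rather than providing real security.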

The interpreter doesn’t have to be the shell. Almost anything that you can use with the #! syntax will work. The executable, then, is really just boilerplate to do the decoding. The encoded text is simply a file with a C-language array in it. A shell script (what else?) does the work of calling the system C compiler to generate the executable, so there’s no need to worry about the back end of the “compilation.”
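As a rough illustration of what "a file with a C-language array in it" can look like, the sketch below writes a byte buffer out as C source. The function name emit_c_array, the array naming scheme, and the trailing zero byte are assumptions for this example; the layout scbind actually generates may differ.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical generator: write `len` bytes of `data` to `out` as a C
   array called `name`, plus a `<name>_len` constant. A trailing 0x00 is
   appended so the array is never empty; it is not counted in the length.
   Returns 0 on success, -1 if the stream reported an error. */
int emit_c_array(FILE *out, const char *name,
                 const unsigned char *data, size_t len)
{
    fprintf(out, "static const unsigned char %s[] = {", name);
    for (size_t i = 0; i < len; i++) {
        /* Start a new indented row every 12 bytes. */
        fprintf(out, "%s0x%02x,", (i % 12 == 0) ? "\n  " : " ",
                (unsigned)data[i]);
    }
    fprintf(out, "\n  0x00\n};\n");
    fprintf(out, "static const size_t %s_len = %zu;\n", name, len);
    return ferror(out) ? -1 : 0;
}
```

The generated file can then be compiled together with the boilerplate decoder, which is why the only tool needed at the back end is the system C compiler.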

If you download scbind, you’ll see that the driver shell script normally discards the C source code after the compiler completes. But the -s option will save that file. You can probably install the whole thing on the BeagleBone, but you can also let the compiler give you the C file, move that to the BeagleBone, and compile it there.

The encoder/decoder is split out of the main file so you can use your own scheme. In my case, I wasn’t worried about high security. I just wanted to keep casual hackers out. My encoder is simple:

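#include <stddef.h>  /* size_t */

/* Fallback for C libraries that lack memfrob();
   same signature as the glibc function. */
#if HAVE_MEMFROB != 1
void *memfrob(void *s, size_t len)
{
  unsigned char *p = s;
  while (len--) {
    *p ^= 0x2A;  /* XOR each byte with 42 */
    p++;
  }
  return s;
}
#endif

/* Encode len bytes of s in place. The output length is reported
   through lenout when it is non-NULL. */
char *encode(char *s, size_t len, size_t *lenout)
{
  if (lenout) *lenout = len;
  memfrob(s, len);
  return s;
}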

If you haven’t run into memfrob() before, it is a glibc function that simply XORs each byte of a buffer with the number 42 (which, I am sure, is a nod to The Hitchhiker’s Guide to the Galaxy). Running a string through memfrob once produces what looks like gibberish. Running that gibberish back through produces the original string. The examples directory shows a high-low game “compiled” from a shell script, along with examples with awk and Perl.
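The round trip is easy to check by hand. The sketch below uses a stand-in called frob42 (a name invented here so it cannot clash with the real memfrob on glibc systems) that performs the same XOR with 0x2A.

```c
#include <stddef.h>

/* Stand-in for memfrob(): XOR every byte of buf with 42 (0x2A).
   The name frob42 is made up for this example. */
void *frob42(void *buf, size_t len)
{
    unsigned char *p = buf;
    for (size_t i = 0; i < len; i++)
        p[i] ^= 0x2A;
    return buf;
}
```

Applied to the bytes of "echo hi", the first pass turns 'e' (0x65) into 'O' (0x4F), and a second pass restores the original text. Note that any '*' in the input (0x2A) becomes a zero byte after encoding, which is why the length is passed explicitly instead of relying on a terminating NUL.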

Of course, you don’t get the other benefits of compilation. The code doesn’t stand alone (the interpreter still has to be installed on the target), nor will it perform better (in fact, I’d expect it to perform slightly worse). But it isn’t much different from running a script, and it will keep casual, prying eyes out of your scripts.