22

Problem: Run commands in the form of a string.

  • command example:

    /user/files/ list all; equivalent to: /user/files/ ls -la;

  • another one:

    post tw fb "HOW DO YOU STOP THE TICKLE MONSTER?;"

equivalent to: post -tf "HOW DO YOU STOP THE TICKLE MONSTER?;"

Current solution:

tokenize string(string, array);

switch(first item in array) {
    case "command":
        if ( argument1 > stuff) {
           // do the actual work;
        }
}

The problems I see in this solution are:

  • No error checking other than nested ifs-else inside each case. The script becomes very big and hard to mantain.
  • Commands and responses are hardcoded.
  • No way of knowing if flags are correct or missing parameters.
  • Lack of intelligence to suggest "you might wanted to run $command".

And the last thing I can't address is synonyms in different encodings, example:

case command:
case command_in_hebrew:
    do stuff;
break;

The last one might be trivial, but well, what I want to see is the solid fundations of this kind of program.

I'm currently programming this in PHP but might do it in PERL.

alfa64
  • 413
  • 1
  • 4
  • 14
  • I don't see at all how this relates specifically to PHP. There are a lot of threads on this interpreter/compiler-topic on SO and SE already. – Raffael Dec 21 '11 at 10:28
  • 3
    Nobody mentioned getopt? – Anton Barkovsky Dec 21 '11 at 11:29
  • @AntonBarkovsky: I did. See my links. I think answers like Ubermensch's are just serious over-complicated for what the OP is trying to do. – quentin-starin Dec 22 '11 at 00:31
  • 1
    I have also quoted a simple approach using RegExp. Answer is also updated – Ubermensch Dec 22 '11 at 04:19
  • Didn't mention any specific progr. lang. you could add a "c" tag, "ruby" tag, "php" tag, maybe there is a opensource lib., standard lib., or "commonly used, not yet a standard lib." for your progr. lang. – umlcat Dec 22 '11 at 23:37
  • sorry, it must got lost when it got moved from stackoverflow – alfa64 Dec 23 '11 at 18:43
  • If you haven't looked at the php lib I linked to in my answer - GetOptionKit - you're missing an easy, simple, already done way to parse a fair range including required or optional or multiple arguments with long & short forms and string or integer constraints. – quentin-starin Dec 24 '11 at 01:22

12 Answers12

14

Let me admit frankly, building parser is a tedious job and comes close to compiler technology but building one would turn out to be a good adventure. And a parser comes with interpreter. So you got to build both.

A quick introduction to parser and interpreters

This is not too technical. So experts don't fret at me.

When you feed some input into a terminal, the terminal splits the input into multiple units. The input is called expression and the multiple units are called tokens. These tokens can be operators or symbols. So if you enter 4+5 in a calculator, this expression gets split into three tokens 4,+,5. The plus is considered an operator while 4 and 5 symbols. This is passed to a program (consider this as an interpreter) which contains the definition for the operators. Based on the definition (in our case, add), it adds the two symbols and returns the result to the terminal. All compilers are based on this technology. The program that splits an expression into multiple tokens is called a lexer and the program that converts these tokens into tags for further processing and execution is called parser.

Lex and Yacc are the canonical forms for building lexers and parsers based on BNF grammar under C and it is the recommended option. Most parsers are a clone of Lex and Yacc.

Steps in building a parser/intrepreter

  1. Classify your tokens into symbols, operators and keywords (keywords are operators)
  2. Build your grammar using the BNF form
  3. Write parser functions for your operations
  4. Compile it an run as a program

So in the above case your of addition tokens would be any digits and a plus sign with definition of what to do with the plus sign in the lexer

Notes and Tips

  • Choose a parser technique that evaluates from left to right LALR
  • Read this dragon book on Compilers to get a feel of it. I personally haven't finished the book
  • This link would give a super-fast insight into Lex and Yacc under Python

A simple approach

If you just need a simple parsing mechanism with limited functions, turn your requirement into a Regular Expression and just create a whole bunch of functions. To illustrate, assume a simple parser for the four arithmetic functions. So you would be the calling the operator first and then the list of functions (similar to lisp) in the style (+ 4 5) or (add [4,5]) then you could use a simple RegExp to get the list of operators and the symbols to be operated upon.

Most common cases could be easily solved by this approach. The downside is you can't have a lot of nested expressions with a clear syntax and you can't have easy higher order functions.

yannis
  • 39,547
  • 40
  • 183
  • 216
Ubermensch
  • 1,349
  • 9
  • 15
  • 2
    This is one of the hardest possible ways. Separating lexing and parsing passes, etc. - it is probably useful for implementing a high performance parser for a very complex but archaic language. In the modern world lexerless parsing is a simplest default option. Parsing combinators or eDSLs are easier to use than dedicated preprocessors like Yacc. – SK-logic Dec 21 '11 at 14:43
  • Agreed with SK-logic but since a general detailed answer is required, I suggested Lex and Yacc and some parser basics. getopts suggested by Anton is also a simpler option. – Ubermensch Dec 21 '11 at 15:44
  • that's what I've said - lex and yacc is among the hardest ways of parsing, and not even generic enough. Lexerless parsing (e.g., packrat, or simple Parsec-like) is much simpler for a general case. And the Dragon book is not a very useful introduction into parsing any more - it is too out of date. – SK-logic Dec 21 '11 at 15:54
  • @SK-logic Can you recommend a better updated book. It seem to cover all the basics for a person trying to understand parsing (at least in my perception). Regarding lex and yacc, though hard, it is widely used and a lot of programming languages provide its implementation. – Ubermensch Dec 21 '11 at 16:03
  • it was never intended to "cover all the basics". It is a way too advanced. There's no need to read thick boring books, tutorials coming with the parsing libraries are more than enough. And despite being in mainstream for ages, nowadays `yacc` is redundant and it is counterproductive to show it to the beginners. – SK-logic Dec 21 '11 at 16:16
  • @SK-logic, yes, reading the dragon book in order to get a grip on how to parse command line args is an excellent example of what is known as "Yak Shaving". One could literally fly to Afghanistan, find a yak, shave it, and fly back all in less time than it would take to figure out cmd-line parsing "the right way". – Angelo Dec 21 '11 at 18:39
  • @SK-logic have you read the [wikipedia article on PEGs](http://en.wikipedia.org/wiki/Parsing_expression_grammar)? There are times when it's inappropriate; I note the entry on subtle parsing errors in the disadvantages section. – Spencer Rathbun Dec 21 '11 at 21:49
  • The suggestion for Lex and Yacc was due to this last request "The last one might be trivial, but well, what I want to see is the solid fundations of this kind of program" and a simpler approach using RegExp is also quoted in notes. – Ubermensch Dec 22 '11 at 04:23
  • @Spencer Rathbun, I've implemented PEG grammars for dozens of languages - and I've never bumped into any of those problems. It is so much better than trying to resolve all those shift-reduce conflicts. It would be inappropriate to parse, say, XML with a PEG (due to performance reasons). Anything else is ok. – SK-logic Dec 22 '11 at 07:59
  • I thought lexers tokenized and parsers constructed ASTs? – DeadMG Dec 24 '11 at 00:26
  • @DeadMG You are right. Answer updated – Ubermensch Dec 24 '11 at 04:14
  • @Ubermensch Where's the regexp quote from? – yannis Dec 24 '11 at 04:18
  • @YannisRizos. The quote is not from a book. Its just a suggestion to parse simple expressions – Ubermensch Dec 24 '11 at 08:40
  • @Ubermensch Then perhaps you shouldn't format is as a quote? – yannis Dec 24 '11 at 08:43
  • 1
    @alfa64: be sure to let us know then when you actually code a solution based on this answer – quentin-starin Dec 29 '11 at 20:09
6

First, when it comes to grammar, or how to specify arguments, don't invent your own. The GNU-style standard is already very popular and well known.

Second, since you're using an accepted standard, don't reinvent the wheel. Use an existing library to do it for you. If you use GNU style arguments, there is almost certainly a mature library in your language of choice already. For example: c#, php, c.

A good option parsing library will even print formatted help on available options for you.

EDIT 12/27

It seems like you are making this out to be more complicated than it is.

When you look at a command line, it's really quite simple. It's just options and arguments to those options. There are very few complicating issues. Option can have aliases. Arguments can be lists of arguments.

One problem with your question is that you haven't really specified any rules for what type of command line you'd like to deal with. I've suggested GNU standard, and your examples come close to that (though I don't really understand your first example with the path as the first item?).

If we're talking GNU, any single option can have only a long form and short form (single character) as aliases. Any arguments containing a space have to be surrounded in quotes. Multiple short form options can be chained. Short form option(s) must be proceeded by a single dash, long form by two dashes. Only the last of chained short form options can have an argument.

All very straightforward. All very common. Also been implemented in every language you can find, probably five times over.

Don't write it. Use what's already written.

Unless you have something in mind other than standard command line arguments, just use one of the MANY already existing, tested libraries that do this.

What's the complication?

quentin-starin
  • 5,800
  • 27
  • 26
4

Have you already tried something like http://qntm.org/loco? This approach is much cleaner than any handwritten ad hoc, but won't require a standalone code generation tool like Lemon.

EDIT: And a general trick for handling command lines with complex syntax is to combine the arguments back into a single whitespace-separated string and then parse it properly as if it is an expression of some domain-specific language.

SK-logic
  • 8,497
  • 4
  • 25
  • 37
  • +1 nice link, I wonder if it's available on github or something else. And what about the usage terms? – hakre Dec 19 '11 at 10:18
2

If your needs are simple, and you both have the time and are interested in it, I'll go against the grain here and say dont shy away from writing your own parser. Its a good learning experience, if nothing else. If you have more complex requirements - nested function calls, arrays, etc - just be aware that doing so could take a good chunk of time. One of the big positives of rolling your own is that there wont be an issue of integrating with your system. The downside is, of course, all the screw ups are your fault.

Work against tokens, though, dont use hard coded commands. Then that problem with similar sounding commands goes away.

Everyone always recommends the dragon book, but I've always found "Writing Compilers and Interpreters" by Ronald Mak to be a better intro.

GrandmasterB
  • 37,990
  • 7
  • 78
  • 131
1

You have not given many specifics about your grammar, just some examples. What I can see is that there are some strings, whitespace and a (probably, your example is indifferent in your question) double quoted string and then one ";" at the end.

It looks like that this could be similar to PHP syntax. If so, PHP comes with a parser, you can re-use and then validate more concretely. Finally you need to deal with the tokens, but it looks like that this is simply from left to right so actually just an iteration over all tokens.

Some examples to re-use the PHP token parser (token_get_all) are given in the answers to the following questions:

Both examples contain a simple parser as well, probably something like those is fitting for your scenario.

hakre
  • 1,165
  • 12
  • 17
0

I have written programs that work like that. One was an IRC bot which has similar command syntax. There is a huge file that is a big switch statement. It works -- it works fast -- but it's somewhat difficult to maintain.

Another option, which has a more OOP spin, is to use event handlers. You create a key-value-array with commands and their dedicated functions. When a command is given, you check if the array has the given key. If it does, call the function. This would be my recommendation for new code.

Brigand
  • 236
  • 2
  • 4
  • i've read your code and it's exactly the same scheme as my code, but as i stated, if you want other people to use, you need to add error checking and stuff – alfa64 Dec 17 '11 at 01:52
  • 1
    @alfa64 Please add any clarifications to the question, instead of comments. It's not very clear what exactly you are asking for, although it's somewhat clear that you are looking for something really specific. If so, tell us _exactly_ what that is. I don't think it's very easy to go from `I think my implementation is very crude and faulty` to `but as i stated, if you want other people to use, you need to add error checking and stuff`... Tell us exactly what's crude about it and what's faulty, it would help you get better answers. – yannis Dec 21 '11 at 04:27
  • sure, i'll rework the question – alfa64 Dec 21 '11 at 04:55
0

I suggest using a tool, instead of implementing a compiler or interpreter yourself. Irony uses C# to express the target language grammar (the grammar of your command line). The description on CodePlex says: "Irony is a development kit for implementing languages on .NET platform.“

See the official homepage of Irony on CodePlex: Irony - .NET Language Implementation Kit.

0

My advice would be google for a library that solves your problem.

I've been using NodeJS a lot lately, and Optimist is what I use for command-line processing. I encourage you to search for one you can use for you own language of choice. If not..write one and open source it :D You may even read through Optimist's source code and port it to your language of choice.

ming_codes
  • 379
  • 3
  • 7
0

Check out Apache CLI, it's entire purpose seems to be doing exactly what you want to do, so even if you can't use it, you can check out its architecture and copy that.

Pierre.Vriens
  • 233
  • 1
  • 2
  • 11
0

Why dont you simplify a little bit, your requirements ?

Don't use a full parser, its too much complex, and even unnecesary for your case.

Make a loop, write a message that represents yout "prompt", can be the current path you are.

Wait for a string, "parse" the string, and do something depending on the contents of the string.

The string could "parse" like expecting a line, in which spaces are the separators ("tokenizer"), and the rest of characters are grouped.

Example.

The program outputs (and stays in the same line): /user/files/ The user writes (in the same line) list all;

Your program will generate a list, collection or array like

list

all;

or, if ";" is considered a separator like spaces

/user/files/

list

all

Your program could start by expecting one single instruction, without unix-style "pipes", neither windowze-style redirection.

Your program could make a dictionary of instructions, each instruction, may have a list of parameters.

The command design pattern applies to your case:

http://en.wikipedia.org/wiki/Command_pattern

This a "plain c" pseudocode, is not tested or finished, just an idea of how could be done.

You could also make it more object oriented, and in the programming language, you like.

Example:


// "global function" pointer type declaration
typedef
  void (*ActionProc) ();

struct Command
{
  char[512] Identifier;
  ActionProc Action; 
};

// global var declarations

list<char*> CommandList = new list<char*>();
list<char*> Tokens = new list<char*>();

void Action_ListDirectory()
{
  // code to list directory
} // Action_ListDirectory()

void Action_ChangeDirectory()
{
  // code to change directory
} // Action_ChangeDirectory()

void Action_CreateDirectory()
{
  // code to create new directory
} // Action_CreateDirectory()

void PrepareCommandList()
{
  CommandList->Add("ls", &Action_ListDirectory);
  CommandList->Add("cd", &Action_ChangeDirectory);
  CommandList->Add("mkdir", &Action_CreateDirectory);

  // register more commands
} // void PrepareCommandList()

void interpret(char* args, int *ArgIndex)
{
  char* Separator = " ";
  Tokens = YourSeparateInTokensFunction(args, Separator);

  // "LocateCommand" may be case sensitive
  int AIndex = LocateCommand(CommandList, args[ArgIndex]);
  if (AIndex >= 0)
  {
    // the command

    move to the next parameter
    *ArgIndex = (*ArgIndex + 1);

    // obtain already registered command
    Command = CommandList[AIndex];

    // execute action
    Command.Action();
  }
  else
  {
    puts("some kind of command not found error, or, error syntax");
  }  
} // void interpret()

void main(...)
{
  bool CanContinue = false;
  char* Prompt = "c\:>";

  char Buffer[512];

  // which command line parameter string is been processed
  int ArgsIndex = 0;

  PrepareCommandList();

  do
  {
    // display "prompt"
    puts(Prompt);
    // wait for user input
      fgets(Buffer, sizeof(Buffer), stdin);

    interpret(buffer, &ArgsIndex);

  } while (CanContinue);

} // void main()

You didn't mention your programming language. You can also mention any programming language, but preferably "XYZ".

umlcat
  • 2,146
  • 11
  • 16
0

you have several tasks ahead of you.

looking at your requirements...

  • You need to parse the command. That's a fairly easy task
  • You need to have an extensible command language.
  • You need to have error checking and suggestions.

The extensible command language indicates that a DSL is required. I would suggest not rolling your own but using JSON if your extensions are simple. If they are complex, an s-expression syntax is nice.

Error checking implies that your system also knows about the possible commands. That would be part of the post-command system.

If I was implementing such a system from scratch, I would use Common Lisp with a stripped-down reader. Each command token would map to a symbol, which would be specified in a s-expression RC file. After tokenization, it would be evaluated/expanded in a limited context, trapping the errors, and any recognizable error patterns would return suggestions. After that, the actual command would be dispatched to the OS.

Paul Nathan
  • 8,560
  • 1
  • 33
  • 41
0

There is a nice feature in functional programming that you might be interested to look into.

It's called pattern matching.

Here are two links for some example of pattern matching in Scala and in F#.

I agree with you that using switch structures is a bit tedious, and I particularly enjoyed using patern matching durint the implementation of a compiler in Scala.

In particular, I would recommend you to look into the lambda calculus example of the Scala website.

That's, in my opinion, the smartest way to proceed, but if you have to stick strictly with PHP, then you're stuck with the "old-school" switch.

SRKX
  • 1,939
  • 1
  • 15
  • 24