I know that programming languages can be defined in EBNF, which can be converted into regular expressions. Right now I am working on a very simple BASIC interpreter for a project. The code is entered in a GUI, which should validate the syntax before the code is transferred to an embedded system, where it is executed.

I searched for an article or tutorial on writing a validator for this job but could not really find one. Is it just a matter of defining the regular expressions and trying to match them against the input?

Note: the GUI part is written in Java while the embedded code is written in C++.

clambake
  • I wouldn't try to do it with regular expressions. The classic approach would be to define the lexical rules with lex (or flex) and the grammar with yacc (or GNU bison) – Lord_Gestalter Jul 23 '14 at 06:44
  • I added a note. – clambake Jul 23 '14 at 06:50
  • Even then it should be possible to glue in C code. At least this approach uses an existing solution that has been proven in use for more than 20 years instead of ... well, let me quote: "I had a problem, and used regular expressions to solve it. Now I have two problems" ;-) – Lord_Gestalter Jul 23 '14 at 06:56
  • You are right, I'm having a look at JFlex and ANTLR. – clambake Jul 23 '14 at 07:04
  • I believe that coding a simple BASIC interpreter is much more difficult than you think. And it is less fun than e.g. coding a simple Lisp or Scheme interpreter. – Basile Starynkevitch Jul 23 '14 at 08:52

2 Answers

Your initial premise that an EBNF language description can be converted to regular expressions is incorrect. The languages that regular expressions can recognize are a proper subset of the languages that can be described in EBNF.
For example, it is impossible to write a regular expression (in the formal sense) that checks whether arbitrarily deeply nested parentheses are balanced.
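
To illustrate the parentheses example: checking balance needs something that can count without an upper bound, which a formal regular expression cannot do. A minimal sketch in Java follows; the class and method names are hypothetical and chosen only for illustration, and a single counter is enough only because there is one kind of bracket (several kinds would need a stack).

```java
// Hypothetical helper for illustration; not part of any library.
final class ParenCheck {

    /** Returns true if every '(' has a matching ')' in the right order. */
    static boolean isBalanced(String input) {
        int depth = 0; // number of currently open parentheses
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth < 0) {
                    return false; // a ')' appeared before its '('
                }
            }
        }
        return depth == 0; // anything still open means unbalanced
    }

    public static void main(String[] args) {
        System.out.println(isBalanced("(1 + (2 * 3))")); // true
        System.out.println(isBalanced("(1 + (2 * 3)"));  // false
        System.out.println(isBalanced(")("));            // false
    }
}
```

The `depth` counter is exactly the piece of state that a finite automaton, and therefore a formal regular expression, lacks.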

The best way to validate your language input is to write a parser for it. There are also parser generators (à la yacc/bison) for Java.
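
As a rough idea of what a hand-written parser used as a validator could look like, here is a small recursive-descent sketch in Java. The grammar is an assumption made up for this example (only `LET` and `PRINT` with arithmetic expressions), not the asker's actual BASIC dialect, and all names are hypothetical. Regular expressions are still used, but only to split the line into tokens; the nesting is handled by the recursive methods.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Assumed toy grammar (not the asker's real language):
//   statement := "LET" ident "=" expr | "PRINT" expr
//   expr      := term (("+" | "-") term)*
//   term      := factor (("*" | "/") factor)*
//   factor    := number | ident | "(" expr ")"
final class TinyBasicValidator {
    private static final Pattern TOKEN =
            Pattern.compile("\\s*(\\d+|[A-Za-z][A-Za-z0-9]*|[-+*/=()])");

    private final List<String> tokens = new ArrayList<>();
    private int pos = 0; // index of the next unread token

    private TinyBasicValidator(String line) {
        Matcher m = TOKEN.matcher(line);
        int end = 0;
        while (m.lookingAt()) {          // match one token at the region start
            tokens.add(m.group(1));
            end = m.end();
            m.region(end, line.length()); // continue after the token
        }
        if (!line.substring(end).trim().isEmpty()) {
            throw new IllegalArgumentException("unexpected input near column " + end);
        }
    }

    /** Returns true if the line is a syntactically valid statement. */
    static boolean isValid(String line) {
        try {
            TinyBasicValidator p = new TinyBasicValidator(line);
            p.statement();
            return p.pos == p.tokens.size(); // reject trailing tokens
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "";
    }

    // Consumes the current token if the check passed, otherwise fails.
    private void expect(boolean ok) {
        if (!ok) throw new IllegalArgumentException("syntax error at token " + pos);
        pos++;
    }

    private void statement() {
        if (peek().equalsIgnoreCase("LET")) {
            pos++;
            expect(isIdentifier(peek()));
            expect(peek().equals("="));
            expr();
        } else {
            expect(peek().equalsIgnoreCase("PRINT"));
            expr();
        }
    }

    private void expr() {
        term();
        while (peek().equals("+") || peek().equals("-")) { pos++; term(); }
    }

    private void term() {
        factor();
        while (peek().equals("*") || peek().equals("/")) { pos++; factor(); }
    }

    private void factor() {
        if (peek().equals("(")) {
            pos++;
            expr();                      // recursion handles arbitrary nesting
            expect(peek().equals(")"));
        } else {
            expect(isNumber(peek()) || isIdentifier(peek()));
        }
    }

    private static boolean isNumber(String t) {
        return !t.isEmpty() && Character.isDigit(t.charAt(0));
    }

    private static boolean isIdentifier(String t) {
        return !t.isEmpty() && Character.isLetter(t.charAt(0));
    }

    public static void main(String[] args) {
        System.out.println(isValid("LET X = (1 + 2) * 3")); // true
        System.out.println(isValid("PRINT (A + 1"));        // false
        System.out.println(isValid("LET X = 1 +"));         // false
    }
}
```

This sketch does not reserve keywords (so `LET LET = 1` passes) and reports no useful error position; a parser generator would typically take care of such details for a real grammar.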

Bart van Ingen Schenau
  • And of course, the initial premise of the initial premise is also incorrect: not all programming languages can be described by EBNF, only those whose syntax is context-free can. – Jörg W Mittag Jul 23 '14 at 14:03

Code validation should not only be syntactic; more importantly, it should also take some of the semantics into account, which is much harder. Read about static program analysis, type inference, etc.
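
One very small example of a semantic check, sketched in Java: a program in which every line is syntactically fine can still jump to a line number that does not exist. The sketch assumes a classic line-numbered BASIC with a `GOTO <number>` statement, which may not match the asker's dialect, and all names are hypothetical. It scans the raw text, so a `GOTO` inside a string literal would be misreported; a real check would work on the parser's output instead.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical semantic check for an assumed line-numbered BASIC dialect.
final class GotoTargetCheck {
    private static final Pattern LINE_NUMBER = Pattern.compile("^\\s*(\\d+)\\b");
    private static final Pattern GOTO =
            Pattern.compile("\\bGOTO\\s+(\\d+)\\b", Pattern.CASE_INSENSITIVE);

    /** Returns every GOTO target that is not defined as a line number. */
    static List<String> undefinedTargets(List<String> program) {
        // First pass: collect the line numbers that exist.
        Set<String> defined = new HashSet<>();
        for (String line : program) {
            Matcher m = LINE_NUMBER.matcher(line);
            if (m.find()) {
                defined.add(m.group(1));
            }
        }
        // Second pass: report jumps to numbers that were never defined.
        List<String> missing = new ArrayList<>();
        for (String line : program) {
            Matcher m = GOTO.matcher(line);
            while (m.find()) {
                if (!defined.contains(m.group(1))) {
                    missing.add(m.group(1));
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> program = List.of(
                "10 PRINT 1",
                "20 GOTO 10",
                "30 GOTO 40");
        System.out.println(undefinedTargets(program)); // [40]
    }
}
```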

For your project, did you consider embedding an existing interpreter (e.g. Guile or Lua) inside your program?

If you want to write an interpreter, read about domain-specific languages. See also this answer to a related question.

Basile Starynkevitch