I am pretty sure (though I don't have a proof) that you can turn any static error into a syntax error by making the grammar arbitrarily more complex.
Historically, we have distinguished between lexical analysis (lexing/scanning), syntactic analysis (parsing), and semantic analysis (type checking, type inference, name resolution, etc.) for performance reasons: it was not possible to fit an entire compiler into the memory of a computer, so we broke it down into several components:
- The reader turns a stream of bits into a stream of characters (this is usually a simple mapping table, but it might be more complex in the case of variable-length encodings such as UTF-8)
- The lexer turns a stream of characters into a stream of tokens (typically, a lexer recognizes a regular language, i.e. it is a finite automaton; see the sketch after this list)
- The parser turns a stream of tokens into a parse tree, also known as a concrete syntax tree (typically, a parser recognizes a context-free language)
- The semantic analyzer first turns a parse tree into an abstract syntax tree (using arbitrary Turing-capable computation)
- The various further components of the compiler annotate the AST with additional information, transform it into a different representation, etc.
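To make the lexer stage a bit more concrete, here is a minimal sketch in Java (the names are mine, not taken from any real compiler) of a finite-automaton-style lexer that recognizes just integer literals and identifiers:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal, hypothetical lexer: it classifies character sequences into integer
// literals and identifiers and skips whitespace. It knows nothing about types
// or value ranges; it only recognizes a regular language.
final class ToyLexer {
    enum Kind { INT_LITERAL, IDENTIFIER }
    record Token(Kind kind, String text) {}

    static List<Token> lex(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                tokens.add(new Token(Kind.INT_LITERAL, input.substring(start, i)));
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) i++;
                tokens.add(new Token(Kind.IDENTIFIER, input.substring(start, i)));
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return tokens;
    }
}
```

Note that this lexer happily produces an INT_LITERAL token for 300 or 99999; whether that value is acceptable anywhere is simply not its concern.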
Nowadays, this is no longer strictly necessary. The reader is typically supplied by the operating system or the language's standard library. Scannerless parsers are a thing, i.e. parsers that integrate scanning into the parsing process.
And, if we want to, we can build arbitrarily complex parsers that recognize much more than just context-free languages. Modern lexer and parser frameworks both allow you to build lexers and parsers that are more powerful than the traditional regular / context-free ones, and they allow you to attach arbitrary code in a Turing-complete language. E.g. when you generate C code with GNU Bison, you can embed arbitrary C code of your own into the generated parser, which can interoperate with and influence the generated part of the code.
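To illustrate what attaching arbitrary code can buy you, here is a hand-written sketch in Java (not a generated Bison/C parser; the tiny grammar and all names are made up): the parser maintains a symbol table while parsing and immediately rejects uses of undeclared names, a declared-before-use property that a purely context-free grammar cannot express:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a parser with embedded actions. Informal grammar:
//   program   := statement*
//   statement := "decl" NAME | "use" NAME
// The embedded code updates and consults a symbol table during parsing, so the
// parser only accepts programs whose names are declared before use, which is
// not a context-free property.
final class DeclUseParser {
    private final Set<String> declared = new HashSet<>();

    void parse(String[] tokens) {
        for (int i = 0; i + 1 < tokens.length; i += 2) {
            String keyword = tokens[i];
            String name = tokens[i + 1];
            switch (keyword) {
                case "decl" -> declared.add(name);
                case "use" -> {
                    if (!declared.contains(name)) {
                        throw new RuntimeException("parse error: '" + name + "' used before declaration");
                    }
                }
                default -> throw new RuntimeException("parse error: unexpected token '" + keyword + "'");
            }
        }
    }
}
```

Calling `new DeclUseParser().parse(new String[] {"decl", "x", "use", "x"})` succeeds, while `parse(new String[] {"use", "y"})` fails during parsing.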
In your example, it would be easy to modify the grammar to only allow integers from 0–255. In a language like Java, where you can use `byte`s in some places and `long`s in others, expressing this as a grammar would be much more complex. So what the designers of Java did was to define just one integer literal, and then define semantic rules for how this literal is to be interpreted in different contexts.
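For example (the class name is arbitrary), the Java compiler treats the very same decimal literal differently depending on the context it appears in:

```java
class LiteralContexts {
    byte a = 100;       // OK: the int constant 100 fits into byte, so the
                        //     implicit narrowing conversion is allowed
    // byte b = 200;    // compile-time error: 200 does not fit into byte
                        //     ("possible lossy conversion from int to byte")
    long c = 200;       // OK: the same literal, implicitly widened to long
    long d = 200L;      // OK: an explicitly long-typed literal
}
```

The token 200 is the same in all of these places; what differs is the semantic rule (assignment conversion) that is applied to it.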
However, I do believe it would be possible to check this in the parsing stage and thus make it a syntax error … it would just make the parser more complex; in fact, it would make the parser as complex as the entire compiler (or at least as the analysis part of the compiler, minus the optimizer and code generator).
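For the simple 0–255 case, here is a sketch of what folding the check into the syntax could look like (hypothetical names, with the "grammar" reduced to a single token rule written as a regular expression):

```java
// Hypothetical sketch: the range restriction 0..255 is part of the token
// grammar itself, so an out-of-range literal is rejected as a syntax error
// rather than being flagged later by semantic analysis.
final class RangeCheckedParser {
    // Matches exactly the decimal spellings of 0..255 (no leading zeros):
    // already noticeably more complicated than "one or more digits".
    private static final String BYTE_LITERAL =
            "25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9]";

    static int parseByteLiteral(String token) {
        if (!token.matches(BYTE_LITERAL)) {
            throw new RuntimeException("syntax error: '" + token + "' is not a byte literal (0..255)");
        }
        return Integer.parseInt(token);
    }
}
```

`parseByteLiteral("200")` returns 200, while `parseByteLiteral("300")` fails with a syntax error. Scaling this idea up to every context-dependent rule of a language like Java is exactly what would make the parser as complex as the rest of the analysis.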