Questions tagged [lexer]

A lexer is a program performing lexical analysis: it converts a sequence of characters into a sequence of tokens.
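
As a rough illustration of that definition, here is a minimal sketch of a regex-driven lexer in Python. The token set (`NUMBER`, `IDENT`, `OP`, `SKIP`) and the names `Token`, `TOKEN_SPEC`, and `tokenize` are assumptions made up for this example, not taken from any of the questions listed below.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str  # token category, e.g. "NUMBER"
    text: str  # the exact characters that were matched


# Hypothetical token set for a tiny arithmetic language (illustration only).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
]

# One alternation of named groups; the name of the group that matched
# tells us which kind of token was recognised.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str) -> Iterator[Token]:
    """Convert a sequence of characters into a sequence of tokens."""
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        pos = match.end()
        if match.lastgroup != "SKIP":  # whitespace is consumed but not emitted
            yield Token(match.lastgroup, match.group())
```

Calling `list(tokenize("x = 42 + y"))` yields five tokens: an identifier, an operator, a number, an operator, and an identifier. A hand-written or table-driven lexer would produce the same stream; the regex alternation is just one convenient way to get there.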

47 questions
119
votes
4 answers

When to use a Parser Combinator? When to use a Parser Generator?

I've taken a deep dive into the world of parsers recently, wanting to create my own programming language. However, I found out that there exist two somewhat different approaches to writing parsers: Parser Generators and Parser…
Qqwy
  • 4,709
  • 4
  • 31
  • 45
26
votes
6 answers

Why implement a lexer as a 2d array and a giant switch?

I'm slowly working to finish my degree, and this semester is Compilers 101. We're using the Dragon Book. Shortly into the course, we're talking about lexical analysis and how it can be implemented via deterministic finite automata (hereafter,…
Telastyn
  • 108,850
  • 29
  • 239
  • 365
23
votes
3 answers

What should be the datatype of the tokens a lexer returns to its parser?

As said in the title, which data type should a lexer return/give the parser? When reading the lexical analysis article that Wikipedia has, it stated that: In computer science, lexical analysis is the process of converting a sequence of characters…
Christian Dean
  • 2,790
  • 1
  • 22
  • 38
23
votes
5 answers

Are separate parsing and lexing passes good practice with parser combinators?

When I began to use parser combinators my first reaction was a sense of liberation from what felt like an artificial distinction between parsing and lexing. All of a sudden everything was just parsing! However, I recently came across this posting…
Eli Frey
  • 331
  • 2
  • 8
20
votes
4 answers

Writing a lexer in C++

What are good resources on how to write a lexer in C++ (books, tutorials, documents), what are some good techniques and practices? I have looked on the internet and everyone says to use a lexer generator like lex. I don't want to do that, I want to…
user4595
16
votes
5 answers

Coming up with tokens for a lexer

I'm writing a parser for a markup language that I have created (writing in Python, but that's not really relevant to this question -- in fact if this seems like a bad idea, I'd love a suggestion for a better path). I'm reading about parsers here:…
Explosion Pills
  • 1,929
  • 16
  • 22
15
votes
3 answers

How would you test a lexer?

I'm wondering how to effectively test a lexer (tokenizer). The number of combinations of tokens in a source file can be huge, and the only way I've found is to make a batch of representative source files and expect a specific sequence of tokens for…
SuperJMN
  • 413
  • 3
  • 9
15
votes
1 answer

What is the procedure that is followed when writing a lexer based upon a grammar?

While reading through an answer to the question Clarification about Grammars, Lexers and Parsers, the answer stated that: [...] a BNF grammar contains all the rules you need for lexical analysis and parsing. This came across as somewhat odd to me…
Christian Dean
  • 2,790
  • 1
  • 22
  • 38
9
votes
5 answers

Lexical Analysis without regular expressions

I've been looking at a few lexers in various higher-level languages (Python, PHP, JavaScript among others) and they all seem to use regular expressions in one form or another. While I'm sure regexes are probably the best way to do this, I was…
Blank
  • 253
  • 3
  • 7
8
votes
4 answers

When to use ANTLR and when to use a parsing library

I've always wanted to learn how to write a compiler. I've decided to use ANTLR, and am currently reading through the book (it's very good, by the way). I'm pretty new to this, so go easy, but the gist seems to be that you write your grammar, transform…
phatmanace
  • 2,445
  • 3
  • 14
  • 11
8
votes
3 answers

Clarification about Grammars, Lexers and Parsers

Background info (May Skip): I am working on a task we have been set at uni in which we have to design a grammar for a DSL we have been provided with. The grammar must be in BNF or EBNF. As well as other things we are being evaluated on the Lexical…
The_Neo
  • 191
  • 7
7
votes
2 answers

Can you apply the same lexer rules to all programming languages?

I'm trying to understand the theory behind a lexer with the purpose of building one (just for my own fun and experience and to compensate for not taking proper CS courses :)). What I have yet to understand is if lexer theory is the same no matter…
JohnDoDo
  • 2,309
  • 2
  • 18
  • 32
6
votes
1 answer

Should a lexer un-escape strings?

Is it a lexer's job to undo any escaping done to a string literal? For example: "Me: \"Hello World!\"" Becomes: Me: "Hello World!" Should this conversion be done inside the lexer? I am guessing it should, because it'd allow for a more abstract and…
Jeroen
  • 613
  • 1
  • 7
  • 13
6
votes
3 answers

What is the proper way to distinguish between keywords and identifiers?

I'm aware that most modern languages use reserved words to prevent things like keywords from being used as identifiers. Reserved words aside, let's assume a language that allows keywords to be used as identifiers. (For example, in Ruby a keyword can…
jhewlett
  • 2,224
  • 1
  • 17
  • 15
6
votes
1 answer

Choosing a parser for a code beautifier

I'm in the planning stage of making a code beautifier (similar to AStyle or Uncrustify). Originally I was going to just contribute to one of those projects, but reviewing their source led me to the conclusion that I have different design goals and…
Matt Kline
  • 210
  • 1
  • 3
  • 10