Questions tagged [parsing]

Analyzing (un)structured data to convert it into a structured, normalized format.

296 questions
119
votes
4 answers

When to use a Parser Combinator? When to use a Parser Generator?

I've taken a deep dive into the world of parsers recently, wanting to create my own programming language. However, I found out that there are two somewhat different approaches to writing parsers: Parser Generators and Parser…
Qqwy
  • 4,709
  • 4
  • 31
  • 45
102
votes
12 answers

Should I use a parser generator or should I roll my own custom lexer and parser code?

What are the specific advantages and disadvantages of each way of working on a programming language grammar? Why/When should I roll my own? Why/When should I use a generator?
Maniero
  • 10,826
  • 14
  • 80
  • 133
51
votes
4 answers

How exactly is an Abstract Syntax Tree created?

I think I understand the goal of an AST, and I've built a couple of tree structures before, but never an AST. I'm mostly confused because the nodes are text and not numbers, so I can't think of a nice way to input a token/string as I'm parsing some…
Howcan
  • 721
  • 2
  • 7
  • 7
43
votes
1 answer

C++11 includes std::stoi, why not std::itos?

I noticed to my glee that C++11 has a std::sto@ family of functions for easily unpacking ints/floats/longs whatever from strings. I'm surprised however, that the opposite isn't implemented. Why didn't the standards committee include a std::itos…
Doug T.
  • 11,642
  • 5
  • 43
  • 69
41
votes
2 answers

Do modern languages still use parser generators?

I was reading about the GCC compiler suite on Wikipedia here, when this came up: GCC started out using LALR parsers generated with Bison, but gradually switched to hand-written recursive-descent parsers; for C++ in 2004, and for C and…
eatonphil
  • 561
  • 4
  • 8
39
votes
7 answers

Why was strict parsing not chosen for HTML?

I have often wondered why strict parsing was not chosen when creating HTML. For most of the Internet's history, browsers have accepted any kind of markup and tried their best to parse it. The process degrades performance, permits people to write…
Shubham
  • 713
  • 7
  • 17
34
votes
5 answers

How are comments usually parsed?

How are comments generally treated in programming languages and markup? I am writing a parser for some custom markup language and want to follow the principle of least surprise, so I'm trying to determine the general convention. For example, should…
Sled
  • 1,868
  • 2
  • 17
  • 24
30
votes
1 answer

The Inglish parser (for The Hobbit 1982)

Was fascinated to read about the text adventure game The Hobbit which featured an incredibly robust parser called "Inglish": ...Inglish allowed one to type advanced sentences such as "ask Gandalf about the curious map then take sword and kill troll…
Jordan Reiter
  • 623
  • 5
  • 13
27
votes
5 answers

Name for this type of parser, OR why it doesn't exist

Conventional parsers consume their entire input and produce a single parse tree. I'm looking for one that consumes a continuous stream and produces a parse forest [edit: see discussion in comments regarding why this use of that term may be…
Kevin Krumwiede
  • 2,586
  • 1
  • 15
  • 19
27
votes
5 answers

Can the csv format be defined by a regex?

A colleague and I have recently argued over whether a pure regex is capable of fully encapsulating the csv format, such that it is capable of parsing all files with any given escape char, quote char, and separator char. The regex need not be…
Spencer Rathbun
  • 3,576
  • 1
  • 21
  • 28
26
votes
8 answers

Is it possible to statically predict when to deallocate memory---from source code only?

Memory (and resource locks) are returned to the OS at deterministic points during a program's execution. The control flow of a program by itself is enough to know where, for sure, a given resource can be deallocated. Just like how a human programmer…
zelcon
  • 565
  • 5
  • 10
26
votes
3 answers

Implementing the Visitor Pattern for an Abstract Syntax Tree

I'm in the process of creating my own programming language, which I do for learning purposes. I already wrote the lexer and a recursive descent parser for a subset of my language (I currently support mathematical expressions, such as + - * / and…
marco-fiset
  • 8,721
  • 9
  • 35
  • 46
25
votes
3 answers

In which process does a syntax error occur? (tokenizing or parsing)

I'm trying to understand compilation and interpretation, figuring out the whole picture step by step. A question came up while I was reading this article: http://www.cs.man.ac.uk/~pjj/farrell/comp3.html It says: The next stage of the compiler is…
FZE
  • 469
  • 4
  • 12
25
votes
7 answers

What are the arguments against parsing the Cthulhu way?

I have been assigned the task of implementing a Domain Specific Language for a tool that may become quite important for the company. The language is simple but not trivial: it already allows nested loops, string concatenation, etc. and it is…
smarmy53
  • 261
  • 3
  • 6
23
votes
5 answers

Are separate parsing and lexing passes good practice with parser combinators?

When I began to use parser combinators my first reaction was a sense of liberation from what felt like an artificial distinction between parsing and lexing. All of a sudden everything was just parsing! However, I recently came across this posting…
Eli Frey
  • 331
  • 2
  • 8