14

To my understanding, a parser creates a parse tree, and then discards it thereafter. However, it can also pop out an abstract syntax tree, which the compiler supposedly makes use of.

I'm under the impression that both the parse tree and the abstract syntax tree are created under the parsing stage. Then could someone explain why these are different?

  • 3
    Why do they have to be different? Can't they be the same thing from two slightly different points of view? – S.Lott Feb 06 '12 at 03:20
  • 1
    Not sure I understand these two "slightly different points of view" :-( – Combinator Logic Feb 06 '12 at 03:21
  • That's the point. They're the same thing, hence the confusion. You just have different words depending on your viewpoint: Parsing vs. Code Generation (or Execution, in the case of an interpreter). – S.Lott Feb 06 '12 at 03:24
  • There is no difference. With a bit of imagination one can invent a syntax representation for any possible abstract syntax tree. With a lack of imagination, Lisp's S-expressions would be a default syntax suitable for everything. – SK-logic Feb 06 '12 at 07:50
  • 1
    You 'all should have read the answer before commenting. There is a difference, but having them separate or combined is an implementation issue. – Paul Feb 06 '12 at 13:49

2 Answers2

21

A parse tree is also known as a concrete syntax tree.

Basically, the abstract tree has less information than the concrete tree. The concrete tree contains each element in the language, whereas the abstract tree has thrown away the uninteresting pieces.

For example the expression: (2 + 5) * 8

The concrete looks like this

  ( 2  + 5 )  * 8
  |  \ | / |  | |
  |   \|/  |  | |
   \___|__/   | |
       \______|/

Whereas the abstract tree has:

2  5 
 \/   
  +  8
   \/
   *

In the concrete cases the parentheses and all pieces of the language have been incorporated into the tree. In the abstract case the parentheses are gone, because their information has been incorporated into the tree structure.

sakisk
  • 3,377
  • 2
  • 24
  • 24
Winston Ewert
  • 24,732
  • 12
  • 72
  • 103
  • You forgot to mention in which compiler for what language this is implemented this way. Because, you know, you don't have to ... the parser could as well throw away the parenthesis right away. – Ingo Feb 20 '14 at 13:41
  • 1
    @Ingo, nothing in my answer has anything to say about when a compiler throws away parentheses. It asking what the difference is between a concrete parse tree and an abstract parse tree. – Winston Ewert Feb 20 '14 at 15:09
  • So, surely, you can name a source for your claim? Or is this just your private definition? – Ingo Feb 20 '14 at 16:18
  • 1
    @Ingo, http://en.wikipedia.org/wiki/Abstract_syntax_tree and http://en.wikipedia.org/wiki/Concrete_syntax_tree. – Winston Ewert Feb 21 '14 at 05:45
  • But there they say exactly what I am saying, in short that "AST" is a word for useful data structures that appear in compilers. For example, there is a section about "Designing an AST". Therefore, it is nonsense to claim, that an AST contains this or that data (even if the Wiki article itself does this). It is simply a category error, like saying: "A carnivore has 32 teeths." - No! Carnivore is a family of species, which have sub-species, which have individuals, which have teeths (or not). A set of sets of Xs is not itself a set of X, this is or should be basic knowledge. – Ingo Feb 21 '14 at 10:37
  • 1
    @ingo, https://docs.google.com/document/d/1WP2AQoLQ2qlBSgK7x6LoS9EKaqhrBanGelZTmnSGo3k/pub – Winston Ewert Feb 22 '14 at 02:30
  • No @Winston, not so "... and that every useful data structure that appears in a compiler is an AST." Never said that. As for the orthography and biology, you're just nitpicking. Sorry for not being a native speaker. BTW, despite us having different opinions, I for my part did not downvote you. – Ingo Feb 22 '14 at 10:00
  • @Ingo, I was being a bit silly with that, as I said I was in an odd mood. – Winston Ewert Feb 22 '14 at 15:51
0

The first thing you need to understand is that nobody forces you to write a parser or compiler in a certain way. Specifically, it is not necessarily the case that the result of a parser must be a tree. It can be any data structure that is suitable to represent the input.

For example, the following language:

prog:
      definition 
    | definition ';' prog
    ;

definition: .....

can be represented as a list of definitions. (Nitpickers will point out that a list is a degenerate tree, but anyway.)

Second, there is no need to hold onto the parse tree (or whatever data structure the parser returned). To the contrary, compilers are usually constructed as a sequence of passes, that transform the results of the previous pass. Hence the overall layout of a compiler could be thus:

parser :: String             -> Maybe [Definitions]      -- parser
pass1  :: [Definitions]      -> Maybe DesugaredProg      -- desugarer
pass2  :: DesugaredProg      -> Maybe TypedProg          -- type checker
pass3  :: TypedProg          -> Maybe AbstractTargetLang -- code generation
pass4  :: AbstractTargetLang -> Maybe String             -- pretty printer

compiler :: String           -> Maybe String    -- transform source code to target code
compiler source = do
   defs  <- parser source
   desug <- pass1 defs
   typed <- pass2 desug
   targt <- pass3 typed
   pass4 targt

Bottom Line: If you hear people talk about parse trees, abstract syntac trees, concrete syntax trees etc., always substitute by data structure suitable for the given purpose, and you're fine.

Ingo
  • 3,903
  • 18
  • 23
  • 3
    How is this an answer to the question? – Winston Ewert Feb 20 '14 at 15:10
  • @WinstonEwert I claim that "parse tree" and "abstract syntax tree" are abstractions and mean nothing more than "suitable data structures". So, the answer is that the question in its generality does not make sense, **unless** you ask for the difference between the return type of the parser and the return type of some other pass in compiler X.Y of language L. – Ingo Feb 20 '14 at 16:12