If one were to write auto-suggest for an IDE/code editor (like IntelliSense), does one generally need to write a parser for every language that should be supported, or do compilers/runtimes provide these things?

Neither the PHP nor the Python binaries seem to support just parsing files and generating some kind of raw output.

Eclipse seems to roll its own parser, but is this the only way to do it? Using the "native" parser would be awesome on so many levels :)

Markus Hedlund
  • Some languages do. Take a look at [GCC-XML](http://www.gccxml.org/HTML/Index.html) for C++ and maybe C. It might be able to do the other GCC languages too (like Fortran). For other languages, you could do some digging with their source code (like taking the source of the Ruby parser and turning it into a library), but that would be a lot more effort than you're looking for. – Linuxios Aug 03 '12 at 23:56
  • Actually Python does, see the `ast` module – Winston Ewert Aug 04 '12 at 15:28
  • JetBrains IntelliJ uses [ANTLR](http://www.antlr.org) to parse all the supported languages for its auto-complete. –  Aug 04 '12 at 18:01
  • just wanted to point out, intellisense is more than just code-completion, it's also composed of code-insight which is the informal side of VS's intellisense, basically, it tells you what's not known by simply looking at the code. (hover your mouse over a variable and see exactly what unknown object(s) the variable's holding at that time) – Tcll Sep 17 '15 at 17:38

1 Answer

PHP has `token_get_all`, which splits a source file into a list of tokens (it tokenizes rather than fully parses). You can work with those tokens for things like static analysis, style checking, or implementing an auto-complete feature.
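To give a feel for what working with a token list looks like, here is a minimal sketch in Python rather than PHP. It uses the standard library's `tokenize` module, which is only a rough analogue of `token_get_all`, not the same API. The helper name `identifiers` and the sample source are made up for illustration.

```python
import io
import keyword
import tokenize


def identifiers(source):
    """Return (name, line, column) for each non-keyword NAME token."""
    readline = io.StringIO(source).readline
    found = []
    for tok in tokenize.generate_tokens(readline):
        # NAME tokens cover both identifiers and keywords, so filter keywords out.
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            found.append((tok.string, tok.start[0], tok.start[1]))
    return found


# Hypothetical sample input.
SAMPLE = "def add(a, b):\n    return a + b\n"

for name, line, col in identifiers(SAMPLE):
    print(f"{name} at line {line}, column {col}")
```

A token stream like this is enough to collect the identifiers that appear in a file, but it carries no structure: it cannot tell you on its own which names are functions, classes, or locals.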

Python has the `ast` module for working with abstract syntax trees (ASTs), which seems even more helpful. It also has some neat features: for example, the built-in `compile` function can compile the tree itself.
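As a minimal sketch of how the `ast` module could feed an auto-complete feature, the snippet below collects the names defined at the top level of a module and filters them by prefix. The function names `completion_candidates` and `suggest` and the sample source are invented for illustration, and a real completion engine would also need scopes, attributes, and type information.

```python
import ast

# Hypothetical source text, as an editor buffer might hold it.
SOURCE = '''
import os

class Greeter:
    def greet(self, name):
        return "Hello, " + name

def main():
    pass

counter = 0
'''


def completion_candidates(source):
    """Return the sorted names defined at module top level."""
    tree = ast.parse(source)
    names = set()
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    names.add(target.id)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # "import a.b" binds "a"; "import a as x" binds "x".
                names.add(alias.asname or alias.name.split(".")[0])
    return sorted(names)


def suggest(source, prefix):
    """Return the top-level names that start with the given prefix."""
    return [n for n in completion_candidates(source) if n.startswith(prefix)]


print(suggest(SOURCE, "ma"))

# The tree returned by ast.parse can also be handed to compile().
namespace = {}
exec(compile(ast.parse("x = 1 + 2"), "<demo>", "exec"), namespace)
print(namespace["x"])
```

One caveat with this approach: `ast.parse` raises `SyntaxError` on invalid input, and code being typed in an editor is often incomplete, so a real tool would need some fallback for that case.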

In general, it's a very bad idea to implement your own parser (aside from learning). It's hugely difficult and error-prone, and it will become invalid as soon as the language specification changes. Such changes are unusual with well-designed languages like C#, but are not so unusual in languages like PHP, which have lots of flaws and missing features (example: the recent implementation of namespaces in PHP). Also, by writing your own parser, you're reinventing the wheel already invented for the compiler: instead of working exclusively on your actual feature (auto-complete), you spend time writing lots of parsing code that you then have to test and maintain.

Some hints

You may be interested in the term "compiler as a service". For example, Microsoft is working on a compiler as a service for C#, which will let you programmatically extract information from the compiler; such a scenario might be helpful for an auto-complete feature.

You may also search for static checkers for the language you're interested in. Many are open source, so looking at how they process source code may give you some ideas about the parsing.

Finally, some compilers are themselves open source. Depending on the license they use and the one you will use for your auto-complete-enabled product, you may be unable to reuse the code, but reading it can still give you some hints.

Arseni Mourzenko
  • PHP's `token_get_all` was exactly what I was looking for! Thank you for all suggestions. – Markus Hedlund Aug 04 '12 at 12:33
  • just wanted to mention, there's nothing wrong with reinventing the wheel, I do it all the time because nothing else works properly for my sophisticated needs. basically, **if it ain't broken, break it and build something better**. it really just depends on how much work you really want to go through. – Tcll Sep 18 '15 at 11:54