18

For example, if I have a class like

class Foo {
    public int bar;

    public Foo(int constructor_var) {
        bar = constructor_var;
    }

    public int bar_plus_one() {
        return bar + 1;
    }
}

Foo foo = new Foo(2);

and in the IDE I type foo.ba, I get bar suggested; if I type String x = foo.bar(), I get red squiggles. How does the IDE become context-aware? Is there a code-querying language, is it reflection, or something else?

To clarify my question a little, I am asking because I want to be able to query my code base. I am looking for a tool where I can (essentially) say SELECT name FROM methods WHERE signature IS 3 ints or something like that. I figure whatever something like Intellisense uses to make suggestions is where I should be looking.

Deduplicator
ewhiting
  • 1
    Have you heard of aspect oriented programming? It won’t let you run queries like you envision but it might eliminate the need to do that. – candied_orange Apr 11 '20 at 19:52
  • @candied_orange this is actually for a research project I'm doing at school where I'm trying to infer the architecture of a code base by searching through the associated files. Doing this in an AOP environment would be easier, but not everyone develops this way. – ewhiting Apr 11 '20 at 20:43
  • 1
    Ah, in that case you may wish to look into code visualization tools. – candied_orange Apr 11 '20 at 20:46
  • 2
    Take a look at this: VSCode communicates with language servers which are written in their own language: https://code.visualstudio.com/api/language-extensions/language-server-extension-guide – knallfrosch Apr 12 '20 at 12:49
  • Not sure if this is exactly what you're looking for, but, NDepend has a LINQ syntax which allows you to query into the code metrics: https://en.wikipedia.org/wiki/NDepend#Code_rules_through_LINQ_queries_(CQLinq) – Reginald Blue Apr 12 '20 at 22:11
  • @knallfrosch This is a recent development. IDEs have done this for quite some time. – Thorbjørn Ravn Andersen Apr 13 '20 at 10:09
  • In the very old days an external program generated a TAGS file that the plain editor vi could use for autocompletion without any understanding of the file being edited. – Thorbjørn Ravn Andersen Apr 13 '20 at 10:43

4 Answers

29

As a very high-level overview, the IDE contains a compiler. (Well, most parts of a compiler: it doesn't need to generate code or optimize, but all the rest is there: lexing, parsing, semantic analysis, type inference, type checking, macro expansion, symbol resolution, etc.)

From the information gleaned from this analysis, the IDE constructs a semantic model of the code, and then, when it encounters incomplete code, it uses sufficiently advanced magic to figure out how best to complete it. (A simple algorithm would be to offer the shortest possible completion, but normally, IDEs are much more sophisticated than that.)

Because of the code duplication between IDEs and compilers, in recent years, there have been efforts to integrate the two. E.g. Microsoft's Roslyn compiler for C# and Visual Basic.NET was explicitly designed with APIs that allow an IDE to access all the required information. Likewise, the nsc (New Scala Compiler) and dotc compilers for Scala, and the Clang compiler for C / C++ have APIs for embedding into an IDE.

Note that the compiler built into an IDE has some different requirements from a classic batch compiler: it needs to be asynchronous, reactive, concurrent, fast, and incremental, and most of the time the code it deals with will be incomplete, invalid, and full of errors. However, despite these conflicting requirements, it makes sense to merge the two into one, because this guarantees that the IDE and the compiler always have the same understanding of the code.

As a counter-example, IntelliJ IDEA uses its own compiler framework. IDEA uses a single language-agnostic semantic graph for the entire project, no matter how many different languages are used within the project. This allows it to have really cool features such as automatically converting code between different languages, or refactoring across languages in a polyglot project. But, it runs into precisely the problem I mentioned above, especially often with Scala, where IntelliJ shows errors for code that actually compiles fine with the Scala compiler or vice versa.

Microsoft has developed the Language Server Protocol (LSP), a standardized JSON-RPC-based protocol that allows IDEs to communicate with compilers. This means that compilers that implement the LSP will automatically work with every IDE that implements the LSP, and likewise, IDEs that implement the LSP will automatically support every language for which a compiler exists that implements the LSP. Nowadays, lots of compilers (e.g. the tsc TypeScript compiler, Idris, Scala) and IDEs (Visual Studio Code, Emacs, Vim) implement it.
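To make the protocol a little more concrete, here is a minimal sketch of what an editor might send when the user asks for completions. LSP messages are JSON-RPC bodies preceded by a Content-Length header, and textDocument/completion is the request used for completions. The class name, the file URI, the request id, and the cursor position below are made-up example values, and the JSON is assembled by hand only to keep the sketch dependency-free.

```java
import java.nio.charset.StandardCharsets;

public class LspCompletionRequest {
    // Builds the JSON-RPC body of a textDocument/completion request.
    // id, uri, line and character are illustrative values chosen by the caller;
    // LSP positions are zero-based.
    public static String body(int id, String uri, int line, int character) {
        return "{\"jsonrpc\":\"2.0\",\"id\":" + id
             + ",\"method\":\"textDocument/completion\""
             + ",\"params\":{\"textDocument\":{\"uri\":\"" + uri + "\"}"
             + ",\"position\":{\"line\":" + line
             + ",\"character\":" + character + "}}}";
    }

    // Frames the body the way LSP expects: a Content-Length header giving
    // the body size in bytes, a blank line, then the body itself.
    public static String frame(String body) {
        int length = body.getBytes(StandardCharsets.UTF_8).length;
        return "Content-Length: " + length + "\r\n\r\n" + body;
    }

    public static void main(String[] args) {
        // Hypothetical file and cursor position (just after "foo.ba").
        System.out.println(frame(body(1, "file:///src/Foo.java", 16, 6)));
    }
}
```

The language server would answer with a list of completion items, and the editor only has to render them. A real client would use a JSON library and also send the initialize and textDocument/didOpen messages first.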

In the same vein and based on the success of the LSP, there is now an effort by the Scala community to define a Build Server Protocol that allows IDEs to abstract over build tools (SBT, Maven, Gradle, Mill).

Addendum: Everything I wrote above applies to "good™️" IDEs with semantic features. There are much, much simpler IDEs that, for example, simply offer every word (even from comments) of the current file as completions, regardless of whether that word is even syntactically legal in that context.
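As a rough illustration of that simplest kind of completion, the sketch below splits the file into words, keeps those that start with what has been typed, and offers the shortest first. It is not taken from any particular editor; the class and method names are invented for the example, and it has no understanding of the language at all.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

public class WordCompletion {
    // Returns every distinct word in the text that starts with the prefix
    // (and is longer than it), shortest first, ties in alphabetical order.
    public static List<String> complete(String text, String prefix) {
        TreeSet<String> words = new TreeSet<>();
        // \W+ splits on anything that is not a letter, digit or underscore.
        for (String word : text.split("\\W+")) {
            if (word.length() > prefix.length() && word.startsWith(prefix)) {
                words.add(word);
            }
        }
        List<String> result = new ArrayList<>(words);
        // List.sort is stable, so equal lengths keep their alphabetical order.
        result.sort(Comparator.comparingInt(String::length));
        return result;
    }

    public static void main(String[] args) {
        String source = "// bar_plus_one returns bar plus one\n"
                      + "public int bar;\n"
                      + "Foo foo = new Foo(2);\n";
        System.out.println(complete(source, "ba")); // prints [bar, bar_plus_one]
    }
}
```

Note that bar_plus_one is picked up from the comment as well as from code, which is exactly the "regardless of whether it is legal here" behaviour described above.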

Jörg W Mittag
  • 1
    I get the impression (from observation) that IntelliJ also uses history: it suggests the auto-completions that are most likely given recent editing behaviour (sometimes with uncanny accuracy). Can you comment on this? – Michael Kay Apr 13 '20 at 09:58
  • 2
    I neither use IntelliJ nor do I work for JetBrains, but I can easily imagine this being a factor in their algorithm. There is a whole continuum of sophistication from simply "splitting the file into words and suggesting the shortest match" which doesn't even require understanding the language to applying machine learning techniques. Some languages have "code inference" (it is not well-known that the mathematical principles that allow type-inference, i.e. computing types from code actually work both ways) that can even fill out the body of expressions simply from the types. – Jörg W Mittag Apr 13 '20 at 11:27
  • 3
    I once read a research paper about an experimental Eclipse plugin that would do a structural / type analysis of the code, build kind of a "fingerprint", then search a database of open source code that the researchers had indexed with such "fingerprints", download matching pieces of code, automatically rename functions and variables to match the code being edited, run the downloaded code snippets against the project's unit tests, and present the developer with a list of complete open source code snippets that fulfill the type constraints and pass the tests of the code being edited. – Jörg W Mittag Apr 13 '20 at 11:30
  • 1
    That is the absolute ultimate kind of "code completion", where it literally goes and finds already written code for you and fills it in. – Jörg W Mittag Apr 13 '20 at 11:31
  • Nice answer. As an example of what can go wrong with IntelliJ's approach, here is [a bug I raised](https://youtrack.jetbrains.com/issue/IDEA-212740) where following their incorrect hint will actually cause compilation to fail. – Michael Apr 13 '20 at 14:38
  • 1
    @MichaelKay More complex than a simple history, they're also experimenting with [applying machine learning](https://www.jetbrains.com/help/idea/auto-completing-code.html#2920d53b) – Michael Apr 13 '20 at 14:41
17

The IDE understands the code. It is able to parse it and extract all the necessary information for autocomplete, like what classes are available, their names and all their members. The IDE team most probably had to implement this parsing themselves, or use private APIs in the compiler.

And compilers do the same thing as their main function. The compiler builds a representation of the codebase for its own use. But compilers rarely expose that information to the outside world. So if you want to query your code, the most likely scenario is to implement your own parser which might take lots of effort, depending on the complexity of the language.

But if your language is C#, then you are in luck. Over the last few years, the C# compiler team has put effort into exposing just that information from their Roslyn compiler. So getting something like SELECT name FROM methods WHERE signature IS 3 ints is as trivial as importing a NuGet package, loading your code, and writing a LINQ query (demonstration).
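For the Java class in the question, a rough analogue of that query can be sketched with plain reflection. This is a different technique from Roslyn: it inspects compiled classes that are already loaded rather than source code, so it is not what an IDE does, but it shows the shape of the query. The Sample class and the method names are hypothetical and exist only for the example.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MethodQuery {
    // Hypothetical class to query; it stands in for "the code base".
    public static class Sample {
        public int sum3(int a, int b, int c) { return a + b + c; }
        public int twice(int a) { return 2 * a; }
        public String label(int a, int b, String c) { return c + a + b; }
    }

    // Roughly: SELECT name FROM methods WHERE signature IS 3 ints
    public static List<String> methodsTakingThreeInts(Class<?> type) {
        Class<?>[] wanted = { int.class, int.class, int.class };
        List<String> names = new ArrayList<>();
        for (Method method : type.getDeclaredMethods()) {
            // Compare the declared parameter types against (int, int, int).
            if (Arrays.equals(method.getParameterTypes(), wanted)) {
                names.add(method.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(methodsTakingThreeInts(Sample.class)); // prints [sum3]
    }
}
```

Querying source that does not compile yet, or that you do not want to load, would need a parser or compiler API instead.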

CJ Dennis
Euphoric
  • 18
    Additionally, Microsoft developed the Language Server Protocol so that you can implement autocomplete once, and then use it across all LSP-aware IDEs. – amon Apr 11 '20 at 19:36
  • 1
    @amon That isn't accurate. LSP is about the IDE requesting the auto-complete information that can be displayed in the IDE UI. The parsing and context-awareness must be implemented in the language server itself. The IDE sends no information that would help with this scenario. – Euphoric Apr 12 '20 at 10:21
  • 7
    Yes, that's exactly what I'm saying: the IDE doesn't have to analyze the semantics itself and can just delegate to an existing language server. – amon Apr 12 '20 at 10:23
  • 2
    Some compilers have public APIs. For example, Clang has libclang, which can parse both C and C++ and has a standard API and API for accessing the AST. – john01dav Apr 12 '20 at 14:51
  • 2
    Downvoted for completely ignoring the existence of language servers, which is pretty much how everyone does it these days. – JohnEye Apr 13 '20 at 00:13
  • 1
    @JohnEye That would make sense if the language server and compiler were commonly one and the same thing. Is that true? – Euphoric Apr 13 '20 at 05:10
  • 1
    @Euphoric That depends heavily on the specific language, its compilers, linters, ability to perform incremental compilation and so on. Compilation is expensive, so LSP implementations tend to avoid using them if possible. For a modern example, Rust's language server uses information both from the compiler where possible, and where that would be too slow, from a code completion tool called Racer. – JohnEye Apr 13 '20 at 13:28
7

How does the IDE become context aware?

Code completion and refactoring features are generally implemented by building an abstract syntax tree from the source code. The nodes of the AST represent things such as variables, operators and method calls. When you type foo., the IDE uses the AST to resolve the variable foo to the type Foo and then displays a list of members from that type.
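A minimal sketch of that lookup, assuming the parsing has already happened: the two maps below are hand-built stand-ins for what an IDE would derive from the AST (which type each variable has, and which members each type declares). All names are illustrative, and a real IDE handles scopes, imports, inheritance and incomplete code on top of this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MemberCompletion {
    // Stand-ins for information an IDE would extract from the AST.
    static final Map<String, String> VARIABLE_TYPES = new HashMap<>();
    static final Map<String, List<String>> TYPE_MEMBERS = new HashMap<>();
    static {
        VARIABLE_TYPES.put("foo", "Foo");
        TYPE_MEMBERS.put("Foo", Arrays.asList("bar", "bar_plus_one"));
    }

    // Completes input such as "foo.ba": resolve the receiver variable to its
    // type, then keep the members of that type that start with the typed prefix.
    public static List<String> complete(String input) {
        List<String> matches = new ArrayList<>();
        int dot = input.lastIndexOf('.');
        if (dot < 0) {
            return matches; // nothing before a dot to resolve
        }
        String receiver = input.substring(0, dot);
        String prefix = input.substring(dot + 1);
        String type = VARIABLE_TYPES.get(receiver);
        if (type == null) {
            return matches; // unknown variable, no suggestions
        }
        for (String member : TYPE_MEMBERS.getOrDefault(type, new ArrayList<>())) {
            if (member.startsWith(prefix)) {
                matches.add(member);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(complete("foo.ba")); // prints [bar, bar_plus_one]
        System.out.println(complete("foo."));   // prints [bar, bar_plus_one]
    }
}
```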

I am asking because I want to be able to query my code base.

The simplest way to do this would be to write a plugin for your favorite IDE. Most IDEs expose the AST through their plugin APIs and make it easy to build new code analysis tools by leveraging their existing infrastructure.

casablanca
  • 4
    The only difficulty is that the IDE needs to produce a partial AST for an incomplete function. So typing "switch(x.type)" on its own needs to be parsed as "start of a switch statement with brackets and cases missing". – gnasher729 Apr 12 '20 at 09:30
  • 4
    That's one of the important ways in which the "compiler" built into the IDE differs from a traditional batch compiler: a batch compiler's job is done as soon as it detects an error, whereas an IDE's job only gets started! However, in modern compilers, we generally expect powerful warnings, helpful errors, and even fix suggestions, so the divide gets smaller everyday. (E.g. "having more readable errors than every other compiler" was a specific design goal of Clang compared to e.g. GCC, ICC, VisualC, etc.) – Jörg W Mittag Apr 12 '20 at 12:18
1

Your IDE may use an external plugin or tool to do auto-complete. For example, with vim and C or C++ you may use clang-complete, which, as the name suggests, makes use of clang's ability to suggest code completions given a source file. For Python, your IDE (such as vim, VSCode, Atom, Emacs, Sublime, Gedit, ...) might use, as an example, jedi-vim, which uses the autocomplete features of the jedi library. These external tools can be used inside or outside the IDE.

qwr