20

My background is in mechanical engineering, so please forgive my ignorance of this area.

I really enjoy programming and software development. I also recently took a free online Machine Learning (ML) class, taught by Stanford professor Andrew Ng, which I highly recommend. Link here.

I've heard this professor say that it's difficult to find areas that ML will never impact.

Question

So my question is, what research has been done so far in applying machine learning to code development? How about debugging?

Please include resources/sources/scientific papers if possible.

I haven't had luck searching for this, because searching for ML together with software development (or programming) mostly leads to results about the software development (or programming) of ML applications.

Charles
  • 309
  • 3
  • 8
  • Is your question about code that writes code, or are you asking about coding techniques to implement machine learning? – Robert Harvey Jun 18 '17 at 04:37
  • Code (ML code) that writes code, or improves code, or checks for mistakes in code (whether it be for web development, numerical solver etc.). Not techniques for implementing machine learning. – Charles Jun 18 '17 at 05:44
  • Not to say this won't ever happen but high-level programming languages are designed to make given the computer instruction easier for humans with slow chemically activated control units. At the most fundamental level, ML is machines determining what machines should do. In the future, languages designed for squishy water bags will be as unnecessary as the humans themselves. – JimmyJames Jun 19 '17 at 20:10
  • This question is supposed to be migrated to the Artificial Intelligence site. Can those who up-voted tell us why? – quintumnia Jun 22 '17 at 08:47
  • This is actually a really cool question! – Rhys Johns Jun 23 '17 at 03:56
  • Debugging can be defined as "fixing misbehavior", or it can be defined as "educated reasoning about the behavior in order to identify the misbehavior", which yields very different answer as to if/how an AI would debug. Additionally, the consideration that an AI can just regenerate its response rather than tinker with the imperfect earlier response might preclude the concept of debugging altogether. All of this to say that I feel your question is a bit open to interpretation and might yield a range of applicable answers. – Flater Jun 20 '23 at 01:06

8 Answers

8

Fuzzing is a testing method where machine learning can be, and has been, applied. Fuzzing falls in the realm of automated exploratory testing: it attempts to find defects in software by running a large number of inputs and looking for errors. Unhandled exceptions are the simplest category, but a smart implementation can use ML to flag suspect outputs. ML is largely used in this domain to make the process more efficient, however. Rather than testing every possible input, the fuzzer is trained on "interesting" inputs (dissimilar inputs that are likely to cause failure).
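To make the idea concrete, here is a minimal sketch of a feedback-guided fuzzer. The target function, the crash condition, and the "interestingness" heuristic (keeping inputs that reach a new length, standing in for new coverage) are all hypothetical simplifications; a real ML-guided fuzzer would learn a much richer model of which inputs are worth mutating.

```python
import random

def target(data: bytes) -> int:
    # Hypothetical function under test: "crashes" on oversized input,
    # standing in for a real buffer-overflow-style defect.
    if len(data) > 8:
        raise ValueError("buffer overrun")
    return len(data)

def mutate(seed: bytes) -> bytes:
    # Flip one bit, insert one byte, or delete one byte at random.
    data = bytearray(seed)
    op = random.randrange(3)
    pos = random.randrange(len(data) + 1)
    if op == 0 and data:
        data[pos % len(data)] ^= 1 << random.randrange(8)
    elif op == 1:
        data.insert(pos, random.randrange(256))
    elif data:
        del data[pos % len(data)]
    return bytes(data)

def fuzz(rounds: int = 20000) -> list:
    random.seed(0)
    corpus = [b"seed"]        # initial input
    crashes = []
    for _ in range(rounds):
        candidate = mutate(random.choice(corpus))
        try:
            target(candidate)
        except ValueError:
            crashes.append(candidate)        # defect found
        else:
            # Keep "interesting" inputs (here: a length we haven't seen)
            # so the search is guided rather than purely random.
            if len(candidate) not in {len(c) for c in corpus}:
                corpus.append(candidate)
    return crashes
```

The corpus-keeping heuristic is exactly the part that ML replaces in the research tools: instead of a hand-written rule, a learned model predicts which candidates are likely to expose new behavior.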

RubberDuck
  • 8,911
  • 5
  • 35
  • 44
  • Interesting. So this kind of falls into the category of code testing, right? I like RJB's answer a bit more, as it pertains to development, rather than testing. But testing/debugging is certainly still useful. – Charles Jun 20 '17 at 05:04
  • Yeah. It's definitely in the testing realm, and not enough people have tried it, but it's gaining momentum as a viable technique as cloud computing becomes more and more normal. It's become easier to get yourself a cluster of machines, run tests for a week, then discard the cluster until next time. – RubberDuck Jun 20 '17 at 09:42
5

Yes. This area is hot right now. It's called "big code," and DARPA put $40 million into it: http://www.darpa.mil/program/mining-and-understanding-software-enclaves. Some impressive results have come out of this grant, such as Fan Long's Prophet and Genesis systems, which can automatically fix bugs in programs by using a learned model of correct patches. Martin Vechev and his student Veselin Raychev have also been pioneers in this area. Perhaps their most impressive result is JSNice (http://jsnice.org/), which can "de-minify" JavaScript code.

On the whole, the idea of big code has not lived up to its promise: the data is way too sparse to learn anything much more interesting than variable names. While I am still funded in part by this DARPA program, my lab has mostly stopped working on it. On that note, the last thing I heard about DeepCoder is that it gets fairly pathetic results compared to the state of the art in program synthesis.

Most successful tools for automated programming still rely on non-ML methods like SMT solvers. Have a look at the proceedings of any PL conference (e.g. PLDI, POPL, OOPSLA) or any academic software engineering conference (e.g. ICSE, FSE, ISSTA, ASE), and you'll see plenty of examples.

James Koppel
  • 151
  • 3
4

Microsoft has been developing DeepCoder to use deep learning to predict a method body from given inputs and outputs. That's the only example I know offhand.

I can tell you that Meta-Genetic Programming is a field of study with a similar ambition, but I can't say I know enough about it to be knowledgeable.

Genetic Programming was in the news in 2015 when muScalpel evolved a solution to transplant a feature from one program to another, using the unit tests for both as a kind of training set.
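To illustrate the shape of the problem these systems tackle: below is a toy enumerative program synthesizer over a hypothetical four-primitive DSL (this is not DeepCoder's DSL or algorithm). It brute-forces pipelines of primitives until one satisfies every input/output example; DeepCoder's contribution is a neural model that predicts which primitives are likely present, pruning exactly this search.

```python
from itertools import product

# Toy DSL of list-to-list operations (hypothetical, for illustration only).
DSL = {
    "reverse": lambda xs: xs[::-1],
    "sort":    lambda xs: sorted(xs),
    "drop1":   lambda xs: xs[1:],
    "double":  lambda xs: [2 * x for x in xs],
}

def synthesize(examples, max_len=3):
    """Enumerate DSL pipelines (shortest first) until one maps every
    example input to its expected output; return the pipeline or None."""
    for length in range(1, max_len + 1):
        for names in product(DSL, repeat=length):
            def run(xs, names=names):
                for name in names:
                    xs = DSL[name](xs)
                return xs
            if all(run(inp) == out for inp, out in examples):
                return list(names)
    return None
```

For example, given the pairs `([3, 1, 2], [2, 4, 6])` and `([2, 1], [2, 4])`, the search finds the pipeline `["sort", "double"]`. The search space grows exponentially with pipeline length, which is why learning to rank primitives pays off.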

RJB
  • 2,090
  • 1
  • 14
  • 11
  • This is like generating algorithms using a genetic model, right? Do you know of any applications to aiding code development? I'm thinking of human-machine working together, rather than a purely machine driven (genetic-based model). I know that this may sound specific, but I'm mostly curious because I'm new to this area. – Charles Jun 20 '17 at 05:08
  • Sure you're right, I misread, I was thinking too recursively about using ML to do ML :) #edited – RJB Jun 20 '17 at 21:22
2

So my question is, what research has been done so far in applying machine learning to code development? How about debugging?

A related question is about machine learning techniques for code generation and compilation (since you could imagine transpilers and compilers as a way to automatically "develop code", i.e. actually write code, from some higher-level language).

There have been several papers about that, for example MILEPOST GCC.

You can also google for papers about machine learning techniques for debugging or for static source code analysis (or any kind of static program analysis).

See also J. Pitrat's blog on bootstrapping artificial intelligence, which is related to your question.

Basile Starynkevitch
  • 32,434
  • 6
  • 84
  • 125
1

In a recent article in Communications of the ACM about Making money using math Erik Meijer cited Jeff Dean, Google Senior Fellow, Systems and Infrastructure Group:

If Google were created from scratch today, much of it would be learned, not coded.

The article gives an overview about recent activities in the research area. It is behind a pay wall but might be worth reading if you are interested in theoretical parallels between coding and machine learning/statistics. Maybe the reference list at the end of the article might be helpful too.

As an example the article refers to WebPPL, probabilistic programming for the web.

Claude
  • 183
  • 1
  • 7
1

I found quite an extensive reading list on all coding-related machine learning topics.

As you can see, people have been trying to apply machine learning to coding, but always in very narrow fields, never as a single machine that can handle all manner of coding or debugging.
The rest of this answer focuses on your relatively broad notion of a "debugging" machine and why this has not really been attempted yet (as far as my research on the topic shows).


I redacted a lengthy part of the answer. To summarize (it's important for the next part): going by the current machine learning methodology, anything a human can learn, a machine can as well. We are only limited by the physical realm (CPU speed, size of a machine, ...), not a supposed limited applicability of the learning algorithm itself.

what research has been done so far in applying machine learning to code development? How about debugging?

The issue here isn't that it's impossible, but rather that it's an incredibly complex topic.

Humans have not even come close to defining a universal coding standard that everyone agrees with. Even the most widely agreed-upon principles, like SOLID, are still a source of discussion as to how deeply they must be implemented. For all practical purposes, it's impossible to perfectly adhere to SOLID unless you have no financial (or time) constraints whatsoever, which simply isn't possible in the private sector where most development occurs. SOLID is a guideline, not a hard limit.

In absence of an objective measure of right and wrong, how are we going to be able to give a machine positive/negative feedback to make it learn?
At best, we can have many people give their own opinion to the machine ("this is good/bad code"), and the machine's result will then be an "average opinion". But that's not necessarily the same as a correct solution. It can be, but it's not guaranteed to be.

Secondly, for debugging in particular, it's important to acknowledge that specific developers are prone to introducing a specific type of bug/mistake. The nature of the mistake can in some cases be influenced by the developer that introduced it.

For example, as I am often involved in bugfixing others' code at work, I have a sort of expectation of what kind of mistake each developer is prone to make. Given a certain problem, I know that dev A is likely to forget updating the config file, whereas dev B often writes bad LINQ queries. Based on the developer, I may look towards the config file or the LINQ first.
Similarly, I've worked at several companies as a consultant now, and I can clearly see that types of bugs can be biased towards certain types of companies. It's not a hard and fast rule that I can conclusively point out, but there is a definite trend.

Can a machine learn this? Can it realize that dev A is more likely to mess up the config and dev B is more likely to mess up a LINQ query? Of course it can. Like I said before, anything a human can learn, a machine can as well.
However, how do you know that you've taught the machine the full range of possibilities? How can you ever provide it with a small (i.e. not global) dataset and know for a fact that it represents the full spectrum of bugs? Or, would you instead create specific debuggers to help specific developers/companies, rather than create a debugger that is universally usable?
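The "learning a developer's bug profile" idea from the dev A / dev B example above can be sketched in a few lines. Everything here is hypothetical (the history data, the category names); the point is that the simplest possible "model" is just per-developer frequencies of past bug categories, and that its usefulness is bounded by how representative the history is.

```python
from collections import Counter, defaultdict

# Hypothetical bug-fix history: (developer, bug category) pairs.
HISTORY = [
    ("dev_a", "config"), ("dev_a", "config"), ("dev_a", "linq"),
    ("dev_b", "linq"),   ("dev_b", "linq"),   ("dev_b", "config"),
]

def train(history):
    """Count each developer's past bug categories; the learned 'model'
    is just a per-developer frequency table (a crude prior)."""
    model = defaultdict(Counter)
    for dev, category in history:
        model[dev][category] += 1
    return model

def most_likely_bug(model, dev):
    """Suggest where to look first when this developer's code misbehaves."""
    return model[dev].most_common(1)[0][0]
```

This trivially "learns" that dev A tends toward config mistakes and dev B toward LINQ mistakes, but it also illustrates the objection above: it can only ever reflect the bugs in its history, with no guarantee that the history covers the full spectrum.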

Asking for a machine-learned debugger is like asking for a machine-learned Sherlock Holmes. It's not provably impossible to create one, but often the core reasoning to be a debugger/Sherlock hinges on subjective assessments that vary from subject to subject and touch on an incredibly wide variety of knowledge/possible flaws.
The lack of quickly provable correct/incorrect outcomes makes it hard to easily teach a machine and verify that it's making good progress.

Flater
  • 44,596
  • 8
  • 88
  • 122
0

Here is one use case of using machine learning to debug microservices. I documented some efforts in analyzing microservice performance data with machine learning: I trained a decision tree on performance data collected from load testing a microservice, then studied the tree, which gave me insight into an environmental issue and helped me diagnose and fix a performance bug.
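As a minimal illustration of the approach (not the actual tool or data from the write-up above), here is a one-node "decision tree", a decision stump, learned from hypothetical load-test samples. The learned split point is the diagnostic payload: it tells you roughly at what load the service starts missing its SLA.

```python
# Hypothetical load-test samples: (requests_per_second, passed_sla).
SAMPLES = [
    (50, True), (80, True), (120, True), (150, False),
    (180, False), (200, False), (90, True), (160, False),
]

def learn_stump(samples):
    """Find the threshold that best separates passing from failing runs:
    predict 'pass' iff rps <= threshold, and pick the candidate threshold
    (taken from the observed values) with the fewest misclassifications."""
    best_threshold, best_errors = None, len(samples) + 1
    for threshold, _ in samples:
        errors = sum(
            (rps <= threshold) != passed
            for rps, passed in samples
        )
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold
```

On this toy data the stump lands on 120 requests per second, i.e. "the service degrades somewhere above 120 rps". A real decision tree generalizes this to many features (CPU, memory, environment flags), which is what surfaced the environmental issue described above.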

Glenn
  • 131
  • 3
0

Years later and look what DeepMind are doing: https://www.nature.com/articles/s41586-023-06004-9.pdf?pdf=button%20sticky

Charles
  • 309
  • 3
  • 8