74

Is there a reason why the source code of software mentioned in research papers is not released? I understand that research papers are more about the general idea of accomplishing something than implementation details, but I don't get why they don't release the code.

For example, this paper ends with:

Results

The human line drawing system is implemented through the Qt framework in C++ using OpenGL, and runs on a 2.00 GHz Intel dual core processor workstation without any additional hardware assistance. We can interactively draw lines while the system synthesizes the new path and texture.

Do they keep the source code closed intentionally because they intend to monetize it, or because of copyright?

alecail
  • In many cases I know no source code was ever written; it's just the math and such – daniel gratzer Oct 25 '12 at 09:55
  • For the same reason most papers lack the raw data (publishing only statistically distorted "results"). – SK-logic Oct 25 '12 at 12:51
  • @Rig I think the way I asked the question is a bit ambiguous. I'm wondering why they don't release the source code of the programs. – alecail Oct 25 '12 at 14:56
  • The source code for my M.Sc. thesis is in the university archive, attached to the hard copy on a 3½" floppy disk ;-) – vartec Oct 25 '12 at 15:23
  • I edited the question. Perhaps "mention" isn't the best word. In some cases, some of the points made in the papers critically depend on the software, but the software is vaporware. As in, to properly evaluate the validity of the paper, someone has to be able to run the software. – Kaz Oct 25 '12 at 15:43
  • @Kaz That's a worthwhile clarification. If your paper is about the source code itself, then it should be made available – JohnL Oct 25 '12 at 15:46
  • @JohnL I'd say a paper can critically depend on the software without being _about_ the software. Like an interesting property of the world that is demonstrated/found using a software tool. If we cannot review the tool, how can we know the conclusion is correct? (Or rather: it is way easier to validate it if we can see the tool!) – Andres F. Oct 25 '12 at 15:58
  • @AndresF. Fair enough, though that won't always be the case. – JohnL Oct 25 '12 at 16:39
  • Much of science is research funded by private entities, and raw materials and resources don't just appear out of thin air; money is a symbol used to trade those resources (the same resources that allowed the research to happen). So it's not a bad thing sometimes for research to have closed source code if that is needed to maintain profit. Otherwise the money (resources) for the research wouldn't have been there in the first place! – Jimbo Jonny Oct 25 '12 at 17:54
  • A very similar question already exists at Academia.SE, where it might risk being closed as a duplicate: http://academia.stackexchange.com/a/2382/1033 – gerrit Oct 25 '12 at 20:23
  • @JimboJonny Of course, privately funded, in-house research still qualifies as research whatever they do. And indeed, plenty of research is privately funded. However, it's not conducive to *scientific* research if the means to fully reproduce/review it are kept unavailable. Closed source and secretiveness are just another symptom of a bigger problem with current scientific research. Therefore, my opinion is that it is indeed "a bad thing". – Andres F. Oct 25 '12 at 20:47
  • @AndresF. - I see your point, but my point is: let's say there were a hypothetical law that all scientific research must be open-source/non-closed. How much scientific research in that scenario would never have happened because it would have meant the bankruptcy of the funder? That's why I would say that in the grand scheme of things closed source isn't always holding science back... at least the research is happening! Also, what the program *does* is in the paper, so the experiment is reproducible; you just have to write your own. Some would argue that's part of reproducing the result even! – Jimbo Jonny Oct 25 '12 at 22:39
  • @AndresF. - Though I should point out I'm not saying this is the case in every scenario. Only pointing out that it's not black & white that open is good and closed is bad. Sometimes there needs to be a bit of grey :) – Jimbo Jonny Oct 25 '12 at 22:44
  • @AndresF. - One possibility is to implement your own version of the software using the ideas described in the paper. Although this is more work, it also arguably has more value - running the same implementation again only demonstrates the one implementation again. A new implementation helps demonstrate that the ideas themselves are valid, and not a fluke of some implementation detail. Significant issues that may not have been noticed or described before may be discovered during the re-implementation. – Oct 25 '12 at 22:55
  • @Steve314 - Yes! That's the point I sort of touched on in the second half of that comment above. But you put it much better than I did :) – Jimbo Jonny Oct 25 '12 at 23:07
  • @Steve314 True. For the purposes of this topic, I consider a sufficiently detailed specification almost equivalent to the actual code (but remember that code itself is a kind of spec!). However, an independent review is less feasible the fewer details are available. A complex enough result will never be reproduced without a little help from the original authors -- which is why I suspect that often when the code isn't available it's because the actual research is on shaky ground. Which I guess is what SK-logic above me was hinting at. – Andres F. Oct 26 '12 at 00:55
  • @SK-logic That's quite misleading. Withholding raw data is scientific misconduct. Withholding source code is often (still) seen as acceptable. – Konrad Rudolph Oct 26 '12 at 13:19
  • @KonradRudolph, unfortunately, it is seen as acceptable in many disciplines to withhold raw data. In some cases for the stupid "ethical" reasons (as in social and medical sciences), in other cases due to the sheer volume of such data (as in high-energy physics). – SK-logic Oct 26 '12 at 13:27
  • @SK-logic This data isn't necessarily published, but it will **always** be available to fellow researchers upon request (modulo active embargoes, which are time-constrained). Note that this data may of course be anonymised (for privacy reasons). That doesn't detract from its value though. – Konrad Rudolph Oct 26 '12 at 13:31
  • @KonradRudolph, of course it won't *always* be available. It may even become unavailable within a couple of years of publication (tapes erased and re-used, the whole group disbanded, paper shredded, etc.) – SK-logic Oct 26 '12 at 13:46
  • @SK-logic Don't change the topic – you know what I mean. Of course data may become unavailable later (and what a pity!), but we were talking about the acceptability of withholding data after publication. And there you claimed that the reason for the lack of source code is the same as for the lack of raw data, and this is simply wrong. – Konrad Rudolph Oct 27 '12 at 09:07
  • @KonradRudolph, some essential data is still not available (even upon request) straight after publication. E.g., when the published results are produced by some kind of fit, the correlation matrix is almost never published (which renders the results incomplete and sometimes useless). And raw data won't help without the software part (or a precise description of the fit procedure). So it is quite a problem, and there is no way yet to resolve it. – SK-logic Oct 27 '12 at 15:02
  • "For the same reason most papers lack the raw data (publishing only statistically distorted "results").": "most papers" is a very strong statement. Do you have any sources to back this? – Giorgio Dec 19 '12 at 15:27
  • Research software is written to get results for the author, not to be a product for others. If you contact the authors directly they will most likely be happy to share. – Thorbjørn Ravn Andersen Mar 26 '22 at 10:29

18 Answers

74

Several reasons come to mind.

  • Code is too big for the article. For a short period of time, interesting projects were short enough to be published with the paper that described them. This can still happen, but many projects large enough to be interesting have grown too big to be published with the papers that describe them.
  • Public hosts not free or durable. Until recently, cheap, durable, easy to access public hosts were not available.
  • Publishing a paper is easier than publishing a project. Some people have time to publish a paper or a project, but not both.
  • Incentives tied to role. Many years ago I asked a colleague about product development and patents and got the word that most people there pretty much did one or the other. As with paper writers (think academia) and open source developers, rewards are geared toward one work product or the other.
  • Self motivation. The desire to describe ideas or to implement code is not always present in equal parts in the same person. Many of my professors openly admitted that they either never coded very much, or were many years away from having coded fluently. Similarly, many developers barely want to write comments in their code or when they commit to source control.
  • Durability of project hosting and work product is also an issue. Who wants to link somewhere that might be gone a few years from now and, as a result, diminish the value of the paper?
  • Tradition. Publishers are oriented toward reviewing and publishing papers, but might not be ready to take on the same evaluation for projects.
    Also, traditional views on what constitutes a sensible level of reproducibility vary among fields. A chemist publishing a paper about a new synthesis method is expected to write down enough detail for another chemist to perform the synthesis. She would not be expected to ship the starting materials and product to the journal. Readers who want to use/reproduce the paper are expected to buy their own starting materials and do the synthesis themselves in their lab (though they may ask to come and visit the lab to see how it is done in practice). Neither would a biologist be expected to attach his new transgenic mice to the paper. This view of reproducibility corresponds to, e.g., giving a (pseudo-code) description of the algorithm as opposed to shipping the actual implementation.
  • Naked code can be shocking. It takes a lot less polishing to proofread a paper-length document than to code-inspect, code-review, and quality-assure a project. I have a lot of code I would be more comfortable telling you about than showing you. Hopefully things are moving toward a point where we will all write beautiful code, but if your code was rushed, barely works, or doesn't completely work, you might be more comfortable not sharing the executables or the source.
  • Closed source. Not everyone has embraced open source. Many papers are written about work for DoD, commercial projects, or privately funded projects where there are benefits from exposure of the project to the public, but there are still trade secrets or first to market advantages that could be eroded by open sourcing the code or other work products.
  • Publish further work based on this code. If the code is not published, it may give the author an advantage in publishing follow-up work. Competing researchers may need to reimplement the work, which takes precious time.
DeveloperDon
  • +1 also, the code may not be theirs to publish, it probably belongs to the institution that pays them to research things – James Oct 25 '12 at 15:34
  • If the paper depends on the source code, then it ought not to be published. If you can't publish the code, you can't publish the paper. If a paper says "our program does these wonderful things" and you cannot evaluate the paper without running that program, then the paper borders on being an advertising brochure for some software. – Kaz Oct 25 '12 at 15:44
  • In complete agreement with Kaz -- if to peer review the research you need something that's not available (data, code, etc.), it shouldn't be accepted by a peer-reviewed journal. Almost all of the arguments that DeveloperDon points out hold true for data release, too... yet there's been a rather big movement in recent years towards it. – Joe Oct 25 '12 at 16:28
  • GREAT post. I'd also add that sometimes separate scientists recreating the software on their own is PART of the repeatability of the experiment. If it only works the way one person coded it, but not the way others code it... then the results can be called into question and errors can be identified. – Jimbo Jonny Oct 25 '12 at 17:56
  • As a reviewer I tend to favour papers that link to the source or at least give some reason for not publishing it. Some of the mentioned reasons still hold today, but it also takes some time for the research community to catch up. – robi Oct 25 '12 at 18:47
  • Point 8 (Naked code can be shocking) makes me think of the [CRAPL](http://matt.might.net/articles/crapl/) idea – Etienne Racine Oct 25 '12 at 18:52
  • One reason why it may not always be released, which I see as acceptable, is that the tooling available in stock form may not permit just anyone to build it at a whim. It may require a custom compiler, licensed 3rd-party libraries, or even a custom OS or custom hardware; without these, the code is nothing more than pseudo-code to anyone else. – JustinC Oct 25 '12 at 19:29
  • Your second-to-last point is the strongest – Alex Gordon Oct 25 '12 at 20:07
  • I would suggest adding: (1) **code is student-written** (raises a variety of issues, including: do you want code you hastily wrote in your college years made public without warning?); (2) **code isn't important, only results are important**; and (3) **research competitors should go write their own code**. I've heard these concerns, though I do not necessarily agree. – Paul Oct 25 '12 at 20:22
  • @Paul But code is important. Even if the research is not code-related, if code was used in any meaningful capacity to compute or validate the research, then it is a fundamental part of said research. Otherwise it's like publishing a new mathematical result and omitting the proof. Code IS the proof. If, on the contrary, code played no important role, then why mention it at all? Best to omit it altogether, or at least relegate it to a minor footnote. – Andres F. Oct 25 '12 at 20:53
  • @AndresF. While I agree broken code could invalidate results, in practice I suspect few professors read code produced by their assistants, and even fewer write code, have a clue about what is good or bad about code, or think of code as anything other than a tedious one-way human-to-computer medium. The peer review process doesn't typically examine code and generally couldn't if it wanted to, between the problems of code literacy and time constraints. – Paul Oct 26 '12 at 02:07
  • An important exception is that code by government scientists is inherently open source; go to the [NASA-GISS](http://www.giss.nasa.gov/tools) website if you want to download some. – Michael Shopsin Oct 26 '12 at 14:30
  • @AndresF. Code is absolutely the *least* important thing in a paper. A paper is "Here is what I did; here are my methods; here are my results". The code is a codification of the method, and it should produce the exact same results. If you reproduce the paper's results using the paper's own code, you haven't reproduced anything; what you're *supposed* to do is read their methods section, come up with your own implementation, and then write a paper about it when you can't reproduce their results. – Tacroy Oct 26 '12 at 17:24
  • @Tacroy I classify code as part of the "here are my methods" section. The purpose of having source code available is to _review_ said methods, not to blindly run it again (which, except for the case of blatant fraud, will surely produce the same results). Code IS specification, i.e. the actual method. It is NOT an afterthought; otherwise you could skip it entirely without harm to your research. Note that having a detailed step-by-step specification in pseudocode is almost the same as making the actual code available, but it seems less effort to just make the source available. – Andres F. Oct 26 '12 at 18:15
  • @Tacroy I also do not expect the actual program text to be included in the paper; just a link to it. But if it IS used in your research in any meaningful way, it must be available for review. – Andres F. Oct 26 '12 at 18:18
  • @AndresF. At that point you should be asking for researchers in other fields to supply videos taken by lab security cameras. The thing is, you *don't want to know* their method in the detail that the source code provides, because when you go to replicate their findings you need to be coming at it from more of a clean-slate perspective. – Tacroy Oct 26 '12 at 18:19
  • @Tacroy I disagree; I think that is a fundamental misunderstanding of the role of code in scientific research. Lack of code availability is like lack of availability of a mathematical proof: sure, it's awesome if you can derive your own proof. But you _must_ be able to review the existing proof as well. Anyway, I suggest you read the [Science Code Manifesto](http://sciencecodemanifesto.org/) :) – Andres F. Oct 26 '12 at 18:27
  • I like the comments about student code perhaps not being the best representation to start a career, but there are several counterexamples. If we look to other fields, student projects can be well appreciated (for example, symphony #1 "The Classical" by Prokofiev might be his most popular among the general public). – DeveloperDon Oct 27 '12 at 01:24
  • Another issue for students is that of plagiarism. If you post an original project from a class and someone copies it, there may be a mess as people question whose work it is. I hope most cases like this are proven in favor of the non-cheater, but I expect sometimes even blameless people are punished or, at best, experience serious stress. – DeveloperDon Oct 27 '12 at 01:27
  • @AndresF.: 'I classify code as part of the "here are my methods" section.': Your classification is wrong: a paper should describe an algorithm in such a way that anyone can implement it by reading the paper only (without looking at its reference implementation). – Giorgio Dec 19 '12 at 15:35
  • If you have the source code and re-run it and get the same results, there is no guarantee at all that the results are correct. However, it tells you that the author hasn't been blatantly faking their results, which would be the bare minimum. – gnasher729 Mar 27 '22 at 00:17
44

Read Randall LeVeque's presentation on "Top 10 Reasons to Not Share Your Code (and why you should anyway)" http://faculty.washington.edu/rjl/talks/LeVeque_CSE2011.pdf

He argues compellingly that code is analogous to proofs in mathematics, and invites us to consider a world where proofs aren't published because they are too long, or too ugly, or don't work in the edge cases, or might be worth money, or someone might steal them...

Basically, if you are doing science, then you should publish your code. Otherwise, you are doing alchemy and you can fly right back to the dark ages and die of plague as far as I'm concerned.

Spacedman
  • +1 Great presentation. I'm glad there are people pushing for change :) – Andres F. Oct 25 '12 at 16:03
  • +1 Thanks for the link; that may help me in my ongoing negotiations with my boss about releasing some of our code as Open Source. – Frank Oct 25 '12 at 19:07
  • Word up! A good analogy deserves a vote from me. – nullpotent Oct 26 '12 at 14:31
  • I'm not sure this comparison is really apt for the purpose here. I'm a chemist (who'd rather see much more code published), not a mathematician, but the proofs I've seen usually don't give every little step. So IMHO they correspond to a condensed pseudo-code description of the algorithm rather than the actual source code. – cbeleites unhappy with SX Oct 17 '13 at 16:56
27

Generally, the programs used to produce a paper's results are only tools, and only the results matter. So they are not included in the paper, which presents the context, the methodology, the results, and a discussion of them.

But results must be reproducible. So when the data sources on which the paper is based are publicly available, the programs transforming them into results are generally required too. They are often placed "somewhere" on the Web if that doesn't raise any patent/copyright issue. Or, at least, the authors should send you the programs if you ask them.

mgoeminne
  • I don't think you have to send your precious code to anyone who asks... IMHO this answer is wrong. But I would like to see a research world where information is free... – Dirk Oct 25 '12 at 14:02
  • @Dirk As far as I know, this is relatively common in empirical software studies. In the last (not yet accepted) submission of my team in this domain, one of the reviewers explicitly asked for public access to our data as well as some pieces of code. I don't understand why the code should be so precious. It's (generally) just the realisation of the ideas described in the paper. Publishing the programs is a way to let the reader check whether we correctly translated our ideas into actions. – mgoeminne Oct 25 '12 at 14:08
  • Hmm, so you (a) know who your reviewer is and (b) give your code and data to someone who might be in direct competition with you? – Dirk Oct 25 '12 at 14:15
  • Not really, because (a) the reviewers ask to publish the code in a place they can access anonymously (or the authentication is done by the journal), and (b) since your paper is published, other researchers can freely use the same methodology/tools to replicate your study on another data set or even on the same data set. Replications are less prestigious than the original paper, they will cite your work, and they offer a strong validation of your paper. So the original authors are glad to let others do all this work for them. – mgoeminne Oct 25 '12 at 14:28
  • @Paul I don't see the connection with source code publication. Anyway, good editors pay attention to the notes the reviewers write to justify their decisions. Therefore remarks like "It's bullshit" are not taken into account. If the editor judges the reviewers' recommendations not relevant enough, he asks for the opinion of another expert. Grad students don't take part in the review process. And if you cannot get your paper accepted after some years of submission, you should consider that this paper (or its content) is not so good. – mgoeminne Nov 02 '12 at 08:35
  • The results don't matter if (a) they are created by a tool with blatant faults, or (b) the author is lying and the results haven't been created by that tool at all. – gnasher729 Mar 27 '22 at 00:19
14

It is not closed source. The software simply hasn't been published at all.

Short answer:

There are several reasons not to publish the software, but it's uncommon to publish the software in a closed-source manner.

Long answer:

Closed source means that the software has been published but the source code has not. The common case, however, is that neither the software nor the source code has been published.

In my experience (I work in atmospheric science), authors are very happy if you contact them and ask whether you can get their software (including source code, of course) for doing research. If I'm going to write a paper with a project based on theirs, they will at least get a citation out of it (good!), and probably a co-authored paper (because, of course, they didn't document their software so that someone could use it without their help). A relatively cheap co-authored paper, so that's even better.

The real question is:

Why don't they publish the software?

There are several reasons for this:

  • Published software needs documentation. Usually, people don't like to write documentation.
  • Published software may attract users. Users may have questions. This takes time (but see above).
  • Published software may require non-trivial maintenance.
  • Publishing software requires hosting.
  • People may feel embarrassed about the poor quality of their source code.

The list could be made longer. It deserves to be a separate question, over at Academia.SE, not here.

(Note that in my group, we do publish our software — licensed under GPL)

gerrit
  • It may also be possible to publish the code, but under a license that does not allow modification. – asmeurer Oct 26 '12 at 01:00
  • I had not even thought of this situation, where the authors would publish only a compiled version just to prove that the software actually exists, because it doesn't help with understanding how they did it - by "how" I mean implementation details. I love to read source code! – alecail Oct 26 '12 at 09:13
8

This might sound cynical, but in my experience research papers are not written to be easy to understand or simple to reproduce. Instead, in the research community it is more important to have an article that sounds and looks very scientific. For that reason most authors transform their code into mathematical formulas and try to prove that their algorithm is mathematically correct. Usually the number of pages for such an article is limited, so there is no space left to publish the code. Yet, of course, this would not prevent any author from linking to the complete code with a URL...

One could assume that if code is not published, either the authors want to monetize their findings, or (what I personally think is more often the case) they are afraid that people would see that their research is not as awesome as they claim. Often the results only apply to a very limited number of cases.

Also, I have seen several research papers spun off from one simple program/algorithm. If the code were published, it would be difficult to write any further papers on the same topic. So knowledge is held back in order to publish it over time in little slices.

Always keep in mind that at universities, it is not so much the results or the applicability of research that is important, but the number of papers you publish. It's sad, but true.

codingFriend1
  • This being said, try asking the researchers! Sometimes they will provide you with source code. – Lucina Oct 25 '12 at 09:55
  • I don't think you are being fully fair here: "Instead, in the research community it is more important to have an article that sounds and looks very scientific." This implies that there is no value to the underlying content, almost as if you can't understand it because it looks scientific. The number of papers you publish is almost irrelevant if no one is much interested in the content. This response, in my view, speaks of your prejudices rather than reality. – temptar Oct 25 '12 at 11:03
  • @temptar Well, maybe I am a bit negatively biased. What actually strikes me the most is that most researchers obviously are not willing to describe their research in a way that is easy to understand. Once I had a professor who, after explaining an algorithm to me, added: "But in the paper we will write this more complicated to make it sound more scientific". – codingFriend1 Oct 25 '12 at 11:50
  • "research papers are not written to be easy to understand or simple to reproduce. Instead, in the research community it is more important to have an article that sounds and looks very scientific" - this is untrue. There's a lot of jargon that's specific to academics - just like there's a lot of jargon that's specific to programmers. This jargon exists to permit brevity based on shared knowledge, not to make it hard to reproduce results. – Oct 25 '12 at 11:53
  • @codingFriend1 - you cannot and should not generalise on the basis of one single experience. That is a deeply unscientific approach. You have to consider who the target audience for a research specialist is, and in many, many cases it is not people who need the kind of explanation you consider necessary. This is what we have scientific communications for - to bridge to the non-specialists. – temptar Oct 25 '12 at 12:13
  • I support codingFriend1's answer. This has been a common criticism aimed at the scientific community where I live, and specifically at my university (which is nonetheless the best one in the country): that scientists are pushed to publish papers, the more exotic the better. "Publish or perish". Scientists from areas I'm less familiar with also report this. Sorry, but in many places it's the sad and widespread truth. – Andres F. Oct 25 '12 at 13:10
  • Supporting evidence #2: an acquaintance of mine worked in a research laboratory where his boss repeatedly republished her "star" paper even though it was repeatedly pointed out by researchers in the team that the empirical data didn't support the conclusions. She didn't care; the paper was confusing enough that not enough people outside the team noticed. Again: publish or perish. – Andres F. Oct 25 '12 at 13:12
  • @temptar Naturally, a scientific paper need not be understandable by everyone. Yet I am convinced that when you publish a paper, your results must be easily reproducible. If you conceal your algorithms (or any other applied methods) you undermine your own credibility. Also - and this is my personal belief - any scientific paper should be written so that every (graduate) student of the corresponding subject is able to understand what you did and why you did it. It is not beneficial if only a very small group of super-specialists can read and reproduce your paper. – codingFriend1 Oct 25 '12 at 13:41
  • Research is about publications, yes. Source code, even though it's not trivial and *is* the proof that the ideas are valid, is not sufficient to communicate the **scientific contribution** of a paper. Publish or perish is true, but it's not the reason there's usually no source code. Scientists have a budget to publish, and extra pages cost money. Scientists are judged on the communication of the science (math, design, algorithm, innovation), not the implementation. Sometimes the source code is short enough that you [can publish it](http://www.springerlink.com/content/9wv21akg33j9jwj6/fulltext.pdf). – Fuhrmanator Oct 25 '12 at 15:10
  • @Fuhrmanator The source code needn't be printed in the paper itself. It just needs to be made available for review. – Andres F. Oct 25 '12 at 15:11
  • @AndresF. It often is available, at least in the field where I work. How many open-source projects have design documentation to explain the logic or math behind the decisions made? If you're lucky, [someone provided it](http://www.aosabook.org/en/index.html), but it's rare. They are two different worlds. – Fuhrmanator Oct 25 '12 at 15:17
  • @codingFriend1 You are being extremely unfair and biased about academia. Here is a hint: if you can't understand an academic paper, it is not targeted at you. – Joe Tyman Oct 25 '12 at 15:44
  • @JoeTyman It's fair to say we aren't talking about the case where you aren't the target audience, but about the case where you _are_, and still the paper is impenetrable and there is not enough info to validate and reproduce the results. – Andres F. Oct 25 '12 at 16:02
7

Aside from the intent to monetize, I do not see a good reason for leaving the source code out of research papers. There is a small movement starting that proposes supplying the source code as a condition of publishing any research that depends on software in some way, shape, or form. You can read more about it; it's called the Science Code Manifesto.

hulkmeister
7

The above answers miss a few practical reasons which frequently arise in Computer Graphics (the area in which the paper mentioned by the author was published). Code release varies greatly between fields in CS - for example, in Machine Learning, code is usually published. In Human-Computer Interaction, code is almost never published.

I have released quite a bit of code in Computer Graphics, and while I do think authors should release their code, there are many simple, non-conspiracy-theory reasons why they don't. For example

1) Most Computer Graphics research projects involve collaboration between multiple researchers, often at different institutions, each providing some piece of the puzzle (i.e. algorithms, libraries, etc.). To release working code, all researchers have to agree. This is rarely a simple discussion, and usually it is easier to avoid the issue.

2) Often the code for a single paper is embedded in a larger codebase being developed within a lab. That codebase will contain other unpublished work. Separating out the code for a single project is a lot of work, often with no immediate benefit to the people who have to do this work (see incentive below).

3) Universities often have IP rights to the code. Hence, it is necessary to contact an "innovations office" who will make your life endlessly difficult, wanting you to document the "invention" so they can patent it, etc., before you open-source it. In some cases the university can even deny permission to release the source (this varies between institutions, and is greatly complicated by (1)).

4) Lots of Computer Graphics research is done by Corporations. In that case the authors do not own the code either, and have to get permission from Lawyers to release the code. Lawyers have little to no incentive to say yes.

5) There is no incentive to publish code. Most Computer Graphics research code is never used by anyone else. Even if it is, for general-purpose code you usually just get an acknowledgement (worthless in terms of your CV). If you are lucky you will get a citation. Hiring committees and Grant agencies generally don't care one bit if you released your code. So, time spent prepping code for release is time wasted that could have been spent on another paper. (There are people actively trying to change this in Computer Graphics).

6) There are incentives to not publish code. Code can sometimes turn into startup companies, be licensed to existing companies, etc. This funds future research. We all gotta eat.

  • #2 is very significant. Not only can it be a huge amount of work to separate the code that is relevant to a paper, but once you do, you may find that out of context (that is, away from the 100 other tools, libraries, and custom setups of the lab), it's essentially worthless and impossible to understand or use. Additionally, "research code" is often very brittle, engineered just enough to prove the point of one paper, not to make a robust software system, and the researcher doesn't have time or inclination to fix it up enough to be anything other than a severe headache to anyone else. – Larry Gritz Oct 30 '12 at 19:52
5

It depends. A person writing a paper, or their supervisor, decides what should be done with the source code. Sometimes, people make the project open source.

Sometimes, projects are funded by companies, meaning the code is their property. In those cases, the paper's author is not allowed to show the code.

BЈовић
  • 13,981
  • 8
  • 61
  • 81
3

It's usually a matter of page limitations. If the algorithm is exceedingly short, it oftentimes is represented, at least as pseudocode, in the paper. On the other hand, if the printed version of the underlying code is even a handful of pages long, printing the code would leave no room for the meat of the article. A journal article that is ten pages long is a long article.

Not making the source available creates a potential for fraud. Because of this potential, many journals now require that authors submit their source code as supplemental information (which is obtainable from the journal if you have access; a hefty subscription fee may be involved). Other journals require the authors to release their source code to anyone who asks for it. Yet other journals are still in the dark ages; the source code isn't required for submission and the authors aren't required to release it.

The easiest thing to do is to ask the authors if they can supply the source code to you. The authors' email addresses are typically listed in most journal papers nowadays.

David Hammen
  • 8,194
  • 28
  • 37
  • 1
    I think source code being available for review doesn't require that its full text be included in the actual paper :) Not only because of the potential for fraud, but I think it's actually useful for reviewers to be able to doublecheck you didn't commit a genuine mistake. _Especially_ if the coders were scientists and not programmers! – Andres F. Oct 25 '12 at 15:17
3

My experience as a scientist (5 papers published) is that oftentimes the journal does not require releasing the code which was used to create the results. That is not saying that journals would not accept the scripts. Many journals allow online supplementary material. Some journals geared towards algorithms and such (e.g. Computers and Geosciences) require you to add the source of an algorithm, but this is more an exception than a rule.

In addition to the culture at the journals, for scientists code is just a means to an end. Many are not professional software developers. Because many regard the code as just a tool to express science, they do not feel the urgency to also publish the code. In addition, polishing your code to the point where it could be published takes a lot of work. A scientist is paid to do science, not write software.

Paul Hiemstra
  • 2,155
  • 17
  • 14
  • But software is, in a way, the proof. That's what computer science is all about: programs are proofs. I think this is either a case of not enough confidence in the results, or a cultural misunderstanding about the importance of actually producing a working proof of your research. – Andres F. Oct 25 '12 at 13:16
  • 1
    I was not talking about computer science per se, but more science in general. In theoretical CS many people work on algorithms and proofs in a math sense. Software is just an implementation, an afterthought. – Paul Hiemstra Oct 25 '12 at 13:49
  • If your code is a footnote in the paper, I agree. If it is some sort of verification and has its own section, however small, then it IS part of the proof or at least validation. If you won't publish the code, then it's clearly not relevant and you might as well remove every mention from your paper! – Andres F. Oct 25 '12 at 14:12
  • Polishing your code to the point where it is correct would be helpful. – gnasher729 Mar 27 '22 at 00:23
2

More often than not, the actual program is just a tool to reach the end result, rather than the product in its own right. Giving full details of the source code would be akin to providing a full drawing of the pen used to sign the report, and/or schematics of the PC.

Having said that, especially where peer reviewing is being invited, the source code will be available - although under some form of Non Disclosure Agreement (NDA) - as there is inherently Intellectual Property embodied within the program.

If you are genuinely interested in the code, I suggest @Buttons' comment is the best advice: Ask them :)

Andrew
  • 2,018
  • 2
  • 16
  • 27
1

A lot depends on the purpose for which the code was written. If it was to demonstrate a point, it may well be that it is not optimised, and therefore not ideal to release. If the underlying concepts and methodology are valid, then it should be possible to recreate the outcome of the code from scratch. There may be issues of copyright and ownership as well.

In principle, nothing makes it technically impossible to release the code, but the reasons for which it might not be released are varied. There probably isn't a simple answer to this question for that reason. In specific cases, maybe you could ask the researchers concerned.

temptar
  • 2,756
  • 2
  • 23
  • 21
1

The paper you cited is already 28 pages, and most of the content is about the design decisions that are related to solving the problem (stated in the title).

The code is the final step to validate the design. It is not trivial, but it is not the part that adds value in the results of the paper, especially when you consider the space it would take up.

Not every case is the same. Some papers do give source code, or at least pseudo code. Some editors don't allow it. Some allow it, but because of space, the authors don't include it. One journal where I published source code formatted it as "figures" and the electronic version has it as image data, even though I submitted it as text.

Glorfindel
  • 3,137
  • 6
  • 25
  • 33
Fuhrmanator
  • 1,435
  • 9
  • 19
1

Incentives matter, and the incentives of researchers are generally to ensure that they can produce a steady stream of papers that incrementally build on each other. Graduate students generally need 3-5 published papers that they can turn into individual chapters of their thesis in order to graduate. Junior faculty need to generate as many publications as they can before their tenure review. For that reason, most academic papers are really paper n in a series. For example, the paper you reference builds on a paper the same group published a year before and discusses the ground the next paper is likely to cover.

Publishing the source code potentially allows another researcher in a different group to produce paper n+1 before the original author does or at least to produce a paper that covers a significant fraction of the ground that the author was expecting to cover as part of this research stream. If that happens, the graduate student could easily find him or herself spending another 6-12 months in grad school in order to produce enough research output to graduate. The faculty member may end up with one fewer published paper when tenure review time comes around. Both of these are obviously large blows to the careers of the researcher. Add in the fact that academic applications are often part of the research efforts of multiple people within a research group (either directly or because they share certain components) and there is pressure within the research group not to release code that might end up hurting someone that you work with every day.

You often get similar sorts of discussions in fields where gathering raw data is time consuming and highly distributed. In astronomy, for example, a research group may spend years gathering data before they have enough information to publish one paper. But they'll then use that data to produce a series of papers. Research groups are very reluctant to share more of their data sets than absolutely necessary because it becomes too easy for other groups to free-ride on the time that was invested gathering the data in order to reap the rewards of actually analyzing the data.

Eventually, a lot of this code will get released just like the astronomical data eventually gets released. That often comes when the author reaches the end of that series of papers or when most of the research groups that are working on similar topics have similar engines so releasing the code no longer gives a new researcher a competitive advantage.

It would be ideal for science if the data and code were released more quickly. But that would often harm the scientific researcher, and that is whose incentives matter in this case.

Justin Cave
  • 12,691
  • 3
  • 44
  • 53
  • "Publishing the source code potentially allows another researcher in a different group to produce paper n+1 before the original author does or at least to produce a paper that covers a significant fraction of the ground that the author was expecting to cover as part of this research stream." This does not sound so easy to me. Most people (including myself) would have difficulty understanding the kind of code researchers write, without help from the authors, let alone extending it. Do you know cases where this has actually happened? – Faheem Mitha Jan 25 '13 at 08:29
1

As someone who has done this (on the student side) several times in the past: oftentimes the professors writing the paper never even see the source-code themselves. They'll have their grad students write the code, and then only ask for the final executable (or even just a confirmation of the result) when it's complete.

Also, often the code written is not very readable anyways, because the students just hacked it together to get it done, and because (though they're very bright) grad students with no real-world experience tend not to be the world's best coders...

1

Most of the reasons I can think of have already been raised here, but I thought I would add two more that actually happened to me:

The journal has no idea what to do.

For one of the papers I was working on, I decided that I was absolutely, without question going to include the source code (the whole point of the paper was data visualization) and example data to go along with it. So along with the submission I attached Electronic Supplements 1 and 2 - an R script with my code, and a CSV file with the data needed for said R script.

The journal, as it turns out, can only take electronic supplements if they've been shoehorned into Word files. After trying for the better part of a day to get the R script in that form, I gave up and decided not to include the code as a supplement. I could have hosted it at my University, but as a graduate student I knew that I was going to lose my account there in ~1 year - open source isn't of any use if it's immediately overtaken by linkrot.

I ended up hosting it on GitHub and putting a reference to that in the paper, but that was because I really wanted the code to go in. I can see, especially since most people in my field don't use something like GitHub, just deciding that the effort wasn't going to be worth the handful of people who would download it, and who could email me anyway if they really want to.

The journal just isn't interested

I inserted some small details about the code itself into a paper on request from a reviewer, but it's a clinical journal (read: no one codes), it doesn't allow electronic supplements, and again, adding the source code would likely have been more trouble than it was worth.

Ironically, if anyone did go looking for the code, it is (or soon will be) open source, but I was already running on the edge of 'This is growing distractingly technical' and I decided that the brief, 'make the reviewer happy' mention was all I was going to do.

Fomite
  • 2,616
  • 6
  • 18
  • 20
0

Many times the implementation (i.e. the software) doesn't matter, but increasingly the implementation DOES affect the results.

Anytime the implementation matters... the source code should definitely be made available! The more that the results depend on the implementation or computational methods the more important it becomes to post the source code.

  • Regarding who/where will store the source code. Ideally the journal that the article is published in will store the entirety of the source code. However many of the most important journals do not store both the article and the source code. IMO if the journal does not have the ability to store the entire source code, the author is responsible for finding a web addressable storage place for the source code. – Trevor Boyd Smith Oct 25 '12 at 15:58
0

I'd like to add a few points on the type of code I deal with as a chemometrician (chemist doing data analysis):

  • People who write data analysis code (like I do) are comparatively few compared to the people who use that code. "Custom code written in house" does not mean that the authors wrote it - could be colleagues' code so the authors cannot publish it.

  • A separate publication of the code may be planned, and the code's author (or the supervisor) may be concerned that the novelty is lost if the code has been (partially) made public before.
    Even if the journal where code publication is intended for doesn't object to the code having been available publicly before, the pure concern of the supervisor (or someone in the IP office) can be enough to stop the publication of the code.

  • Data analysis code is often tailored to the data. It doesn't make too much sense without the data. (You may argue that the data should be published anyways, but that is a different question and off topic here.)
    In any case, at my institute, we archive raw data and data analysis code together with the paper. Default policy is not (yet?) to make them publicly available, but they would certainly be available on request.

  • (The traditional view of reproducibility in chemistry corresponds to a description (possibly pseudo-code) of the algorithm rather than to shipping the actual source code.)

  • Many of my colleagues use interactive tools for their data analysis which do not log the steps of the data analysis. So there is no source code that could be published. The data analysis corresponds less to a programming approach than to a lab approach: you do things and write down what you do and observe in your lab book.

  • This answer is from a data analysis point of view, so rather a particular niche. However, this question is linked from academia.SX, so non-computer-scientists may come along reading this. – cbeleites unhappy with SX Oct 17 '13 at 17:14