15

What is it about Perl that makes it so useful in bioinformatics? Why isn't C++, Matlab, or Python the dominant language?

Caffeinated

6 Answers

24

Aside from the inherent virtues of Perl, part of this is simply history. There was a major expansion of bioinformatics at the turn of the century because of the Human Genome Project. At the time Perl was by far the most popular scripting language in general use. Ruby and Python were certainly around, but didn't have nearly the support/mind share they do today. This gave Perl a lot of momentum in the field.

I think the use of Perl in bioinformatics is declining, and R is rapidly increasing in popularity. But for any language you care to name, you can probably find a bioinformatics lab using it.

Charles E. Grant
  • 2
    Agreed. I remember an article, I believe in Dr Dobbs or something like that back in the mid-90s with the title "How Perl saved the Human Genome Project" or something quite close to that. I've worked in the bioinformatics space for about 10 years now, and have yet to encounter someone actually using Perl though. It's been mostly R with a lesser amount of Matlab & Python. – geoffjentry Jul 14 '11 at 17:11
  • 6
    It's not as if R is going to replace Perl. R is used for exploratory data analysis and visualization. General scripting tasks will still be done with Perl. – wespiserA Jul 14 '11 at 17:11
  • +1: Also, it seems that Japanese developers still use a lot more Perl than Python or Ruby (according to a Japanese recruiter I chat with), so maybe that had a big impact on which technologies were involved in Japanese research, such as bioinformatics, where Japan is a leader along with the US? – Klaim Mar 04 '12 at 19:39
  • @geoffjentry: I've seen quite a lot of processing done in Perl, though I don't have 10 years in the field (2, so far, plus a few short experiences a few years ago). R, however, is king when mathematical computations are expected (statistical analysis on large datasets comes to mind) and to generate good visualizations (in fact a lot of solutions prefer to simply integrate with the R Engine rather than to roll out their own). – haylem Mar 05 '12 at 00:09
  • In the 8ish months since I wrote that I have to recant my statement, I've now seen perl used :). In those cases though it still seems of a historical nature - either old code or someone who cut their teeth using perl and just stuck with it. Most of the computational biologists at my new job use some mix (depending on the person) of python & r, with some matlab thrown in and the aforementioned perl. – geoffjentry Mar 07 '12 at 20:32
13

What makes Perl so useful for bioinformatics is that 1) it's a relatively easy language to learn, 2) there are lots of pre-existing scripts to use, including BioPerl, and 3) chances are the lab you work in already has hundreds of scripts and modules written in Perl.

The programmer's skill level has less to do with the choice of language than the tasks being asked of them. Any advanced or computationally expensive jobs are usually written in Java or C and run on a cluster.

One thing to understand about bioinformatics is that it is a diverse field, with diverse tasks asked of those who practice it. It's not uncommon for me to use Perl, R, and Java in one day: Perl for scripting, moving files, downloading things, and some basic data analysis; R for data visualization; and Java for algorithmic computation and for working with and modifying applications. That said, most of my tasks require Perl. However, I would like to switch to Ruby, as it is fully object oriented and has more advanced features, such as lambdas and procs, that can lead to more succinct code.

wespiserA
  • 1
    You're welcome. If you have any more questions, or are thinking about getting into the field, here's another response that might help you out. – wespiserA Jul 15 '11 at 02:33
  • http://stackoverflow.com/questions/3359675/bioinformatics-and-computer-science/3474716#3474716 – wespiserA Jul 15 '11 at 02:33
10

I am going to add an answer here as I think a lot of them have missed a key point...

Perl is popular in bio-informatics because it is originally a text-processing language.

Text is King

Perl makes it easy to:

  • implement NLP and bio-informatics algorithms,
  • extract textual data,
  • generate textual data.

The Language Isn't (Half) Bad

It also has the benefits of having:

  • a decent expressiveness,
  • a relatively gentle learning curve (until you discover all its tricks and hacks),
  • but also a decent performance.

While it won't let you write processing programs that run as fast as a C equivalent, development time is much shorter, and it comes with batteries included when it comes to text processing (powerful regular expressions, anyone?), which makes it easy to pick up and use in a lab context to solve these tasks.
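To make the "text is king" point concrete, here is a minimal sketch of the kind of job that regular expressions make short: reading FASTA-style text and counting a motif. It is written in Python purely for illustration (the same pattern is only a few lines of Perl), and the sample records, the function names `parse_fasta` and `count_motif`, and the choice of the EcoRI site `GAATTC` as the motif are all assumptions made for the example, not anything taken from a real pipeline.

```python
import re

# Hypothetical FASTA-formatted input; record names and sequences are made up.
FASTA_TEXT = """>seq1 example record
ATGCGCGAATTCTTAA
GGCC
>seq2 another record
TTGAATTCAAGAATTC
"""


def parse_fasta(text):
    """Return a dict mapping each record ID to its concatenated sequence."""
    records = {}
    current_id = None
    for line in text.splitlines():
        # A header line starts with '>' followed by the record ID.
        header = re.match(r">(\S+)", line)
        if header:
            current_id = header.group(1)
            records[current_id] = ""
        elif current_id is not None and line.strip():
            # Sequence lines may be wrapped, so append them to the current record.
            records[current_id] += line.strip().upper()
    return records


def count_motif(sequence, motif_pattern):
    """Count non-overlapping regex matches of a motif in a sequence."""
    return len(re.findall(motif_pattern, sequence))


records = parse_fasta(FASTA_TEXT)
# GAATTC is the EcoRI recognition site, used here only as an example motif.
ecori_counts = {name: count_motif(seq, r"GAATTC") for name, seq in records.items()}
print(ecori_counts)
```

For real work you would more likely reach for an existing parser such as the ones in BioPerl than hand-roll one, but the sketch shows why a language with regular expressions built in suits this kind of data.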

Portability and Extensibility Made Easy

It also, obviously:

  • is portable across many platforms,
  • comes with a very large library of extensions.

But the reason there are so many bioinformatics (and scientific in general) extensions and modules for Perl in the first place is the set of advantages listed above. In a great many cases, the language's design and abilities make it an almost perfect fit (despite many possible grudges one can hold against it) for the job.


All this makes Perl a good contender for scientific research, especially in fields where the data to process is mostly in text format.

Of course, other languages have emerged and claim market share for different reasons (enhanced expressiveness, better readability, explicit avoidance of obscure hacks and guru-ish one-liners...), but they still have to compete with Perl on certain aspects (Ruby is as fast to learn as it is slow to process data, for instance). So, in the domain of bioinformatics (or NLP), where you deal with text formats, quick research cycles, and big data that keeps getting bigger (thank you, genomics and NGS), Perl is still very relevant.


Actually, just noticed maple_shaft, Charles and geoffjentry's comments, which mentioned the importance of regular expressions as well, so not everybody overlooked this. :)

haylem
6

One of the big reasons behind Perl's popularity in bioinformatics is BioPerl, a comprehensive set of modules for working with relevant data.

It looks like most of the modules are actually designed to work with data generated by other programs. Perl makes for excellent reporting duct tape, after all.

Charles
5

Tools are selected by the skill level of operators and ease of adoption - it takes a while for a compiled program or IDE to overtake a simple interpreted language.

Perl has some serious chops, serious documentation, serious libraries and wide free availability. What's not to like about any of that?

bmike
2

Perl has all the same abilities, data constructs, and methods as other languages, and it's easier to learn than most. This is good for researchers and scientists who are not very experienced with programming, as they can easily pick up Perl and get their desired task(s) accomplished.

Additionally:

Lots of online support and free scripts are available, which is clearly advantageous! =)

In sum, most scientists and researchers just want to get the job done, and done as quickly as possible, and Perl is the perfect fit for that.

rrazd