5

In our company we have a handful of R users who have collectively written about 30 .R scripts over the last year. The scripts are mostly 100 lines or fewer and define useful, reusable functions.

Currently everyone's .Rprofile contains some code that sources all of the files in a shared 'Common R Scripts' directory on startup.
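
A setup like this might look roughly like the following sketch. The helper name `source_common_scripts` and the demo directory are placeholders chosen for illustration, not the actual startup code.

```r
# Hypothetical helper mirroring what each .Rprofile does at startup.
source_common_scripts <- function(dir) {
  files <- list.files(dir, pattern = "\\.[Rr]$", full.names = TRUE)
  for (f in files) {
    # source() evaluates in the global environment by default (local = FALSE),
    # so every function defined in the scripts lands in the user's workspace.
    source(f)
  }
  invisible(files)
}

# Demo with a throwaway directory standing in for the shared folder.
demo_dir <- file.path(tempdir(), "Common R Scripts")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("double_it <- function(x) 2 * x", file.path(demo_dir, "double_it.R"))

source_common_scripts(demo_dir)
double_it(21)
```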

How would we benefit from writing R packages to replace these scripts:

  • Now?
  • In a year's time, when we have ~60 scripts?
Doc Brown
logworthy

4 Answers

7

The introduction to the R Packages book explains the benefits pretty well:

In R, the fundamental unit of shareable code is the package. A package bundles together code, data, documentation, and tests, and is easy to share with others. As of January 2015, there were over 6,000 packages available on the Comprehensive R Archive Network, or CRAN, the public clearing house for R packages. This huge variety of packages is one of the reasons that R is so successful: the chances are that someone has already solved a problem that you’re working on, and you can benefit from their work by downloading their package.

Why write a package? One compelling reason is that you have code that you want to share with others. Bundling your code into a package makes it easy for other people to use it, because like you, they already know how to use packages. If your code is in a package, any R user can easily download it, install it and learn how to use it.

But packages are useful even if you never share your code. As Hilary Parker says in her introduction to packages: “Seriously, it doesn’t have to be about sharing your code (although that is an added benefit!). It is about saving yourself time.” Organising code in a package makes your life easier because packages come with conventions. For example, you put R code in R/, you put tests in tests/ and you put data in data/. These conventions are helpful because:

  • They save you time — you don’t need to think about the best way to organise a project, you can just follow a template.

  • Standardised conventions lead to standardised tools — if you buy into R’s package conventions, you get many tools for free.

It’s even possible to use packages to structure your data analyses, as Robert M Flight discusses in a series of blog posts.
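
To make the conventions in the quote concrete, here is a minimal sketch that lays out the directories by hand. The package name `teamutils`, the function `add_one`, and the DESCRIPTION values are made up for illustration, and the DESCRIPTION shown is deliberately incomplete compared with what a published package would carry.

```r
# Hypothetical package name and location, used only for illustration.
pkg <- file.path(tempdir(), "teamutils")

# The conventional subdirectories mentioned in the quote.
for (d in c("R", "tests", "data")) {
  dir.create(file.path(pkg, d), recursive = TRUE, showWarnings = FALSE)
}

# Package metadata lives in a DESCRIPTION file at the top level.
writeLines(c(
  "Package: teamutils",
  "Title: Shared Helper Functions",
  "Version: 0.0.1",
  "Description: Internal helpers collected from the shared script folder."
), file.path(pkg, "DESCRIPTION"))

# Function definitions go into files under R/.
writeLines("add_one <- function(x) x + 1", file.path(pkg, "R", "add_one.R"))

# Show the resulting layout.
list.files(pkg, recursive = TRUE, include.dirs = TRUE)
```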

Robert Harvey
4

I think your question is not really specific to R. The same issue often arises when a group of teammates has some code to share among themselves, written in whatever programming language they use. With a growing amount of code, they reach the point where they have to decide whether to keep sharing it by loosely throwing it together into a common folder, or to use a more rigid standard packaging or library mechanism of the language (at least for parts of the code base).

The answer to this question is: "it depends". Using standard packaging mechanisms has several benefits:

  • they provide a standard for versioning and dependency management

  • they provide a standard for documentation and API description

  • they shift dependencies from the "per function" level to the "package" level, which heavily reduces the number of dependencies and makes them more manageable

  • they may provide other standards, such as how to structure the code, the tests, the documentation, etc.

Ideally, this should make it easier for the team to reuse the code.
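
In R specifically, the version and the package-level dependencies are declared in the package's DESCRIPTION file. The following is a small sketch of reading them back; the package name `teamutils` and the field values are invented for illustration.

```r
# Hypothetical DESCRIPTION contents for an internal package.
desc_lines <- c(
  "Package: teamutils",
  "Version: 0.2.0",
  "Imports: stats, utils"
)
desc_file <- tempfile()
writeLines(desc_lines, desc_file)

# DESCRIPTION uses the Debian control file format, which read.dcf() parses
# into a character matrix with one column per field.
desc <- read.dcf(desc_file)
version <- package_version(desc[1, "Version"])
imports <- strsplit(desc[1, "Imports"], ",\\s*")[[1]]

# Version objects compare component by component, so 0.2.0 is older than 0.10.0.
version >= "0.10.0"
imports
```

The point of the sketch is that dependencies are stated once per package, in one place, rather than implied by whichever scripts happen to have been sourced first.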

On the other hand, you never get this for free. When you start building packages, you need a maintainer for each package: someone who collects the source code for the package (and, if necessary, makes some editorial changes), who decides what goes in and what does not, who assigns a version number to the package, and who knows the technical side of the package mechanism in depth. The package code will probably need to meet a higher formal standard of quality than unpackaged code (for example, additional docs and tests).

So, if you want to know whether your team has already reached the point where the benefits outweigh the extra effort, you cannot simply decide this by comparing "30" vs. "60" scripts. It depends on factors such as how many people on your team write and provide scripts, how many reuse them, how often changes occur, and whether people have trouble finding existing code to reuse, understanding how to reuse a specific function, or resolving dependencies.

So if your team does not have any problems with the current approach, don't do anything for now. If, however, you see at least some of these problems but are unsure whether packaging will solve them, I suggest simply trying it out. Put some of the current code, ideally the most heavily reused part, into a package, publish it within your team, and see whether the benefits are worth the overhead.

Doc Brown
0

It depends on the scripts. If they work as small applications (one *.R file and a few *.txt or *.csv files to work on, all in a single directory), it is not obvious how to transform them into the format of a package. Packages are collections of programming tools that add functionality to user-written scripts, whereas scripts are tools for performing tasks without any programming involved (copying and editing files may be part of the workflow). So, if the audience for your script collection is not made up entirely of R programmers, you certainly have to stay with the script form. In that case, it is not a matter of the growing number of scripts.

0

To me, this is a no-brainer: turn your R scripts into an R package as soon as possible.

Why?

  • it will give the developers a strong incentive to write documentation and examples
  • the code can be thoroughly tested using R CMD check or other tools (e.g. goodpractice: https://github.com/MangoTheCat/goodpractice), which cannot be done otherwise
  • you can include tests using testthat, which will make sure that when new functions are added, no old function will break
  • the organisation makes it easier to use collaborative tools efficiently such as Git
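
As a rough sketch of what the first and third points look like in practice, here is one function with a roxygen2-style documentation header and a matching testthat test. The function `rescale01` is invented for illustration, and the test is guarded so the snippet still runs where testthat is not installed.

```r
#' Rescale a numeric vector to the range [0, 1]
#'
#' @param x A numeric vector.
#' @return A numeric vector of the same length as `x`.
#' @examples
#' rescale01(c(0, 5, 10))
#' @export
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# In a package, this test would typically live in a file under tests/testthat/.
if (requireNamespace("testthat", quietly = TRUE)) {
  testthat::test_that("rescale01 maps the minimum to 0 and the maximum to 1", {
    testthat::expect_equal(rescale01(c(0, 5, 10)), c(0, 0.5, 1))
  })
}
```

Once tests like this exist, they are rerun on every check, which is what catches an old function breaking when a new one is added.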

Why now rather than in a year?

  • it requires little work to turn a few functions into a package
  • it requires little work to add new functions to an existing package
  • it requires a lot of work to turn many functions into a package
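
To give a feel for how little work the first step is, here is a sketch using `package.skeleton()` from the utils package that ships with R. The script files, the package name `teamutils`, and the directories are stand-ins created only for this example.

```r
# Two stand-in script files; in practice these would be the existing shared scripts.
script_dir <- file.path(tempdir(), "common_scripts")
dir.create(script_dir, showWarnings = FALSE)
writeLines("double_it <- function(x) 2 * x", file.path(script_dir, "double_it.R"))
writeLines("add_one <- function(x) x + 1", file.path(script_dir, "add_one.R"))

scripts <- list.files(script_dir, pattern = "\\.R$", full.names = TRUE)

# package.skeleton() sources the files, copies them into R/, and writes
# DESCRIPTION, NAMESPACE and man/ stubs that still need to be filled in.
pkg_parent <- file.path(tempdir(), "pkgs")
dir.create(pkg_parent, showWarnings = FALSE)
utils::package.skeleton(
  name = "teamutils",        # hypothetical package name
  code_files = scripts,
  path = pkg_parent,
  force = TRUE               # overwrite a skeleton left over from a previous run
)

# Inspect what was generated.
list.files(file.path(pkg_parent, "teamutils"), recursive = TRUE)
```

The generated help stubs need editing before the package passes a full check, but after installing it (for example with `R CMD INSTALL`), the sourcing code in each .Rprofile could be replaced by a single `library(teamutils)` call.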

How to?

Just have a look at the excellent (free) book R Packages by H. Wickham, as suggested by Robert Harvey in this thread. It has become very easy to create packages.