
My group has been developing an R package to simulate plant growth (see GitHub repository). The R package uses .Call to interface with C.

We have decided that it would be worthwhile to create a standalone C library. The two key reasons are 1) to use familiar C debugging tools and 2) a large portion of the developer and user community is familiar with compiled languages (most models in this class are written in C or Fortran). However, the R package is accessible to many people outside this community, so we want to maintain its functionality.

I have reviewed some related questions, e.g. https://stackoverflow.com/q/12328156/199217, that discuss R packages with C library dependencies, but have not found one that deals specifically with decoupling an existing R package.

A proposed approach

(what we have come up with so far ... a strawman)

  1. Write tests for existing functionality
  2. Keep the C library inside the src/ folder
  3. Place R-specific C code (e.g. SEXP, loading R libraries, etc.) into 'R wrapper' files named R_*
  4. Create separate functions for reading configuration files in C
  5. Create a 'main' C function to replace functionality in R
  6. Write a makefile for the C library that ignores the R wrapper files
  7. Once the C library works independently and equivalently to the R package, we could consider moving the C functions to a separate repository that would be a dependency for the R package

Questions:

  1. Is this effort misguided?
  2. Are we overlooking any potential pitfalls?
  3. Is there a better way to develop both the R and C libraries in parallel?
  4. Are there any examples of C libraries that have been decoupled from R packages?
  5. How might we write tests to compare equivalent functions in R and C?
  • I don't know R internals, but generally speaking, when embedding a library in an interpreter you should pay close attention to memory management (i.e. garbage collection) – Basile Starynkevitch Jun 28 '14 at 17:40

1 Answer


Generally speaking, this is a fine idea and many packages do this. You might look at RSQLite for inspiration -- it bundles the SQLite sources and adds a thin layer of wrapper functions. The same goes for rhdf5 and HDF5.

Regarding your points:

Write tests for existing functionality

Always a good idea!
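For question 5 (comparing R and C), one option is to have the R package write reference outputs to a file (e.g. with write.csv()) and have a C test read them back and compare them element-wise with a tolerance, since results from the existing R-level code and from newly written C code need not be bit-identical. A minimal sketch of the comparison step; `compare_series` and the tolerance values are hypothetical choices, not part of either code base:

```c
#include <stddef.h>

/* True when a and b agree within abs_tol, or within rel_tol relative to the
   larger magnitude. Any comparison involving NaN counts as a mismatch. */
static int nearly_equal(double a, double b, double rel_tol, double abs_tol)
{
    if (a == b)
        return 1; /* also covers matching infinities */
    double diff = a > b ? a - b : b - a;
    double mag_a = a < 0 ? -a : a;
    double mag_b = b < 0 ? -b : b;
    double scale = mag_a > mag_b ? mag_a : mag_b;
    return diff <= abs_tol || diff <= rel_tol * scale;
}

/* Compare a reference series (e.g. exported from R) with C output.
   Returns the number of mismatching elements; if there is at least one,
   the index of the first is stored in *first_bad (when non-NULL). */
size_t compare_series(const double *expected, const double *actual, size_t n,
                      double rel_tol, double abs_tol, size_t *first_bad)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; i++) {
        if (!nearly_equal(expected[i], actual[i], rel_tol, abs_tol)) {
            if (bad == 0 && first_bad != NULL)
                *first_bad = i;
            bad++;
        }
    }
    return bad;
}
```

On the R side, testthat's expect_equal() applies a comparable numeric tolerance, so the same reference files can drive tests in both languages.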

Keep the C library inside the src/ folder

Yes -- or you could consider inst/include if you ever went the 'header-only' route, à la Rcpp.
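If the header-only route were ever taken, the shared pieces would be `static inline` functions in a header under inst/include, which R installs with the package so that other packages can list it under LinkingTo in their DESCRIPTION and #include it directly. A minimal sketch, assuming a hypothetical header growth_inline.h and a hypothetical thermal_time() helper:

```c
/* Hypothetical inst/include/growth_inline.h */
#ifndef GROWTH_INLINE_H
#define GROWTH_INLINE_H

/* `static inline` gives every translation unit (and every package that
   includes this header) its own copy, so no duplicate-symbol link errors. */

/* Hypothetical daily thermal time (degree-days) above a base temperature. */
static inline double thermal_time(double t_mean, double t_base)
{
    return t_mean > t_base ? t_mean - t_base : 0.0;
}

#endif /* GROWTH_INLINE_H */
```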

Place R-specific C code (e.g. SEXP, loading R libraries, etc.) into 'R wrapper' files named R_*

Seems sensible enough.
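A minimal sketch of that split, with hypothetical names (growth_step, R_growth_step, GROWTH_OK) and a toy growth rule standing in for the real model. The core function uses only plain C types and reports errors through a status code, so the library never depends on R's error handling; only the wrapper touches SEXP. Both halves are shown in one file for brevity; in practice the wrapper would live in its own R_*.c file. The BUILD_R_WRAPPERS macro is an assumption: the package's src/Makevars could set it with `PKG_CPPFLAGS = -DBUILD_R_WRAPPERS`, while the standalone makefile leaves it undefined.

```c
#include <stddef.h>

/* ---- Core library: plain C, no R headers ---- */

/* Hypothetical status codes for the core library. */
typedef enum { GROWTH_OK = 0, GROWTH_EBADARG = 1 } growth_status;

/* Toy growth rule: one explicit Euler step of exponential growth. */
growth_status growth_step(double biomass, double rate, double dt, double *out)
{
    if (out == NULL || biomass < 0.0 || dt < 0.0)
        return GROWTH_EBADARG;
    *out = biomass * (1.0 + rate * dt);
    return GROWTH_OK;
}

/* ---- R wrapper: compiled only when building the R package ---- */
#ifdef BUILD_R_WRAPPERS
#include <R.h>
#include <Rinternals.h>

/* Called from R as .Call("R_growth_step", biomass, rate, dt). */
SEXP R_growth_step(SEXP biomass, SEXP rate, SEXP dt)
{
    double out;
    if (growth_step(Rf_asReal(biomass), Rf_asReal(rate), Rf_asReal(dt),
                    &out) != GROWTH_OK)
        Rf_error("growth_step: invalid arguments");
    return Rf_ScalarReal(out);
}
#endif
```

Keeping exit(), printf() and similar calls out of the core library also matters for the R package: R CMD check flags compiled code that could terminate R or write directly to stdout/stderr.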

Create separate functions for reading configuration files in C

Not exactly sure what you mean here.
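Judging from the asker's later comment, step 4 means reading the model's input file in C instead of with xmlToList() in R. Parsing XML in C would normally go through a library such as libxml2; to keep this sketch self-contained it swaps in a plain "key = value" format. The struct fields, keys, and the read_config() name are hypothetical:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical settings for a single simulation run. */
typedef struct {
    double initial_biomass;
    double growth_rate;
    double timestep;
    int    n_steps;
} model_config;

/* Parse "key = value" lines; blank lines and lines starting with '#' are
   skipped. Returns 0 on success, or the 1-based number of the first line
   that is malformed or uses an unknown key. */
int read_config(FILE *fp, model_config *cfg)
{
    char line[256];
    char key[64];
    double value;
    int lineno = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        lineno++;
        const char *p = line + strspn(line, " \t\r\n"); /* skip leading blanks */
        if (*p == '\0' || *p == '#')
            continue;
        if (sscanf(p, "%63[^= \t] = %lf", key, &value) != 2)
            return lineno;
        if (strcmp(key, "initial_biomass") == 0)
            cfg->initial_biomass = value;
        else if (strcmp(key, "growth_rate") == 0)
            cfg->growth_rate = value;
        else if (strcmp(key, "timestep") == 0)
            cfg->timestep = value;
        else if (strcmp(key, "n_steps") == 0)
            cfg->n_steps = (int) value;
        else
            return lineno; /* unknown key */
    }
    return 0;
}
```

Because read_config() takes a FILE *, the standalone executable can pass a file opened from a command-line path, and tests can feed it a temporary file.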

Create a 'main' C function to replace functionality in R

This seems odd to me. Why do you need a main -- aren't you just developing a library of callable functions? Let R be your main :)
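One way to reconcile "let R be your main" with the asker's goal (explained in the comment below) of running entirely from C is to keep the whole simulation loop in the library and make both main() and the R wrapper thin callers. In this sketch, with hypothetical names and a toy growth rule, the caller also owns the output buffer, which keeps the garbage-collection concern raised under the question on the R side: the wrapper allocates and PROTECTs an R vector and hands the library a plain double *. BUILD_R_WRAPPERS is again an assumed macro set only by the R package's build.

```c
#include <stddef.h>

/* Library-level driver: simulate n_steps of toy growth and write the biomass
   after each step into out[0..n_steps-1]. The caller owns `out`, so the
   library never allocates memory that R (or main) would have to free.
   Returns 0 on success, -1 on bad arguments. */
int run_simulation(double biomass, double rate, double dt,
                   size_t n_steps, double *out)
{
    if (out == NULL || biomass < 0.0 || dt < 0.0)
        return -1;
    for (size_t i = 0; i < n_steps; i++) {
        biomass *= 1.0 + rate * dt;
        out[i] = biomass;
    }
    return 0;
}

#ifdef BUILD_R_WRAPPERS
#include <R.h>
#include <Rinternals.h>

/* R wrapper: R allocates the result vector and protects it from the
   garbage collector while the library fills it in. */
SEXP R_run_simulation(SEXP biomass, SEXP rate, SEXP dt, SEXP n_steps)
{
    R_xlen_t n = (R_xlen_t) Rf_asInteger(n_steps);
    if (n < 0) /* also catches NA_integer_ */
        Rf_error("n_steps must be a non-negative integer");
    SEXP out = PROTECT(Rf_allocVector(REALSXP, n));
    int status = run_simulation(Rf_asReal(biomass), Rf_asReal(rate),
                                Rf_asReal(dt), (size_t) n, REAL(out));
    UNPROTECT(1);
    if (status != 0)
        Rf_error("run_simulation: invalid arguments");
    return out;
}
#endif

/* A standalone executable's main() could be equally thin: parse the
   configuration file named on the command line, malloc() a buffer,
   call run_simulation(), print the results, and free() the buffer. */
```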

Write a makefile for the C library that ignores R wrapper files

Yes, you will probably need a more involved Makefile to handle this -- I again suggest looking at the source code for RSQLite, and the "Writing R Extensions" manual (R-exts) may also be helpful.

Once the C library works independently and equivalently to the R package, we could consider moving the C functions to a separate repository that would be a dependency for the R package

Yes, this seems sensible -- have the R package retrieve the C source code as necessary when building / developing the package. This way they can be effectively decoupled.
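Once the C code lives in its own repository, the R package and the library can drift apart. A common safeguard, sketched here as an assumption rather than anything either project does, is to pin a tagged release when retrieving the source and to expose version numbers that the R wrapper can check at compile time and at run time. All names and the compatibility rule are hypothetical:

```c
/* Hypothetical version macros in the standalone library's public header. */
#define GROWTH_VERSION_MAJOR 1
#define GROWTH_VERSION_MINOR 2

/* Compile-time guard the R package's wrapper code could include, so a build
   against an incompatible header fails immediately. */
#if GROWTH_VERSION_MAJOR != 1
#error "this R package expects major version 1 of the growth library"
#endif

/* Runtime queries, useful if the package links to a separately installed
   shared build of the library rather than compiling bundled sources. */
int growth_version_major(void) { return GROWTH_VERSION_MAJOR; }
int growth_version_minor(void) { return GROWTH_VERSION_MINOR; }

/* Assumed semantic-versioning rule: same major version, and at least the
   minor version the caller was built against. */
int growth_version_compatible(int built_major, int built_minor)
{
    return growth_version_major() == built_major &&
           growth_version_minor() >= built_minor;
}
```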

Kevin Ushey
  • Thanks. 1) "separate functions for reading configuration files in C" is an alternative to our current approach, which uses xmlToList in R to read an input file; 2) regarding "main", we want to be able to compute entirely in C by passing a configuration file to the executable. – David LeBauer Jul 01 '14 at 17:50