Chapter 10 Building R Packages

“There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.”

- Tony Hoare, Pioneering British computer scientist

10.1 Introduction

One of the strengths of R is its capacity to package and share user-designed software. It is certainly possible to use R for one's entire scientific career without creating an R package. However, developing a package, even one that is never distributed through a formal repository, helps ensure that your software is trustworthy and portable. This chapter provides only an overview of basic topics in package development. The most thorough guide to package creation is the document Writing R Extensions, which is maintained by the R Core Team.

10.2 Package Components

An R package is a directory of files, generally with nested subdirectories. Specifically,

The DESCRIPTION and NAMESPACE files define fundamental characteristics of the package, e.g., the author(s), the maintainer, the package version, dependencies on other packages, etc.
Subdirectories, and their nested files, hold the package contents. The following subdirectories are possible, although not all need to exist within a package.
The R subdirectory contains the package R code, stored as .R (or .r) files, and will almost always exist.
The data subdirectory contains package datasets, usually stored as .rda files, which can be created using save().
The man subdirectory contains the package documentation, stored as .Rd files, for functions (in the R subdirectory) and data (in the data subdirectory), and almost always exists.
The (optional) src subdirectory contains source code requiring compilation (e.g., C, C++, Fortran). When the package is installed, R calls R CMD SHLIB (see Section 9.3.1) to create the appropriate shared library files.
Other potential subdirectories include: demo, exec, inst, po, tests, tools, and vignettes.

Fig 10.1 shows the contents of the streamDAG package. These directories, and their files, are contained within a parent directory called streamDAG.

Figure 10.1: Subdirectory level components of the streamDAG package.

Example 10.1

Creation of package components can be facilitated with the function package.skeleton(). Following the Examples section of the package.skeleton() documentation (see ?package.skeleton), assume that we want to build a package that contains two silly functions (f and g) and two silly datasets (d and e).

f <- function(x, y) x + y
g <- function(x, y) x - y
d <- data.frame(a = 1, b = 2)
e <- rnorm(1000)

We specify these as the list argument in package.skeleton() and give the package the name mypkg.

package.skeleton(list = c("f","g","d","e"), name = "mypkg")

Running this code creates a package skeleton for mypkg in the working directory. Note that the skeleton contains the subdirectories data, R, and man (Fig 10.2). The datasets d and e were converted to .rda files by package.skeleton() and placed in the data subdirectory. The functions f and g were written to .R files and placed in the R subdirectory. Documentation skeletons for both functions and both datasets, as .Rd files, were placed in the man subdirectory. DESCRIPTION and NAMESPACE files, and a throw-away Read-and-delete-me file, were also created (Fig 10.2).

Figure 10.2: Subdirectory level components of the toy mypkg package.
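To see exactly what package.skeleton() produces without leaving files in the working directory, the skeleton can be written to a temporary directory and then listed. This is a minimal sketch: the location out_dir is an arbitrary choice, environment = environment() simply points package.skeleton() at the place where the toy objects were defined, and force = TRUE lets the example be re-run over an existing skeleton.

```r
# Toy objects from Example 10.1
f <- function(x, y) x + y
g <- function(x, y) x - y
d <- data.frame(a = 1, b = 2)
e <- rnorm(1000)

# Write the skeleton to a temporary directory instead of the working directory
out_dir <- file.path(tempdir(), "skeleton_demo")
dir.create(out_dir, showWarnings = FALSE)
package.skeleton(name = "mypkg", list = c("f", "g", "d", "e"),
                 environment = environment(),  # where f, g, d, e are defined
                 path = out_dir, force = TRUE)

# List every file in the skeleton, relative to the package root
skel_files <- list.files(file.path(out_dir, "mypkg"), recursive = TRUE)
skel_files
```

The listing shows the same layout as Fig 10.2: DESCRIPTION, NAMESPACE, and Read-and-delete-me at the top level, with code, data, and documentation files under R/, data/, and man/.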

■

10.3 Datasets (the `data` Subdirectory)

Package datasets are stored in the data subdirectory. Three data formats are possible:

R code (.R or .r files)
Tabular data (e.g., .txt, .csv files)
Data “images” created using the function save(), e.g., .rda or .RData files. This approach is generally recommended, particularly for large datasets. Here we create a simple .rda dataset and write it to the working directory.

x <- rnorm(5)
save(x, file = "x.rda")
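A quick round trip confirms that an .rda image stores objects together with their names: load() recreates x under its original name and invisibly returns the names it restored. This sketch writes to tempdir() rather than the working directory, and the names path, x_saved, and restored are illustrative.

```r
x <- rnorm(5)
path <- file.path(tempdir(), "x.rda")
save(x, file = path)      # write the object (and its name) to disk

x_saved <- x              # keep a copy for comparison
rm(x)                     # remove x from the session

restored <- load(path)    # load() recreates x and returns its name
restored                  # "x"
identical(x, x_saved)     # TRUE
```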

Data from packages are made available either through lazy loading, which requires no explicit loading step, or through the data() function. Under lazy loading, package data objects are not loaded into memory when the package is loaded; instead, promises are created, and each object is loaded only when its name is first evaluated in a session. Lazy loading always occurs for package R code but is optional for package data. Lazy loading of data can be specified with a ‘LazyData’ field in a package’s DESCRIPTION file (see below). Examples of lazy-loaded data include objects from the package datasets. Note that these do not require data() for loading:

datasets::BOD # data describing Biochemical Oxygen Demand

  Time demand
1    1    8.3
2    2   10.3
3    3   19.0
4    4   16.0
5    5   15.6
6    7   19.8
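Lazy loading relies on promises: an expression that is evaluated only when the object is first used, after which the value is cached. Base R exposes the same mechanism through delayedAssign(), so the behavior can be sketched directly. This is an analogy only (packages keep lazy-loaded objects in a serialized database rather than calling delayedAssign() on an expression like this one), and the names counter and bod_copy are hypothetical.

```r
counter <- new.env()
counter$n <- 0

# Create a promise: the braced expression is not evaluated yet
delayedAssign("bod_copy", {
  counter$n <- counter$n + 1   # record each time the expression runs
  datasets::BOD
})

counter$n        # 0: nothing has been loaded yet
nrow(bod_copy)   # first use forces the promise (6 rows)
counter$n        # 1
nrow(bod_copy)   # the cached value is reused
counter$n        # still 1
```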

Under the latter, more common approach, data(foo) must be called to make the dataset foo available.

library(asbio)
data(bighorn.sel) # bighorn sheep resource use and availability
bighorn.sel

          resources avail  y1  n1
1          Riparian  0.06   0 445
2           Conifer  0.13   6 445
3       Mt. Shrub 1  0.16   9 445
4             Aspen  0.15  18 445
5      Rock outcrop  0.06  14 445
6  Sage/Bitterbrush  0.17  63 445
7  Windblown ridges  0.12  46 445
8        Mt shrub 2  0.04  62 445
9  Prescribed burns  0.09 178 445
10         Clearcut  0.02  49 445
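data() also accepts a package argument and an envir argument, which can be used to load a dataset into a dedicated environment rather than the global workspace. A minimal sketch using the datasets package (the environment name data_env is arbitrary):

```r
data_env <- new.env()

# Load BOD into data_env; data() invisibly returns the dataset names requested
loaded <- data("BOD", package = "datasets", envir = data_env)
loaded                                               # "BOD"
exists("BOD", envir = data_env, inherits = FALSE)    # TRUE
head(data_env$BOD, 3)
```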

10.4 R Code (the `R` Subdirectory)

Code for functions is stored in the R subdirectory, as .R (or .r) files. IDEs like RStudio, which provide options for creating R scripts (e.g., File > New File > R Script), can greatly aid this process. A single .R file can contain multiple functions, although a one-function-per-file approach may be easier to manage.
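One way to confirm that the files in an R subdirectory parse and define what you expect, before building the package, is to source each file into a fresh environment with sys.source(). In this sketch the directory mypkg_src and the file name arith.R are hypothetical; the single file holds both toy functions from Example 10.1.

```r
pkg_dir <- file.path(tempdir(), "mypkg_src")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)

# One file, two functions
writeLines(c("f <- function(x, y) x + y",
             "",
             "g <- function(x, y) x - y"),
           file.path(pkg_dir, "R", "arith.R"))

# Source every R code file (.R, .r, .S, .s, .q) into a fresh environment
code_env <- new.env()
r_files <- list.files(file.path(pkg_dir, "R"), pattern = "\\.[RrSsq]$",
                      full.names = TRUE)
for (r_file in r_files) sys.source(r_file, envir = code_env)

ls(code_env)        # "f" "g"
code_env$f(2, 3)    # 5
```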

10.5 Documentation (the `man` Subdirectory)

As functions become complex, it may become difficult to keep track of the meaning of function arguments, and the characteristics of function output, using a simple notes-to-self approach. R documentation (.Rd) files provide a framework for documenting R functions, methods, and datasets. The prompt() family of functions can greatly facilitate the creation of .Rd files. In Example 10.1, the function package.skeleton() used the functions prompt() and promptData() to build documentation skeletons for functions and datasets, respectively. For instance, the code below was used to create documentation for the function f().

f <- function(x, y) x + y
prompt(f, filename = "f.Rd")

Created file named 'f.Rd'.
Edit the file and move it to the appropriate directory.

This code generates the file f.Rd in the working directory for further editing (Fig 10.3).

Figure 10.3: Documentation file skeleton for the toy function f()

Some guidance for completing .rd files is provided by notes in the skeleton generated by prompt(). I have removed these notes in Fig 10.3 to save space. As before, the authoritative resource for documentation building is Writing R Extensions.

Package documentation files can be placed in the man directory and compiled into a single documentation entity as the package is compiled¹, or rendered individually for R objects that a user deems worthy of documentation. The latter approach is facilitated by the Preview button in RStudio, which is available upon opening an .Rd file. Running Preview on the file f.Rd results in the .html preview shown in Fig 10.4.


Figure 10.4: Preview of the .html generated from the code shown in Fig 10.3.

An .Rd file can be converted to legible documentation in .html, .pdf, or other formats by placing the file in the R directory containing the R CMD routines (e.g., bin/x64) and running the appropriate R CMD routine from the command line. In Windows this requires first navigating to the directory containing the R CMD routines with the Windows command shell (see Ch 9). Important R CMD documentation rendering routines include:

R CMD Rd2pdf foo.Rd, which compiles the documentation file foo.Rd into a .pdf document.
R CMD Rd2txt foo.Rd, which renders the documentation file foo.Rd as formatted plain text.
R CMD Rdconv -t html foo.Rd, which converts the documentation file foo.Rd into the format given by the -t (--type) option: txt, html, latex, or example.
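The same conversions are available from within R through the tools package, which avoids copying files into R's bin directory. The sketch below writes a small, completed .Rd file for f() (its title and wording are illustrative, not the prompt() skeleton), checks it with checkRd(), and renders it to HTML and plain text with Rd2HTML() and Rd2txt().

```r
rd_file <- file.path(tempdir(), "f.Rd")
writeLines(c(
  "\\name{f}",
  "\\alias{f}",
  "\\title{Add Two Numbers}",
  "\\description{Returns the sum of \\code{x} and \\code{y}.}",
  "\\usage{f(x, y)}",
  "\\arguments{",
  "  \\item{x}{a numeric vector.}",
  "  \\item{y}{a numeric vector.}",
  "}",
  "\\value{The sum \\code{x + y}.}",
  "\\examples{f(1, 2)}"
), rd_file)

rd <- tools::parse_Rd(rd_file)   # parse the file into an Rd object
tools::checkRd(rd)               # report any problems found in the file

html_file <- file.path(tempdir(), "f.html")
txt_file  <- file.path(tempdir(), "f.txt")
tools::Rd2HTML(rd, out = html_file)   # render as HTML
tools::Rd2txt(rd, out = txt_file)     # render as formatted plain text
```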

10.6 The DESCRIPTION File

The DESCRIPTION file contains basic information about a package. The DESCRIPTION file skeleton for the mypkg package, created by package.skeleton() in Example 10.1, is shown in Fig 10.5.

Figure 10.5: DESCRIPTION file of the toy mypkg package.

The DESCRIPTION file uses the Debian control file (DCF) format (see ?read.dcf). Specifically, each field in DESCRIPTION must start with the field name, consisting of printable ASCII characters (Ch 12), followed by a colon. The value of the field is given after the colon and an additional space (Fig 10.5). Where continuation lines are allowed, each new line of a field value longer than one line must begin with a space or a tab. The ‘Package’, ‘Version’, ‘License’, ‘Description’, ‘Title’, ‘Author’, and ‘Maintainer’ fields, shown in Fig 10.5, are mandatory.

The ‘Package’ field gives the name of the package.
The ‘Version’ field gives a user-specified package version. It should be a sequence of at least two non-negative integers separated by single ‘.’ or ‘-’ characters.
The ‘Title’ field should provide a descriptive title for the package. It should use title case (capitals for principal words), and not have any continuation lines.
The ‘Author’ field describes who wrote the package. Note that if your package contains wrappers of the work of others, which are included in the src directory, then you are not the sole author.
The ‘Maintainer’ field provides a single name followed by a valid email address in angle brackets (Fig 10.5).
The ‘Description’ field should provide a comprehensive description of what the package does. It should consist of several complete sentences, limited to one paragraph. The field value should not start with the package name or ‘This package...’.
The ‘License’ field provides standard open source license information for the package. Failure to specify license information may prevent others from legally using or distributing your package. Standard licenses available from https://www.R-project.org/Licenses/ include GPL-2, GPL-3, LGPL-2, LGPL-2.1, LGPL-3, AGPL-3, Artistic-2.0, BSD_2_clause, BSD_3_clause, and MIT. The BSD and MIT licenses are templates that require a supplementary file, e.g., ‘MIT + file LICENSE’. See Writing R Extensions for more information.
Other optional fields include: ‘Copyright’, ‘Date’, ‘Depends’, ‘Imports’, ‘Suggests’, ‘Enhances’, ‘LinkingTo’, ‘Additional_repositories’, ‘SystemRequirements’, ‘URL’, ‘BugReports’, ‘Collate’, ‘LazyData’, ‘KeepSource’, ‘ByteCompile’, ‘UseLTO’, ‘StagedInstall’, ‘Biarch’, ‘BuildVignettes’, ‘VignetteBuilder’, ‘NeedsCompilation’, ‘OS_type’, and ‘Type’. See Writing R Extensions for more information on these fields.
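Because DESCRIPTION is a DCF file, it can be parsed with read.dcf(), which returns a character matrix with one column per field. The sketch below writes a hypothetical DESCRIPTION for mypkg (every field value, including the author and e-mail address, is illustrative), confirms that the mandatory fields listed above are present, and uses package_version() to interpret the ‘Version’ field.

```r
desc_file <- file.path(tempdir(), "DESCRIPTION")
writeLines(c(
  "Package: mypkg",
  "Version: 0.1",
  "Title: Toy Functions and Datasets",
  "Author: Jane Doe",
  "Maintainer: Jane Doe <jane.doe@example.org>",
  "Description: Provides two toy arithmetic functions and two toy",
  "    datasets for illustrating package structure.",
  "License: GPL-3",
  "LazyData: true"
), desc_file)

desc <- read.dcf(desc_file)   # 1-row character matrix, columns = fields

mandatory <- c("Package", "Version", "License", "Description",
               "Title", "Author", "Maintainer")
setdiff(mandatory, colnames(desc))   # character(0): nothing missing

# '.' and '-' are both valid separators, so 0.1-1 is read as 0.1.1
ver <- package_version(desc[1, "Version"])
ver < package_version("0.1-1")       # TRUE
```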

10.7 The NAMESPACE File

The R namespace management system allows package authors to specify which variables in the package are exported to package users, and which variables should be imported from other packages. The mandatory NAMESPACE file for the toy mypkg package is extremely simple (Fig 10.6). It indicates that all four objects contained in the package, and their associated names, can be exported. If one wishes to export all objects for a large package, it is simpler to specify a regular expression pattern, e.g., exportPattern(".").

Figure 10.6: NAMESPACE file of the toy mypkg package.

Importing exported variables from other packages requires the import or importFrom directives. The import directive imports all exported variables from the specified package(s). Thus, import(foo) imports all exported variables from the package foo. If a package requires only some of the exported variables from another package, then importFrom can be used. The NAMESPACE directive importFrom(foo, f, g) indicates that f and g from package foo should be imported. Packages imported in this way should also be listed in the ‘Imports’ (or ‘Depends’) field of DESCRIPTION.
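Only exported objects can be named in import or importFrom directives. Before writing an importFrom() line, one can check that the objects are actually exported with getNamespaceExports(). The helper exported_from() below is hypothetical, shown only as a sketch.

```r
# Hypothetical helper: which of `objects` does `pkg` export?
exported_from <- function(pkg, objects) {
  setNames(objects %in% getNamespaceExports(pkg), objects)
}

# sd and median are exported by stats; not_a_function is not
exported_from("stats", c("sd", "median", "not_a_function"))
```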

To ensure that S3 methods for package classes are available, one must register the methods in the NAMESPACE file. For instance, if a package has a function print.foo() that serves as a print method for class foo, then one should include S3method(print, foo) as a line in NAMESPACE.
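Registration is what links a method to its generic, independently of the method's name. Within a session, base R's .S3method() (available since R 4.0.0) performs the same kind of registration that an S3method() line in NAMESPACE does for an installed package. In this sketch the class foo and the function show_foo() are hypothetical; show_foo is deliberately not named print.foo, so dispatch finds it only because it was registered.

```r
# A print method whose name does not follow the generic.class pattern
show_foo <- function(x, ...) {
  cat("<foo object with", length(unclass(x)), "element(s)>\n")
  invisible(x)
}

# Register show_foo as the print method for class "foo"
.S3method("print", "foo", show_foo)

obj <- structure(1:3, class = "foo")
print(obj)   # <foo object with 3 element(s)>
```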

10.8 Package Compilation

As with the compilation of C and Fortran files (Ch 9) and the conversion of individual .Rd files, building and installing a user-designed package requires depositing the package contents in the R directory containing the R CMD routines (or, alternatively, providing R CMD with a path to the package). As before, one must run R CMD routines from the command line, requiring (in Windows) that a user navigate to the directory containing the R CMD routines at the Windows command shell. This is unnecessary in Unix-like operating systems (including macOS), as these routines can be called directly from the computer's command line. Nearly all R CMD routines are tied to package development; one of the few exceptions is R CMD BATCH, which is used for running R scripts from the command line. R CMD routines for package building include:

R CMD build foo, which builds the package in the directory foo into a tarball whose name includes the version given in DESCRIPTION, e.g., foo_1.0.tar.gz.
R CMD check foo_1.0.tar.gz, which checks the tarballed package created by R CMD build.
R CMD INSTALL foo_1.0.tar.gz, which installs the package foo.
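R CMD routines can also be launched from within an R session with system2(), using R.home("bin") to locate the R front end of the running installation; this avoids navigating to, or copying files into, R's bin directory. The wrapper r_cmd() is a hypothetical convenience function, and the commented calls assume that the mypkg directory from Example 10.1 sits in the working directory.

```r
# Path to the R front end of the running installation
r_bin <- file.path(R.home("bin"), "R")

# Hypothetical wrapper: run `R CMD <tool> <args>` and stop on failure
r_cmd <- function(tool, ...) {
  status <- system2(r_bin, c("CMD", tool, ...))  # returns the exit status
  if (status != 0) stop("R CMD ", tool, " failed with exit status ", status)
  invisible(status)
}

# r_cmd("build", "mypkg")
# r_cmd("check", "mypkg_0.1.tar.gz")
# r_cmd("INSTALL", "mypkg_0.1.tar.gz")
```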

Example 10.2
Continuing from Example 10.1, I complete the following steps for package building/compression, checking, and installation.

Here I build a tarballed version of the mypkg package using: R CMD build mypkg.

Here I check the tarballed version of the package using: R CMD check mypkg_0.1.tar.gz.

Note that the checks from R CMD check can be extensive (the output above is just an excerpt). Checks are even more stringent if one uses the option --as-cran, which performs the assessments a package must pass for submission to CRAN.

Finally, I install the mypkg package on my workstation using: R CMD INSTALL mypkg_0.1.tar.gz.