Chapter 10 Building R Packages
“There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.”
- Tony Hoare, Pioneering British computer scientist
10.1 Introduction
One of the strengths of R is its capacity to format and share user-designed software as packages. Clearly, it is possible to use R for an entire scientific career without ever creating a package. However, developing a package, even one that is never distributed through a formal repository, helps make your software more trustworthy and portable. Note that this chapter only provides an overview of basic topics in package development. The most thorough guide to package creation is the document Writing R Extensions, which is maintained by the R Core Team.
10.2 Package Components
An R package is a directory of files, generally with nested subdirectories. Specifically,
- DESCRIPTION and NAMESPACE files define fundamental characteristics of the package, e.g., the author(s), the maintainer, the package version, dependencies on other packages, etc.
- Subdirectories, and their nested files, contain the package contents. The following subdirectories are possible, although not all need to exist within a package.
  - The R subdirectory contains the package R code, stored as .r files, and will almost always exist.
  - The data subdirectory contains package datasets, usually stored as .rda files, which can be created using save().
  - The man subdirectory contains the package documentation, stored as .rd files, for functions (in the R directory) and data (in data), and almost always exists.
  - The (optional) src subdirectory contains raw source code requiring compilation (C, C++, Fortran). When the package is installed, R will call R CMD SHLIB (see Section 9.3.1) to create the appropriate binary shared library files.
  - Other potential subdirectories include: demo, exec, inst, po, tests, tools, and vignettes.
Fig 10.1 shows the contents of the streamDAG package. These directories, and their files, are contained within a parent directory called streamDAG.
Example 10.1
Creation of package components can be facilitated with the function package.skeleton(). Following the Examples section of the package.skeleton() documentation (see ?package.skeleton), assume that we want to build a package that contains two silly functions (f and g) and two silly datasets (d and e). We specify these in the list argument of package.skeleton() and give the package the name mypkg.
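The call can be sketched as follows. It mirrors the Examples section of ?package.skeleton; the objects are throwaway illustrations, and the skeleton is written to the current working directory.

```r
# Two silly functions and two silly datasets (after ?package.skeleton)
f <- function(x, y) x + y
g <- function(x, y) x - y
d <- data.frame(a = 1, b = 2)
e <- rnorm(1000)

# Write a skeleton for a package named mypkg to the working directory;
# this stops with an error if a directory called mypkg already exists
package.skeleton(list = c("f", "g", "d", "e"), name = "mypkg")
```

Setting force = TRUE in package.skeleton() overwrites an existing mypkg directory, which is convenient when experimenting but will discard any edits made to the earlier skeleton.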
Running this code will cause a package skeleton for mypkg to be sent to the working directory. Note that the skeleton contains the subdirectories data, R, and man (Fig 10.2). The datasets d and e were converted to .rda files by package.skeleton() and were placed in the data subdirectory. The functions f and g were converted to .r files and placed in the R subdirectory. Documentation skeletons for both functions and both datasets, as .rd files, were placed in the man subdirectory. DESCRIPTION and NAMESPACE files, and a throw-away Read-and-delete-me file, were also created (Fig 10.2).
\(\blacksquare\)
10.3 Datasets (the data Subdirectory)
Datasets in R packages are stored in the data subdirectory. Three data formats are possible:
- Plain R code (.r files)
- Tabular data (e.g., .txt, .csv files)
- Data “images” created using the function save(), e.g., .rda or .RData files. This approach is generally recommended, particularly for large datasets.

Here we create a simple .rda dataset and send it to the working directory.
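A minimal sketch of that step, assuming a small hypothetical data frame called mydata:

```r
# A small hypothetical data frame
mydata <- data.frame(site = c("A", "B", "C"), count = c(12, 7, 30))

# Write it to the working directory as an .rda image; in a package
# source tree this file would be placed in the data subdirectory
save(mydata, file = "mydata.rda")
```

load("mydata.rda") restores the object under its original name, mydata, which is also the name a package user would see.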
Data from packages are accessible either via lazy loading, which makes the objects available as soon as the package is attached, or with the data() function. Under the former approach, package data objects are not loaded into memory when the package is attached; instead, promises are created, so that an object is loaded only when its name is first used in a session. Lazy loading always occurs for package R code but is optional for package data. Lazy loading of data can be specified with a ‘LazyData’ field in a package’s DESCRIPTION file (see below). Examples of lazy loaded data include the objects in the datasets package. Note that these do not require data() for loading:
Time demand
1 1 8.3
2 2 10.3
3 3 19.0
4 4 16.0
5 5 15.6
6 7 19.8
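The output above is what printing the BOD dataset produces. Because datasets is attached by default and its objects are lazy loaded, no data() call is needed first:

```r
# BOD comes from the datasets package, which is attached by default;
# typing its name is enough to load and print it
BOD

# The name is found on the search path without any call to data()
exists("BOD")
```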
Under the latter, more common approach, data(foo) must be called to make the dataset foo available.
resources avail y1 n1
1 Riparian 0.06 0 445
2 Conifer 0.13 6 445
3 Mt. Shrub 1 0.16 9 445
4 Aspen 0.15 18 445
5 Rock outcrop 0.06 14 445
6 Sage/Bitterbrush 0.17 63 445
7 Windblown ridges 0.12 46 445
8 Mt shrub 2 0.04 62 445
9 Prescribed burns 0.09 178 445
10 Clearcut 0.02 49 445
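The dataset printed above is not named in the text, so the sketch below uses iris from the datasets package as a stand-in to show the data() workflow: listing what a package provides, then loading one dataset by name.

```r
# List the first few datasets shipped with the datasets package
head(data(package = "datasets")$results[, "Item"])

# Explicitly load a dataset into the global environment with data();
# iris is only a stand-in for the unnamed dataset shown above
data("iris", package = "datasets")
head(iris, 3)
```

The package argument restricts the search to one package; without it, data() searches all attached packages.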
10.4 R Code (the R Subdirectory)
Code for functions is generally stored in the R directory, as .r files. IDEs like RStudio, which provide options for generating .r scripts (e.g., File > New File > R Script), can greatly aid in this process. A single .r file can contain multiple functions, although a one-function-per-file approach may be easier to manage.
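One way to set up the one-function-per-file layout from an R session is with base R's dump(), which writes an object's definition as R code. This is a sketch with a hypothetical package directory, mypkg2, and hypothetical functions sq() and cube():

```r
# Two hypothetical functions for a hypothetical package, mypkg2
sq   <- function(x) x^2
cube <- function(x) x^3

# One function per file: write each definition to mypkg2/R/<name>.R
code_dir <- file.path("mypkg2", "R")
dir.create(code_dir, recursive = TRUE, showWarnings = FALSE)
for (fn in c("sq", "cube")) {
  dump(fn, file = file.path(code_dir, paste0(fn, ".R")))
}

list.files(code_dir)
```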
10.5 Documentation (the man Subdirectory)
As functions become complex, it may become difficult to keep track of the meaning of function arguments, and the characteristics of function output, using only simple notes-to-self (e.g., code comments). R documentation (.rd) files provide a framework for documenting R functions, methods, and datasets. The prompt() family of functions can greatly facilitate the creation of .rd files. In Example 10.1, the function package.skeleton() used the functions prompt() and promptData() to build documentation skeletons for functions and datasets, respectively. For instance, prompt() was applied to create documentation for the function f(), returning:
Created file named 'f'.
Edit the file and move it to the appropriate directory.
This call generates the file f.rd and sends it to the working directory for further editing (Fig 10.3).
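A minimal sketch of the prompt() step, together with the analogous promptData() call for the dataset d. The explicit filename arguments are my choice; by default each function names the file after the object, with an .Rd extension.

```r
# Objects to document (as in Example 10.1)
f <- function(x, y) x + y
d <- data.frame(a = 1, b = 2)

# Write an .Rd skeleton for the function f to the working directory
prompt(f, filename = "f.Rd")

# Write an .Rd skeleton for the dataset d
promptData(d, filename = "d.Rd")
```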
Some guidance for completing .rd files is provided by notes in the skeleton generated by prompt(). I have removed these notes in Fig 10.3 to save space. As before, the authoritative resource for documentation building is Writing R Extensions.
Package documentation files can be placed into a man directory and compiled into a single documentation entity as the package is compiled¹, or compiled singly for R objects that a user deems worthy of documentation. The latter approach is facilitated by the Preview widget in RStudio, which is available upon opening an .rd file. Running Preview on the file f.rd results in the .html preview shown in Fig 10.4.
An .rd file can be converted to legible documentation in .html, .pdf, or other formats by depositing the file in the R directory containing the R CMD routines (e.g., bin/x64) and running the appropriate R CMD routine from the command line. In Windows this requires first navigating to the directory containing the R CMD routines using the Windows command prompt (see Ch 9). Important R CMD documentation rendering routines include:
- R CMD Rd2pdf foo.rd can be used to compile the documentation file foo.rd into a .pdf document.
- R CMD Rd2txt foo.rd can be used to compile the documentation file foo.rd into a pretty text format.
- R CMD Rdconv foo.rd, with the output type given by the -t option (e.g., -t html), can be used to convert the documentation file foo.rd into a variety of formats, including plain text, HTML, and LaTeX.
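The same conversions can be done without leaving R: the tools package, which ships with R, provides Rd2txt() and Rd2HTML(). The sketch below regenerates an f.Rd skeleton with prompt() so that it runs on its own; in practice you would render your edited file.

```r
# Regenerate an .Rd skeleton so this example is self-contained
f <- function(x, y) x + y
prompt(f, filename = "f.Rd")

# In-session counterparts of R CMD Rd2txt and R CMD Rdconv
tools::Rd2txt("f.Rd", out = "f.txt")    # pretty text file
tools::Rd2HTML("f.Rd", out = "f.html")  # HTML file
```

Leaving out the out argument in Rd2txt() prints the rendered text to the console instead of writing a file.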
10.6 The DESCRIPTION File
The DESCRIPTION file contains basic information about a package. The DESCRIPTION file skeleton for the mypkg package, created by package.skeleton() in Example 10.1, is shown in Fig 10.5.
The DESCRIPTION file uses the Debian Control File (DCF) format (see ?read.dcf). Specifically, each field in DESCRIPTION must start with the field name, composed of printable ASCII characters (Ch 12), followed by a colon. The value of the field is given after the colon and an additional space (Fig 10.5). Where a field value is allowed to span more than one line, each continuation line must begin with a space or a tab. The ‘Package’, ‘Version’, ‘License’, ‘Description’, ‘Title’, ‘Author’, and ‘Maintainer’ fields, shown in Fig 10.5, are mandatory.
- The ‘Package’ field gives the name of the package.
- The ‘Version’ field gives a user-specified package version. It should be a sequence of at least two non-negative integers separated by single ‘.’ or ‘-’ characters.
- The ‘Title’ field should provide a descriptive title for the package. It should use title case (capitals for principal words) and not have any continuation lines.
- The ‘Author’ field describes who wrote the package. Note that if your package contains wrappers of the work of others, included in the src directory, then you are not the sole author.
- The ‘Maintainer’ field provides a single name followed by a valid email address in angle brackets (Fig 10.5).
- The ‘Description’ field should provide a comprehensive description of what the package does. Several complete sentences may be used, although these should be limited to one paragraph. The field value should not start with the package name or ‘This package...’.
- The ‘License’ field provides standard open source license information for the package. Failure to specify license information may prevent others from legally using or distributing your package. Standard licenses listed at https://www.R-project.org/Licenses/ include GPL-2, GPL-3, LGPL-2, LGPL-2.1, LGPL-3, AGPL-3, Artistic-2.0, BSD_2_clause, BSD_3_clause, and MIT (the last three are templates, specified as, e.g., ‘MIT + file LICENSE’). See Writing R Extensions for more information.
- Other optional fields include: ‘Copyright’, ‘Date’, ‘Depends’, ‘Imports’, ‘Suggests’, ‘Enhances’, ‘LinkingTo’, ‘Additional_repositories’, ‘SystemRequirements’, ‘URL’, ‘BugReports’, ‘Collate’, ‘LazyData’, ‘KeepSource’, ‘ByteCompile’, ‘UseLTO’, ‘StagedInstall’, ‘Biarch’, ‘BuildVignettes’, ‘VignetteBuilder’, ‘NeedsCompilation’, ‘OS_type’, and ‘Type’. See Writing R Extensions for more information on these fields.
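To illustrate the format, the sketch below writes a DESCRIPTION-style file for mypkg and reads it back with read.dcf(). All field values (author, e-mail address, title, and description text) are hypothetical placeholders; note the indented continuation line in ‘Description’.

```r
# Hypothetical DESCRIPTION content for mypkg; all values are placeholders
desc <- c(
  "Package: mypkg",
  "Type: Package",
  "Title: Two Silly Functions and Two Silly Datasets",
  "Version: 0.1",
  "Author: Jane Doe",
  "Maintainer: Jane Doe <jane.doe@example.org>",
  "Description: Provides two toy functions, for adding and subtracting",
  "    numbers, and two toy datasets for illustrating package building.",
  "License: GPL-3",
  "LazyData: true"
)
writeLines(desc, "DESCRIPTION_example")

# read.dcf() parses the file into a one-row character matrix of fields
fields <- read.dcf("DESCRIPTION_example")
fields[, c("Package", "Version", "License")]
```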
10.7 The NAMESPACE File
The R namespace management system allows package authors to specify which variables in the package are exported to package users, and which variables should be imported from other packages. The mandatory NAMESPACE file for the toy mypkg package is extremely simple (Fig 10.6). It indicates that all four objects contained in the package, and their associated names, can be exported. If one wishes to export all objects and names for a large package, it is simpler to give a regular expression to exportPattern(); e.g., exportPattern(".") matches every name, while exportPattern("^[^\\.]") matches every name that does not begin with a period.
Importing exported variables from other packages is specified with the import and importFrom directives. The import directive imports all exported variables from the specified package(s). Thus, import(foo) imports all exported variables in the package foo. If only some of the exported variables of a package are needed, then importFrom can be used instead. The NAMESPACE directive importFrom(foo, f, g) indicates that only f and g should be imported from package foo.
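A hypothetical NAMESPACE combining these directives might read as follows. NAMESPACE files are parsed as R code, so # comments are allowed. Packages imported this way should also be listed in the ‘Imports’ field of DESCRIPTION.

```r
# Hypothetical NAMESPACE for mypkg
export(f, g)                    # make f and g available to package users
import(stats)                   # every exported object from stats
importFrom(utils, head, tail)   # only head() and tail() from utils
```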
To ensure that S3 methods for package classes are available, one must register the methods in the NAMESPACE file. For instance, if a package has a function print.foo() that serves as a print method for class foo, then one should include S3method(print, foo) as a line in NAMESPACE.
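The sketch below shows the method itself, using a hypothetical class foo. Defined in the global environment, print.foo() is found by ordinary S3 dispatch; inside a package, the function would live in the R directory and the S3method(print, foo) line in NAMESPACE is what registers it when the package is loaded.

```r
# A print method for a hypothetical class "foo"
print.foo <- function(x, ...) {
  cat("<foo object with", length(unclass(x)), "elements>\n")
  invisible(x)  # print methods conventionally return their input invisibly
}

obj <- structure(list(1, 2, 3), class = "foo")
print(obj)   # dispatches to print.foo(): <foo object with 3 elements>
obj          # auto-printing dispatches the same way
```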
10.8 Package Compilation
As with compilation of C and Fortran files (Ch 9), and the conversion of individual .rd files, building and installing a user-designed package requires depositing the package contents in the R directory containing the R CMD routines (or, alternatively, giving R CMD a path to the package). (Probably the only R CMD routine that isn't clearly tied to the development of R packages is R CMD BATCH, which is used for running R scripts from the command line.) As before, one must run R CMD routines from the command line, requiring (in Windows) that a user navigate to the directory containing the R CMD routines at the Windows command prompt. This is unnecessary in Unix-like operating systems (including macOS), as these routines can be called directly from the computer’s command line. R CMD routines for package building include:
- R CMD build foo, which builds the package foo.
- R CMD check foo.tar.gz, which checks the tarballed package foo.tar.gz created by R CMD build (in practice the tarball name also contains the package version, e.g., foo_1.0.tar.gz).
- R CMD INSTALL foo.tar.gz, which installs the package foo.
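These routines can also be launched from within an R session. The sketch below is an illustration under stated assumptions rather than the chapter's workflow: it writes a hypothetical minimal package, tinypkg (placeholder author and e-mail), to a temporary directory and runs R CMD build through system2(), locating the R front end with R.home("bin") so that no manual navigation to R's directory is needed.

```r
# Hypothetical minimal package "tinypkg" in a temporary directory
build_dir <- tempfile("pkgbuild")
pkg <- file.path(build_dir, "tinypkg")
dir.create(file.path(pkg, "R"), recursive = TRUE)

writeLines(c(
  "Package: tinypkg",
  "Version: 0.1",
  "Title: A Tiny Example Package",
  "Description: Adds two numbers. Used only to illustrate package building.",
  "Author: Jane Doe",
  "Maintainer: Jane Doe <jane.doe@example.org>",
  "License: GPL-3"
), file.path(pkg, "DESCRIPTION"))
writeLines("export(f)", file.path(pkg, "NAMESPACE"))
writeLines("f <- function(x, y) x + y", file.path(pkg, "R", "f.R"))

# R.home("bin") is the directory holding R's own front end
r_exe <- file.path(R.home("bin"), "R")

# Equivalent to running 'R CMD build tinypkg' from build_dir;
# the tarball is written to the current working directory
owd <- setwd(build_dir)
status <- system2(r_exe, c("CMD", "build", "tinypkg"))
setwd(owd)

list.files(build_dir, pattern = "\\.tar\\.gz$")
```

R CMD check can be run the same way by replacing "build" with "check" and "tinypkg" with the tarball name.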
Example 10.2
Continuing from Example 10.1, I complete the following steps for package building/compression, checking, and installation.
- First, I build a tarballed version of the mypkg package using:
  R CMD build mypkg
- Next, I check the tarballed version of the package using:
  R CMD check mypkg_0.1.tar.gz
  Note that the checks from R CMD check can be extensive (the output above is just an excerpt). Checks are even more stringent with the option --as-cran, which performs the assessments a package must pass for submission to CRAN.
- Finally, I install the mypkg package on my workstation using:
  R CMD INSTALL mypkg_0.1.tar.gz
\(\blacksquare\)
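As a complement to Example 10.2, the sketch below installs a hypothetical minimal package, tinypkg2, into a temporary private library rather than the default library, and then loads it from there. It calls R CMD INSTALL through system2() with its --library option; note that INSTALL accepts a source directory as well as a tarball.

```r
# Hypothetical minimal package "tinypkg2" in a temporary directory
pkg <- file.path(tempfile("src"), "tinypkg2")
dir.create(file.path(pkg, "R"), recursive = TRUE)
writeLines(c(
  "Package: tinypkg2",
  "Version: 0.1",
  "Title: A Second Tiny Example Package",
  "Description: Subtracts two numbers. Used only to illustrate installation.",
  "Author: Jane Doe",
  "Maintainer: Jane Doe <jane.doe@example.org>",
  "License: GPL-3"
), file.path(pkg, "DESCRIPTION"))
writeLines("export(g)", file.path(pkg, "NAMESPACE"))
writeLines("g <- function(x, y) x - y", file.path(pkg, "R", "g.R"))

# A private library, so the default library is left untouched
lib <- tempfile("lib")
dir.create(lib)

# Equivalent to 'R CMD INSTALL --library=<lib> <pkg>'
r_exe <- file.path(R.home("bin"), "R")
status <- system2(r_exe, c("CMD", "INSTALL",
                           shQuote(paste0("--library=", lib)), shQuote(pkg)))

# Load the freshly installed package from the private library
library(tinypkg2, lib.loc = lib)
g(5, 3)
```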
Exercises
Create an .rd documentation file for the function for McIntosh’s index of site biodiversity from Exercise 2 in Ch 8. Make a .pdf or .html from the .rd file using the appropriate R CMD routines.

Create an R package consisting of at least one function. Specifically,
- Create a skeleton of the package using package.skeleton().
- Finish the .rd file(s) in man.
- Complete the DESCRIPTION file.
- Complete the NAMESPACE file.
- Build the package using R CMD build.
- Check the package using R CMD check. Modify the package (if necessary) until no more ERRORS or WARNINGS occur.