19 R Package Development
19.1 Prerequisites
Answer the following questions to see if you can bypass this chapter. You can find the answers at the end of the chapter in Section 19.19.
- What are the two absolutely essential files or directories required for a minimal functional R package?
- What is the primary function of the DESCRIPTION file?
- What is the key distinction between packages listed under Imports and those listed under Suggests?
19.2 Learning objectives
By the end of this chapter you should be able to:
- Create a skeletal R package with usethis::create_package() and add the canonical development tooling (use_git, use_testthat, use_readme_rmd, use_mit_license).
- Structure code in R/, data in data/, and tests in tests/testthat/.
- Document a function with roxygen2 tags and generate .Rd files with devtools::document().
- Manage dependencies in DESCRIPTION using Imports, Suggests, and Depends.
- Manage exports through the auto-generated NAMESPACE, with @export and @importFrom roxygen tags.
- Install a package from source, from a local directory, and from GitHub.
- Run devtools::check() and resolve common warnings.
19.3 Orientation
Everything you have written so far has been a script. A script is a set of instructions that runs top-to-bottom and ends. A package is a reusable library of functions with documentation, tests, and a dependency declaration. Packages are how analytical work scales beyond a single analyst and a single project.
The Wickham and Bryan book R Packages (2nd ed.) and the usethis and devtools packages it documents are the modern foundation for R package development. The workflow they describe has become the standard, and this chapter follows it.
This chapter covers the workflow up to a buildable, installable, documented package. The next chapter (testing) covers the testing infrastructure that makes packages maintainable.
19.4 The statistician’s contribution
Package development looks like software engineering, and it is. The judgements specific to statistical packages are about audience and scope.
Who will use this? A package for your own analysis needs little more than working code and a DESCRIPTION. A package for your research group needs documentation and tests. A package for the public CRAN repository needs clean dependencies, vignettes, and continuous integration. Pick the audience first; the engineering follows.
What goes in the package, and what stays in the analysis? A function reused across two projects belongs in a package; one used once does not. A regression diagnostic that comes up in every analysis is a candidate; a one-off plot for one paper is not. The boundary is fuzzy; the heuristic is that anything you would otherwise copy-paste between projects is a packaging candidate.
Documentation as a contract. The roxygen comment above a function is a promise about its behaviour: what arguments it takes, what it returns, what conditions it expects. Vague documentation produces confused users and bug reports about expected behaviour. Specific documentation, including the edge cases the function does or does not handle, saves time over the package’s life.
Dependencies are a tax. Each package you Import becomes a thing your users must install. Each version restriction (dplyr (>= 1.0.0)) constrains the environments your package will work in. The minimum-viable dependency list saves friction; the maximalist ‘use whatever is convenient’ approach produces packages that are hard to install five years later.
These judgements are what distinguish packages people use from packages that merely exist. Software craft does the rest.
19.5 Why package code?
Concrete benefits:
- Reuse. A function in a package is loaded with library(yourpkg); a function in a script is sourced by hand or copy-pasted.
- Documentation. Roxygen comments produce help pages accessible via ?function_name, indistinguishable from base R or CRAN package help.
- Tests. Packages have a canonical home for tests (tests/testthat/); scripts do not.
- Dependency management. DESCRIPTION declares what the package needs; users install dependencies automatically.
- Distribution. A .tar.gz build can be installed by others. CRAN, GitHub, and internal package servers all use this format.
The cost: more files, more conventions, more tooling. The threshold for going from script to package is roughly ‘this code will be reused by me or someone else more than once’.
19.6 usethis::create_package() and first commit
library(usethis)
# create the skeleton in a new directory
create_package("~/research/phb228utils")
This creates:
- DESCRIPTION (metadata)
- NAMESPACE (export declarations; auto-generated)
- R/ (source code directory; empty)
- phb228utils.Rproj (RStudio project file)
- .Rbuildignore (files to exclude from package build)
- .gitignore (files to exclude from version control)
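To see how little of this skeleton is strictly required, the following base-R sketch builds only a DESCRIPTION file and an R/ directory by hand in a temporary directory, then reads the metadata back. The package name minimalpkg and the function add_one() are hypothetical, chosen only for illustration; create_package() does this and more.

```r
# Sketch: hand-build the bare minimum of a package (a DESCRIPTION file
# and an R/ directory) in a temporary location. The names "minimalpkg"
# and add_one() are illustrative assumptions.
pkg_dir <- file.path(tempdir(), "minimalpkg")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)

# DESCRIPTION is a plain-text file of "Field: value" lines
writeLines(
  c(
    "Package: minimalpkg",
    "Title: A Minimal Example Package",
    "Version: 0.0.1",
    "Description: Demonstrates the smallest useful package layout.",
    "License: GPL-3"
  ),
  file.path(pkg_dir, "DESCRIPTION")
)

# one function in one file under R/
writeLines("add_one <- function(x) x + 1", file.path(pkg_dir, "R", "add_one.R"))

# read the metadata back and list what exists on disk
desc <- read.dcf(file.path(pkg_dir, "DESCRIPTION"))
pkg_name <- unname(desc[1, "Package"])
pkg_files <- list.files(pkg_dir, recursive = TRUE)
print(pkg_name)
print(pkg_files)
```

Everything else in the list above is either generated later (NAMESPACE) or supports the development workflow rather than the package itself.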
Then add the standard scaffolding:
use_git() # initialise git repository
use_github() # create a GitHub repository
use_mit_license() # add an MIT licence
use_readme_rmd() # README that knits to README.md
use_testthat() # set up the testing framework
use_news_md() # NEWS.md for changelog
Each of these adds files in canonical locations and updates DESCRIPTION and .Rbuildignore as needed. Doing all of this manually is tedious and error-prone; usethis encodes the conventions.
19.7 Package structure
phb228utils/
├── DESCRIPTION          # metadata
├── NAMESPACE            # auto-generated by roxygen2
├── R/                   # source code
│   ├── summarise.R
│   ├── plot.R
│   └── package.R        # package-level documentation
├── man/                 # auto-generated by roxygen2
│   └── *.Rd
├── tests/
│   ├── testthat.R
│   └── testthat/
│       └── test-summarise.R
├── vignettes/
│   └── intro.Rmd        # long-form documentation (optional)
├── data/                # binary R data files (.rda)
├── data-raw/            # scripts that create data/*.rda (not shipped)
├── inst/                # other files shipped with the package
├── LICENSE
├── README.md
├── NEWS.md
└── phb228utils.Rproj
What goes where:
- R/: source code. One function per file is a common convention (R/summarise.R for summarise()). Files are loaded in alphabetical order; function definitions can refer to each other regardless of file order, but if code that runs at load time (for example an S4 class definition) depends on another file, use @include roxygen tags or merge into one file.
- man/: auto-generated .Rd documentation files. Never edit these by hand; edit the roxygen comments and re-run devtools::document().
- tests/testthat/: test files (covered in Chapter 20).
- data/: example datasets shipped with the package, loaded with data(my_data).
- vignettes/: long-form articles built with the package and accessible via vignette("intro", "phb228utils").
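The hand-off from data-raw/ to data/ can be sketched in base R as follows. The dataset trial_summary and its values are invented for illustration, and a temporary directory stands in for the package root. Inside a real package, usethis::use_data(trial_summary) performs the save step.

```r
# Sketch of a data-raw/ script. The dataset name and contents are
# hypothetical; a folder under tempdir() stands in for the package root.
pkg_root <- file.path(tempdir(), "phb228utils-demo")
data_dir <- file.path(pkg_root, "data")
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)

# 1. build the dataset (normally from raw files kept under data-raw/)
trial_summary <- data.frame(
  arm = c("control", "treatment"),
  n = c(50L, 48L),
  stringsAsFactors = FALSE
)

# 2. save it as data/<name>.rda, the binary format expected in data/
rda_path <- file.path(data_dir, "trial_summary.rda")
save(trial_summary, file = rda_path, compress = "bzip2")

# 3. confirm the file round-trips by loading it into a fresh environment
check_env <- new.env()
loaded_names <- load(rda_path, envir = check_env)
print(loaded_names)
```

Keeping the script under data-raw/ means the dataset can be regenerated later, while only the .rda file ships with the package.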
19.8 roxygen2 documentation
Documentation lives in comments above each function:
#' Summarise a numeric vector
#'
#' Produces a tibble with the mean, standard deviation, and
#' quartiles of a numeric vector, ignoring missing values.
#'
#' @param x A numeric vector.
#' @param probs Quantile probabilities to report. Defaults
#' to `c(0.25, 0.5, 0.75)`.
#' @return A tibble with one row and columns `mean`, `sd`,
#' and one column per requested quantile.
#' @export
#' @examples
#' summarise_numeric(rnorm(100))
#' summarise_numeric(rnorm(100), probs = c(0.05, 0.5, 0.95))
summarise_numeric <- function(x, probs = c(0.25, 0.5, 0.75)) {
  stopifnot(is.numeric(x))
  q <- quantile(x, probs = probs, na.rm = TRUE)
  tibble::tibble(
    mean = mean(x, na.rm = TRUE),
    sd = sd(x, na.rm = TRUE),
    # splice in one column per requested quantile, named q25, q50, ...
    !!!setNames(as.list(q), paste0("q", probs * 100))
  )
}
Tags:
- @param name description: each argument.
- @return description: what the function returns.
- @export: the function is part of the package’s public API.
- @examples: runnable examples; checked by R CMD check.
- @importFrom pkg fn: import a specific function from another package.
- @seealso: links to related functions.
- @inheritParams other_function: copy parameter docs from another function.
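The less familiar tags (@inheritParams, @seealso, @importFrom) are easier to judge in context. The sketch below documents a hypothetical companion function, format_quartiles(), that reuses the parameter documentation of summarise_numeric(). The function itself is an illustration, not part of the chapter's package, and the link syntax in @seealso assumes roxygen2's markdown support is enabled.

```r
#' Format the quantiles of a numeric vector as one string
#'
#' @inheritParams summarise_numeric
#' @param digits Number of decimal places to show.
#' @return A single character string with the quantiles separated by " / ".
#' @seealso [summarise_numeric()] for the tabular version.
#' @importFrom stats quantile
#' @export
#' @examples
#' format_quartiles(c(1, 2, 3, 4, 5))
format_quartiles <- function(x, probs = c(0.25, 0.5, 0.75), digits = 2) {
  stopifnot(is.numeric(x))
  # quantile() is usable unqualified because of the @importFrom tag above
  q <- quantile(x, probs = probs, na.rm = TRUE)
  # formatC() fixes the number of decimals without padding to a common width
  paste(formatC(q, format = "f", digits = digits), collapse = " / ")
}

format_quartiles(c(1, 2, 3, 4, 5))
```

Here @inheritParams supplies the documentation for x and probs, so only the new argument digits needs its own @param line.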
After editing roxygen comments, regenerate the .Rd files and NAMESPACE:
devtools::document()
This is one of the most common commands in package development; bind it to a keystroke.
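To make the link between roxygen tags and NAMESPACE concrete: a function tagged @export and @importFrom stats quantile would typically yield one export() and one importFrom() directive. The sketch below writes such a file into a temporary directory (the directory name demo is an assumption) and parses it with base R's parseNamespaceFile(). The file roxygen2 actually writes may differ in ordering and detail.

```r
# Sketch: what a roxygen2-generated NAMESPACE might look like, and how R
# reads it. The directory "demo" is a stand-in, not a real package.
lib_dir <- tempdir()
dir.create(file.path(lib_dir, "demo"), showWarnings = FALSE)

writeLines(
  c(
    "# Generated by roxygen2: do not edit by hand",
    "",
    "export(summarise_numeric)",
    "importFrom(stats,quantile)"
  ),
  file.path(lib_dir, "demo", "NAMESPACE")
)

# parseNamespaceFile() is the base R function that interprets NAMESPACE
ns <- parseNamespaceFile("demo", lib_dir)
exported <- unname(ns$exports)
print(exported)
```

The point of the exercise is that NAMESPACE is derived output: the source of truth is the roxygen comment, which is why the file should not be edited by hand.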
19.9 DESCRIPTION: Imports, Suggests, Depends
A typical DESCRIPTION:
Package: phb228utils
Title: Helper Functions for Statistical Computing
Version: 0.1.0
Authors@R:
    person("Ronald", "Thomas", email = "rgthomas47@gmail.com",
           role = c("aut", "cre"))
Description: Reusable helpers for the PHB 228 statistical
    computing course textbook.
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
Imports:
    dplyr (>= 1.0.0),
    tibble,
    rlang
Suggests:
    testthat (>= 3.0.0),
    knitr,
    rmarkdown
RoxygenNote: 7.3.2
The dependency fields:
- Imports are mandatory. Packages whose functions you call in your exported code go here. library(yourpkg) will fail if these are not installed. Use pkg::fn() to call imported functions explicitly (best practice) or @importFrom pkg fn to make them available unqualified.
- Suggests are optional. Used in vignettes, tests, examples, or by features that gate themselves on requireNamespace("pkg", quietly = TRUE). The package must work without these.
- Depends is for packages that should be loaded whenever yours is loaded (so library(yourpkg) also loads them). Avoid: it pollutes the user’s namespace. Use Imports instead.
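The self-gating pattern for a Suggests dependency can be sketched as follows. The helper as_table_lines() is hypothetical: it uses knitr::kable() when knitr happens to be installed and falls back to base R printing otherwise, so the feature degrades rather than fails.

```r
# Sketch: a feature that uses a suggested package only when available.
# as_table_lines() is a hypothetical helper, not part of any package.
as_table_lines <- function(df) {
  stopifnot(is.data.frame(df))
  if (requireNamespace("knitr", quietly = TRUE)) {
    # optional path: nicer formatting via a package listed in Suggests
    as.character(knitr::kable(df, format = "markdown"))
  } else {
    # fallback path: base R only, so the function never hard-fails
    utils::capture.output(print(df, row.names = FALSE))
  }
}

table_lines <- as_table_lines(
  data.frame(arm = c("control", "treatment"), n = c(50, 48))
)
cat(table_lines, sep = "\n")
```

An alternative design is to stop with an informative error when the suggested package is missing; which is better depends on whether a sensible fallback exists.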
Add a dependency with usethis::use_package():
use_package("dplyr") # adds to Imports
use_package("testthat", "Suggests")
Version requirements: use minimum versions for features you need, not exact pins. dplyr (>= 1.0.0) means ‘any version 1.0.0 or later’. Avoid dplyr (== 1.0.5): it constrains users to one specific version, which usually breaks within a year.
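What a minimum-version constraint means in practice can be checked with base R's version classes. The helper meets_minimum() below is a hypothetical illustration of the comparison, not the code R itself runs when installing a package.

```r
# Sketch: how a minimum-version constraint such as "dplyr (>= 1.0.0)"
# can be evaluated. meets_minimum() is a hypothetical helper.
meets_minimum <- function(installed, minimum) {
  # numeric_version() compares component by component, not as text
  numeric_version(installed) >= numeric_version(minimum)
}

meets_minimum("1.1.4", "1.0.0")   # a later release satisfies the constraint
meets_minimum("0.8.5", "1.0.0")   # an older release does not
meets_minimum("1.10.0", "1.9.0")  # 10 > 9, even though "1.10.0" sorts first as text
```

The same comparison works on an installed package, for example utils::packageVersion("stats") >= "3.0.0".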
19.10 Installing and loading
During development:
devtools::load_all() # simulate library(yourpkg) without installing
load_all() makes your package’s functions available in the current R session, including non-exported ones (so you can call internal functions for testing). It is the fastest way to iterate on changes.
To install for real use:
devtools::install() # build and install in user's library
devtools::build() # build a .tar.gz for distribution
For a package on GitHub:
remotes::install_github("yourname/yourpkg")
Or from a built source tarball (to install from a source directory instead, pass its path to devtools::install()):
install.packages("/path/to/yourpkg.tar.gz", repos = NULL,
                 type = "source")
The standard check before sharing with anyone (CRAN, a collaborator, your future self):
devtools::check()
This runs R CMD check, the gold standard for package quality. It tests:
- Documentation is complete and consistent.
- Examples run without errors.
- Tests pass.
- Dependencies are declared correctly.
- No undocumented functions.
- The package builds and loads on a clean R session.
A clean check() (no errors, warnings, or notes) is the goal for any shareable package.
19.11 Common R CMD check warnings
‘no visible binding for global variable’ when you use column names with non-standard evaluation (NSE) inside functions. R CMD check reports this one as a note rather than a warning. Common in dplyr code: dplyr::filter(data, year == 2020) references year without quoting. The fix:
utils::globalVariables(c("year", "treatment"))
at the top of one of your R/ files, or use .data$year and .data$treatment (preferred), importing the pronoun with @importFrom rlang .data so that .data itself is not flagged.
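A minimal sketch of the preferred .data approach follows. The function filter_year() is hypothetical and assumes that dplyr and rlang are listed in Imports; the demonstration at the end runs only if dplyr is installed.

```r
# Sketch: referring to columns through the .data pronoun so that
# R CMD check sees no undefined global variables. filter_year() is a
# hypothetical function; dplyr and rlang are assumed to be in Imports.

#' Keep rows from one year
#'
#' @param data A data frame with a numeric `year` column.
#' @param target_year The year to keep.
#' @return The rows of `data` where `year` equals `target_year`.
#' @importFrom rlang .data
#' @export
filter_year <- function(data, target_year) {
  dplyr::filter(data, .data$year == target_year)
}

# demonstration, skipped when dplyr is not available
if (requireNamespace("dplyr", quietly = TRUE)) {
  trials <- data.frame(year = c(2019, 2020, 2020), outcome = c(1.2, 0.7, 0.9))
  print(filter_year(trials, 2020))
}
```

Unlike globalVariables(), this keeps the fix next to the code that needs it, and a misspelled column name fails at run time instead of being silently whitelisted.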
‘undefined exports’ when an @exported function does not exist. Re-run devtools::document().
‘package required but not declared’ when you use pkg::fn() for a package not in Imports. Add it.
‘examples lines wider than 100 characters’ when an example line is too long. Break it up.
The fixes are straightforward; the work is doing the fixes consistently.
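For the undeclared-dependency case, a rough pre-check can be scripted in base R: list every pkg:: prefix used under R/ and compare it with the Imports field. Both helpers below are hypothetical, and the scan is a plain regular expression, so it also picks up pkg:: inside comments and strings. Treat it as a sketch, not a replacement for devtools::check().

```r
# Sketch: compare packages referenced as pkg::fn() in R/ against Imports.
# declared_imports() and used_namespaces() are hypothetical helpers.
declared_imports <- function(desc_path) {
  raw <- read.dcf(desc_path, fields = "Imports")[1, 1]
  if (is.na(raw)) return(character())
  entries <- trimws(strsplit(raw, ",", fixed = TRUE)[[1]])
  # drop version constraints such as "(>= 1.0.0)"
  sub("\\s*\\(.*\\)$", "", entries)
}

used_namespaces <- function(r_files) {
  code <- unlist(lapply(r_files, readLines, warn = FALSE))
  # a package name is a letter followed by letters, digits, or dots,
  # immediately before "::" (which also covers ":::")
  hits <- regmatches(
    code,
    gregexpr("[A-Za-z][A-Za-z0-9.]*(?=::)", code, perl = TRUE)
  )
  sort(unique(unlist(hits)))
}

# demonstration on a throwaway package layout under tempdir()
demo_dir <- file.path(tempdir(), "scanpkg")
dir.create(file.path(demo_dir, "R"), recursive = TRUE, showWarnings = FALSE)
writeLines(
  c("Package: scanpkg", "Version: 0.0.1",
    "Imports:", "    dplyr (>= 1.0.0),", "    tibble"),
  file.path(demo_dir, "DESCRIPTION")
)
writeLines(
  c("f <- function(x) tibble::tibble(s = stats::sd(x))",
    "g <- function() rlang::abort('stop')"),
  file.path(demo_dir, "R", "f.R")
)

declared <- declared_imports(file.path(demo_dir, "DESCRIPTION"))
used <- used_namespaces(list.files(file.path(demo_dir, "R"), full.names = TRUE))
undeclared <- setdiff(used, declared)
print(undeclared)
```

Each name reported is a candidate for usethis::use_package().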
19.12 Vignettes
A vignette is a long-form article packaged with your code:
use_vignette("intro")
This creates vignettes/intro.Rmd with a template. Edit it; build with devtools::build_vignettes(); access in R with vignette("intro", "phb228utils").
Vignettes are how to teach users why and how to use your package, beyond the function-by-function reference. For an analysis package, the vignette is often a worked example.
19.13 Worked example: a small package
# 1. create the package
usethis::create_package("~/research/phb228utils")
# 2. add tooling (run from inside the new package)
usethis::use_git()
usethis::use_mit_license()
usethis::use_testthat()
usethis::use_readme_rmd()
# 3. add a function
# in R/summarise.R:
# roxygen header above summarise_numeric()
usethis::use_package("tibble")
# 4. document and check
devtools::document()
devtools::load_all()
?summarise_numeric
# 5. add a test
usethis::use_test("summarise")
# write the test, run:
devtools::test()
# 6. install
devtools::install()
This sequence creates a working, documented, tested, installable package in about thirty minutes of focused work.
19.14 Collaborating with an LLM on package development
Package development has a lot of conventions; LLMs handle most of them reasonably and stumble on a few specific ones.
Prompt 1: drafting a roxygen header. Paste the function and ask: ‘write a roxygen2 header with @param, @return, @export, and @examples. The example should be a realistic, runnable use of the function.’
What to watch for. The example needs to actually run. A common LLM error: the example uses a variable that is not defined. Run the example yourself before committing.
Verification. Run devtools::document(), then run the examples with devtools::run_examples() and read the rendered help page via ?function_name. If an example fails, fix it.
Prompt 2: diagnosing an R CMD check warning. Paste the warning verbatim and ask: ‘what does this mean and how do I fix it?’
What to watch for. The standard warnings (global variables, undocumented arguments, missing imports) have known fixes; LLMs handle them well. Less common warnings (invalid CITATION format, vignette engine issues) get mixed answers; verify against the R Packages book or the CRAN policies.
Verification. Apply the fix and re-run check(). If the warning persists, look up the message in the R Packages book.
Prompt 3: deciding Imports vs Suggests. Describe how you use a dependency (e.g., ‘I call ggplot2::ggplot inside one of my exported functions; I also use it in a vignette’). Ask: ‘should this be in Imports or Suggests?’
What to watch for. The rule is ‘Imports if exported code uses it’. Vignette-only or test-only dependencies go in Suggests. The LLM should know this; if it hesitates, push for the rule.
Verification. Try installing the package on a fresh R session without the dependency installed. If library(yourpkg) fails, the dependency belongs in Imports.
19.15 Principle in use
Three habits define defensible package development:
- Use usethis for scaffolding. Hand-creating the package skeleton is error-prone and wastes time. usethis encodes the canonical layout and conventions.
- Document every exported function. A roxygen header with @param, @return, and a runnable @examples is the contract with users.
- Aim for a clean R CMD check. No errors, warnings, or notes. The output of check() is the first thing CRAN reviewers (and any thoughtful collaborator) look at.
19.16 Exercises
- Create a package phb228utils with a single function summarise_numeric() from chapter 1. Document it, add an example, and confirm that ?summarise_numeric works after devtools::document() and devtools::load_all().
- Add a dependency on dplyr to phb228utils. Decide whether it belongs in Imports or Suggests and justify your choice.
- Build the package as a .tar.gz with devtools::build() and install it on a clean R session. Verify it works.
- Run devtools::check() on the package. Fix every warning and note until the output is clean.
- Write a vignette demonstrating summarise_numeric() on a real dataset. Build the vignette and access it via vignette().
19.17 Further reading
- (Wickham & Bryan, 2023), R Packages, 2nd ed., the canonical reference. Free at r-pkgs.org. Tracks the modern usethis/devtools workflow.
- The usethis and devtools package documentation.
- Writing R Extensions (the official R-core manual) for the technical reference; usually a last resort, but authoritative when conflicts arise.
19.18 Practice test
The following multiple-choice questions exercise the chapter’s content. Attempt each question before expanding the answer.
19.18.1 Question 1
What are the two absolutely essential files/directories required for a minimal functional R package?
- DESCRIPTION file and man/ directory
- DESCRIPTION file and R/ directory
- NAMESPACE file and R/ directory
- R/ directory and tests/ directory
B. DESCRIPTION provides metadata; R/ contains source code. All other components (NAMESPACE, man/, tests/) are either auto-generated or optional.
19.18.2 Question 2
What is the primary function of the DESCRIPTION file?
- To contain the actual R function code
- To store example datasets
- To provide essential package metadata including dependencies, author information, and version numbers
- To automatically generate help documentation
C. DESCRIPTION stores package metadata including Imports, Suggests, Depends, Author, Version, and License.
19.18.3 Question 3
What is the key distinction between packages listed under Imports versus Suggests?
- Imports lists packages required for core functionality; Suggests lists optional packages for enhanced features
- Imports refers to newer packages; Suggests refers to older, deprecated ones
- Imports indicates packages from CRAN; Suggests indicates packages from GitHub
- There is no meaningful difference
A. Imports packages are mandatory dependencies; Suggests packages are used only for optional features (vignettes, tests, examples) and need not be installed by default.
19.18.4 Question 4
You add @export above one function and not above another. What does this mean?
- Only the @exported function is part of the package’s public API; the other is internal.
- Both are public; @export is a stylistic choice.
- The non-exported function is broken.
- The exported function is loaded faster.
A. Internal functions are accessible only via pkg:::fn() and may change without notice; exported functions are the package’s stable public API.
19.18.5 Question 5
After modifying a roxygen header above a function, you should next:
- Edit man/function.Rd directly to match.
- Run devtools::document() to regenerate the .Rd file and NAMESPACE.
- Manually update DESCRIPTION.
- Restart R.
B. devtools::document() regenerates documentation and the namespace from roxygen comments. Hand-editing .Rd files is wrong; they will be overwritten.
19.19 Prerequisites answers
- The DESCRIPTION file (metadata) and the R/ directory (source code). Everything else (NAMESPACE, man/, tests/, data/, vignettes) is either auto-generated or optional. The minimum viable package is one function in R/foo.R plus a minimal DESCRIPTION.
- DESCRIPTION stores package metadata: name, version, title, description, author, license, and, crucially, dependencies (Imports, Suggests, Depends). It is the file that distinguishes a package from a directory of R scripts.
- Imports lists packages required for core functionality (they must be installed for the package to work). Suggests lists packages used only for optional features (vignettes, tests, examples) and need not be installed by default. The rule of thumb: if your exported code calls pkg::fn(), the package goes in Imports. If only your tests, vignettes, or optional features use it, it goes in Suggests.