This training is designed as an introduction to making and developing R packages which are important to reproducible ways of working. You should first have completed the following training sessions (or reached an equivalent standard to having done so):
- Introduction to using R on the Analytical Platform
- Introduction to R
- Writing functions in R
- Introduction to Git and GitHub
You must also have completed steps 1 to 4 and 6 of the MoJ Analytical Platform quickstart guide, making sure you can access RStudio from the control panel. If you have any issues, please post them in the appropriate Slack channel (either #ask-operations-engineering or #intro_r).
You will also require access to the S3 bucket alpha-r-training. You can request access in the #intro_r Slack channel.
Using two screens (e.g. your laptop plus a monitor) during the training session might be useful to enable you to watch the session on one and code on the other.
Recordings of these sessions can be viewed via links provided in the Analytical Platform and related tools training section on R training. If you have any access problems please contact [email protected].
- Section 1 - Introduction
- Section 2 - Package scope and naming
- Section 3 - Package structure
- Section 4 - Create the package
- Section 5 - Copyright and licensing
- Section 6 - Package metadata
- Section 7 - Checking your package
- Section 8 - Adding functions
- Section 9 - Making functions work in a package
- Section 10 - Documenting functions
- Section 11 - Testing your code
- Section 12 - Add a README
- Section 13 - Add a NEWS file
- Section 14 - Managing releases of your package
- Section 15 - Installing and using your package
- Section 16 - Maintenance cycle
- Annex
This training is based on the book R Packages by Hadley Wickham and Jennifer Bryan. The goal of it is to teach you how to make and develop packages. R packages are not difficult to make and have several benefits:
- Packages have a standard structure and are easy to install.
- Documentation is included with the code.
- Packages facilitate the integration of unit testing.
- Code changes can be clearly tracked via package versioning.
These benefits together improve the reliability, reusability and sharability of code, and give you the confidence to update it without the fear of unknowingly breaking something.
This training is designed with exercises to enable you to develop a package. Your example package will include functions to fetch data from S3 and build a simple tabulation like those found in many publication tables, MI packs, etc. A preview of the data we will be using is given below:
Rows: 6,000,000
Columns: 3
$ year <int> 2004, 2005, 2004, 2002, 2002, 2000, 2002, 2000, 2005, 2001, 2003, 2003, 200…
$ month <chr> "December", "June", "September", "August", "April", "May", "April", "March"…
$ crime <chr> "Crime C", "Crime A", "Crime B", "Crime B", "Crime C", "Crime C", "Crime C"…
Before you start developing a package there are two questions to consider: "what will your package contain?" (the scope) and "what will you call it?" (the name).
You could put every function you ever write into one package, but this would quickly become difficult to maintain, especially if it resulted in a large number of dependencies. Instead it is better to group your functions into thematically similar activities. For example, the {forcats} package contains functions for working with categorical data and factors, and the {stringr} package contains functions for working with strings and regular expressions.
Some packages may contain generalized functions (on a particular theme) that have a broad spectrum of applications e.g. the {psutils} R package. Others may contain very specialized functions that are only used as part of one process e.g. the {pssf} R package.
It is also worth considering whether your functions might fit within an existing package rather than starting a new one.
Possibly the hardest part of creating a package is choosing a name for it. This should:
- be short
- be unique (for Google searches)
- be made of ASCII letters, numbers and "." only (it must start with a letter)
- not use a mixture of upper and lower case letters (this makes the name hard to remember)
- if possible be clear about what the package does i.e. reflect the scope
You can read more in the R Packages section Name your package
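The character and case rules above can be sketched as a small check. This is only an illustration: is_sensible_package_name() is a hypothetical helper (not part of {usethis} or {devtools}) and it encodes only the rules listed in this section, not every rule that R itself applies to package names.

```r
# Hypothetical helper: check candidate names against the rules listed above
is_sensible_package_name <- function(name) {
  # ASCII letters, numbers and "." only; must start with a letter
  valid_chars <- grepl("^[A-Za-z][A-Za-z0-9.]*$", name, perl = TRUE)
  # all lower case or all upper case, never a mixture
  single_case <- name == tolower(name) | name == toupper(name)
  valid_chars & single_case
}

# "abdemo" passes; the others break the case, first-character
# and permitted-character rules respectively
is_sensible_package_name(c("abdemo", "ABDemo", "2fast", "my_pkg"))
```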
- 2.1 Decide what name to call your package (something like your initials or name combined with "demo", "eg", or "toy" might be appropriate for this training). Make sure you respect the constraints on permitted characters!
- 2.2 Create a new GitHub repository (Analytical Platform User Guidance), giving it your chosen name and "internal" visibility. Add a .gitignore file (using the "R" template) but not a license or README at this stage.
- 2.3 Clone the repo (Analytical Platform User Guidance) as an RStudio project.
R packages have a standard structure. The following components must be included (either because they are essential package components or because they are essential parts of the development and maintenance process).
- R/ - A folder where functions are saved. (This is for package code only; if you are making notes during the training, don't save them here!)
- man/ - A folder for documentation.
- tests/ - A folder for {testthat} infrastructure and testing scripts.
- .Rbuildignore - A file that allows certain paths to be ignored when the package is built (R Packages book).
- DESCRIPTION - A file containing package metadata.
- NAMESPACE - A file containing exported and imported variable names.
- LICENSE and/or LICENSE.md - A file or files with information about how the code can be used.
- NEWS - A file that acts as a changelog so returning users can quickly see what has changed between different versions of the package.
- README - A file or files that covers how to install the package and a guide for first time users.
Some packages may have other components (R Packages book); a few common ones that you may want to use are listed below:
- inst/ - A folder for "other files" e.g. markdown templates.
- data/ - A folder for data (nothing sensitive!) in .rda format that are available as part of the package e.g. for demonstrating functionality. Each data set should be documented in a similar way to functions.
- data-raw/ - A folder for preserving the creation history of your .rda file (must be added to the .Rbuildignore). This could also contain CSV versions of small data files used in testing code.
- 3.1 Take a look at the structure of a GitHub repo which contains an R package e.g. {stringr} or {dplyr} and see if you can recognise the structure and elements described above.
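As a rough illustration of the standard structure, the sketch below reports which of the components listed above are present in a folder. check_package_structure() is a hypothetical helper written for this example, and the exact file names it looks for (such as NEWS.md and README.md) are assumptions.

```r
# Hypothetical helper: report which standard components exist in a folder
check_package_structure <- function(path = ".") {
  components <- c(
    "R", "man", "tests", ".Rbuildignore", "DESCRIPTION",
    "NAMESPACE", "LICENSE", "NEWS.md", "README.md"
  )
  found <- file.exists(file.path(path, components))
  stats::setNames(found, components)
}

# Demonstrate on a throwaway folder containing only R/ and DESCRIPTION
demo_dir <- file.path(tempdir(), "demo-package")
dir.create(file.path(demo_dir, "R"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(demo_dir, "DESCRIPTION"))
check_package_structure(demo_dir)
```

In practice devtools::check() performs far more thorough checks than this; the sketch is only meant to make the list of components concrete.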
The default branch of an R package GitHub repo must be reserved for working releases of the package.
Always make your changes on a different branch then merge to the default branch for each release.
You should also add protections to your main branch (GitHub Docs article) to shield it from accidental pushes. (We will skip this step in the training for speed but it is very important for production code.)
- 4.1 Create a new git branch called dev in RStudio where we will begin building the package.
There are several R packages that contain tools to help ensure your package is set up in the correct format and aid development by automating common tasks. The two we will be using today are {devtools} and {usethis}.
- 4.2 Using install.packages(), install the {devtools} and {usethis} packages. If you are using R < 4.4.0 on the AP please review appendix A3 first.
The following {usethis} function will structure your current working directory as an R package (you will need to overwrite what is already there when prompted):
usethis::create_package(getwd())
This will create several of the files and folders discussed at the start of the package structure section.
- 4.3 Set up your project as a package using usethis::create_package(getwd()). You will be asked if you want to overwrite the existing .Rproj file. You do!
- 4.4 Which standard package elements have been created?
Licensing code is essential as it sets out how others can use it. You can read more about licensing (R Packages book). The work-product of civil servants falls under Crown copyright (archived article) and usually requires an Open Government Licence, but for open source software we have the option to use other open source licences (archived article). The MIT licence (open source initiative article) is the MoJ preferred choice (Analytical Platform User Guidance) and can be added to your package using:
usethis::use_mit_license("Crown Copyright (Ministry of Justice)")
This will add two text files to the top level of your project, LICENSE and LICENSE.md. It will also update the relevant section in the DESCRIPTION file and update the .Rbuildignore file.
- 5.1 Add an MIT licence to your package
The DESCRIPTION file (R Packages book) contains important metadata about the package; it is a text file that you can open and edit in RStudio. You can view as an example the amended psutils package DESCRIPTION file. The formatting is important. Each line consists of a field name and a value, separated by a colon. Where values span multiple lines, they need to be indented. In particular:
- Title: - a one line description of the package - keep this short, with suitable use of capitals and less than 65 characters.
- Version: - the package version. This must be amended when you update the package. Use Semantic Versioning (see below)
- Authors@R: - the package authors and their roles (more info below)
- Description: - a one paragraph summary of the package
- License: - licensing information (this will have been automatically updated when you added the licence with {usethis}).
- Imports: - all the other packages that your package uses for basic functionality. You can specify a minimum or maximum version in brackets after the name.
- Suggests: - packages that are not required for basic functionality but allow enhanced features such as vignettes or are useful during package development.
- Remotes: - if your package depends on another one that is not on CRAN, this is where you specify how to find it.
- Depends: - this is where you list a minimum version of R if you are aware of one. For example if you are using the R native pipe (|>) in your package you would need to specify R (>= 4.1.0).
- 6.1 Add a package title to the relevant field in the DESCRIPTION file.
- 6.2 Add a package description to the relevant field in the DESCRIPTION file.
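To make the "field: value" layout and the indentation of multi-line values concrete, the sketch below writes a made-up DESCRIPTION to a temporary file and reads it back with base R's read.dcf(). The package name "abdemo", the title, the description text and the {dplyr} minimum version are all invented for illustration.

```r
# A made-up DESCRIPTION for a hypothetical package called "abdemo"
description_text <- c(
  "Package: abdemo",
  "Title: Tabulate Synthetic Crime Data",
  "Version: 0.1.0",
  "Description: Fetches synthetic crime data and builds a simple",
  "    tabulation by month.",
  "Imports:",
  "    dplyr (>= 1.1.0),",
  "    tidyr",
  "Depends:",
  "    R (>= 4.1.0)"
)
description_file <- tempfile()
writeLines(description_text, description_file)

# read.dcf() parses the "field: value" format used by DESCRIPTION files;
# indented lines are treated as continuations of the previous field
fields <- read.dcf(description_file)
colnames(fields)
fields[1, "Version"]
fields[1, "Imports"]
```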
Package authors are supplied as a vector of persons i.e. c(person(...), person(...)). In addition to a given name, family name, and an email, each person should have a role specified. More information can be found by running ?person but the four most common roles are detailed below (multiple roles should be combined with c()):
- aut: authors; those who have made significant contributions to the package.
- ctb: contributors; those who have made smaller contributions, like patches.
- cre: the package maintainer; the person you should contact if you have a problem.
- cph: copyright holder; most likely person("Crown Copyright (Ministry of Justice)", role = "cph")
- 6.3 Add yourself to the DESCRIPTION file as the author and maintainer of the package.
- 6.4 Add the relevant copyright holder.
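A vector of persons of the kind described above might look like the sketch below. The name and email address are placeholders; in the DESCRIPTION file the same expression goes after the Authors@R: field name.

```r
# Placeholder details for a hypothetical author who is also the maintainer
authors <- c(
  person(
    given = "Ada",
    family = "Example",
    email = "ada.example@example.com",
    role = c("aut", "cre") # multiple roles combined with c()
  ),
  person("Crown Copyright (Ministry of Justice)", role = "cph")
)

# Printing shows each person with their roles in square brackets
authors
```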
Semantic Versioning is a version control paradigm which uses a major.minor.patch system to communicate what type of changes occur between versions. A "major change" will increment the major number, a "minor change" will increment the minor number and a "patch change" will increment the patch number. The type of version change is linked to the type of code changes you make. The full Semantic Versioning specification is worth reading and learning (especially points 2-8) but a basic summary for now:
- You must not change your package without also changing the version number.
- If your code update contains any backwards incompatible (breaking) changes e.g. removing/renaming a function, changing an argument name, etc you must implement a major version change.
- If your code update contains any backwards compatible new features e.g. adding a new function, etc you must implement at least a minor version change.
- If your code update only contains backwards compatible changes e.g. refactoring code, bug fix, etc this would be a patch version change.
- Before version 1.0.0 any type of changes can occur at any point (the normal rules don't apply to allow rapid development).
- Once your package is in use, the version should probably be at least 1.0.0.
- Incrementing a number sets those to the right of it to zero e.g. a major change from version 1.2.3 would take you to version 2.0.0; a minor change from 0.1.3 would take you to 0.2.0.
- 6.5 Amend the DESCRIPTION file to set the package version number to "0.1.0".
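The incrementing rule (raise one number and reset those to its right to zero) can be sketched in a few lines of base R. bump_version() is a hypothetical helper written for this illustration; it assumes a plain major.minor.patch string with no pre-release suffix.

```r
# Hypothetical helper: increment a major.minor.patch version string
bump_version <- function(version, type = c("major", "minor", "patch")) {
  type <- match.arg(type)
  parts <- as.integer(strsplit(version, ".", fixed = TRUE)[[1]])
  stopifnot(length(parts) == 3L, !anyNA(parts))
  position <- match(type, c("major", "minor", "patch"))
  parts[position] <- parts[position] + 1L
  # numbers to the right of the incremented one are reset to zero
  parts[seq_along(parts) > position] <- 0L
  paste(parts, collapse = ".")
}

bump_version("1.2.3", "major") # "2.0.0"
bump_version("0.1.3", "minor") # "0.2.0"
```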
The Imports and Suggests fields are used for dependency management for your package and development processes. You want to be as permissive as possible when specifying minimum or maximum versions of packages listed in Imports and Suggests, to increase the compatibility of your package with others. If you know that your code relies on functionality added in a particular version of a package you must specify the minimum version; otherwise don't specify one.
Any package that your code relies upon for core functionality should be listed in the "Imports" section. The "Suggests" section is for packages that are used in the development process or give extra optional functionality.
There is a tool in {usethis} for adding packages to the DESCRIPTION file. It will check if the package is installed before adding it so is useful for catching spelling mistakes!
By default, packages are added as Imports e.g. to add {dplyr} as an import: usethis::use_package("dplyr"). You can use the type argument to add them to Suggests instead e.g. to add {devtools} as a suggested package: usethis::use_package("devtools", type = "Suggests").
- 6.6 Add {devtools} and {usethis} to the Suggests field.
- 6.7 We will be using the R native pipe so set the minimum version of R as >= 4.1.0 in the Depends field:
usethis::use_package("R", type = "Depends", min_version = "4.1.0")
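Because packages listed in Suggests are not guaranteed to be installed on a user's machine, code that uses them for optional functionality usually checks for them first. The sketch below shows one common pattern using base R's requireNamespace(). The function render_table() and the choice of {knitr} as the suggested package are assumptions made for illustration only.

```r
# Hypothetical function with an optional enhancement from a suggested package
render_table <- function(df) {
  if (requireNamespace("knitr", quietly = TRUE)) {
    # enhanced output when the suggested package is available
    knitr::kable(df)
  } else {
    # plain fallback that needs nothing beyond base R
    utils::capture.output(print(df))
  }
}

render_table(head(iris, 3))
```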
Packages require that the right files and the right information are in the right places. A small
mistake can prevent the package from functioning as intended. Many package features can be checked
using the function devtools::check(). It runs a series of checks that examine (among other things)
package structure, metadata, code structure, and documentation. You can read more about the
individual checks (R Packages book). Any issues that are
identified will be labeled as "errors", "warnings" or "notes". Errors and warnings must be fixed.
Occasionally it is acceptable to leave a "note" but usually these should be fixed too.
- 7.1 Run devtools::check() - there should be no errors, warnings or notes.
- 7.2 If all the checks pass, commit (to the git version history) and push (to GitHub.com) the DESCRIPTION file.
- 7.3 If all the checks pass, commit and push both licence files.
- 7.4 If all the checks pass, commit and push any changes to the .Rbuildignore, .gitignore and .Rproj files.
As you can take the Writing functions in R training course (GitHub repository), we will skip function development in this course.
We are going to include two functions in our example package: one that builds a tabulation of data and another that fetches some data from S3 before building the tabulation. The functions omit things like data validation and error handling that you should include in real production code.
In a package, functions must be saved in .R files in the R/ folder. You can have multiple functions in a single script (see suggestions about how to organise your functions (R Packages book)) but we will use one function per file for this exercise.
wrangle_data <- function(df, pub_year) {
  df |>
    dplyr::filter(.data$year == pub_year) |>
    dplyr::mutate(
      month_fct = forcats::fct(.data$month, month.name)
    ) |>
    dplyr::group_by(.data$crime, .data$month_fct, .drop = FALSE) |>
    dplyr::count() |>
    tidyr::pivot_wider(names_from = "month_fct", values_from = "n", values_fill = 0)
}
assemble_crime_data <- function(path, year) {
  path |>
    arrow::read_parquet() |>
    wrangle_data(pub_year = year)
}
- 8.1 Copy each function to a new R script and save it in the R/ folder. The function name is probably an appropriate name for each file.
- 8.2 Run devtools::check() - you will get a warning about undeclared imports and a note about an "undefined global function or variable". We will deal with these in the next section.
While the format of code inside a package is very similar to "normal R code", it is vital to properly reference functions that you are using from other packages. You must never use library(), require() or source() calls inside a package; instead you should use package::function() syntax. You can read more about properly referencing functions (R Packages book).
In some instances it is better to import a function from the relevant namespace (more on this later).
Because packages like {dplyr} use "tidy evaluation" we need to make some changes to the code when including it within packages. To find out more, read the Programming with dplyr article. In the wrangle data function we get around the use of unquoted column names by including the .data "pronoun". For example, outside of a package context iris |> dplyr::filter(Species == "setosa") is valid syntax and Species will be interpreted as the name of a column in the data frame iris via "tidy evaluation". In a package context, however, the package checks treat Species as an ordinary object name (and probably the name of an object without a definition). This will cause the checks on the package to raise a note about an undefined global variable.
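A minimal sketch of the pattern is given below, assuming {dplyr} is installed. filter_species() is a hypothetical function written for this illustration; it combines package::function() syntax with the .data pronoun in the same way as the wrangle data function.

```r
# Hypothetical package-style function: keep the rows for one species.
# .data$Species makes it explicit that Species is a column of df,
# while species (lower case) is an ordinary function argument.
filter_species <- function(df, species) {
  df |>
    dplyr::filter(.data$Species == species)
}

filter_species(iris, "setosa") |> nrow()
```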
- 9.1 Have a look at the use of package::function() syntax in the functions.
- 9.2 Have a look at the use of the .data pronoun in the wrangle data function.
- 9.3 Add {arrow}, {dplyr}, {forcats} and {tidyr} to the Imports field of the DESCRIPTION file (install the packages if prompted to).
- 9.4 Commit and push the changes to the DESCRIPTION file.
- 9.5 Run devtools::check() - you will still be getting the note about .data - we will deal with this in the next section.
Documentation is really important so users know how to use the package, and package managers and developers can quickly get up to speed. It should therefore be embedded within the package in such a way that it is easily available to all users.
We can include "roxygen comments" with our functions to provide documentation that can be automatically knitted into help files. Roxygen comments are denoted by a hash and a single quotation mark followed by a space (#' ). Comments can then be labeled with a tag, which is a string starting with @ e.g. @title would be the tag for the help file's title.
A set of roxygen comments for the assemble crime data function is given below.
#' @title Assemble Crime Data
#' @description Fetch crime data from a specified path and tabulate ready for publication.
#' @param path A string. The path or S3 URI to the parquet file containing the data.
#' @param year The year of the publication.
#' @export
#' @examples
#' assemble_crime_data(
#'   "s3://alpha-r-training/r-package-training/synthetic-crime-data.parquet",
#'   year = 2000
#' )
As a minimum, for each function exported for users of your package you should include:
- @title - the title for the help file
- @description - a description of what your function does
- @param - one for each argument in your function (note that the name of the parameter comes after the tag, followed by another space before the text describing the parameter)
- @examples - sufficient examples for users to get started with your function (most people will probably look at the examples before reading the text!)
There is a special tag @export which indicates that the function should be added to the NAMESPACE of your package. This means it will be accessible to users of your package, and using the @export tag will also trigger the generation of a help file. Any functions that are for internal package use only should not be tagged with @export.
There is another special tag @importFrom that can be used to import functions and methods etc from the NAMESPACE of other packages. The use of this should be reserved for things like operators and functions that are always nested inside other functions (for example aes() from {ggplot2}) and pronouns where the use of :: syntax is either invalid or makes the code hard to read.
Once we have added our roxygen comments we can use devtools::document() to generate the help files. These will be saved in the man/ folder. You will also see that the function is now listed in the NAMESPACE file. (Note that devtools::document() is also run as part of devtools::check().)
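To show how the roxygen comments sit directly above the function they document, here is a complete sketch for a small hypothetical function, add_one(), which is not part of the training package. It uses the tags discussed above plus @return, an additional roxygen tag for describing the value a function returns.

```r
#' @title Add One
#' @description Add one to each element of a numeric vector.
#' @param x A numeric vector.
#' @return A numeric vector the same length as `x`.
#' @export
#' @examples
#' add_one(1:3)
add_one <- function(x) {
  x + 1
}
```

Saved as a script in the R/ folder of a package, running devtools::document() on this would be expected to produce a help file for add_one in man/ and an export entry in the NAMESPACE.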
- 10.1 Copy the roxygen comment chunk above and paste it in the relevant script, directly above the assemble crime data function.
- 10.2 Run devtools::document() - you will now see a file in man/ and a change to the NAMESPACE
- 10.3 Run devtools::load_all() followed by ?assemble_crime_data to view the help file generated from the roxygen comments
- 10.4 Add roxygen comments for the wrangle data function (we can skip adding an example to speed up the training course)
- 10.5 Run devtools::document() - you will see another file in man/ and another function added to the NAMESPACE
- 10.6 Add the following as an additional roxygen comment to the wrangle data file:
#' @importFrom dplyr .data
- 10.7 Run devtools::document() - you will see a new line in your NAMESPACE file that makes dplyr's .data available for use in your package. This syntax should also be used for things like operators
- 10.8 Run devtools::check()
- 10.9 When all checks pass, commit and push the R scripts containing the functions, the man/ files and the NAMESPACE file.
You have written (in this case been given) some code but how do you know that it is actually doing
what you intended? You might use devtools::load_all()
to load your package and then try the
functions to see if they give the expected output. This works but every time you need to test your
functions (e.g. if any changes are made to your code base or if there are changes in your
dependencies) you will need to re-create the inputs to the function and re-write the code. This
quickly makes testing a very time consuming process.
We can instead formalize this testing process (and automate the running of it) using the {testthat} R package. When we run the function usethis::use_testthat() it will:
- Add testthat (>= 3.0.0) to the Suggests field in the DESCRIPTION file.
- Create a tests/ folder, inside of which is a testthat/ folder, where your R test scripts should be placed, and a testthat.R file which helps in automating the testing.
- 11.1 Run usethis::use_testthat() to set up the testing infrastructure.
- 11.2 Navigate to the script containing the assemble crime data function and in the console run: usethis::use_test(). This will open a new script which is saved in tests/testthat/. The script will have the same name as the function script but will have a test- prefix. An example test will be given.
The {testthat} tests contain two elements, the name of the test and one or more expectations. A test will fail if at least one expectation is not met or if there is an unexpected error.
You can have multiple tests for a single function so the name of the test is important for identifying which test failed (when it fails). The test name should therefore contain information about what you are testing i.e. the function name and what specific behavior you are testing. Each test should always have a unique name within a package to avoid wasting time debugging the wrong test!
Expectations ({testthat} reference) are a series of functions that check for the presence or absence of specific values or properties in function outputs or their side effects.
- 11.3 Have a look at the {testthat} reference to see some of the pre-built expectations
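The anatomy of a test (a descriptive name plus one or more expectations) can be sketched with a small standalone example, assuming {testthat} is installed. add_one() is a hypothetical function defined here only so there is something to test. The library() call is present so the sketch runs as a standalone script; test files inside a package do not need it.

```r
library(testthat)

# Hypothetical function under test
add_one <- function(x) {
  x + 1
}

# The name says which function and which behaviour is being tested
test_that("add_one increments each element by one", {
  # the returned values are as expected
  expect_equal(add_one(1:3), c(2, 3, 4))
  # adding the double 1 to an integer gives a double
  expect_type(add_one(1L), "double")
  # non-numeric input raises an error
  expect_error(add_one("a"))
})
```

If any one of the three expectations were not met, the whole test would be reported as failing under its name.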
Some tests for the assemble crime data function are given below. We are checking that when a valid path (and year) are supplied we get a data frame and no warnings are generated. We are not worried about testing the content of the data frame here as that is controlled by the wrangle data function. We will cover that with the tests for that function.
Due to the absence of bespoke error handling/ input checking in the function, and time constraints
when running the training, we are largely ignoring the year
argument in the assemble crime data
function. Furthermore, for "real" production code it would probably be safer/simpler to have
separate functions for "getting a data frame into R" and "doing stuff to the data frame" rather
than just relying on one that combines both elements. Structuring it like this for the training is
useful for conveying particular points in the training.
Additionally, we are checking that when an invalid path is used we get an error.
test_that("assemble_crime_data works with valid path", {
  uri <- "s3://alpha-r-training/r-package-training/synthetic-crime-data.parquet"
  assemble_crime_data(uri, year = 2000) |> expect_s3_class("data.frame")
  assemble_crime_data(uri, year = 2001) |> expect_no_warning()
})
test_that("assemble_crime_data fails with invalid path", {
  assemble_crime_data("foo", year = 2001) |> expect_error()
})
- 11.4 Copy the code above to the test file for the assemble crime data function.
- 11.5 Save the test file and run devtools::load_all().
- 11.6 Run devtools::test() - you will get feedback as the tests run about how many have failed, resulted in a warning, or passed.
Test coverage is a metric that can be useful in assessing the adequacy of tests. The {covr} package can be used to examine test coverage. It builds the package and runs the tests in a modified environment counting how many times each line of package code is run by the tests. You should aim to have every line covered by tests but don't rely on coverage alone when assessing the adequacy of tests. When we run the test coverage of our package we will get 100% (the wrangle data function is called by the assemble crime data function) but we are not (yet) properly testing the intended behaviour of the wrangle data function.
Test coverage can be particularly useful where you have if() statements in your code to help you ensure that all the various conditions that can arise have been covered. For example, if the assemble crime data function did something special when the year was set to 2002 those lines would not be covered by our existing tests and this would be revealed by examining the test coverage.
if (year == 2002) {
  message("Happy 2002!")
}
- 11.7 Run devtools::test_coverage() - the first time you run this you might be prompted to install the packages {covr} and {DT}.
- 11.8 Add {covr} and {DT} to the Suggests field in your DESCRIPTION file.
In order to properly test the wrangle data function we probably want to ensure that the following expectations are met in the output data frame:
- The output is a 13 column data frame (one column for crime and twelve for the months)
- The month columns are arranged in chronological order (January to December)
- The data are filtered by pub_year correctly
- The number of rows is the same as the number of unique "crimes" for the target year
We probably don't want to use "real" data when writing tests. By checking specific things like values, number of rows, number of columns etc in the outputs there is a risk of revealing unpublished information. Real data may also be subject to change (potentially causing tests to fail incorrectly). Additionally, real data is likely to be quite large (slowing down the testing process) and contain a lot of noise i.e. elements that are not relevant for testing a specific function.
We will use the following data frame to test the wrangle data function. It contains only the three columns used by the function and two rows. The values for crime are dummy values i.e. not the same as the values used in the "real" data, but that difference is not important for testing whether the function works.
testing_df <- data.frame(
  crime = c("foo", "bar"),
  year = 2000:2001,
  month = "January"
)
- 11.9 Create a testing file for the wrangle data function.
- 11.10 We will use one test - give it an appropriate name.
- 11.11 Include the testing_df data frame in the test and then add expectations to test the four points listed above.
- 11.12 Run devtools::check() - this will also run the tests alongside the other checks.
- 11.13 If all the checks pass, commit and push the testing files and the DESCRIPTION file.
The README acts as a "quick-start guide" for users of your package. It should include:
- Instructions for installing the package.
- A brief overview of what the package does and how you can get started using it.
- If the package is intended for open collaboration, instructions for how people can get involved.
You can use a simple markdown README or dynamically generate one using R Markdown, which lets you embed code chunks and provides several other extensions useful for writing technical reports. The latter may be preferable if you want to demonstrate what some of your code does. You can add a README with either usethis::use_readme_md() or usethis::use_readme_rmd() depending on the type you want.
- 12.1 Add a markdown README to your package
- 12.2 Update the install instructions to the following: renv::install("[email protected]:moj-analytical-services/PACKAGE.git") (you will need to replace "PACKAGE" with the name of your package). You can also remove the line about installing a "development" version.
- 12.3 Replace the example with the example from the assemble crime data function.
- 12.4 Update the overview of what your package does.
- 12.5 Run devtools::check() - if all the checks pass commit and push the README.
The NEWS markdown file functions as a change-log for your package. It must be updated every time you make changes to your package.
- 13.1 Have a look at the NEWS file for {dplyr} R package - when were inequality joins introduced?
- 13.2 Add a NEWS file to your package (usethis::use_news_md()).
- 13.3 We will not be submitting this package to CRAN so update the bullet point to something like "initial release".
- 13.4 Run devtools::check() - if all the checks pass commit and push the NEWS file.
Congratulations, you have successfully produced a working package in R! Open a pull request and
merge it to the main
branch.
GitHub Releases (GitHub Docs article) are a great way to manage the versions of your package. Every time you release an updated version of your package, include a GitHub release. This way if you ever need an older version of your package it is very easy to install using the GitHub Release Tag.
- 14.1 Open a pull request and merge the dev branch into main (delete the dev branch once it is merged)
- 14.2 Click on the "Releases" section on the Code tab of the GitHub repo for your package.
- 14.3 Click on "Draft a new release"
- 14.4 Fill in the release title with the Semantic Version number of your package
- 14.5 Add a description of the release (the section of your NEWS file pertaining to this version of the package might be appropriate)
- 14.6 Click on "Choose a tag"
- 14.7 The tag should be the Semantic Version number prepended with a lowercase "v" e.g. for version 0.1.0 the tag will be v0.1.0. After typing the tag you will need to click on "Create new tag: ... on publish".
- 14.8 Click on the "Publish release" button
To install a package from a public GitHub repo using renv
you just need the owner and the
repo:
renv::install("moj-analytical-services/mojchart")
The easiest way to install a package from an internal or private GitHub repo is with the following (SSH URL) syntax:
renv::install("[email protected]:moj-analytical-services/mojchart.git")
Note: If your package has any Imports that are from internal or private repos you will also need to use the SSH URL syntax in the Remotes field. For example, the {psutils} package DESCRIPTION file includes {verify} as an import and, as {verify} is another internal package, its SSH URL is specified in the Remotes field of the {psutils} DESCRIPTION file.
With renv >= 0.15.0 you can also include @ref on the end of the URL, where the "ref" is a branch name, commit or GitHub tag, e.g.
renv::install("[email protected]:moj-analytical-services/[email protected]")
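As a rough illustration of how these SSH-style strings are put together, the sketch below builds them from their parts. The helper github_ssh_remote() is hypothetical (it is not part of renv), and the tag v0.1.0 is an assumed example rather than a real release of the package.

```r
# Hypothetical helper (not part of renv): builds an SSH-style remote string,
# optionally pinned to a branch name, commit or tag via "@ref"
github_ssh_remote <- function(owner, repo, ref = NULL) {
  remote <- paste0("git@github.com:", owner, "/", repo, ".git")
  if (!is.null(ref)) {
    remote <- paste0(remote, "@", ref)
  }
  remote
}

# Latest version on the default branch
latest <- github_ssh_remote("moj-analytical-services", "mojchart")

# A specific release; "v0.1.0" is an assumed tag used only for illustration
pinned <- github_ssh_remote("moj-analytical-services", "mojchart", ref = "v0.1.0")

print(latest)
print(pinned)

# Either string could then be passed to renv::install(), e.g.
# renv::install(pinned)
```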
- 15.1 Try installing your completed package in a different repo
- 15.2 Have a look at the help file for the assemble crime data function
- 15.3 Run the example from the assemble crime data function help
You have released your package and have received some feedback from a user - "it would be better if the year was also included in the date column headings".
- 16.1 Switch back to the RStudio project where you are developing your package
- 16.2 Create a new dev branch (if you first need to remove the existing one, run git branch -d dev in the terminal)
- 16.3 Install {renv} and run renv::install(). This function has special behaviour in the presence of a DESCRIPTION file: it will install the packages listed there. This behaviour is bugged in some versions of {renv}. If you get an error message, run renv::install("[email protected]"), restart R (Ctrl+Shift+F10) then try again.
- 16.4 Run devtools::check(). This is to see if any changes in your package's dependencies have broken anything (the effectiveness of this will depend on the quality of your code and testing). Address any dependency-related issues before making further changes.
- 16.5 Add the following as a second argument to the dplyr::mutate() in wrangle_data():
month_fct = forcats::fct_relabel(.data$month_fct, ~ paste(.x, pub_year))
- 16.6 Run devtools::load_all() and devtools::test()
- 16.7 Update the tests as necessary
- 16.8 Update the version number in the DESCRIPTION file
- 16.9 Update the NEWS file
- 16.10 Run devtools::check()
- 16.11 When all tests pass, commit and push the changes
- 16.12 Open a pull request, merge to main and generate a new GitHub release
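To see what the change in step 16.5 does to the month labels, here is a minimal sketch using base R only. It assumes month_fct is a factor of month names and pub_year is a single year; the sample values are hypothetical. Assigning to levels() should behave like forcats::fct_relabel() in this simple case, but it is shown as an illustration, not as a replacement for the code in the package.

```r
# Hypothetical inputs: a month factor and the publication year
month_fct <- factor(
  c("January", "February", "January"),
  levels = month.name[1:2]
)
pub_year <- 2000

# Base R counterpart of
# forcats::fct_relabel(month_fct, ~ paste(.x, pub_year)):
# apply the relabelling to the levels, leaving the underlying values alone
relabelled <- month_fct
levels(relabelled) <- paste(levels(relabelled), pub_year)

print(levels(relabelled))
print(as.character(relabelled))
```

Because the labels now include the year, any existing test that compares column names against month.name would be expected to need updating in step 16.7.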
Continuous integration is about automating software workflows. An automated workflow can be
set up so that when you or someone else pushes changes to github.com, tests are run to
ascertain whether there are any problems. These checks should include the unit tests you've
developed and also the R CMD tests (over 50 individual checks for common problems) carried
out when you run devtools::check().
Before setting up this automation, you should have fixed any problems identified by running the R CMD tests - see Section 7 - Checking your package.
To set up continuous integration using GitHub Actions:
usethis::use_github_actions()
This automatically puts a status badge in your README.
You can read further about automating checking in R Packages Automated Checking chapter.
test_that("wrangle_data works", {
  testing_df <- data.frame(
    crime = c("foo", "bar"),
    year = 2000:2001,
    month = "January"
  )
  out_df_1 <- testing_df |> wrangle_data(pub_year = 2000)
  out_df_1 |> ncol() |> expect_equal(13)
  out_df_1 |> names() |> tail(12) |> expect_equal(month.name)
  out_df_1$crime |> expect_equal("foo")
  out_df_2 <- testing_df |> wrangle_data(pub_year = 2001)
  out_df_2$crime |> expect_equal("bar")
})
Most R packages you install come from CRAN (The Comprehensive R Archive Network)
which stores them on a series of mirrored servers that act as package repositories.
For versions of R before 4.4.0, the Analytical Platform is set up to use a fixed R
package repository by default. Depending on the version of R on the Analytical
Platform you are using, this may be fairly old. Run options("repos") in the
console and look at the date at the end to see which version you are using. To
access the latest versions of packages you can use the following to update
where you install from (this will reset when R is restarted):
options(repos = "https://packagemanager.rstudio.com/all/__linux__/focal/latest")
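If you want to switch back without restarting R, one possible approach is to keep the value that options() returns and restore it afterwards. This is a sketch only: naming the repository CRAN is a common convention assumed here, not something the Analytical Platform requires.

```r
# Repository URL taken from the text above
latest_repo <- "https://packagemanager.rstudio.com/all/__linux__/focal/latest"

# options() returns the previous values invisibly, so they can be restored later.
# Naming the entry "CRAN" is an assumption made for this sketch.
old_opts <- options(repos = c(CRAN = latest_repo))

# Confirm which repository will now be used for installs
current_repo <- getOption("repos")[["CRAN"]]
print(current_repo)

# Restore the original setting without restarting R
options(old_opts)
```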