Package management in R is a fairly straightforward process. The library function attaches a package to the search path, which makes all of its exported functions (and other objects, such as datasets) available to call directly.
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
Now that we have loaded dplyr, the function mutate can be called directly from our session.
mutate
function (.data, ...)
{
mutate_(.data, .dots = lazyeval::lazy_dots(...))
}
<environment: namespace:dplyr>
head(mutate(iris, SPECIES = toupper(Species)))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species SPECIES
1 5.1 3.5 1.4 0.2 setosa SETOSA
2 4.9 3.0 1.4 0.2 setosa SETOSA
3 4.7 3.2 1.3 0.2 setosa SETOSA
4 4.6 3.1 1.5 0.2 setosa SETOSA
5 5.0 3.6 1.4 0.2 setosa SETOSA
6 5.4 3.9 1.7 0.4 setosa SETOSA
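To see what library actually changes, it helps to look at the search path before and after attaching something. A minimal sketch, assuming a fresh R session; it uses the base package tools purely as a stand-in for dplyr, because tools ships with R but is not attached at startup. The variable names are illustrative.

```r
# Record the search path before attaching anything new
before <- search()

# tools is installed with every R distribution but not attached by default
library(tools)

# library() inserted the package at position 2, right after the global environment
after <- search()
added <- setdiff(after, before)
print(added)  # "package:tools" in a fresh session

# Its exported functions can now be called without any prefix
file_ext("report.csv")  # "csv"
```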
Pretty simple, right? If there’s a package containing functions you want to use, just load it with library, and repeat this for every package whose functions you need.
library(plyr)
-------------------------------------------------------------------------
You have loaded plyr after dplyr - this is likely to cause problems.
If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
library(plyr); library(dplyr)
-------------------------------------------------------------------------
Attaching package: 'plyr'
The following objects are masked from 'package:dplyr':
arrange, count, desc, failwith, id, mutate, rename, summarise,
summarize
library(tidyr)
library(magrittr)
Attaching package: 'magrittr'
The following object is masked from 'package:tidyr':
extract
library(ggplot2)
library(foreach)
But if you look closely at some of these messages, you start to see where problems might arise. plyr and dplyr share many function names, and one of the messages explicitly states that the function name extract is shared between magrittr and tidyr. R doesn’t merge or rename anything: the package attached last is searched first, so its objects mask any objects of the same name from packages attached earlier. If you load magrittr after tidyr, extract will come from magrittr, and vice versa. Recently, RStudio’s very own Hadley Wickham has come under fire for the lag function in dplyr. lag happens to be a widely used function from the stats package, which R attaches automatically at startup, whether or not you use RStudio. Once you load dplyr, however, a plain call to lag runs dplyr::lag, and stats::lag is masked. Conflicting function names are not just a nuisance; changes like this to widely used packages can easily break legacy code.
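The masking rule can be reproduced without installing anything. In this sketch, pkgA and pkgB are hypothetical throwaway environments standing in for tidyr and magrittr; each defines a function called extract, and whichever is attached last wins. Note that nothing is actually overwritten: the earlier definition is still reachable if you ask for it explicitly.

```r
# Two fake "packages" that both define extract(); pkgA and pkgB are hypothetical
attach(list(extract = function() "extract from pkgA"), name = "pkgA")
attach(list(extract = function() "extract from pkgB"), name = "pkgB")  # R reports the masking

# Lookup walks the search path from the front, so the most recently attached wins
winner <- extract()

# find() lists every search-path entry that defines the name, in lookup order
where <- find("extract")

# The masked version still exists and can be fetched from its own entry
masked <- get("extract", pos = "pkgA")()

# Clean up the search path
detach("pkgB")
detach("pkgA")

winner  # "extract from pkgB"
where   # "pkgB" "pkgA"
masked  # "extract from pkgA"
```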
In Python, the analogue of library(package.name) is from module import *, as in “I want to load everything from this module into the namespace.” This is often considered bad practice for the same reasons library(package.name) causes problems. Instead, it’s preferred that you load only the objects that will be used: from module import function_1, function_2, dataset_1. If there are too many to list conveniently, it’s best to use import module and then prepend the module name whenever one of its objects is called, e.g., module.function_1(x).
Some consider the R analogue of Python’s import module to be best practice when writing R. In this style, regardless of whether a package is attached, every package-dependent function is prepended with its package name (e.g., dplyr::select, stats::lag, xgboost::xgboost, etc.) to avoid conflicts. The problem is that this makes R scripts very bloated, especially with operators like magrittr::`%>%` or foreach::`%do%`. In Python, a combination of import module and from module import <objects> is used. In general, import module is for when a large number of objects from the module are needed, and from module import <objects> is for when only a select few are needed. import module can further be modified to import module as mod to make prepending less verbose.
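The fully qualified style works whether or not a package is attached, because pkg::name looks the object up among the package’s exports directly. A small sketch in base R, using the tools package as a stand-in for dplyr or xgboost since it is installed with R but not attached at startup; the variable names are illustrative.

```r
# Is tools on the search path before we use it? (FALSE in a fresh session)
attached <- "package:tools" %in% search()

# pkg::name loads the namespace if needed and returns the exported object,
# without attaching the package
ext <- tools::file_ext("report.csv")

# The search path is unchanged afterwards
still_attached <- "package:tools" %in% search()

# :: amounts to asking the namespace for one of its exported values
same <- identical(tools::file_ext, getExportedValue("tools", "file_ext"))

# Operators need backticks once qualified, which is where this style gets verbose
total <- base::`+`(1, 2)

ext             # "csv"
still_attached  # same as before: FALSE in a fresh session
same            # TRUE
total           # 3
```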
So for R, it would be great if we had the option to do something like:
import dplyr as dp
from magrittr import `%>%`, `%<>%`
from ggplot2 import *
iris %>%
dp$filter(Species == 'versicolor') %>%
ggplot() +
geom_histogram(aes(x = Petal.Length))
It turns out that this is possible in R with the base::loadNamespace and import::from functions!
loadNamespace('package') (I won’t prepend this one since it’s from the base package, and the name loadNamespace probably isn’t used elsewhere) returns the package’s namespace, an environment that plays a role similar to a Python module. This environment can be assigned to a variable, which then holds all of the package’s objects, including internal ones that are not exported, and any of them can be called using $. For example,
# first, make sure dplyr is unloaded
detach(package:dplyr, unload = TRUE)
# assign dp as the dplyr environment
dp <- loadNamespace('dplyr')
# dp contains all of dplyr's objects
sample(names(dp), 6)
[1] "Progress" "print.sql_variant" "op_sort.tbl_lazy"
[4] "recode.factor" "select_vars" "arrange"
# for example ...
dp$mutate
function (.data, ...)
{
mutate_(.data, .dots = lazyeval::lazy_dots(...))
}
<environment: namespace:dplyr>
# and it works as expected
head(dp$mutate(iris, SPECIES = toupper(Species)))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species SPECIES
1 5.1 3.5 1.4 0.2 setosa SETOSA
2 4.9 3.0 1.4 0.2 setosa SETOSA
3 4.7 3.2 1.3 0.2 setosa SETOSA
4 4.6 3.1 1.5 0.2 setosa SETOSA
5 5.0 3.6 1.4 0.2 setosa SETOSA
6 5.4 3.9 1.7 0.4 setosa SETOSA
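The claim that the assigned variable holds every object in the package, not only the exported ones, can be checked on a base package. A sketch using stats (attached by default, so there is nothing to detach first); the alias st and the other variable names are illustrative.

```r
# Assign the stats namespace to a short alias, as with dp above
st <- loadNamespace("stats")

# The namespace is an ordinary environment, so $ and ls() work on it
is_ns <- isNamespace(st)
all_objects <- ls(st, all.names = TRUE)
exported <- getNamespaceExports("stats")

# Anything in the namespace that is not exported is internal
internal <- setdiff(all_objects, exported)

# Calling through the alias behaves like calling stats::sd
s <- st$sd(c(1, 2, 3))

is_ns                 # TRUE
length(internal) > 0  # TRUE: internals are reachable through st$ as well
s                     # 1
```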
So in short, assigning loadNamespace('package') to a short name in R is basically the analogue of Python’s import module as mod! This is a nice compromise between keeping code concise and avoiding namespace conflicts: dp$mutate is much easier to swallow than dplyr::mutate, while being just as clear about where the function comes from.
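One thing to keep in mind is that dp$ also reaches dplyr’s internal, non-exported objects, much as the ::: operator does. If you would rather an alias expose only a package’s public functions, one option is to copy the exports into a fresh environment. This is a minimal sketch under that assumption: import_as() is a hypothetical helper written for illustration, not part of base R or of the import package, and stats is used only because it ships with R.

```r
# Hypothetical helper: build an environment holding only a package's exports
import_as <- function(pkg) {
  exports <- getNamespaceExports(pkg)
  # getExportedValue() fetches each public object by name
  values <- lapply(exports, function(name) getExportedValue(pkg, name))
  names(values) <- exports
  # Parent is emptyenv(), so lookups through st$ see only the copied exports
  list2env(values, envir = new.env(parent = emptyenv()))
}

st <- import_as("stats")

st$sd(c(1, 2, 3))                                        # 1
exists(".__NAMESPACE__.", envir = st, inherits = FALSE)  # FALSE: internals are not copied
```

The copied functions still run inside the stats namespace, so they can call internal helpers themselves; only your own access through st$ is limited to the exports.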
But what about operators like `%>%`? Even if we use m <- loadNamespace('magrittr'), having to write m$`%>%` every time we pipe is annoying. It would be nice if there were something like from magrittr import `%>%`. The import package lets us do exactly that.
# first, detach magrittr
detach(package:tidyr, unload = TRUE) # tidyr has to be detached first due to dependencies
detach(package:magrittr, unload = TRUE)
# confirm that %>% is gone
exists('%>%')
[1] FALSE
# load only the pipe operators from magrittr
import::from(magrittr, `%>%`, `%<>%`, `%T>%`)
# confirm that it worked
sapply(c('%>%', '%<>%', '%T>%'), exists)
%>% %<>% %T>%
TRUE TRUE TRUE
# confirm that it works as expected
iris %>% head()
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
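For the curious, the core idea of a selective import can be approximated in a few lines of base R: fetch only the named exports and bind them in the calling environment. This is a hypothetical from_import() helper written for illustration, not necessarily how the import package works internally, and it again uses tools so that it runs on any R installation.

```r
# Hypothetical helper: bind only the named exports of pkg into `into`
from_import <- function(pkg, ..., into = parent.frame()) {
  wanted <- c(...)
  for (name in wanted) {
    # getExportedValue() errors if the name is not exported, catching typos early
    assign(name, getExportedValue(pkg, name), envir = into)
  }
  invisible(wanted)
}

# Roughly: from tools import file_ext, file_path_sans_ext
from_import("tools", "file_ext", "file_path_sans_ext")

file_ext("report.csv")            # "csv"
file_path_sans_ext("report.csv")  # "report"

# Operator names are plain strings here, so no backticks are needed,
# e.g. from_import("magrittr", "%>%", "%<>%") if magrittr is installed
```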