Afterthoughts on a recent Data Carpentry workshop...

I recently co-instructed a Data Carpentry workshop with an awesome group of students. Interestingly, three students were using laptops with Ubuntu Linux (or a derivative) installed, which is both a first and just plain amazing! Unfortunately, there were issues with two applications required for the workshop: the SQLite Browser, and either the R base installation on its own or R base in conjunction with RStudio. Read on to learn more...

SQLite Browser

First off, the SQLite Browser is a nice, lightweight GUI for interacting with SQLite databases. It is supported on all three of the main OSs (Windows, OS X, and Linux) and is free of charge. A common theme with Ubuntu packages for applications under active development is that the most recent stable release far outpaces the version in the Ubuntu repositories, especially on older Ubuntu releases. In the case of SQLite Browser (also known as DB Browser for SQLite) and Ubuntu 18.04 LTS, the default version is 3.10.1. This particular version seems unable to import CSV files correctly: empty values in CSV tables are stored as empty strings instead of NULL, and every column is typed as text rather than given a sensible inferred type (e.g., 20 should be an integer and 20.23 should be a float). The latest stable version of SQLite Browser (3.11.2 as of this posting) does not have these issues. So, how do you get the latest stable version of SQLite Browser? I used the following recipe:

sudo add-apt-repository -y ppa:linuxgndu/sqlitebrowser
sudo apt update
sudo apt install sqlitebrowser
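
To make the expected CSV behaviour concrete (empty fields become NULL, and numbers keep a numeric type), here is a minimal Python sketch using only the standard library. It is not how SQLite Browser itself imports data; the `convert` helper, the `survey` table, and the column names are hypothetical and exist only for illustration.

```python
import csv
import io
import sqlite3


def convert(value):
    """Map a CSV field to None, int, float, or str (tried in that order)."""
    if value == "":
        return None  # empty field -> SQL NULL, not an empty string
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value  # anything non-numeric stays text


# Hypothetical sample data; the second row has an empty plot_id.
SAMPLE_CSV = "species_id,plot_id,weight\nDM,20,20.23\nPF,,7.5\n"

rows = list(csv.reader(io.StringIO(SAMPLE_CSV)))
header, records = rows[0], rows[1:]

conn = sqlite3.connect(":memory:")
# Columns are declared without a type, so SQLite stores each value
# with the type Python hands over (integer, real, text, or null).
conn.execute("CREATE TABLE survey (species_id, plot_id, weight)")
conn.executemany(
    "INSERT INTO survey VALUES (?, ?, ?)",
    [[convert(field) for field in record] for record in records],
)

# typeof() reports the storage class SQLite actually used for each value.
result = conn.execute(
    "SELECT species_id, typeof(plot_id), typeof(weight) "
    "FROM survey ORDER BY rowid"
).fetchall()
null_count = conn.execute(
    "SELECT COUNT(*) FROM survey WHERE plot_id IS NULL"
).fetchone()[0]

print(result)
print(null_count)
```

Running the same two queries against a table imported by SQLite Browser is a quick way to check whether an installed version shows the symptoms described above: a faulty import would report `text` for every column and find no NULL values.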

R base and RStudio

The issues surrounding R are a bit more cryptic, but for some reason the R package tidyverse, which aggregates and extends other R packages (and is required for the workshop module), is fraught with installation and dependency issues on Ubuntu. A simpler approach I have found for installing R for the Data Carpentry requirement is to first install Anaconda (i.e., conda), then install rstudio into either the conda base environment or, even better, a new conda virtual environment (in this example, my virtual environment is named "are"). The many necessary R base packages are installed as dependencies of rstudio, so all is good and simple when you ask conda to install only rstudio:

conda create -n are --no-default-packages
conda activate are
conda install rstudio
rstudio

Although installing the tidyverse R package from within RStudio (see below) still requires additional source package downloads and subsequent compilation (patience here), I have found that the Anaconda dependencies have greater integrity and seem to build consistently without errors, whereas the default Ubuntu R base, or the one installed directly from the Comprehensive R Archive Network (CRAN), seems to be deficient in dependency management.

> install.packages("tidyverse")
> ...
> ...
> ...
> * DONE (tidyverse)
> ...
> library(tidyverse)
── Attaching packages ──────────────────────── tidyverse 1.3.0 ──
✔ ggplot2 3.3.0     ✔ purrr   0.3.3
✔ tibble  2.1.3     ✔ dplyr   0.8.5
✔ tidyr   1.0.2     ✔ stringr 1.4.0
✔ readr   1.3.1     ✔ forcats 0.4.0
── Conflicts ─────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

I do find it a bit odd that the Carpentries recommends installing Python via Anaconda, but not R. The R installation from Anaconda is simple and straightforward to perform, and is my choice for deployment.

Final Thoughts

These recommendations are my own and not those of the Carpentries organization. As always, your mileage may vary.

Note: installation of these packages on Windows and OS X seems to be a bit more successful when following the Carpentries' setup instructions.