ML with R on HPC

Why use R?

  • R has long been the quintessential language for statistics
  • Comes batteries included – tons of datasets and data visualization tools
  • RStudio & Shiny

(minor) Caveats

  • Multiple R versions
  • Bazillion packages to do the same thing
    • Solution: Use packages from the tidyverse if possible
  • Installing packages on HPC clusters can sometimes be non-trivial
  • Conda 😭

Resources for learning ML with R

Setting up R for ML on HPC

  • R and RStudio are already installed on HPC.
  • Highly recommend reading the HPC docs to troubleshoot R package installation issues.

Access RStudio from OOD

Navigate to https://ood.hpc.arizona.edu/. After login, you will see the Open OnDemand dashboard.

Select Interactive Apps, and then from the drop-down menu select RStudio Server.

Fill in the details in the form that opens up, and select Launch.

After the session becomes available, select Connect to RStudio Server.

Examples

For today’s examples, install the palmerpenguins and naivebayes packages.

install.packages(c("palmerpenguins", "naivebayes"))
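Because package installation on a cluster can take a while, it may help to install only what is missing. The following is a minimal sketch, not an official HPC recipe; the CRAN mirror URL and the variable names are assumptions chosen for illustration.

```r
# Packages needed for today's examples
pkgs <- c("palmerpenguins", "naivebayes")

# Check which packages can already be loaded from the current library paths
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
to_install <- pkgs[!installed]

# Install only the missing ones (the mirror below is an assumed choice)
if (length(to_install) > 0) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}
```

Running `.libPaths()` shows where R looks for packages, which is often the first thing to check when an installation fails.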

Incomplete datasets

Realistic datasets, like R’s airquality dataset, often come with missing values.

  • Remove observations with missing entries
  • Fill the missing entries
  • Use models / algorithms that can account for missing entries (e.g. semi-supervised learning when the labels are missing)
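The first two options can be sketched in base R on the airquality dataset. This is an illustrative sketch and not necessarily what data_prelim.R does; filling with the column median is an assumed choice, one of many possible imputation strategies.

```r
# airquality ships with base R; some of its columns contain NA values
data(airquality)

# Count the missing entries in each column
print(colSums(is.na(airquality)))

# Option 1: remove every observation (row) with at least one missing entry
aq_complete <- na.omit(airquality)

# Option 2: fill missing entries with the column median
# (assumed strategy for illustration; the mean or a model-based fill also work)
aq_filled <- airquality
for (col in names(aq_filled)) {
  col_median <- median(aq_filled[[col]], na.rm = TRUE)
  aq_filled[[col]][is.na(aq_filled[[col]])] <- col_median
}

# Compare the number of observations kept by each approach
print(c(original = nrow(airquality),
        removed  = nrow(aq_complete),
        filled   = nrow(aq_filled)))
```

Removing rows is the simplest option but discards data; filling keeps every observation at the cost of introducing values that were never measured.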

Download R script: data_prelim.R

Clustering 🐧

Cluster penguins into groups based on their bill features

Artwork by @allison_horst

  • We will use $k$-means clustering to cluster the penguins
  • $k$-means clustering partitions $n$ observations into $k$ clusters
  • Each observation belongs to the cluster with the nearest mean (centroid)
  • You have to specify the number of clusters
  • Penguin data comes from the palmerpenguins package
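The points above can be sketched with the base R `kmeans()` function. This is a minimal sketch and not necessarily identical to penguins_kmeans.R; choosing 3 clusters (one per penguin species in the data), standardizing the features, and the seed value are all assumptions made for illustration.

```r
library(palmerpenguins)

# Keep the two bill measurements and drop rows with missing values
bills <- na.omit(penguins[, c("bill_length_mm", "bill_depth_mm")])

# Standardize so that both features contribute comparably to the distance
bills_scaled <- scale(bills)

# k-means starts from random centroids, so fix the seed for reproducibility
set.seed(42)

# You have to specify the number of clusters; 3 is an assumed choice here
fit <- kmeans(bills_scaled, centers = 3, nstart = 25)

# Cluster sizes and centroids (in standardized units)
print(fit$size)
print(fit$centers)

# Color each penguin by the cluster it was assigned to
plot(bills$bill_length_mm, bills$bill_depth_mm,
     col = fit$cluster, pch = 19,
     xlab = "Bill length (mm)", ylab = "Bill depth (mm)")
```

The `nstart = 25` argument reruns the algorithm from 25 random starts and keeps the best result, which makes the outcome less sensitive to an unlucky initialization.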

Download R script: penguins_kmeans.R

🍄 classification

Classify mushrooms as edible or poisonous based on their physical features

Full dataset: Mushroom

Download R script with Naive Bayes classifier: mushroom_naivebayes.R
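To show the shape of a Naive Bayes workflow with the naivebayes package, here is a minimal sketch. The small data frame is made up for illustration and is NOT the real mushroom dataset; the column names (`odor`, `cap_color`) and the `laplace = 1` smoothing are assumptions, and the code is not taken from mushroom_naivebayes.R.

```r
library(naivebayes)

# Made-up toy data for illustration only (hypothetical values and columns)
mushrooms <- data.frame(
  class     = factor(c("edible", "poisonous", "edible", "poisonous",
                       "edible", "poisonous", "edible", "poisonous")),
  odor      = factor(c("none", "foul", "almond", "foul",
                       "none", "pungent", "almond", "foul")),
  cap_color = factor(c("brown", "white", "yellow", "brown",
                       "white", "brown", "yellow", "white"))
)

# Fit the classifier; laplace = 1 avoids zero probabilities for
# feature values never seen together with a class
model <- naive_bayes(class ~ odor + cap_color, data = mushrooms, laplace = 1)

# New observations must use the same factor levels as the training data
new_mushrooms <- data.frame(
  odor      = factor(c("foul", "almond"), levels = levels(mushrooms$odor)),
  cap_color = factor(c("white", "yellow"), levels = levels(mushrooms$cap_color))
)

# Predicted class and class probabilities for each new observation
pred <- predict(model, newdata = new_mushrooms, type = "class")
print(pred)
print(predict(model, newdata = new_mushrooms, type = "prob"))
```

With the full dataset, the same two calls apply: fit on a training split, then use `predict()` on held-out rows to estimate accuracy.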
