Training Manual for SDAL

2.1 Code documentation

Each R script should perform a single task (if you have a file that is 1000+ lines long, you’re probably doing something wrong). For example:
1. 01-data_ingestion.R
2. 02-data_clean.R
3. 03-data_visualize.R
4. 04-data_output.R
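One side benefit of the numbered prefixes is that sorting the file names gives the run order. A minimal sketch of a driver that relies on this is shown below; list_pipeline_scripts() and run_pipeline() are hypothetical helper names, and the two-digit prefix pattern is an assumption based on the example names above.

```r
#' List pipeline scripts in run order
#'
#' @param dir directory that holds the numbered scripts
#' @return character vector of script paths, sorted by file name
list_pipeline_scripts <- function(dir = ".") {
  # Keep only files named like "01-something.R" (assumed naming scheme).
  sort(list.files(dir, pattern = "^[0-9]{2}-.*\\.R$", full.names = TRUE))
}

#' Run every pipeline script in order
#'
#' @param dir directory that holds the numbered scripts
#' @return NULL, invisibly; called for its side effects
run_pipeline <- function(dir = ".") {
  for (script in list_pipeline_scripts(dir)) {
    message("Running ", basename(script))
    source(script)  # an error in any step stops the loop
  }
  invisible(NULL)
}
```

Calling run_pipeline() from the project directory would then run 01 through 04 in sequence.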
Each script should have a short description at the top that explains what it does. If the script is part of a pipeline, it should also document where its input data or upstream script comes from.
If you write a function in the script, make sure it has a docstring that explains what it does and what its inputs and outputs are. For example:

#' Square a given value
#'
#' @param x a numeric value to square
#' @return a numeric value, x squared
my_square <- function(x) {
    return(x^2)
}

Make sure all libraries are loaded near the top of the script. There should not be a library call in the middle of your script. This makes it easy to see which packages are needed.
Functions you’ve written should also be near the top of the script (and documented), usually just below the library calls. This separates your function definitions from the code that uses them.
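Put together, a script that follows this layout might look like the sketch below. The file name, the drop_incomplete_rows() function, and the use of the built-in airquality data set as a stand-in for real pipeline input are all assumptions made for illustration.

```r
# 02-data_clean.R
#
# Purpose: drop incomplete rows from the ingested data (hypothetical example).
# Input:   in a real pipeline, the output of 01-data_ingestion.R;
#          the built-in `airquality` data set stands in for it here.
# Output:  `clean_data`, a data frame without missing values.

# ---- Libraries ----
library(stats)     # complete.cases()
library(datasets)  # airquality

# ---- Functions ----

#' Drop rows that contain missing values
#'
#' @param df a data frame
#' @return a data frame with only the complete rows of df
drop_incomplete_rows <- function(df) {
  df[complete.cases(df), , drop = FALSE]
}

# ---- Main ----
raw_data <- airquality
clean_data <- drop_incomplete_rows(raw_data)
```

Keeping the header, library calls, and functions in fixed places means a collaborator can find the inputs, the dependencies, and the helper code without reading the whole file.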
If the script does not take too long to run, you should test it by restarting your R session and running the script from top to bottom. To restart your R session, you can:
1. Click the red button in the top right corner of RStudio
2. In RStudio, press Command/Ctrl + Shift + F10
3. In RStudio, type .rs.restartR() in the console

This makes sure you have a completely clean environment when you test and run your script. It’s even ‘better’ than using rm(list = ls()), since restarting also detaches loaded packages.
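The difference can be seen in a small experiment. In the sketch below, splines (a package that ships with R) is attached only for illustration, and scratch_value is a hypothetical leftover object. The clearing step is wrapped in local() so that trying the example does not wipe your own workspace.

```r
library(splines)  # any attached package would do; this one ships with R

result <- local({
  scratch_value <- 42   # hypothetical leftover object
  rm(list = ls())       # clears the objects in this environment...
  list(
    objects_left     = ls(),
    # ...but the package is still on the search path.
    splines_attached = "package:splines" %in% search()
  )
})

print(result)
```

The objects are gone, but the package stays attached, so a script that forgot its own library() call could still appear to work. Only a restart catches that.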

2.1.1 lintr

Using a linter helps find potential errors in your code, such as variables that you never use. It also checks that your code conforms to a common style. Both make the code easier for other people and collaborators to read.

To lint your script, run:

lintr::lint('my_r_script.R')

RStudio will then open a “Markers” tab that shows the results of the static code analysis.
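To see what the linter reports without touching a real script, you can lint a throwaway file. The sketch below assumes the lintr package is installed and uses its default linters; the file contents are made up for illustration.

```r
# Write a tiny script that uses `=` for assignment, which the
# default lintr configuration is expected to flag.
script_path <- tempfile(fileext = ".R")
writeLines(c("total = 1 + 2", "print(total)"), script_path)

if (requireNamespace("lintr", quietly = TRUE)) {
  lints <- lintr::lint(script_path)
  print(lints)  # outside RStudio, the findings are printed to the console
} else {
  message("lintr is not installed; run install.packages('lintr') to try this.")
}
```

Each finding names the file, line, and column, so it can be fixed and the script linted again until nothing is reported.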