DataSailr

Row by row data manipulation for R

The data manipulation instructions are written in the Sailr scripting language, which is designed specifically for data manipulation.


About DataSailr

The DataSailr package brings intuitive row-by-row data manipulation to R. The data manipulation instructions are written in the Sailr scripting language, which is designed specifically for data manipulation. In vanilla R, a data frame is manipulated through column vectors and vector operations; for many tasks, row-wise data manipulation is more natural.

For example, when calculating body mass index (BMI) from body weight and height, the calculation needs to be done for each row. Categorizing each person based on their BMI is also done for each row.

# Example of a Sailr script.
# Pass the following script to the datasailr::sail() function.

code = '
// When calculating BMI, multiplication by 703 is required in the U.S. (using lbs and inches)
// In other countries using meters and kilograms, 703 should be omitted.
if( us == 1){
  bmi = weight / (height * height) * 703
}else{
  bmi = weight / (height * height)
}

if(bmi >= 40){ weight_level = . } 
else if( bmi >= 35 ){ weight_level = . } 
else if( bmi >= 30 ){ weight_level = 3 } 
else if( bmi >= 25 ){ weight_level = 2 }
else if( bmi >= 20 ){ weight_level = 1 }
else { weight_level = . }
'

DataSailr's main function, sail(), takes two arguments: a dataset and a Sailr script. It processes each row of the dataset following the Sailr script.

library(datasailr)
df = data.frame(
  us     = c(1, 1, 1, 1, 1, 0, 0, 0, 0),
  weight = c(150, 120, 175, 160, 180, 80, 60, 50, 90),
  height = c(70, 60, 60, 70, 65, 1.7, 1.6, 1.7, 1.9)
)
sail(df, code)
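
For comparison, the column-vector style of vanilla R mentioned earlier could be sketched as follows. This is only an illustration in base R, not part of DataSailr. It assumes that `.` in the Sailr script stands for a missing value, which is mapped to NA here; the name df_base is made up for this example.

```r
# Same sample data as in the sail() example
df_base <- data.frame(
  us     = c(1, 1, 1, 1, 1, 0, 0, 0, 0),
  weight = c(150, 120, 175, 160, 180, 80, 60, 50, 90),
  height = c(70, 60, 60, 70, 65, 1.7, 1.6, 1.7, 1.9)
)

# BMI is computed on whole columns at once; ifelse() selects the formula
# element by element (lbs/inches when us == 1, kg/m otherwise)
df_base$bmi <- ifelse(
  df_base$us == 1,
  df_base$weight / (df_base$height * df_base$height) * 703,
  df_base$weight / (df_base$height * df_base$height)
)

# Categorization by logical indexing on the bmi column.
# Assumption: "." in the Sailr script means missing, so rows outside
# the 20-35 range stay NA.
df_base$weight_level <- NA_integer_
df_base$weight_level[df_base$bmi >= 20 & df_base$bmi < 25] <- 1L
df_base$weight_level[df_base$bmi >= 25 & df_base$bmi < 30] <- 2L
df_base$weight_level[df_base$bmi >= 30 & df_base$bmi < 35] <- 3L

print(df_base)
```

Every condition has to be restated as a logical vector over the whole column, whereas the Sailr script expresses the same logic as an if-else chain applied to one row at a time.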

More examples are shown in the documentation.

Motivation

From my personal experience of data analysis in the field of epidemiology, I wanted a way to manipulate data row by row. R did not have this kind of functionality, so I started to develop DataSailr.

The Sailr scripting language is designed for data analysis and statistics, and should feel natural to people in these fields. Compared with general-purpose programming languages, Sailr's functionality is very limited, but this limitation makes it a good fit for data manipulation.

How to install

CRAN

If DataSailr is available on CRAN, install it with the following code. (When there are problems to fix, the package may be archived and unavailable on CRAN.)

# R interpreter

install.packages("datasailr")

Binary

Another way is to download a binary package and install it.

Download
# Download the binary package appropriate for your environment.

# Linux 64bit
R CMD INSTALL datasailr_0.8.6_R_x86_64-pc-linux-gnu.tar.gz

# Windows 64bit
R CMD INSTALL datasailr_0.8.6.zip

How to use

Examples are shown in the documentation.

Presentation @UseR!2020

This package was accepted as a regular presentation at UseR!2020 (which was originally planned to take place in St. Louis and was ultimately held online). In this presentation, I introduced the functionality of DataSailr and how to write Sailr scripts. At the time of the presentation, the DataSailr version was 0.8.5. More features and bug fixes have been added since then.

Link to YouTube video (This link opens YouTube video.)

Presentation materials can be obtained from UseR!2020 website. https://user2020.r-project.org/program/contributed/

DataSailr or dplyr?

A well-known R package, dplyr, addresses the same kind of problem. It enables data manipulation without thinking much about column vectors. The pipe operator, %>% from the magrittr package, and dplyr functions provide an intuitive data manipulation flow. The DataSailr package enables the same kind of thing with a single Sailr script. The two packages do not compete, and I intend to implement DataSailr so that it can also work with dplyr.

| Task | DataSailr | dplyr |
|---|---|---|
| How to manipulate data | Apply a single Sailr code (datasailr::sail()) | Apply multiple functions using (%>%) |
| Create new column | Assign value to new variable | mutate() |
| Keep some columns | (Not for this purpose) | select() |
| Keep some rows | discard!() drops rows | filter() |
| Summarize columns | (Not for this purpose) | summarize() |
| Sort rows | (Not for this purpose) | arrange() |
| Regular expression | Built-in | Partially available with another R package |
| Available functions | Only Sailr built-in functions are available | Can call R functions |
| Convert wide to long format | push!() function | (use reshape2 package instead) |
| Convert long to wide format | (Not implemented yet) | (use reshape2 package instead) |
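
To make the dplyr side of this comparison concrete, here is a small sketch that chains several of the functions listed above on the BMI sample data. It assumes the dplyr package is installed. The threshold of 20 and the name df_dplyr are chosen only for illustration.

```r
library(dplyr)

# Same sample data as in the sail() example
df_dplyr <- data.frame(
  us     = c(1, 1, 1, 1, 1, 0, 0, 0, 0),
  weight = c(150, 120, 175, 160, 180, 80, 60, 50, 90),
  height = c(70, 60, 60, 70, 65, 1.7, 1.6, 1.7, 1.9)
)

result <- df_dplyr %>%
  # Create a new column: mutate() with a vectorized if_else()
  mutate(bmi = if_else(us == 1,
                       weight / (height * height) * 703,
                       weight / (height * height))) %>%
  # Keep some rows (illustrative threshold)
  filter(bmi >= 20) %>%
  # Sort rows by descending BMI
  arrange(desc(bmi)) %>%
  # Keep some columns
  select(us, bmi)

print(result)
```

Each step is a separate function joined by %>%, whereas with DataSailr the column creation and row-level logic would live in one Sailr script passed to sail().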

Forum

Please leave a message. Your questions, comments and feedback are welcome.


Get involved