DataSailr

Row by row data manipulation for R

Data manipulation instructions are written in DataSailr script, a language designed specifically for data manipulation.


About DataSailr

The DataSailr package brings intuitive row-by-row data manipulation to R. Data manipulation instructions are written in DataSailr script, a language designed specifically for data manipulation. In vanilla R, a data frame is manipulated through column vectors and vector operations; by contrast, row-wise data manipulation is often more natural.

For example, calculating body mass index (BMI) from body weight and height has to be done for each row. Categorizing each person based on their BMI is also done for each row.

# Pass the following script to datasailr::sail() function.
# Example of DataSailr script

code = '
// When calculating BMI, multiplication by 703 is required in the U.S. (using lbs and inches)
// In other countries using meters and kilograms, 703 should be omitted.
if( us == 1){
  bmi = weight / (height * height) * 703
}else{
  bmi = weight / (height * height)
}

if(bmi >= 40){ weight_level = . } 
else if( bmi >= 35 ){ weight_level = . } 
else if( bmi >= 30 ){ weight_level = 3 } 
else if( bmi >= 25 ){ weight_level = 2 }
else if( bmi >= 20 ){ weight_level = 1 }
else { weight_level = . }
'

DataSailr's main function, sail(), takes two arguments: a dataset and a DataSailr script. It processes each row of the dataset following the script.

library(datasailr)
df = data.frame(us=c(1,1,1,1,1,0,0,0,0), weight=c(150,120,175,160,180,80,60,50,90), height=c(70,60,60,70,65,1.7,1.6,1.7,1.9))
sail(df , code)
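
For comparison, the same calculation in vanilla R operates on whole column vectors rather than on one row at a time. The following is a minimal sketch using only base R, not part of DataSailr. It assumes that "." in the DataSailr script stands for a missing value, which is written as NA here.

```r
# Vanilla R sketch: the same BMI logic written with column vectors.
# Assumption: "." in the DataSailr script means a missing value (NA here).
df <- data.frame(
  us     = c(1, 1, 1, 1, 1, 0, 0, 0, 0),
  weight = c(150, 120, 175, 160, 180, 80, 60, 50, 90),
  height = c(70, 60, 60, 70, 65, 1.7, 1.6, 1.7, 1.9)
)

# ifelse() chooses the formula element by element, based on the us column.
df$bmi <- ifelse(df$us == 1,
                 df$weight / (df$height * df$height) * 703,
                 df$weight / (df$height * df$height))

# The if / else if chain becomes a series of logical-index assignments.
# Rows with bmi >= 35 or bmi < 20 stay NA, as in the script above.
df$weight_level <- NA_real_
df$weight_level[df$bmi >= 20 & df$bmi < 25] <- 1
df$weight_level[df$bmi >= 25 & df$bmi < 30] <- 2
df$weight_level[df$bmi >= 30 & df$bmi < 35] <- 3

print(df)
```

Each condition has to be expressed as a logical vector over the whole column, whereas the DataSailr script states the rule once as it applies to a single row.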

More examples are shown in the documents.

Motivation

From my personal experience of data analysis in the field of epidemiology, I wanted a way to manipulate data row by row. R did not have this kind of functionality, so I started to develop DataSailr.

The DataSailr script is designed for data analysis and statistics, and people in these fields should find it natural. Compared with general-purpose programming languages, the functionality of DataSailr script is very limited, but this limitation makes it a good fit for data manipulation.

How to Install

CRAN

When DataSailr is available on CRAN, please use the following code to install.

# R interpreter

install.packages("datasailr")

Binary

Another way is to download a binary package and install it.

Download
# Download binary package appropriate for your environment.

# Linux 64bit
R CMD INSTALL datasailr_0.8.7_R_x86_64-pc-linux-gnu.tar.gz

# Windows 64bit
R CMD INSTALL datasailr_0.8.7.zip

How to Use

Examples are shown in documents.

Presentation @UseR!2020

This package was accepted as a regular presentation at UseR!2020 (which was originally planned to take place in St. Louis, and was finally held online). In this presentation, I introduced the functionality of DataSailr and how to write DataSailr script. At the time of the presentation, the DataSailr version was 0.8.5; more features and bug fixes have been added since then. Also, at that time DataSailr script was called just Sailr script, but it is now called DataSailr script.

Link to YouTube video (This link opens YouTube video.)

Presentation materials can be obtained from UseR!2020 website. https://user2020.r-project.org/program/contributed/


E-print @viXra

datasailr - An R Package for Row by Row Data Processing, Using DataSailr Script

This document introduces the datasailr package and shows the potential benefits of using a domain-specific language for data processing.


DataSailr or dplyr?

A well-known R package, dplyr, addresses the same kind of problem. It enables data manipulation without thinking much about column vectors: the pipe operator, %>% from the magrittr package, and dplyr functions together give an intuitive data manipulation flow. The DataSailr package enables the same kind of thing with a single DataSailr script. The two packages do not compete, and I intend to implement DataSailr so that it can also work with dplyr.

| | DataSailr | dplyr |
|---|---|---|
| How to manipulate data | Apply a single DataSailr code (datasailr::sail()) | Apply multiple functions using (%>%) |
| Create new column | Assign value to new variable | mutate() |
| Keep some columns | (Not for this purpose) | select() |
| Keep some rows | discard!() drops rows | filter() |
| Summarize columns | (Not for this purpose) | summarize() |
| Sort rows | (Not for this purpose) | arrange() |
| Regular expression | Built-in | Partially available with another R package |
| Available functions | Only DataSailr built-in functions are available | Can call R functions |
| Convert wide to long format | push!() function | (use reshape2 package instead) |
| Convert long to wide format | (Not implemented yet) | (use reshape2 package instead) |
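
To make the dplyr side of this comparison concrete, here is a small sketch of a pipeline that chains mutate(), filter(), arrange() and select(). It assumes the dplyr package is installed; the data frame and the BMI threshold of 20 are illustrative only.

```r
library(dplyr)

# Illustrative data in metric units (kilograms and meters).
df <- data.frame(
  weight = c(80, 60, 50, 90),
  height = c(1.7, 1.6, 1.7, 1.9)
)

result <- df %>%
  mutate(bmi = weight / (height * height)) %>%  # create new column
  filter(bmi >= 20) %>%                         # keep some rows
  arrange(desc(bmi)) %>%                        # sort rows
  select(weight, bmi)                           # keep some columns

print(result)
```

Each step is a separate function joined by %>%, whereas DataSailr expresses the row-level steps (creating columns, dropping rows) inside one script passed to sail().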

Feedback

When you report issues or problems with the software

Please report issues or problems on GitHub.

When you seek support

For support, please post on the discussion board or send an email.

Citation

Article @the Journal of Open Source Software

datasailr - An R Package for Row by Row Data Processing, Using DataSailr Script

When you need to cite this package, please use the following BibTeX entry.

  @Article{,
    title = {datasailr - An R Package for Row by Row Data Processing, Using DataSailr Script},
    author = {Toshihiro Umehara},
    year = {2021},
    journal = {Journal of Open Source Software},
    volume = {6},
    number = {61},
    pages = {3166},
    doi = {10.21105/joss.03166},
    url = {https://doi.org/10.21105/joss.03166},
  }

Get involved