DataSailr internal

Overview of DataSailr internal

DataSailr is an R package for numerical calculation and string manipulation, implemented with a C/C++ library via Rcpp. Data frames are passed from R to C++, or are accessed from C++ through Rcpp. The C++ layer extracts each record (each row) and passes it to the core engine, called libsailr. Libsailr receives the data for each record as a table of pairs, each consisting of a variable name and a pointer to an object. This table is internally called ptr_table.
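To make the idea of a ptr_table concrete, here is a minimal sketch of a name-to-pointer table for a single record. The names `PtrTable`, `ObjectPtr`, `Record` and `make_ptr_table` are hypothetical and written in C++17 for illustration only. The real ptr_table is a libsailr structure whose layout is not shown in this document.

```cpp
#include <map>
#include <string>
#include <variant>

// Hypothetical stand-in for a ptr_table: variable name -> pointer to object.
// This mirrors the idea only, not libsailr's actual data structure.
using ObjectPtr = std::variant<int*, double*, std::string*>;
using PtrTable = std::map<std::string, ObjectPtr>;

// One record (row), held as plain C++ values for illustration.
struct Record {
    int age;
    double weight;
    std::string name;
};

// Build a table whose entries point at the fields of a single record.
PtrTable make_ptr_table(Record& rec) {
    PtrTable table;
    table["age"] = &rec.age;
    table["weight"] = &rec.weight;
    table["name"] = &rec.name;
    return table;
}
```

Because the table holds pointers rather than copies, anything written through an entry changes the object that belongs to the record.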

Using that ptr_table, libsailr performs numerical calculations and manipulates strings as directed by the DataSailr script. More precisely, the DataSailr script is not executed directly. It is parsed and converted to an AST (abstract syntax tree), which libsailr then converts to Sailr VM instructions. The libsailr VM is a virtual stack machine that operates on the ptr_table. When the virtual machine finishes executing all the VM instructions, the results are on the ptr_table.
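The stack-machine idea can be illustrated with a toy VM. This is only a sketch under simplifying assumptions: the instruction names (`PushNum`, `PushVar`, `Add`, `Mul`, `Store`), the `Instruction` struct, `NumericPtrTable` and `run_vm` are all hypothetical, only doubles are handled, and the parsing and AST steps are skipped. The actual Sailr VM instruction set is different.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical instruction set; the real Sailr VM instructions differ.
enum class Op { PushNum, PushVar, Add, Mul, Store };

struct Instruction {
    Op op;
    double num;        // used by PushNum
    std::string name;  // used by PushVar and Store
};

// Simplified table: variable name -> pointer to a double.
using NumericPtrTable = std::map<std::string, double*>;

// Execute the instructions in order, reading and writing variables via the table.
void run_vm(const std::vector<Instruction>& code, NumericPtrTable& table) {
    std::vector<double> stack;
    auto pop = [&stack]() {
        if (stack.empty()) throw std::runtime_error("stack underflow");
        double v = stack.back();
        stack.pop_back();
        return v;
    };
    for (const Instruction& ins : code) {
        switch (ins.op) {
        case Op::PushNum: stack.push_back(ins.num); break;
        case Op::PushVar: stack.push_back(*table.at(ins.name)); break;
        case Op::Add: { double b = pop(), a = pop(); stack.push_back(a + b); break; }
        case Op::Mul: { double b = pop(), a = pop(); stack.push_back(a * b); break; }
        case Op::Store: *table.at(ins.name) = pop(); break;
        }
    }
}
```

For example, a script line such as `y = x * 2 + 1` could correspond to the sequence PushVar x, PushNum 2, Mul, PushNum 1, Add, Store y. After `run_vm` returns, the new value of `y` is reachable through the table.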

The results on the ptr_table are copied back to the Rcpp data frame, which is finally returned to R.
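The per-record round trip (extract a row, run the engine, copy the results back) might look roughly like the following. This sketch uses plain C++ containers instead of Rcpp so that it stays self-contained. `Columns`, `RowTable` and `apply_by_row` are hypothetical names, only numeric columns are handled, and every column is assumed to have `n_rows` elements.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Plain C++ stand-in for a numeric data frame: column name -> column values.
using Columns = std::map<std::string, std::vector<double>>;
// Simplified per-record table: variable name -> pointer to that record's value.
using RowTable = std::map<std::string, double*>;

// Run `per_row` once per record, then copy each record's results back.
void apply_by_row(Columns& cols, std::size_t n_rows,
                  const std::function<void(RowTable&)>& per_row) {
    for (std::size_t i = 0; i < n_rows; ++i) {
        std::map<std::string, double> values;  // objects owned by this record
        RowTable table;
        for (auto& col : cols) {
            values[col.first] = col.second[i];      // extract the record
            table[col.first] = &values[col.first];  // register its pointer
        }
        per_row(table);  // stands in for executing the VM instructions
        for (auto& col : cols) {
            col.second[i] = values[col.first];  // copy results back
        }
    }
}
```

Here `per_row` plays the role of libsailr: it sees only the table for the current record, and the surrounding loop is responsible for moving data in and out.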

How DataSailr works with libsailr

How libsailr works

Variable sources

Variables can come from three different sources.

  1. LHS of an assignment in the DataSailr script
  2. RHS of an assignment (or used as a value) in the DataSailr script
  3. Preexisting variables on the ptr_table

These three types can overlap. For example, a variable may preexist before execution, be used as an RHS value in the DataSailr script, and also be redefined on the LHS in the same script.

Some combinations should not exist. For example, variables that appear on the RHS but neither preexist nor appear on the LHS must cause errors, because they are used without ever being defined.
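The rule for invalid variables can be written as a small set computation. The sketch below is an illustration, not libsailr's implementation. `Names` and `find_undefined` are hypothetical, and the check is purely set-based, so it ignores the order in which statements appear in the script.

```cpp
#include <set>
#include <string>

using Names = std::set<std::string>;

// Return the RHS variables that neither preexist nor appear on any LHS.
Names find_undefined(const Names& lhs, const Names& rhs, const Names& preexisting) {
    Names undefined;
    for (const std::string& name : rhs) {
        if (lhs.count(name) == 0 && preexisting.count(name) == 0) {
            undefined.insert(name);
        }
    }
    return undefined;
}
```

With LHS names {bmi, weight}, RHS names {weight, height, factor} and preexisting names {weight, height}, only `factor` is reported, since it is used as a value but never defined anywhere.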

How variables are managed

Available types and instructions at each component in libsailr

Roughly speaking, only integer, double and string are the available types in libsailr. Integer and double values can be handled both as values themselves and through pointers. Names beginning with "PTR_" indicate a pointer, and names beginning with "PP_" indicate a pointer to a pointer.
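The difference between a plain value, a pointer and a pointer to a pointer can be sketched as follows. The tag names `INT_VALUE`, `PTR_INT` and `PP_INT`, the `Item` struct and `read_int` are hypothetical. They only follow the prefix convention described above and are not taken from libsailr's headers.

```cpp
// Hypothetical tags that follow the "PTR_" / "PP_" naming convention.
enum class Tag { INT_VALUE, PTR_INT, PP_INT };

struct Item {
    Tag tag;
    int value;  // valid when tag == INT_VALUE
    int* ptr;   // valid when tag == PTR_INT
    int** pp;   // valid when tag == PP_INT
};

// Read the integer regardless of how many levels of indirection the item uses.
int read_int(const Item& item) {
    switch (item.tag) {
    case Tag::INT_VALUE: return item.value;
    case Tag::PTR_INT:   return *item.ptr;
    case Tag::PP_INT:    return **item.pp;
    }
    return 0;  // not reached for valid tags
}
```

An item holding a value is a snapshot, while an item holding a pointer, or a pointer to a pointer, always reflects the current contents of the object it refers to.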

Source file description

If you find bugs or inconsistent behavior and want to fix them yourself, this section shows which files you may need to look at. I also hope this guide is helpful if you happen to be interested in datasailr/libsailr.