Converting long format to wide format

Working with time series data often also requires converting long format to wide format.

Currently (v0.8.6), DataSailr does not provide this functionality. In my opinion, this kind of conversion is somewhat unintuitive, and when something is wrong with your code, the mistake is hard to find once the conversion has finished.

For now, I recommend the following approach, which lets you detect errors along the way.

Example: Body weight time series data in long format, to be converted to wide format.

long_df = data.frame(
subj = c("Tom", "Tom", "Tom", "Tom", "Mary", "Mary", "Mary", "Mary", "Jack", "Jack", "Jack", "Jack"),
time = c(0, 1, 2, 3,                  0, 1, 2, 3,                    0, 1, 2, 3),
bw   = c(50, 48, 46, 42,              42, 42, 44, 42,                80, 75, 72, 73)
)
code = '
  subj = subj
  if( time == 0 ){
    t0 = bw
  }else if( time == 1 ){
    t1 = bw
  }else if( time == 2){
    t2 = bw
  }else if( time == 3){
    t3 = bw
  }
'
library(datasailr)
result = sail( long_df, code , fullData = FALSE)
result
##    subj t0 t1 t2 t3
## 1   Tom 50 NA NA NA
## 2   Tom NA 48 NA NA
## 3   Tom NA NA 46 NA
## 4   Tom NA NA NA 42
## 5  Mary 42 NA NA NA
## 6  Mary NA 42 NA NA
## 7  Mary NA NA 44 NA
## 8  Mary NA NA NA 42
## 9  Jack 80 NA NA NA
## 10 Jack NA 75 NA NA
## 11 Jack NA NA 72 NA
## 12 Jack NA NA NA 73

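At this point you can check visually whether the conversion is going well: each row should hold a value in exactly one of the t0 to t3 columns, and the column should match that row's original time.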
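The same check can also be done programmatically. The following is a minimal sketch in base R, not a DataSailr feature. It assumes an intermediate data frame shaped like `result` above; here it is rebuilt by hand under the hypothetical name `intermediate` so that the snippet runs without DataSailr.

```r
# Hand-built copy of the intermediate result shown above (illustration only).
intermediate = data.frame(
  subj = rep(c("Tom", "Mary", "Jack"), each = 4),
  t0 = c(50, NA, NA, NA,  42, NA, NA, NA,  80, NA, NA, NA),
  t1 = c(NA, 48, NA, NA,  NA, 42, NA, NA,  NA, 75, NA, NA),
  t2 = c(NA, NA, 46, NA,  NA, NA, 44, NA,  NA, NA, 72, NA),
  t3 = c(NA, NA, NA, 42,  NA, NA, NA, 42,  NA, NA, NA, 73)
)

time_cols = c("t0", "t1", "t2", "t3")

# TRUE where a value is present, FALSE where it is missing.
is_filled = as.data.frame(!is.na(intermediate[, time_cols]))

# Every row should carry exactly one non-missing value.
filled_per_row = rowSums(is_filled)
bad_rows = which(filled_per_row != 1)

# Every subject should have exactly one non-missing value per time column.
filled_per_subj = aggregate(is_filled, by = list(subj = intermediate$subj), FUN = sum)

bad_rows
filled_per_subj
```

An empty `bad_rows` and a `filled_per_subj` table containing only ones suggest that no time point was dropped or duplicated before the rows are compressed.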

Next, compress the rows that share a subject name into a single row, using the first non-missing value in each column as the value for the compressed row. The following R code does this job.

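# Split the dataset by the subj column into a list of per-subject data frames.
splitted_ds = split( result, as.numeric(as.factor(result$subj)))

new_df = data.frame()
for( sub_df in splitted_ds){
  # is.na(x) is FALSE for non-missing values, so which.min(is.na(x)) is the
  # index of the first non-missing value in each column (or 1 if all are NA).
  new_df = rbind( new_df, data.frame( lapply( sub_df , function(x){ x[which.min(is.na(x))] })))
}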

new_df
##   subj t0 t1 t2 t3
## 1 Jack 80 75 72 73
## 2 Mary 42 42 44 42
## 3  Tom 50 48 46 42
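As an independent cross-check, the wide table can be compared with the result of base R's `reshape()`, which performs a long-to-wide conversion in one step. This is only a sketch for verification and is not part of DataSailr. The names `wide_check` and `expected` are introduced here for illustration, and `expected` is typed in by hand from the output above.

```r
long_df = data.frame(
  subj = rep(c("Tom", "Mary", "Jack"), each = 4),
  time = rep(c(0, 1, 2, 3), times = 3),
  bw   = c(50, 48, 46, 42,  42, 42, 44, 42,  80, 75, 72, 73)
)

# reshape() names the new columns bw.0, bw.1, ... after the time values.
wide_check = reshape(long_df, idvar = "subj", timevar = "time", direction = "wide")
names(wide_check) = c("subj", "t0", "t1", "t2", "t3")

# Sort by subject and reset row names so the rows line up with new_df.
wide_check = wide_check[order(wide_check$subj), ]
rownames(wide_check) = NULL

# The wide table shown above, typed in by hand for comparison.
expected = data.frame(
  subj = c("Jack", "Mary", "Tom"),
  t0 = c(80, 42, 50), t1 = c(75, 42, 48), t2 = c(72, 44, 46), t3 = c(73, 42, 42)
)

# Compare values only; reshape() attaches extra attributes to its result.
isTRUE(all.equal(wide_check, expected, check.attributes = FALSE))
```

In a real session you would compare `wide_check` against `new_df` directly instead of a hand-typed table. A mismatch points to a problem in either the DataSailr script or the compression step.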