Ask Your Question
0

How can I pivot multiple columns?

asked 2017-11-20 14:46:31 -0600

anonymous user

Anonymous

updated 2017-11-20 15:12:23 -0600

metadaddy gravatar image

I really love streamsets but I don't see how I can pivot a large number of columns rather than one at a time. For example, I have a table with a large number of columns that need to be pivoted. Let's say columns 1-4 do not need to be pivoted but 5-9 do:

Table: col1,col2,col3,col4,col5,col6,col7,col8,col9

TO

col1,col2,col3,col4,col5
col1,col2,col3,col4,col6
col1,col2,col3,col4,col7
col1,col2,col3,col4,col8
col1,col2,col3,col4,col9

Does anyone have any idea how to do that in JS or any other supported Streamsets language or do you guys normally do larger transformations with perl or PIG? I just wanted to keep everything in Streamsets to make it easier to manage.

edit retag flag offensive close merge delete

2 Answers

Sort by ยป oldest newest most voted
0

answered 2017-11-20 17:12:33 -0600

metadaddy gravatar image

If you look at the Field Pivoter processor, you'll see that it assumes that the fields to pivot are in a list. We can use any of the script evaluators to put the required fields in a list.

Here is JavaScript that does the job:

var fieldsToPivot = ['col5', 'col6', 'col7', 'col8', 'col9'];

// Sample JavaScript code
for(var i = 0; i < records.length; i++) {
  try {
    var record = records[i];

    var fieldsToRemove = [];
    record.value.pivoted = [];
    for (key in record.value) {
      if (fieldsToPivot.indexOf(key) > -1) {
        record.value.pivoted.push(record.value[key]);
        fieldsToRemove.push(key)
      }
    }
    for (index in fieldsToRemove) {
      record.value.remove(fieldsToRemove[index]);
    }    

    // Write record to processor output
    output.write(record);
  } catch (e) {
    // Send record to error
    error.write(records[i], e);
  }
}

For example, with delimited input like this:

col1,col2,col3,col4,col5,col6,col7,col8,col9
1,2,3,4,5,6,7,8,9
a,b,c,d,e,f,g,h,i

The above code modifies the records:

image description

Configuring the pivoter thus:

image description

Achieves the following result:

image description image description

edit flag offensive delete link more

Comments

Thank you very much, that is awesome!

supaxi gravatar imagesupaxi ( 2017-12-28 13:03:42 -0600 )edit
0

answered 2017-12-28 13:06:45 -0600

supaxi gravatar image

I was working to extend this a little by concatenating each character in two columns with each other and then pivoting them all:

   for (var i = 0, len = record.value['col_a'].length; i < len; i++) {
      for (var j = 0, lenj = record.value['col_b'].length; j < lenj; j++) {
        var concatenateColA = record.value['col_a'].charAt(i);
        var concatenateColB= record.value['col_b'].charAt(j);
        var concatenateCombined = concatenateColA + concatenateColB;
        record.value.pivoted_combined.push(concatenateCombined); 
      }
    }

This loop causes the streamsets pipeline to die with a heapdump. Any ideas why a simple loop would do that?

Thanks!

edit flag offensive delete link more
Login/Signup to Answer

Question Tools

1 follower

Stats

Asked: 2017-11-20 14:46:31 -0600

Seen: 82 times

Last updated: Dec 28 '17