
Filter File Tail records on the edge

asked 2018-09-20 09:58:24 -0500 by Tom

updated 2018-09-20 14:12:18 -0500

Hi StreamSets community,

I’m running some tests with SDC and SDC Edge, and I’m having trouble filtering data on the edge node. I hope you can give some helpful advice.

I’m using the File Tail origin to read an active file. A new data record is appended to this file every 100 ms. Each appended record is CSV-formatted, and the first line of the file contains the schema information. Here is an example of the header and the first three records (dataset):

"X1_ActualPosition","X1_ActualVelocity","X1_ActualAcceleration","X1_CommandPosition","X1_CommandVelocity","X1_CommandAcceleration","X1_CurrentFeedback","X1_DCBusVoltage","X1_OutputCurrent","X1_OutputVoltage","X1_OutputPower","Y1_ActualPosition","Y1_ActualVelocity","Y1_ActualAcceleration","Y1_CommandPosition","Y1_CommandVelocity","Y1_CommandAcceleration","Y1_CurrentFeedback","Y1_DCBusVoltage","Y1_OutputCurrent","Y1_OutputVoltage","Y1_OutputPower","Z1_ActualPosition","Z1_ActualVelocity","Z1_ActualAcceleration","Z1_CommandPosition","Z1_CommandVelocity","Z1_CommandAcceleration","Z1_CurrentFeedback","Z1_DCBusVoltage","Z1_OutputCurrent","Z1_OutputVoltage","S1_ActualPosition","S1_ActualVelocity","S1_ActualAcceleration","S1_CommandPosition","S1_CommandVelocity","S1_CommandAcceleration","S1_CurrentFeedback","S1_DCBusVoltage","S1_OutputCurrent","S1_OutputVoltage","S1_OutputPower","S1_SystemInertia","M1_CURRENT_PROGRAM_NUMBER","M1_sequence_number","X.M1_CURRENT_FEEDRATE","Machining_Process"
198,0,0,198,0,0,0.18,0.0207,329,2.77,-1.42e-06,158,-0.025,-6.25,158,0,0,0.539,0.0167,328,1.84,6.43e-07,119,0,0,119,0,0,0,0,0,0,-361,0.001,0.25,-361,0,0,0.524,2.74e-19,329,0,6.96e-07,12,1,0,50,"Starting"
198,-10.8,-350,198,-13.6,-358,-10.9,0.186,328,23.3,0.00448,158,-19.8,-750,157,-24.6,-647,-14.5,0.281,325,37.8,0.0126,119,-20.3,-712,118,-25.6,-674,0,0,0,0,-361,0,0.25,-361,0,0,-0.288,2.74e-19,328,0,-5.27e-07,12,1,4,50,"Prep"
196,-17.8,-6.25,196,-17.9,-9.54e-05,-8.59,0.14,328,30.6,0.00533,154,-32.5,0,154,-32.3,-9.54e-05,-7.79,0.139,327,49.4,0.00943,115,-33.7,37.5,115,-33.7,-9.54e-05,0,0,0,0,-361,0,-0.438,-361,0,0,0.524,2.74e-19,328,0,9.1e-07,12,1,7,50,"Prep"

I want to filter the dataset for entries that have the following characteristics:

X1_ActualPosition == 198 (1st field)

M1_CURRENT_PROGRAM_NUMBER != 0 (45th field)

X.M1_CURRENT_FEEDRATE == 50 (47th field)

Is there a way to turn the text-formatted output of the File Tail origin into CSV-formatted output, so that it can be filtered with a Stream Selector processor?

My current pipeline:

[pipeline screenshot]

I would like to use the following condition in the stream selector processor:

${record:value('/X1_ActualPosition') == 198 && record:value('/M1_CURRENT_PROGRAM_NUMBER') != 0 && record:value('/X.M1_CURRENT_FEEDRATE') == 50}
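Outside of SDC, the intended three-way filter can be sketched in plain Python over parsed CSV, just to pin down the semantics. This is purely illustrative: the header below is trimmed to the three relevant columns, and the data rows are made up for the example.

```python
import csv
import io

# Plain-Python sketch of the intended filter (not SDC): the header is trimmed
# to the three relevant columns for brevity. csv yields strings, so the
# comparisons convert to float first.
SAMPLE = '''"X1_ActualPosition","M1_CURRENT_PROGRAM_NUMBER","X.M1_CURRENT_FEEDRATE"
198,1,50
196,1,50
198,0,50
'''

def matches(row):
    return (float(row["X1_ActualPosition"]) == 198
            and float(row["M1_CURRENT_PROGRAM_NUMBER"]) != 0
            and float(row["X.M1_CURRENT_FEEDRATE"]) == 50)

kept = [row for row in csv.DictReader(io.StringIO(SAMPLE)) if matches(row)]
print(len(kept))  # 1 -- only the first data row passes all three conditions
```

Only the first sample row survives: the second fails on X1_ActualPosition, the third on M1_CURRENT_PROGRAM_NUMBER.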

Thank you,

Tom


1 Answer


answered 2018-09-21 08:10:54 -0500 by madhu

Currently, the Delimited data format is not supported in Edge for the File Tail origin. We are planning to add support soon - https://issues.streamsets.com/browse/....

Also, in the upcoming release we have added support for a new string EL in Edge that splits a string - ${str:split(record:value('/text'), ',')}. So you can use an Expression Evaluator processor to split the string value coming from the File Tail origin.

Expression Evaluator: /fields - ${str:split(record:value('/text'), ',')}

Stream Selector Processor: Lane 1 - ${record:value('/fields[0]') == "198" && record:value('/fields[44]') != "0" && record:value('/fields[46]') == "50"} (the split list is zero-indexed, so the 45th and 47th CSV columns are /fields[44] and /fields[46])
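As a sanity check on the field indices: str:split yields a zero-indexed list, so the 1st, 45th and 47th CSV columns land at /fields[0], /fields[44] and /fields[46]. The same split can be reproduced in plain Python on the first data row from the question:

```python
# Plain-Python check (not SDC EL) of what splitting the first data row on
# ',' produces, and which zero-based indices the three target columns land
# on. A naive comma split is safe here only because the sample's single
# quoted field ("Starting"/"Prep") contains no comma.
line = ('198,0,0,198,0,0,0.18,0.0207,329,2.77,-1.42e-06,158,-0.025,-6.25,'
        '158,0,0,0.539,0.0167,328,1.84,6.43e-07,119,0,0,119,0,0,0,0,0,0,'
        '-361,0.001,0.25,-361,0,0,0.524,2.74e-19,329,0,6.96e-07,12,1,0,50,'
        '"Starting"')

fields = line.split(',')
print(len(fields))  # 48 columns, matching the header row

# 1st, 45th and 47th columns, zero-indexed:
selected = (fields[0] == "198"          # X1_ActualPosition
            and fields[44] != "0"       # M1_CURRENT_PROGRAM_NUMBER
            and fields[46] == "50")     # X.M1_CURRENT_FEEDRATE
print(selected)  # True for this row
```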

[configuration screenshots]


Comments

Hi Madhu, thank you for your answer! Do you know when the new release will be published? I tried to build 3.5 on my own, but I ran into a lot of issues... Thanks, Tom

Tom ( 2018-09-22 03:49:57 -0500 )

Hi Tom, we are planning to release 3.5 soon (in one or two weeks). Try the 3.5 RC2 build - http://nightly.streamsets.com/3.5/3.5.0-RC2/tarball/SDCe/. Also, please share the issues you are facing with the build; I can help you out.

madhu ( 2018-09-22 12:32:19 -0500 )

Hi Madhu, thanks for pointing out the nightly builds! I used the http://nightly.streamsets.com.s3-us-west-2.amazonaws.com/datacollector/3.5/3.5.0-RC2/tarball/streamsets-datacollector-all-3.5.0.tgz tarball and exported the executable from the StreamSets UI. It works like a charm! Thank you very much!

Tom ( 2018-09-23 07:05:38 -0500 )