Exporting Multiple Pipelines into .CSV for Excel

Hi Everybody,

I greatly appreciate your assistance! I have a couple of StreamSets pipelines on multiple StreamSets servers. They process data from different sources to different destinations. I would like to export all the pipelines on all my StreamSets servers to a CSV file that lists each pipeline's stages and their details, with the ability to select or deselect which details are listed.

E.g. I would like this pseudo-output (human-readable, and kept simple for this example):

PIPELINE1 contains Kafka servers KAFKA1 with KAFKATOPIC1 with Final output to KUDU-TABLE-A, KUDU-TABLE-B, KUDU-TABLE-C

PIPELINE2 contains Kafka servers KAFKA2 with KAFKATOPIC2 with Final output to KUDU-TABLE-D, KUDU-TABLE-E, KUDU-TABLE-F

to be in Excel CSV format:


This way I can bring our entire system into a spreadsheet and see on one screen which components are connected to which stages, without having to go through the pipelines one at a time on each server (essentially a macro view of the entire StreamSets environment).

Thank you very much!

Thank you so much Manjit, iamontheinet, kirti.

Looking at the SDK documentation, it says: "If you don’t yet have this activation key, contact StreamSets Support with a request for access to the SDK for Python". Whom do I email to request access to the SDK for Python if I only have the community edition? Or do I need to pay for either StreamSets Control Hub or the SDK for Python no matter what?

One way is to use our SDK for Python:

  • This lets you loop through the pipelines on one Data Collector and read the details of each stage, which you can then process and write out for Excel.
  • If you have SCH (StreamSets Control Hub), it will be easier. The SDK can talk to SCH, get all the Data Collectors, and do the step above for each of them. In fact, SCH can show you the topology of the whole system. If not, you have to do the step above for each Data Collector yourself.
  • Here is the documentation for the SDK for Python --
Here's an example to point you in the right direction with the SDK:

from streamsets.sdk import ControlHub

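# Replace the <...> placeholders with your own Control Hub URL and credentials.
control_hub = ControlHub(server_url=<url>, username=<username>, password=<password>)
for pipeline in control_hub.pipelines:
    # Keep only the stages whose attribute matches a value you choose
    matched_stages = pipeline.stages.get_all(<attribute>=<value>)
    # or take every stage in the pipeline
    stages = pipeline.stages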

You can do the same with Data Collector instead of Control Hub if needed.
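Once the stage details have been collected, writing them out for Excel needs only Python's standard csv module. The following is a minimal sketch, not part of the SDK: it assumes you have already copied each stage's details into plain dictionaries, and the function name export_stage_rows, the column names, and the sample rows are all hypothetical. The columns argument is what lets you select or deselect which details are listed.

```python
import csv
import io


def export_stage_rows(rows, columns, out):
    """Write one CSV line per stage, keeping only the selected columns."""
    writer = csv.DictWriter(out, fieldnames=columns)
    writer.writeheader()
    for row in rows:
        # Drop unselected keys; leave a blank cell for missing ones.
        writer.writerow({col: row.get(col, "") for col in columns})


# Hypothetical rows, standing in for details gathered from each pipeline's stages.
sample_rows = [
    {"server": "SDC1", "pipeline": "PIPELINE1", "stage": "Kafka Consumer",
     "detail": "KAFKA1 / KAFKATOPIC1"},
    {"server": "SDC1", "pipeline": "PIPELINE1", "stage": "Kudu",
     "detail": "KUDU-TABLE-A"},
]

# Write to an in-memory buffer here; the "server" column is deselected.
buffer = io.StringIO()
export_stage_rows(sample_rows, ["pipeline", "stage", "detail"], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To produce a file instead, pass a file object opened with open("pipelines.csv", "w", newline="", encoding="utf-8-sig"); the newline="" argument is what the csv module expects, and the utf-8-sig encoding helps Excel detect UTF-8.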

