My pipeline is processing 145-155 records per second. Is that considered good, and how can we increase it?

I have created a pipeline with 5 stages, as listed below:

  1. JDBC Query Consumer (reading from Oracle, 1000 records per batch)
  2. Field Renamer (renames the columns)
  3. Expression Evaluator to add a new column combining existing column values
  4. Elasticsearch destination (ingesting into Elasticsearch)
  5. Pipeline Finisher Executor attached to the origin (JDBC Query Consumer)

I have three separate servers, with Oracle, SDC, and Elasticsearch each running on its own server.

  • SDC: 4 vCPUs, RAM: 8 GB, disk: 160 GB
  • Oracle: 4 vCPUs, RAM: 8 GB, disk: 160 GB
  • Elasticsearch: 8 vCPUs, RAM: 32 GB, disk: 640 GB

The pipeline processes between 145 and 155 records per second, which seems very low. How can I increase the throughput, and what average rate should I expect? Please advise.

There is no standard notion of "good" because the performance depends on a number of factors:

  • The speed of and network bandwidth to the RDBMS server
  • The configuration of JDBC parameters in the origin
  • The JVM options used by the Data Collector instance (e.g., garbage collection, heap size)
  • The other stages involved in the pipeline
  • The batch characteristics (size, number of fields, depth of fields, etc.)
  • The speed and network bandwidth to the target infrastructure (Elasticsearch in this case)

Which stage does the batch timing meter show as taking the most time?

Thanks, Jeff, for your points. The Elasticsearch destination is taking 60-75% of the time.

Pankaj ( 2020-02-12 10:16:02 -0500 )
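
As a rough way to read the figures reported in this thread, the sketch below estimates how long one batch takes and how much throughput could improve if only the Elasticsearch destination got faster. It assumes (this is not stated in the thread) that the stages run one after another within a batch, so only the destination's share of the batch time shrinks. The function names and the 2x and 4x speedups are illustrative, not measurements.

```python
def seconds_per_batch(batch_size: int, records_per_second: float) -> float:
    """Time spent on one batch at the observed throughput."""
    return batch_size / records_per_second


def projected_throughput(current_rps: float, dest_fraction: float,
                         dest_speedup: float) -> float:
    """Amdahl-style estimate: only the destination's share of the time shrinks.

    dest_fraction is the share of batch time spent in the destination (0..1);
    dest_speedup is the assumed factor by which the destination gets faster.
    """
    remaining = (1.0 - dest_fraction) + dest_fraction / dest_speedup
    return current_rps / remaining


current_rps = 150.0  # midpoint of the reported 145-155 records per second
print(f"{seconds_per_batch(1000, current_rps):.1f} s per 1000-record batch")

for dest_fraction in (0.60, 0.75):       # reported share of time in the destination
    for dest_speedup in (2.0, 4.0):      # hypothetical improvements
        rps = projected_throughput(current_rps, dest_fraction, dest_speedup)
        print(f"destination share {dest_fraction:.0%}, {dest_speedup:.0f}x faster "
              f"-> about {rps:.0f} records/s")
```

Under these assumptions, even an infinitely fast destination would cap throughput at roughly 375 to 600 records per second, because the remaining 25-40% of the batch time is spent in the origin and processors. That is why the other factors in the answer still matter.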

Improving indexing performance is mostly an Elasticsearch-level exercise. Have a look at some of their high-level recommendations here.

jeff ( 2020-02-20 10:38:22 -0500 )
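
For illustration only, two index settings that are commonly relaxed during a large initial load are `index.refresh_interval` and `index.number_of_replicas`. The sketch below just builds the JSON bodies that would be sent to the index's `_settings` endpoint. The helper name, the index name `records`, and the chosen values are assumptions made for this example, not settings taken from this thread; whether they help depends on the cluster.

```python
import json


def index_settings_body(refresh_interval: str, replicas: int) -> str:
    """Build a JSON body for an Elasticsearch index settings update."""
    return json.dumps(
        {"index": {"refresh_interval": refresh_interval,
                   "number_of_replicas": replicas}},
        indent=2,
    )


# Hypothetical values: refresh less often and skip replicas while bulk loading.
during_load = index_settings_body(refresh_interval="30s", replicas=0)

# Hypothetical values to restore once the load has finished.
after_load = index_settings_body(refresh_interval="1s", replicas=1)

print("PUT /records/_settings  (before the load)")
print(during_load)
print("PUT /records/_settings  (after the load)")
print(after_load)
```

Running with zero replicas means a node failure during the load can lose data, so this trade-off only suits loads that can be re-run from the source database.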
