RPC - Speed issues

asked 2018-06-22 04:14:07 -0600

Anonymous

updated 2018-06-25 09:27:55 -0600

I am currently trying to copy data between two pipelines (on two different VMs) using SDC RPC.

The first pipeline reads data from a SQL Server on the local network and sends it to another VM over a VPN (between an on-premises CheckPoint gateway and Azure). The second pipeline, on Azure, writes the data to Azure Data Lake Store.

The communication between the two VMs is encrypted and the data is compressed.

The problem is that it took roughly 17 hours (twice as long when I do not use any compression between the two RPC stages) to write a table of ~6.5 GB to the Data Lake Store. But when the network team looked at the network consumption, they found that SDC was using only 10 Mbit/s (the limit between the two VMs, tested with iperf3, is 30 Mbit/s), which does not match the 17 hours needed for the ingestion.
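As a rough sanity check (my own arithmetic; only the ~6.5 GB table size and the 10 Mbit/s rate come from the measurements above), a network-bound transfer at that rate should be much faster than 17 hours:

    # Back-of-the-envelope: how long should ~6.5 GB take at the observed 10 Mbit/s?
    table_size_gb = 6.5                         # uncompressed table size
    link_rate_mbit_s = 10                       # throughput seen by the network team

    table_size_mbit = table_size_gb * 8 * 1000  # GB -> Mbit (decimal units)
    transfer_time_h = table_size_mbit / link_rate_mbit_s / 3600
    print(f"{transfer_time_h:.1f} h")           # ~1.4 h, far below the observed 17 h

Even without compression, a purely network-bound transfer would finish in well under two hours, so the bottleneck seems to be somewhere else.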

So I looked at the logs and I saw:

2018-06-25 16:13:16,015 [user:] [pipeline:] [runner:] [thread:Scheduler-1608358806] WARN BaseWebSocket - WebSocket 'alerts' error: java.net.SocketTimeoutException: Timeout on Read java.net.SocketTimeoutException: Timeout on Read ...

2018-06-25 16:18:45,117 [user:*admin] [pipeline:...] [runner:0] [thread:Table Jdbc Runner - 0] WARN SdcIpcTarget - Batch for entity 'null' and offset 'null' could not be written out: java.net.SocketTimeoutException: Read timed out java.net.SocketTimeoutException: Read timed out

Does this mean that I should increase the read timeout of the RPC?
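As a generic illustration of what that exception means (plain Python sockets, nothing SDC-specific; the host and port are placeholders), a blocking read gives up when the peer sends nothing within the configured window:

    import socket

    # Placeholder endpoint; not the real SDC RPC address.
    sock = socket.create_connection(("example.com", 80), timeout=5)
    sock.settimeout(5)              # read timeout, comparable to the RPC read timeout
    try:
        data = sock.recv(4096)      # blocks until data arrives or 5 s elapse
    except socket.timeout:          # Python analogue of java.net.SocketTimeoutException
        print("read timed out: the peer did not respond in time")
    finally:
        sock.close()

So increasing the read timeout would make the warnings go away if the other side is merely slow to answer, but it would not by itself improve the throughput.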

I also looked at jstat (after 17 hours):

S0     S1       E       O       M       CCS     YGC     YGCT      FGC     FGCT       GCT 
0,00   100,00   8,57    20,44   92,67   90,38   9238    1353,639  0       0,000      1353,639

The GC doesn't seem to be the problem.
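Backing that up with the counters above (the ~17 hours of wall-clock time is an approximation taken from the run described earlier):

    # Fraction of wall-clock time spent in GC, from the jstat output above.
    gct_seconds = 1353.639             # GCT column: cumulative GC time
    wall_clock_seconds = 17 * 3600     # approximate run time

    print(f"GC overhead: {gct_seconds / wall_clock_seconds:.1%}")   # ~2.2%

Roughly 2% of the run spent in GC is consistent with GC not being the bottleneck.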

When I looked at top, one core on the VM running the sending SDC is constantly at 100% and a second one is at 25% (the other cores are idle), while the SDC receiving the data is almost always idle.
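One standard way to see which part of the sending pipeline is burning that core (a generic JVM diagnostic, not an SDC-specific procedure): run top -H -p <sdc_pid> to get per-thread CPU usage, take the TID of the 100% thread, and look it up as the hexadecimal nid in a jstack <sdc_pid> dump. The conversion is just decimal to hex:

    # Convert the hot thread id from `top -H -p <sdc_pid>` into the nid used by jstack.
    hot_tid = 12345                    # placeholder: replace with the TID of the 100% thread
    print(f"search the jstack dump for nid={hex(hot_tid)}")   # e.g. nid=0x3039

If the hot thread turns out to be the JDBC runner or the thread doing compression/serialization for the RPC destination, that would explain why only ~10 Mbit/s reaches the network.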

So I looked at iostat:

     iostat -xh 10
Linux 3.10.0-862.3.2.el7.x86_64    25/06/2018      _x86_64_        (4 CPU)
...

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          30,48    0,00    0,08    0,00    0,00   69,44

Device:  rrqm/s  wrqm/s   r/s   w/s  rkB/s  wkB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
sda        0,00    0,00  0,00  1,60   0,00  15,30    19,12     0,00   1,44    0,00    1,44   0,19   0,03
dm-0       0,00    0,00  0,00  1,60   0,00  15,30    19,12     0,00   1,44    0,00    1,44   0,19   0,03
dm-1       0,00    0,00  0,00  0,00   0,00   0,00     0,00     0,00   0,00    0,00    0,00   0,00   0,00

The disks are not active, which is normal.

I also looked at vmstat and mpstat:

vmstat 10
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so ...

Comments

How is the performance when both the source and target pipelines are in the same local network? For the CPU utilization, after top, I'd have a look at the outputs of iostat and vmstat as well to see where most of the time is being spent.

Mufy ( 2018-06-22 08:15:45 -0600 )