
bottleneck hunting: dm-crypt and RAID



Hola.  Maybe someone can give me a hand with this.  I have 5 disks
connected to an LSI MegaRAID 150-6 PCI controller.  Because I prefer the
transparency of Linux software RAID, the drives are configured on the
card as 5 single-disk RAID0 volumes.

I get great speed when I use the RAID directly:

# dd if=/dev/md1 of=/dev/null bs=4096 count=200K
204800+0 records in
204800+0 records out
838860800 bytes (839 MB) copied, 8.0912 s, 104 MB/s

(from iostat -m 2 in the middle of the xfer)
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.50    0.00   33.67   65.83    0.00    0.00

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda             516.50        19.94         0.00         39          0
sdb             516.50        19.94         0.00         39          0
sdc             520.50        19.94         0.00         39          0
sdd             508.50        19.94         0.00         39          0
sde             508.50        19.97         0.00         39          0
md0               0.00         0.00         0.00          0          0
sdf               0.00         0.00         0.00          0          0
sdg               0.00         0.00         0.00          0          0
dm-0              0.00         0.00         0.00          0          0
md1           25408.00        99.25         0.00        198          0
dm-1              0.00         0.00         0.00          0          0
dm-2              0.00         0.00         0.00          0          0

Cool beans, right?  A 32-bit, 33 MHz PCI bus has a total bandwidth of
about 127 MiB/sec (133 MB/sec), so 100M/sec isn't bad.  The CPU usage was
the 'dd' process plus minor system overhead.  The iowait is the system
hungrily waiting for more data than the bus can provide (my
understanding; of course, correct me if I'm wrong).  The numbers roughly
matched top.
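
As a sanity check on those figures, here is a short Python sketch of the arithmetic. It assumes the nominal 33.33 MHz PCI clock (the text above rounds to 33 MHz), and the variable names are only illustrative.

```python
# Theoretical peak of a 32-bit PCI bus at the nominal 33.33 MHz clock.
BUS_WIDTH_BYTES = 4
BUS_CLOCK_HZ = 33.33e6  # assumption: nominal PCI clock

pci_peak_bytes = BUS_WIDTH_BYTES * BUS_CLOCK_HZ
pci_peak_mb = pci_peak_bytes / 1e6      # decimal megabytes per second
pci_peak_mib = pci_peak_bytes / 2**20   # binary mebibytes per second

# dd run on /dev/md1: 200K blocks of 4096 bytes in 8.0912 s.
dd_bytes = 200 * 1024 * 4096
dd_seconds = 8.0912
dd_mb = dd_bytes / dd_seconds / 1e6

print(f"PCI peak:  {pci_peak_mb:.1f} MB/s ({pci_peak_mib:.1f} MiB/s)")
print(f"dd on md1: {dd_mb:.1f} MB/s, {dd_mb / pci_peak_mb:.0%} of the bus peak")
```

The 127 figure is the bus peak in binary units; dd reports decimal megabytes, so the two are best compared in the same unit.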

Now against the encrypted device.  I use aes-256-cbc for dm-crypt.  Here's
how my processor handles that form of crypto.  (dm-crypt uses 512-byte
sectors, so I'd expect a ceiling of about 78M/sec, minus the overhead of
other processes.)

# openssl speed aes-256-cbc
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256 cbc      57872.85k    71981.65k    77056.43k    78409.39k    78817.96k
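
The 'k' figures from openssl speed are thousands of bytes per second, so the expected ceiling can be read off with a small Python sketch. This assumes the kernel's AES code performs about as well as the userspace OpenSSL benchmark, which is not guaranteed; the dictionary name is illustrative.

```python
# openssl speed aes-256-cbc results, in thousands of bytes per second,
# keyed by block size in bytes (copied from the output above).
openssl_k = {
    16: 57872.85,
    64: 71981.65,
    256: 77056.43,
    1024: 78409.39,
    8192: 78817.96,
}

# dm-crypt works on 512-byte sectors, which sit between the
# 256-byte and 1024-byte columns of the benchmark.
low_mb = openssl_k[256] * 1000 / 1e6
high_mb = openssl_k[1024] * 1000 / 1e6

print(f"expected ceiling for 512-byte sectors: {low_mb:.1f} to {high_mb:.1f} MB/s")
```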

Here's what I get.

# dd if=/dev/mapper/storage1_crypt of=/dev/null bs=4096 count=400K
409600+0 records in
409600+0 records out
1677721600 bytes (1.7 GB) copied, 42.6342 s, 39.4 MB/s

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00   70.85   29.15    0.00    0.00

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda             298.00         7.57         0.00         15          0
sdb             299.00         7.56         0.00         15          0
sdc             299.50         7.56         0.00         15          0
sdd             299.00         7.58         0.00         15          0
sde             299.50         7.59         0.00         15          0
md0               0.00         0.00         0.00          0          0
sdf               0.00         0.00         0.00          0          0
sdg               0.00         0.00         0.00          0          0
dm-0              0.00         0.00         0.00          0          0
md1            9664.00        37.75         0.00         75          0
dm-1              0.00         0.00         0.00          0          0
dm-2           9664.00        37.75         0.00         75          0
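
To put the two dd runs side by side, a short Python sketch using only numbers from the transcripts in this message. Treating the 1024-byte openssl column as the AES ceiling is an assumption, and the variable names are illustrative.

```python
# Throughput of the two dd runs, in decimal MB/s.
raw_mb = 838860800 / 8.0912 / 1e6       # dd on /dev/md1
crypt_mb = 1677721600 / 42.6342 / 1e6   # dd on /dev/mapper/storage1_crypt

# Assumption: the 1024-byte openssl speed column approximates the ceiling.
aes_ceiling_mb = 78409.39 * 1000 / 1e6

print(f"raw md1:       {raw_mb:.1f} MB/s")
print(f"through crypt: {crypt_mb:.1f} MB/s")
print(f"crypt vs raw:         {crypt_mb / raw_mb:.0%}")
print(f"crypt vs AES ceiling: {crypt_mb / aes_ceiling_mb:.0%}")
```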

The CPU statistics boggle the mind.  I would expect 100% CPU utilization
and 0% iowait (working hard at the crypto but getting all the data it
wants in a timely fashion.)  Watching top creates more questions:

Cpu(s):  0.3%us, 69.8%sy,  0.0%ni,  0.0%id, 27.9%wa,  1.7%hi,  0.3%si,  0.0%st
Mem:   2075736k total,  2062784k used,    12952k free,   854484k buffers
Swap:  2007088k total,    37356k used,  1969732k free,  1121304k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5628 root      15  -5     0    0    0 S 47.9  0.0   1:02.61 kcryptd
 5688 root      20   0  3088  732  616 R  6.0  0.0   0:01.78 dd
 5627 root      15  -5     0    0    0 S  2.7  0.0   0:03.42 kcryptd_io
  189 root      15  -5     0    0    0 S  0.7  0.0   0:00.90 kswapd0
 3666 root      15  -5     0    0    0 S  0.7  0.0   0:01.52 md1_raid5

Those are the top 5 processes, but they only add up to 58% CPU, while the
summary line at the top indicates about 11% more than that.  And again,
there's the waiting for IO.  Why isn't kcryptd using more of the available
CPU?  Where's that other 11% coming from?
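
Tallying the figures from the top output makes the gap explicit. This is a minimal Python sketch; the dictionary names are just for illustration, and it only adds up what top printed rather than explaining where the time goes.

```python
# Per-process %CPU for the five busiest processes shown by top.
per_process = {
    "kcryptd": 47.9,
    "dd": 6.0,
    "kcryptd_io": 2.7,
    "kswapd0": 0.7,
    "md1_raid5": 0.7,
}

# The Cpu(s) summary line from the same top snapshot.
summary = {"us": 0.3, "sy": 69.8, "ni": 0.0, "id": 0.0,
           "wa": 27.9, "hi": 1.7, "si": 0.3, "st": 0.0}

process_total = sum(per_process.values())
gap_vs_sy = summary["sy"] - process_total        # system time not in the top 5
irq_total = summary["hi"] + summary["si"]        # hard + soft interrupt time
summary_total = sum(summary.values())            # should be close to 100

print(f"top-5 processes: {process_total:.1f}%")
print(f"system time:     {summary['sy']:.1f}% (gap {gap_vs_sy:.1f}%)")
print(f"interrupt time:  {irq_total:.1f}%")
print(f"summary total:   {summary_total:.1f}%")
```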

Does anyone have any idea where the bottleneck is, or ideas on how I
might further optimize this?

Thanks,
Dam
