
High I/O usage

  • 20-06-2013 12:29pm
    #1
    Closed Accounts Posts: 3,981 ✭✭✭


    Hi there,

    We have a bunch of Linux systems, 32 bit, and we've built some 64 bit systems to replace them.

    I'm doing some web app testing against the 64 bit systems and I'm noticing response-time issues: every now and again response time goes through the roof and transactions per second drop to zero.

    I'm digging into the system-level stats now, and one thing I notice is that there are periods of the day, lining up with the application issues, where the I/O utilisation for /dev/sdb is at 100%.

    When I run iostat -x 5 I get the following:
    Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
    sda               0.00    40.20    0.00   12.60     0.00   422.40    33.52     0.02    1.57   0.24   0.30
    sdb               0.00   151.60    0.00 1426.60     0.00 12625.60     8.85     3.94    2.76   0.09  13.26
    scd0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    dm-2              0.00     0.00    0.00 1577.20     0.00 12617.60     8.00     4.37    2.77   0.08  13.14
    dm-3              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    dm-4              0.00     0.00    0.00    1.00     0.00     8.00     8.00     0.00    3.00   1.20   0.12
    dm-5              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    dm-6              0.00     0.00    0.00   44.20     0.00   353.60     8.00     0.04    0.95   0.01   0.06
    dm-7              0.00     0.00    0.00    0.80     0.00     6.40     8.00     0.00    0.00   0.00   0.00
    dm-8              0.00     0.00    0.00    5.20     0.00    41.60     8.00     0.01    2.81   0.35   0.18
    dm-9              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    dm-10             0.00     0.00    0.00    2.60     0.00    20.80     8.00     0.00    0.46   0.23   0.06
    dm-11             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    dm-12             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    

    I have highlighted dm-2 and sdb as the problem areas. A regular iostat command shows:
    Linux 2.6.32-279.14.1.el6.x86_64 (fmdnwsdal23)  06/20/2013      _x86_64_        (64 CPU)
    
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               0.22    0.00    0.30    0.10    0.00   99.38
    
    Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
    sda               5.87         2.95       180.77   11054874  678013692
    sdb             144.42        10.63      1627.51   39869274 6104476914
    scd0              0.00         0.00         0.00       4928          0
    dm-0              0.04         0.49         0.23    1833380     876248
    dm-1              0.00         0.00         0.00       5388         10
    dm-2            199.35         8.71      1592.89   32659282 5974602176
    dm-3              0.05         0.01         0.38      49786    1412160
    dm-4              4.43         1.90        34.25    7113474  128462568
    dm-5              0.73         0.18         5.76     687634   21606912
    dm-6             16.25         0.17       129.99     649874  487580768
    dm-7              3.41         1.48        27.21    5561034  102056168
    dm-8              0.60         0.23         4.76     849242   17837760
    dm-9              0.00         0.00         0.00      17386       5120
    dm-10             0.39         0.01         3.12      19346   11711776
    dm-11             0.01         0.03         0.04      99770     157312
    dm-12             1.21         0.33         9.64    1220354   36155504
    

    I'm trying to find out which processes are responsible for this, as the 32 bit systems look fine:
    Linux 2.6.9-100.0.0.0.1.ELhugemem (fmdnewsmmk025)       06/20/2013
    
    avg-cpu:  %user   %nice    %sys %iowait   %idle
               1.33    0.00    0.30    0.00   98.37
    
    Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
    cciss/c0d0        3.16         0.45        42.07    8015048  752074180
    cciss/c0d0p1      0.00         0.00         0.00      73286          4
    cciss/c0d0p2      0.00         0.00         0.00      33926         16
    cciss/c0d0p3      0.00         0.00         0.00      31100          0
    cciss/c0d0p4      0.00         0.00         0.00         64          0
    cciss/c0d0p5      0.90         0.14        16.81    2491864  300527424
    cciss/c0d0p6      0.02         0.15         0.24    2684050    4324408
    cciss/c0d0p7      2.24         0.15        25.02    2695070  447222328
    cciss/c0d1        2.24         0.83        42.19   14824504  754201732
    dm-0              1.79         0.11        14.28    2018882  255339136
    dm-1              0.00         0.35         0.00    6254242         16
    dm-2              0.00         0.00         0.00      33498         16
    dm-3              0.00         0.00         0.00      10658         10
    dm-4              0.12         0.00         0.23      12356    4164722
    dm-5              0.00         0.00         0.00      31098         16
    dm-6              0.00         0.00         0.00      28720         10
    dm-7              3.34         0.01        26.73     131834  477861216
    dm-8              0.00         0.03         0.00     453042         16
    dm-9              0.00         0.00         0.00      30994         16
    dm-10             0.00         0.00         0.00      30970         16
    dm-11             0.12         0.14         0.94    2516218   16768088
    dm-12             0.00         0.00         0.00      29768         10
    dm-13             0.00         0.02         0.00     269878        572
    dm-14             0.00         0.16         0.00    2865306      67808
    dm-15             0.01         0.00         0.04      37146     689312
    dm-16             1.38         0.06        11.03    1090946  197131104
    dm-17             0.72         0.05         5.74     887698  102551200
    dm-18             0.00         0.00         0.00      31026         16
    dm-19             0.00         0.02         0.01     367610     155776
    dm-20             0.00         0.00         0.00      33482         16
    
    

    These systems don't have iotop or atop, which is quite unfortunate.
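    About the best fallback I can think of without iotop is sampling /proc/&lt;pid&gt;/io myself. A rough sketch (the PID, interval and byte counts below are placeholders, not real measurements from these boxes):

```shell
#!/bin/sh
# Fallback sketch when iotop/atop aren't installed: read write_bytes
# from /proc/<pid>/io twice and diff to get an average write rate.
# rate <before_bytes> <after_bytes> <seconds> -> average bytes/s
rate() {
    echo $(( ($2 - $1) / $3 ))
}

# On a live box it would look something like (PID 1234 is a placeholder):
#   b=$(awk '/^write_bytes/ {print $2}' /proc/1234/io)
#   sleep 5
#   a=$(awk '/^write_bytes/ {print $2}' /proc/1234/io)
#   rate "$b" "$a" 5
rate 1048576 11534336 5    # 10 MiB in 5s -> prints 2097152
```

    Loop that over every PID in /proc and sort, and you get a poor man's iotop.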

    Can you guys offer any pointers as to how I can find what is causing this increased I/O activity?

    Thanks! :)


Comments

  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    More iostat output. I'm only sampling every second, so I'm sure I'll miss it going to 100% in real time, but we do have a tool that alerts us when it hits 100%. When the alert came out, my iostat command grabbed this (65% util):
    Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
    sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    sdb               0.00   828.00    0.00 10327.00     0.00 89240.00     8.64    46.30    4.48   0.06  65.80
    scd0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    dm-2              0.00     0.00    0.00 11155.00     0.00 89240.00     8.00   143.26   12.84   0.06  65.80
    dm-3              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    dm-4              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    dm-5              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    dm-6              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    dm-7              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    dm-8              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    dm-9              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    dm-10             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    dm-11             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    dm-12             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    


  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    sdb               0.00   744.00    0.00 6650.00     0.00 59096.00     8.89   136.70   18.08   0.15 100.00
    dm-2              0.00     0.00    0.00 7463.00     0.00 59704.00     8.00   153.08   18.08   0.13 100.00
    sdb               0.00   116.00    0.00  554.00     0.00  5136.00     9.27   147.64  249.87   1.81 100.00
    dm-2              0.00     0.00    0.00  680.00     0.00  5440.00     8.00   174.66  237.63   1.47 100.10
    sdb               0.00   101.00    0.00  556.00     0.00  5456.00     9.81   146.65  280.98   1.80 100.00
    dm-2              0.00     0.00    0.00  657.00     0.00  5256.00     8.00   173.45  283.56   1.52 100.00
    sdb               0.00   227.00    0.00  506.00     0.00  5272.00    10.42   180.75  303.91   1.98 100.10
    dm-2              0.00     0.00    0.00  736.00     0.00  5888.00     8.00   314.23  310.59   1.36 100.00
    sdb               0.00   124.00    0.00  504.00     0.00  4496.00     8.92   150.78  264.77   1.99 100.10
    dm-2              0.00     0.00    0.00  648.00     0.00  5184.00     8.00   186.58  229.06   1.54 100.10
    sdb               0.00    69.00    0.00  504.00     0.00  5120.00    10.16   153.53  326.82   1.98 100.00
    dm-2              0.00     0.00    0.00  531.00     0.00  4248.00     8.00   205.07  445.68   1.88 100.00
    sdb               0.00    42.00    0.00  519.00     0.00  4528.00     8.72   144.24  272.94   1.93 100.10
    dm-2              0.00     0.00    0.00  554.00     0.00  4432.00     8.00   159.14  281.20   1.81 100.10
    sdb               0.00    64.00    0.00  458.00     0.00  4064.00     8.87   145.51  309.05   2.18 100.00
    dm-2              0.00     0.00    0.00  506.00     0.00  4048.00     8.00   157.17  301.00   1.98 100.00
    sdb               0.00    48.00    0.00  405.00     0.00  3504.00     8.65   182.75  357.02   2.47 100.10
    dm-2              0.00     0.00    0.00  567.00     0.00  4536.00     8.00   197.91  273.99   1.77 100.10
    sdb               0.00   971.00    0.00  302.00     0.00  5480.00    18.15   200.53  542.03   3.31 100.00
    dm-2              0.00     0.00    0.00 1149.00     0.00  9192.00     8.00   474.96  325.96   0.87 100.00
    sdb               0.00    13.00    0.00  273.00     0.00  7112.00    26.05   145.04  796.65   3.66 100.00
    dm-2              0.00     0.00    0.00  270.00     0.00  2160.00     8.00   242.51 1416.94   3.70 100.00
    sdb               0.00    25.00    0.00  413.00     0.00  3464.00     8.39   144.67  354.37   2.42 100.00
    dm-2              0.00     0.00    0.00  450.00     0.00  3600.00     8.00   151.72  340.05   2.22 100.00
    sdb               0.00    15.00    0.00  440.00     0.00  3688.00     8.38   144.86  333.67   2.27 100.00
    dm-2              0.00     0.00    0.00  446.00     0.00  3568.00     8.00   151.40  347.17   2.24 100.00
    

    I'm going to install iotop to see what the craic is. More details to follow.


  • Registered Users, Registered Users 2 Posts: 37,485 ✭✭✭✭Khannie


    [-0-] wrote: »
    I'm going to install iotop to see what the craic is. More details to follow.

    That was going to be my suggestion. Should nail it in seconds.


  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    So it appears this is caused by jbd2, the ext4 journaling block device.

    I'm wondering if we could maybe disable journaling? One of my processes may be performing regular indexing, which in turn causes jbd2 to go mental.

    Has anyone encountered this before? Google turns up lots of threads where people are experiencing the same thing.


  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    I should also state that these systems are all ext4.

    The following article is interesting: http://www.phoronix.com/scan.php?page=article&item=ext4_linux35_tuning&num=1

    Disabling journaling is risky business and can lead to a completely corrupted disk if the box has power issues. Not something we want on production systems that need to be up 24/7.


  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    Our apps trigger high write spikes, which in turn cause jbd2 to block I/O.

    Next steps are to increase the I/O queue size, and then consider disabling journaling if that does not work.


  • Registered Users, Registered Users 2 Posts: 37,485 ✭✭✭✭Khannie


    Interested to hear how this plays out. I'd bet money there is something configurable in the kernel to ease your journaling woes. Have you tried optimising the filesystem mount options (noatime, that kind of thing)?


  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    Khannie wrote: »
    Interested to hear how this plays out. I'd bet money there is something configurable in the kernel to ease your journaling woes. Have you tried optimising the filesystem mount options (noatime, that kind of thing)?

    Sure have. :) No joy with that.

    I'm hoping increasing nr_requests to 100000 (it's at 128 now) works.

    Will know shortly. :)


  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    Updated nr_requests to 2048, and disabled journaling.

    Now we wait. :)
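    For anyone following along, the journaling change was the usual tune2fs route. A sketch against a scratch image rather than a real volume (assumes e2fsprogs is installed; on a real filesystem it must be unmounted and fsck-clean first, and obviously never do this to a live production volume):

```shell
#!/bin/sh
# Demonstrate dropping the ext4 journal with tune2fs, on a scratch
# image so nothing real is at risk. Requires e2fsprogs, no root needed.
dd if=/dev/zero of=/tmp/scratch.img bs=1M count=64 2>/dev/null
mke2fs -q -t ext4 -F /tmp/scratch.img
if dumpe2fs -h /tmp/scratch.img 2>/dev/null | grep -q has_journal; then
    echo "journal present"
fi
# Remove the has_journal feature (fs must be clean and unmounted):
tune2fs -O ^has_journal /tmp/scratch.img >/dev/null
if dumpe2fs -h /tmp/scratch.img 2>/dev/null | grep -q has_journal; then
    echo "journal still present"
else
    echo "journal removed"
fi
```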


  • Registered Users, Registered Users 2 Posts: 37,485 ✭✭✭✭Khannie


    Ah, disabling journaling will almost certainly box it off, but as you said earlier, at a cost. Have you battery-backed RAID cards, perchance?


  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    Khannie wrote: »
    Ah, disabling journaling will almost certainly box it off, but as you said earlier, at a cost. Have you battery-backed RAID cards, perchance?

    We do indeed, and they have a backup supply as well.

    I would rather be able to tweak journaling than turn it off. Turning it off now to see what happens is just a way of trying to identify whether there's something going on with the hardware. Maybe the bus is slow, for instance.

    I'll know how we look by 9pm EST tonight. :)


  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    No dice. It's still happening with journaling disabled.

    Digging into the application now to see what is going on. Looks like it does some sort of cleanup at this time, and I'm seeing differences between 64 bit configs & arguments vs 32 bit.

    The fun continues. :)


  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    Hey there,

    Do you know of a tool that will gather OS settings, such as:
    Drive mount options
    Buffer I/O settings
    RAM
    CPU
    Library versions
    Kernel config parameters
    etc.

    Basically, I'm still seeing this issue (although some tuning has made a small improvement), and our vendor is not.

    I want to compare our OS settings to theirs without having to grab all of these settings manually and script it myself. I was wondering if a tool exists that would do this.

    Yes, it is Friday and I am lazy!
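    In the meantime, something quick and dirty like this might do for diffing two boxes (sosreport on RHEL may be close to what I want; the categories below are just the obvious ones, not a complete list):

```shell
#!/bin/sh
# Quick-and-dirty settings snapshot for diffing between two boxes.
# Sketch only: categories are the obvious ones, not exhaustive.
out=os-settings.txt
{
    echo "== kernel ==";     uname -r
    echo "== cpu count ==";  grep -c ^processor /proc/cpuinfo
    echo "== memory ==";     grep MemTotal /proc/meminfo
    echo "== mounts ==";     cat /proc/mounts
    echo "== vm sysctls =="; sysctl vm 2>/dev/null
} > "$out"
echo "wrote $out"
```

    Run it on both systems and diff the two files.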


  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    Using fio... very interesting behaviour.
    64 bit ext3
    read : io=5120.0MB, bw=2203.1MB/s, iops=563993, runt= 2324msec
    read : io=5120.0MB, bw=1570.8MB/s, iops=401938, runt= 3261msec
    write: io=5120.0MB, bw=169289KB/s, iops=42322, runt= 30970msec

    64 bit ext4
    read : io=5120.0MB, bw=162158KB/s, iops=40539, runt= 32332msec
    read : io=5120.0MB, bw=914828KB/s, iops=228707, runt= 5731msec
    write: io=5120.0MB, bw=806473KB/s, iops=201618, runt= 6501msec


  • Registered Users, Registered Users 2 Posts: 8,061 ✭✭✭10-10-20


    Do:
    cat /sys/block/*/queue/scheduler
    
    and report back.


  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    64bit ext4:
    201% cat /sys/block/*/queue/scheduler
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    noop anticipatory deadline [cfq]
    noop anticipatory deadline [cfq]
    noop anticipatory deadline [cfq]
    

    64bit ext3:
    201% cat /sys/block/*/queue/scheduler
    none
    none
    none
    none
    noop anticipatory deadline [cfq]
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    noop anticipatory deadline [cfq]
    noop anticipatory deadline [cfq]
    noop anticipatory deadline [cfq]
    noop anticipatory deadline [cfq]
    noop anticipatory deadline [cfq]
    


  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    For the particular partitions in question (dm-2), there is none.

    For example:
    fmdnwsuat31:fmdadm,207% cat /sys/block/dm-2/queue/scheduler
    none


  • Registered Users, Registered Users 2 Posts: 8,061 ✭✭✭10-10-20


    Try this:
    Change to noop or deadline instead of CFQ. We found that CFQ queues IO for 5 seconds before flushing down to disk, causing latency spikes with fio.

    1. At boot time, the kernel parameter elevator=noop is typically appended to a kernel menu line in the appropriate boot path (such as /boot/grub/menu.lst, /boot/grub/grub.conf)

    2. At runtime, change the scheduler by echoing its name into /sys/block/$devicename/queue/scheduler, where the device name is the basename of the block device (e.g. sda for /dev/sda). Some versions of Linux also apply the scheduler to multipath devices, and these need to be amended as well, for example:

    echo noop > /sys/block/sda/queue/scheduler
    echo noop > /sys/block/dm-0/queue/scheduler

    IBM doc


  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    Lovely stuff, I'll give that a go!


  • Registered Users, Registered Users 2 Posts: 8,061 ✭✭✭10-10-20


    The scheduler in square brackets is the chosen scheduler [cfq].
    What I don't get is why you aren't showing schedulers for all of your dm-x devices, but here we have a clue:
    http://blog.famzah.net/2010/01/25/why-sys-block-dm-0-queue-scheduler-exists-on-my-linux-system/

    A couple of ideas & questions:
    What kernel are you running?
    What does multipath -ll look like?
    Any idea what type of I/O the application generates? fsync? Direct or cached?
    Try forcing the scheduler to noop on the kernel line in grub, run a test, and see what results you obtain. Perhaps leave "iostat -mx 5" running and piped into a logfile for the duration of the test.
    If you run fio with a 4k randrw job and enable logging, then use fio's gnuplot script to make a graph, do you see similar spikes? Make the test file size (filesize) and total I/O size (size) larger than the target device's cache.

    [global]
    filesize=4G
    size=4G
    ....
    runtime=2m
    write_bw_log
    write_lat_log
    write_iops_log

    [spec_4k_randrw]
    rw=randrw
    bs=4k

    Run fio_generate_plots from the directory where fio was run to create the PNGs.


  • Registered Users, Registered Users 2 Posts: 37,485 ✭✭✭✭Khannie


    10-10-20 wrote: »
    We found that CFQ queues IO for 5 seconds before flushing down to disk

    Why in the name of Jesus would it do that? Seems mental. I'm sure there's some reason for it though.


  • Registered Users, Registered Users 2 Posts: 8,061 ✭✭✭10-10-20


    I'm told that CFQ likes to maintain a queue of I/O so that it can merge (re-order) the I/Os before committing them to disk. If you compare the rrqm and wrqm stats for CFQ versus noop, noop doesn't merge as much I/O, because it doesn't hold I/O in a buffer long enough to merge it.
    Apparently CFQ is better for individual physical disks, where it tries to re-order blocks to match the LBA as best it can (like NCQ), but this doesn't always work well where the volume sits on a RAID card and the LBAs aren't sequential. I'm not 100% on that, though.
    Look for articles on cases where noop and deadline outperform CFQ; usually these involve latency-sensitive applications.


  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    10-10-20 wrote: »

    A couple of ideas & questions:
    What kernel are you running?
    What does multipath -ll look like?
    Any idea what type of I/O the application generates? fsync? Direct or cached?
    Try forcing the scheduler to noop on the kernel line in grub, run a test, and see what results you obtain. Perhaps leave "iostat -mx 5" running and piped into a logfile for the duration of the test.
    If you run fio with a 4k randrw job and enable logging, then use fio's gnuplot script to make a graph, do you see similar spikes? Make the test file size (filesize) and total I/O size (size) larger than the target device's cache.

    [global]
    filesize=4G
    size=4G
    ....
    runtime=2m
    write_bw_log
    write_lat_log
    write_iops_log

    [spec_4k_randrw]
    rw=randrw
    bs=4k

    Run fio_generate_plots from the directory where fio was run to create the PNGs.

    1. Using Linux 2.6.32-279.14.1.el6.x86_64 #1 SMP Tue Nov 6 11:18:01 PST 2012 x86_64 x86_64 x86_64 GNU/Linux
    2. Command not found.
    3. No idea regarding the I/O; however, I have asked our vendor, as they wrote the binary that is causing this.
    4. I'll try all four: noop, deadline, anticipatory and cfq. I wish to benchmark them all and go with the best one. :)

    The comments section in the link you provided says the following:
    Things seem even better on my test box with Ubuntu server having kernel "3.2.0-29-generic". Kernel developers cleared up all confusion. The "dm-0" scheduler has only one possible value when used with LVM: "none". Example:

    famzah@VBox:~$ cat /sys/block/dm-0/queue/scheduler
    none

    I'm not sure if that kernel version is an option for us yet. Is what this guy says true? Is it really impossible to change the scheduler for dm devices? I hope that's not the case.


  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    There is an "/etc/multipath.conf" configuration file where local disks not residing on a SAN have to be "blacklisted", to prevent dm-multipath from creating additional primary and secondary paths to them. The multipathing feature is not needed on local disks, because local disk is always RAID0+1, RAID5 or RAID6 to satisfy the high-availability requirements.

    So the first place to look is /etc/multipath.conf, to make sure the local disk is blacklisted.

    I've just completed that. There is no SAN here; multipathing agents are not running and not installed.

    As a result, it's safe to say dm-2 is using CFQ.
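    For reference, the blacklist stanza in /etc/multipath.conf looks something like this (the devnode pattern here is only an example, not our actual config):

```
# /etc/multipath.conf -- example only; the devnode pattern is illustrative
blacklist {
    devnode "^sd[ab]$"
}
```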


  • Registered Users, Registered Users 2 Posts: 8,061 ✭✭✭10-10-20


    So what are the disks which are exhibiting the performance issue attached to? A PCI RAID controller with JBOD?
    Can you play with the cache settings? Have you tried WT, WB and disabled? Is cache-on-disk enabled?
    What was the result when you disabled journaling on ext4?

    In one of your posts on the previous page you mentioned setting an alarm to notify you when %util went to 100%. Ideally that's where you want to be at times of high load, as it shows the path is fully utilised. If you can't hit 100%, I'd be looking at why.

    I can see from iostat that you're doing 4k block writes, so is there a chance that you are spindle-bound?
    15k disks will do ~200 IOPS per disk, so 1577 IOPS works out to roughly 8 physical disks (plus n for parity).
    Are these SATA disks?


  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    10-10-20 wrote: »
    So what are the disks which are exhibiting the performance issue attached to? A PCI RAID controller with JBOD?
    Can you play with the cache settings? Have you tried WT, WB and disabled? Is cache-on-disk enabled?
    What was the result when you disabled journaling on ext4?

    In one of your posts on the previous page you mentioned setting an alarm to notify you when %util went to 100%. Ideally that's where you want to be at times of high load, as it shows the path is fully utilised. If you can't hit 100%, I'd be looking at why.

    I can see from iostat that you're doing 4k block writes, so is there a chance that you are spindle-bound?
    15k disks will do ~200 IOPS per disk, so 1577 IOPS works out to roughly 8 physical disks (plus n for parity).
    Are these SATA disks?

    /dev/sdb is the main culprit, and also dm-2.
    201% dmesg|grep sdb
    sd 0:0:0:1: [sdb] 860051248 512-byte logical blocks: (440 GB/410 GiB)
    sd 0:0:0:1: [sdb] Write Protect is off
    sd 0:0:0:1: [sdb] Mode Sense: 6b 00 00 08
    sd 0:0:0:1: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
     sdb:
     sdb1
    sd 0:0:0:1: [sdb] Attached SCSI disk
    dracut: Scanning devices sda3 sdb1  for LVM logical volumes vglocal/root
    
    

    I can certainly play with the cache settings, although I have not tried any of the settings you have asked about.

    When I disabled journaling, there was no difference. The only difference was that iotop was able to tell me which process was responsible for this behaviour.

    Not quite sure what spindle-bound is, sorry! Hehe.

    Here's some further output from dmesg:
    scsi0 : hpsa
    hpsa 0000:03:00.0: RAID              device c0b3t0l0 added.
    hpsa 0000:03:00.0: Direct-Access     device c0b0t0l0 added.
    hpsa 0000:03:00.0: Direct-Access     device c0b0t0l1 added.
    scsi 0:3:0:0: RAID              HP       P410i            5.70 PQ: 0 ANSI: 5
    scsi 0:0:0:0: Direct-Access     HP       LOGICAL VOLUME   5.70 PQ: 0 ANSI: 5
    scsi 0:0:0:1: Direct-Access     HP       LOGICAL VOLUME   5.70 PQ: 0 ANSI: 5
    ata_piix 0000:00:1f.2: version 2.13
    ata_piix 0000:00:1f.2: PCI INT B -> GSI 17 (level, low) -> IRQ 17
    ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ]
    ata_piix 0000:00:1f.2: setting latency timer to 64
    scsi1 : ata_piix
    scsi2 : ata_piix
    ata1: SATA max UDMA/133 cmd 0x1080 ctl 0x1088 bmdma 0x10a0 irq 17
    ata2: SATA max UDMA/133 cmd 0x1090 ctl 0x1098 bmdma 0x10a8 irq 17
    ata2.00: SATA link down (SStatus 4 SControl 300)
    ata2.01: SATA link down (SStatus 4 SControl 300)
    ata1.00: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
    ata1.01: SATA link down (SStatus 4 SControl 300)
    ata1.01: link offline, clearing class 3 to NONE
    ata1.00: ATAPI: DV-28S-W, C.2D, max UDMA/100
    ata1.00: configured for UDMA/100
    scsi 1:0:0:0: CD-ROM            TEAC     DV-28S-W         C.2D PQ: 0 ANSI: 5
    sd 0:0:0:0: [sda] 585871964 512-byte logical blocks: (299 GB/279 GiB)
    sd 0:0:0:1: [sdb] 860051248 512-byte logical blocks: (440 GB/410 GiB)
    sd 0:0:0:0: [sda] Write Protect is off
    sd 0:0:0:0: [sda] Mode Sense: 6b 00 00 08
    sd 0:0:0:1: [sdb] Write Protect is off
    sd 0:0:0:1: [sdb] Mode Sense: 6b 00 00 08
    sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
    sd 0:0:0:1: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
     sdb:
     sda: sda1 sda2 sda3
     sdb1
    sd 0:0:0:1: [sdb] Attached SCSI disk
    sd 0:0:0:0: [sda] Attached SCSI disk
    sr0: scsi3-mmc drive: 24x/24x cd/rw xa/form2 cdda tray
    Uniform CD-ROM driver Revision: 3.20
    sr 1:0:0:0: Attached scsi CD-ROM sr0
    dracut: Scanning devices sda3 sdb1  for LVM logical volumes vglocal/root
    dracut: inactive '/dev/appsvg/stub' [4.00 MB] inherit
    dracut: inactive '/dev/appsvg/appsfs' [146.48 GB] inherit
    dracut: inactive '/dev/appsvg/apps2fs' [48.83 GB] inherit
    dracut: inactive '/dev/appsvg/nwtextfs' [195.31 GB] inherit
    dracut: inactive '/dev/vglocal/fisc' [6.00 GB] inherit
    dracut: inactive '/dev/vglocal/var' [6.00 GB] inherit
    dracut: inactive '/dev/vglocal/log' [4.00 GB] inherit
    dracut: inactive '/dev/vglocal/tmq' [4.00 GB] inherit
    dracut: inactive '/dev/vglocal/root' [12.00 GB] inherit
    dracut: inactive '/dev/vglocal/perf' [4.00 GB] inherit
    dracut: inactive '/dev/vglocal/home' [2.00 GB] inherit
    dracut: inactive '/dev/vglocal/backup' [2.00 GB] inherit
    dracut: inactive '/dev/vglocal/opt' [2.00 GB] inherit
    EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts:
    dracut: Remounting /dev/mapper/vglocal-root with -o noatime,ro
    EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts:
    dracut: Mounted root filesystem /dev/mapper/vglocal-root
    dracut: Switching root
    readahead-collector: starting
    EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts:
    EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts:
    EXT4-fs (dm-11): mounted filesystem with ordered data mode. Opts:
    EXT4-fs (dm-9): mounted filesystem with ordered data mode. Opts:
    EXT4-fs (dm-8): mounted filesystem with ordered data mode. Opts:
    EXT4-fs (dm-10): mounted filesystem with ordered data mode. Opts:
    EXT4-fs (dm-12): mounted filesystem with ordered data mode. Opts:
    EXT4-fs (dm-6): mounted filesystem with ordered data mode. Opts:
    EXT4-fs (dm-7): mounted filesystem with ordered data mode. Opts:
    kjournald starting.  Commit interval 5 seconds
    EXT3-fs (dm-1): using internal journal
    EXT3-fs (dm-1): mounted filesystem with ordered data mode
    EXT4-fs (dm-2): mounted filesystem with ordered data mode. Opts:
    EXT4-fs (dm-3): mounted filesystem with ordered data mode. Opts:
    EXT4-fs (dm-4): mounted filesystem with ordered data mode. Opts:
    

    I hope that info helps. :)


  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    Do any of these settings stand out as problem areas?
    223% cat scheduler
    noop anticipatory deadline [cfq]
    224% cat read_ahead_kb
    128
    225% cat nr_requests
    128
    226% cat max_sectors_kb
    512
    227% cat max_hw_sectors_kb
    512
    228% cat rq_affinity
    1
    229% cat rotational
    1
    230% cat physical_block_size
    512
    231% cat optimal_io_size
    0
    232% cat nomerges
    0
    233% cat minimum_io_size
    512
    234% cat max_segment_size
    65536
    235% cat max_segments
    543
    236% cat logical_block_size
    512
    237% cat iostats
    1
    238% cat hw_sector_size
    512
    239% cat discard_zeroes_data
    0
    240% cat discard_max_bytes
    0
    241% cat discard_granularity
    0
    242% cat add_random
    1


  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    Regarding the write behaviour, it is fsync.


  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    Making an app change to use fdatasync instead of fsync. More to follow.
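    The difference is visible even from the shell via dd's conv= flags: conv=fsync flushes data plus all metadata (dragging the ext4 journal into every write), while conv=fdatasync flushes the data and only the metadata needed to read it back. A scratch-file illustration (paths are throwaway, nothing to do with our app):

```shell
#!/bin/sh
# Illustrate the two sync flavours with GNU dd.
# conv=fsync     -> fsync(2) before exit: data + all metadata flushed.
# conv=fdatasync -> fdatasync(2): data + only essential metadata.
dd if=/dev/zero of=/tmp/with-fsync.dat bs=4k count=256 conv=fsync 2>/dev/null
dd if=/dev/zero of=/tmp/with-fdatasync.dat bs=4k count=256 conv=fdatasync 2>/dev/null
ls -l /tmp/with-fsync.dat /tmp/with-fdatasync.dat
```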


  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    Fixed.

    I can't post the fix due to NDA ****, but switching fsync to fdatasync helped. It was an application issue in how they handle a certain item. I figured it out when I completely reverse-engineered their binary (because they wouldn't give me parts of their code) and observed some crap that shouldn't be happening on every write.

    Job done.

    Thank you all kindly!

    On the plus side, I learned a hell of a fecking lot!


  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    In the meantime, it's late at night, but I think a celebratory whiskey is in order for resolving a bug that's been open for 3-4 months.

    Thank funk!

