
High I/O usage

  • 20-06-2013 12:29pm
    #1
    Closed Accounts Posts: 3,981 ✭✭✭


    Hi there,

    We have a bunch of Linux systems, 32 bit, and we've built some 64 bit systems to replace them.

    I'm doing some web app testing against the 64 bit systems and I'm noticing response-time issues: every now and again response time goes through the roof and transactions per second drop to zero.

    I'm digging into the system-level stats now, and one thing I notice is that there are periods of the day, lining up with the application issues, where the I/O utilisation for /dev/sdb is at 100%.

    When I run iostat -x 5 I get the following:
    Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
    sda               0.00    40.20    0.00   12.60     0.00   422.40    33.52     0.02    1.57   0.24   0.30
    sdb               0.00   151.60    0.00 1426.60     0.00 12625.60     8.85     3.94    2.76   0.09  13.26
    scd0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    dm-2              0.00     0.00    0.00 1577.20     0.00 12617.60     8.00     4.37    2.77   0.08  13.14
    dm-3              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    dm-4              0.00     0.00    0.00    1.00     0.00     8.00     8.00     0.00    3.00   1.20   0.12
    dm-5              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    dm-6              0.00     0.00    0.00   44.20     0.00   353.60     8.00     0.04    0.95   0.01   0.06
    dm-7              0.00     0.00    0.00    0.80     0.00     6.40     8.00     0.00    0.00   0.00   0.00
    dm-8              0.00     0.00    0.00    5.20     0.00    41.60     8.00     0.01    2.81   0.35   0.18
    dm-9              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    dm-10             0.00     0.00    0.00    2.60     0.00    20.80     8.00     0.00    0.46   0.23   0.06
    dm-11             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    dm-12             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    

    I have highlighted dm-2 and sdb as the problem areas. A regular iostat command shows:
    Linux 2.6.32-279.14.1.el6.x86_64 (fmdnwsdal23)  06/20/2013      _x86_64_        (64 CPU)
    
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               0.22    0.00    0.30    0.10    0.00   99.38
    
    Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
    sda               5.87         2.95       180.77   11054874  678013692
    sdb             144.42        10.63      1627.51   39869274 6104476914
    scd0              0.00         0.00         0.00       4928          0
    dm-0              0.04         0.49         0.23    1833380     876248
    dm-1              0.00         0.00         0.00       5388         10
    dm-2            199.35         8.71      1592.89   32659282 5974602176
    dm-3              0.05         0.01         0.38      49786    1412160
    dm-4              4.43         1.90        34.25    7113474  128462568
    dm-5              0.73         0.18         5.76     687634   21606912
    dm-6             16.25         0.17       129.99     649874  487580768
    dm-7              3.41         1.48        27.21    5561034  102056168
    dm-8              0.60         0.23         4.76     849242   17837760
    dm-9              0.00         0.00         0.00      17386       5120
    dm-10             0.39         0.01         3.12      19346   11711776
    dm-11             0.01         0.03         0.04      99770     157312
    dm-12             1.21         0.33         9.64    1220354   36155504
    

    I'm trying to find out which processes are responsible for this, as the 32 bit systems look fine:
    Linux 2.6.9-100.0.0.0.1.ELhugemem (fmdnewsmmk025)       06/20/2013
    
    avg-cpu:  %user   %nice    %sys %iowait   %idle
               1.33    0.00    0.30    0.00   98.37
    
    Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
    cciss/c0d0        3.16         0.45        42.07    8015048  752074180
    cciss/c0d0p1      0.00         0.00         0.00      73286          4
    cciss/c0d0p2      0.00         0.00         0.00      33926         16
    cciss/c0d0p3      0.00         0.00         0.00      31100          0
    cciss/c0d0p4      0.00         0.00         0.00         64          0
    cciss/c0d0p5      0.90         0.14        16.81    2491864  300527424
    cciss/c0d0p6      0.02         0.15         0.24    2684050    4324408
    cciss/c0d0p7      2.24         0.15        25.02    2695070  447222328
    cciss/c0d1        2.24         0.83        42.19   14824504  754201732
    dm-0              1.79         0.11        14.28    2018882  255339136
    dm-1              0.00         0.35         0.00    6254242         16
    dm-2              0.00         0.00         0.00      33498         16
    dm-3              0.00         0.00         0.00      10658         10
    dm-4              0.12         0.00         0.23      12356    4164722
    dm-5              0.00         0.00         0.00      31098         16
    dm-6              0.00         0.00         0.00      28720         10
    dm-7              3.34         0.01        26.73     131834  477861216
    dm-8              0.00         0.03         0.00     453042         16
    dm-9              0.00         0.00         0.00      30994         16
    dm-10             0.00         0.00         0.00      30970         16
    dm-11             0.12         0.14         0.94    2516218   16768088
    dm-12             0.00         0.00         0.00      29768         10
    dm-13             0.00         0.02         0.00     269878        572
    dm-14             0.00         0.16         0.00    2865306      67808
    dm-15             0.01         0.00         0.04      37146     689312
    dm-16             1.38         0.06        11.03    1090946  197131104
    dm-17             0.72         0.05         5.74     887698  102551200
    dm-18             0.00         0.00         0.00      31026         16
    dm-19             0.00         0.02         0.01     367610     155776
    dm-20             0.00         0.00         0.00      33482         16
    
    

    These systems don't have iotop or atop, which is quite unfortunate.
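    About the best fallback I can think of without iotop is sampling /proc/&lt;pid&gt;/io myself. A rough sketch (the PID, interval and byte counts below are placeholders, not real measurements from these boxes):

```shell
#!/bin/sh
# Fallback sketch when iotop/atop aren't installed: read write_bytes
# from /proc/<pid>/io twice and diff to get an average write rate.
# rate <before_bytes> <after_bytes> <seconds> -> average bytes/s
rate() {
    echo $(( ($2 - $1) / $3 ))
}

# On a live box it would look something like (PID 1234 is a placeholder):
#   b=$(awk '/^write_bytes/ {print $2}' /proc/1234/io)
#   sleep 5
#   a=$(awk '/^write_bytes/ {print $2}' /proc/1234/io)
#   rate "$b" "$a" 5
rate 1048576 11534336 5    # 10 MiB in 5s -> prints 2097152
```

    Loop that over every PID in /proc and sort, and you get a poor man's iotop.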

    Can you guys offer any pointers as to how I can find what is causing this increased I/O activity?

    Thanks! :)


Comments

  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    More iostat output. I'm only sampling every second, so I'm sure I'll miss it going to 100% in real time, but we do have a tool that alerts us when it hits 100%. When the alert came out, my iostat command grabbed this (65% util):
    Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
    sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    sdb               0.00   828.00    0.00 10327.00     0.00 89240.00     8.64    46.30    4.48   0.06  65.80
    scd0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    dm-2              0.00     0.00    0.00 11155.00     0.00 89240.00     8.00   143.26   12.84   0.06  65.80
    dm-3              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    dm-4              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    dm-5              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    dm-6              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    dm-7              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    dm-8              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    dm-9              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    dm-10             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    dm-11             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    dm-12             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    


  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    sdb               0.00   744.00    0.00 6650.00     0.00 59096.00     8.89   136.70   18.08   0.15 100.00
    dm-2              0.00     0.00    0.00 7463.00     0.00 59704.00     8.00   153.08   18.08   0.13 100.00
    sdb               0.00   116.00    0.00  554.00     0.00  5136.00     9.27   147.64  249.87   1.81 100.00
    dm-2              0.00     0.00    0.00  680.00     0.00  5440.00     8.00   174.66  237.63   1.47 100.10
    sdb               0.00   101.00    0.00  556.00     0.00  5456.00     9.81   146.65  280.98   1.80 100.00
    dm-2              0.00     0.00    0.00  657.00     0.00  5256.00     8.00   173.45  283.56   1.52 100.00
    sdb               0.00   227.00    0.00  506.00     0.00  5272.00    10.42   180.75  303.91   1.98 100.10
    dm-2              0.00     0.00    0.00  736.00     0.00  5888.00     8.00   314.23  310.59   1.36 100.00
    sdb               0.00   124.00    0.00  504.00     0.00  4496.00     8.92   150.78  264.77   1.99 100.10
    dm-2              0.00     0.00    0.00  648.00     0.00  5184.00     8.00   186.58  229.06   1.54 100.10
    sdb               0.00    69.00    0.00  504.00     0.00  5120.00    10.16   153.53  326.82   1.98 100.00
    dm-2              0.00     0.00    0.00  531.00     0.00  4248.00     8.00   205.07  445.68   1.88 100.00
    sdb               0.00    42.00    0.00  519.00     0.00  4528.00     8.72   144.24  272.94   1.93 100.10
    dm-2              0.00     0.00    0.00  554.00     0.00  4432.00     8.00   159.14  281.20   1.81 100.10
    sdb               0.00    64.00    0.00  458.00     0.00  4064.00     8.87   145.51  309.05   2.18 100.00
    dm-2              0.00     0.00    0.00  506.00     0.00  4048.00     8.00   157.17  301.00   1.98 100.00
    sdb               0.00    48.00    0.00  405.00     0.00  3504.00     8.65   182.75  357.02   2.47 100.10
    dm-2              0.00     0.00    0.00  567.00     0.00  4536.00     8.00   197.91  273.99   1.77 100.10
    sdb               0.00   971.00    0.00  302.00     0.00  5480.00    18.15   200.53  542.03   3.31 100.00
    dm-2              0.00     0.00    0.00 1149.00     0.00  9192.00     8.00   474.96  325.96   0.87 100.00
    sdb               0.00    13.00    0.00  273.00     0.00  7112.00    26.05   145.04  796.65   3.66 100.00
    dm-2              0.00     0.00    0.00  270.00     0.00  2160.00     8.00   242.51 1416.94   3.70 100.00
    sdb               0.00    25.00    0.00  413.00     0.00  3464.00     8.39   144.67  354.37   2.42 100.00
    dm-2              0.00     0.00    0.00  450.00     0.00  3600.00     8.00   151.72  340.05   2.22 100.00
    sdb               0.00    15.00    0.00  440.00     0.00  3688.00     8.38   144.86  333.67   2.27 100.00
    dm-2              0.00     0.00    0.00  446.00     0.00  3568.00     8.00   151.40  347.17   2.24 100.00
    

    I'm going to install iotop to see what the craic is. More details to follow.


  • Registered Users, Registered Users 2 Posts: 37,485 ✭✭✭✭Khannie


    [-0-] wrote: »
    I'm going to install iotop to see what the craic is. More details to follow.

    That was going to be my suggestion. Should nail it in seconds.


  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    So it appears this is caused by jbd2, the ext4 journaling block device.

    I'm wondering if we could maybe disable journaling? One of my processes may be performing regular indexing, which in turn causes jbd2 to go mental.

    Has anyone encountered this before? Google turns up lots of threads where people are experiencing the same thing.


  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    I should also state that these systems are all ext4.

    The following article is interesting: http://www.phoronix.com/scan.php?page=article&item=ext4_linux35_tuning&num=1

    Disabling journaling is risky business and can lead to a completely corrupted disk if the box has power issues. Not something we want on production systems that need to be up 24/7.


  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    Our apps trigger high write spikes, which in turn cause jbd2 to block I/O.

    Next steps are to increase the I/O queue size, and then consider disabling journaling if that does not work.


  • Registered Users, Registered Users 2 Posts: 37,485 ✭✭✭✭Khannie


    Interested to hear how this plays out. I'd bet money there is something configurable in the kernel to ease your journaling woes. Have you tried optimising the filesystem mount options (noatime, that kind of thing)?


  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    Khannie wrote: »
    Interested to hear how this plays out. I'd bet money there is something configurable in the kernel to ease your journaling woes. Have you tried optimising the filesystem mount options (noatime, that kind of thing)?

    Sure have. :) No joy with that.

    I'm hoping increasing nr_requests to 100000 (it's at 128 now) works.

    Will know shortly. :)


  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    Updated nr_requests to 2048, and disabled journaling.

    Now we wait. :)
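    For anyone following along, the journaling change was the usual tune2fs route. A sketch against a scratch image rather than a real volume (assumes e2fsprogs is installed; on a real filesystem it must be unmounted and fsck-clean first, and obviously never do this to a live production volume):

```shell
#!/bin/sh
# Demonstrate dropping the ext4 journal with tune2fs, on a scratch
# image so nothing real is at risk. Requires e2fsprogs, no root needed.
dd if=/dev/zero of=/tmp/scratch.img bs=1M count=64 2>/dev/null
mke2fs -q -t ext4 -F /tmp/scratch.img
if dumpe2fs -h /tmp/scratch.img 2>/dev/null | grep -q has_journal; then
    echo "journal present"
fi
# Remove the has_journal feature (fs must be clean and unmounted):
tune2fs -O ^has_journal /tmp/scratch.img >/dev/null
if dumpe2fs -h /tmp/scratch.img 2>/dev/null | grep -q has_journal; then
    echo "journal still present"
else
    echo "journal removed"
fi
```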


  • Registered Users, Registered Users 2 Posts: 37,485 ✭✭✭✭Khannie


    Ah, disabling journaling will almost certainly box it off, but as you said earlier, at a cost. Have you battery-backed RAID cards, perchance?


  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    Khannie wrote: »
    Ah, disabling journaling will almost certainly box it off, but as you said earlier, at a cost. Have you battery-backed RAID cards, perchance?

    We do indeed, and they have a backup supply as well.

    I would rather be able to tweak journaling than turn it off. Turning it off now to see what happens is just a way of trying to identify whether there's something going on with the hardware. Maybe the bus is slow, for instance.

    I'll know how we look by 9pm EST tonight. :)


  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    No dice. It's still happening with journaling disabled.

    Digging into the application now to see what is going on. Looks like it does some sort of cleanup at this time, and I'm seeing differences between 64 bit configs & arguments vs 32 bit.

    The fun continues. :)


  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    Hey there,

    Do you know of a tool that will gather OS settings, such as:
    Drive mount options
    Buffer I/O settings
    RAM
    CPU
    Library versions
    Kernel config parameters
    etc.

    Basically, I'm still seeing this issue (although some tuning has made a small improvement), and our vendor is not.

    I want to compare our OS settings to theirs without having to grab all of these settings manually and script it myself. I was wondering if a tool exists that would do this.

    Yes, it is Friday and I am lazy!
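    In the meantime, something quick and dirty like this might do for diffing two boxes (sosreport on RHEL may be close to what I want; the categories below are just the obvious ones, not a complete list):

```shell
#!/bin/sh
# Quick-and-dirty settings snapshot for diffing between two boxes.
# Sketch only: categories are the obvious ones, not exhaustive.
out=os-settings.txt
{
    echo "== kernel ==";     uname -r
    echo "== cpu count ==";  grep -c ^processor /proc/cpuinfo
    echo "== memory ==";     grep MemTotal /proc/meminfo
    echo "== mounts ==";     cat /proc/mounts
    echo "== vm sysctls =="; sysctl vm 2>/dev/null
} > "$out"
echo "wrote $out"
```

    Run it on both systems and diff the two files.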


  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    Using fio... very interesting behaviour.
    64 bit ext3
    read : io=5120.0MB, bw=2203.1MB/s, iops=563993, runt= 2324msec
    read : io=5120.0MB, bw=1570.8MB/s, iops=401938, runt= 3261msec
    write: io=5120.0MB, bw=169289KB/s, iops=42322, runt= 30970msec

    64 bit ext4
    read : io=5120.0MB, bw=162158KB/s, iops=40539, runt= 32332msec
    read : io=5120.0MB, bw=914828KB/s, iops=228707, runt= 5731msec
    write: io=5120.0MB, bw=806473KB/s, iops=201618, runt= 6501msec


  • Registered Users, Registered Users 2 Posts: 8,061 ✭✭✭10-10-20


    Do:
    cat /sys/block/*/queue/scheduler
    
    and report back.


  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    64bit ext4:
    201% cat /sys/block/*/queue/scheduler
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    noop anticipatory deadline [cfq]
    noop anticipatory deadline [cfq]
    noop anticipatory deadline [cfq]
    

    64bit ext3:
    201% cat /sys/block/*/queue/scheduler
    none
    none
    none
    none
    noop anticipatory deadline [cfq]
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    none
    noop anticipatory deadline [cfq]
    noop anticipatory deadline [cfq]
    noop anticipatory deadline [cfq]
    noop anticipatory deadline [cfq]
    noop anticipatory deadline [cfq]
    


  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    For the particular partitions in question (dm-2), there is none.

    For example:
    fmdnwsuat31:fmdadm,207% cat /sys/block/dm-2/queue/scheduler
    none


  • Registered Users, Registered Users 2 Posts: 8,061 ✭✭✭10-10-20


    Try this:
    Change to noop or deadline instead of CFQ. We found that CFQ queues IO for 5 seconds before flushing down to disk, causing latency spikes with fio.

    1. At boot time, the kernel parameter elevator=noop is typically appended to a kernel menu line in the appropriate boot path (such as /boot/grub/menu.lst, /boot/grub/grub.conf)

    2. At runtime, change the scheduler by echoing its name into /sys/block/$devicename/queue/scheduler, where the device name is the basename of the block device (e.g. sda for /dev/sda). Some versions of Linux also apply the scheduler to multipath devices, and these need to be amended as well, for example:

    echo noop > /sys/block/sda/queue/scheduler
    echo noop > /sys/block/dm-0/queue/scheduler

    IBM doc


  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    Lovely stuff, I'll give that a go!


  • Registered Users, Registered Users 2 Posts: 8,061 ✭✭✭10-10-20


    The scheduler in square brackets is the chosen scheduler [cfq].
    What I don't get is why you aren't showing schedulers for all of your dm-x devices, but here we have a clue:
    http://blog.famzah.net/2010/01/25/why-sys-block-dm-0-queue-scheduler-exists-on-my-linux-system/

    A couple of ideas & questions:
    What kernel are you running?
    What does multipath -ll look like?
    Any idea what type of I/O the application generates? fsync? Direct or cached?
    Try forcing the scheduler to noop on the kernel line in grub, run a test, and see what results you obtain. Perhaps leave "iostat -mx 5" running and piped into a logfile for the duration of the test.
    If you run fio with a 4k randrw job and enable logging, then use fio's gnuplot script to make a graph, do you see similar spikes? Make the test file size (filesize) and total I/O size (size) larger than the target device's cache.

    [global]
    filesize=4G
    size=4G
    ....
    runtime=2m
    write_bw_log
    write_lat_log
    write_iops_log

    [spec_4k_randrw]
    rw=randrw
    bs=4k

    Run fio_generate_plots from the directory where fio was run to create the PNGs.


  • Registered Users, Registered Users 2 Posts: 37,485 ✭✭✭✭Khannie


    10-10-20 wrote: »
    We found that CFQ queues IO for 5 seconds before flushing down to disk

    Why in the name of Jesus would it do that? Seems mental. I'm sure there's some reason for it though.


  • Registered Users, Registered Users 2 Posts: 8,061 ✭✭✭10-10-20


    I'm told that CFQ likes to maintain a queue of I/O so that it can merge (re-order) the I/Os before committing them to disk. If you compare the rrqm and wrqm stats for CFQ versus noop, noop doesn't merge as much I/O, because it doesn't hold I/O in a buffer long enough to merge it.
    Apparently CFQ is better for individual physical disks, where it tries to re-order blocks to match the LBA as best it can (like NCQ), but this doesn't always work well where the volume sits on a RAID card and the LBAs aren't sequential. I'm not 100% on that, though.
    Look for articles on cases where noop and deadline outperform CFQ; usually these involve latency-sensitive applications.


  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    10-10-20 wrote: »

    A couple of ideas & questions:
    What kernel are you running?
    What does multipath -ll look like?
    Any idea what type of I/O the application generates? fsync? Direct or cached?
    Try forcing the scheduler to noop on the kernel line in grub, run a test, and see what results you obtain. Perhaps leave "iostat -mx 5" running and piped into a logfile for the duration of the test.
    If you run fio with a 4k randrw job and enable logging, then use fio's gnuplot script to make a graph, do you see similar spikes? Make the test file size (filesize) and total I/O size (size) larger than the target device's cache.

    [global]
    filesize=4G
    size=4G
    ....
    runtime=2m
    write_bw_log
    write_lat_log
    write_iops_log

    [spec_4k_randrw]
    rw=randrw
    bs=4k

    Run fio_generate_plots from the directory where fio was run to create the PNGs.

    1. Using Linux 2.6.32-279.14.1.el6.x86_64 #1 SMP Tue Nov 6 11:18:01 PST 2012 x86_64 x86_64 x86_64 GNU/Linux
    2. Command not found.
    3. No idea regarding the I/O; however, I have asked our vendor, as they wrote the binary that is causing this.
    4. I'll try all four: noop, deadline, anticipatory and cfq. I wish to benchmark them all and go with the best one. :)

    The comments section in the link you provided says the following:
    Things seem even better on my test box with Ubuntu server having kernel "3.2.0-29-generic". Kernel developers cleared up all confusion. The "dm-0" scheduler has only one possible value when used with LVM: "none". Example:

    famzah@VBox:~$ cat /sys/block/dm-0/queue/scheduler
    none

    I'm not sure if that kernel version is an option for us yet. Is what this guy says true? Is it really impossible to change the scheduler for dm devices? I hope that's not the case.


  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    There is an "/etc/multipath.conf" configuration file where local disks not residing on a SAN have to be "blacklisted", to prevent dm-multipath from creating additional primary and secondary paths to them. The multipathing feature is not needed on local disks, because local disk is always RAID0+1, RAID5 or RAID6 to satisfy the high-availability requirements.

    So the first place to look is /etc/multipath.conf, to make sure the local disk is blacklisted.

    I've just completed that. There is no SAN here; multipathing agents are not running and not installed.

    As a result, it's safe to say dm-2 is using CFQ.
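    For reference, the blacklist stanza in /etc/multipath.conf looks something like this (the devnode pattern here is only an example, not our actual config):

```
# /etc/multipath.conf -- example only; the devnode pattern is illustrative
blacklist {
    devnode "^sd[ab]$"
}
```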


  • Registered Users, Registered Users 2 Posts: 8,061 ✭✭✭10-10-20


    So what are the disks which are exhibiting the performance issue attached to? A PCI RAID controller with JBOD?
    Can you play with the cache settings? Have you tried WT, WB and disabled? Is cache-on-disk enabled?
    What was the result when you disabled journaling on ext4?

    In one of your posts on the previous page you mentioned setting an alarm to notify you when %util went to 100%. Ideally that's where you want to be at times of high load, as it shows the path is fully utilised. If you can't hit 100%, I'd be looking at why.

    I can see from iostat that you're doing 4k block writes, so is there a chance that you are spindle-bound?
    15k disks will do ~200 IOPS per disk, so 1577 IOPS works out to roughly 8 physical disks (plus n for parity).
    Are these SATA disks?


  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    10-10-20 wrote: »
    So what are the disks which are exhibiting the performance issue attached to? A PCI RAID controller with JBOD?
    Can you play with the cache settings? Have you tried WT, WB and disabled? Is cache-on-disk enabled?
    What was the result when you disabled journaling on ext4?

    In one of your posts on the previous page you mentioned setting an alarm to notify you when %util went to 100%. Ideally that's where you want to be at times of high load, as it shows the path is fully utilised. If you can't hit 100%, I'd be looking at why.

    I can see from iostat that you're doing 4k block writes, so is there a chance that you are spindle-bound?
    15k disks will do ~200 IOPS per disk, so 1577 IOPS works out to roughly 8 physical disks (plus n for parity).
    Are these SATA disks?

    /dev/sdb is the main culprit, and also dm-2.
    201% dmesg|grep sdb
    sd 0:0:0:1: [sdb] 860051248 512-byte logical blocks: (440 GB/410 GiB)
    sd 0:0:0:1: [sdb] Write Protect is off
    sd 0:0:0:1: [sdb] Mode Sense: 6b 00 00 08
    sd 0:0:0:1: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
     sdb:
     sdb1
    sd 0:0:0:1: [sdb] Attached SCSI disk
    dracut: Scanning devices sda3 sdb1  for LVM logical volumes vglocal/root
    
    

    I can certainly play with the cache settings, although I have not tried any of the settings you have asked about.

    When I disabled journaling, there was no difference. The only difference was that iotop was able to tell me which process was responsible for this behaviour.

    Not quite sure what spindle-bound is, sorry! Hehe.

    Here's some further output from dmesg:
    scsi0 : hpsa
    hpsa 0000:03:00.0: RAID              device c0b3t0l0 added.
    hpsa 0000:03:00.0: Direct-Access     device c0b0t0l0 added.
    hpsa 0000:03:00.0: Direct-Access     device c0b0t0l1 added.
    scsi 0:3:0:0: RAID              HP       P410i            5.70 PQ: 0 ANSI: 5
    scsi 0:0:0:0: Direct-Access     HP       LOGICAL VOLUME   5.70 PQ: 0 ANSI: 5
    scsi 0:0:0:1: Direct-Access     HP       LOGICAL VOLUME   5.70 PQ: 0 ANSI: 5
    ata_piix 0000:00:1f.2: version 2.13
    ata_piix 0000:00:1f.2: PCI INT B -> GSI 17 (level, low) -> IRQ 17
    ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ]
    ata_piix 0000:00:1f.2: setting latency timer to 64
    scsi1 : ata_piix
    scsi2 : ata_piix
    ata1: SATA max UDMA/133 cmd 0x1080 ctl 0x1088 bmdma 0x10a0 irq 17
    ata2: SATA max UDMA/133 cmd 0x1090 ctl 0x1098 bmdma 0x10a8 irq 17
    ata2.00: SATA link down (SStatus 4 SControl 300)
    ata2.01: SATA link down (SStatus 4 SControl 300)
    ata1.00: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
    ata1.01: SATA link down (SStatus 4 SControl 300)
    ata1.01: link offline, clearing class 3 to NONE
    ata1.00: ATAPI: DV-28S-W, C.2D, max UDMA/100
    ata1.00: configured for UDMA/100
    scsi 1:0:0:0: CD-ROM            TEAC     DV-28S-W         C.2D PQ: 0 ANSI: 5
    sd 0:0:0:0: [sda] 585871964 512-byte logical blocks: (299 GB/279 GiB)
    sd 0:0:0:1: [sdb] 860051248 512-byte logical blocks: (440 GB/410 GiB)
    sd 0:0:0:0: [sda] Write Protect is off
    sd 0:0:0:0: [sda] Mode Sense: 6b 00 00 08
    sd 0:0:0:1: [sdb] Write Protect is off
    sd 0:0:0:1: [sdb] Mode Sense: 6b 00 00 08
    sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
    sd 0:0:0:1: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
     sdb:
     sda: sda1 sda2 sda3
     sdb1
    sd 0:0:0:1: [sdb] Attached SCSI disk
    sd 0:0:0:0: [sda] Attached SCSI disk
    sr0: scsi3-mmc drive: 24x/24x cd/rw xa/form2 cdda tray
    Uniform CD-ROM driver Revision: 3.20
    sr 1:0:0:0: Attached scsi CD-ROM sr0
    dracut: Scanning devices sda3 sdb1  for LVM logical volumes vglocal/root
    dracut: inactive '/dev/appsvg/stub' [4.00 MB] inherit
    dracut: inactive '/dev/appsvg/appsfs' [146.48 GB] inherit
    dracut: inactive '/dev/appsvg/apps2fs' [48.83 GB] inherit
    dracut: inactive '/dev/appsvg/nwtextfs' [195.31 GB] inherit
    dracut: inactive '/dev/vglocal/fisc' [6.00 GB] inherit
    dracut: inactive '/dev/vglocal/var' [6.00 GB] inherit
    dracut: inactive '/dev/vglocal/log' [4.00 GB] inherit
    dracut: inactive '/dev/vglocal/tmq' [4.00 GB] inherit
    dracut: inactive '/dev/vglocal/root' [12.00 GB] inherit
    dracut: inactive '/dev/vglocal/perf' [4.00 GB] inherit
    dracut: inactive '/dev/vglocal/home' [2.00 GB] inherit
    dracut: inactive '/dev/vglocal/backup' [2.00 GB] inherit
    dracut: inactive '/dev/vglocal/opt' [2.00 GB] inherit
    EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts:
    dracut: Remounting /dev/mapper/vglocal-root with -o noatime,ro
    EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts:
    dracut: Mounted root filesystem /dev/mapper/vglocal-root
    dracut: Switching root
    readahead-collector: starting
    EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts:
    EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts:
    EXT4-fs (dm-11): mounted filesystem with ordered data mode. Opts:
    EXT4-fs (dm-9): mounted filesystem with ordered data mode. Opts:
    EXT4-fs (dm-8): mounted filesystem with ordered data mode. Opts:
    EXT4-fs (dm-10): mounted filesystem with ordered data mode. Opts:
    EXT4-fs (dm-12): mounted filesystem with ordered data mode. Opts:
    EXT4-fs (dm-6): mounted filesystem with ordered data mode. Opts:
    EXT4-fs (dm-7): mounted filesystem with ordered data mode. Opts:
    kjournald starting.  Commit interval 5 seconds
    EXT3-fs (dm-1): using internal journal
    EXT3-fs (dm-1): mounted filesystem with ordered data mode
    EXT4-fs (dm-2): mounted filesystem with ordered data mode. Opts:
    EXT4-fs (dm-3): mounted filesystem with ordered data mode. Opts:
    EXT4-fs (dm-4): mounted filesystem with ordered data mode. Opts:
    

    I hope that info helps. :)


  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    Do any of these settings stand out as problem areas?
    223% cat scheduler
    noop anticipatory deadline [cfq]
    224% cat read_ahead_kb
    128
    225% cat nr_requests
    128
    226% cat max_sectors_kb
    512
    227% cat max_hw_sectors_kb
    512
    228% cat rq_affinity
    1
    229% cat rotational
    1
    230% cat physical_block_size
    512
    231% cat optimal_io_size
    0
    232% cat nomerges
    0
    233% cat minimum_io_size
    512
    234% cat max_segment_size
    65536
    235% cat max_segments
    543
    236% cat logical_block_size
    512
    237% cat iostats
    1
    238% cat hw_sector_size
    512
    239% cat discard_zeroes_data
    0
    240% cat discard_max_bytes
    0
    241% cat discard_granularity
    0
    242% cat add_random
    1


  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    Regarding the write behaviour, it is fsync.


  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    Making an app change to use fdatasync instead of fsync. More to follow.
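    The difference is visible even from the shell via dd's conv= flags: conv=fsync flushes data plus all metadata (dragging the ext4 journal into every write), while conv=fdatasync flushes the data and only the metadata needed to read it back. A scratch-file illustration (paths are throwaway, nothing to do with our app):

```shell
#!/bin/sh
# Illustrate the two sync flavours with GNU dd.
# conv=fsync     -> fsync(2) before exit: data + all metadata flushed.
# conv=fdatasync -> fdatasync(2): data + only essential metadata.
dd if=/dev/zero of=/tmp/with-fsync.dat bs=4k count=256 conv=fsync 2>/dev/null
dd if=/dev/zero of=/tmp/with-fdatasync.dat bs=4k count=256 conv=fdatasync 2>/dev/null
ls -l /tmp/with-fsync.dat /tmp/with-fdatasync.dat
```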


  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    Fixed.

    I can't post the fix due to NDA ****, but switching fsync to fdatasync helped. It was an application issue in how they handle a certain item. I figured it out when I completely reverse-engineered their binary (because they wouldn't give me parts of their code) and observed some crap that shouldn't be happening on every write.

    Job done.

    Thank you all kindly!

    On the plus side, I learned a hell of a fecking lot!


  • Closed Accounts Posts: 3,981 ✭✭✭[-0-]


    In the meantime, it's late at night, but I think a celebratory whiskey is in order for resolving a bug that's been open for 3-4 months.

    Thank funk!

