
HDD - weird behaviour

  • 27-11-2007 07:50PM
    #1
    Registered Users, Registered Users 2 Posts: 37,485 ✭✭✭✭


    Soo........

    I have this hard drive. It was recently part of a LVM setup, but now I just want to use it normally. Initially I copied all of the data off the LVM to my raid 5 (a few threads below). That went fine. There were some issues with the filesystem, but I suspected that was my own fault for ignoring fsck requests for a check.

    So anyway....I had /terrible/ trouble removing it from the LVM. I eventually compiled LVM support out of my kernel. What troubled me, though, was that there was a /dev/nvidia/<letters_and_stuff> device listed with 183GB (same size as my ~200G drive) that I couldn't explain.

    This is all gone, so I fdisk'd the drive, but couldn't write a filesystem to it....the drive was always "busy". I thought that this might have been some hangover from LVM, so I used dd if=/dev/sda of=/dev/sdb to try and overcome this. Stupidly, I ctrl + c'd this after about 2 hours, because I thought it should have easily copied over all the data from /dev/sda (74G raptor).

    So....now fdisk refuses to talk to it:
    computa ~ # fdisk /dev/sdb
    
    Unable to read /dev/sdb
    

    and dd says the device is full after 512 bytes.
    computa ~ # dd if=/dev/zero of=/dev/sdb bs=512 count=1
    1+0 records in
    1+0 records out
    512 bytes (512 B) copied, 0.000158639 s, 3.2 MB/s
    
    computa ~ # dd if=/dev/zero of=/dev/sdb bs=512 count=2
    dd: writing `/dev/sdb': No space left on device
    2+0 records in
    1+0 records out
    512 bytes (512 B) copied, 0.000236404 s, 2.2 MB/s
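
    That "No space left on device" after a single 512-byte block suggests the kernel is reporting the whole device as 512 bytes (the smartctl "User Capacity" below says the same thing). A quick sketch for checking the kernel's idea of the size -- /dev/sdb is just the suspect drive here, substitute your own, and both commands want root:

```shell
# Size as the kernel sees it (device name illustrative)
cat /sys/block/sdb/size          # in 512-byte sectors
blockdev --getsize64 /dev/sdb    # in bytes; should equal sectors * 512
```

    A healthy ~200G drive should report something like 390721968 sectors, i.e. roughly 200GB, not 1 sector.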
    

    I recently (i.e. last week) ran smartctl on it with absolutely _no_ problems showing....but now it seems f*cked.....
    computa ~ # smartctl -a /dev/sdb
    smartctl version 5.37 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
    Home page is http://smartmontools.sourceforge.net/
    
    === START OF INFORMATION SECTION ===
    Device Model:     External Disk 0
    Serial Number:    0000000__________0_A
    Firmware Version: RGL10364
    User Capacity:    512 bytes
    Device is:        Not in smartctl database [for details use: -P showall]
    ATA Version is:   6
    ATA Standard is:  ATA/ATAPI-6 T13 1410D revision 2
    Local Time is:    Tue Nov 27 18:47:42 2007 GMT
    SMART support is: Available - device has SMART capability.
    SMART support is: Disabled
    
    Warning! SMART Attribute Data Structure error: invalid SMART checksum.
    Warning! SMART Attribute Thresholds Structure error: invalid SMART checksum.
    

    .... this continues on in a similarly negative fashion. I'm hoping it's just a configuration issue.

    Anyone got any suggestions?


Comments

  • Registered Users, Registered Users 2 Posts: 37,485 ✭✭✭✭Khannie


    update:

    I switched the sata port that the drive was connected to......fdisk worked and smartctl says the drive is healthy, but I couldn't make a file system. Now the box has been booted fresh this morning and I can't mount either of the drives (says they're busy, but it's not the mount point for sure). I mounted /dev/sdd* no problem with the suse livecd yesterday.

    parted has a /dev/sdb listed, but I have only 3 hard drives on the box:
    (parted) print all                                                        
    Model: ATA WDC WD740GD-50FL (scsi)
    Disk /dev/sda: 74.4GB
    Sector size (logical/physical): 512B/512B
    Partition Table: msdos
    
    Number  Start   End     Size    Type      File system  Flags
     1      32.3kB  107MB   107MB   primary   ext3         boot 
     2      107MB   72.3GB  72.1GB  primary   ext3              
     3      72.3GB  74.3GB  2097MB  extended                    
     5      72.3GB  74.3GB  2097MB  logical   linux-swap        
    
    
    Error: /dev/sdb: unrecognised disk label                                  
    
    Model: ATA SAMSUNG SP2004C (scsi)
    Disk /dev/sdc: 200GB
    Sector size (logical/physical): 512B/512B
    Partition Table: msdos
    
    Number  Start   End     Size    Type     File system  Flags
     1      32.3kB  2056MB  2056MB  primary  ext2         boot 
     2      2056MB  4113MB  2056MB  primary  linux-swap        
     3      4113MB  200GB   196GB   primary  xfs               
    
    
    Model: ATA SAMSUNG SP2004C (scsi)
    Disk /dev/sdd: 200GB
    Sector size (logical/physical): 512B/512B
    Partition Table: msdos
    
    Number  Start   End     Size    Type     File system  Flags
     1      32.3kB  2056MB  2056MB  primary  ext3         boot 
     2      2056MB  4113MB  2056MB  primary                    
     3      4113MB  200GB   196GB   primary                    
    
    
    Error: /dev/md0: unrecognised disk label                                  
    
    (parted)
    

    edit: /dev/md0 is for my raid 5 array, but it's physically disconnected atm. It's assembled by UUID anyway.


  • Registered Users, Registered Users 2 Posts: 37,485 ✭✭✭✭Khannie


    Alright lads....these drives mount without any issues in a live cd environment. This is doing my f*cking noodle in.

    Somebody has to know something.


  • Closed Accounts Posts: 669 ✭✭✭pid()


    OK - at least you know they are healthy seeing as the live cd can mount them. It could be a lot worse. :)

    Can you post exactly how you are trying to mount these drives (the mount command with flags), and the output you get?


  • Registered Users, Registered Users 2 Posts: 2,775 ✭✭✭niallb


    What distribution and kernel version are you using? Which LVM version?
    Can you go back to the kernel that had LVM support in it?
    What do 'lvscan' or 'pvdisplay' say?
    Are you certain the new kernel can't run those commands?
    Are there any entries in /etc/fstab with UUID identifiers?

    fdisk it under the livecd and put some new partitions on it,
    or even start by clearing the partition table (like dd should have done...)

    Seems pretty certain at least it's just configuration.
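
    For an msdos label, zeroing just the first sector kills the partition table; a sketch on a file-backed image so nothing real is at risk (point the same dd at /dev/sdX to do it for real, very carefully):

```shell
truncate -s 8M disk.img                            # fake "disk", no real device at risk
printf 'old partition table junk' | dd of=disk.img conv=notrunc 2>/dev/null
# Zero the MBR -- sector 0 holds the msdos partition table
dd if=/dev/zero of=disk.img bs=512 count=1 conv=notrunc 2>/dev/null
cmp -n 512 disk.img /dev/zero && echo "first sector wiped"
```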


  • Registered Users, Registered Users 2 Posts: 37,485 ✭✭✭✭Khannie


    pid() wrote: »
    OK - at least you know they are healthy seeing as the live cd can mount them. It could be a lot worse. :)

    Definitely. :)
    pid() wrote: »
    Can you post exactly how you are trying to mount these drives (the mount command with flags), and the output you get?

    They're currently disconnected. I only have room for 5 connected drives and I have the raid 5 and normal boot drive on atm, but it was just a standard mount command:

    mount /dev/sdXn /mountpoint

    error was like: device or mountpoint busy
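
    A few things worth checking when mount claims the device is busy -- a sketch, device names illustrative, and dmsetup/fuser want root:

```shell
grep sdb /proc/mounts     # is it already mounted somewhere?
cat /proc/mdstat          # has an md array claimed it?
dmsetup ls                # any leftover LVM/device-mapper mappings?
fuser -vm /dev/sdb1       # which processes have it open?
```

    Given the LVM history, a stale device-mapper mapping holding the disk open would fit the symptoms.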

    What bugged me was the existence of /dev/sdb. No idea why it's there.
    niallb wrote:
    What distribution and kernel version are you using? Which LVM version?

    Kernel is 2.6.22. Distro is gentoo.

    I had 2.02.10 installed.

    niallb wrote:
    Can you go back to the kernel that had LVM support in it?

    yeah, I have a 2.6.18 kernel that I used for donkeys' years that has lvm support in it.
    niallb wrote:
    What do 'lvscan' or 'pvdisplay' say?

    I can't run them....When I try to install the lvm2 utilities again, I get a dependency failure....this could be a nice clue.....

    computa ~ # emerge --pretend lvm2

    These are the packages that would be merged, in order:

    Calculating dependencies... done!
    [ebuild U ] sys-fs/device-mapper-1.02.22-r5 [1.02.10-r1]
    [ebuild N ] sys-fs/lvm2-2.02.28-r2 USE="readline -clvm -cman -gulm -nolvm1 -nolvmstatic (-selinux)"
    [blocks B ] <sys-fs/udev-115-r1 (is blocking sys-fs/device-mapper-1.02.22-r5)

    niallb wrote:
    Are you certain the new kernel can't run those commands?

    Yeah, I've removed the lvm2 utilities and removed the "dolvm" kernel parameter from the grub kernel params line (that bit is probably specific to gentoo).
    niallb wrote:
    Are there any entries in /etc/fstab with UUID identifiers?

    Nah, I use /dev for fstab, though I probably should get my finger out and migrate.

    fstab has:
    /dev/sda1               /boot                   ext3            noauto,noatime  1 2
    /dev/sda2               /                       ext3            noatime         0 1
    /dev/sda5               none                    swap            sw              0 0
    /dev/dvd                /mnt/dvd                iso9660         noauto,ro       0 0
    /dev/dvdrw              /mnt/dvdrw              iso9660         noauto,ro       0 0
    /dev/md0                /home/paul/media        xfs             noatime         1 2
    
    
    # NOTE: The next line is critical for boot!
    proc                    /proc           proc            defaults        0 0
    
    # glibc 2.2 and above expects tmpfs to be mounted at /dev/shm for 
    # POSIX shared memory (shm_open, shm_unlink).
    # (tmpfs is a dynamically expandable/shrinkable ramdisk, and will
    #  use almost no memory if not populated with files)
    shm                     /dev/shm        tmpfs           nodev,nosuid,noexec     0 0
    

    edit: Currently removing device mapper and updating my udev version.
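
    For the record, migrating fstab off /dev paths is just a matter of swapping the device path for the UUID that blkid /dev/sda2 (etc.) reports -- the UUID below is made up for illustration:

```
# in /etc/fstab -- UUID is made up; get the real one from blkid
UUID=2f1c8a9e-1b2c-4d5e-8f90-123456789abc  /  ext3  noatime  0 1
```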


  • Registered Users, Registered Users 2 Posts: 37,485 ✭✭✭✭Khannie


    Ok....I booted into the old kernel (but with the new version of udev) and can mount both drives. So I set them up with some raid 1 goodness and am rsyncing as I type.

    That poxy /dev/sdb is still there. I googled around a bit on this and I'm not the first to come across it. I'm gonna get to the bottom of it tbh.


  • Registered Users, Registered Users 2 Posts: 37,485 ✭✭✭✭Khannie


    Well....there were some errors in the boot log about /dev/sdb, so I just rm'd it. :eek: Actually worked out just dandy.

    All is well now in the old kernel (2.6.18, built by hand) with these two drives, but still can't mount them in the new one (2.6.22, built with genkernel).


  • Registered Users, Registered Users 2 Posts: 37,485 ✭✭✭✭Khannie


    Well....I ended up just putting in the five drives (2 in raid 1 and 3 in raid 5) and deciding that I'd fix any problems from a livecd environment. When I tried to boot from the raid 1 it was falling on its arse, but that was my fault (hands up): I had ext3 specified in fstab (old drive) but the new array is xfs. This wasn't the cause of the original trouble or anything.

    So....long story short, I've no idea what caused the original problem but I have a working distro (at least in the 2.6.18 kernel....just about to reboot to check out the 2.6.22 one). edit: 2.6.22 failed. Not arsed looking into it atm. Working distro = happy Khannie. For now.

    For what it's worth: Raid 1 seems no better for read performance than a single drive. I expected the striping (for reads) to speed things up considerably. Raid 5 still performs stellarly.


  • Registered Users, Registered Users 2 Posts: 1,065 ✭✭✭Snowbat


    RAID 1 is mirroring. Did you want RAID 0?


  • Registered Users, Registered Users 2 Posts: 37,485 ✭✭✭✭Khannie


    No. Definitely not. I'm too old to go with raid 0. ;)

    The mirror acts as a stripe for read purposes.


  • Registered Users, Registered Users 2 Posts: 1,065 ✭✭✭Snowbat


    So it does - sorry, my bad. Here's one to chew on: these guys found RAID-1 reads slower than single disk reads for one benchmark process but faster than both single disk and RAID-0 as they added benchmark processes: http://catux.org/servidor/article-raid-en-angl-s-3.html


  • Registered Users, Registered Users 2 Posts: 37,485 ✭✭✭✭Khannie


    No worries at all. That page is taking a while to come up, but I'll defo give it a read.

    The system /feels/ slower than when I had the raptor, but it's hard to quantify. The redundancy is nice though (and cheap of course....2 x 200G drives = worth nothing).

    edit: One thing I had read is that raid 1 is good for seek / access times, as the kernel will use whichever drive's head is closest to the data you want.


  • Registered Users, Registered Users 2 Posts: 37,485 ✭✭✭✭Khannie


    So....tbh, it doesn't really look like raid 1 does stripe reading. Read speeds are the same as for a single drive....but....You can read two files at once from the same filesystem without affecting performance.

    i.e. 1 file takes 10 seconds to read. 2 files (same size) at the same time take 10 seconds to read, effectively doubling throughput. I have to say though, it's fairly rare for a home box to have two processes reading heavily from the hdd at once. Would be nice for a database or similar though.
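
    That observation is easy to reproduce with two big files and time -- a sketch, where big1/big2 are illustrative names for files large enough to take ~10s each (and you'd want cold caches between runs for a fair comparison):

```shell
time cat big1 > /dev/null                                        # one reader
time sh -c 'cat big1 > /dev/null & cat big2 > /dev/null & wait'  # two readers at once
# On raid 1 each mirror can serve one reader, so the second run should take
# roughly as long as the first instead of twice as long.
```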

