List:       openbsd-misc
Subject:    RAID management support coming in OpenBSD 3.8
From:       Theo de Raadt <deraadt@cvs.openbsd.org>
Date:       2005-09-09 21:18:58
Message-ID: 200509092118.j89LIw0O003389@cvs.openbsd.org

I thought it was time to give some details about the (minimal) RAID
management stuff coming in OpenBSD 3.8.  Most of this code has been
written by Marco Peereboom with some help from David Gwynne and
Michael Shalayeff.  Moral support and direction from me and Bob Beck
who has a pile of these AMI setups.

Here is a demonstration.  First, a piece of dmesg output, so that we can
see which device is going to be handled:

ami0 at pci1 dev 8 function 0 "Symbios Logic MegaRAID" rev 0x01: apic 9 int 8 (irq 10) Dell 518/64b/lhc
ami0: FW 350O, BIOS v1.09, 128MB RAM
ami0: 2 channels, 0 FC loops, 2 logical drives
scsibus2 at ami0: 40 targets
sd0 at scsibus2 targ 0 lun 0: <AMI, Host drive #00, > SCSI2 0/direct fixed
sd0: 349400MB, 44542 cyl, 255 head, 63 sec, 512 bytes/sec, 715571200 sec total
sd1 at scsibus2 targ 1 lun 0: <AMI, Host drive #01, > SCSI2 0/direct fixed
sd1: 349400MB, 44542 cyl, 255 head, 63 sec, 512 bytes/sec, 715571200 sec total
scsibus3 at ami0: 16 targets
ses0 at scsibus3 targ 6 lun 0: <DELL, PV22XS, E.17> SCSI3 3/processor fixed
scsibus4 at ami0: 16 targets
ses1 at scsibus4 targ 6 lun 0: <DELL, PV22XS, E.17> SCSI3 3/processor fixed

OK, this is an AMI RAID controller.  It has come up with three SCSI
busses: one for the virtual RAID volumes (there are two of them), and
two which match the real SCSI busses on the controller (to expose the
SES or SAFTE enclosure management controllers, and so that we can talk
pass-through to the real disks).

If we wish to probe further details, we use

# bioctl ami0
Volume  Status     Size           Device  
 ami0 0 Online       366372454400 sd0     RAID5
      0 Online        73403465728 0:0.0   ses0   <MAXTOR  ATLAS15K2_73SCA JNZ6>
      1 Online        73403465728 0:2.0   ses0   <MAXTOR  ATLAS15K2_73SCA JNZ6>
      2 Online        73403465728 0:4.0   ses0   <MAXTOR  ATLAS15K2_73SCA JNZ6>
      3 Online        73403465728 0:8.0   ses0   <MAXTOR  ATLAS15K2_73SCA JNZ6>
      4 Online        73403465728 1:10.0  ses1   <MAXTOR  ATLAS15K2_73SCA JNZ6>
      5 Online        73403465728 1:12.0  ses1   <MAXTOR  ATLAS15K2_73SCA JNZ6>
 ami0 1 Online       366372454400 sd1     RAID5
      0 Online        73403465728 0:1.0   ses0   <MAXTOR  ATLAS15K2_73SCA JNZ6>
      1 Online        73403465728 0:3.0   ses0   <MAXTOR  ATLAS15K2_73SCA JNZ6>
      2 Online        73403465728 0:5.0   ses0   <MAXTOR  ATLAS15K2_73SCA JNZ6>
      3 Online        73403465728 1:9.0   ses1   <MAXTOR  ATLAS15K2_73SCA JNZ6>
      4 Online        73403465728 1:11.0  ses1   <MAXTOR  ATLAS15K2_73SCA JNZ6>
      5 Online        73403465728 1:13.0  ses1   <MAXTOR  ATLAS15K2_73SCA JNZ6>
 ami0 2 Unused        73403465728 1:14.0  ses1   <MAXTOR  ATLAS15K2_73SCA JNZ6>
 ami0 3 Hot spare     73403465728 1:15.0  ses1   <MAXTOR  ATLAS15K2_73SCA JNZ6>

Here we can see which physical drives are on the controller, and how
they are configured into volumes.  Two volumes have been created, both
of which are rather large.  The drives are on two SCSI busses; for
instance, 1:12.0 means SCSI bus 1, target 12, LUN 0.  With additional
options to bioctl(8), we could find out some more (mostly irrelevant)
information.
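
A quick way to keep an eye on this (from a cron job, say) is to filter
the Online lines out of that output, so anything else stands out:

# bioctl ami0 | grep -v Online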

There are two additional devices which we know about: one is unused
(i.e. not registered with the AMI firmware at the moment), and one is
a Hot Spare.

Let's cause some havoc.  First, I want to pick a drive that I am going
to unplug, to mimic a failure.  Let's see... 1:9.0 looks good to me.

# bioctl -b 1:9 ami0

When I look at the array, one of the drives is now blinking.  I made
it blink because I prefer to pull drives out of my sd1 filesystems
rather than the sd0 filesystems, and because otherwise I wouldn't be
able to show off the blink support.  Anyway, I pull that particular
drive.
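
(Turning the LED back off is the matching unblink option in bioctl,
"bioctl -u 1:9 ami0"; no need for that here, since the drive is coming
out anyway.)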

Immediately some churning starts, and if I re-run bioctl I can see what
has happened:

# bioctl ami0
Volume  Status     Size           Device  
 ami0 0 Online       366372454400 sd0     RAID5
      0 Online        73403465728 0:0.0   ses0   <MAXTOR  ATLAS15K2_73SCA JNZ6>
      1 Online        73403465728 0:2.0   ses0   <MAXTOR  ATLAS15K2_73SCA JNZ6>
      2 Online        73403465728 0:4.0   ses0   <MAXTOR  ATLAS15K2_73SCA JNZ6>
      3 Online        73403465728 0:8.0   ses0   <MAXTOR  ATLAS15K2_73SCA JNZ6>
      4 Online        73403465728 1:10.0  ses1   <MAXTOR  ATLAS15K2_73SCA JNZ6>
      5 Online        73403465728 1:12.0  ses1   <MAXTOR  ATLAS15K2_73SCA JNZ6>
 ami0 1 Degraded     366372454400 sd1     RAID5
      0 Online        73403465728 0:1.0   ses0   <MAXTOR  ATLAS15K2_73SCA JNZ6>
      1 Online        73403465728 0:3.0   ses0   <MAXTOR  ATLAS15K2_73SCA JNZ6>
      2 Online        73403465728 0:5.0   ses0   <MAXTOR  ATLAS15K2_73SCA JNZ6>
      3 Rebuild       73403465728 1:15.0  ses1   <MAXTOR  ATLAS15K2_73SCA JNZ6>
      4 Online        73403465728 1:11.0  ses1   <MAXTOR  ATLAS15K2_73SCA JNZ6>
      5 Online        73403465728 1:13.0  ses1   <MAXTOR  ATLAS15K2_73SCA JNZ6>
 ami0 2 Unused        73403465728 1:14.0  ses1   <MAXTOR  ATLAS15K2_73SCA JNZ6>

Drive 1:15 automatically became a part of the "sd1" volume, and is
currently rebuilding.  If I access a filesystem on sd1, I will notice
that it is a little bit slower.
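
If I want to watch the rebuild crawl along, a trivial shell loop over
bioctl does the job:

# while sleep 60; do bioctl ami0 | grep Rebuild; done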

Of course the RAID array is beeping so loudly I think my ears are going to
burst, so I must shut it up:

# bioctl -a quiet ami0

When I reinsert the drive that I previously unplugged, I see:

# bioctl ami0 
Volume  Status     Size           Device  
 ami0 0 Online       366372454400 sd0     RAID5
      0 Online        73403465728 0:0.0   ses0   <MAXTOR  ATLAS15K2_73SCA JNZ6>
      1 Online        73403465728 0:2.0   ses0   <MAXTOR  ATLAS15K2_73SCA JNZ6>
      2 Online        73403465728 0:4.0   ses0   <MAXTOR  ATLAS15K2_73SCA JNZ6>
      3 Online        73403465728 0:8.0   ses0   <MAXTOR  ATLAS15K2_73SCA JNZ6>
      4 Online        73403465728 1:10.0  ses1   <MAXTOR  ATLAS15K2_73SCA JNZ6>
      5 Online        73403465728 1:12.0  ses1   <MAXTOR  ATLAS15K2_73SCA JNZ6>
 ami0 1 Degraded     366372454400 sd1     RAID5
      0 Online        73403465728 0:1.0   ses0   <MAXTOR  ATLAS15K2_73SCA JNZ6>
      1 Online        73403465728 0:3.0   ses0   <MAXTOR  ATLAS15K2_73SCA JNZ6>
      2 Online        73403465728 0:5.0   ses0   <MAXTOR  ATLAS15K2_73SCA JNZ6>
      3 Rebuild       73403465728 1:15.0  ses1   <MAXTOR  ATLAS15K2_73SCA JNZ6>
      4 Online        73403465728 1:11.0  ses1   <MAXTOR  ATLAS15K2_73SCA JNZ6>
      5 Online        73403465728 1:13.0  ses1   <MAXTOR  ATLAS15K2_73SCA JNZ6>
 ami0 2 Unused        73403465728 1:9.0   ses1   <MAXTOR  ATLAS15K2_73SCA JNZ6>
 ami0 3 Unused        73403465728 1:14.0  ses1   <MAXTOR  ATLAS15K2_73SCA JNZ6>

Drive 1:9 has come back as "Unused".  Let's make it a Hot Spare, so that I can
use it later.

# bioctl -H 1:9 ami0
# bioctl ami0        
Volume  Status     Size           Device  
 ami0 0 Online       366372454400 sd0     RAID5
      0 Online        73403465728 0:0.0   ses0   <MAXTOR  ATLAS15K2_73SCA JNZ6>
      1 Online        73403465728 0:2.0   ses0   <MAXTOR  ATLAS15K2_73SCA JNZ6>
      2 Online        73403465728 0:4.0   ses0   <MAXTOR  ATLAS15K2_73SCA JNZ6>
      3 Online        73403465728 0:8.0   ses0   <MAXTOR  ATLAS15K2_73SCA JNZ6>
      4 Online        73403465728 1:10.0  ses1   <MAXTOR  ATLAS15K2_73SCA JNZ6>
      5 Online        73403465728 1:12.0  ses1   <MAXTOR  ATLAS15K2_73SCA JNZ6>
 ami0 1 Degraded     366372454400 sd1     RAID5
      0 Online        73403465728 0:1.0   ses0   <MAXTOR  ATLAS15K2_73SCA JNZ6>
      1 Online        73403465728 0:3.0   ses0   <MAXTOR  ATLAS15K2_73SCA JNZ6>
      2 Online        73403465728 0:5.0   ses0   <MAXTOR  ATLAS15K2_73SCA JNZ6>
      3 Rebuild       73403465728 1:15.0  ses1   <MAXTOR  ATLAS15K2_73SCA JNZ6>
      4 Online        73403465728 1:11.0  ses1   <MAXTOR  ATLAS15K2_73SCA JNZ6>
      5 Online        73403465728 1:13.0  ses1   <MAXTOR  ATLAS15K2_73SCA JNZ6>
 ami0 2 Hot spare     73403465728 1:9.0   ses1   <MAXTOR  ATLAS15K2_73SCA JNZ6>
 ami0 3 Unused        73403465728 1:14.0  ses1   <MAXTOR  ATLAS15K2_73SCA JNZ6>

Now if I get another failure, there is again a drive to fail over to.

Earlier we mentioned the SES and SAFTE enclosure monitors.  Their
statistics are available as well:

# sysctl hw.sensors
hw.sensors.0=ses0, psu0, OK, indicator, On
hw.sensors.1=ses0, psu1, OK, indicator, On
hw.sensors.2=ses0, fan0, OK, percent, 33.33%
hw.sensors.3=ses0, fan1, OK, percent, 33.33%
hw.sensors.4=ses0, fan2, OK, percent, 33.33%
hw.sensors.5=ses0, fan3, OK, percent, 33.33%
hw.sensors.6=ses0, temp0, OK, temp, 26.00 degC / 78.80 degF
hw.sensors.7=ses0, temp1, OK, temp, 25.00 degC / 77.00 degF
hw.sensors.8=ses0, temp2, OK, temp, 27.00 degC / 80.60 degF
hw.sensors.9=ses0, temp3, OK, temp, 28.00 degC / 82.40 degF
hw.sensors.10=ses1, psu0, OK, indicator, On
hw.sensors.11=ses1, psu1, OK, indicator, On
hw.sensors.12=ses1, fan0, OK, percent, 33.33%
hw.sensors.13=ses1, fan1, OK, percent, 33.33%
hw.sensors.14=ses1, fan2, OK, percent, 33.33%
hw.sensors.15=ses1, fan3, OK, percent, 33.33%
hw.sensors.16=ses1, temp0, OK, temp, 26.00 degC / 78.80 degF
hw.sensors.17=ses1, temp1, OK, temp, 25.00 degC / 77.00 degF
hw.sensors.18=ses1, temp2, OK, temp, 27.00 degC / 80.60 degF
hw.sensors.19=ses1, temp3, OK, temp, 28.00 degC / 82.40 degF

We can use sensorsd(8) to watch these status indicators for problems.
When this code was first written, I used to toggle one of the RAID
enclosure power switches for kicks, just so that I could see the
values change.
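
A cheap way to check these by hand, or from cron, is to filter the
sysctl output for anything that is not reporting OK:

# sysctl hw.sensors | grep -v OK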

I would like to make it clear that for 3.8, this support will only
work for the ami(4) RAID controllers.  Hopefully other people will
step up and help us make controllers from other vendors work too.
About half of the code is a framework that permits RAID controller
drivers to do the right thing.

The amount of code to support this is very small compared to typical
vendor RAID management solutions.  The functionality supplied is also
very basic, almost minimal.  But this is on purpose: we believe that
we can support this functionality on all RAID controllers in the same
way, without special "but that controller is so different" mindsets
entering the picture.  RAID management should (and can) be no more
complicated than using ifconfig to manage network interfaces.  The
typical administrator needs:

	to know when something is wrong,
	automatic Hot Spare allocation when a volume degrades,
	to blink and unblink drives (to find them),
	to upgrade newly inserted drives to Hot Spare status,
	to shut off the damn beeper.
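
For reference, those few operations map onto bioctl commands roughly
like this (the -u unblink option is described in bioctl(8); the rest
appeared above):

	# bioctl ami0			see what is going on
	# bioctl -b 1:9 ami0		blink a drive so you can find it
	# bioctl -u 1:9 ami0		stop it blinking
	# bioctl -H 1:9 ami0		turn an Unused drive into a Hot Spare
	# bioctl -a quiet ami0		shut off the beeper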

Everything else is just icing.  These are the micro operations which
really matter.  For all other operations on the volumes, it is fine to
reboot into the card BIOS.

At this point in this mail, I would love to show the output of the
RAID array back in normal status, but it will take a couple of hours
for that volume to be rebuilt.

If anyone is serious about attempting to write the back-end code for
another RAID driver already in our tree, please contact
marco@openbsd.org.  But don't bother him with other stuff...
