
md: document how to use software RAID with CoreOS

Open philips opened this issue 11 years ago • 49 comments

People may want to run CoreOS on software RAID. Can we do this today? Can we help with the coreos-install script?

/cc @marineam

philips avatar Jun 04 '14 17:06 philips

No, I would rather not do RAID in the current installer script. If we want to start working on a full-featured Linux installer, we should do that in a language that isn't shell.

marineam avatar Jun 04 '14 17:06 marineam

What about software raid outside of the installation script?

robszumski avatar Jun 04 '14 18:06 robszumski

At least it would be useful to document what needs to be done, if anything, besides building the raid and running the installer against it. I tried this but couldn't get my machine to boot.

jsierles avatar Jun 04 '14 18:06 jsierles

@robszumski software raid is what we are talking about.

@jsierles sounds like we have bugs to fix because my intent is to make that work.

marineam avatar Jun 04 '14 18:06 marineam

Any news on the software raid documentation?

Would be rather useful

nekinie avatar Aug 09 '14 14:08 nekinie

Also would see this as very useful functionality.

ghost avatar Aug 10 '14 13:08 ghost

Yes, it would be great :) @philips I saw this commit. But yeah... can anybody tell me where to start if I want software RAID? emerge mdadm?

pierreozoux avatar Aug 13 '14 18:08 pierreozoux

@pierreozoux mdadm is included in the base images, but we haven't played with it at all. Setting up non-root RAID volumes should work just the same as on any other distro: same ol' mdadm command for creating and assembling volumes. You may need to enable mdadm.service if you want to assemble volumes on boot via /etc/mdadm.conf, as opposed to using the raid-autodetect partition type and letting the kernel do it. It might be possible to move the root filesystem as long as the raid-autodetect partition type is used, but for that you are almost certainly better off using multi-device support in btrfs.
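For example, a minimal sketch of a non-root RAID 1 data volume along those lines (the device names /dev/sdb and /dev/sdc and the ext4 choice are assumptions for illustration):

# Create a RAID 1 array from two spare disks.
sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc

# Record the array so mdadm.service can assemble it on boot.
sudo mdadm --detail --scan | sudo tee -a /etc/mdadm.conf
sudo systemctl enable mdadm.service

# From here it behaves like any other block device.
sudo mkfs.ext4 /dev/md0
sudo mount /dev/md0 /mnt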

marineam avatar Aug 13 '14 20:08 marineam

What certainly won't work right now is installing all of CoreOS on top of software RAID: the update and boot processes both assume the ESP and /usr partitions are plain disk partitions.

marineam avatar Aug 13 '14 20:08 marineam

@marineam Would this constraint of CoreOS also apply to btrfs-raids?

brejoc avatar Sep 02 '14 09:09 brejoc

@brejoc multi-device btrfs for the root filesystem should work

marineam avatar Sep 02 '14 23:09 marineam

What about migrating after install? E.g., migrating to RAID 1 from an installed /dev/sda (one partition, sda1, for demonstration) should look something like this from a rescue CD or similar:

# Copy the partition table from sda to sdb and mark sdb1 as Linux raid autodetect.
sfdisk -d /dev/sda | sfdisk /dev/sdb
sfdisk --id /dev/sdb 1 fd

# Clear stale metadata and create a degraded RAID 1 using sdb1 only for now.
mdadm --zero-superblock /dev/sdb1
mdadm --create /dev/md0 --level=1 --raid-devices=2 missing /dev/sdb1
mkfs.btrfs /dev/md0

# Copy the existing root filesystem onto the array.
mkdir /mnt/source; mount /dev/sda1 /mnt/source
mkdir /mnt/target; mount /dev/md0 /mnt/target
cp -a /mnt/source/* /mnt/target

Thereafter, the disk mount configuration and the kernel root device in the bootloader need to be changed, and the bootloader needs to be installed to both disks.

Modify /mnt/target/etc/fstab to replace /dev/sda1 with /dev/md0 - but this file is non-existent on CoreOS. The bootloader since 435 seems to be GRUB, which helps, but I cannot find a GRUB binary, only config in /usr/boot.

Thoughts?

warwickchapman avatar Oct 20 '14 20:10 warwickchapman

@warwickchapman just in case you finished your exploration into this topic and came up with a complete solution - or if someone else has - I'd appreciate it if you shared it. I know too little about setting up and messing with RAID / mounts / boot to complete this myself. It's not a hard requirement for my use case, but having RAID would help me use both/all disks in a system. I understand it's also possible to set up a distributed file system like Ceph and let it manage the disks without RAID, and that would work for the use cases I have in mind, but for now I'm happy about any additional complexity I can avoid!

seeekr avatar Dec 11 '14 02:12 seeekr

As noted on IRC, for btrfs if raid0 or raid1 is all you need then it is easiest to just add devices to btrfs and rebalance: https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices
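As a minimal sketch of that approach (the device name /dev/sdb and the raid1 conversion are illustrative assumptions):

# Add a new device to the mounted ROOT btrfs filesystem.
sudo btrfs device add /dev/sdb /

# Rebalance existing data and metadata across all devices,
# here converting both to the raid1 profile.
sudo btrfs balance start -dconvert=raid1 -mconvert=raid1 /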

As for md raid: if the partition types are the raid-autodetect type, then the raid volume will be assembled automatically. But you can only put the ROOT filesystem on raid; we don't currently support putting the other partitions on anything other than plain disk devices.
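A quick way to check that the kernel assembled an array at boot (a sketch; the member device shown is an assumption):

# Kernel-assembled arrays show up here after boot.
cat /proc/mdstat

# Print a member's MBR partition type; "fd" is Linux raid autodetect.
sudo sfdisk --id /dev/sdb 1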

marineam avatar Dec 11 '14 03:12 marineam

@marineam Perfect -- again, thanks for the pointer, that was all I needed! Here's a gist with instructions, a script, and a helper file, plus some reference links to help people get this done quickly and easily :) I've verified that my instance reboots just fine, but I haven't checked beyond that whether I might have messed something up, which could easily be the case given that I'm not experienced at messing with the Linux file system!

https://gist.github.com/seeekr/1afa1e5ce3ad6e998367

seeekr avatar Dec 11 '14 03:12 seeekr

Thanks, very interesting - I left it at the point I got to and have stuck with OpenVZ for now. Will start testing again.

warwickchapman avatar Dec 11 '14 05:12 warwickchapman

Forgive my ignorance - does it mean that if I add drives using your script from the gist, I don't need to put any mount units in my cloud-config? Right now I'm testing it on a VirtualBox installation, and it looks like btrfs can see all drives (sudo btrfs fi show) after restart, with no mount units.

agend07 avatar Dec 23 '14 21:12 agend07

@agend07 when adding devices to a btrfs filesystem they become a required part of that filesystem so all of them need to be available in order to mount the filesystem in the first place. The discovery of the devices happens automatically so there isn't any extra configuration.
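A sketch of how to check this; discovery is normally handled by udev, but a manual scan can be triggered too:

# Re-scan block devices for btrfs members (normally done by udev).
sudo btrfs device scan

# Show which devices belong to each btrfs filesystem.
sudo btrfs fi show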

marineam avatar Dec 23 '14 21:12 marineam

@agend07 I am not that knowledgeable about btrfs (and CoreOS) myself, but as far as I can tell no other changes are necessary, i.e. no additional mount points, and things just keep working after a restart. From the btrfs docs I also get the matching impression that btrfs is a "self-managing" system for lack of a better term.

seeekr avatar Dec 23 '14 21:12 seeekr

All clear now - thanks. I was just afraid that even though it works after a restart, it could stop working after a system upgrade without something special in cloud-config. Now I can sleep better.

agend07 avatar Dec 23 '14 21:12 agend07

I believe the docs are a little misleading on this topic:

https://coreos.com/docs/cluster-management/debugging/btrfs-troubleshooting/#adding-a-new-physical-disk links to https://coreos.com/docs/cluster-management/setup/mounting-storage/, which makes it look like a mount unit in cloud-config is the only way.

I'd probably never have gotten it working without finding this issue.

agend07 avatar Dec 23 '14 22:12 agend07

@agend07 ah, yes, that is misleading: you would want either to mount the device(s) as an independent volume or to add them to the ROOT volume, not both. Referencing that ephemeral storage documentation in the context of adding devices to ROOT is also bad. You do NOT want to add ephemeral devices to the persistent ROOT, because the persistent volume will become unusable as soon as the ephemeral devices are lost.

@robszumski ^^

marineam avatar Dec 23 '14 22:12 marineam

@agend07 I'm a little unclear what was misleading, a PR to that doc would be greatly appreciated :)

robszumski avatar Dec 24 '14 03:12 robszumski

@robszumski I'm not a native English speaker, I'm not always sure I understood everything correctly, and I'm probably not the best person to write docs for other people, but:

Here are the steps that worked for me (consolidated into a script sketch after the list):

  • find the new drive's name with 'sudo fdisk -l'; let's say it's /dev/sdc
  • create one partition on the drive with 'sudo fdisk /dev/sdc': 'n' for a new partition, choose all defaults with Enter, then 'p' to review the changes and 'w' to write them to disk and quit fdisk
  • 'sudo mount /dev/disk/by-label/ROOT /mnt'
  • 'sudo btrfs device add /dev/sdc1 /mnt'
  • 'sudo btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt' (see the btrfs-balance(8) man page)
  • 'sudo btrfs fi df /mnt' - to see if it worked
  • 'sudo umount /mnt' - clean up
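A consolidated sketch of those steps, assuming the new drive is /dev/sdc and using non-interactive parted in place of the interactive fdisk step:

# Create a single partition spanning the new disk (non-interactive
# equivalent of the fdisk steps above; device name is an assumption).
sudo parted -s /dev/sdc mklabel gpt mkpart primary 0% 100%

# Mount the ROOT filesystem, add the new partition to it, and
# rebalance data and metadata into a RAID 1 profile.
sudo mount /dev/disk/by-label/ROOT /mnt
sudo btrfs device add /dev/sdc1 /mnt
sudo btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt

# Check the result, then clean up.
sudo btrfs fi df /mnt
sudo umount /mnt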

Easiest thing to do would be:

  • remove the link to "mounting storage" from "adding a new physical disk"
  • add link to seeekr's gist: https://gist.github.com/1afa1e5ce3ad6e998367.git
  • add a note that if all you need is RAID 0, 1, or 10 plus snapshots (5 and 6 are not stable, as far as I understand), you don't need to mess with software RAID or LVM - btrfs has it all and more. This is basically marineam's comment from above, starting with "As noted on IRC ..."

Actually, another marineam comment, starting with "What certainly won't work right now", says that CoreOS on top of software RAID would not work at all - it's the Aug 13 comment; not sure what the status is today.

I understand that making docs everybody finds helpful is not an easy task. Thanks for your work. And btw - do you speak Polish? Your last name sounds Polish.

agend07 avatar Dec 24 '14 10:12 agend07

Maybe I'm missing something, but when I install CoreOS using the latest stable release I get a large ext4 filesystem on sda9, not btrfs. Is the information in this thread outdated or beta-only?

tobia avatar Mar 31 '15 09:03 tobia

@tobia check this: https://coreos.com/releases/#561.0.0

agend07 avatar Mar 31 '15 13:03 agend07

Is there a guide for a root filesystem on raid1 now that we are on ext4?

zjeraar avatar Apr 07 '15 15:04 zjeraar

@agend07 thanks, it wasn't obvious from the rest of the documentation. I didn't understand where all that talk about btrfs was coming from! So let me add my voice to those asking for support for SW RAID 1 on the root fs. Rotating disk failure in a server is a very common occurrence. Many leased bare-metal servers come with two identical disks for this very purpose, but not with a HW RAID controller, which can have a monthly fee as large as the server itself.

It makes sense to let the user set up the RAID themselves with mdadm, because there are too many configurations for a script to handle. But then the install script, the boot process, and the updater should accept, and keep, the given mdX as the root device.

tobia avatar Apr 07 '15 20:04 tobia

Haven't tried this in a very long time, but it should be possible, after writing the base disk image, to change the ROOT partition type to raid autodetect, wipe the existing FS, set up an md device on it, create a new filesystem, label it ROOT, and create a /usr directory in that filesystem. The rest of the fs should get initialized on boot. There is a major limitation, though: we don't have a mechanism for applying updates to USR-A/USR-B across multiple disks or on top of an md device. This means that although you can use raid for ROOT for performance, volume size, or disaster recovery purposes, it isn't going to help keep a server running in the event of a disk failure.
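To make those steps concrete, a rough sketch; the device names, the partition number 9 for ROOT, and the GPT type code are assumptions, not a tested recipe:

# Mark the ROOT partition on each disk as Linux RAID (GPT type code fd00).
sudo sgdisk -t 9:fd00 /dev/sda
sudo sgdisk -t 9:fd00 /dev/sdb

# Wipe the existing filesystem and build a RAID 1 from both partitions.
sudo wipefs -a /dev/sda9
sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda9 /dev/sdb9

# Create a new filesystem labeled ROOT and pre-create /usr; the rest of
# the filesystem should get initialized on first boot.
sudo mkfs.ext4 -L ROOT /dev/md0
sudo mount /dev/md0 /mnt
sudo mkdir /mnt/usr
sudo umount /mnt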

Given the complexity of doing this by hand right now, and the limitation above, I'm not sure how worthwhile it is to do for ROOT. In many cases it will be much easier to place any data you need durability for on a volume created separately from the CoreOS boot disk; that extra volume could be md, LVM, btrfs, etc.

marineam avatar Apr 07 '15 21:04 marineam

I read that btrfs was not stable, and so CoreOS changed to ext4 with overlayfs.

Maybe it's time to have a look at btrfs. The main guy behind btrfs - Mr. Merlin - is funded by Google, after all.

ghost avatar Apr 07 '15 21:04 ghost