The XEN hypervisor uses big files (a couple of gigabytes) as filesystem
images for virtual machines. Unlike other virtualisation solutions, XEN
does not impose its own internal structure on the image file. The big
file simply has to contain an ordinary ext3 filesystem and, optionally,
a partition table just as if it were a real hard disk.
The ability to use big files as hard disks comes in handy if you are
running short of space on your main hard disk. With an external hard
disk you should be well prepared to run a number of virtual machines
as big files.
However, having the filesystem of a virtual machine in a big file
raises the question of how to boot the virtual machine.
Essentially there are two options to do that:
- Provide the VM's kernel and the init-ramdisk, which are usually stored
inside the filesystem (in the /boot directory), as separate files
together with the big file, and modify the VM's configuration to use
them.
- Leave the kernel and the init-ramdisk in the big file and provide
a working boot sector that accesses the kernel inside the big file,
using the native XEN pygrub bootloader to start the virtual machine.
Both options require the big file to be associated with a real special
device file (e.g. /dev/loop0) in order to create a filesystem on the big file.
While for the first option it is sufficient to simply connect the big file
with the loop device, using the "losetup /dev/loop0 bigfile" command, the
second option is much more complex, as the big file has to be partitioned like
an ordinary hard disk before the filesystem can be created.
For the rest of this article we will focus on the second option, which is much
more appealing as everything is kept inside the big file. I will show you how
exactly the big file is turned into a virtual hard disk and how you can access
and modify the information stored in the virtual machine's own filesystem.
Getting Partitions And Filesystem Sizes Sorted
Our journey through the big file's internal structure naturally begins with
the creation of the big file.
dd if=/dev/zero of=bigfile bs=1M count=3950
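If writing almost 4 GB of zeros takes too long, a sparse file of the same size is a possible alternative. The following is only a sketch and not part of the procedure above: the helper name make_image is hypothetical, and it relies on dd's seek= operand. The demo call creates a small 8 MiB file in a scratch directory; for the image used in this article the call would be "make_image bigfile 3950".

```shell
#!/bin/sh
# Hypothetical helper: create a sparse image file of SIZE_MIB mebibytes.
# Usage: make_image FILE SIZE_MIB
# count=0 writes no data; seek= only moves the end of the file to the wanted size.
make_image() {
    dd if=/dev/zero of="$1" bs=1M count=0 seek="$2" 2>/dev/null
}

# Demo with a small file in a scratch directory (assumed size, not the article's).
tmpdir=$(mktemp -d)
make_image "$tmpdir/demo.img" 8
size=$(wc -c < "$tmpdir/demo.img")
size=$((size))                      # normalise whitespace some wc versions print
echo "demo.img is $size bytes"
rm -rf "$tmpdir"
```

A sparse file only occupies disk space for the blocks that are actually written later, so the external disk must still have enough room for the image to grow.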
As a second step we use this chunk of 4141875200 bytes to act as a hard disk
and try to partition the big file as usual:
losetup /dev/loop0 bigfile
fdisk /dev/loop0
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)
Command (m for help):
As expected, the fdisk program throws a number of warnings at us,
because we have given it a big file instead of a real hard disk.
But let's see in detail how the fdisk program recognizes our new hard disk
by printing the partition table with the p command:
Disk /dev/loop0: 4141 MB, 4141875200 bytes
255 heads, 63 sectors/track, 503 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
Obviously there is no partition table yet, but the program assumes that the
big file represents a hard disk with 255 heads and 63 sectors per track, each
sector holding 512 bytes of data. Every cylinder of our virtual hard disk
therefore consists of 255 x 63 x 512 bytes = 8225280 bytes, which is the unit
in which we can now chop the hard disk space into partitions. All in all there
are 503 cylinders in our virtual hard disk, which makes a total of
503 x 8225280 bytes = 4137315840 bytes to spend on partitions.
But wait, didn't we create 4141875200 bytes in the first place? That's
4559360 bytes less than what we had originally. Well, this loss is due to
the fact that for the 504th cylinder we'd need 8225280 bytes which we don't
have, so this loss is inevitable. But the important consequence of this
reduction of space is that we cannot create a filesystem on the whole bunch
of data we supplied. At the moment the size of our filesystem is not
determined at all.
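The cylinder arithmetic above can be reproduced with plain shell arithmetic. This is a minimal sketch using the numbers fdisk reported; the variable names are my own, and it assumes a shell with 64-bit integer arithmetic (the norm on current 64-bit systems).

```shell
#!/bin/sh
# Geometry that fdisk assumed for the big file
total_bytes=4141875200
heads=255
sectors_per_track=63
sector_size=512

cylinder_bytes=$((heads * sectors_per_track * sector_size))
cylinders=$((total_bytes / cylinder_bytes))   # integer division drops the partial cylinder
usable_bytes=$((cylinders * cylinder_bytes))
lost_bytes=$((total_bytes - usable_bytes))

echo "bytes per cylinder: $cylinder_bytes"
echo "whole cylinders:    $cylinders"
echo "usable bytes:       $usable_bytes"
echo "lost at the end:    $lost_bytes"
```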
The next step is to create a new primary partition inside our big file using
all the space we have:
Disk /dev/loop0: 4141 MB, 4141875200 bytes
255 heads, 63 sectors/track, 503 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/loop0p1 1 503 4040316 83 Linux
After having written the partition table to the big file, have you checked for
the new device file /dev/loop0p1? Don't worry, it does not exist!
Appending p1 to the device name is just fdisk's way of denoting partitions; it
does not mean that you'll find such a device in the /dev directory.
Poking Inside The Big File
From the partition table you can see that 4040316 blocks have been allocated
for the new partition. With each block storing 1024 bytes, we now know the size
of our first partition: 4040316 x 1024 bytes = 4137283584 bytes. This is another
number we have never seen before! After having written off some 4.5 megabytes
because we cannot use half a cylinder, we now face another loss of exactly
4137315840 - 4137283584 = 32256 bytes.
Of course these 32256 bytes at the beginning of the big file are there for a
purpose, which is to store the partition table. Our first partition begins
right after this amount of data, at an offset of 32256 inside the big file.
The amount of 32256 bytes results from the fact that one track (63 sectors
of 512 bytes for one head, 63 x 512 = 32256) is set aside for the partition table.
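The offset of the first partition and its size in 1024-byte blocks follow directly from the geometry. The sketch below recomputes both values from the figures quoted above; the variable names are illustrative only.

```shell
#!/bin/sh
sector_size=512
sectors_per_track=63
usable_bytes=4137315840     # 503 cylinders of 8225280 bytes each

# The first track is reserved, so the partition starts right after it.
offset=$((sectors_per_track * sector_size))
partition_bytes=$((usable_bytes - offset))
partition_blocks=$((partition_bytes / 1024))   # fdisk counts blocks of 1024 bytes

echo "partition offset: $offset bytes"
echo "partition size:   $partition_bytes bytes"
echo "partition blocks: $partition_blocks"
```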
Now it's time to use a second loop device (/dev/loop1) to poke inside the
big file at exactly the point where our first partition begins and create
a new filesystem there:
losetup -o 32256 /dev/loop1 bigfile
mkfs -t ext3 -c /dev/loop1 4040316
It's essential that we supply the number of blocks as a parameter to the
mkfs command to ensure that our new filesystem on the first partition fits
exactly into the space we have allocated. Without this parameter our filesystem
would become too big, as the 4.5 megabytes after the first partition would
be used for the filesystem too. When the virtual machine then uses the
filesystem, its actual size would conflict with the numbers in the
partition table. Either the partition table or the filesystem's superblock
would be lying, which causes distress for a virtual machine that expects a
consistent filesystem to operate on.
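Whether a filesystem fits its partition can be checked with a little arithmetic. The helper fits_partition below is hypothetical. It assumes you have read the filesystem's block count and block size from its superblock (for example from the "Block count" and "Block size" lines of dumpe2fs -h) and the partition's block count from fdisk. The sample values assume a 4096-byte filesystem block size, which is an assumption on my part; mkfs may choose a different one.

```shell
#!/bin/sh
# Hypothetical check: does a filesystem of FS_BLOCKS blocks of FS_BLOCK_SIZE
# bytes fit into a partition of PART_KBLOCKS blocks of 1024 bytes?
# Usage: fits_partition FS_BLOCKS FS_BLOCK_SIZE PART_KBLOCKS
fits_partition() {
    fs_bytes=$(($1 * $2))
    part_bytes=$(($3 * 1024))
    [ "$fs_bytes" -le "$part_bytes" ]
}

# Filesystem limited to the partition: 4040316 KiB = 1010079 blocks of 4096 bytes
if fits_partition 1010079 4096 4040316; then
    echo "sized filesystem: fits"
fi

# Filesystem filling everything behind the offset:
# (4141875200 - 32256) / 4096 = 1011192 whole blocks
if ! fits_partition 1011192 4096 4040316; then
    echo "unsized filesystem: too big"
fi
```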
Writing The Master Boot Record
You can fill up the filesystem with whatever carefully selected quality open
source software you can find on the planet, but in the end we need to write
the new virtual disk's master boot record to boot the jewel. There is one step
of preparation to be done before we can use the grub shell to write the MBR.
We have to make a symbolic link named /dev/loop to the device that points to
the master boot record, that is to the beginning of the big file, /dev/loop0
in the example above.
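Creating the link is a one-liner. The sketch below wraps it in a hypothetical helper, link_mbr_device, and demonstrates it in a scratch directory so that it can be tried without root privileges. On the real system you would run it as root with /dev/loop0 and /dev/loop as arguments.

```shell
#!/bin/sh
# Hypothetical helper: (re)create LINK as a symbolic link to DEVICE.
# Usage: link_mbr_device DEVICE LINK
link_mbr_device() {
    ln -sf "$1" "$2"
}

# Demo in a scratch directory; the real call would be
#   link_mbr_device /dev/loop0 /dev/loop
tmpdir=$(mktemp -d)
link_mbr_device /dev/loop0 "$tmpdir/loop"
target=$(readlink "$tmpdir/loop")
echo "loop -> $target"
rm -rf "$tmpdir"
```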
grub> device (hd0) /dev/loop
grub> root (hd0,0)
grub> setup (hd0)
grub> quit
Now your spick-and-span virtual hard disk is ready to boot.