
Understanding the boot process on Linux servers

Intro

Hardware failure is a fact of life when working with computers. When your business revolves around keeping servers running smoothly at all times, the ability to diagnose and fix failures isn't optional. At Anchor we take pride in our analytical skills, and use them to make things work.

This information is great when you have time to diagnose a faulty server, but it probably won't help much when a frontline machine falls over. In that case the best thing to do is get everything running on a spare system. You do keep spares on hand, don't you?

We'll run through the typical boot sequence of one of our rackmount dedicated servers running Red Hat Enterprise Linux. The exact configuration varies, but these machines typically have two CPUs, at least a few gigabytes of RAM and SCSI hard drives in a RAID-1 configuration. We'll discuss the possible failure modes at each step and what can be done about them. Our description of the early stages assumes some low-level knowledge of computer hardware architecture.

CPU/Memory

BIOS

Description

Timeline

  1. The onboard clock generators and the CPU come up in a known state. Assume they're working correctly.
  2. CPU jumps to 0xffff0 and starts executing.

    1. This is at the top of the first megabyte of the address space (mapped to the BIOS ROM), just 16 bytes below the 1MiB boundary. There's not enough room here for the whole BIOS, so this is invariably a machine-code jump (JMP) to another address where the BIOS code really lives.
  3. As soon as convenient, the BIOS copies itself into RAM (known as "shadowing"), then adjusts the program counter so execution continues from RAM, which is faster than reading from the ROM.
  4. The BIOS performs the Power-On Self Test (POST): CPU, DMA, timers, PICs, etc.
  5. Buses are given a hardware reset to initialise them.
  6. The BIOS looks for a video BIOS starting at 0xc0000. A valid option ROM starts with the magic number 0xAA55, which appears in memory as the bytes 0x55, 0xAA because x86 is little-endian.

    1. If present, the video BIOS is initialised. This is when your screen turns on and displays the model number, logo, etc.
  7. The BIOS then scans the space from 0xc8000 to 0xdf800 on 2KiB boundaries in the same way (so the last slot ends at 0xe0000), looking for other option BIOSes to initialise. This is when SCSI cards and PXE-bootable network cards are discovered. Control passes to each secondary BIOS, which can initialise devices or run interactive frontends (e.g. SCSI BIOS setup, low-level format tools). A secondary BIOS registers bootable devices with the main BIOS via the BIOS Boot Specification (BBS) API. Once each secondary BIOS has run, control passes back to the main BIOS to continue scanning or booting.

    1. Before passing execution to each option BIOS, the main BIOS verifies its checksum. If any of these checks fail, it should print a "helpful" error message to the screen.
  8. The BIOS checks the memory location 0x472. If it contains 0x1234, this is a warm boot, and further POST tests are skipped.

    1. During POST, status codes are written to I/O port 0x80 ("port 80"). On some motherboards this drives a two-digit, seven-segment LED display that can tell you what the system is up to and where failures are occurring.
  9. The BIOS is now ready to think about starting an OS. Based on the defaults or preferences stored in CMOS memory, the BIOS picks a device and attempts to boot it.
  10. To boot a device, the BIOS copies its first sector (a 512-byte chunk) to memory location 0x7c00 (31KiB into memory). It checks that the sector ends with the standard 0xAA55 signature before treating it as a valid boot sector and starting to execute it.

  11. Things like PXE booting involve more complexity, but the important point is that this 512-byte boot sector is an executable chunk of code that can load the next step in getting the system booted. From here on we assume that the chosen boot sector belongs to a hard drive. This boot sector is called the Master Boot Record (MBR), and it applies to the whole disk.
  12. A standard HDD boot sector has 446 bytes of bootloader code, a 64-byte DOS-style partition table (4 entries of 16 bytes each) and the 2-byte 0xAA55 signature.

  13. This bootloader code varies between operating systems, but its purpose is to find the first bootable partition on the disk and invoke its boot record (a partition has a boot sector as well, correctly termed a Volume Boot Record, or VBR). On old DOS-type systems, one of the four partitions in the partition table would be flagged as "Active". The MBR code would use the data in the partition table to find that partition's location on disk and invoke its VBR.
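
The option ROM scan and checksum test from steps 6 and 7 can be sketched in Python. This is only an illustration under assumptions: it works on a hypothetical `image` bytes object standing in for a dump of memory starting at 0xc0000, it treats the video BIOS and the other option ROMs with one uniform scan, and it relies on the usual option ROM header conventions (byte 2 holds the ROM size in 512-byte blocks, and all bytes of a valid ROM sum to zero modulo 256). `make_rom` is a hypothetical helper for building test data.

```python
# Hypothetical sketch: scan a memory dump for option ROMs as in steps 6-7.
ROM_SIGNATURE = b"\x55\xaa"  # 0xAA55 as it appears in memory (little-endian)
BASE = 0xC0000               # video BIOS location, start of the scanned region
SCAN_END = 0xE0000           # the last 2KiB slot starts at 0xdf800
STEP = 2 * 1024              # ROMs are looked for on 2KiB boundaries
BLOCK = 512                  # header byte 2 gives the ROM size in 512-byte blocks


def checksum_ok(rom: bytes) -> bool:
    """A valid option ROM's bytes sum to zero modulo 256."""
    return sum(rom) % 256 == 0


def scan_option_roms(image: bytes) -> list[tuple[int, int, bool]]:
    """Return (address, length, checksum_ok) for each ROM header found.

    `image` is assumed to be a dump of memory starting at BASE.
    """
    found = []
    addr = BASE
    while addr < SCAN_END:
        offset = addr - BASE
        header = image[offset:offset + 3]
        if len(header) < 3:
            break  # the dump is shorter than the scan range
        if header[:2] == ROM_SIGNATURE:
            length = header[2] * BLOCK
            found.append((addr, length, checksum_ok(image[offset:offset + length])))
            # continue at the next 2KiB boundary after the end of this ROM
            addr += max(STEP, -(-length // STEP) * STEP)
        else:
            addr += STEP
    return found


def make_rom(blocks: int) -> bytes:
    """Hypothetical helper: a fake ROM with valid signature, size and checksum."""
    rom = bytearray(blocks * BLOCK)
    rom[0:2] = ROM_SIGNATURE
    rom[2] = blocks
    rom[-1] = -sum(rom) % 256  # final byte balances the checksum
    return bytes(rom)


if __name__ == "__main__":
    image = bytearray(SCAN_END - BASE)
    image[0x0000:0x8000] = make_rom(64)  # 32KiB "video BIOS" at 0xc0000
    image[0x8000:0x8800] = make_rom(4)   # 2KiB "SCSI BIOS" at 0xc8000
    for addr, length, ok in scan_option_roms(bytes(image)):
        print(f"{addr:#x}: {length} bytes, checksum {'ok' if ok else 'BAD'}")
```

Flipping any byte inside a ROM makes its checksum fail, which is the condition that should trigger the "helpful" error message.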

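The MBR layout from steps 10 to 13 can be checked from userspace with a short Python sketch. It assumes you already have the first sector of a disk as a bytes object (for example a copy taken with `dd if=/dev/sda of=mbr.bin bs=512 count=1`); the field offsets within each 16-byte entry (boot flag at byte 0, type at byte 4, starting LBA at byte 8, sector count at byte 12) follow the standard DOS partition table format, and 0x80 is the "Active" flag.

```python
import struct
from typing import NamedTuple, Optional

SECTOR_SIZE = 512
CODE_SIZE = 446             # bootloader code area
ENTRY_SIZE = 16             # 4 entries x 16 bytes = 64-byte partition table
SIGNATURE = 0xAA55          # last 2 bytes, stored as 0x55, 0xAA
ENTRY_FORMAT = "<B3xB3xII"  # flag, (CHS start), type, (CHS end), start LBA, sectors


class PartitionEntry(NamedTuple):
    number: int
    active: bool            # boot flag 0x80 marks the "Active" partition
    type_id: int            # partition type byte, e.g. 0x83 for Linux
    start_lba: int
    num_sectors: int


def parse_mbr(sector: bytes) -> list[PartitionEntry]:
    """Validate the signature and decode the four partition table entries."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("an MBR is exactly one 512-byte sector")
    (signature,) = struct.unpack_from("<H", sector, SECTOR_SIZE - 2)
    if signature != SIGNATURE:
        raise ValueError(f"no 0xAA55 signature (found {signature:#06x})")
    entries = []
    for i in range(4):
        offset = CODE_SIZE + i * ENTRY_SIZE
        flag, type_id, start, count = struct.unpack_from(ENTRY_FORMAT, sector, offset)
        entries.append(PartitionEntry(i + 1, flag == 0x80, type_id, start, count))
    return entries


def active_partition(entries: list[PartitionEntry]) -> Optional[PartitionEntry]:
    """Mimic DOS-style MBR code: pick the first non-empty entry flagged Active."""
    for entry in entries:
        if entry.active and entry.num_sectors:
            return entry
    return None
```

Usage would be along the lines of reading `mbr.bin` in binary mode, passing its 512 bytes to `parse_mbr`, and handing the result to `active_partition`. The real MBR code does the equivalent lookup in 16-bit assembly before loading the VBR.
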

GRUB

GRUB is a boot loader, and is the first piece of software loaded from the disk when a computer is started: the BIOS hands control to it. Its role is to transfer control to an operating system kernel, which in turn initialises the rest of the operating system.

GRUB consists of 3 stages.

Stage1 is stored in the MBR of the physical boot device. The MBR is the device's first 512-byte sector, and only 446 bytes of it are available for code, which severely limits what features stage1 can provide. Stage1 can load stage1.5 or go directly to stage2 of GRUB.

Stage1.5 is located in the sectors immediately following the MBR. Its role is to load stage2, and it exists as a convenience: stage1 is only smart enough to point to a fixed disk address where stage1.5 can be found, whereas stage1.5 understands filesystems, so it can find stage2 even if stage2's on-disk location changes. If stage1.5 is not installed, stage1 has to point directly to stage2 and must be kept updated whenever stage2 moves, much like LILO.

When installed, stage1.5 lives in the spare space (about 31KiB) between the MBR and the start of the first partition. This is possible because old-school fdisk leaves the rest of the first track free: a track has 63 x 512-byte sectors, only one of which is used by the MBR, and the first partition starts on the second track.
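
The size of that gap is simple arithmetic. A tiny Python sketch, assuming the classic layout where the first partition starts at sector 63 (the first sector of the second track on a 63-sectors-per-track geometry):

```python
SECTOR_SIZE = 512
SECTORS_PER_TRACK = 63  # classic fdisk geometry (assumed)
MBR_SECTORS = 1         # sector 0 holds the MBR (GRUB stage1)


def embed_gap_bytes(first_partition_lba: int = SECTORS_PER_TRACK) -> int:
    """Bytes free between the MBR and the first partition's start sector."""
    return (first_partition_lba - MBR_SECTORS) * SECTOR_SIZE


gap = embed_gap_bytes()
print(f"{gap} bytes = {gap // 1024} KiB")  # prints "31744 bytes = 31 KiB"
```

Under these assumptions there are 62 free sectors for stage1.5 to fit into. A disk partitioned with a different alignment has a different gap, which is why the function takes the first partition's start sector as a parameter.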

Stage2 is where the boot loader presents an interface to the user. Typically this is a menu that presents a list of available kernels and boot options saved in the configuration file /boot/grub/grub.conf. Stage2 also provides a command shell for users to make edits to parameters prior to booting the kernel. All commands available in the command shell can be used in the configuration file and vice-versa.
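
How stage2 relates `title` lines, the commands under them and the `default` setting can be sketched in Python. This is a simplified illustration, not GRUB's own parser: it only handles plain `command value` and `command=value` lines, and the sample grub.conf is hypothetical (the kernel version and root device are placeholders, not taken from any particular server).

```python
import re

# Hypothetical grub.conf: kernel version and root device are placeholders.
SAMPLE_GRUB_CONF = """\
default=0
timeout=5
title Red Hat Enterprise Linux Server (2.6.18-164.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-164.el5 ro root=/dev/md1
        initrd /initrd-2.6.18-164.el5.img
"""

LINE = re.compile(r"(\w+)[=\s]*(.*)")  # "key value" or "key=value"


def parse_menu(text: str) -> tuple[dict[str, str], list[dict[str, str]]]:
    """Split a GRUB legacy menu into global settings and per-title entries."""
    settings: dict[str, str] = {}
    entries: list[dict[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE.match(line)
        if not match:
            continue
        key, value = match.group(1), match.group(2)
        if key == "title":
            entries.append({"title": value})  # each title starts a new entry
        elif entries:
            entries[-1][key] = value          # command belongs to the last title
        else:
            settings[key] = value             # command before any title is global
    return settings, entries


def default_entry(text: str) -> dict[str, str]:
    """Return the menu entry selected by the zero-based default setting."""
    settings, entries = parse_menu(text)
    return entries[int(settings.get("default", "0"))]
```

With the sample file, `default_entry` returns the single Red Hat entry, including its `root`, `kernel` and `initrd` lines; these are the same commands you could type by hand in the stage2 command shell.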

The boot process can be interrupted at any stage by many different problems. A summary of these is:


Kernel

Information on the kernel and initrd goes here

Initscripts

Init

Once the kernel has loaded, it launches init, the first user-space process. init's config file is /etc/inittab. The first thing init does is set the runlevel, which should be 3 on any of our Red Hat-based machines. After this it launches /etc/rc.d/rc.sysinit, which takes care of the higher-level startup functions.
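
Each non-comment line of /etc/inittab has the form `id:runlevels:action:process`. The Python sketch below shows how the `initdefault` entry gives the runlevel and which commands follow from it. The sample is a hypothetical, trimmed inittab rather than a copy of any server's file, and the sketch assumes, as inittab(5) describes, that the `sysinit` entry runs before the entries for the chosen runlevel.

```python
from typing import NamedTuple

# Hypothetical, trimmed inittab in the Red Hat style.
SAMPLE_INITTAB = """\
id:3:initdefault:
si::sysinit:/etc/rc.d/rc.sysinit
l3:3:wait:/etc/rc.d/rc 3
l5:5:wait:/etc/rc.d/rc 5
"""


class InittabEntry(NamedTuple):
    ident: str
    runlevels: str  # e.g. "2345"; ignored for some actions such as sysinit
    action: str
    process: str


def parse_inittab(text: str) -> list[InittabEntry]:
    """Parse id:runlevels:action:process lines, skipping blanks and comments."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # the process field may itself contain colons, so split at most 3 times
        entries.append(InittabEntry(*line.split(":", 3)))
    return entries


def boot_plan(entries: list[InittabEntry]) -> tuple[str, list[str]]:
    """Return the default runlevel and the commands init runs to reach it."""
    runlevel = next(e.runlevels for e in entries if e.action == "initdefault")
    sysinit = [e.process for e in entries if e.action == "sysinit"]
    rc = [e.process for e in entries
          if e.action == "wait" and runlevel in e.runlevels]
    return runlevel, sysinit + rc
```

For the sample, `boot_plan` yields runlevel "3" with rc.sysinit followed by `/etc/rc.d/rc 3`, which hands over to the rcX.d scripts.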

Stuff that can go wrong

rc.sysinit

This bash script (which you should never touch) does a lot. Our description has been trimmed for space, because it sets up hundreds of boot-time parameters and runs many other scripts.

Stuff that can go wrong

rcX.d

The system now calls /etc/rc.d/rc $RUNLEVEL to run whatever scripts are defined for that runlevel. These scripts reside in /etc/rc.d/rc$RUNLEVEL.d/ and are (or should be) symlinks to scripts in /etc/init.d/. Their names should start with either an S (start order when entering this runlevel) or a K (reverse kill order when leaving this runlevel), followed by a number that indicates priority. Application-specific problems are far too numerous to list here, so this section covers only the scripts themselves. On Red Hat/Fedora systems, the chkconfig command sets up the symlinks automatically.
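
A Python sketch of the ordering rc applies to a runlevel directory, assuming the usual two-digit K/S prefixes: the scripts are picked up in lexical order, so the number decides priority, and the K scripts run before the S scripts. The file names are illustrative, and the sketch ignores any extra checks the real /etc/rc.d/rc script makes before running each one.

```python
import re

RC_LINK = re.compile(r"^([KS])(\d{2})(.+)$")  # e.g. S10network, K35smb


def rc_order(names: list[str]) -> tuple[list[str], list[str]]:
    """Split a runlevel directory listing into kill and start scripts.

    Both lists come out in lexical order, which is the order the scripts
    are picked up in, so the two-digit number decides priority.
    """
    kill, start = [], []
    for name in sorted(names):
        match = RC_LINK.match(name)
        if not match:
            continue  # not a K/S link, so this sketch ignores it
        (kill if match.group(1) == "K" else start).append(name)
    return kill, start


# Illustrative listing of /etc/rc.d/rc3.d/
listing = ["S55sshd", "K35smb", "S10network", "K15httpd", "S08iptables"]
kill, start = rc_order(listing)
print("stop:", " ".join(kill))    # stop: K15httpd K35smb
print("start:", " ".join(start))  # start: S08iptables S10network S55sshd
```

This is also why a misnumbered symlink (for example a service started before the network it depends on) is a common source of boot-time failures.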

Stuff that can go wrong

More system commands

init also sets up what to do on Control-Alt-Delete, on a UPS-notified power failure, and on power recovery.
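
These events map to inittab actions: `ctrlaltdel` runs when someone presses Control-Alt-Delete on the console, `powerfail` when a UPS monitoring process tells init the power has gone, and `powerokwait` when init is told the power is back. The Python sketch below extracts those handlers. The sample lines are illustrative, in the style of a stock Red Hat install, so check your own /etc/inittab for the actual commands.

```python
# Illustrative inittab lines; check your own /etc/inittab for the real commands.
SAMPLE_INITTAB = """\
ca::ctrlaltdel:/sbin/shutdown -t3 -r now
pf::powerfail:/sbin/shutdown -f -h +2 "Power Failure; System Shutting Down"
pr:12345:powerokwait:/sbin/shutdown -c "Power Restored; Shutdown Cancelled"
1:2345:respawn:/sbin/mingetty tty1
"""

# inittab actions for the three events described above
EVENTS = {
    "ctrlaltdel": "Control-Alt-Delete pressed on the console",
    "powerfail": "UPS reports a power failure",
    "powerokwait": "UPS reports power restored",
}


def event_handlers(text: str) -> dict[str, str]:
    """Map each special inittab action to the command init will run for it."""
    handlers = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        _ident, _runlevels, action, process = line.split(":", 3)
        if action in EVENTS:
            handlers[action] = process
    return handlers


for action, command in event_handlers(SAMPLE_INITTAB).items():
    print(f"{EVENTS[action]}: {command}")
```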

Stuff that can go wrong

Start terminals and respawn processes

init now spawns six terminals, assuming you're in a runlevel from 2 to 5 and haven't changed the defaults. These are respawned whenever they die. Depending on the configuration, it may also spawn serial terminals. By default, Red Hat appears to respawn xdm in runlevel 5, which none of our servers should be set to use.
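
The processes init keeps alive come from `respawn` entries in /etc/inittab whose runlevels field includes the current runlevel. A Python sketch of that selection; the sample entries (six mingetty terminals and a display-manager line) are illustrative assumptions, not a copy of our servers' inittab.

```python
# Illustrative inittab lines: six mingetty terminals plus a display manager.
SAMPLE_INITTAB = "\n".join(
    [f"{n}:2345:respawn:/sbin/mingetty tty{n}" for n in range(1, 7)]
    + ["x:5:respawn:/etc/X11/prefdm -nodaemon"]
)


def respawned(text: str, runlevel: str) -> list[str]:
    """Return the commands init keeps respawning in the given runlevel."""
    commands = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        _ident, runlevels, action, process = line.split(":", 3)
        if action == "respawn" and runlevel in runlevels:
            commands.append(process)
    return commands


print(len(respawned(SAMPLE_INITTAB, "3")))  # 6: the terminals only
print(respawned(SAMPLE_INITTAB, "5")[-1])   # /etc/X11/prefdm -nodaemon
```

Runlevel 3 gets only the six terminals, which is what our servers should be running; the display-manager entry only comes into play in runlevel 5.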

Stuff that can go wrong

Wiki: dedicated/ServerBootProcess (last edited 2009-12-11 13:20:33 by PaulDeAudney)