kexec on Galaxy S3/Note 2

So, we have a booting kernel, and a bunch of nice[citation needed] dts files describing the hardware of each phone. One problem: the proprietary bootloader, "S-BOOT", has no device tree support - it just loads an Android "boot.img", consisting of a kernel, ramdisk, and some configuration. Fortunately, Linux has a mechanism to load a device tree appended to the end of the kernel image. This allows us to use a single fixed device tree, but won't let us boot a single kernel/initramfs combo on all the devices we're aiming to support. Along with this, the boot partition only allows 8MB for kernel + initramfs, which is... not that much.
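To illustrate the appended device tree mechanism (enabled on ARM by CONFIG_ARM_APPENDED_DTB): the combined image is just the zImage with the DTB concatenated directly after it. Below is a minimal sketch in C. The helper name `append_dtb` is hypothetical and made up for this example; the only facts it relies on are the flattened device tree header's big-endian magic (0xd00dfeed) at offset 0 and its total size field at offset 4.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Big-endian magic number at the start of every flattened device tree. */
#define FDT_MAGIC 0xd00dfeedu

/* Read a big-endian 32-bit value regardless of host endianness. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical helper: return a malloc'd buffer holding the zImage followed
 * directly by the DTB, or NULL if the DTB header looks wrong or the
 * allocation fails. The caller frees the result.
 */
static uint8_t *append_dtb(const uint8_t *zimage, size_t zimage_len,
                           const uint8_t *dtb, size_t dtb_len,
                           size_t *out_len)
{
    assert(zimage != NULL && dtb != NULL && out_len != NULL);

    /* The header must at least contain the magic and totalsize fields. */
    if (dtb_len < 8 || read_be32(dtb) != FDT_MAGIC)
        return NULL;
    /* totalsize (offset 4) must not claim more bytes than we were given. */
    if (read_be32(dtb + 4) > dtb_len)
        return NULL;

    uint8_t *out = malloc(zimage_len + dtb_len);
    if (out == NULL)
        return NULL;

    memcpy(out, zimage, zimage_len);
    memcpy(out + zimage_len, dtb, dtb_len);
    *out_len = zimage_len + dtb_len;
    return out;
}
```

The result is what would go into the kernel slot of the boot.img, which is exactly why this approach ties one image to one device tree.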

So, how can we achieve one image to rule them all?

We could try replacing the bootloader with something like u-boot. There are several drawbacks to this approach:

  • Replacing the bootloader is a risky proposition - if something goes wrong, the device may be irrecoverable (without finding a JTAG port, anyway...)
  • I don't even know if the bootloader is replaceable - it's probably signature checked by the previous stage.
  • There is no documentation at all on how the first-stage bootloader hands off execution to the second-stage one. What state is the hardware in? What format should the second-stage bootloader be in?

It's probably easier just to stick with S-BOOT in the near future - so what can we do?

Enter kexec.

Kexec enables us to boot another kernel in-place, providing our own kernel, DTB and initramfs. Of course, kexec doesn't work right away - that would be no fun. Attempting to load the kexec image fails with "Invalid argument". Some searching shows that we're missing the cpu_kill SMP operation. Fortunately, this is quite easy to implement.

So it works now, right?

Unfortunately not. There's earlyprintk output, but the kernel fails quite early on, trying to access some registers that aren't mapped. It's almost as if the code that maps them isn't being executed - which would happen if !soc_is_exynos4() - meaning samsung_cpu_id isn't set properly. So for some reason the CHIPID block has garbage in its registers. Weird, right?

A wild guess led me to check the clocks available on exynos4412. Sure enough, there is a CHIPID clock, and it is disabled by the first kernel during bootup. So, we need to re-enable it before booting the second kernel.

Finally, all that's needed is to stop all i2c bus transfers, and mask interrupts before booting - and voila, the kernel can successfully kexec!

The "bootloader"

Now all we need is something to load and execute the kernel image. While kexec-tools does exist, it doesn't offer enough flexibility for our needs: For instance, the Note 2 has two different LCD panels, and a GPIO that is pulled high or low depending on the panel. To use the correct LCD, the bootloader needs to check the GPIO, and apply an "overlay" to the device tree. Plus, writing a "bootloader" isn't something I've done before - sounds like an adventure!
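As a rough illustration of the panel selection step (not the actual implementation): the sketch below assumes the GPIO level is read from a legacy sysfs `value` file, whose contents are "0\n" or "1\n", and maps it to an overlay file. The function names and the overlay file names are hypothetical, and which level corresponds to which panel is an assumption too.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical overlay file names; the real ones depend on the project. */
#define OVERLAY_PANEL_LOW  "panel-a.dtbo"
#define OVERLAY_PANEL_HIGH "panel-b.dtbo"

/*
 * Parse the text of a sysfs GPIO "value" file ("0\n" or "1\n").
 * Returns 0 or 1, or -1 if the text is not a valid level.
 */
static int parse_gpio_level(const char *text)
{
    assert(text != NULL);

    if (text[0] != '0' && text[0] != '1')
        return -1;
    if (text[1] != '\0' && text[1] != '\n')
        return -1;
    return text[0] - '0';
}

/*
 * Map the GPIO level to the overlay to apply on top of the base device
 * tree. Returns NULL for an invalid level so the caller can fall back to
 * booting without a panel overlay.
 */
static const char *panel_overlay_for_level(int level)
{
    switch (level) {
    case 0:
        return OVERLAY_PANEL_LOW;
    case 1:
        return OVERLAY_PANEL_HIGH;
    default:
        return NULL;
    }
}
```

This kind of runtime decision is what a fixed kexec-tools command line can't express: the overlay has to be chosen after the hardware has been probed.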

The bootloader uses inih to parse its configuration, and AOSP's libufdt to apply overlays. Once we've loaded the kernel, initrd, and device tree + overlays into RAM, we need to tell the kernel how to boot them.

There are a couple of requirements for booting a new kernel:

  1. Each "segment" must be page-aligned
  2. The zImage must have enough space immediately after it to decompress
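A minimal sketch of how those two requirements might be satisfied when choosing load addresses. Everything here is an assumption made for illustration: the struct, the function names, the kernel/initrd/DTB ordering, and the idea of passing the decompression headroom in as a fixed `decompress_gap`.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of page_size (must be a power of two). */
static uint64_t page_align_up(uint64_t addr, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (addr + page_size - 1) & ~(page_size - 1);
}

/* Hypothetical result type: physical load address of each segment. */
struct boot_layout {
    uint64_t kernel_addr;
    uint64_t initrd_addr;
    uint64_t dtb_addr;
    uint64_t end; /* first page-aligned address past the last segment */
};

/*
 * Place the zImage at ram_base + kernel_offset, reserve decompress_gap
 * bytes after it so it can decompress without overwriting anything, then
 * place the initrd and DTB on the following page boundaries.
 */
static struct boot_layout plan_layout(uint64_t ram_base, uint64_t page_size,
                                      uint64_t kernel_offset,
                                      uint64_t zimage_len,
                                      uint64_t decompress_gap,
                                      uint64_t initrd_len, uint64_t dtb_len)
{
    struct boot_layout l;

    l.kernel_addr = page_align_up(ram_base + kernel_offset, page_size);
    l.initrd_addr = page_align_up(l.kernel_addr + zimage_len + decompress_gap,
                                  page_size);
    l.dtb_addr = page_align_up(l.initrd_addr + initrd_len, page_size);
    l.end = page_align_up(l.dtb_addr + dtb_len, page_size);
    return l;
}
```

For example, with RAM at 0x40000000, 4096-byte pages, a kernel offset of 0x8000, and a 32 MiB gap, a 0x500123-byte zImage puts the initrd at 0x42509000.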

Both of these tasks are left as exercises for userspace. What we need to know is the physical address of the system RAM (see /proc/iomem), the page size (see getpagesize(2)), and how big the zImage is when uncompressed. Or, as kexec-tools does, we could just leave a large amount of RAM free after the zImage (a much easier solution!).
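Finding the RAM base could look something like the sketch below, which scans /proc/iomem-style lines ("start-end : name", addresses in hex) for the first "System RAM" entry. The function names are hypothetical and this is not taken from the actual implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one /proc/iomem line. Returns 1 and fills start/end (inclusive
 * physical addresses) if the line describes "System RAM", 0 otherwise.
 */
static int parse_system_ram_line(const char *line,
                                 uint64_t *start, uint64_t *end)
{
    unsigned long long lo, hi;
    char name[64];

    assert(line != NULL && start != NULL && end != NULL);

    /* %llx skips leading whitespace, so indented child entries parse too. */
    if (sscanf(line, "%llx-%llx : %63[^\n]", &lo, &hi, name) != 3)
        return 0;
    if (strcmp(name, "System RAM") != 0)
        return 0;

    *start = lo;
    *end = hi;
    return 1;
}

/* Scan an open stream (e.g. /proc/iomem) for the first System RAM range. */
static int find_system_ram(FILE *f, uint64_t *start, uint64_t *end)
{
    char line[256];

    assert(f != NULL);
    while (fgets(line, sizeof line, f) != NULL) {
        if (parse_system_ram_line(line, start, end))
            return 1;
    }
    return 0;
}
```

The page size comes from getpagesize(2) as mentioned above; together with the RAM base, that is enough to pick page-aligned load addresses.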

My actual implementation is here. Some of the code isn't especially nice (this tends to happen when debugging weird boot issues, I find), but as always, PRs/patches are welcome!