by John Tsiombikas
Last update: 30 April 2018.
This week in bare-metal hacking, I want to talk about loading the main program. As I mentioned in previous articles in the series, one of my requirements for this project is to be able to use any boot medium supported by the BIOS, which means I want to use the BIOS routines for loading (int 13h, call 2).
The problem with this approach is that I can only call the BIOS from 16bit real mode, not from 32bit protected mode, so I must load the whole program before switching to protected mode. But the main program might be more than a few hundred kilobytes, so it won't fit in the small part of RAM addressable from real mode (the first megabyte, of which at most 640KB is conventional memory). Moreover, I want to avoid low memory altogether, and load the main program starting at the 1MB mark.
To understand the solution to our problem, it's important to first explain how the processor uses segment descriptors, and how it emulates its 16bit 8086 precursor in real mode.
Segments in the 386 are defined, as I mentioned in the previous article, by segment descriptors in the Global Descriptor Table (GDT). When a segment selector is loaded into one of the segment registers (cs, ds, es, fs, gs, or ss), the processor reads the corresponding descriptor from memory, and keeps the segment base, limit, and attributes in an internal descriptor cache, one per segment register. On each memory access, the processor uses that cached copy, not the GDT in memory, to calculate the linear address (segment base plus offset) and to check that the access is valid, i.e. that the offset lies within the segment limit.
On reset, the processor starts up in real mode (PE bit of CR0 set to 0), but it still uses the same mechanism. The only difference is that loading a segment register does not read a descriptor from memory: it merely sets the cached base to the loaded value shifted left by 4, which is then added to the offset to form linear addresses, and leaves the cached limit and attributes untouched. To emulate the 8086 with its 16bit (64KB) segments, the descriptor caches of all segment registers are initialized on reset with a segment limit of 0xffff.
Now here comes the non-obvious bit, an old demoscene trick I read about ages ago called "unreal mode": we can switch to protected mode temporarily, just long enough to load selectors referring to segment descriptors with a 4GB limit (the same ones we would use to access the whole memory in protected mode) into all data segment registers, thus updating their descriptor caches, and then drop back to real mode by flipping the PE bit back to 0.
With the PE bit at 0, addresses are formed exactly as in regular 16bit real mode, from a segment and an offset, but now the descriptor caches say that our segments are 4GB long. That allows us to access the whole 32bit address space by zeroing out all segment registers and using full 32bit values as offsets (for example es:edi, with es set to 0 and edi holding any 32bit value).
Let me make this explicit. By performing this trick, we can run in real mode, with regular segment:offset address calculations, and still call 16bit BIOS routines, while having access to the whole 4GB address space, because our offsets are no longer limited to 16bit values.
Disclaimer: some people on the internet argue about whether this is what's called unreal mode, or whether it should be called "flat real mode", reserving the name unreal mode for the case where the code segment is given a 4GB limit as well. Others think the term is silly altogether and should not be used. But that's what this trick was called in the old PC democoding articles where I first saw it, and that's what I'll be calling it. It makes sense: it's like real mode, but unbelievably awesome, hence: unreal mode.
So here's the code from my boot loader, which sets up unreal mode, before calling the load_main routine:
```asm
# same initial GDT we're also using for protected mode
	.align 4
	.word 0
gdt_lim: .word 23
gdt_base: .long gdt

	.align 8
gdt:	# 0: null segment
	.long 0
	.long 0
	# 1: code - base:0, lim:4g, G:4k, 32bit, avl:0, pres|app, dpl:0, type:code/non-conf/rd
	.long 0x0000ffff
	.long 0x00cf9a00
	# 2: data - base:0, lim:4g, G:4k, 32bit, avl:0, pres|app, dpl:0, type:data/rw
	.long 0x0000ffff
	.long 0x00cf9200

	.code16
unreal:
	# note: assumes interrupts are already disabled, since no protected mode IDT exists
	# use the same GDT, will use data segment: 2
	lgdt (gdt_lim)

	# enter protected mode: PE=1
	mov %cr0, %eax
	or $1, %ax
	mov %eax, %cr0
	jmp 0f		# jmp to clear instruction cache

	# load data segment selectors to set descriptor caches to 4GB limits
0:	mov $0x10, %ax
	mov %ax, %ds
	mov %ax, %es
	mov %ax, %fs
	mov %ax, %gs
	mov %ax, %ss

	# back to real (unreal) mode: PE=0
	mov %cr0, %eax
	and $0xfffe, %ax
	mov %eax, %cr0
	jmp 0f

	# zero all data segments to not affect address calculations
	# (the ret below assumes ss was already 0, so the stack does not move)
0:	xor %ax, %ax
	mov %ax, %ds
	mov %ax, %es
	mov %ax, %fs
	mov %ax, %gs
	mov %ax, %ss
	ret
```
The BIOS call I'm using to load sectors probably still can't cope with 32bit offsets, so I'm using a low memory buffer to read whole tracks at once, and then copying them above 1MB one track at a time.
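```asm
dest_ptr: .long 0

load_main:
	movl $0x100000, dest_ptr
	...
ldloop:
	# number of sectors to read: the rest of the current track
	movzxw sect_per_track, %ecx
	sub trk_sect, %ecx
	push %ecx
	# this will call int 0x13/call 2 to load a bunch of sectors into track_buffer
	call read_track

	# copy to high memory
	mov $track_buffer, %esi
	mov dest_ptr, %edi
	mov (%esp), %ecx	# sector count saved on the stack
	shl $9, %ecx		# sectors -> bytes (512 bytes per sector)
	add %ecx, dest_ptr	# advance the destination for the next track
	shr $2, %ecx		# bytes -> dwords for movsl
	# addr32 prefix makes rep movsl use the whole esi, edi and ecx instead of si, di and cx
	addr32 rep movsl

	incl cur_track
	# other than the first track which might be partial, all the rest start from 0
	movl $0, trk_sect

	pop %ecx
	sub %ecx, sect_left	# unsigned: ja also exits if the count underflows
	ja ldloop
```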