Discussion:
Single MIPS kernel
Ralf Baechle
2014-10-22 08:34:37 UTC
This question comes up every once in a while, and I was also approached
during ELCE in Düsseldorf about why there is no single MIPS kernel for all
platforms, so I thought I should post a writeup on the topic.

The primary reason is that MIPS kernels are built as non-PIC code. This
means code is linked to a particular absolute address. The link address
depends on the memory range available on a particular system; there is
no one size that fits all systems, not even a large fraction of
supported systems.

What does it take to make kernels relocatable? A current kernel is not
relocatable. One might do something along the lines of userland, where
the dynamic linker is in a similar situation and needs to first relocate
itself before it can perform its actual job.

There are two approaches. The first is keeping the non-PIC code. That
requires keeping all of the relocation information. A lasat_defconfig
vmlinux is 5733098 bytes, but built with --emit-relocs to keep the reloc
information in the final binary, the vmlinux file grows to 7217342
bytes! A quick look at the reloc sections:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 2] .rel.text         REL             00000000 461538 0eedf8 08     34   1  4
  [ 4] .rel__ex_table    REL             00000000 550330 0040e0 08     34   3  4
  [ 8] .rel.rodata       REL             00000000 554410 0310e0 08     34   7  4
  [10] .rel.pci_fixup    REL             00000000 5854f0 000998 08     34   9  4
  [12] .rel__ksymtab     REL             00000000 585e88 00b3b0 08     34  11  4
  [14] .rel__ksymtab_gpl REL             00000000 591238 007180 08     34  13  4
  [17] .rel__param       REL             00000000 5983b8 000858 08     34  16  4
  [19] .rel__modver      REL             00000000 598c10 000038 08     34  18  4
  [21] .rel.data         REL             00000000 598c48 00a130 08     34  20  4
  [23] .rel.init.text    REL             00000000 5a2d78 00f008 08     34  22  4
  [25] .rel.init.data    REL             00000000 5b1d80 001d08 08     34  24  4
  [27] .rel.exit.text    REL             00000000 5b3a88 000b78 08     34  26  4

The approach could probably be optimized, but as a first-order
approximation this demonstrates there would be plenty of bloat to the
binary. The positive side of this approach: no runtime penalty.

Alternatively: make the kernel PIC code. As a rule of thumb, that's going
to inflate the kernel by 10 or 15%. That's less than the above approach,
but there would also be significant runtime overhead. Probably a
non-starter in a world where benchmarks like network performance on
64-byte packets decide the fate of a product on the market.

Obviously there is the difference between 32- and 64-bit kernels. 64-bit
kernels use additional instructions that are not available on 32-bit
processors, and using just 32-bit instructions won't fly for 64-bit
kernels.

Hardware detection. That's all easy in a device tree world, but in
reality many of the existing systems don't support device tree yet, so a
generic kernel would have to figure out what platform it's running on,
which would end up as something like an ISA-style device probe.

Ralf
John Crispin
2014-10-22 10:53:08 UTC
Hi Ralf,

Post by Ralf Baechle
> This question comes up every once in a while, and I was also approached
> during ELCE in Düsseldorf about why there is no single MIPS kernel for all
> platforms, so I thought I should post a writeup on the topic.

For the SoCs supported by OpenWrt this is a no-go. We are already
having a hard time fighting bloat. To get the images to fit, we already
need to build device-specific images that only hold the DT, drivers,
... that a specific board needs. A kernel that can boot on X devices
won't even fit into flash, and if it does, there is no space left for
the userland.

I think this feature is only interesting for the older platforms and
the upcoming MIPS-based mobile SoCs, i.e. the users are Debian- and
Android-type devices.

John
Florian Fainelli
2014-10-22 17:36:42 UTC
Post by John Crispin
> Hi Ralf,
>
> Post by Ralf Baechle
>> This question comes up every once in a while, and I was also approached
>> during ELCE in Düsseldorf about why there is no single MIPS kernel for all
>> platforms, so I thought I should post a writeup on the topic.
>
> For the SoCs supported by OpenWrt this is a no-go. We are already
> having a hard time fighting bloat. To get the images to fit, we already
> need to build device-specific images that only hold the DT, drivers,
> ... that a specific board needs. A kernel that can boot on X devices
> won't even fit into flash, and if it does, there is no space left for
> the userland.

A multi-platform kernel should allow compiling specific platforms in or
out (like ARM does), so that we can still achieve the small-kernel goal
in OpenWrt.

> I think this feature is only interesting for the older platforms and
> the upcoming MIPS-based mobile SoCs, i.e. the users are Debian- and
> Android-type devices.

You mean older systems such as SiByte, SGI's IPxx and similar? Those may
have memory architecture requirements (spaces.h) that make them
difficult, if at all possible, to support.
--
Florian
David Daney
2014-10-22 17:56:05 UTC
Post by Ralf Baechle
> This question comes up every once in a while, and I was also approached
> during ELCE in Düsseldorf about why there is no single MIPS kernel for all
> platforms, so I thought I should post a writeup on the topic.
>
> The primary reason is that MIPS kernels are built as non-PIC code. This
> means code is linked to a particular absolute address. The link address
> depends on the memory range available on a particular system; there is
> no one size that fits all systems, not even a large fraction of
> supported systems.

Another reason is that the protocol between the bootloader and the
kernel varies by platform. So you would have to have several different
entry points, one for each booting protocol.

I am not sure how the bootloaders would know which entry point to use.

David Daney
Ralf Baechle
2014-10-22 19:05:15 UTC
Post by David Daney
> Another reason is that the protocol between the bootloader and the
> kernel varies by platform. So you would have to have several different
> entry points, one for each booting protocol.
>
> I am not sure how the bootloaders would know which entry point to use.

That's where I foresaw the need for the ISA-style platform probe right
at the kernel entry point, before fanning out to a platform-specific
entry point.

Since we already support compressed kernels, I'm wondering if relocation
might also be performed by the compression wrapper along with the
hardware probe. That would leave the vmlinux itself untouched, and the
wrapper could be installed on the target.

It's just that all this for the sake of a unified kernel image seems
way too painful!

Ralf
Maciej W. Rozycki
2014-10-22 19:19:07 UTC
Post by Ralf Baechle
>> Another reason is that the protocol between the bootloader and the
>> kernel varies by platform. So you would have to have several different
>> entry points, one for each booting protocol.
>>
>> I am not sure how the bootloaders would know which entry point to use.
>
> That's where I foresaw the need for the ISA-style platform probe right
> at the kernel entry point, before fanning out to a platform-specific
> entry point.
>
> Since we already support compressed kernels, I'm wondering if relocation
> might also be performed by the compression wrapper along with the
> hardware probe. That would leave the vmlinux itself untouched, and the
> wrapper could be installed on the target.

Wouldn't it make sense to make a unified kernel virtually mapped? That
would avoid the issue with RAM being present at different locations
across systems, and also, if big pages were used, which I believe are
available almost universally across the MIPS family, any performance hit
would be minimal. There would be hardly any increase in the binary image
size either. Run-time mappings such as `kmalloc' or `ioremap' could
continue using unmapped segments.

Thoughts?

Maciej
Ralf Baechle
2014-10-22 20:42:09 UTC
Post by Maciej W. Rozycki
> Post by Ralf Baechle
>>> Another reason is that the protocol between the bootloader and the
>>> kernel varies by platform. So you would have to have several different
>>> entry points, one for each booting protocol.
>>>
>>> I am not sure how the bootloaders would know which entry point to use.
>>
>> That's where I foresaw the need for the ISA-style platform probe right
>> at the kernel entry point, before fanning out to a platform-specific
>> entry point.
>>
>> Since we already support compressed kernels, I'm wondering if relocation
>> might also be performed by the compression wrapper along with the
>> hardware probe. That would leave the vmlinux itself untouched, and the
>> wrapper could be installed on the target.
>
> Wouldn't it make sense to make a unified kernel virtually mapped? That
> would avoid the issue with RAM being present at different locations
> across systems, and also, if big pages were used, which I believe are
> available almost universally across the MIPS family, any performance hit
> would be minimal. There would be hardly any increase in the binary image
> size either. Run-time mappings such as `kmalloc' or `ioremap' could
> continue using unmapped segments.

I think some MIPS III CPUs were restricted to just a 4MB max. page size.
NEC VR4xxx, I think. Still, a pair would map 8MB, which on the affected
small-memory systems should suffice. 16MB and 64MB are more typical
sizes.

R3000 is a different kettle of fish. To 4k or not to 4k is not a
question ;-)

Now, mapping the kernel alone wouldn't solve the security issue
mentioned by David. The image would still lie around in KSEG0 / XKPHYS
for whatever wants to run over it, so that should ideally also be a
flexible address.

OTOH, the mapped kernel certainly would have the lowest size overhead.
I have faint memories of restrictions on TLB instructions, or was it TLB
exception handlers, in mapped space; I would have to do some RTFMing on
that topic.

Years ago I did test the impact of one less available TLB entry with
lmbench; the loss was around 2%. That was on a CPU with 64 entries.

Ralf
David Daney
2014-10-22 21:10:36 UTC
Post by Ralf Baechle
> Post by Maciej W. Rozycki
>> Post by Ralf Baechle
>>>> Another reason is that the protocol between the bootloader and the
>>>> kernel varies by platform. So you would have to have several different
>>>> entry points, one for each booting protocol.
>>>>
>>>> I am not sure how the bootloaders would know which entry point to use.
>>>
>>> That's where I foresaw the need for the ISA-style platform probe right
>>> at the kernel entry point, before fanning out to a platform-specific
>>> entry point.
>>>
>>> Since we already support compressed kernels, I'm wondering if relocation
>>> might also be performed by the compression wrapper along with the
>>> hardware probe. That would leave the vmlinux itself untouched, and the
>>> wrapper could be installed on the target.
>>
>> Wouldn't it make sense to make a unified kernel virtually mapped? That
>> would avoid the issue with RAM being present at different locations
>> across systems, and also, if big pages were used, which I believe are
>> available almost universally across the MIPS family, any performance hit
>> would be minimal. There would be hardly any increase in the binary image
>> size either. Run-time mappings such as `kmalloc' or `ioremap' could
>> continue using unmapped segments.
>
> I think some MIPS III CPUs were restricted to just a 4MB max. page size.
> NEC VR4xxx, I think. Still, a pair would map 8MB, which on the affected
> small-memory systems should suffice. 16MB and 64MB are more typical
> sizes.
>
> R3000 is a different kettle of fish. To 4k or not to 4k is not a
> question ;-)
>
> Now, mapping the kernel alone wouldn't solve the security issue
> mentioned by David. The image would still lie around in KSEG0 / XKPHYS
> for whatever wants to run over it, so that should ideally also be a
> flexible address.
>
> OTOH, the mapped kernel certainly would have the lowest size overhead.
> I have faint memories of restrictions on TLB instructions, or was it TLB
> exception handlers, in mapped space; I would have to do some RTFMing on
> that topic.
>
> Years ago I did test the impact of one less available TLB entry with
> lmbench; the loss was around 2%. That was on a CPU with 64 entries.

We have a private patch that does exactly this; the main motivation was
to place the kernel in the same 256MB virtual address region as the
modules, so that a direct calling sequence can be used in modules.

The resulting module code is much faster, so depending on the workload
it may be a performance win. We see things like IPv6 forwarding
improving by something like 6% when IPv6 is built as a module.

Also, we have many more TLB entries (128 or 256), so losing one is not a
big deal.

David Daney
James Hogan
2014-10-22 21:53:13 UTC
Hi,

Post by Ralf Baechle
> Post by Maciej W. Rozycki
>> Wouldn't it make sense to make a unified kernel virtually mapped? That
>> would avoid the issue with RAM being present at different locations
>> across systems, and also, if big pages were used, which I believe are
>> available almost universally across the MIPS family, any performance hit
>> would be minimal. There would be hardly any increase in the binary image
>> size either. Run-time mappings such as `kmalloc' or `ioremap' could
>> continue using unmapped segments.
>
> OTOH, the mapped kernel certainly would have the lowest size overhead.
> I have faint memories of restrictions on TLB instructions, or was it TLB
> exception handlers, in mapped space; I would have to do some RTFMing on
> that topic.

Yeah, KVM puts all TLB handling in arch/mips/kvm/tlb.c, which is built
statically rather than being included in the kvm kernel module, exactly
for this reason, so that it resides in unmapped memory space.

You'd have to guarantee not to get a TLB exception while the TLB
registers contain important values, since they'll get clobbered by the
taking of the exception itself (e.g. EntryHi gets set to the failing
address, EntryLo* become undefined), or the TLB entry pointed to by
CP0_Index may be replaced.

There's always CP0_Wired, though its use in the kernel is a bit of a
mess at the moment, IIRC.

Cheers
James
David Daney
2014-10-22 22:18:27 UTC
Post by James Hogan
> Hi,
>
> Post by Ralf Baechle
>> Post by Maciej W. Rozycki
>>> Wouldn't it make sense to make a unified kernel virtually mapped? That
>>> would avoid the issue with RAM being present at different locations
>>> across systems, and also, if big pages were used, which I believe are
>>> available almost universally across the MIPS family, any performance hit
>>> would be minimal. There would be hardly any increase in the binary image
>>> size either. Run-time mappings such as `kmalloc' or `ioremap' could
>>> continue using unmapped segments.
>>
>> OTOH, the mapped kernel certainly would have the lowest size overhead.
>> I have faint memories of restrictions on TLB instructions, or was it TLB
>> exception handlers, in mapped space; I would have to do some RTFMing on
>> that topic.
>
> Yeah, KVM puts all TLB handling in arch/mips/kvm/tlb.c, which is built
> statically rather than being included in the kvm kernel module, exactly
> for this reason, so that it resides in unmapped memory space.
>
> You'd have to guarantee not to get a TLB exception while the TLB
> registers contain important values, since they'll get clobbered by the
> taking of the exception itself (e.g. EntryHi gets set to the failing
> address, EntryLo* become undefined), or the TLB entry pointed to by
> CP0_Index may be replaced.
>
> There's always CP0_Wired, though its use in the kernel is a bit of a
> mess at the moment, IIRC.

The current kernel.org kernel respects CP0_Wired. We use a single TLB
entry (index 0) to map the entire kernel, and set CP0_Wired accordingly.
Everything works.

EBase still points to an unmapped address, so exception handlers still
work as before, except they may have to use more code to call directly
into the kernel due to the 256MB jump range limit. The general exception
handler looks up the addresses in a table, so that is not affected; only
if you have a dedicated interrupt vector do you have to use an indirect
jump to reach the kernel.

David Daney
David Daney
2014-10-22 18:03:01 UTC
Post by Ralf Baechle
> This question comes up every once in a while, and I was also approached
> during ELCE in Düsseldorf about why there is no single MIPS kernel for all
> platforms, so I thought I should post a writeup on the topic.
>
> The primary reason is that MIPS kernels are built as non-PIC code. This
> means code is linked to a particular absolute address. The link address
> depends on the memory range available on a particular system; there is
> no one size that fits all systems, not even a large fraction of
> supported systems.

There is another reason to have a relocatable kernel: the security
people are starting to demand it so that they can randomize the load
address.

> What does it take to make kernels relocatable? A current kernel is not
> relocatable. One might do something along the lines of userland, where
> the dynamic linker is in a similar situation and needs to first relocate
> itself before it can perform its actual job.
>
> There are two approaches. The first is keeping the non-PIC code. That
> requires keeping all of the relocation information. A lasat_defconfig
> vmlinux is 5733098 bytes, but built with --emit-relocs to keep the reloc
> information in the final binary, the vmlinux file grows to 7217342
> bytes! A quick look at the reloc sections:
>
> Section Headers:
>   [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
>   [ 2] .rel.text         REL             00000000 461538 0eedf8 08     34   1  4
>   [ 4] .rel__ex_table    REL             00000000 550330 0040e0 08     34   3  4
>   [ 8] .rel.rodata       REL             00000000 554410 0310e0 08     34   7  4
>   [10] .rel.pci_fixup    REL             00000000 5854f0 000998 08     34   9  4
>   [12] .rel__ksymtab     REL             00000000 585e88 00b3b0 08     34  11  4
>   [14] .rel__ksymtab_gpl REL             00000000 591238 007180 08     34  13  4
>   [17] .rel__param       REL             00000000 5983b8 000858 08     34  16  4
>   [19] .rel__modver      REL             00000000 598c10 000038 08     34  18  4
>   [21] .rel.data         REL             00000000 598c48 00a130 08     34  20  4
>   [23] .rel.init.text    REL             00000000 5a2d78 00f008 08     34  22  4
>   [25] .rel.init.data    REL             00000000 5b1d80 001d08 08     34  24  4
>   [27] .rel.exit.text    REL             00000000 5b3a88 000b78 08     34  26  4
>
> The approach could probably be optimized, but as a first-order
> approximation this demonstrates there would be plenty of bloat to the
> binary. The positive side of this approach: no runtime penalty.

This is the approach I was thinking of taking. There would be a small
PIC wrapper that applied the relocations and then passed control to the
real entry point.

We would have to be careful with the ex_table, as that is now sorted at
build time. For that, we could go to the scheme used by x86 and make the
addresses in the ex_table relative; build-time sorting is already
working for x86 relocatable kernels.

David Daney.
Ralf Baechle
2014-10-22 19:20:19 UTC
Post by David Daney
> There is another reason to have a relocatable kernel: the security
> people are starting to demand it so that they can randomize the load
> address.

That may work for some platforms, but in the MIPS world we still have to
deal with very claustrophobic systems which barely leave any space to
move a kernel around.
> This is the approach I was thinking of taking. There would be a small
> PIC wrapper that applied the relocations and then passed control to the
> real entry point.
>
> We would have to be careful with the ex_table, as that is now sorted at
> build time. For that, we could go to the scheme used by x86 and make the
> addresses in the ex_table relative; build-time sorting is already
> working for x86 relocatable kernels.

That's probably more of an implementation detail. I'm more concerned
about the overall bloat. I think many embedded users are so addicted to
benchmark results that this is going to make or break the whole scheme.

Ralf
Ben Hutchings
2014-10-22 22:15:40 UTC
Post by Ralf Baechle
>> There is another reason to have a relocatable kernel: the security
>> people are starting to demand it so that they can randomize the load
>> address.
>
> That may work for some platforms, but in the MIPS world we still have to
> deal with very claustrophobic systems which barely leave any space to
> move a kernel around.
>
>> This is the approach I was thinking of taking. There would be a small
>> PIC wrapper that applied the relocations and then passed control to the
>> real entry point.
>>
>> We would have to be careful with the ex_table, as that is now sorted at
>> build time. For that, we could go to the scheme used by x86 and make the
>> addresses in the ex_table relative; build-time sorting is already
>> working for x86 relocatable kernels.
>
> That's probably more of an implementation detail. I'm more concerned
> about the overall bloat. I think many embedded users are so addicted to
> benchmark results that this is going to make or break the whole scheme.

If you can make relocation a configuration option (as on x86), it would
allow distributions to build multiplatform kernels without preventing
embedded users from building a kernel optimised for their specific
system. But I know very little about MIPS or how intrusive the changes
for relocation would have to be. Perhaps it would be too much of a
maintenance burden to make this an option.

Ben.
--
Ben Hutchings
For every action, there is an equal and opposite criticism. - Harrison
Ralf Baechle
2014-10-22 23:22:34 UTC
Post by Ben Hutchings
> Post by Ralf Baechle
>> That's probably more of an implementation detail. I'm more concerned
>> about the overall bloat. I think many embedded users are so addicted to
>> benchmark results that this is going to make or break the whole scheme.
>
> If you can make relocation a configuration option (as on x86), it would
> allow distributions to build multiplatform kernels without preventing
> embedded users from building a kernel optimised for their specific
> system. But I know very little about MIPS or how intrusive the changes
> for relocation would have to be. Perhaps it would be too much of a
> maintenance burden to make this an option.

The scope of the changes is relatively limited; we're much more
concerned about the impact of the various approaches under discussion on
binary size, memory size or performance.

Which platforms would Debian want to unify kernels for, I wonder?

Ralf
Ben Hutchings
2014-10-23 01:02:11 UTC
Post by Ralf Baechle
> Post by Ben Hutchings
>> Post by Ralf Baechle
>>> That's probably more of an implementation detail. I'm more concerned
>>> about the overall bloat. I think many embedded users are so addicted to
>>> benchmark results that this is going to make or break the whole scheme.
>>
>> If you can make relocation a configuration option (as on x86), it would
>> allow distributions to build multiplatform kernels without preventing
>> embedded users from building a kernel optimised for their specific
>> system. But I know very little about MIPS or how intrusive the changes
>> for relocation would have to be. Perhaps it would be too much of a
>> maintenance burden to make this an option.
>
> The scope of the changes is relatively limited; we're much more
> concerned about the impact of the various approaches under discussion on
> binary size, memory size or performance.
>
> Which platforms would Debian want to unify kernels for, I wonder?

I don't have high expectations for being able to unify those we
currently support. Realistically, I expect that most development effort
will go into new platforms. (What we saw with ARM was that
multi-platform was implemented for most ARMv7 platforms (for which we
now need only 2 configurations) but only slowly for older chips (4
configurations, and that's after dropping 2 platforms).)

Anyway, we have one 32-bit configuration for each byte order
(4kc-malta), and the following 64-bit configurations:

[big-endian]
r4k-ip22: CONFIG_SGI_IP22, CONFIG_CPU_R4X00
r5k-ip32: CONFIG_SGI_IP32, CONFIG_CPU_R5000
sb1-bcm91250a: CONFIG_SIBYTE_SWARM, CONFIG_CPU_SB1
5kc-malta: CONFIG_MIPS_MALTA, CONFIG_CPU_MIPS64_R1
octeon: CONFIG_CAVIUM_OCTEON_SOC

[little-endian]
sb1-bcm91250a: CONFIG_SIBYTE_SWARM, CONFIG_CPU_SB1
5kc-malta: CONFIG_MIPS_MALTA, CONFIG_CPU_MIPS64_R1
loongson-2e: CONFIG_MACH_LOONGSON, CONFIG_LEMOTE_FULOONG2E
loongson-2f: CONFIG_MACH_LOONGSON, CONFIG_LEMOTE_MACH2F
loongson-3: CONFIG_MACH_LOONGSON, CONFIG_LOONGSON_MACH3X

In general, I want our kernel packages to support any hardware that is
or has been generally available to buy, that can feasibly run a general
purpose distribution. I'm somewhat hopeful that Prpl members will be
introducing new platforms that fit this description in the near future.

But I also want the packages to build natively in a few hours on each
architecture. Currently, it takes about 17 hours on little-endian
(Loongson 3A, quad-core) and longer on big-endian (Octeon v0.3, 6 cores
used). So I can't accept a further increase in the number of
configurations as new MIPS platforms appear. Without multi-platform
support, we will have to drop one platform for each one we add, so we'll
have to be quite picky about adding them.

Ben.
--
Ben Hutchings
Q. Which is the greater problem in the world today, ignorance or apathy?
A. I don't know and I couldn't care less.
Joshua Kinard
2014-10-23 03:13:10 UTC
Post by Ben Hutchings
> Post by Ralf Baechle
>> Post by Ben Hutchings
>>> Post by Ralf Baechle
>>>> That's probably more of an implementation detail. I'm more concerned
>>>> about the overall bloat. I think many embedded users are so addicted to
>>>> benchmark results that this is going to make or break the whole scheme.
>>>
>>> If you can make relocation a configuration option (as on x86), it would
>>> allow distributions to build multiplatform kernels without preventing
>>> embedded users from building a kernel optimised for their specific
>>> system. But I know very little about MIPS or how intrusive the changes
>>> for relocation would have to be. Perhaps it would be too much of a
>>> maintenance burden to make this an option.
>>
>> The scope of the changes is relatively limited; we're much more
>> concerned about the impact of the various approaches under discussion on
>> binary size, memory size or performance.
>>
>> Which platforms would Debian want to unify kernels for, I wonder?
>
> I don't have high expectations for being able to unify those we
> currently support. Realistically, I expect that most development effort
> will go into new platforms. (What we saw with ARM was that
> multi-platform was implemented for most ARMv7 platforms (for which we
> now need only 2 configurations) but only slowly for older chips (4
> configurations, and that's after dropping 2 platforms).)
>
> Anyway, we have one 32-bit configuration for each byte order
> [big-endian]
> r4k-ip22: CONFIG_SGI_IP22, CONFIG_CPU_R4X00
> r5k-ip32: CONFIG_SGI_IP32, CONFIG_CPU_R5000

As far as I know, IRIX includes kernels specific to each SGI system
(IPxx), but it seems they're CPU-agnostic. They are relocatable, though.
It's been a while since I watched sash boot followed by an IRIX kernel,
but it does 3-4 relocations before finally booting. So a relocatable
MIPS kernel on the SGI platforms seems possible. It probably requires
arcane knowledge of ARCS, though.

Bootloader-wise, Stan's 'arcload' can handle booting multiple kernels
across various SGI platforms. We used it on the Gentoo SGI LiveCD back
in 2006 to create a single CD that could boot on IP22, IP27, IP30 and
IP32, using different kernels for each system and CPU (I think there was
one volume header slot left at the end for arcload itself).
--
Joshua Kinard
Gentoo/MIPS
***@gentoo.org
4096R/D25D95E3 2011-03-28

"The past tempts us, the present confuses us, the future frightens us. And our
lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic