I reinstalled NetBSD-current recently on my shark (Digital DNARD) and, out of curiosity, I wanted to see if the new-style kernel modules worked fine on this platform. To test that, I attempted to load the puffs module and failed with an error message saying something like "kobj_reloc: unexpected relocation type 1". Similarly, the same error appeared when running the simpler regression tests in /usr/tests/modules.
After seeing that error message, I tracked it down in the source code and ended in src/sys/arch/arm/arm32/kobj_machdep.c. A quick look at it and at src/sys/arch/arm/include/elf_machdep.h revealed that the kernel was lacking support for the R_ARM_PC24 relocation type. "It can't be too difficult to implement", I thought. Hah!
Based on documentation, I understood that R_ARM_PC24 is used in "short" jumps. This relocation is used to signal the runtime system that the offset to the target address of a branch instruction has to be relocated. This offset is a 24-bit number and, when loaded, it has to be shifted two bits to the left to accommodate for the fact that instructions are 32-bit aligned. Before the relocation, there is some addend encoded in the instruction that has to be loaded, sign-extended and shifted two bits to the left and, after all that, added to the calculated address.
I spent hours trying to implement support for the R_ARM_PC24 relocation type because it didn't want to work as expected. I even ended up looking at the Linux code to see how they dealt with it, and I found out that I was doing exactly the same as them. So what was the problem? A while later I realized that this whole thing wasn't working because the relocated address to be stored in the branch instruction didn't fit in the 24 bits! That makes things harder to solve.
At that point, I looked at the port-arm mailing list and found that several other people were looking at this same issue. Great, some time "wasted" but a lot of new stuff learnt. Anyway, it turns out there are basically two solutions to the problem described above. The first involves generating jump trampolines for the addresses that fall too far away. The second one is simpler: just change the kernel to load the modules closer to the kernel text, and thus make the jump offsets fit into the 24 bits of the instructions. Effectively, there is a guy that has got almost everything working already.