
This is Part 8 in the series: Linux on STM32MP135. See other articles.
My STM32MP135 board includes DDR3L RAM, and initial tests show that I can fill it up with pseudo-random data and read it back correctly. ST provides a DDR test utility with a suite of memory tests, all of which pass. I decided to take it a step further and test the memory on a more intensive real-world task: “unzipping” a compressed file.
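For context, the fill-and-check test was along the following lines (a sketch under assumed parameters, not the actual code: 512MB of DDR mapped at 0xC0000000, a simple LCG as the pseudo-random source). Note that it only ever issues 32-bit accesses, a detail that turns out to matter later.
#include <stdint.h>

#define DDR_BASE  ((volatile uint32_t *)0xC0000000)
#define DDR_WORDS (512u * 1024u * 1024u / 4u)   /* 512 MB of DDR3L */

/* Simple 32-bit LCG so the same seed reproduces the same sequence. */
static uint32_t lcg(uint32_t *s)
{
    *s = *s * 1664525u + 1013904223u;
    return *s;
}

int ddr_random_test(void)
{
    uint32_t seed = 1;

    for (uint32_t i = 0; i < DDR_WORDS; i++)    /* fill with PRNG data */
        DDR_BASE[i] = lcg(&seed);

    seed = 1;                                   /* re-seed and verify */
    for (uint32_t i = 0; i < DDR_WORDS; i++)
        if (DDR_BASE[i] != lcg(&seed))
            return -1;                          /* mismatch found */

    return 0;                                   /* all words read back OK */
}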
The result of the decompression test was very bad: most of the file was decompressed correctly, but a few bits were always wrong, and a few more were wrong only some of the time. I spent two or three days tracing my way through the “unzip” code, instruction by instruction, to try to catch where exactly it goes wrong.
In the end, I made an embarrassing discovery: I had partially swapped the byte lanes. DDR3L on this SoC has two byte lanes, each consisting of {data, mask, strobe}. I had connected the data bits correctly, but swapped the mask & strobe between the two bytes. (Six high speed traces, some on inner layers—there’s no fixing that by hand.) Had I also swapped the data bits, everything would have been fine; indeed, the eval board swaps all the wires, which led me astray. (Partially.)
Sadly, AI was of no help in this instance. Given my DDR3L wiring, I can convince it either way: the connections are good; the connections are not good. In the end, only Rev B will tell for sure.
In this article we will proceed with debugging the boot of the compressed Linux
kernel image (zImage) on a custom
board populated with the
STM32MP135 SoC. The starting point will be the build that runs on the
evaluation board as described in the previous
article.
Despite booting just fine, the zImage gets stuck on boot on the custom board,
without any messages printed to the UART console. Following along with the
debugger shows that the decompressor code does run, but it’s not clear where
exactly it gets stuck.
It is possible that the burst of DDR activity during the high-speed decompression draws more current than the 1.35V supply is able to provide, despite the decoupling capacitance.
Indeed, on the scope I see a 30mV drop in the 1.35V supply voltage for about 500ms. However, if I raise the supply voltage by those 30mV, the boot still gets stuck. This was with the kernel written to 0xC2008000 and the DTB to 0xC4008000, which means that relocation isn’t necessary. My interpretation is that the scope trace shows that decompression takes about half a second.
Interestingly, if the kernel is written to 0xC0008000 and the DTB to 0xC2008000, in which case relocation is necessary, the 20mV supply drop is shorter, about 150ms, and is followed by 10ms of a bigger drop, 120mV. That bigger drop is indeed enough to disturb the decompression, since raising the supply setpoint to 1.38V makes it be followed by the usual 500ms, 30mV drop. My interpretation: relocation takes 150ms, followed by 500ms of decompression, but the power supply is not stiff enough to hold up through both.
After soldering 1000uF electrolytic capacitors to the 1.25V and 1.35V rails, both relocation and decompression complete (according to the scope trace, i.e., the 150ms and 500ms voltage drops are both visible) with the two rails set to 1.35V, 1.30V, 1.25V, 1.20V, or 1.15V, but not below that. Restoring the supply setpoint to 1.35V, relocation and decompression complete as expected.
In order to avoid wasting time with relocation, we will from now on load the kernel to 0xC2000000 and the device tree to 0xC4000000. The scope trace of the 1.35V rail shows a small voltage drop for 500ms (decompression).
It’s not reassuring that we get zero console output during decompression. Trying
to get at least some output, I added CONFIG_DEBUG_LL=y to the .config file
and accepted most of the default options suggested by make:
Kernel low-level debugging functions (read help!) (DEBUG_LL) [Y/n/?] y
Kernel low-level debugging port
> 1. Use STM32MP1 UART for low-level debug (STM32MP1_DEBUG_UART) (NEW)
2. Kernel low-level debugging via EmbeddedICE DCC channel (DEBUG_ICEDCC) (NEW)
3. Kernel low-level debug output via semihosting I/O (DEBUG_SEMIHOSTING) (NEW)
4. Kernel low-level debugging via 8250 UART (DEBUG_LL_UART_8250) (NEW)
5. Kernel low-level debugging via ARM Ltd PL01x Primecell UART (DEBUG_LL_UART_PL01X) (NEW)
choice[1-5?]:
Enable flow control (CTS) for the debug UART (DEBUG_UART_FLOW_CONTROL) [N/y/?] (NEW)
Physical base address of debug UART (DEBUG_UART_PHYS) [0x40010000] (NEW)
Virtual base address of debug UART (DEBUG_UART_VIRT) [0xfe010000] (NEW)
Early printk (EARLY_PRINTK) [N/y/?] (NEW) y
Write the current PID to the CONTEXTIDR register (PID_IN_CONTEXTIDR) [N/y/?] n
However, no output appeared on the UART. Loading Image (rather than zImage)
produces the early prints, but the decompression hang mystery persists.
Note: follow along this section with the help of linusw’s article, “How the
ARM32 Linux kernel
decompresses”.
Let’s try to follow along the decompression using a J-Link debug probe. First, open the GDB server and connect to it:
JLinkGDBServer.exe -device STM32MP135F -if swd -port 2330
arm-none-eabi-gdb.exe -q -x load.gdb
Where the load.gdb script contains:
file build/main.elf
add-symbol-file build/compressed 0xc2000000
target remote localhost:2330
monitor reset
monitor flash device=STM32MP135F
load build/main.elf
monitor go
break handoff.S:93
Step by instruction (si) a few times until reaching just after the handoff code:
(gdb) bt
#0 0xc2000004 in _text () at arch/arm/boot/compressed/head.S:202
This shows that execution has begun at the beginning of the decompressor, in
file arch/arm/boot/compressed/head.S, in the start: label. We can step
through the code lines (n command in gdb) until reaching the line bne not_angel, which we have to step into (si):
(gdb) si
not_angel () at arch/arm/boot/compressed/head.S:245
245 safe_svcmode_maskall r0
Go forward (n) a few steps till reaching the C function
fdt_check_mem_start() (arch/arm/boot/compressed/fdt_check_mem_start.c), then
call finish to get out of it and continue stepping through the not_angel
section:
(gdb) finish
Run till exit from #0 fdt_check_mem_start (mem_start=1, fdt=0xc4000000) at
arch/arm/boot/compressed/fdt_check_mem_start.c:106
not_angel () at arch/arm/boot/compressed/head.S:312
312 add r4, r0, #TEXT_OFFSET
Value returned is $3 = 3221225472
(gdb) n
323 mov r0, pc
324 cmp r0, r4
325 ldrcc r0, .Lheadroom
326 addcc r0, r0, pc
327 cmpcc r4, r0
328 orrcc r4, r4, #1 @ remember we skipped cache_on
329 blcs cache_on
Step into cache_on and later call_cache_fn, and go through the many lines
till reaching the return from __armv7_mmu_cache_on:. Thus we reach the
restart: section:
(gdb) b 902
Breakpoint 3 at 0xc200055c: file arch/arm/boot/compressed/head.S, line 902.
(gdb) c
Continuing.
Breakpoint 3, __armv7_mmu_cache_on () at arch/arm/boot/compressed/head.S:902
902 mcr p15, 0, r0, c7, c5, 4 @ ISB
(gdb) n
903 mov pc, r12
(gdb) si
restart () at arch/arm/boot/compressed/head.S:331
331 restart: adr r0, LC1
Continue stepping through until reaching the wont_overwrite: section, and
then not_relocated:, where we clear BSS. Step through that, and we reach the
beginning of the decompression proper: the decompress_kernel() function in
arch/arm/boot/compressed/misc.c. Interestingly, we step right past the
putstr("Uncompressing Linux..."); line without seeing anything printed on the
UART console.
The function decompress_kernel() calls do_decompress(), which calls
__decompress, which in turn calls __gunzip. Calling finish on the latter
takes just as long as the 500ms voltage drop observed on the 1.35V supply,
as mentioned above. Now we’re back in the decompress_kernel() function, which
should print " done, booting the kernel.\n" (but doesn’t, since there’s
something wrong with my putstr function).
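(For reference, on this SoC a decompressor putstr boils down to a handful of UART register pokes. Here is a minimal sketch, assuming the debug UART at 0x40010000 noted in the config above, the standard STM32 USART register offsets, and a transmitter already set up by the bootloader; it is not the actual code from my tree.)
#include <stdint.h>

#define UART_BASE 0x40010000u                         /* DEBUG_UART_PHYS */
#define UART_ISR  (*(volatile uint32_t *)(UART_BASE + 0x1C))
#define UART_TDR  (*(volatile uint32_t *)(UART_BASE + 0x28))
#define ISR_TXE   (1u << 7)          /* transmit data register empty */

static void putstr(const char *s)
{
    for (; *s; s++) {
        while (!(UART_ISR & ISR_TXE))
            ;                        /* wait until TDR is free */
        UART_TDR = (uint8_t)*s;
    }
}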
We return to the not_relocated: section of the compressed head.S and
call get_inflated_image_size to find out how large the decompressed kernel
is:
not_relocated () at arch/arm/boot/compressed/head.S:636
636 get_inflated_image_size r1, r2, r3
638 mov r0, r4 @ start of inflated image
639 add r1, r1, r0 @ end of inflated image
(gdb) p/x $r0
$3 = 0xc0008000
(gdb) p/x $r1
$4 = 0xc1241f48
(gdb)
Subtracting the r1 and r0 values, we see that the uncompressed kernel is
exactly 0xC1241F48 − 0xC0008000 = 0x01239F48 = 19111752 bytes in size, which
is identical to the size of the arch/arm/boot/Image file. So far so good!
Next, the startup code cleans caches and turns them off again and jumps to
__enter_kernel just like we may do directly, had we loaded the uncompressed
image in memory with the bootloader. This places the pointer to the DTB into
r2 and passes control to the kernel:
__enter_kernel () at arch/arm/boot/compressed/head.S:1435
1435 mov r0, #0 @ must be 0
1436 mov r1, r7 @ restore architecture number
1437 mov r2, r8 @ restore atags pointer
1438 ARM( mov pc, r4 ) @ call kernel
Just before the jump to the kernel, we can check that the register values make
sense: r0 and r1 are zero, r2 has the DTB address, and the decompressed
kernel will run from location 0xC0008000 (= TEXT_OFFSET):
(gdb) p $r0
$5 = 0
(gdb) p $r1
$6 = 0
(gdb) p/x $r2
$8 = 0xc4000000
(gdb) p/x $r4
$9 = 0xc0008000
(gdb)
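For comparison, a bootloader that loads the uncompressed Image directly would set up exactly this state before jumping; here is a hypothetical sketch using the addresses from this boot, not the actual bootloader code:
#include <stdint.h>

/* ARM Linux boot convention: r0 = 0, r1 = machine number (0 here, as in
 * the register dump above), r2 = DTB pointer; caches and MMU off. */
typedef void (*kernel_entry_t)(uint32_t zero, uint32_t mach, uint32_t dtb);

static void boot_uncompressed(void)
{
    kernel_entry_t entry = (kernel_entry_t)0xC0008000; /* = TEXT_OFFSET */
    entry(0, 0, 0xC4000000);                           /* never returns */
}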
One fateful step and we’re running in the uncompressed kernel proper. Let’s load the symbols from the main kernel ELF file to see what’s going on:
(gdb) si
0xc0008000 in ?? ()
(gdb) add-symbol-file build/vmlinux 0xc0008000
add symbol table from file "build/vmlinux" at
.text_addr = 0xc0008000
Reading symbols from build/vmlinux...
(gdb)
Interesting: just one more step and the debugger stops at some much later point:
(gdb) si
0xc0114620 in perf_swevent_init_hrtimer (event=0xc0008000 <stext>) at kernel/events/core.c:10836
10836 hwc->sample_period = event->attr.sample_period;
(gdb) bt
#0 0xc0114620 in perf_swevent_init_hrtimer (event=0xc0008000 <stext>) at kernel/events/core.c:10836
#1 perf_swevent_init_hrtimer (event=0xc0008000 <stext>) at kernel/events/core.c:10818
#2 cpu_clock_event_init (event=0xc0008000 <stext>) at kernel/events/core.c:10902
#3 0xc271e9f0 in ?? ()
But if we finish running the perf_swevent_init_hrtimer function, then
somehow we end up back in arch/arm/kernel/head.S. Debugging from that point
onwards appears to have gone totally insane!
Let’s start again from scratch. Set a breakpoint at the point where the uncompressed kernel is supposed to begin executing:
(gdb) b *0xc0008000
Breakpoint 6 at 0xc0008000: file arch/arm/kernel/head.S, line 501.
(gdb) c
Continuing.
Breakpoint 6, stext () at arch/arm/kernel/head.S:501
501 mov r0, r0
(gdb) p $pc
$11 = (void (*)()) 0xc0008000 <stext>
This is strange: the program counter is in the expected location, but we’re on line
501 of head.S, rather than closer to the beginning of the file. The reason
is that we have incorrectly instructed GDB that the entire vmlinux starts at
0xC0008000, instead of just the first section. We can fix it by clearing the
symbol file, re-loading the symbols at their natural link address, and
verifying everything makes sense:
(gdb) symbol-file
Error in re-setting breakpoint 1: No source file named handoff.S.
No symbol file now.
(gdb) file build/vmlinux
Reading symbols from build/vmlinux...
(gdb) p/x &stext
$15 = 0xc0008000
(gdb) si
__hyp_stub_install () at arch/arm/kernel/hyp-stub.S:73
73 store_primary_cpu_mode r4, r5
(gdb) finish
Run till exit from #0 __hyp_stub_install () at arch/arm/kernel/hyp-stub.S:73
stext () at arch/arm/kernel/head.S:105
105 safe_svcmode_maskall r9
Now we’re simply running through the beginning of the normal kernel start in
section ENTRY(stext) in file arch/arm/kernel/head.S. By single stepping
through the code, we can find the exact section where things go badly wrong:
stext () at arch/arm/kernel/head.S:162
162 badr lr, 1f @ return (PIC) address
167 mov r8, r4 @ set TTBR1 to swapper_pg_dir
169 ldr r12, [r10, #PROCINFO_INITFUNC]
170 add r12, r12, r10
171 ret r12
__v7_ca7mp_setup () at arch/arm/mm/proc-v7.S:302
302 do_invalidate_l1
0xc01197fc 302 do_invalidate_l1
0xc0119800 302 do_invalidate_l1
0xc0119804 302 do_invalidate_l1
v7_invalidate_l1 () at arch/arm/mm/cache-v7.S:40
40 mov r0, #0
41 mcr p15, 2, r0, c0, c0, 0 @ select L1 data cache in CSSELR
(gdb)
0x2fff2f08 in ?? ()
We see that after the last mcr instruction, the code lands in SYSRAM
instead of the DDR we’ve been executing from so far. That address
corresponds to the vectors installed by the bootloader; in
particular, we have gotten into the dummy SVC handler.
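(For context, these are the kind of catch-all handlers a minimal bootloader leaves behind; the following is a hypothetical sketch, not the actual bootloader source. Each unexpected exception simply parks the core, which is why the PC ends up sitting in SYSRAM.)
/* Hypothetical dummy handler: any stray exception, SVC included,
 * just spins so an attached debugger can inspect the wreckage. */
__attribute__((naked)) void dummy_svc_handler(void)
{
    __asm__ volatile("1: wfi \n"
                     "   b 1b");
}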
Let’s examine the program instructions at the point just before where the failure occurs:
Breakpoint 7, v7_invalidate_l1 () at arch/arm/mm/cache-v7.S:40
40 mov r0, #0
(gdb) x/4x $pc
0xc0118b2c <v7_invalidate_l1>: 0xe3a00000 0x2f400f10 0xffffffff 0xee300f10
Very interesting! The expected instruction, 0xe3a00000, is followed by
0x2f400f10 and 0xffffffff. The first one is the “mystery” SVC call, and the second one
is simply undefined:
(gdb) set {int}0xc0000000 = 0x2f400f10
(gdb) x/i 0xc0000000
0xc0000000: svccs 0x00400f10
(gdb) set {int}0xc0000000 = 0xffffffff
(gdb) x/i 0xc0000000
0xc0000000: @ <UNDEFINED> instruction: 0xffffffff
For comparison, here are the instructions we expect to find, from the disassembly of the ELF file:
$ arm-linux-gnueabi-objdump -d linux/vmlinux | grep -A 4 "v7_invalidate_l1"
c0118b2c <v7_invalidate_l1>:
c0118b2c: e3a00000 mov r0, #0
c0118b30: ee400f10 mcr 15, 2, r0, cr0, cr0, {0}
c0118b34: f57ff06f isb sy
c0118b38: ee300f10 mrc 15, 1, r0, cr0, cr0, {0}
Let’s compare the binary pattern between the expected and actual instructions:
Expected: 0xee400f10 = 0b11101110010000000000111100010000
Actual:   0x2f400f10 = 0b00101111010000000000111100010000
Diff:                    ^^     ^
Three bits have been flipped in this instruction, changing it from mcr to
svc. This could be explained if DDR is miswired or misconfigured. However,
the pattern of data corruption is repeatable: reboot after reboot, the same
instruction gets corrupted in exactly the same way!
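As a quick sanity check on the “three bits” claim, we can XOR the two encodings and count the set bits (a throwaway host-side check, using GCC’s __builtin_popcount):
#include <stdio.h>

int main(void)
{
    unsigned diff = 0xee400f10u ^ 0x2f400f10u;  /* = 0xC1000000 */
    printf("diff = 0x%08x, flipped bits = %d\n",
           diff, __builtin_popcount(diff));     /* prints 3 */
    return 0;
}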
To prove that the DDR is capable of holding data at this address, we can write it manually and step through the instructions without any weird jumps to vectors:
(gdb) x/4x $pc
0xc0118b2c <v7_invalidate_l1>: 0xe3a00000 0x2f400f10 0xffffffff 0xee300f10
(gdb) set {int}0xc0118b30 = 0xee400f10
(gdb) set {int}0xc0118b34 = 0xf57ff06f
(gdb) x/4x $pc
0xc0118b2c <v7_invalidate_l1>: 0xe3a00000 0xee400f10 0xf57ff06f 0xee300f10
(gdb) si
41 mcr p15, 2, r0, c0, c0, 0 @ select L1 data cache in CSSELR
42 isb
43 mrc p15, 1, r0, c0, c0, 0 @ read cache geometry from CCSIDR
45 movw r3, #0x3ff
We can also load and run the decompressor as usual and set a breakpoint at 0xC0008000, where the uncompressed kernel is supposed to take over. Then, we simply overwrite from gdb whatever the decompressor has written:
(gdb) restore build/Image binary 0xc0008000
Restoring binary file build/Image into memory (0xc0008000 to 0xc1241f48)
(gdb) c
Nothing has been printed to the console (apparently the decompressor disabled it), but if we stop the debugger (Ctrl-C), we see that the kernel proceeded with the boot and finally came to a stop when mounting the root filesystem (understandable, since we haven’t given it a rootfs yet):
(gdb) bt
#0 0xc0b87034 in __timer_delay (cycles=63999) at arch/arm/lib/delay.c:50
#1 0xc0bb2238 in panic (fmt=0xc0defa0c "VFS: Unable to mount root fs on %s") at kernel/panic.c:451
#2 0xc1001878 in mount_block_root (name=0x51 <error: Cannot access memory at address 0x51>, name@entry=0xc0defaa0 "/dev/root", flags=3900) at init/do_mounts.c:432
#3 0xc1001b50 in mount_root () at init/do_mounts.c:592
#4 0xc1001cc8 in prepare_namespace () at init/do_mounts.c:644
#5 0xc1001448 in kernel_init_freeable () at init/main.c:1644
#6 0xc0bc5f18 in kernel_init (unused=<optimized out>) at init/main.c:1519
#7 0xc0100148 in ret_from_fork () at arch/arm/kernel/entry-common.S:148
Let’s assume that the data corruption is deterministic (repeatable) because it is caused by a voltage drop. Since the voltage drop corresponds to the CPU/DDR activity, the same activity causes the same voltage drop, which causes the same corruption.
Let’s check the same instruction at different supply voltages. At 1.35V, 1.30V, 1.25V, the corruption is:
0xc0118b2c <v7_invalidate_l1>: 0xe3a00000 0x2f400f10 0x00000000 0xee300f10
At 1.20V, the pattern is more interesting: the third instruction gets corrupted each time, but differently on each reset:
0xc0118b2c <v7_invalidate_l1>: 0xe3a00000 0x2f400f10 0xe464f8f6 0xee300f10
# or this one:
0xc0118b2c <v7_invalidate_l1>: 0xe3a00000 0x2f400f10 0xcbfd2cb6 0xee300f10
# or this one:
0xc0118b2c <v7_invalidate_l1>: 0xe3a00000 0x2f400f10 0xaefc67e9 0xee300f10
Stranger still: after restoring the voltage back up to 1.35V, the third instruction now gets corrupted differently every time, while the first and last are always correct, and the second one is always corrupted the same way.
One obvious way that data corruption could happen is if the compressed
zImage was written incorrectly to the SD card, or if the bootloader copies
it to DDR incorrectly. First, we check how big the zImage is, and then ask the debugger
to dump the data from the DDR to a file, at the point just before the handoff
from the bootloader into the decompressor:
$ ls -l linux/arch/arm/boot/zImage
-rwxr-xr-x 1 jk jk 7461288 Jan 7 11:09 linux/arch/arm/boot/zImage
Breakpoint 1, handoff_jump () at src/handoff.S:93
93 smc #0
(gdb) dump binary memory dump.bin 0xC2000000 0xC271d9a8
We see that the original image is identical to the one we obtained from the dump, so the SD card and bootloader writes are not corrupted:
$ sha256sum zImage dump.bin
9040ec8b8da5e613aa6e56060cc0cacf6779eec670c3a4123177cd07aff63300 zImage
9040ec8b8da5e613aa6e56060cc0cacf6779eec670c3a4123177cd07aff63300 dump.bin
ST provides a DDR test utility (STM32DDRFW-UTIL) which they recommend running as part of any new PCB bring-up. I have done that already and did not think much of it, since all tests passed. Let’s take a closer look.
My “version” of the utility can be found in
this
repository. I made two small changes. First, instead of requiring the complicated
“Cube” software suite, there is a simple Makefile so that the whole utility can
be compiled easily with a single make invocation. Second, I have commented
out the three or so lines that initialize the STPMIC1, since my board does not
use that power controller.
Since the debugger is already running, let’s load the utility through it:
(gdb) file build/fwutil.elf
Reading symbols from build/fwutil.elf...
(gdb) load
Loading section .RESET, size 0xe000 lma 0x2ffe0000
Loading section .ARM, size 0x8 lma 0x2ffee000
Loading section .init_array, size 0x4 lma 0x2ffee008
Loading section .fini_array, size 0x4 lma 0x2ffee00c
Loading section .data, size 0x7fa lma 0x2ffee010
Start address 0x2ffe0000, load size 59402
Transfer rate: 260 KB/sec, 7425 bytes/write.
(gdb) c
Continuing.
On the serial console, we are greeted with the expected prompt:
=============== UTILITIES-DDR Tool ===============
Model: STM32MP13XX_DK
RAM: DDR3-1066 bin F 1x4Gb 533MHz v1.53
0:DDR_RESET
DDR>
As the utility readme instructs us, let us enter the DDR_READY step and then
execute all the tests:
DDR>step 3
step to 3:DDR_READY
1:DDR_CTRL_INIT_DONE
2:DDR_PHY_INIT_DONE
3:DDR_READY
DDR>test 0
result 1:Test Simple DataBus = Passed
result 2:Test DataBusWalking0 = Passed
result 3:Test DataBusWalking1 = Passed
result 4:Test AddressBus = Passed
result 5:Test MemDevice = Passed
result 6:Test SimultaneousSwitchingOutput = Passed
result 7:Test Noise = Passed
result 8:Test NoiseBurst = Passed
result 9:Test Random = Passed
result 10:Test FrequencySelectivePattern = Passed
result 11:Test BlockSequential = Passed
result 12:Test Checkerboard = Passed
result 13:Test BitSpread = Passed
result 14:Test BitFlip = Passed
result 15:Test WalkingZeroes = Passed
result 16:Test WalkingOnes = Passed
Result: Pass [Test All]
This takes about a second to complete, and on the scope trace monitoring the 1.35V supply we see a tiny (maybe 2-5mV) dip during this time.
After all the tests are done, we can use the save command to get the DDR
parameters from the utility. Here are the dynamic ones, reporting on the
status:
/* ctl.dyn */
#define DDR_STAT 0x00000001
#define DDR_INIT0 0x4002004e
#define DDR_DFIMISC 0x00000001
#define DDR_DFISTAT 0x00000001
#define DDR_SWCTL 0x00000001
#define DDR_SWSTAT 0x00000001
#define DDR_PCTRL_0 0x00000001
/* phy.dyn */
#define DDR_PIR 0x00000000
#define DDR_PGSR 0x0000001f
#define DDR_ZQ0SR0 0x80021dee
#define DDR_ZQ0SR1 0x00000000
#define DDR_DX0GSR0 0x00008001
#define DDR_DX0GSR1 0x00000000
#define DDR_DX0DLLCR 0x40000000
#define DDR_DX0DQTR 0xffffffff
#define DDR_DX0DQSTR 0x3db02001
#define DDR_DX1GSR0 0x00008001
#define DDR_DX1GSR1 0x00000000
#define DDR_DX1DLLCR 0x40000000
#define DDR_DX1DQTR 0xffffffff
#define DDR_DX1DQSTR 0x3db02001
All the other parameters returned from the utility are identical to the values already used in the bootloader. Thus, I hope I can assume that the DDR configuration in the bootloader is identical to the one used by the utility.
Above we have found that while decompression appears to finish successfully, it in fact leaves behind lots of partially corrupted data. The uncompressed kernel starts executing, only to trip into the SVC handler because of a corrupted instruction. Now, let’s try to track down exactly when the data first gets corrupted.
As seen above, in the current configuration, decompression takes place in the
__gunzip routine (decompress_inflate.c). The decompression is done by
zlib_inflate() (lib/zlib_inflate/inflate.c). First, clear the memory
location that we’re interested in observing:
set {unsigned int}0xc0118b2c = 0x0
set {unsigned int}0xc0118b30 = 0x0
set {unsigned int}0xc0118b34 = 0x0
set {unsigned int}0xc0118b38 = 0x0
Verify it has been cleared:
(gdb) x/4x 0xc0118b2c
0xc0118b2c: 0x00000000 0x00000000 0x00000000 0x00000000
Some interesting breakpoints:
(gdb) b *0xc2001878
Breakpoint 20 at 0xc2001878: file arch/arm/boot/compressed/../../../../lib/zlib_inflate/inflate.c, line 63.
(gdb) b *0xc2001fa4
Breakpoint 34 at 0xc2001fa4: file arch/arm/boot/compressed/../../../../lib/zlib_inflate/inflate.c, line 582.
As it turns out, the corruption appears after the second call to inflate_fast:
(gdb) c
Continuing.
Breakpoint 36, zlib_inflate (strm=0xc271ea44, strm@entry=0xc271e9c0, flush=1072676126, flush@entry=0) at arch/arm/boot/compressed/../../../../lib/zlib_inflate/inflate.c:582
582 inflate_fast(strm, out);
(gdb) x/4x 0xc0118b2c
0xc0118b2c: 0x00000000 0x00000000 0x00000000 0x00000000
(gdb) c
Continuing.
Breakpoint 36, zlib_inflate (strm=0xc271ea44, strm@entry=0xc271e9c0, flush=1072590367, flush@entry=0) at arch/arm/boot/compressed/../../../../lib/zlib_inflate/inflate.c:582
582 inflate_fast(strm, out);
(gdb) x/4x 0xc0118b2c
0xc0118b2c: 0xe3a00000 0x2f400f10 0xffedecfd 0xee300f10
While we press c (or continue) in GDB, inflate_fast() runs, and very
briefly (for about 3.5ms) a voltage drop of about 30–40mV is observed on the
1.35V supply. In the same period, the droops on VREF_DDR0, VREF_DDR1, and
VREF_DDR2 are barely perceptible.
We can go a step further and set a watchpoint, so the debugger triggers on the first access of the given memory location:
(gdb) watch *(uint32_t *)0xc0118b2c
Hardware watchpoint 38: *(uint32_t *)0xc0118b2c
Set the memory locations to zero as before, and after the watchpoint triggers, single step through the execution and each time check the memory. Skipping ahead many such steps, we see how the value gets progressively filled in:
0xc0118b2c: 0xe3a00000 0x00000000 0x00000000 0x00000000
0xc0118b2c: 0xe3a00000 0x00000010 0x00000000 0x00000000
0xc0118b2c: 0xe3a00000 0x00000f10 0x00000000 0x00000000
0xc0118b2c: 0xe3a00000 0x00400f10 0x00000000 0x00000000
0xc0118b2c: 0xe3a00000 0x2f400f10 0x00000000 0x00000000
We see how the word fills up one byte at a time: 0x10, 0x0f, 0x40, 0x2f.
That final 0x2f is erroneous; it should be 0xee, as we have seen previously in
the disassembly of vmlinux.
The code loop that populates this word can be found in
lib/zlib_inflate/inffast.c, lines 119 through 308; in particular, the line
that wrote the incorrect 2f is number 247, in the middle of this section:
/* Align out addr */
if (!((long)(out - 1) & 1)) {
	*out++ = *from++;
	len--;
}
Let’s recap the situation so far. DDR appears to work as far as my own tests are concerned: I can fill the memory with pseudo-random data and read it all back correctly. The STM32DDRFW-UTIL tests all pass. The kernel runs if it’s loaded into memory uncompressed, but the decompression fails. Remembering further back, when writing the bootloader I had to force all DDR writes to be 32-bit aligned. All of this brings to mind the quote from Jay Carlson:
if your design doesn’t work, length-tuning is probably the last thing you should be looking at. For starters, make sure you have all the pins connected properly — even if the failures appear intermittent. For example, accidentally swapping byte lane strobes / masks (like I’ve done) will cause 8-bit operations to fail without affecting 32-bit operations. Since the bulk of RAM accesses are 32-bit, things will appear to kinda-sorta work.
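To make that failure mode concrete, here is an illustrative sketch (hypothetical address; caches must be disabled for the accesses to reach the pins, as in the test shown later in this article). A 32-bit store drives both byte lanes with real data, while an 8-bit store drives one lane and relies on the DM (mask) pins to protect the rest, so swapped masks let the wrong byte get modified:
#include <stdint.h>

int byte_lane_probe(void)
{
    volatile uint32_t *word = (volatile uint32_t *)0xC0001000;
    volatile uint8_t  *byte = (volatile uint8_t  *)0xC0001000;

    *word = 0x11223344;                  /* full word: both lanes driven */
    if (*word != 0x11223344)
        return -1;                       /* would catch data-line faults */

    byte[0] = 0xAA;                      /* partial write: DM pins decide */
    return (byte[0] == 0xAA) ? 0 : -2;   /* fails with swapped masks */
}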
Let’s take a good hard look at the connections on my custom board (Rev
A)
between the memory chip (MT41K256M16TW-107:P TR) and the SoC
(STM32MP135FAE):
| DDR pin | DDR signal | SoC signal | SoC pin | Notes |
|---|---|---|---|---|
| M2 | BA0 | BA0 | G17 | |
| N8 | BA1 | BA1 | L16 | |
| M3 | BA2 | BA2 | G13 | |
| N3 | A0 | A0 | G16 | |
| P7 | A1 | A1 | K15 | |
| P3 | A2 | A2 | F17 | |
| N2 | A3 | A3 | G15 | |
| P8 | A4 | A4 | M14 | |
| P2 | A5 | A5 | E16 | |
| R8 | A6 | A6 | M17 | |
| R2 | A7 | A7 | G14 | |
| T8 | A8 | A8 | L15 | |
| R3 | A9 | A9 | F16 | |
| L7 | A10/AP | A10 | J14 | |
| R7 | A11 | A11 | K13 | |
| N7 | A12/BC# | A12 | K17 | |
| T3 | A13 | A13 | F14 | |
| T7 | A14 | A14 | L17 | |
| D3 | UDM | DQM0 | D15 | |
| E7 | LDM | DQM1 | N14 | |
| B7 | UDQS# | DQS0N | C16 | |
| C7 | UDQS | DQS0P | C17 | |
| G3 | LDQS# | DQS1N | R16 | |
| F3 | LDQS | DQS1P | R17 | |
| E3 | DQ0 | DQ4 | B16 | |
| F7 | DQ1 | DQ2 | C13 | |
| F2 | DQ2 | DQ0 | B17 | |
| F8 | DQ3 | DQ5 | D16 | |
| H3 | DQ4 | DQ3 | D17 | |
| H8 | DQ5 | DQ7 | E15 | |
| G2 | DQ6 | DQ1 | C15 | |
| H7 | DQ7 | DQ6 | E14 | |
| D7 | DQ8 | DQ8 | N16 | |
| C3 | DQ9 | DQ9 | P17 | |
| C8 | DQ10 | DQ10 | N15 | |
| C2 | DQ11 | DQ15 | T16 | |
| A7 | DQ12 | DQ11 | P15 | |
| A2 | DQ13 | DQ12 | R15 | |
| B8 | DQ14 | DQ13 | P16 | |
| A3 | DQ15 | DQ14 | T17 | |
| K3 | CASN | CASN | J15 | |
| K9 | CKE | CKE | K14 | 10k pulldown |
| K7 | CK# | CLKN | J17 | 100R to CK at DDR |
| J7 | CK | CLKP | J16 | |
| L2 | CS# | CSN | H16 | |
| K1 | ODT | ODT | H15 | |
| J3 | RAS# | RASN | H17 | |
| T2 | RESET# | RESETN | E17 | 10k pulldown |
| L3 | WE# | WEN | H13 | |
Let’s check carefully what the DDR datasheet considers “upper” vs “lower”:
DQ[7:0]: Lower byte of bidirectional data bus for the x16 configuration.
DQ[15:8]: Upper byte of bidirectional data bus for the x16 configuration.
In other words, we should have mapped DQ[7:0] together with the DDR signals
LDM and LDQS, while the upper byte DQ[15:8] should have been placed
together with UDM and UDQS. Looking at the table above, we see that the
mask/strobe signals are swapped:
DDR:UDM → SoC:DQM0
DDR:LDM → SoC:DQM1
But the data bits are not swapped, only scrambled within their own lanes (which by itself would be allowed), so the combination is incorrect:
DDR:DQ[7:0] → SoC:DQ[7:0] (scrambled)
DDR:DQ[15:8] → SoC:DQ[15:8] (scrambled)
My confusion can be traced back to the eval board design, which similarly swaps
the mask/strobe wires, except they also (correctly) swap the two DQ lanes. AI
seems to be of little use: I can easily convince it either way regarding the
correctness of my “semi-byte swap”.
We saw above that the official ST DDR utility did not detect any problems with my incorrectly-wired DDR. After some prompting, Gemini 3 gave me the following test:
void ddr_align_test(int argc, uint32_t arg1, uint32_t arg2, uint32_t arg3)
{
    (void)argc; (void)arg1; (void)arg2; (void)arg3;
    uint32_t sctlr;

    // 1. READ SCTLR
    __asm__ volatile("mrc p15, 0, %0, c1, c0, 0" : "=r" (sctlr));

    // 2. DISABLE CACHE (Bit 2) AND MMU (Bit 0)
    uint32_t sctlr_disabled = sctlr & ~((1 << 2) | (1 << 0));
    __asm__ volatile("mcr p15, 0, %0, c1, c0, 0" : : "r" (sctlr_disabled));
    __asm__ volatile("isb sy"); // Instruction sync barrier

    my_printf("!!! CACHE DISABLED !!! Testing raw hardware wires...\r\n");

    volatile uint8_t *p8 = (volatile uint8_t *)0xc0001000;

    // Perform a partial write
    p8[0] = 0xAA;
    __asm__ volatile("dsb sy"); // Force pin toggle

    if (p8[0] != 0xAA) {
        my_printf("FAILURE DETECTED: Byte 0 is 0x%02x (expected 0xAA)\r\n", p8[0]);
    } else {
        my_printf("SUCCESS: Byte 0 worked without cache.\r\n");
    }

    // 3. RE-ENABLE CACHE
    __asm__ volatile("mcr p15, 0, %0, c1, c0, 0" : : "r" (sctlr));
    __asm__ volatile("isb sy");
}
On the evaluation board, the printout is:
!!! CACHE DISABLED !!! Testing raw hardware wires...
SUCCESS: Byte 0 worked without cache.
On my board:
!!! CACHE DISABLED !!! Testing raw hardware wires...
FAILURE DETECTED: Byte 0 is 0x55 (expected 0xAA)
While the explanation in the previous section (swapped byte lanes) seems plausible enough to stop debugging at this point and wait for “Rev B”, in the process I noted other possible avenues to explore:
Just because we found one issue with my connections, it does not mean we have found all of them. From the same article by Jay Carlson:
Because DDR memory doesn’t care about the order of the bits getting stored, you can swap individual bits — except the least-significant one if you’re using write-leveling — in each byte lane with no issues.
I have not been able to find any evidence of the LSB swapping restriction in ST literature (datasheet, reference manual, app notes). Indeed, one app note[1] just says that the DDR3L connection features “two swappable bytes, and swappable bits in the same byte”.
However, the MT41K DDR3L datasheet includes a section on Write Leveling which
explains what’s up:
For better signal integrity, DDR3 SDRAM memory modules have adopted fly-by topology for the commands, addresses, control signals, and clocks. Write leveling is a scheme for the memory controller to adjust or de-skew the DQS strobe (DQS, DQS#) to CK relationship at the DRAM with a simple feedback feature provided by the DRAM. Write leveling is generally used as part of the initialization process, if required. For normal DRAM operation, this feature must be disabled. […]
When write leveling is enabled, the rising edge of DQS samples CK, and the prime DQ outputs the sampled CK’s status. The prime DQ for a x4 or x8 configuration is DQ0 with all other DQ (DQ[7:1]) driving LOW. The prime DQ for a x16 configuration is DQ0 for the lower byte and DQ8 for the upper byte.
So, just in case, we should make sure not to “swizzle” the LSB of either byte lane (DQ0 and DQ8).
Application note AN5692: DDR memory routing guidelines for STM32MP13x product lines. January 2023. ↩︎

While some are still discussing why computers will never be able to pass the Turing test, I find myself repeatedly facing the idea that as the models improve and humans don’t, the bar for the test gets raised, and eventually humans won’t pass it themselves. Here’s a list of what used to be LLM failure modes but are now more commonly observed when talking to people.
This has always been an issue in conversations: you ask a seemingly small and limited question, and in return have to listen to what seems like hours of incoherent rambling. Despite exhausting their knowledge of the topic, people will keep on talking about stuff you have no interest in. I find myself searching for the “stop generating” button, only to remember that all I can do is drop hints, or rudely walk away.
The best thing about a good deep conversation is when the other person gets you: you explain a complicated situation you find yourself in, and find some resonance in their replies. That, at least, is what happens when chatting with the recent large models. But when subjecting the limited human mind to the same prompt—a rather long one—again and again the information in the prompt somehow gets lost, their focus drifts away, and you have to repeat crucial facts. In such a case, my gut reaction is to see if there’s a way to pay to upgrade to a bigger model, only to remember that there’s no upgrading of the human brain. At most what you can do is give them a good night’s sleep and then they may possibly switch from the “Fast” to the “Thinking” mode, but that’s not guaranteed with all people.
I’ve got a lot of interests, and on any given day I may be excited to discuss various topics, from kernels to music to cultures and religions. I know I can put together a prompt to give any of today’s leading models and am essentially guaranteed a fresh perspective on the topic of interest. But let me pose the same prompt to people, and more often than not the reply will be a polite nod accompanied by clear signs of their thinking something else entirely, or maybe just a summary of the prompt itself, or vague general statements about how things should be. In fact, it is so rare to find someone who knows what I mean that it feels like a magic moment. With the proliferation of genuinely good models—well educated, as it were—finding a conversational partner with a good foundation of shared knowledge has become trivial with AI. This does not bode well for my interest in meeting new people.
Models with a small context window, or a small number of parameters, seem to have a hard time learning from their mistakes. This should not be a problem for humans: we have a long-term memory span measured in decades, with emotional reinforcement of the most crucial memories. And yet, it happens all too often that I must point out the same logical fallacy again and again in the same conversation! Surely, I think, if I point out the mistake in the reasoning, this will count as an important correction that the brain should immediately make use of? As it turns out, there seems to be some kind of a fundamental limitation on how quickly the neural connections can get rewired. Chatting with recent models, who can make use of the extra information immediately, has eroded my patience for having to repeat myself.
By this point, it’s possible to explain what happens in a given situation, and watch the model apply the lessons learned to a similar situation. Not so with humans. When I point out that the same principles would apply elsewhere, their response will fall somewhere on the spectrum between total bafflement on the one end and, on the other, a face-saving explanation that the comparison doesn’t apply “because it’s different”. Indeed, the whole point of comparisons is to apply the same principles in different situations, so why the excuse? I’ve learned to take up such discussions with AI and not trouble people with them.
This is the opposite issue: given a principle stated in general terms, the person will not be able to apply it in a specific situation. Indeed, I’ve had a lifetime of observing this very failure mode in myself: given the laws of physics, which are typically “obvious” and easy to understand, I find it very difficult to calculate how long before the next eclipse. More and more, rather than think these things through myself, I’d just send a quick prompt to the most recent big model, and receive a good answer in seconds. In other words, models threaten to sever me not only from other flawed humans, but from my own “slow” thinking as well!
Understood in the medical sense, hallucination refers to when something appears to be real even as you know very well it isn’t. Having no direct insight into the “inner mental life” of models, we claim that every false fact they spit out is a form of hallucination. The meaning of the word is shifting from the medical sense towards the direction of “just being wrong, and persistently so”. This has plagued human speech for centuries. As a convenient example, look up some heated debate between proponents of science and those of religion. (As if the two need be in conflict!) When a model exhibits hallucination, often providing more context and evidence will dispel it, but the same trick does not appear to work so well on humans.
Where to go from here? One conclusion is that LLMs are damaging the connection people feel with each other, much as social networks threatened to do a decade before by replacing it with a shallower, simulated version. Another interpretation would be to conclude cynically that it’s time humans get either enhanced or replaced by a more powerful form of intelligence. I’d say we’re not there yet entirely, but that some of the replacement has been effected already: I’ll never again ask a human to write a computer program shorter than about a thousand lines, since an LLM will do it better.
Indeed, why am I even writing this? I asked GPT-5 for additional failure modes and got more examples than I could hope to get from a human:
Beyond the failure modes already discussed, humans also exhibit analogues of several newer LLM pathologies: conversations often suffer from instruction drift, where the original goal quietly decays as social momentum takes over; mode collapse, in which people fall back on a small set of safe clichés and conversational templates; and reward hacking, where social approval or harmony is optimized at the expense of truth or usefulness. Humans frequently overfit the prompt, responding to the literal wording rather than the underlying intent, and display safety overrefusal, declining to engage with reasonable questions to avoid social or reputational risk. Reasoning is also marked by inconsistency across turns, with contradictions going unnoticed, and by temperature instability, where fatigue, emotion, or audience dramatically alters the quality and style of thought from one moment to the next.

This is Part 7 in the series: Linux on STM32MP135. See other articles.
In the previous
article we took a
custom STM32MP135 board from a
simple LED blink to passing the kernel early boot stage, printing the “Booting
Linux” message. Now, it’s time to finish the kernel initialization all the way
up to running our first process: the init process.
We’ll do it in two steps. First, we make it run on the official evaluation board for the SoC. In a future article, we will consider what needs to be changed in order to make this work on a custom board.
First, we need to obtain and build the bootloader. Note that we need to enable the STPMIC1, since it is used on the eval board:
git clone git@github.com:js216/stm32mp135-bootloader.git
cd stm32mp135-bootloader
make CFLAGS_EXTRA=-DUSE_STPMIC1x=1
cd ..
Next, we obtain the Linux kernel from the ST repository (contains a few non-standard ST-provided drivers):
git clone https://github.com/STMicroelectronics/linux.git
git -C linux checkout v6.1-stm32mp-r1.1
Let’s apply some patches (mainly to allow non-secure boot without U-Boot, OPTEE, or TF-A), and copy over the Device Tree Source (DTS) and the kernel configuration:
git clone git@github.com:js216/stm32mp135_test_board.git
cd linux
git apply ../configs/evb/patches/linux/*.patch
cd ..
cp configs/evb/linux.config linux/.config
cp configs/evb/board.dts linux/arch/arm/boot/dts/
Now we can build the Device Tree Blob (DTB) and the kernel itself:
cd linux
make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- board.dtb
make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- zImage
cd ..
Next, we need an init program. (Of course, you can also run the kernel without it, but be prepared for a kernel panic at the end of the boot, telling you the init is missing.) An init can be essentially any program, even a “Hello, world!”, but if the init program quits, the kernel panics again.
I asked AI to write a minimal init, without any C standard library dependencies (find the result here). Let’s compile it, making sure to tell the compiler to not link any extra code with it:
arm-linux-gnueabihf-gcc -Os -nostdlib -static -fno-builtin \
    -Wl,--gc-sections configs/evb/init.c -o build/init
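The general shape of such an init is roughly the following sketch (a hypothetical reconstruction, not the actual file from the repository): raw Linux syscalls through svc #0, a greeting, and a read/echo loop that never exits.
#include <stddef.h>

#define SYS_READ  3
#define SYS_WRITE 4

/* ARM EABI: syscall number in r7, arguments in r0-r2. */
static long sys3(long nr, long a, long b, long c)
{
    register long r7 __asm__("r7") = nr;
    register long r0 __asm__("r0") = a;
    register long r1 __asm__("r1") = b;
    register long r2 __asm__("r2") = c;
    __asm__ volatile("svc #0"
                     : "+r"(r0) : "r"(r7), "r"(r1), "r"(r2) : "memory");
    return r0;
}

static void print(const char *s)
{
    long n = 0;
    while (s[n])
        n++;
    sys3(SYS_WRITE, 1, (long)s, n);     /* write to stdout */
}

void _start(void)
{
    static char buf[128];

    print("Hello, world!\n");
    for (;;) {                          /* init must never exit */
        print("$ ");
        long n = sys3(SYS_READ, 0, (long)buf, sizeof(buf) - 1);
        if (n > 1) {
            buf[n - 1] = 0;             /* strip the trailing newline */
            print(buf);
            print(": command not found\n");
        }
    }
}
Since we link with -nostdlib, _start is the entry point and there is no libc to return to; the endless loop is what keeps the kernel from panicking.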
Now that we have an init program, we need a root filesystem to put it on:
mkdir -p build/rootfs.dir/sbin
cp build/init build/rootfs.dir/sbin/init
dd if=/dev/zero of=build/rootfs bs=1M count=10
mke2fs -t ext4 -F -d build/rootfs.dir build/rootfs
Finally, we collect all the pieces together with a simple Python script included in the bootloader distribution:
python3 bootloader/scripts/sdimage.py build/sdcard.img \
bootloader/build/main.stm32 \
linux/arch/arm/boot/dts/board.dtb \
linux/arch/arm/boot/zImage \
--partition build/rootfs
Write this image to the SD card and start the system, and prepare to be greeted by the very useless shell implemented in the minimal init program:
[ 1.940577] Run /sbin/init as init process
Hello, world!
$ ls
ls: command not found
$ Hey!
Hey!: command not found
That’s it!
Here’s the full 49 lines:
CONFIG_DIR := configs/custom
CROSS_COMPILE = arm-linux-gnueabihf-
LINUX_OPTS = ARCH=arm CROSS_COMPILE=$(CROSS_COMPILE)

all: boot config dtb kernel init root sd

boot:
	$(MAKE) -C bootloader -j$(shell nproc) CFLAGS_EXTRA=-DUSE_STPMIC1x=1

patch:
	for p in $(CONFIG_DIR)/patches/linux/*.patch; do \
		if git -C linux apply --check ../$$p; then \
			git -C linux apply ../$$p; \
		fi \
	done

config:
	cp $(CONFIG_DIR)/linux.config linux/.config

dtb:
	cp $(CONFIG_DIR)/board.dts linux/arch/arm/boot/dts/
	$(MAKE) -C linux $(LINUX_OPTS) board.dtb

kernel:
	$(MAKE) -C linux $(LINUX_OPTS) -j$(shell nproc) zImage

init:
	mkdir -p build
	$(CROSS_COMPILE)gcc -Os -nostdlib -static -fno-builtin \
		-Wl,--gc-sections $(CONFIG_DIR)/init.c -o build/init

root:
	rm -rf build/rootfs.dir
	mkdir -p build/rootfs.dir/sbin
	cp build/init build/rootfs.dir/sbin/init
	dd if=/dev/zero of=build/rootfs bs=1M count=10
	mke2fs -t ext4 -F -d build/rootfs.dir build/rootfs

sd:
	python3 bootloader/scripts/sdimage.py build/sdcard.img \
		bootloader/build/main.stm32 \
		linux/arch/arm/boot/dts/board.dtb \
		linux/arch/arm/boot/zImage \
		--partition build/rootfs

clean:
	$(MAKE) -C linux $(LINUX_OPTS) clean
	$(MAKE) -C bootloader clean
	rm -rf build
The Makefile that reproduces the steps above is less than 50 lines long and creates a minimal, bootable SD card image in a very straightforward way: build the kernel, the DTB, and a userspace program (init), and package everything into a single SD card image. The next simplest thing to accomplish the same result is the “lightweight” Buildroot, which needs nearly 100k lines of make. What could possibly be happening in all that code!?
The sentiment
has been captured by the Reddit user triffid_hunter in a recent
comment:
I find that the hardest part about embedded is the horrendously obtuse manufacturer-provided toolchains.
If I can find a way to ditch them and switch to gcc+Makefile+basic C libraries, that’s the first thing I’ll do.
Buildroot is a relatively clean solution to the problem of supporting a huge number of packages on a wide variety of boards, but most of that complexity is not needed for a single-board project. (Yocto is an even more complex system, which we won’t cover here—its simplicity for the user comes at the cost of massive implementation complexity.) From my point of view, all these hundreds of thousands of lines of code are simply “accidental complexity” as articulated by ESR:
Accidental complexity happens because someone didn’t find the simplest way to implement a specified set of features. Accidental complexity can be eliminated by good design, or good redesign.[1]
The “root cause” of the highly complex toolchains has been identified by Anna-Lena Marx (inovex GmbH) in a talk[2] last year: the goals of SoC vendors and product manufacturers are not aligned. The SoC vendor wants to show off all the features of their devices, and they want a Board Support Package (BSP) that supports several, even all, of the devices in their portfolio. They want a “turnkey solution” that allows an engineer to go from nothing to a full-featured demo in ten minutes.
In contrast, a product manufacturer who wants to use embedded Linux in their application-specific product wants a minimal software stack, as close as possible to the upstream stable versions in order to be stable, secure, & maintainable. It’s the difference between merely using the system, and owning it.
From the product side, I can concur that the SoC BSPs can be a nightmare to work with! They are simple to get started with, being a packaged “turnkey solution”, but require a massive amount of work to unpeel all the abstraction layers that the SoC vendor found necessary to support their entire ecosystem of devices. ST, being perhaps the most “hacker friendly” vendor, likely has the cleanest, most “upstreamed” offering, and still there’s loads of cruft that must be removed before getting to something workable.
I would like a world where SoC vendors ship their product with simple, straightforward documentation, rather than monolithic code examples. Give me the smallest possible building blocks and tell me how to connect them together to accomplish something, rather than huge all-in-one example code that can take many tens of hours to pull apart and reassemble. In other words, I expect a Linux distribution to approach the ideal of the Unix philosophy much more closely, all the more so in an embedded, resource-constrained, highly reliable application.

Reality always makes sense because it is reality. When our ideas of it do not correspond to it, it’s the ideas that are suspect.
Insofar as our understanding involves a mapping of the fluid reality to fixed ideas, we will always end up confused when refusing to let go of fixed ideas that have lost their relevance.
Staying flexible in thought is a form of mental hygiene.

When an electronic design company accumulates large amounts of inventory, it can become overwhelming for engineers to go through the thousands of parts to find the one needed in a new design. Instead, they are likely to select a new part from one of the distributors, which have better search engines. This leads to an ever-growing inventory: parts kept in stock and never used, a constant departure from the ideal of a “lean” operation.
Nowadays, with everyone creating their own “agent” for just about anything, I wondered how hard it would be to create my own search engine. This article represents a day of work, proving that structured data extraction from semi-unstructured sources like datasheets has become almost a trivial problem.
I took the Gemma 3 model (12B parameters, 3-bit quantization) from Google, ran it in the llama.cpp inference framework, and fed it the datasheet for an opamp. To extract the text from the PDFs, I used the Docling Python library from IBM Research. The output, generated in about four minutes on a GPU with 8 GB of memory, looks like this for now:
"PSRR (Input offset voltage versus power supply)": {
"min": 65,
"typ": 100,
"max": null,
"unit": "dB"
},
Let’s get started!
Obtain and build llama.cpp:
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -S . -DGGML_CUDA=ON
cmake --build build -j
Obtain the Gemma 3 model.
Start the LLM server:
llama-server -m ~/temp/gemma-3-12b-it-UD-IQ3_XXS.gguf \
--port 8080 -c 4096 -ngl 999
Open localhost:8080 and feel free to chat with the model. How simple things
have become!
Next, we need to convert the datasheets from the PDF format into plain text that
we can feed to the model. Assuming docling is installed (install it with Pip
if not), we can define the following function to convert the documents:
import sys
from pathlib import Path

from docling.document_converter import DocumentConverter

def convert_pdf_to_markdown(pdf_file):
    pdf_path = Path(pdf_file)
    converter = DocumentConverter()
    result = converter.convert(pdf_path)
    content = result.document.export_to_markdown()
    print(content)
This yields the output in a Markdown format.
Here’s the best part: the “source code” for the agent is in plain English. Here it is in its entirety:
You are a datasheet specification extraction agent. Your
only job is to extract specifications.
OUTPUT FORMAT:
{
  "Full parameter name (short name)": {
    "min": number or null,
    "typ": number or null,
    "max": number or null,
    "unit": "string"
  }
}
EXTRACTION RULES:
- Always include both the full and short spec name in the key.
- Full name goes first, and short name in brackets: "Operating Temperature (T)"
- If a typ value is a range like "-11.5 to 14.5", split it: min=-11.5, max=14.5
- Convert scientific notation: "10 12" → 1e12
- Convert ± values into min/max fields
- Omit parameters with no numeric values (all null)
- Omit footnotes like (1) and (2)
- If no specifications exist, return: {}
CRITICAL OUTPUT RULES:
- Return ONLY valid JSON
- NO explanations
- NO descriptions
- NO phrases like "this section", "no specifications", "I will skip"
- NO text before or after the JSON
- NO markdown code blocks
- Just the raw JSON object
The insistence on pure JSON is a hack to make it stop being too chatty. There’s probably a more sophisticated way to do it, but for a first attempt it’ll do just fine.
The datasheet conversion from PDF includes lots of unnecessary text like document version information, copyright notices, and ordering information. For now, we’d like to get just the electronic specifications. As a first approximation, assume that the information is always present in tables only.
ChatGPT assures me that the following regex magic will extract tables from a Markdown document:
import re

def get_chunks(filepath):
    """Return a list of Markdown tables as strings from a file."""
    with open(filepath, "r", encoding="utf-8") as f:
        content = f.read()

    table_pattern = re.compile(
        r"(?:^\|.*\|\s*\n)"        # Header row
        r"(?:^\|[-:\s|]+\|\s*\n)"  # Separator row
        r"(?:^\|.*\|\s*\n?)+",     # Body rows
        re.MULTILINE
    )

    tables = table_pattern.findall(content)
    return [t.strip() for t in tables]
We have all the pieces now: text data in small chunks, a model, and the prompt that defines an agent. Now just iterate over all the chunks as defined above, send them to the model together with the prompt, and observe what comes out. To automate the process from PDF to the final JSON, I used a Makefile defining the recipes for the three steps of the transformation. All of this is too straightforward to be worth including here.
For anyone interested, find the entire code presented above here.