Regression - kernel null pointer - MDIO mode issue
Device: RockPro64
To reproduce: flash Manjaro-ARM-minimal-rockpro64-19.12.img. Its kernel, Linux rock5 5.4.2-2-MANJARO-ARM, works fine.
Do a full upgrade (or just a kernel upgrade) and you will get an oops at boot time. The new kernel is Linux rock5 5.4.8-1-MANJARO-ARM.
Here is the oops:
[ 64.747762] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000378
[ 64.748538] Mem abort info:
[ 64.748784] ESR = 0x96000004
[ 64.749055] EC = 0x25: DABT (current EL), IL = 32 bits
[ 64.749519] SET = 0, FnV = 0
[ 64.749789] EA = 0, S1PTW = 0
[ 64.750064] Data abort info:
[ 64.750317] ISV = 0, ISS = 0x00000004
[ 64.750652] CM = 0, WnR = 0
[ 64.750916] user pgtable: 4k pages, 48-bit VAs, pgdp=00000000e91f3000
[ 64.751478] [0000000000000378] pgd=0000000000000000
[ 64.751910] Internal error: Oops: 96000004 [#1] SMP
[ 64.752337] Modules linked in: cfg80211 rfkill 8021q garp mrp stp llc snd_soc_hdmi_codec rc_cec dw_hdmi_i2s_audio dw_hdmi_cec rockchipdrm analogix_dp dw_mipi_dsi dw_hdmi cec rc_core panfrost drm_kms_helper pwm_fan hantro_vpu(C) gpu_sched snd_soc_simple_card drm snd_soc_simple_card_utils rockchip_rga dwmac_rk videobuf2_dma_contig drm_panel_orientation_quirks videobuf2_dma_sg snd_soc_rockchip_i2s syscopyarea stmmac_platform v4l2_mem2mem stmmac snd_soc_rockchip_pcm sysfillrect videobuf2_vmalloc sysimgblt videobuf2_memops videobuf2_v4l2 fb_sys_fops dw_wdt phylink videobuf2_common rtc_rk808 rockchip_thermal rockchip_saradc gpio_keys
[ 64.757218] CPU: 3 PID: 580 Comm: dhcpcd Tainted: G C 5.4.8-1-MANJARO-ARM #1
[ 64.757945] Hardware name: Pine64 RockPro64 (DT)
[ 64.758350] pstate: 40000005 (nZcv daif -PAN -UAO)
[ 64.758781] pc : mdiobus_get_phy+0x4/0x20
[ 64.759162] lr : stmmac_open+0x6b8/0x850 [stmmac]
[ 64.759573] sp : ffff8000105d39b0
[ 64.759864] x29: ffff8000105d39b0 x28: ffff0000eda588c0
[ 64.760329] x27: ffff0000ed140a00 x26: 0000000000000000
[ 64.760795] x25: 0000000000000041 x24: 0000000000000000
[ 64.761260] x23: 0000000000001002 x22: ffff800009eb80b8
[ 64.761725] x21: 0000000000000000 x20: ffff0000eda58000
[ 64.762190] x19: 00000000ffffffff x18: 0000000000000000
[ 64.762655] x17: 0000000000000000 x16: 0000000000000000
[ 64.763120] x15: 0000000000000000 x14: ffffffffffffffff
[ 64.763585] x13: 0000000000000000 x12: 0000000000000020
[ 64.764050] x11: 0000000000000003 x10: 0101010101010101
[ 64.764515] x9 : ffffffffffffffff x8 : 7f7f7f7f7f7f7f7f
[ 64.764978] x7 : fefefeff646c606d x6 : 1e091448e4e5f6e9
[ 64.765443] x5 : 697665644814091e x4 : 8080808000000000
[ 64.765909] x3 : 8343c96b232bb348 x2 : ffff00000495e080
[ 64.766374] x1 : fffffffffffffff8 x0 : 0000000000000000
[ 64.766839] Call trace:
[ 64.767065] mdiobus_get_phy+0x4/0x20
[ 64.767389] __dev_open+0xfc/0x190
[ 64.767690] __dev_change_flags+0x194/0x1f0
[ 64.768058] dev_change_flags+0x20/0x60
[ 64.768397] devinet_ioctl+0x63c/0x6f8
[ 64.768727] inet_ioctl+0x2f4/0x360
[ 64.769037] sock_do_ioctl+0x44/0x2b0
[ 64.769360] sock_ioctl+0x260/0x528
[ 64.769670] do_vfs_ioctl+0x388/0x950
[ 64.769993] ksys_ioctl+0x78/0xa8
[ 64.770285] __arm64_sys_ioctl+0x1c/0x28
[ 64.770634] el0_svc_common.constprop.0+0x68/0x160
[ 64.771055] el0_svc_handler+0x20/0x80
[ 64.771387] el0_svc+0x8/0xc
[ 64.771645] Code: a8c17bfd d65f03c0 00000000 8b21cc01 (f941c020)
[ 64.772181] ---[ end trace 701692b0fbc6b1df ]---
The crash occurs because of a bug in the kernel: the mdiobus_get_phy function doesn't check whether bus is NULL. But the root cause is somewhere else.
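The NULL bus can be cross-checked against the register dump. The sketch below is a userspace model, not kernel code: the instruction mnemonics and the 0x380 offset are my own decoding of the two words before the end of the "Code:" line (8b21cc01 and f941c020), and get_mdiodev_checked is a hypothetical helper showing what a defensive lookup might look like. With x0 (bus) equal to 0 and a PHY address of -1, the arithmetic lands on the reported fault address 0x378.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PHY_MAX_ADDR    32     /* same value as the kernel's PHY_MAX_ADDR */
#define MDIO_MAP_OFFSET 0x380  /* assumption: immediate decoded from f941c020 */

/* Address touched by "add x1, x0, w1, sxtw #3" followed by
 * "ldr x0, [x1, #0x380]" (assumed decoding of the Code: line). */
uint64_t faulting_address(uint64_t bus, int32_t addr)
{
    return bus + (uint64_t)((int64_t)addr * 8) + MDIO_MAP_OFFSET;
}

/* Userspace stand-ins; the real kernel structs carry many more fields. */
struct mdio_device { int flags; };
struct mii_bus { struct mdio_device *mdio_map[PHY_MAX_ADDR]; };

/* Hypothetical guarded lookup: returns NULL instead of dereferencing a
 * missing bus or indexing outside mdio_map. */
struct mdio_device *get_mdiodev_checked(struct mii_bus *bus, int addr)
{
    if (bus == NULL)
        return NULL;
    if (addr < 0 || addr >= PHY_MAX_ADDR)
        return NULL;
    return bus->mdio_map[addr];
}

/* Replays the values seen in the oops: bus == NULL, addr == -1. */
void check_against_oops(void)
{
    assert(faulting_address(0, -1) == 0x378);
    assert(get_mdiodev_checked(NULL, -1) == NULL);
}
```

Calling check_against_oops() passes silently. A guard like this would only turn the oops into a "no PHY found" error; it would not bring the Ethernet interface back.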
Between Linux 5.4.6 and Linux 5.4.7, the stmmac-platform module was updated. If I revert that patch (set mdio to true instead of false), the kernel works again:
diff -Naur linux-5.4-7/./drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c linux-5.4-6/./drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
--- linux-5.4-7/./drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c 2020-01-07 11:17:04.817218153 +0100
+++ linux-5.4-6/./drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c 2020-01-07 11:16:25.770550353 +0100
@@ -320,7 +320,7 @@
static int stmmac_dt_phy(struct plat_stmmacenet_data *plat,
struct device_node *np, struct device *dev)
{
- bool mdio = false;
+ bool mdio = true;
static const struct of_device_id need_mdio_ids[] = {
{ .compatible = "snps,dwc-qos-ethernet-4.10" },
{},
They decided to set this value to false to fix "MDIO init for platforms without PHY" (see commit d3e014ec7d5ebe9644b5486bc530b91e62bbf624 in Torvalds' Linux kernel tree).
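To make the effect of that one-line change concrete, here is a rough userspace model of how I understand the flag to be used. Everything in it is an assumption for illustration: the struct and function names (plat_model, dt_phy_model) are hypothetical, and the logic is a simplification in which the flag is forced to true only when the device tree has an MDIO subnode, and MDIO bus data is set up only when the flag ends up true. It also assumes the RockPro64 device tree has no such subnode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the platform data; not the kernel structs. */
struct mdio_bus_data_model { int unused; };

struct plat_model {
    bool has_mdio_subnode;                     /* assumed false on this board */
    struct mdio_bus_data_model *mdio_bus_data; /* NULL means no MDIO bus */
};

/* Simplified model: "mdio_default" is the initial value the patch changed
 * (true in 5.4.6, false in 5.4.7). */
void dt_phy_model(struct plat_model *plat, bool mdio_default)
{
    static struct mdio_bus_data_model storage;
    bool mdio = mdio_default;

    if (plat->has_mdio_subnode)
        mdio = true;

    plat->mdio_bus_data = mdio ? &storage : NULL;
}

/* Compares the two defaults for a board without an MDIO subnode. */
void compare_defaults(void)
{
    struct plat_model old_kernel = { false, NULL };
    struct plat_model new_kernel = { false, NULL };

    dt_phy_model(&old_kernel, true);   /* 5.4.6 behaviour */
    dt_phy_model(&new_kernel, false);  /* 5.4.7 behaviour */

    assert(old_kernel.mdio_bus_data != NULL);
    assert(new_kernel.mdio_bus_data == NULL);
}
```

If this model is right, a board that relied on the old default ends up with no MDIO bus data after the change, which would be consistent with the NULL bus seen in the oops.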
To me, this is a bug in the mainline kernel.