Hi all,
My desktop (i3-2120 from 2012) has been running NetBSD 9 for the last 11 months. It runs 24/7 and crashes about once a month. This is a serious problem because I rely on it being remotely accessible.
The computer had a previous owner, and I am not aware of any known hardware problems.
After the crash:
- the power LED is still on and the fan continues to spin. This makes me believe it's not a hardware problem.
- the monitor does not detect a signal from the graphics card. Here I'm not sure what that means. If the kernel crashes, what is supposed to be on the screen? Who is driving the graphics card/monitor?
- pings are not answered.
- there was no power outage - all other computers in the household are still running.
- the BIOS is set to "keep the power-state" after a power failure, i.e. if the machine was on before, it is turned on again. Since NetBSD did not boot again by itself, the power was never actually lost.
- it was not a graceful shutdown. When NetBSD came up after a manual power-cycle, filesystem checks took place (well, the journal was replayed).
I follow the netbsd-9 branch in NetBSD CVS. I believe that is the stable branch. The NetBSD configuration is the default.
NetBSD XXX 9.1_STABLE NetBSD 9.1_STABLE (GENERIC) #2: Sun Jan 3 11:19:52 PST 2021 root@XXX:/usr/obj/sys/arch/amd64/compile/GENERIC amd64
The last crash must have happened in the early morning of Feb 24 (I power-cycled the computer at 9:00am; the last cron job activity was at 3:00am). The log files are "blank" for the early morning. /var/log/messages:
Code:
Feb 21 13:16:04 XXX /netbsd: [ 3620751.3847045] kern error: [drm:(/usr/src/sys/external/bsd/drm2/dist/drm/i915/intel_fifo_underrun.c:230)cpt_set_fifo_underrun_reporting] *ERROR* uncleared pch fifo underrun on pch transcoder A
Feb 21 13:16:04 XXX /netbsd: [ 3620751.3847045] kern error: [drm:(/usr/src/sys/external/bsd/drm2/dist/drm/i915/intel_fifo_underrun.c:381)intel_pch_fifo_underrun_irq_handler] *ERROR* PCH transcoder A FIFO underrun
Feb 21 14:33:25 XXX /netbsd: [ 3625392.1028982] kern error: [drm:(/usr/src/sys/external/bsd/drm2/dist/drm/i915/intel_fifo_underrun.c:230)cpt_set_fifo_underrun_reporting] *ERROR* uncleared pch fifo underrun on pch transcoder A
Feb 21 14:33:25 XXX /netbsd: [ 3625392.1028982] kern error: [drm:(/usr/src/sys/external/bsd/drm2/dist/drm/i915/intel_fifo_underrun.c:381)intel_pch_fifo_underrun_irq_handler] *ERROR* PCH transcoder A FIFO underrun
Feb 24 09:03:22 XXX syslogd[184]: restart
Feb 24 09:03:22 XXX /netbsd: [ 1.0000000] Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
No other log files have timestamps around that time (well, besides cron.log).
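To narrow down the crash window, the silent period in a log file can be located by looking for the largest jump between consecutive timestamps. Below is a minimal sketch of that idea, assuming the usual syslog line prefix ("Feb 21 13:16:04"). The helper names `parse_ts` and `largest_gap` are made up for illustration, and the year has to be supplied by hand because syslog does not record it.

```python
from datetime import datetime


def parse_ts(line, year):
    """Parse the leading syslog timestamp, e.g. 'Feb 21 13:16:04'."""
    # The first 15 characters hold month, day and time; the year is not logged.
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def largest_gap(lines, year):
    """Return (before, after) for the longest silent period, or None."""
    stamps = []
    for line in lines:
        try:
            stamps.append(parse_ts(line, year))
        except ValueError:
            # Skip lines that do not start with a timestamp.
            continue
    best = None
    for prev, cur in zip(stamps, stamps[1:]):
        if best is None or cur - prev > best[1] - best[0]:
            best = (prev, cur)
    return best
```

Running it over /var/log/messages and cron.log together (for example `largest_gap(open("/var/log/messages"), 2021)`) should show the last entry before the hang and the first entry after the reboot, which bounds the time of the crash.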
What do you suspect: a hardware problem or a kernel crash?
Is there a knob to keep the kernel in a debugger upon a crash?
What can I do to root-cause the crash?