Why did my Proxmox box randomly reboot itself?

steelghost

Ars Praefectus
5,444
Subscriptor++
So I have a Proxmox server, and this morning I noticed it had just ... rebooted itself. I have looked at syslog, netdata and the BMC's logs, and I can't see any particular reason for it, but the system appears to have done it 'deliberately', rather than just hard reset for other reasons. For something that is meant to be rock solid and always on, clearly this is not ideal. System typically runs at around 65-70% RAM utilisation, and 5-25% CPU. Connectivity is via one of the X550 ports on the board to a 10GBaseT SFP+ adapter in my switch. All VMs and containers share this connection with the host OS.

What's confusing me is that it doesn't seem like the system experienced a problem (e.g. an OOM error causing the watchdog to trip); it looks like someone logged on and clicked "system>reboot".

Except I'm 99.9% sure no-one did that, because nobody else in the house can access this system (or knows how to, or even cares about it). Hence my confusion!

ASRock Rack E3C246D4U2-2T
E-2146G CPU
96GB RAM
LSI HBA (passed to TrueNAS VM) - x8 SATA SSDs (x2 VM boot, x6 ZFS pool)
Rocket 1204 PLX HBA (passed to TrueNAS VM) - x4 NVME SSDs in ZFS pool
Coral PCIe TPU

x3 SATA SSDs connected to motherboard, x2 as Proxmox ZFS mirror boot pool, x1 as storage for Frigate container
x2 NVME SSDs connected to motherboard (one via add-in card, one via extension adapter) as fast storage for Proxmox host for VMs / containers.


There don't seem to be any issues with high RAM or CPU utilisation before it reboots at ~0830
1741779847673.png

(I've put syslog extracts in the next post because there is a 20k character limit for a single post)

I am not sure where else to look or what to check! If anyone has any ideas I'm happy to go grab other bits and pieces, but I didn't see much point in spamming the OP with too much possibly irrelevant stuff. Thanks for reading.
 

steelghost

Ars Praefectus
5,444
Subscriptor++
Mar 12 07:59:01 proxmox CRON[484972]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 07:59:01 proxmox CRON[484971]: pam_unix(cron:session): session closed for user root Mar 12 08:00:01 proxmox CRON[486152]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:00:01 proxmox CRON[486153]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:00:01 proxmox CRON[486152]: pam_unix(cron:session): session closed for user root Mar 12 08:01:01 proxmox CRON[487296]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:01:01 proxmox CRON[487297]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:01:01 proxmox CRON[487296]: pam_unix(cron:session): session closed for user root Mar 12 08:02:01 proxmox CRON[488463]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:02:01 proxmox CRON[488464]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:02:01 proxmox CRON[488463]: pam_unix(cron:session): session closed for user root Mar 12 08:03:01 proxmox CRON[489542]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:03:01 proxmox CRON[489543]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:03:01 proxmox CRON[489542]: pam_unix(cron:session): session closed for user root Mar 12 08:04:01 proxmox CRON[490720]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:04:01 proxmox CRON[490721]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:04:01 proxmox CRON[490720]: pam_unix(cron:session): session closed for user root Mar 12 08:05:01 proxmox CRON[491829]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:05:01 proxmox CRON[491830]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:05:01 proxmox CRON[491829]: pam_unix(cron:session): session closed for user root Mar 12 08:06:01 proxmox CRON[492982]: 
pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:06:01 proxmox CRON[492983]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:06:01 proxmox CRON[492982]: pam_unix(cron:session): session closed for user root Mar 12 08:07:01 proxmox CRON[494088]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:07:01 proxmox CRON[494089]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:07:01 proxmox CRON[494088]: pam_unix(cron:session): session closed for user root Mar 12 08:08:01 proxmox CRON[495262]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:08:01 proxmox CRON[495263]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:08:01 proxmox CRON[495262]: pam_unix(cron:session): session closed for user root Mar 12 08:09:01 proxmox CRON[496365]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:09:01 proxmox CRON[496366]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:09:01 proxmox CRON[496365]: pam_unix(cron:session): session closed for user root Mar 12 08:10:01 proxmox CRON[497535]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:10:01 proxmox CRON[497536]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:10:01 proxmox CRON[497535]: pam_unix(cron:session): session closed for user root Mar 12 08:11:01 proxmox CRON[498635]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:11:01 proxmox CRON[498636]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:11:01 proxmox CRON[498635]: pam_unix(cron:session): session closed for user root Mar 12 08:12:01 proxmox CRON[499819]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:12:01 proxmox CRON[499820]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:12:01 proxmox CRON[499819]: pam_unix(cron:session): session closed 
for user root Mar 12 08:13:01 proxmox CRON[500923]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:13:01 proxmox CRON[500924]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:13:01 proxmox CRON[500923]: pam_unix(cron:session): session closed for user root Mar 12 08:14:01 proxmox CRON[502093]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:14:01 proxmox CRON[502094]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:14:01 proxmox CRON[502093]: pam_unix(cron:session): session closed for user root Mar 12 08:15:01 proxmox CRON[503198]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:15:01 proxmox CRON[503199]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:15:01 proxmox CRON[503198]: pam_unix(cron:session): session closed for user root Mar 12 08:16:01 proxmox CRON[504353]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:16:01 proxmox CRON[504354]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:16:01 proxmox CRON[504353]: pam_unix(cron:session): session closed for user root Mar 12 08:17:01 proxmox CRON[505471]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:17:01 proxmox CRON[505472]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:17:01 proxmox CRON[505473]: (root) CMD (cd / && run-parts --report /etc/cron.hourly) Mar 12 08:17:01 proxmox CRON[505474]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:17:01 proxmox CRON[505471]: pam_unix(cron:session): session closed for user root Mar 12 08:17:01 proxmox CRON[505472]: pam_unix(cron:session): session closed for user root Mar 12 08:18:01 proxmox CRON[506634]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:18:01 proxmox CRON[506635]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:18:01 proxmox 
CRON[506634]: pam_unix(cron:session): session closed for user root Mar 12 08:19:01 proxmox CRON[507731]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:19:01 proxmox CRON[507732]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:19:02 proxmox CRON[507731]: pam_unix(cron:session): session closed for user root Mar 12 08:20:01 proxmox CRON[508868]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:20:01 proxmox CRON[508870]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:20:01 proxmox CRON[508868]: pam_unix(cron:session): session closed for user root Mar 12 08:21:01 proxmox CRON[509961]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:21:01 proxmox CRON[509962]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:21:01 proxmox CRON[509961]: pam_unix(cron:session): session closed for user root Mar 12 08:22:01 proxmox CRON[511198]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:22:01 proxmox CRON[511199]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:22:01 proxmox CRON[511198]: pam_unix(cron:session): session closed for user root Mar 12 08:23:01 proxmox CRON[512296]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:23:01 proxmox CRON[512297]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:23:01 proxmox CRON[512296]: pam_unix(cron:session): session closed for user root Mar 12 08:24:01 proxmox CRON[513446]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:24:01 proxmox CRON[513447]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:24:01 proxmox CRON[513446]: pam_unix(cron:session): session closed for user root Mar 12 08:25:01 proxmox CRON[514553]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:25:01 proxmox CRON[514554]: (root) CMD ( 
/usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:25:01 proxmox CRON[514553]: pam_unix(cron:session): session closed for user root Mar 12 08:26:01 proxmox CRON[515716]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:26:01 proxmox CRON[515717]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:26:01 proxmox CRON[515716]: pam_unix(cron:session): session closed for user root Mar 12 08:26:10 proxmox pvedaemon[352169]: worker exit Mar 12 08:26:10 proxmox pvedaemon[2532]: worker 352169 finished Mar 12 08:26:10 proxmox pvedaemon[2532]: starting 1 worker(s) Mar 12 08:26:10 proxmox pvedaemon[2532]: worker 515925 started Mar 12 08:27:01 proxmox CRON[516821]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:27:01 proxmox CRON[516822]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:27:01 proxmox CRON[516821]: pam_unix(cron:session): session closed for user root Mar 12 08:28:01 proxmox CRON[517994]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:28:01 proxmox CRON[517995]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:28:01 proxmox CRON[517994]: pam_unix(cron:session): session closed for user root Mar 12 08:29:01 proxmox CRON[519087]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:29:01 proxmox CRON[519088]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:29:01 proxmox CRON[519087]: pam_unix(cron:session): session closed for user root Mar 12 08:30:01 proxmox CRON[520260]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:30:01 proxmox CRON[520261]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:30:01 proxmox CRON[520262]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:30:01 proxmox CRON[520263]: (root) CMD ([ -x /etc/init.d/anacron ] && if [ ! 
-d /run/systemd/system ]; then /usr/sbin/invoke-rc.d anacron start >/dev/null; fi) Mar 12 08:30:01 proxmox CRON[520261]: pam_unix(cron:session): session closed for user root Mar 12 08:30:01 proxmox CRON[520260]: pam_unix(cron:session): session closed for user root Mar 12 08:31:01 proxmox CRON[521357]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:31:01 proxmox CRON[521358]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:31:01 proxmox CRON[521357]: pam_unix(cron:session): session closed for user root Mar 12 08:32:01 proxmox CRON[522528]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:32:01 proxmox CRON[522529]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:32:01 proxmox CRON[522528]: pam_unix(cron:session): session closed for user root Mar 12 08:32:41 proxmox systemd[1]: Started anacron.service - Run anacron jobs. Mar 12 08:32:41 proxmox anacron[523285]: Anacron 2.3 started on 2025-03-12 Mar 12 08:32:41 proxmox anacron[523285]: Normal exit (0 jobs run) Mar 12 08:32:41 proxmox systemd[1]: anacron.service: Deactivated successfully. 
Mar 12 08:33:01 proxmox CRON[523626]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:33:01 proxmox CRON[523627]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:33:01 proxmox CRON[523626]: pam_unix(cron:session): session closed for user root Mar 12 08:34:01 proxmox CRON[524788]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:34:01 proxmox CRON[524789]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:34:01 proxmox CRON[524788]: pam_unix(cron:session): session closed for user root Mar 12 08:35:01 proxmox CRON[525886]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:35:01 proxmox CRON[525887]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:35:01 proxmox CRON[525886]: pam_unix(cron:session): session closed for user root Mar 12 08:36:01 proxmox CRON[527037]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:36:01 proxmox CRON[527038]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:36:01 proxmox CRON[527037]: pam_unix(cron:session): session closed for user root Mar 12 08:37:01 proxmox CRON[528150]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:37:01 proxmox CRON[528151]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:37:01 proxmox CRON[528150]: pam_unix(cron:session): session closed for user root Mar 12 08:38:01 proxmox CRON[529315]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:38:01 proxmox CRON[529316]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:38:01 proxmox CRON[529315]: pam_unix(cron:session): session closed for user root Mar 12 08:39:01 proxmox CRON[530446]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:39:01 proxmox CRON[530447]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:39:01 proxmox CRON[530446]: 
pam_unix(cron:session): session closed for user root Mar 12 08:40:01 proxmox CRON[531606]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:40:01 proxmox CRON[531607]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:40:01 proxmox CRON[531606]: pam_unix(cron:session): session closed for user root Mar 12 08:41:01 proxmox CRON[532696]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:41:01 proxmox CRON[532697]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:41:02 proxmox CRON[532696]: pam_unix(cron:session): session closed for user root Mar 12 08:42:01 proxmox CRON[533872]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:42:01 proxmox CRON[533873]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:42:01 proxmox CRON[533872]: pam_unix(cron:session): session closed for user root Mar 12 08:42:12 proxmox pveproxy[446579]: worker exit Mar 12 08:42:12 proxmox pveproxy[2568]: worker 446579 finished Mar 12 08:42:12 proxmox pveproxy[2568]: starting 1 worker(s) Mar 12 08:42:12 proxmox pveproxy[2568]: worker 534153 started Mar 12 08:43:01 proxmox CRON[534961]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:43:01 proxmox CRON[534962]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1) Mar 12 08:43:01 proxmox CRON[534961]: pam_unix(cron:session): session closed for user root -- Reboot -- Mar 12 08:45:38 proxmox kernel: Linux version 6.8.12-8-pve (build@proxmox) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC PMX 6.8.12-8 (2025-01-24T12:32Z) () Mar 12 08:45:38 proxmox kernel: Command line: initrd=\EFI\proxmox\6.8.12-8-pve\initrd.img-6.8.12-8-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs i915.disable_display=1 Mar 12 08:45:38 proxmox kernel: KERNEL supported cpus: Mar 12 08:45:38 proxmox kernel: Intel GenuineIntel Mar 12 08:45:38 proxmox kernel: AMD AuthenticAMD 
Mar 12 08:45:38 proxmox kernel: Hygon HygonGenuine Mar 12 08:45:38 proxmox kernel: Centaur CentaurHauls Mar 12 08:45:38 proxmox kernel: zhaoxin Shanghai Mar 12 08:45:38 proxmox kernel: BIOS-provided physical RAM map: Mar 12 08:45:38 proxmox kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000005efff] usable Mar 12 08:45:38 proxmox kernel: BIOS-e820: [mem 0x000000000005f000-0x000000000005ffff] reserved Mar 12 08:45:38 proxmox kernel: BIOS-e820: [mem 0x0000000000060000-0x000000000009ffff] usable Mar 12 08:45:38 proxmox kernel: BIOS-e820: [mem 0x00000000000a0000-0x00000000000fffff] reserved Mar 12 08:45:38 proxmox kernel: BIOS-e820: [mem 0x0000000000100000-0x000000003fffffff] usable Mar 12 08:45:38 proxmox kernel: BIOS-e820: [mem 0x0000000040000000-0x00000000403fffff] reserved Mar 12 08:45:38 proxmox kernel: BIOS-e820: [mem 0x0000000040400000-0x000000006832efff] usable Mar 12 08:45:38 proxmox kernel: BIOS-e820: [mem 0x000000006832f000-0x000000006832ffff] ACPI NVS Mar 12 08:45:38 proxmox kernel: BIOS-e820: [mem 0x0000000068330000-0x0000000068330fff] reserved Mar 12 08:45:38 proxmox kernel: BIOS-e820: [mem 0x0000000068331000-0x000000006fed4fff] usable Mar 12 08:45:38 proxmox kernel: BIOS-e820: [mem 0x000000006fed5000-0x0000000078061fff] reserved Mar 12 08:45:38 proxmox kernel: BIOS-e820: [mem 0x0000000078062000-0x000000007810afff] ACPI data Mar 12 08:45:38 proxmox kernel: BIOS-e820: [mem 0x000000007810b000-0x0000000078797fff] ACPI NVS Mar 12 08:45:38 proxmox kernel: BIOS-e820: [mem 0x0000000078798000-0x000000007befefff] reserved Mar 12 08:45:38 proxmox kernel: BIOS-e820: [mem 0x000000007beff000-0x000000007befffff] usable Mar 12 08:45:38 proxmox kernel: BIOS-e820: [mem 0x000000007bf00000-0x000000007f7fffff] reserved Mar 12 08:45:38 proxmox kernel: BIOS-e820: [mem 0x00000000e0000000-0x00000000efffffff] reserved Mar 12 08:45:38 proxmox kernel: BIOS-e820: [mem 0x00000000fe000000-0x00000000fe010fff] reserved Mar 12 08:45:38 proxmox kernel: BIOS-e820: [mem 
0x00000000fec00000-0x00000000fec00fff] reserved Mar 12 08:45:38 proxmox kernel: BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved Mar 12 08:45:38 proxmox kernel: BIOS-e820: [mem 0x00000000ff000000-0x00000000ffffffff] reserved Mar 12 08:45:38 proxmox kernel: BIOS-e820: [mem 0x0000000100000000-0x000000187f7fffff] usable Mar 12 08:45:38 proxmox kernel: NX (Execute Disable) protection: active Mar 12 08:45:38 proxmox kernel: APIC: Static calls initialized Mar 12 08:45:38 proxmox kernel: e820: update [mem 0x5ec6c018-0x5ec7bc57] usable ==> usable Mar 12 08:45:38 proxmox kernel: e820: update [mem 0x5ec6c018-0x5ec7bc57] usable ==> usable Mar 12 08:45:38 proxmox kernel: e820: update [mem 0x5ec5c018-0x5ec6bc57] usable ==> usable Mar 12 08:45:38 proxmox kernel: e820: update [mem 0x5ec5c018-0x5ec6bc57] usable ==> usable Mar 12 08:45:38 proxmox kernel: e820: update [mem 0x5ec4c018-0x5ec5bc57] usable ==> usable Mar 12 08:45:38 proxmox kernel: e820: update [mem 0x5ec4c018-0x5ec5bc57] usable ==> usable Mar 12 08:45:38 proxmox kernel: e820: update [mem 0x5ec3c018-0x5ec4bc57] usable ==> usable Mar 12 08:45:38 proxmox kernel: e820: update [mem 0x5ec3c018-0x5ec4bc57] usable ==> usable Mar 12 08:45:38 proxmox kernel: e820: update [mem 0x5ebf0018-0x5ec3b457] usable ==> usable Mar 12 08:45:38 proxmox kernel: e820: update [mem 0x5ebf0018-0x5ec3b457] usable ==> usable Mar 12 08:45:38 proxmox kernel: e820: update [mem 0x5ebdf018-0x5ebef057] usable ==> usable Mar 12 08:45:38 proxmox kernel: e820: update [mem 0x5ebdf018-0x5ebef057] usable ==> usable
 

steelghost

Ars Praefectus
5,444
Subscriptor++
The PBS log shows that VM getting rebooted 10m before the hypervisor goes down, which suggests the system was trying to shut itself down in an orderly manner?
Mar 12 06:00:01 pbs proxmox-backup-proxy[622]: starting new backup on datastore 'intel-sata' from ::ffff:192.168.0.2: "vm/101/2025-03-12T06:00:01Z" Mar 12 06:00:01 pbs proxmox-backup-proxy[622]: download 'index.json.blob' from previous backup. Mar 12 06:00:01 pbs proxmox-backup-proxy[622]: register chunks in 'drive-efidisk0.img.fidx' from previous backup. Mar 12 06:00:01 pbs proxmox-backup-proxy[622]: download 'drive-efidisk0.img.fidx' from previous backup. Mar 12 06:00:01 pbs proxmox-backup-proxy[622]: created new fixed index 1 ("vm/101/2025-03-12T06:00:01Z/drive-efidisk0.img.fidx") Mar 12 06:00:01 pbs proxmox-backup-proxy[622]: register chunks in 'drive-scsi0.img.fidx' from previous backup. Mar 12 06:00:01 pbs proxmox-backup-proxy[622]: download 'drive-scsi0.img.fidx' from previous backup. Mar 12 06:00:01 pbs proxmox-backup-proxy[622]: created new fixed index 2 ("vm/101/2025-03-12T06:00:01Z/drive-scsi0.img.fidx") Mar 12 06:00:01 pbs proxmox-backup-proxy[622]: add blob "/mnt/datastore/intel-sata/vm/101/2025-03-12T06:00:01Z/qemu-server.conf.blob" (419 bytes, comp: 419) Mar 12 06:00:17 pbs proxmox-backup-proxy[622]: Upload statistics for 'drive-scsi0.img.fidx' Mar 12 06:00:17 pbs proxmox-backup-proxy[622]: UUID: 0d3e7d6b7ff040cf80fb995c20198d32 Mar 12 06:00:17 pbs proxmox-backup-proxy[622]: Checksum: e838775c629d789a04147b6bd7e1a2d0d6911046f0a822f252298fd2bbe0778d Mar 12 06:00:17 pbs proxmox-backup-proxy[622]: Size: 3867148288 Mar 12 06:00:17 pbs proxmox-backup-proxy[622]: Chunk count: 922 Mar 12 06:00:17 pbs proxmox-backup-proxy[622]: Upload size: 3841982464 (99%) Mar 12 06:00:17 pbs proxmox-backup-proxy[622]: Duplicates: 6+7 (1%) Mar 12 06:00:17 pbs proxmox-backup-proxy[622]: Compression: 27% Mar 12 06:00:17 pbs proxmox-backup-proxy[622]: successfully closed fixed index 2 Mar 12 06:00:17 pbs proxmox-backup-proxy[622]: Upload statistics for 'drive-efidisk0.img.fidx' Mar 12 06:00:17 pbs proxmox-backup-proxy[622]: UUID: a652ddb6153c425281fc5dd3584c176f Mar 12 
06:00:17 pbs proxmox-backup-proxy[622]: Checksum: 103472cccea4484ddd111d04f279d4c2a315f41326a0b0a53eb44f6ff104a796 Mar 12 06:00:17 pbs proxmox-backup-proxy[622]: Size: 0 Mar 12 06:00:17 pbs proxmox-backup-proxy[622]: Chunk count: 0 Mar 12 06:00:17 pbs proxmox-backup-proxy[622]: successfully closed fixed index 1 Mar 12 06:00:17 pbs proxmox-backup-proxy[622]: add blob "/mnt/datastore/intel-sata/vm/101/2025-03-12T06:00:01Z/index.json.blob" (381 bytes, comp: 381) Mar 12 06:00:18 pbs proxmox-backup-proxy[622]: successfully finished backup Mar 12 06:00:18 pbs proxmox-backup-proxy[622]: backup finished successfully Mar 12 06:00:18 pbs proxmox-backup-proxy[622]: retention options: --keep-last 20 Mar 12 06:00:18 pbs proxmox-backup-proxy[622]: Starting prune on datastore 'intel-sata', root namespace group "vm/101" Mar 12 06:00:18 pbs proxmox-backup-proxy[622]: vm/101/2025-03-09T18:00:03Z remove Mar 12 06:00:18 pbs proxmox-backup-proxy[622]: removing backup snapshot "/mnt/datastore/intel-sata/vm/101/2025-03-09T18:00:03Z" Mar 12 06:00:18 pbs proxmox-backup-proxy[622]: vm/101/2025-03-09T21:00:03Z keep Mar 12 06:00:18 pbs proxmox-backup-proxy[622]: vm/101/2025-03-10T00:00:00Z keep Mar 12 06:00:18 pbs proxmox-backup-proxy[622]: vm/101/2025-03-10T03:00:02Z keep Mar 12 06:00:18 pbs proxmox-backup-proxy[622]: vm/101/2025-03-10T06:00:04Z keep Mar 12 06:00:18 pbs proxmox-backup-proxy[622]: vm/101/2025-03-10T09:00:04Z keep Mar 12 06:00:18 pbs proxmox-backup-proxy[622]: vm/101/2025-03-10T12:00:06Z keep Mar 12 06:00:18 pbs proxmox-backup-proxy[622]: vm/101/2025-03-10T15:00:01Z keep Mar 12 06:00:18 pbs proxmox-backup-proxy[622]: vm/101/2025-03-10T18:00:02Z keep Mar 12 06:00:18 pbs proxmox-backup-proxy[622]: vm/101/2025-03-10T21:00:02Z keep Mar 12 06:00:18 pbs proxmox-backup-proxy[622]: vm/101/2025-03-11T00:00:03Z keep Mar 12 06:00:18 pbs proxmox-backup-proxy[622]: vm/101/2025-03-11T03:00:02Z keep Mar 12 06:00:18 pbs proxmox-backup-proxy[622]: vm/101/2025-03-11T06:00:02Z keep Mar 12 
06:00:18 pbs proxmox-backup-proxy[622]: vm/101/2025-03-11T09:00:00Z keep Mar 12 06:00:18 pbs proxmox-backup-proxy[622]: vm/101/2025-03-11T12:00:00Z keep Mar 12 06:00:18 pbs proxmox-backup-proxy[622]: vm/101/2025-03-11T15:00:00Z keep Mar 12 06:00:18 pbs proxmox-backup-proxy[622]: vm/101/2025-03-11T18:00:01Z keep Mar 12 06:00:18 pbs proxmox-backup-proxy[622]: vm/101/2025-03-11T21:00:01Z keep Mar 12 06:00:18 pbs proxmox-backup-proxy[622]: vm/101/2025-03-12T00:00:00Z keep Mar 12 06:00:18 pbs proxmox-backup-proxy[622]: vm/101/2025-03-12T03:00:02Z keep Mar 12 06:00:18 pbs proxmox-backup-proxy[622]: vm/101/2025-03-12T06:00:01Z keep Mar 12 06:00:18 pbs proxmox-backup-proxy[622]: TASK OK Mar 12 06:00:18 pbs proxmox-backup-proxy[622]: Upload backup log to datastore 'intel-sata', root namespace vm/101/2025-03-12T06:00:01Z/client.log.blob Mar 12 06:17:01 pbs CRON[8749]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 06:17:01 pbs CRON[8750]: (root) CMD (cd / && run-parts --report /etc/cron.hourly) Mar 12 06:17:01 pbs CRON[8749]: pam_unix(cron:session): session closed for user root Mar 12 06:21:54 pbs proxmox-backup-proxy[622]: rrd journal successfully committed (33 files in 0.012 seconds) Mar 12 06:25:01 pbs CRON[8754]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 06:25:01 pbs CRON[8755]: (root) CMD (test -x /usr/sbin/anacron || { cd / && run-parts --report /etc/cron.daily; }) Mar 12 06:25:01 pbs CRON[8754]: pam_unix(cron:session): session closed for user root Mar 12 06:47:12 pbs systemd[1]: Starting apt-daily-upgrade.service - Daily apt upgrade and clean activities... Mar 12 06:47:12 pbs systemd[1]: apt-daily-upgrade.service: Deactivated successfully. Mar 12 06:47:12 pbs systemd[1]: Finished apt-daily-upgrade.service - Daily apt upgrade and clean activities. 
Mar 12 06:51:54 pbs proxmox-backup-proxy[622]: rrd journal successfully committed (33 files in 0.010 seconds) Mar 12 07:17:01 pbs CRON[8814]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 07:17:01 pbs CRON[8815]: (root) CMD (cd / && run-parts --report /etc/cron.hourly) Mar 12 07:17:01 pbs CRON[8814]: pam_unix(cron:session): session closed for user root Mar 12 07:21:55 pbs proxmox-backup-proxy[622]: rrd journal successfully committed (33 files in 0.010 seconds) Mar 12 07:51:55 pbs proxmox-backup-proxy[622]: rrd journal successfully committed (33 files in 0.014 seconds) Mar 12 08:17:01 pbs CRON[8852]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) Mar 12 08:17:01 pbs CRON[8853]: (root) CMD (cd / && run-parts --report /etc/cron.hourly) Mar 12 08:17:01 pbs CRON[8852]: pam_unix(cron:session): session closed for user root Mar 12 08:21:55 pbs proxmox-backup-proxy[622]: rrd journal successfully committed (33 files in 0.008 seconds) -- Reboot -- Mar 12 09:17:50 pbs kernel: Linux version 6.8.12-8-pve (build@proxmox) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC PMX 6.8.12-8 (2025-01-24T12:32Z) () Mar 12 09:17:50 pbs kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.8.12-8-pve root=/dev/mapper/pbs-root ro quiet Mar 12 09:17:50 pbs kernel: KERNEL supported cpus: Mar 12 09:17:50 pbs kernel: Intel GenuineIntel Mar 12 09:17:50 pbs kernel: AMD AuthenticAMD Mar 12 09:17:50 pbs kernel: Hygon HygonGenuine Mar 12 09:17:50 pbs kernel: Centaur CentaurHauls Mar 12 09:17:50 pbs kernel: zhaoxin Shanghai Mar 12 09:17:50 pbs kernel: BIOS-provided physical RAM map: Mar 12 09:17:50 pbs kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000002ffff] usable Mar 12 09:17:50 pbs kernel: BIOS-e820: [mem 0x0000000000030000-0x000000000004ffff] reserved Mar 12 09:17:50 pbs kernel: BIOS-e820: [mem 0x0000000000050000-0x000000000009dfff] usable Mar 12 09:17:50 pbs kernel: BIOS-e820: [mem 
0x000000000009e000-0x000000000009ffff] reserved Mar 12 09:17:50 pbs kernel: BIOS-e820: [mem 0x0000000000100000-0x0000000000806fff] usable Mar 12 09:17:50 pbs kernel: BIOS-e820: [mem 0x0000000000807000-0x0000000000807fff] ACPI NVS Mar 12 09:17:50 pbs kernel: BIOS-e820: [mem 0x0000000000808000-0x000000000080ffff] usable Mar 12 09:17:50 pbs kernel: BIOS-e820: [mem 0x0000000000810000-0x000000000180ffff] ACPI NVS Mar 12 09:17:50 pbs kernel: BIOS-e820: [mem 0x0000000001810000-0x000000007e79dfff] usable Mar 12 09:17:50 pbs kernel: BIOS-e820: [mem 0x000000007e79e000-0x000000007e9ebfff] reserved Mar 12 09:17:50 pbs kernel: BIOS-e820: [mem 0x000000007e9ec000-0x000000007eaebfff] type 20 Mar 12 09:17:50 pbs kernel: BIOS-e820: [mem 0x000000007eaec000-0x000000007eb
 

Jonathon

Ars Legatus Legionis
16,962
Subscriptor
A clean reboot (one actually initiated from within the OS) will dump a bunch of shutdown messages from systemd and other services into the syslog immediately before the -- Reboot -- marker in the log, so something forced a reboot here. Probably the IPMI watchdog, given the 2.5-minute gap between the last ipmiutil call and the reboot (where ipmiutil had been reliably running every minute prior to that). It looks like the system hung and the watchdog did its job.
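That check can be sketched in a few lines of Python: find the last entry before the -- Reboot -- marker, measure the gap to the first entry after it, and look for shutdown messages just before the marker. This is a rough sketch, not a diagnostic tool. The function name `analyse_reboot`, the keyword list in `SHUTDOWN_HINTS` and the hard-coded year are assumptions for illustration, and the sample text is just the lines around the reboot from the syslog posted above, one entry per line.

```python
import re
from datetime import datetime

# Classic syslog stamp, e.g. "Mar 12 08:43:01" (no year in the log itself).
STAMP = re.compile(r"^([A-Z][a-z]{2} +\d+ \d{2}:\d{2}:\d{2}) ")

# Assumption: strings that tend to appear when systemd shuts down cleanly.
# This is a heuristic list, not an exhaustive one.
SHUTDOWN_HINTS = ("Stopping ", "Stopped ", "Shutting down", "systemd-shutdown")


def parse_stamp(line, year=2025):
    """Return the entry's timestamp, or None if the line has no stamp."""
    m = STAMP.match(line)
    if not m:
        return None
    return datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")


def analyse_reboot(log_text, tail=20):
    """Return (gap in seconds across the reboot marker, clean-shutdown hint seen)."""
    lines = [l.strip() for l in log_text.splitlines() if l.strip()]
    idx = lines.index("-- Reboot --")
    before = [l for l in lines[:idx] if parse_stamp(l)]
    after = [l for l in lines[idx + 1:] if parse_stamp(l)]
    gap = parse_stamp(after[0]) - parse_stamp(before[-1])
    # Only the last few entries before the marker matter for a shutdown.
    clean = any(h in l for l in before[-tail:] for h in SHUTDOWN_HINTS)
    return gap.total_seconds(), clean


SAMPLE = """\
Mar 12 08:43:01 proxmox CRON[534962]: (root) CMD ( /usr/bin/ipmiutil wdt -r >/dev/null 2>&1)
Mar 12 08:43:01 proxmox CRON[534961]: pam_unix(cron:session): session closed for user root
-- Reboot --
Mar 12 08:45:38 proxmox kernel: KERNEL supported cpus:
"""

gap, clean = analyse_reboot(SAMPLE)
print(f"gap: {gap:.0f}s, shutdown messages seen: {clean}")
# gap: 157s, shutdown messages seen: False
```

A 157-second silence with no shutdown messages is what a forced reset looks like from the log's point of view. The sketch does not handle logs that cross midnight or a year boundary.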
 

wobblytickle

Ars Scholae Palatinae
667
can't be bothered to fight the forum formatting but this gap:

Mar 12 08:21:55 pbs proxmox-backup-proxy[622]: rrd journal successfully committed (33 files in 0.008 seconds)
-- Reboot --
Mar 12 09:17:50 pbs kernel: Linux version 6.8.12-8-pve (build@proxmox) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for

being much bigger than this gap:

Mar 12 08:43:01 proxmox CRON[534961]: pam_unix(cron:session): session closed for user root
-- Reboot --
Mar 12 08:45:38 proxmox kernel: Linux version 6.8.12-8-pve (build@proxmox) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for

is interesting. What is feeding grafana? that some grafana is reporting between 0830 and the reboot is also curious. No sar? also not sure those graphs are particularly helpful
 

steelghost

Ars Praefectus
5,444
Subscriptor++
was it an automated software update?
I don't think so - I generally run Debian on most things and I do have unattended upgrades with reboots enabled on some of those hosts, however that tends to be containers where the time to restart is minimal, and running services that I'm not worried if they are unavailable for 30 seconds now and again. I did wonder if I'd been dozy enough to enable such a thing on the hypervisor, but happily the answer is "no", at least this time.

That suggests you have a watchdog timer set, so a possible explanation is it expired and force rebooted. There might be an IPMI SEL entry for it.
Some fishing around in the manual suggests that it can be deactivated in the UEFI, so I'll look at that next time I reboot the server.

this gap:

Mar 12 08:21:55 pbs proxmox-backup-proxy[622]: rrd journal successfully committed (33 files in 0.008 seconds)
-- Reboot --
Mar 12 09:17:50 pbs kernel: Linux version 6.8.12-8-pve (build@proxmox) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for

being much bigger than this gap:

Mar 12 08:43:01 proxmox CRON[534961]: pam_unix(cron:session): session closed for user root
-- Reboot --
Mar 12 08:45:38 proxmox kernel: Linux version 6.8.12-8-pve (build@proxmox) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for

is interesting
The reason for that is that the Proxmox Backup Server VM doesn't automatically start up when Proxmox boots, because it's dependent on storage that is provided by the TrueNAS VM, and that VM takes a while to come up. I didn't realise that the server had rebooted itself until I logged on to Proxmox, saw that a 9am backup had failed and started PBS manually.
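For anyone who wants to compare the two gaps directly, here is a minimal Python sketch that measures the silence around a "-- Reboot --" marker. It assumes syslog-style timestamps as in the extracts above. The year is not in the log, so `ASSUMED_YEAR` is a guess that only matters if a gap spans New Year. The function and constant names are made up for illustration.

```python
import re
from datetime import datetime

# Syslog timestamps carry no year; this value is an assumption.
ASSUMED_YEAR = 2025
MARKER = "-- Reboot --"
MONTHS = {name: num for num, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}
STAMP = re.compile(
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
    r"\s+(\d{1,2})\s+(\d{2}):(\d{2}):(\d{2})")

def stamps(text, year=ASSUMED_YEAR):
    """Return every syslog-style timestamp found in text, in order."""
    return [datetime(year, MONTHS[mon], int(day), int(hh), int(mm), int(ss))
            for mon, day, hh, mm, ss in STAMP.findall(text)]

def reboot_gap(text, year=ASSUMED_YEAR):
    """Time between the last entry before the marker and the first after it."""
    before, marker, after = text.partition(MARKER)
    if not marker:
        raise ValueError("no reboot marker found")
    last_before, first_after = stamps(before, year), stamps(after, year)
    if not last_before or not first_after:
        raise ValueError("missing timestamp on one side of the marker")
    return first_after[0] - last_before[-1]

# The two extracts quoted in the thread (kernel banner shortened).
PBS_LINE = ("Mar 12 08:21:55 pbs proxmox-backup-proxy[622]: rrd journal "
            "successfully committed (33 files in 0.008 seconds) -- Reboot -- "
            "Mar 12 09:17:50 pbs kernel: Linux version 6.8.12-8-pve")
HOST_LINE = ("Mar 12 08:43:01 proxmox CRON[534961]: pam_unix(cron:session): "
             "session closed for user root -- Reboot -- "
             "Mar 12 08:45:38 proxmox kernel: Linux version 6.8.12-8-pve")

print("pbs gap: ", reboot_gap(PBS_LINE))    # 0:55:55
print("host gap:", reboot_gap(HOST_LINE))   # 0:02:37
```

The host gap of about two and a half minutes is the one that matters for the reboot itself; the longer PBS gap just reflects the VM waiting to be started by hand.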

What is feeding grafana? that some grafana is reporting between 0830 and the reboot is also curious. No sar? also not sure those graphs are particularly helpful
I hadn't come across sar before, I'll look into it. The graphs are from the netdata monitoring solution, I was trying to show there didn't seem to be much change in memory pressure running up to the reboot, but having looked at it all again I can see that netdata basically stops logging most things at about 8.30, some things stop between then and the reboot, and a few things keep getting logged right up until the system actually does reboot.
As you say, none of that is actually very helpful.

clean reboot (one actually initiated from within the OS) will dump a bunch of shutdown messages from systemd and other services into the syslog immediately before the -- Reboot -- in the log-- so something forced a reboot here. Probably the IPMI watchdog, given the 2.5-minute gap between the last ipmiutil call and the reboot (where ipmiutil had been reliably running every minute prior to that)-- looks like the system hung and the watchdog did its job.
Sigh. Yeah, I see what you're getting at, which is honestly even more confusing because I don't have a good idea what would have caused the hang. It's not temps, it's unlikely to be memory pressure / OOM, it seems unlikely to be hardware or RAM, my best guess is that Proxmox got its tiny silicon underwear in a twist. Could it be something to do with the network setup, I wonder....
 

steelghost

Ars Praefectus
5,444
Subscriptor++
https://forum.proxmox.com/threads/w...og-issue-on-asrock-x670e-motherboards.158814/ The "fix" mentioned is to disable the watchdog. But it strikes me that the cron job to reset the timer would only fail if something had gone wrong....at which point the system would just hardlock until the sysadmin intervened?

I don't know if the system has done this before but I suspect it may have. Feels like it is worth disabling the watchdog in the UEFI: either it fixes the issue somehow or, more likely, I find my server has hardlocked at some point and I get to do the reset myself.
 

Burn24

Wise, Aged Ars Veteran
118
Just to offer another perspective, for the sake of discussion:

In my experience it was likely catastrophic hardware failure (temporary), or kernel crash, both circumstances that generally don't leave diagnostic traces, unless you're already hooked into monitoring them, like having a serial console to see crashes. I was wondering about machine check events (like via mcelog), but that seems deprecated in favour of rasdaemon I guess?

Often troubleshooting this kind of issue was beyond our expertise and time allotted to fix a problem. Kind of like, trying to figure out which RAM chip is busted. Sure just.. electronically check each chip, with.. whatever magic tools people use. Or just swap them out logically to isolate your issue if you're not an electrical engineer with all the expensive equipment. Scale this up to whole boxen, swap out, RMA.

Again, though, I would try to look into hardware event logs you might have, not for the specific event instigator, but if you have repeated warnings for bad memory chips or other hardware from something like mcelog, or fancy server bios/idrac hardware event logging with a BMC, it would be a pretty strong clue as to what caused the crash. It sure is easier to have an external device with visibility into your hardware to tell you if something breaks, instead of hoping the breaking hardware can usefully report errors to you.

Regarding watchdogs, I know it's all anecdata, but I've never seen a watchdog timer actually work and be useful in my experience.
 

kperrier

Ars Legatus Legionis
20,641
Subscriptor++
If it's a kernel crash you can set up kdump (at least on RHEL and RHEL-alikes) and get a core file. If kdump did its thing, then you know someone either did an nmi-reset or there was a catastrophic failure. Now, analyzing the core is a whole other issue, but if the crash kernel got involved, then shit got real.
 

teubbist

Ars Scholae Palatinae
952
Some fishing around in the manual suggests that it can be deactivated in the UEFI, so I'll look at that next time I reboot the server.
The IPMI/BMC watchdog shouldn't need a BIOS round trip to disable. I'm unsure what Proxmox does to enable it, but generally it's a case of stopping the service and then disabling the watchdog via ipmitool/ipmiutil. Keep in mind that if you get the order wrong or don't correctly disable the watchdog you'll be doing a forced BIOS roundtrip, so only to be done when no one will notice.

As for the source of the hang, have you checked the SMART logs for your SSDs? If it was an IO issue you might get lucky and have something logged there.

Similarly, if Proxmox has a persistent journal you could try journalctl -b -1 to see if there's anything useful in there.

Depending on how complicated you want to get, and if you have another machine that can act as a log receiver, you could set up remote logging for syslog and netconsole for the kernel in the hopes something gets squirted across the wire before the reboot. Otherwise, disabling the watchdog and doing a manual recovery on a crash is your next best option to try to capture some output.
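As a rough idea of what the receiving end could look like, here is a minimal Python sketch of a UDP listener that appends whatever arrives to a file. It is not a replacement for a proper syslog daemon. Port 6666 is used because, as far as I recall, that is netconsole's default target port, so treat it as an assumption and check it against your own netconsole settings. The function names and the log path are hypothetical.

```python
import socket

# Assumed default netconsole target port; verify against your own config.
DEFAULT_PORT = 6666

def open_receiver(host="0.0.0.0", port=DEFAULT_PORT):
    """Bind a UDP socket that kernel or syslog datagrams can be sent to."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock

def format_datagram(data, sender_ip):
    """Prefix the payload with the sender's address, tolerating bad bytes."""
    text = data.decode("utf-8", errors="replace").rstrip()
    return f"{sender_ip} {text}"

def receive_lines(sock, count, timeout=5.0):
    """Collect `count` datagrams; raises socket.timeout if none arrive in time."""
    sock.settimeout(timeout)
    lines = []
    for _ in range(count):
        data, (sender_ip, _sender_port) = sock.recvfrom(65535)
        lines.append(format_datagram(data, sender_ip))
    return lines

def run_forever(path="netconsole.log", host="0.0.0.0", port=DEFAULT_PORT):
    """Append every datagram to `path`, flushing so a crash loses nothing."""
    with open_receiver(host, port) as sock, open(path, "a") as log:
        while True:
            data, (sender_ip, _sender_port) = sock.recvfrom(65535)
            log.write(format_datagram(data, sender_ip) + "\n")
            log.flush()
```

Calling `run_forever()` on the receiving machine would leave it listening until interrupted. Flushing after every line is deliberate: the last few messages before a hang are the ones worth having.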
 

steelghost

Ars Praefectus
5,444
Subscriptor++
I disabled the watchdog via some ipmiutil command, and (for what it's worth), the system has been stable since then.

I have looked at memory and drives and as far as I can see, they are not throwing any errors.

My router is a separate mini PC also running Proxmox and an OPNsense VM. I could even run a serial cable from the server to the router machine and capture logs there in a separate container, although exactly how I set up the software to do that is something I'll need to look into. Is there benefit to doing over serial rather than over the LAN?
 
Is there benefit to doing over serial rather than over the LAN?
Less to go wrong. You've got two network stacks and a bunch of network hardware involved with Ethernet, where serial is about as simple as it gets. It's more likely to actually get an SOS message out in the middle of a severe crash.
 

Gandalf007

Ars Tribunus Angusticlavius
6,928
Subscriptor
If you're running a Proxmox cluster, IPMI watchdog can be a useful fencing device, which is necessary to avoid split-brain scenarios. It is tricky to get all the pieces right though and some of the useful information is in their older docs rather than the current version (specifically adding nmi_watchdog=0 to the kernel cmdline). Additionally we had to systemctl edit watchdog-mux.service to order it after the IPMI drivers:
Code:
[Unit]
# Don't stop openipmi during shutdown/reboot while watchdog-mux is still running.
After=openipmi.service
Without that it would sometimes time out on shutdown and log a "timer expired" IPMI event. (This was somewhat dependent on the iDRAC firmware version.)

But it seems like you have a standalone host and don't need this.
 

steelghost

Ars Praefectus
5,444
Subscriptor++
Thanks for this; you're right, I'm not running a cluster although I do have several PVE hosts on the network here.

My server has been stable since I disabled the watchdog, barring a brief blip (2s) in power that we had yesterday. (I really should get a UPS, albeit power is generally so reliable here it almost feels like a waste).