Linux Kernel 6 hangs with AMD CPU

Does your Linux Kernel hang randomly?

This guide explains how to solve it, although the exact settings may differ slightly between CPUs, chipsets, BIOS/UEFI versions and so on.

Issue description

The issue can reportedly be seen on any AMD platform, supposedly with every Ryzen CPU up to the 7xxx series. Someone reported the same with Bulldozer, but I couldn't verify it, so treat that as unconfirmed.

I personally hit the issue with Linux kernel versions from 6 onwards, up to 6.2.x.x.
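
If you want to check quickly whether your running kernel falls into the range where I saw the hangs, a small Python sketch like the following may help. The function names are made up for illustration, and the 6.0 to 6.2 window is only my reading of the versions mentioned above, not a confirmed boundary of the bug.

```python
import platform
import re


def parse_kernel_release(release: str) -> tuple[int, int]:
    """Return (major, minor) from a release string such as '6.2.10-arch1-1'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def in_reported_range(release: str) -> bool:
    """True for 6.0 through 6.2 (assumed window, based on this post only)."""
    return (6, 0) <= parse_kernel_release(release) <= (6, 2)


if __name__ == "__main__":
    release = platform.release()  # same string that `uname -r` prints
    print(release, "-> in the range described here:", in_reported_range(release))
```

A kernel outside this window is not proof that you are safe; it only tells you that your setup differs from the one I tested.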

Platform

  • 1st-generation Ryzen (possibly Bulldozer too)
  • MSI B350M motherboard (only the chipset really matters)
  • x86/x64 (I didn't try ARM, but it should be the same)

Solution

You need to disable all the power saving settings in the UEFI/BIOS. Depending on the manufacturer, the settings may be named slightly differently, but that's essentially it.

Below is a screenshot from my MSI motherboard. Please note that the settings with a specific value are intentional; you should find the same or similar ones on your motherboard.
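
As a complementary check from the running system, you can list the CPU idle states that the kernel currently exposes through sysfs. This is only a sketch: it reads the standard `cpuidle` directory of `cpu0`, the function name is made up for illustration, and I'm assuming (not guaranteeing) that the list changes after you switch off the power saving options in the firmware.

```python
from pathlib import Path

# Standard sysfs location for the idle states of the first CPU.
CPUIDLE_DIR = Path("/sys/devices/system/cpu/cpu0/cpuidle")


def list_idle_states(cpuidle_dir: Path = CPUIDLE_DIR) -> list[tuple[str, str, bool]]:
    """Return (directory, state name, disabled?) for each idle state found."""
    if not cpuidle_dir.is_dir():
        return []  # cpuidle is not exposed, e.g. inside some VMs
    state_dirs = sorted(
        cpuidle_dir.glob("state[0-9]*"),
        key=lambda p: int(p.name.removeprefix("state")),
    )
    states = []
    for state_dir in state_dirs:
        name = (state_dir / "name").read_text().strip()
        disable_file = state_dir / "disable"
        # A non-zero value in "disable" means the kernel will not use the state.
        disabled = disable_file.exists() and disable_file.read_text().strip() != "0"
        states.append((state_dir.name, name, disabled))
    return states


if __name__ == "__main__":
    for directory, name, disabled in list_idle_states():
        print(f"{directory}: {name} ({'disabled' if disabled else 'enabled'})")
```

Run it once before and once after changing the UEFI/BIOS settings and compare the two outputs. It does not read the firmware settings themselves, only what the kernel sees.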




Save your setup

Before applying any change, make sure your UEFI/BIOS settings are saved in a profile, as well as on a USB stick.

I've seen cases where certain settings forced me to clear the CMOS and I lost my whole setup. It's nothing major, but it's annoying to spend hours troubleshooting the current issue and then have to rebuild your setup on top of that.

Before proceeding, it's worth looking at this guide: Traps in saving the BIOS/UEFI profile.

Choose a stable BIOS/UEFI version

Generally, it's highly advisable not to use a beta BIOS/UEFI unless strictly necessary.

When to make a BIOS/UEFI update?

Update only if you need to or if the manufacturer strongly advises it, and always check whether the issue being fixed actually affects you. Any update can brick your motherboard, especially if you lose power during the process.

Bitter solutions

You have a couple of options:

  1. Switch to Intel (no one likes it, but it is one possibility).
  2. Use a distribution that doesn't ship the newest kernels, let's say one below version 6. This may limit your choice of distros, unless you are willing to deal with other kernel issues. If you want stable releases, it's usually worth taking the server edition and sticking with that, maybe using VMs, but it all depends on what you need to do with that machine. You could also install the GUI on top of the server edition; this should help, because the kernel should be older.

How can I test it?

It's simple: leave the machine turned on with as little load as possible. At some point one of the low-power states will be activated by the UEFI/BIOS and you should see the result. I suggest leaving it for no longer than 3 days.

It can really happen at any time, but I experienced every "kernel hang status" within 72 hours at most, although it obviously depends on what your machine is doing. Please note that the low-power mode may be activated even if your machine is doing something like playing a video. It's decided at another layer, by the BIOS/UEFI logic, which doesn't care what you are actually doing. So the "kernel hang status" can happen at almost any time, as long as you're not running a stress test, I guess.
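
To know when the machine actually froze during such a test, one option is a tiny heartbeat logger: it appends a timestamp to a file at a fixed interval, so after a hard reset the last line tells you roughly when the hang happened. This is a minimal sketch under my own assumptions; the file name, the 60-second interval and the function names are arbitrary choices, and the script itself adds a very small amount of load.

```python
import os
import time
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def write_heartbeat(log_path: Path) -> str:
    """Append one UTC timestamp to log_path and force it to disk."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with log_path.open("a") as log:
        log.write(stamp + "\n")
        log.flush()
        os.fsync(log.fileno())  # make sure the line survives a hard reset
    return stamp


def last_heartbeat(log_path: Path) -> Optional[str]:
    """Return the most recent timestamp, or None if the log is empty."""
    lines = log_path.read_text().splitlines()
    return lines[-1] if lines else None


def run(log_path: Path, interval_s: int = 60, max_hours: int = 72) -> None:
    """Write a heartbeat every interval_s seconds for at most max_hours."""
    deadline = time.monotonic() + max_hours * 3600
    while time.monotonic() < deadline:
        write_heartbeat(log_path)
        time.sleep(interval_s)


if __name__ == "__main__":
    run(Path("heartbeat.log"))
```

After a hang, reboot and look at the last line of `heartbeat.log`: the freeze happened somewhere between that timestamp and one interval later.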

If you want to take it further, you can follow one of the online guides on troubleshooting a "kernel hang status", but due to the nature of the issue, you may not get much out of it...

Additional test

You can rule out memory issues by running a memtest. It's quite common nowadays to see people with custom memory settings, especially frequency, for gaming or other scenarios.

Believe it or not, the system often crashes because of that. Start from the minimum (the default), then increase the frequency step by step within supported levels, stopping before you would need to raise the voltage. Run a memtest at each step and choose a value that makes sense for your scenario.

For common workloads, I'd say pick the first setting that works; it's pointless to spend more time on this.

On top of custom settings, you may also have hardware issues.

Are you sure it's not the GPU?

Quite sure... The solution I found doesn't involve the GPU or any PCI settings.

What will happen in the future?

I don't know, and neither does the rest of the community.

It seems that there isn't much traction from the kernel developers, and the community hasn't figured out what's missing or which config is wrong. It's still not clear whether a feature is actually missing to support the CPU power saving settings on AMD platforms.
