Recent frequent linux errors with amdgpu, kernel 5.10+, multiple projects
Author | Message |
---|---|
Joined: 8 Dec 13 Posts: 51 |
Two weekends ago I did a normal system update and got kernel 5.10. Since then, I've been getting a lot of this:
Apr 01 22:24:41 hostname kernel: [drm:amdgpu_ttm_backend_bind [amdgpu]] *ERROR* failed to pin userptr
Apr 01 22:24:41 hostname kernel: ------------[ cut here ]------------
Apr 01 22:24:41 hostname kernel: kernel BUG at mm/slub.c:304!
Apr 01 22:24:41 hostname kernel: invalid opcode: 0000 [#1] SMP NOPTI
Apr 01 22:24:41 hostname kernel: CPU: 1 PID: 11895 Comm: setiathome_8.22 Not tainted 5.5.11-200.fc31.x86_64 #1
Apr 01 22:24:41 hostname kernel: Hardware name: Gigabyte Technology Co., Ltd. B450 AORUS PRO WIFI/B450 AORUS PRO WIFI-CF, BIOS F50 11/27/2019
Apr 01 22:24:41 hostname kernel: RIP: 0010:kfree+0x23c/0x250
...
Apr 01 22:24:41 hostname kernel: ---[ end trace caf6b7bf7cc304f1 ]---
That's 80 occurrences in about 10 days. The same thing happens with Einstein@Home. I updated the kernel to 5.11 and also updated the BIOS, but the error came right back after boot. AFAICT, things were fine right before the update. Is this a known issue? |
Joined: 29 Aug 05 Posts: 15541 |
It's a problem with your kernel, not with BOINC or the applications running under it. Search Google for "[drm:amdgpu_ttm_backend_bind [amdgpu]] *ERROR* failed to pin userptr" and you'll find a lot of patches for different kernel versions. E.g. "This patch set is to fix a bug in amdgpu / radeon drm that results in a crash when dma_map_sg combines elements within a scatterlist table." https://lkml.org/lkml/2020/3/25/204 |
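
For anyone who wants to check how often the error hits before and after trying a patched kernel, here is a rough Python sketch that counts the "failed to pin userptr" lines per day. It assumes the log lines start with a syslog-style timestamp like the ones quoted above ("Apr 01 22:24:41 ..."). The function name `count_pin_errors` is made up for this example, and nothing here is specific to BOINC or amdgpu beyond the error string itself.

```python
import re
from collections import Counter

# Error string quoted in the kernel log earlier in this thread.
PIN_ERROR = "failed to pin userptr"

# Assumption: lines begin with a syslog-style timestamp, e.g. "Apr 01 22:24:41 ".
# The capture group keeps only the "Mon DD" part.
DAY_PREFIX = re.compile(r"^([A-Z][a-z]{2} +\d{1,2}) \d{2}:\d{2}:\d{2} ")


def count_pin_errors(log_lines):
    """Return a Counter mapping "Mon DD" to the number of matching error lines.

    Lines that contain the error string but do not start with a recognizable
    timestamp are counted under the key "unknown".
    """
    per_day = Counter()
    for line in log_lines:
        if PIN_ERROR not in line:
            continue
        match = DAY_PREFIX.match(line)
        day = match.group(1) if match else "unknown"
        per_day[day] += 1
    return per_day
```

One way to use it would be to save the kernel log to a text file, open the file, and pass the file object to `count_pin_errors`, since iterating a file yields its lines. The result is only as good as the timestamp assumption, so treat it as a quick tally rather than a diagnosis.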
Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.