Message boards : GPUs : Problem after NVIDIA driver update
Author | Message |
---|---|
Send message Joined: 9 Jul 20 Posts: 2 |
After updating the NVIDIA driver to the latest version (440.100), I get the following message from Asteroids@home:

Asteroids@home | Message from server: NVIDIA GPU: Upgrade to the latest driver to process tasks using your computer's GPU

When I look in the event log, I see the following entries for the driver version:

Starting BOINC client version 7.16.6 for x86_64-pc-linux-gnu
CUDA: NVIDIA GPU 0: GeForce RTX 2070 SUPER (driver version 440.10, CUDA version 10.2, compute capability 7.5, 4096MB, 3968MB available, 9216 GFLOPS peak)
CUDA: NVIDIA GPU 1 (not used): GeForce GTX 1060 6GB (driver version 440.10, CUDA version 10.2, compute capability 6.1, 4096MB, 3974MB available, 4568 GFLOPS peak)
OpenCL: NVIDIA GPU 0: GeForce RTX 2070 SUPER (driver version 440.100, device version OpenCL 1.2 CUDA, 7982MB, 3968MB available, 9216 GFLOPS peak)
OpenCL: NVIDIA GPU 1 (ignored by config): GeForce GTX 1060 6GB (driver version 440.100, device version OpenCL 1.2 CUDA, 6075MB, 3974MB available, 4568 GFLOPS peak)
OS: Linux Fedora: Fedora 32 (Workstation Edition) [5.7.6-201.fc32.x86_64|libc 2.31 (GNU libc)]

The driver version for OpenCL is correct, but the driver version for CUDA is cut off (440.10 instead of 440.100). I updated the driver from the rpmfusion repository. Is this a BOINC or an NVIDIA problem? |
Send message Joined: 29 Aug 05 Posts: 15541 |
It's an Asteroids problem and you'll have to report it to their forums. BOINC doesn't use your GPU; that's done by the project science applications. All BOINC does is detect what GPU you have and what its capabilities are, and send that to the project, which then checks whether that information meets its minimum requirements. So best ask at https://asteroidsathome.net/boinc/forum_index.php what their requirements are. As for showing 440.100 vs 440.10, I think that's the same number, don't you? |
Send message Joined: 9 Jul 20 Posts: 2 |
No, I do not think so. This driver has a 3-digit minor version number, and this can be important. When you look at the message https://boinc.berkeley.edu/forum_thread.php?id=13698&postid=98442, the driver has the version 390.132; for me that is not the same as the CUDA driver version 390.13. Version 440.10 is an older version than 440.100. |
Send message Joined: 29 Aug 05 Posts: 15541 |
Yes, all right, you're right, that matters. I'm checking the source code and see it's using a string of 81 characters for the driver version. Ought to be enough. The 7.9.3 you pointed at is an old BOINC version; your 7.16.6 is reasonably new. I'll put that one up the ladder: #3893. What remains is the driver requirement, and that's really done by the project, not by BOINC. |
Send message Joined: 5 Oct 06 Posts: 5121 |
I have a Linux install with - currently - GeForce GTX 1660 SUPER (driver version 440.82, CUDA version 10.2, compute capability 7.5). That accepted Asteroids work without any complaint about not being a current driver. I see I have driver 440.100 queued up ready for installation. At a convenient point later this afternoon, I'll update and see what happens. |
Send message Joined: 5 Oct 06 Posts: 5121 |
Confirmed. I now have

Thu 09 Jul 2020 14:17:47 BST | | Starting BOINC client version 7.17.0 for x86_64-pc-linux-gnu
Thu 09 Jul 2020 14:17:47 BST | | CUDA: NVIDIA GPU 0: GeForce GTX 1660 SUPER (driver version 440.10, CUDA version 10.2, compute capability 7.5, 4096MB, 3974MB available, 5153 GFLOPS peak)
Thu 09 Jul 2020 14:17:47 BST | | OpenCL: NVIDIA GPU 0: GeForce GTX 1660 SUPER (driver version 440.100, device version OpenCL 1.2 CUDA, 5943MB, 3974MB available, 5153 GFLOPS peak)
Thu 09 Jul 2020 14:40:12 BST | Asteroids@home | Sending scheduler request: To fetch work.
Thu 09 Jul 2020 14:40:12 BST | Asteroids@home | Requesting new tasks for NVIDIA GPU
Thu 09 Jul 2020 14:40:12 BST | Asteroids@home | [sched_op] NVIDIA GPU work request: 881.05 seconds; 1.00 devices
Thu 09 Jul 2020 14:40:13 BST | Asteroids@home | Scheduler request completed: got 0 new tasks
Thu 09 Jul 2020 14:40:13 BST | Asteroids@home | [sched_op] Server version 707
Thu 09 Jul 2020 14:40:13 BST | Asteroids@home | Message from server: NVIDIA GPU: Upgrade to the latest driver to process tasks using your computer's GPU

The same truncation of the CUDA version number, and the same refusal to issue new Asteroids tasks. However, a pre-downloaded task ran just fine with the new driver. And,

Thu 09 Jul 2020 14:41:16 BST | GPUGRID | [sched_op] NVIDIA GPU work request: 37886.32 seconds; 0.00 devices
Thu 09 Jul 2020 14:41:17 BST | GPUGRID | Scheduler request completed: got 2 new tasks

So the 'problem' appears to originate on the project server, when - presumably - the truncated version number is compared with the minimum version number specified. But we can't see that directly, at any project. Should we be able to?

Meanwhile, the project has yet more problems:

Thu 09 Jul 2020 14:35:32 BST | Asteroids@home | Started upload of ps_200624_input_98144_18_2_0
Thu 09 Jul 2020 14:35:34 BST | Asteroids@home | [error] Error reported by file upload server: can't open log file '../log_project1/file_upload_handler.log' (errno: 9)
Thu 09 Jul 2020 14:35:34 BST | Asteroids@home | Temporarily failed upload of ps_200624_input_98144_18_2_0: transient upload error |
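The truncation reported in this thread (440.100 shown as 440.10, and 390.132 shown as 390.13) is consistent with a version being stored as one integer with a fixed two-digit minor field. The Python sketch below only illustrates that hypothesis; it is not BOINC's code, the names `pack_two_digit_minor` and `format_packed` are made up for this example, and nothing in the thread confirms where the digit is actually lost.

```python
def pack_two_digit_minor(version: str) -> int:
    """Hypothetical packing: major * 100 + the first two digits of the minor part."""
    major, _, minor = version.partition(".")
    two_digits = (minor + "00")[:2]  # pad short minors, cut long ones (assumption)
    return int(major) * 100 + int(two_digits)


def format_packed(packed: int) -> str:
    """Render the packed integer back as 'major.minor' with a two-digit minor."""
    return f"{packed // 100}.{packed % 100:02d}"


# Driver versions mentioned in the thread.
for reported in ("440.82", "440.100", "390.132"):
    packed = pack_two_digit_minor(reported)
    print(reported, "->", packed, "->", format_packed(packed))
```

Under this assumption a two-digit minor such as 440.82 survives the round trip, while any three-digit minor loses its last digit and then looks like an older release.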
Send message Joined: 29 Aug 05 Posts: 15541 |
Quick update. It's apparently not easily solved. Vitalii Koshura, BOINC dev wrote: 2 of 3 platforms report driver version as an integer number. Third platform reports it as an integer represented as a string. |
Send message Joined: 5 Oct 06 Posts: 5121 |
I'm wondering what format he'll use to maintain compatibility, and still reconcile

https://boinc.berkeley.edu/trac/wiki/AppPlanSpec#GPUapps
https://github.com/BOINC/boinc/blob/master/sched/sched_send.cpp#L1103

<min_driver_version>x</min_driver_version>

if (version) { if (version < req.min_driver_version) { sprintf(buf, "%s: %s", rsc_name, _("Upgrade to the latest driver to process tasks using your computer's GPU")

Will version 44101 compare as greater or smaller than 440100? |
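The 44101 versus 440100 question can be made concrete. The sketch below assumes two hypothetical integer packings, one with a two-digit minor field and one with a three-digit minor field, and contrasts them with comparing parsed `(major, minor)` pairs. The function names are invented for illustration, and the scheduler's real encoding is not shown in this thread.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Split 'major.minor' into integers so each field compares numerically."""
    major, _, minor = version.partition(".")
    return int(major), int(minor or "0")


def pack(version: str, minor_digits: int) -> int:
    """Hypothetical packing: major followed by a fixed-width minor field."""
    major, minor = parse_version(version)
    if minor >= 10 ** minor_digits:
        raise ValueError(f"minor {minor} does not fit in {minor_digits} digits")
    return major * 10 ** minor_digits + minor


old_style = pack("441.01", 2)   # 44101, two-digit minor field
new_style = pack("440.100", 3)  # 440100, three-digit minor field

# Mixing the two encodings makes the newer driver look older.
print(old_style < new_style)

# Comparing parsed fields gives the expected ordering.
print(parse_version("441.01") < parse_version("440.100"))
print(parse_version("440.10") < parse_version("440.100"))
```

The point of the sketch is that a plain integer comparison only works if both sides use the same field width, so widening the minor field would also require re-encoding every stored minimum version.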
Send message Joined: 29 Aug 05 Posts: 15541 |
I wonder why Mac and Windows can read this as an integer while Linux reads it only as a string. I did see in the source code that it is read as a string (of 81 characters). |
Send message Joined: 29 Aug 05 Posts: 15541 |
I'm checking the Asteroids applications page, since that was the project that brought on this bug, and it seems they look for CUDA 10.2, which was detected correctly in the driver. But then they don't check for that number. If you check for CUDA 10.2, the driver version doesn't matter: the sched_request either sends CUDA 10.2 along if the driver is in compliance, or it doesn't when the driver is of an older CUDA version. And I wonder if the short-term fix isn't to use 450.57 ;-) |
Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.