Problem after NVIDIA driver update

Message boards : GPUs : Problem after NVIDIA driver update

Klaus0109
Joined: 9 Jul 20
Posts: 2
Germany
Message 99778 - Posted: 9 Jul 2020, 9:41:58 UTC

After updating the NVIDIA driver to the latest version (440.100), I get the following message from Asteroids@home:

Asteroids@home | Message from server: NVIDIA GPU: Upgrade to the latest driver to process tasks using your computer's GPU

When I look in the event log, I see the following entries for the driver version:

Starting BOINC client version 7.16.6 for x86_64-pc-linux-gnu
CUDA: NVIDIA GPU 0: GeForce RTX 2070 SUPER (driver version 440.10, CUDA version 10.2, compute capability 7.5, 4096MB, 3968MB available, 9216 GFLOPS peak)
CUDA: NVIDIA GPU 1 (not used): GeForce GTX 1060 6GB (driver version 440.10, CUDA version 10.2, compute capability 6.1, 4096MB, 3974MB available, 4568 GFLOPS peak)
OpenCL: NVIDIA GPU 0: GeForce RTX 2070 SUPER (driver version 440.100, device version OpenCL 1.2 CUDA, 7982MB, 3968MB available, 9216 GFLOPS peak)
OpenCL: NVIDIA GPU 1 (ignored by config): GeForce GTX 1060 6GB (driver version 440.100, device version OpenCL 1.2 CUDA, 6075MB, 3974MB available, 4568 GFLOPS peak)
OS: Linux Fedora: Fedora 32 (Workstation Edition) [5.7.6-201.fc32.x86_64|libc 2.31 (GNU libc)]

The driver version for OpenCL is correct.
The driver version for CUDA is cut off.

I updated the driver from the rpmfusion repository.

Is this a BOINC or an NVIDIA problem?
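
One mechanism that would produce exactly this symptom, offered as an assumption rather than a reading of the BOINC source: if the client parses the driver string as a floating-point number and scales it by 100 into the MMMmm integer format that BOINC's plan-class documentation uses for NVIDIA drivers, then "440.100" and "440.10" map to the same integer, and printing it back with two decimals gives 440.10. Both function names below are hypothetical.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// HYPOTHETICAL encoder, written only to reproduce the observed symptom;
// it is not taken from the BOINC client. It parses "MMM.mm..." as a
// double and scales it into an MMMmm integer (major * 100 + minor).
int encode_via_float(const std::string& driver) {
    double v = std::strtod(driver.c_str(), nullptr);  // "440.100" -> 440.1
    return static_cast<int>(v * 100.0 + 0.5);         // 440.1 -> 44010
}

// Formats an MMMmm integer back as "MMM.mm", as a log line might.
std::string format_mmmmm(int code) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%d.%02d", code / 100, code % 100);
    return std::string(buf);
}
```

With this encoding, encode_via_float("440.100") and encode_via_float("440.10") both return 44010, and format_mmmmm(44010) gives "440.10", matching the event log. A release with a two-digit minor, such as 440.82, round-trips unchanged.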

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15477
Netherlands
Message 99782 - Posted: 9 Jul 2020, 10:29:49 UTC - in response to Message 99778.  
Last modified: 9 Jul 2020, 10:31:31 UTC

It's an Asteroids problem and you'll have to report it on their forums. BOINC doesn't use your GPU; that's done by the project's science applications. All BOINC does is detect what GPU you have and what its capabilities are, and send that to the project, which then checks whether that information meets its minimum requirements. So it's best to ask at https://asteroidsathome.net/boinc/forum_index.php what their requirements are.

As for showing 440.100 vs 440.10, I think that's the same number, don't you?

Klaus0109
Joined: 9 Jul 20
Posts: 2
Germany
Message 99791 - Posted: 9 Jul 2020, 11:58:15 UTC - in response to Message 99782.  

No, I do not think so.

This driver has a 3-digit minor version number, and that can be important.

If you look at message https://boinc.berkeley.edu/forum_thread.php?id=13698&postid=98442 you'll see the driver has version 390.132; to me that is not the same as the CUDA driver version 390.13.

Version 440.10 is older than 440.100.
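
Klaus's point can be checked directly: read as decimal fractions, 440.10 and 440.100 are equal, but as version numbers the minor field is a separate integer (10 versus 100). A minimal sketch of a field-by-field comparison, assuming plain "major.minor" strings; `parse_nvidia_version` and `compare_versions` are hypothetical helpers, not BOINC functions.

```cpp
#include <cstdio>
#include <string>
#include <utility>

using Version = std::pair<int, int>;  // (major, minor)

// Hypothetical parser: reads "major.minor" as two separate integers,
// so "440.100" gives minor = 100 instead of collapsing to 440.1.
// Returns false if the string does not start with "<int>.<int>".
bool parse_nvidia_version(const std::string& s, Version& out) {
    int major = 0, minor = 0;
    if (std::sscanf(s.c_str(), "%d.%d", &major, &minor) != 2) return false;
    out = Version(major, minor);
    return true;
}

// strcmp-style result: negative if a is older, 0 if equal, positive if newer.
// std::pair compares lexicographically: major first, then minor.
int compare_versions(const Version& a, const Version& b) {
    if (a < b) return -1;
    if (b < a) return 1;
    return 0;
}
```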

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15477
Netherlands
Message 99794 - Posted: 9 Jul 2020, 12:24:29 UTC - in response to Message 99791.  
Last modified: 9 Jul 2020, 12:30:49 UTC

Yes, all right, you're right: that matters.
I'm checking the source code and see it reads the driver version into a string of 81 characters. That ought to be enough.

The 7.9.3 you pointed at is an old BOINC version; your 7.16.6 is reasonably new. I'll pass this one up the ladder: #3893
What remains is the driver requirement and that's really done by the project, not by BOINC.

Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5077
United Kingdom
Message 99797 - Posted: 9 Jul 2020, 12:35:33 UTC

I have a Linux install currently running a GeForce GTX 1660 SUPER (driver version 440.82, CUDA version 10.2, compute capability 7.5). That accepted Asteroids work without any complaint about not being a current driver.

I see I have driver 440.100 queued up ready for installation. At a convenient point later this afternoon, I'll update and see what happens.

Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5077
United Kingdom
Message 99799 - Posted: 9 Jul 2020, 13:53:13 UTC

Confirmed. I now have

Thu 09 Jul 2020 14:17:47 BST |  | Starting BOINC client version 7.17.0 for x86_64-pc-linux-gnu
Thu 09 Jul 2020 14:17:47 BST |  | CUDA: NVIDIA GPU 0: GeForce GTX 1660 SUPER (driver version 440.10, CUDA version 10.2, compute capability 7.5, 4096MB, 3974MB available, 5153 GFLOPS peak)
Thu 09 Jul 2020 14:17:47 BST |  | OpenCL: NVIDIA GPU 0: GeForce GTX 1660 SUPER (driver version 440.100, device version OpenCL 1.2 CUDA, 5943MB, 3974MB available, 5153 GFLOPS peak)
Thu 09 Jul 2020 14:40:12 BST | Asteroids@home | Sending scheduler request: To fetch work.
Thu 09 Jul 2020 14:40:12 BST | Asteroids@home | Requesting new tasks for NVIDIA GPU
Thu 09 Jul 2020 14:40:12 BST | Asteroids@home | [sched_op] NVIDIA GPU work request: 881.05 seconds; 1.00 devices
Thu 09 Jul 2020 14:40:13 BST | Asteroids@home | Scheduler request completed: got 0 new tasks
Thu 09 Jul 2020 14:40:13 BST | Asteroids@home | [sched_op] Server version 707
Thu 09 Jul 2020 14:40:13 BST | Asteroids@home | Message from server: NVIDIA GPU: Upgrade to the latest driver to process tasks using your computer's GPU

The same truncation of the driver version on the CUDA line, and the same refusal to issue new Asteroids tasks.

However, a pre-downloaded task ran just fine with the new driver. And,

Thu 09 Jul 2020 14:41:16 BST | GPUGRID | [sched_op] NVIDIA GPU work request: 37886.32 seconds; 0.00 devices
Thu 09 Jul 2020 14:41:17 BST | GPUGRID | Scheduler request completed: got 2 new tasks

So the 'problem' appears to originate on the project server, presumably when the truncated version number is compared with the specified minimum version. But we can't see that minimum directly, at any project. Should we be able to?
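
The comparison Richard suspects can be sketched with the check quoted later in this thread from sched_send.cpp (`if (version) { if (version < req.min_driver_version) ...`). The struct below is a hypothetical stand-in, and the threshold 44033 (440.33, which NVIDIA lists as the minimum Linux driver for CUDA 10.2) is an assumed value: Asteroids@home's real minimum isn't visible from the client side.

```cpp
#include <string>

// HYPOTHETICAL stand-in for a plan-class requirement; in BOINC the value
// comes from <min_driver_version> in the project's plan class spec.
struct PlanClassReq {
    int min_driver_version;  // NVIDIA format MMMmm, e.g. 44033 = 440.33
};

// Returns an empty string if the host qualifies, otherwise the message the
// scheduler would send back. As in the quoted check, a reported version of
// 0 (unknown) skips the test entirely.
std::string check_driver(int version, const PlanClassReq& req) {
    if (version && version < req.min_driver_version) {
        return "NVIDIA GPU: Upgrade to the latest driver to process tasks "
               "using your computer's GPU";
    }
    return "";
}
```

Under these assumptions, 440.82 (44082) passes, while 440.100 arriving truncated as 44010 fails, which matches both event logs.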

Meanwhile, the project has yet more problems:

Thu 09 Jul 2020 14:35:32 BST | Asteroids@home | Started upload of ps_200624_input_98144_18_2_0
Thu 09 Jul 2020 14:35:34 BST | Asteroids@home | [error] Error reported by file upload server: can't open log file '../log_project1/file_upload_handler.log' (errno: 9)
Thu 09 Jul 2020 14:35:34 BST | Asteroids@home | Temporarily failed upload of ps_200624_input_98144_18_2_0: transient upload error

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15477
Netherlands
Message 99812 - Posted: 9 Jul 2020, 16:39:48 UTC

Quick update. It's apparently not easily solved.

Vitalii Koshura, BOINC dev wrote:
2 of 3 platforms report the driver version as an integer. The third platform reports it as an integer represented as a string.
So there's basically no information on how to interpret this integer as a floating-point number.

I can make a fix to set the proper value in the request, but it will still show an incorrect version (two digits after the point instead of three) in the log....

Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5077
United Kingdom
Message 99817 - Posted: 9 Jul 2020, 17:12:37 UTC

I'm wondering what format he'll use to maintain compatibility while still reconciling these two:

https://boinc.berkeley.edu/trac/wiki/AppPlanSpec#GPUapps
https://github.com/BOINC/boinc/blob/master/sched/sched_send.cpp#L1103

<min_driver_version>x</min_driver_version>
Minimum display driver version. AMD driver versions are represented as MMmmRRRR. NVIDIA driver versions are represented as MMMmm.

        if (version) {
            if (version < req.min_driver_version) {
                sprintf(buf,
                    "%s: %s",
                    rsc_name,
                    _("Upgrade to the latest driver to process tasks using your computer's GPU")

Is version 44101 going to be greater or smaller than 440100?
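
Compared as plain integers, 44101 is smaller than 440100. The sketch below assumes, purely for illustration, a widened MMMmmm encoding (major * 1000 + minor) alongside the documented MMMmm one; neither helper is BOINC code. It shows two failure modes of mixing formats: a newer driver in the old format can look older, and the old format cannot hold a three-digit minor without colliding with the next major release.

```cpp
// Documented NVIDIA plan-class format: MMMmm (two-digit minor).
constexpr int encode_mmmmm(int major, int minor) { return major * 100 + minor; }

// HYPOTHETICAL wider format: MMMmmm (three-digit minor), enough for 440.100.
constexpr int encode_mmmmmm(int major, int minor) { return major * 1000 + minor; }

// 441.01 in the old format vs 440.100 in the new one: the newer driver loses.
static_assert(encode_mmmmm(441, 1) < encode_mmmmmm(440, 100), "44101 < 440100");

// 440.100 squeezed into the old format collides with 441.00.
static_assert(encode_mmmmm(440, 100) == encode_mmmmm(441, 0), "both are 44100");

// Within one consistent format the ordering is correct again.
static_assert(encode_mmmmmm(441, 1) > encode_mmmmmm(440, 100), "441001 > 440100");
```

The same mixing also lets any driver reported in the wider format pass every existing MMMmm threshold: 390.132 becomes 390132, which exceeds a threshold such as 44033.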

Jord
Volunteer tester
Help desk expert
Send message
Joined: 29 Aug 05
Posts: 15477
Netherlands
Message 99821 - Posted: 9 Jul 2020, 17:50:49 UTC - in response to Message 99817.  

I wondered why Mac and Windows can read this as an integer but Linux only as a string. I did see in the source code that it reads it as a string (of 81 characters).

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15477
Netherlands
Message 99830 - Posted: 9 Jul 2020, 20:07:18 UTC

I'm checking the Asteroids applications page, since that was the project that brought on this bug, and it seems they look for CUDA 10.2. That was detected correctly from the driver, but they don't check for that number. If you check for CUDA 10.2, the driver version doesn't matter: the sched_request either sends CUDA 10.2 along if the driver complies, or it doesn't if the driver only supports an older CUDA version.
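
Jord's suggestion of keying on the CUDA version instead can be sketched as follows. The CUDA driver API (cuDriverGetVersion) reports its version as 1000 * major + 10 * minor, so CUDA 10.2 is 10020, and CUDA 10.2 was detected correctly in both event logs above even while the driver number was truncated. The function names are hypothetical, and this is not Asteroids@home's actual plan-class logic.

```cpp
// Encodes a CUDA version the way the CUDA driver API reports it:
// 1000 * major + 10 * minor, e.g. 10.2 -> 10020.
constexpr int cuda_version_code(int major, int minor) {
    return 1000 * major + 10 * minor;
}

// HYPOTHETICAL plan-class test keyed on CUDA capability rather than on the
// display driver number, so a mis-parsed driver string has no effect.
bool host_supports_cuda(int reported_cuda_version, int req_major, int req_minor) {
    return reported_cuda_version >= cuda_version_code(req_major, req_minor);
}
```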

And I wonder whether the quick fix isn't simply to use driver 450.57 ;-)

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.