Specifications for NVidia RTX 30x0 range?

Author	Message
Richard Haselgrove Volunteer tester Help desk expert Send message Joined: 5 Oct 06 Posts: 5103	Message 100797 - Posted: 21 Sep 2020, 10:58:59 UTC May I ask you hardware enthusiasts to double-check my thoughts on running BOINC on the new RTX 3070/3080/3090 range? I've been studying the NVIDIA A100 Tensor Core GPU Architecture and NVIDIA Ampere GA102 GPU Architecture whitepapers. BOINC uses the number of CUDA cores per SM, and a flops multiplier, to estimate the GPU's peak speed. I'm getting that the GA102 (and above, but not the A100) benefits from both an increase from 64 to 128 cores per SM, and the ability to process two FP32 streams concurrently. So I think that the current v7.16.11 BOINC client will rate the new cards at one-quarter of the flops reported by other tools. Can anybody confirm that? If it's true, I'll code a patch for the next release of BOINC. ID: 100797 ·

Keith Myers Volunteer tester Help desk expert Send message Joined: 17 Nov 16 Posts: 879	Message 100807 - Posted: 21 Sep 2020, 18:14:48 UTC - in response to Message 100797. I'll ask in the OCN forums for anyone who actually has managed to snag a 3080. Not many have. The ones that have are shipping them off to gpu block manufacturers for measurements for new blocks in return for a free block. So all they did was verify that the card was not a dud and shipped them off. No chance of asking whether anyone actually ran a BOINC project on it yet. ID: 100807 ·

ProDigit Send message Joined: 8 Nov 19 Posts: 718	Message 100816 - Posted: 21 Sep 2020, 22:57:06 UTC Last modified: 21 Sep 2020, 22:57:56 UTC I guess The Collatz Conjecture might make good use of these GPUs. Its tasks generally load the GPU (a 2080 Ti) to 100% on my systems, and a 2080 Ti should sit just under a 3080 in terms of performance. ID: 100816 ·

Richard Haselgrove Volunteer tester Help desk expert Send message Joined: 5 Oct 06 Posts: 5103	Message 100820 - Posted: 22 Sep 2020, 12:12:46 UTC - in response to Message 100816. The Collatz comparison doesn't really help to answer my question. Collatz will be especially well served by the previous Volta and Turing chipsets, because of their additional, independent, pathway for INT32 calculations. The Ampere chipset makes that extra pathway available for FP32 calculations too, which makes it more widely suitable for the type of research that BOINC is designed to support. ID: 100820 ·

Keith Myers Volunteer tester Help desk expert Send message Joined: 17 Nov 16 Posts: 879	Message 100823 - Posted: 22 Sep 2020, 16:19:22 UTC Last modified: 22 Sep 2020, 16:20:42 UTC I found an RTX 3080 running Einstein on the Gamma Ray and Gravity Wave gpu applications. https://einsteinathome.org/host/12850228/tasks/0/40 I've already sent them a PM asking for the Event Log startup lines with the gpu detection output. The host is also running the stock Windows apps, so my question about whether a new app would be needed got answered. Seems the apps developed for the Turing-Volta transition still work. The host ran the GR tasks in 360 seconds compared to 460 seconds for my RTX 2080. So roughly 28% faster (460/360 is about 1.28) on the GR tasks compared to Turing. ID: 100823 ·

Richard Haselgrove Volunteer tester Help desk expert Send message Joined: 5 Oct 06 Posts: 5103	Message 100824 - Posted: 22 Sep 2020, 17:14:27 UTC - in response to Message 100823. Thanks for that - I've looked through some of the logs, and it all checks out (like the 10 GB VRAM shown in stderr.txt, against the 4 GB recognised by BOINC. That's for another day.) Remember that this is an OpenCL app, and relies on the efficiency of the OpenCL translation layer in using the new CUDA functions. And it wastes something like 14 seconds at the end on pure CPU work - so we can't put too much reliance on the speed ratio. Also, the host only ran for one day - looks like a burn-in test. Now, if we could just find a CUDA example at GPUGrid... ID: 100824 ·

Keith Myers Volunteer tester Help desk expert Send message Joined: 17 Nov 16 Posts: 879	Message 100826 - Posted: 22 Sep 2020, 18:33:18 UTC - in response to Message 100824. Last modified: 22 Sep 2020, 18:48:21 UTC Well I fully expected the existing app to fail. It failed on the Pascal >> Turing transition. OpenCL app. The OpenCL layer didn't handle the change in the CUDA core to SM count and the CC capability was out of range. You still don't have the gpu detection output for the BOINC calculated GFLOPs rating you need. Yes, they also ran for a day at Milkyway. Not so impressive there. Twice as fast at Primegrid compared to a RTX 2080 Ti running a CUDA app. https://www.primegrid.com/results.php?hostid=1023381 ID: 100826 ·

Richard Haselgrove Volunteer tester Help desk expert Send message Joined: 5 Oct 06 Posts: 5103	Message 100827 - Posted: 22 Sep 2020, 19:23:24 UTC - in response to Message 100826. "Twice as fast at Primegrid compared to a RTX 2080 Ti running a CUDA app." That's more surprising. I'd have expected the speed increase to be less, because the RTX 2080 can use its INT32 pathway, which it (probably - I'm not fully knowledgeable on Einstein's maths) couldn't use at Einstein. ID: 100827 ·

Keith Myers Volunteer tester Help desk expert Send message Joined: 17 Nov 16 Posts: 879	Message 100828 - Posted: 22 Sep 2020, 19:27:20 UTC - in response to Message 100827. I'm thinking it is because the Primegrid app is CUDA and not OpenCL so probably more optimized for the architecture. Really want to find one on GPUGrid. ID: 100828 ·

Keith Myers Volunteer tester Help desk expert Send message Joined: 17 Nov 16 Posts: 879	Message 100830 - Posted: 22 Sep 2020, 21:42:15 UTC Also not a card family to deploy at Milkyway. Nvidia cut the FP64 rate in half again, from 1:32 to 1:64 of the FP32 rate. Still trying to force the compute consumer to their pro line of products ($$). ID: 100830 ·

Richard Haselgrove Volunteer tester Help desk expert Send message Joined: 5 Oct 06 Posts: 5103	Message 100834 - Posted: 23 Sep 2020, 11:04:44 UTC I've received reliable reports that BOINC shows 14,884 GFLOPS peak for the RTX 3080, and SIV shows 29,768 - exactly double. Since we use different API calls for getting the shader count, that'll be the difference - SIV will be right, and us wrong. That leaves the question of the doubled FP32 pipeline unresolved. That may require direct experimentation on the hardware - I'm thinking possibly including running two tasks in parallel. ID: 100834 ·
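
The two figures quoted above can be reproduced with a short calculation. This is a minimal sketch of the "cores per SM times a flops multiplier" estimate described at the start of the thread, not BOINC's actual code. The function name `peak_gflops` is hypothetical, and the 68 SMs and 1710 MHz boost clock are NVIDIA's published RTX 3080 figures rather than values taken from this thread.

```python
def peak_gflops(sm_count, cores_per_sm, clock_mhz, flops_per_core_per_clock=2):
    """Peak FP32 estimate: SMs x cores per SM x flops per core per clock x clock.

    Hypothetical helper for illustration only. The multiplier of 2 assumes
    one fused multiply-add (two flops) per core per clock.
    """
    return sm_count * cores_per_sm * flops_per_core_per_clock * clock_mhz / 1000.0


# Assumed RTX 3080 figures (not from the thread): 68 SMs, 1710 MHz boost clock.
SM_COUNT = 68
BOOST_MHZ = 1710

old_estimate = peak_gflops(SM_COUNT, 64, BOOST_MHZ)   # 64 cores per SM
new_estimate = peak_gflops(SM_COUNT, 128, BOOST_MHZ)  # 128 cores per SM

print(f"64 cores/SM:  {old_estimate:,.0f} GFLOPS")
print(f"128 cores/SM: {new_estimate:,.0f} GFLOPS")
print(f"ratio: {new_estimate / old_estimate:.1f}")
```

Under those assumptions, the 64-cores-per-SM case rounds to the 14,884 GFLOPS reported for BOINC and the 128-cores-per-SM case rounds to the 29,768 GFLOPS reported for SIV, which is consistent with the gap coming from the shader count alone.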

Keith Myers Volunteer tester Help desk expert Send message Joined: 17 Nov 16 Posts: 879	Message 100838 - Posted: 23 Sep 2020, 20:11:30 UTC - in response to Message 100834. "I've received reliable reports that BOINC shows 14,884 GFLOPS peak for the RTX 3080, and SIV shows 29,768 - exactly double. Since we use different API calls for getting the shader count, that'll be the difference - SIV will be right, and us wrong. That leaves the question of the doubled FP32 pipeline unresolved. That may require direct experimentation on the hardware - I'm thinking possibly including running two tasks in parallel." How do you want to set that experiment up? What parameters are you looking for? ID: 100838 ·

Richard Haselgrove Volunteer tester Help desk expert Send message Joined: 5 Oct 06 Posts: 5103	Message 100839 - Posted: 23 Sep 2020, 21:06:32 UTC - in response to Message 100838. "How do you want to set that experiment up? What parameters are you looking for?" This is just for baseline BOINC users, not fancy optimisers. Ideally a single 30x0 card, in a host with plenty of power and cooling (so nothing gets throttled). Run a known - preferably CUDA - app for long enough to get a good idea of performance. Slap in an app_config.xml file with <gpu_usage>.5</gpu_usage>, and record what happens. ID: 100839 ·

Keith Myers Volunteer tester Help desk expert Send message Joined: 17 Nov 16 Posts: 879	Message 100841 - Posted: 23 Sep 2020, 22:40:38 UTC - in response to Message 100839. Last modified: 23 Sep 2020, 22:56:23 UTC "How do you want to set that experiment up? What parameters are you looking for?" "This is just for baseline BOINC users, not fancy optimisers. Ideally a single 30x0 card, in a host with plenty of power and cooling (so nothing gets throttled). Run a known - preferably CUDA - app for long enough to get a good idea of performance. Slap in an app_config.xml file with <gpu_usage>.5</gpu_usage>, and record what happens." Ok, I will ask Till to run his RTX 3080 at Primegrid with an app_config with 0.5 gpu usage. That is a CUDA application. ID: 100841 ·

Keith Myers Volunteer tester Help desk expert Send message Joined: 17 Nov 16 Posts: 879	Message 100842 - Posted: 24 Sep 2020, 6:44:45 UTC Till tried an app_config.xml but reports that tasks are still running one at a time. This is his file.
<app_config>
  <app>
    <name>pps_sr2sieve</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
    </gpu_versions>
  </app>
</app_config>
I don't see anything wrong with the syntax. He reports no errors on startup and the app_config is read. It just doesn't run tasks as doubles. I found posts in Primegrid/Number Crunching that show times running doubles with the PPSieve app, so it should work. ID: 100842 ·

Richard Haselgrove Volunteer tester Help desk expert Send message Joined: 5 Oct 06 Posts: 5103	Message 100843 - Posted: 24 Sep 2020, 7:02:04 UTC - in response to Message 100842. Might be wise to throw in a <cpu_usage> line for completeness. It's not marked as optional in the manual. ID: 100843 ·
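
For anyone repeating the experiment, here is a minimal sketch that generates the same app_config.xml with the <cpu_usage> line included. The function name `build_app_config` is hypothetical, and the cpu_usage value of 1.0 is an assumption; the thread does not say which value was used.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, gpu_usage, cpu_usage):
    """Build an app_config.xml string with both usage lines under <gpu_versions>.

    Hypothetical helper for illustration; the element names are the ones
    shown in the file posted above, plus <cpu_usage>.
    """
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    versions = ET.SubElement(app, "gpu_versions")
    ET.SubElement(versions, "gpu_usage").text = str(gpu_usage)
    ET.SubElement(versions, "cpu_usage").text = str(cpu_usage)
    return ET.tostring(root, encoding="unicode")


# gpu_usage 0.5 asks for two tasks per GPU; cpu_usage 1.0 is an assumed value.
xml_text = build_app_config("pps_sr2sieve", 0.5, 1.0)
print(xml_text)
```

The output would be saved as app_config.xml in the project's directory and picked up when the client re-reads its config files.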

Keith Myers Volunteer tester Help desk expert Send message Joined: 17 Nov 16 Posts: 879	Message 100850 - Posted: 24 Sep 2020, 19:49:30 UTC Till responds: With the cpu_usage added it starts working immediately... BOINC is sometimes a strange piece of software. CPU utilization rises to 100%, power-limited at 320W. Computation time rises from 80 s (1WU) to 151 s (2WU). So doesn't look like the app is using the second FP32 pipeline. So responding like previous generations. Typically not exactly double the crunch times. So the card might be slightly more productive doing doubles compared to singles at the expense of using a lot more power. ID: 100850 ·
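
The "slightly more productive" conclusion can be checked from the two timings Till reported. This is a minimal sketch, assuming the 151 s figure is the wall-clock time for a pair of tasks running side by side; `tasks_per_hour` is a hypothetical helper name.

```python
def tasks_per_hour(batch_seconds, tasks_in_batch):
    """Throughput when `tasks_in_batch` tasks finish together every `batch_seconds`."""
    return tasks_in_batch * 3600.0 / batch_seconds


single = tasks_per_hour(80, 1)    # one task at a time, 80 s each
double = tasks_per_hour(151, 2)   # two tasks at once, 151 s per pair (assumed)
gain = double / single - 1.0

print(f"singles: {single:.1f} tasks/hour")
print(f"doubles: {double:.1f} tasks/hour")
print(f"throughput gain: {gain:.1%}")
```

On these numbers the gain from running doubles is about 6%, which is in line with the small improvement seen on earlier generations rather than the near-doubling a second, otherwise idle FP32 pipeline might suggest.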

MarkJ Volunteer tester Help desk expert Send message Joined: 5 Mar 08 Posts: 272	Message 100852 - Posted: 25 Sep 2020, 4:03:05 UTC - in response to Message 100850. Last modified: 25 Sep 2020, 4:04:12 UTC "So doesn't look like the app is using the second FP32 pipeline. So responding like previous generations. Typically not exactly double the crunch times. So the card might be slightly more productive doing doubles compared to singles at the expense of using a lot more power." Maybe it needs to be recompiled with the latest CUDA toolkit to take advantage of the additional pipeline. CUDA 11.1 was just released with support for RTX 30 series cards, as reported in a Phoronix article. ID: 100852 ·

Keith Myers Volunteer tester Help desk expert Send message Joined: 17 Nov 16 Posts: 879	Message 100860 - Posted: 25 Sep 2020, 22:39:00 UTC Looks like we will need new apps for Ampere at GPUGrid. Tasks fail on A100 cards with an nvrtc error about an unknown arch. ID: 100860 ·

Richard Haselgrove Volunteer tester Help desk expert Send message Joined: 5 Oct 06 Posts: 5103	Message 100864 - Posted: 26 Sep 2020, 7:03:50 UTC - in response to Message 100860. Found the thread, and saw the error message in the results. Yup, that's a show-stopper, even though the A100 card is only cc8.0. Meanwhile, I've submitted #4031 to deal with the flops display. ID: 100864 ·

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.