Thread 'Specifications for NVidia RTX 30x0 range?'

Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5121
United Kingdom
Message 100797 - Posted: 21 Sep 2020, 10:58:59 UTC

May I ask you hardware enthusiasts to double-check my thoughts on running BOINC on the new RTX 3070/3080/3090 range?

I've been studying the NVIDIA A100 Tensor Core GPU Architecture and NVIDIA Ampere GA102 GPU Architecture whitepapers.

BOINC uses the number of CUDA cores per SM, and a flops multiplier, to estimate the GPU's peak speed. My reading is that the GA102 (and the other GA10x chips, but not the A100) benefits both from an increase from 64 to 128 CUDA cores per SM and from the ability to process two FP32 streams concurrently.

So I think that the current v7.16.11 BOINC client will rate the new cards at one-quarter of the flops reported by other tools.

Can anybody confirm that? If it's true, I'll code a patch for the next release of BOINC.
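
To show where I think the numbers come from, here's a minimal sketch - not BOINC's actual source, just the kind of cores-per-SM lookup the estimate depends on, with the per-architecture figures taken from those whitepapers:

// Sketch only: CUDA cores per SM, keyed on compute capability (major, minor).
// GA10x (RTX 30x0, CC 8.6) doubles the FP32 cores per SM; GA100 (A100, CC 8.0) does not.
int cuda_cores_per_sm(int major, int minor) {
    switch (major) {
    case 3: return 192;                       // Kepler
    case 5: return 128;                       // Maxwell
    case 6: return (minor == 0) ? 64 : 128;   // Pascal: GP100 vs GP10x
    case 7: return 64;                        // Volta and Turing
    case 8: return (minor == 0) ? 64 : 128;   // GA100 (A100) vs GA10x (RTX 30x0)
    default: return 64;                       // conservative fallback
    }
}

If the client still returns 64 for CC 8.6 devices, that alone is a factor of two in the peak estimate; whether the concurrent FP32 streams add a further factor is exactly the question above.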
ID: 100797

Keith Myers
Volunteer tester
Help desk expert
Joined: 17 Nov 16
Posts: 885
United States
Message 100807 - Posted: 21 Sep 2020, 18:14:48 UTC - in response to Message 100797.  

I'll ask in the OCN forums whether anyone has actually managed to snag a 3080. Not many have. The ones who have are shipping them off to GPU water-block manufacturers to be measured for new blocks, in return for a free block. So all they did was verify the card wasn't a dud before shipping it off. No chance yet of asking whether anyone has actually run a BOINC project on one.
ID: 100807

ProDigit
Joined: 8 Nov 19
Posts: 718
United States
Message 100816 - Posted: 21 Sep 2020, 22:57:06 UTC
Last modified: 21 Sep 2020, 22:57:56 UTC

I guess the Collatz Conjecture project might make good use of these GPUs. Its tasks generally load my 2080 Ti GPUs to 100%, and a 2080 Ti should sit just under a 3080 in terms of performance.
ID: 100816

Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5121
United Kingdom
Message 100820 - Posted: 22 Sep 2020, 12:12:46 UTC - in response to Message 100816.  

The Collatz comparison doesn't really help to answer my question. Collatz will be especially well served by the previous Volta and Turing chipsets, because of their additional, independent, pathway for INT32 calculations. The Ampere chipset makes that extra pathway available for FP32 calculations too, which makes it more widely suitable for the type of research that BOINC is designed to support.
ID: 100820

Keith Myers
Volunteer tester
Help desk expert
Joined: 17 Nov 16
Posts: 885
United States
Message 100823 - Posted: 22 Sep 2020, 16:19:22 UTC
Last modified: 22 Sep 2020, 16:20:42 UTC

I found an RTX 3080 running Einstein on the Gamma-Ray and Gravitational Wave GPU applications. https://einsteinathome.org/host/12850228/tasks/0/40

I've already sent them a PM asking for the GPU-detection output from their Event Log at startup. They're also running the stock Windows apps, so my question about whether a new app would be needed is answered: the apps developed for the Volta/Turing transition still seem to work.

The host ran the GR tasks in 360 seconds compared to my 460 seconds for my RTX 2080. So 27% faster on the GR tasks compared to Turing.
ID: 100823

Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5121
United Kingdom
Message 100824 - Posted: 22 Sep 2020, 17:14:27 UTC - in response to Message 100823.  

Thanks for that - I've looked through some of the logs, and it all checks out (including the 10 GB VRAM shown in stderr.txt against the 4 GB recognised by BOINC - that one's for another day).

Remember that this is an OpenCL app, and it relies on the efficiency of the OpenCL translation layer in using the new CUDA functions. It also wastes something like 14 seconds at the end on pure CPU work, so we can't read too much into the speed ratio. And the host only ran for one day - it looks like a burn-in test.

Now, if we could just find a CUDA example at GPUGrid...
ID: 100824

Keith Myers
Volunteer tester
Help desk expert
Joined: 17 Nov 16
Posts: 885
United States
Message 100826 - Posted: 22 Sep 2020, 18:33:18 UTC - in response to Message 100824.  
Last modified: 22 Sep 2020, 18:48:21 UTC

Well, I fully expected the existing app to fail - it failed on the Pascal-to-Turing transition. That was an OpenCL app too: the OpenCL layer didn't handle the change in the CUDA-cores-per-SM count, and the compute capability was out of range.

You still don't have the GPU-detection output with the BOINC-calculated GFLOPS rating you need.

Yes, they also ran for a day at Milkyway. Not so impressive there.

Twice as fast at Primegrid compared to an RTX 2080 Ti running a CUDA app.

https://www.primegrid.com/results.php?hostid=1023381
ID: 100826

Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5121
United Kingdom
Message 100827 - Posted: 22 Sep 2020, 19:23:24 UTC - in response to Message 100826.  

Twice as fast at Primegrid compared to an RTX 2080 Ti running a CUDA app.
That's more surprising. I'd have expected the speed increase to be smaller, because the RTX 2080 can use its separate INT32 pathway there, which it (probably - I'm not fully up on Einstein's maths) couldn't use at Einstein.
ID: 100827

Keith Myers
Volunteer tester
Help desk expert
Joined: 17 Nov 16
Posts: 885
United States
Message 100828 - Posted: 22 Sep 2020, 19:27:20 UTC - in response to Message 100827.  

I'm thinking it's because the Primegrid app is CUDA rather than OpenCL, so it's probably better optimized for the architecture. I'd really like to find one running at GPUGrid.
ID: 100828

Keith Myers
Volunteer tester
Help desk expert
Joined: 17 Nov 16
Posts: 885
United States
Message 100830 - Posted: 22 Sep 2020, 21:42:15 UTC

It's also not a card family to deploy at Milkyway: Nvidia halved the FP64 ratio again, from 1:32 down to 1:64. They're still trying to push compute customers towards their pro line of products ($$).
ID: 100830

Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5121
United Kingdom
Message 100834 - Posted: 23 Sep 2020, 11:04:44 UTC

I've received reliable reports that BOINC shows 14,884 GFLOPS peak for the RTX 3080, while SIV shows 29,768 - exactly double. Since we use different API calls for getting the shader count, that'll be the difference - SIV will be right, and we'll be wrong.

That leaves the question of the doubled FP32 pipeline unresolved. That may require direct experimentation on the hardware - possibly by running two tasks in parallel.
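
For the record, those two figures are consistent with the published RTX 3080 specification (68 SMs, boost clock around 1.71 GHz) and a cores-per-SM value of 64 versus 128 - a quick back-of-envelope check, not a dump of BOINC's internals:

#include <cstdio>

int main() {
    const int sms = 68;        // RTX 3080 streaming multiprocessors
    const double ghz = 1.71;   // published boost clock
    // One fused multiply-add counts as 2 FLOPs per core per clock.
    std::printf("64 cores/SM:  %.0f GFLOPS\n",  64 * sms * ghz * 2.0);   // ~14,884 - BOINC's figure
    std::printf("128 cores/SM: %.0f GFLOPS\n", 128 * sms * ghz * 2.0);   // ~29,768 - SIV's figure
    return 0;
}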
ID: 100834

Keith Myers
Volunteer tester
Help desk expert
Joined: 17 Nov 16
Posts: 885
United States
Message 100838 - Posted: 23 Sep 2020, 20:11:30 UTC - in response to Message 100834.  

I've received reliable reports that BOINC shows 14,884 GFLOPS peak for the RTX 3080, while SIV shows 29,768 - exactly double. Since we use different API calls for getting the shader count, that'll be the difference - SIV will be right, and we'll be wrong.

That leaves the question of the doubled FP32 pipeline unresolved. That may require direct experimentation on the hardware - possibly by running two tasks in parallel.

How do you want to set that experiment up? What parameters are you looking for?
ID: 100838

Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5121
United Kingdom
Message 100839 - Posted: 23 Sep 2020, 21:06:32 UTC - in response to Message 100838.  

How do you want to set that experiment up? What parameters are you looking for?
This is just for baseline BOINC users, not fancy optimisers. Ideally a single 30x0 card, in a host with plenty of power and cooling (so nothing gets throttled). Run a known - preferably CUDA - app for long enough to get a good idea of performance. Slap in an app_config.xml file with <gpu_usage>.5</gpu_usage>, and record what happens.
ID: 100839

Keith Myers
Volunteer tester
Help desk expert
Joined: 17 Nov 16
Posts: 885
United States
Message 100841 - Posted: 23 Sep 2020, 22:40:38 UTC - in response to Message 100839.  
Last modified: 23 Sep 2020, 22:56:23 UTC

How do you want to set that experiment up? What parameters are you looking for?
This is just for baseline BOINC users, not fancy optimisers. Ideally a single 30x0 card, in a host with plenty of power and cooling (so nothing gets throttled). Run a known - preferably CUDA - app for long enough to get a good idea of performance. Slap in an app_config.xml file with <gpu_usage>.5</gpu_usage>, and record what happens.

Ok, I will ask Till to run his RTX 3080 at Primegrid with an app_config with 0.5 gpu usage. That is a CUDA application.
ID: 100841

Keith Myers
Volunteer tester
Help desk expert
Joined: 17 Nov 16
Posts: 885
United States
Message 100842 - Posted: 24 Sep 2020, 6:44:45 UTC

Till ran an app_config.xml but reports it is still running tasks as singles. This is his file:

<app_config>
   <app>
      <name>pps_sr2sieve</name>
      <gpu_versions>
         <gpu_usage>0.5</gpu_usage>
      </gpu_versions>
   </app>
</app_config>

I don't see anything wrong with the syntax. He reports no errors on startup, and the app_config is read - it just doesn't run tasks as doubles.

I found posts in Primegrid's Number Crunching forum that show times for running doubles with the PPSieve app, so it should work.
ID: 100842

Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5121
United Kingdom
Message 100843 - Posted: 24 Sep 2020, 7:02:04 UTC - in response to Message 100842.  

Might be wise to throw in a <cpu_usage> line for completeness. It's not marked as optional in the manual.
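
Something like this, based on Till's file above (the 1.0 CPU figure is just an illustration - it only affects BOINC's scheduling, not the app itself):

<app_config>
   <app>
      <name>pps_sr2sieve</name>
      <gpu_versions>
         <gpu_usage>0.5</gpu_usage>
         <cpu_usage>1.0</cpu_usage>
      </gpu_versions>
   </app>
</app_config>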
ID: 100843

Keith Myers
Volunteer tester
Help desk expert
Joined: 17 Nov 16
Posts: 885
United States
Message 100850 - Posted: 24 Sep 2020, 19:49:30 UTC

Till responds:
With the cpu_usage added it starts working immediately... BOINC is sometimes a strange piece of software.

CPU utilization rises to 100%, power-limited at 320W. Computation time rises from 80 s (1WU) to 151 s (2WU).

So it doesn't look like the app is using the second FP32 pipeline - it's responding like previous generations, where doubles typically don't take exactly twice the crunch time. The card might be slightly more productive running doubles rather than singles (two tasks in 151 s versus 2 × 80 = 160 s back to back, roughly a 6% gain), at the expense of using a lot more power.
ID: 100850

MarkJ
Volunteer tester
Help desk expert
Joined: 5 Mar 08
Posts: 272
Australia
Message 100852 - Posted: 25 Sep 2020, 4:03:05 UTC - in response to Message 100850.  
Last modified: 25 Sep 2020, 4:04:12 UTC

So it doesn't look like the app is using the second FP32 pipeline - it's responding like previous generations, where doubles typically don't take exactly twice the crunch time. The card might be slightly more productive running doubles rather than singles (two tasks in 151 s versus 2 × 80 = 160 s back to back, roughly a 6% gain), at the expense of using a lot more power.

Maybe it needs to be recompiled with the latest CUDA toolkit to take advantage of the additional pipeline. CUDA 11.1 was just released with support for the RTX 30 series cards. Phoronix has an article about it.
MarkJ
ID: 100852

Keith Myers
Volunteer tester
Help desk expert
Joined: 17 Nov 16
Posts: 885
United States
Message 100860 - Posted: 25 Sep 2020, 22:39:00 UTC

Looks like we will need new apps for Ampere at GPUGrid. Tasks fail on A100 cards with an nvrtc error about an unknown architecture.
ID: 100860

Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5121
United Kingdom
Message 100864 - Posted: 26 Sep 2020, 7:03:50 UTC - in response to Message 100860.  

Found the thread, and saw the error message in the results. Yup, that's a show-stopper, even though the A100 card is only CC 8.0.

Meanwhile, I've submitted #4031 to deal with the flops display.
ID: 100864