Message boards : GPUs : Can I run 1 WU across multiple GPUs?
Joined: 8 Nov 19 · Posts: 718
I'm a bit late to the game, but on F@H the earlier CUDA versions did exactly that! They distributed data between the GPUs: if one GPU was only running at, say, 80%, work would be shifted from the busier GPU onto it.

The downside, for FAH especially but also for the many BOINC projects that send data to the GPU via OpenCL over the PCIe bus, is that a powerful GPU in a slow PCIe slot shows a noticeable dip in performance, and certainly in efficiency. It takes power and time to move data from one GPU across the PCIe bus to the CPU, then back across the PCIe bus to the other GPU, only to have the results make the same round trip back to the first GPU for further processing there. It's not beneficial.

What you can do on powerful GPUs is this: if one project only utilizes part of your GPU (say 80%), load another project to use the remaining 20% by setting the GPU share in app_config. The BOINC client can usually match the 80% GPU usage with the 20% GPU usage. That way you're not really offloading some of the WUs of your first GPU to the second, but you are loading more work onto the second GPU, essentially doing more work, or reducing idle time.

The hard part of doing it this way: Milkyway, for example, can run 1, 2 or 3 tasks on a powerful GPU and still use only 40-50% of the GPU's resources. Sometimes BOINC tries to combine 2 or 3 Milkyway tasks on one GPU; other times it will combine 1 or 2 Milkyway tasks with 1 or 2 Einstein tasks, and in that case the tasks usually run better. I haven't gotten around to figuring out what to put in app_config.xml yet.
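The round-trip cost described above can be made concrete with a small CUDA sketch. This is only an illustration, not anything from F@H or a BOINC project: the device indices and the 64 MiB buffer size are arbitrary assumptions, and error checking is omitted. Without peer-to-peer access between the cards, every inter-GPU hand-off is staged through host memory and crosses the PCIe bus twice.

    // Minimal sketch: move a work buffer from GPU 0 to GPU 1.
    // Assumes a machine with at least two CUDA GPUs; error checks omitted.
    #include <cuda_runtime.h>
    #include <cstdio>
    #include <cstdlib>

    int main() {
        const size_t bytes = 64 << 20;  // 64 MiB, an arbitrary example size
        float *src = nullptr, *dst = nullptr;

        cudaSetDevice(0);
        cudaMalloc((void**)&src, bytes);   // buffer on GPU 0
        cudaSetDevice(1);
        cudaMalloc((void**)&dst, bytes);   // buffer on GPU 1

        int canPeer = 0;
        cudaDeviceCanAccessPeer(&canPeer, 1, 0);  // can GPU 1 reach GPU 0 directly?
        if (canPeer) {
            // Peer-to-peer: a single direct transfer between the cards.
            cudaDeviceEnablePeerAccess(0, 0);     // current device is still 1
            cudaMemcpyPeer(dst, 1, src, 0, bytes);
        } else {
            // No peer access: stage through host RAM. Two PCIe crossings
            // (device 0 -> host, then host -> device 1), the overhead the
            // post above describes, paid again when results come back.
            float *host = (float *)malloc(bytes);
            cudaSetDevice(0);
            cudaMemcpy(host, src, bytes, cudaMemcpyDeviceToHost);
            cudaSetDevice(1);
            cudaMemcpy(dst, host, bytes, cudaMemcpyHostToDevice);
            free(host);
        }
        printf("copy done, peer access: %s\n", canPeer ? "yes" : "no");
        return 0;
    }

As for the app_config.xml question at the end of the post: BOINC's app_config.xml lets you tell the client what fraction of a GPU each task uses, so it will schedule several tasks per GPU side by side. A minimal sketch, with two loud assumptions: the application name "milkyway" is a placeholder (the real name is in the <app_name> entries of client_state.xml on your host), and the 0.5 figures are illustrative, not tuned values.

    <app_config>
      <app>
        <name>milkyway</name>
        <gpu_versions>
          <!-- 0.5 = each task is counted as using half a GPU, so the
               scheduler will run two of these tasks per GPU at once -->
          <gpu_usage>0.5</gpu_usage>
          <cpu_usage>0.5</cpu_usage>
        </gpu_versions>
      </app>
    </app_config>

The file goes in that project's folder under the BOINC data directory, and the client picks it up after "Read config files" in the Manager or a restart. Note the numbers are scheduling hints only: they change how many tasks BOINC starts, not how hard each task actually drives the GPU.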
Joined: 25 May 09 · Posts: 1299
Well, that's a grand theory, but searching the F@H archives it appears they never shipped anything like that in a "production" release; they may have tried it and found it didn't work.

Using OpenCL to transfer data between a pair of GPUs would also have been a massive waste of resources on nVidia GPUs running CUDA: in the early days of nVidia's OpenCL support the implementation was poor, so they would more likely have used CUDA (with a large chunk of CPU support).

What you may have been confused by is the wording of the multi-GPU selection and control. In some earlier versions of the F@H control application it was very simple to select which GPUs were to be used (in a multi-GPU system) and the maximum amount of GPU "resource" to be used for each task. While the wording could be read to suggest that a single task could be shared over multiple GPUs, it was explained elsewhere that a single task could only run on a single GPU.
Joined: 25 May 09 · Posts: 1299
Nothing to do with DP vs. FP, but all to do with the way the GPU application was coded in the first place.
Joined: 8 Nov 19 · Posts: 718
> Well, that's a grand theory, but searching the F@H archives it appears they never shipped anything like that in a "production" release; they may have tried it and found it didn't work.

In the early days of CUDA it wasn't FAH itself, it was the Nvidia drivers that took control of the OpenCL data and switched it between GPUs. I think FAH had about a two-week period around 2018-2019 where the public version supported it. They pulled or patched it rather quickly, as FAH aims to utilize the GPU to the max the whole time the program is running, unlike Einstein, Milkyway, or others, which are either bottlenecked by the DP (double-precision) shaders or have WUs too small for larger GPUs.