Thread 'Can I run 1 WU across multiple GPUs?'

Richard Haselgrove · Volunteer tester · Help desk expert · Joined: 5 Oct 06 · Posts: 5124 · Message 102012 · Posted: 7 Dec 2020, 10:31:37 UTC · In response to Message 102011
No.

Richard Haselgrove · Volunteer tester · Help desk expert · Joined: 5 Oct 06 · Posts: 5124 · Message 102018 · Posted: 7 Dec 2020, 16:38:06 UTC · In response to Message 102014
Dual-GPU cards like the 7990: I think they'll appear twice in the Event Log, as 'Device 0' and 'Device 1'. As for why a game can treat a pair of cards as one while BOINC can't, I think it's because of the computer languages and techniques used for scientific programming, which differ from those used for screen-painting.

robsmith · Volunteer tester · Help desk expert · Joined: 25 May 09 · Posts: 1299 · Message 102019 · Posted: 7 Dec 2020, 16:40:08 UTC
There are major differences between the way graphics cards do graphics (as in games) and the way they are used for "serious" computing, not least in how Crossfire and SLI inter-card communications are configured. In both cases the communications are designed to link the graphics units within a pair (or more) of similar cards to improve frame rates. This leaves the numeric processing part "out in the cold", and so not linkable. The same applies to the various dual-GPU cards like the 7990 or the old GTX 690. Despite all the fluff about games loading GPUs, they are very light in computational terms compared with most of the science projects, which typically run the numeric areas of the GPU at about 100% most of the time. (Games certainly push the graphics part close to 100% at high frame rates.)

Ian&Steve C. · Joined: 24 Dec 19 · Posts: 229 · Message 102039 · Posted: 8 Dec 2020, 18:14:20 UTC · In response to Message 102020
The fact that it gets "just as hot" means nothing. The way SLI/Crossfire works (frame interleaving) has nothing to do with the way general compute happens. SLI and Xfire also rely wholly on pre-made, game-by-game profiles in the driver to work at all. It doesn't "just work". If anything would work, it would be NVLink or whatever the AMD equivalent is. That actually allows the cards to share resources, rather than just taking turns generating alternate frames like SLI/Xfire. But these technologies only work on professional-level cards anyway, and they function completely differently from SLI/Xfire even if they use a similar inter-card connector. Even then, it requires application support. BOINC doesn't support this, so it won't work. It treats all cards as individual devices.
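To make "it treats all cards as individual devices" concrete, here is a toy Python model, not BOINC's actual scheduler, assuming one task per card: each task gets exactly one whole GPU, so with two cards (Device 0 and Device 1) two tasks run side by side and a third waits. No single task is ever spread across both cards. The `Gpu` class and `assign` function are hypothetical names made up for this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Gpu:
    """Hypothetical model of one GPU as a separate device with its own index."""
    index: int
    running: Optional[str] = None  # name of the task occupying this card, if any


def assign(tasks: List[str], gpus: List[Gpu]) -> Dict[str, int]:
    """Toy placement: give each task one whole free GPU; never split a task across cards.

    Returns task name -> device index. Tasks left over when every card is busy
    are not placed (they would wait in the queue).
    """
    placement: Dict[str, int] = {}
    free = [gpu for gpu in gpus if gpu.running is None]
    for task, gpu in zip(tasks, free):  # zip stops when free cards run out
        gpu.running = task
        placement[task] = gpu.index
    return placement


if __name__ == "__main__":
    cards = [Gpu(0), Gpu(1)]
    print(assign(["task_a", "task_b", "task_c"], cards))  # {'task_a': 0, 'task_b': 1}
```

Adding a second card doubles the number of tasks that can run at once; it does not make any one task finish faster.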

Ian&Steve C. · Joined: 24 Dec 19 · Posts: 229 · Message 102040 · Posted: 8 Dec 2020, 18:18:31 UTC · In response to Message 102014 · Last modified: 8 Dec 2020, 18:18:48 UTC
Quote: "No."
Quote: "Do you know anything about how Crossfire/SLI works? I assume when you play a game, the game treats it as though it was one faster card. Also, I wonder what BOINC does with those dual-GPU cards like the 7990?"
Sounds like you are the one who doesn't know how SLI/Crossfire works. Your assumption is incorrect: the cards are not treated as one. They are set up in a master/slave configuration, where the master sends alternating frames to the slave for rendering, then the slave sends the rendered frame back to the master to send to the display.
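A toy Python illustration of the alternate-frame scheme described above, assuming two cards where card 0 is the master that also presents every frame to the display: frames are dealt out in turn, and each card renders whole frames on its own. Nothing in this scheme lets two cards cooperate on one piece of work, which fits the point that SLI/Crossfire has nothing to do with general compute. The function name is hypothetical.

```python
from typing import List


def afr_schedule(num_frames: int, num_cards: int = 2) -> List[int]:
    """Hypothetical helper: which card renders each frame under alternate frame rendering.

    Card 0 plays the 'master': it renders every num_cards-th frame itself and
    presents every finished frame (including those rendered by the other cards)
    to the display. Each frame is rendered entirely by a single card.
    """
    if num_cards < 1:
        raise ValueError("need at least one card")
    return [frame % num_cards for frame in range(num_frames)]


if __name__ == "__main__":
    print(afr_schedule(6))  # [0, 1, 0, 1, 0, 1]
```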

Richard Haselgrove · Volunteer tester · Help desk expert · Joined: 5 Oct 06 · Posts: 5124 · Message 102048 · Posted: 9 Dec 2020, 17:38:07 UTC · In response to Message 102046
Quote: "BOINC manages my dual 12 core CPUs on one motherboard as one...."
No, it manages them as 24 separate cores, although BOINC's terminology does describe each core as a CPU.
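The arithmetic behind "it manages them as 24" fits in a few lines of Python. Two 12-core CPUs give 2 × 12 = 24 cores, each counted as a 'CPU'; if Hyper-Threading/SMT is enabled, the operating system normally exposes two logical processors per core, so the count becomes 48. The helper name is hypothetical; `os.cpu_count()` is the standard-library call that reports the logical processors on whatever machine runs it.

```python
import os


def boinc_cpu_count(sockets: int, cores_per_socket: int, threads_per_core: int = 1) -> int:
    """Hypothetical helper: logical processors scheduled as separate 'CPUs'.

    threads_per_core is 1 with SMT/Hyper-Threading off and typically 2 with it on.
    """
    return sockets * cores_per_socket * threads_per_core


if __name__ == "__main__":
    print(boinc_cpu_count(2, 12))     # 24: room for 24 single-threaded tasks
    print(boinc_cpu_count(2, 12, 2))  # 48 with SMT/Hyper-Threading enabled
    print(os.cpu_count())             # logical processors on this machine (varies)
```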

Ian&Steve C. · Joined: 24 Dec 19 · Posts: 229 · Message 102050 · Posted: 9 Dec 2020, 17:58:16 UTC · In response to Message 102046 · Last modified: 9 Dec 2020, 17:58:39 UTC
Quote: "The fact that it gets 'just as hot' means nothing."
Quote: "It means exactly what I said, that it's being worked as hard and using the same amount of electricity."
Quote: "The way SLI/Crossfire works (frame interleaving) has nothing to do with the way general compute happens. [...] BOINC doesn't support this, so it won't work. It treats all cards as individual devices."
Quote: "BOINC manages my dual 12 core CPUs on one motherboard as one...."
Again, incorrect. It treats them as 12 (or 24 if HT/SMT is on) individual cores, which is why you run many individual CPU jobs instead of just one job across all cores. (I think one project has the capability to run a multithreaded CPU job, but it's by far the exception, not the rule.)

Ian&Steve C. · Joined: 24 Dec 19 · Posts: 229 · Message 102081 · Posted: 10 Dec 2020, 19:48:24 UTC · In response to Message 102073
Quote: "BOINC manages my dual 12 core CPUs on one motherboard as one...."
Quote: "Again, incorrect. It treats them as 12 (or 24 if HT/SMT is on) individual cores [...] (I think one project has the capability to run a multithreaded CPU job, but it's by far the exception, not the rule.)"
Quote: "OK, I guess it's down to the project programmers then. But one? I'm running MilkyWay, PrimeGrid, and LHC, which all do multi-core tasks. It causes problems when my 24-core machines have 23 1-core tasks left and a huge 24-core task which will take 5 days. Some of the 1-core tasks have a deadline shorter than 5 days and BOINC gets confused."
Yes, it is up to the project to decide how to program its own applications. BOINC is a platform for hosting other people's science: it's a middleman between your hardware and the science application, and it does the resource allocation and scheduling. BOINC isn't responsible for the behavior of the application itself. It is up to you as the user to be aware of conflicts that might arise, and to adjust your computing preferences around projects that have competing ideas about how to use your hardware. I.e., don't try to run multithreaded applications alongside single-threaded ones, or you can play with the project_max_concurrent settings to prevent a project from gobbling up all your threads, allowing some number of spare threads always available to another project.
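A minimal sketch of the project_max_concurrent suggestion, assuming a 24-thread machine and single-threaded tasks: capping one project at 16 running tasks leaves 24 - 16 = 8 threads that another project can always use. The cap itself is a small `app_config.xml` placed in that project's directory inside the BOINC data directory, typically picked up via Options > Read config files in BOINC Manager or a client restart. The Python helper names are hypothetical; only the `<app_config>` and `<project_max_concurrent>` element names come from BOINC's file format.

```python
import xml.etree.ElementTree as ET


def spare_threads(total_threads: int, max_concurrent: int, threads_per_task: int = 1) -> int:
    """Hypothetical helper: threads left free for other projects when one project is capped."""
    return max(0, total_threads - max_concurrent * threads_per_task)


def project_cap_config(max_concurrent: int) -> str:
    """Hypothetical helper: minimal app_config.xml limiting a project's concurrent tasks."""
    root = ET.Element("app_config")
    ET.SubElement(root, "project_max_concurrent").text = str(max_concurrent)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(spare_threads(24, 16))  # 8
    print(project_cap_config(16))
    # <app_config><project_max_concurrent>16</project_max_concurrent></app_config>
```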

Jord · Volunteer tester · Help desk expert · Joined: 29 Aug 05 · Posts: 15552 · Message 102085 · Posted: 10 Dec 2020, 21:08:12 UTC · In response to Message 102072 · Last modified: 10 Dec 2020, 21:52:31 UTC
Quote: "On that note, I and a lot of people in the forums are referring to the things we download as work units. But on PrimeGrid, they call the work unit the thing that's duplicated into two or more tasks, and we each run a task. The task results are then compared to validate them, completing the work unit. Who is correct?"
PrimeGrid.
Edit: Old SETI Classic had work units. It would send one work unit to each computer, which would work on it and send a result file back. All BOINC projects that work with redundancy send out two or more tasks per work unit. Some BOINC projects only send out one data file, and then you can call that a work unit, just as SETI Classic did.
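The work unit / task distinction maps onto a simple data structure. A minimal Python sketch, assuming a quorum of two and exact string comparison of outputs (real project validators apply their own, often tolerance-based, comparison): one work unit is replicated into tasks named in the style `<work unit>_0`, `<work unit>_1`, and it is validated once enough task results agree. `min_quorum` borrows the name of BOINC's work-unit field, but the class and its methods are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class WorkUnit:
    """Hypothetical model: one unit of science work; volunteers run tasks (replicas) of it."""
    name: str
    min_quorum: int = 2                                     # matching results needed
    results: Dict[str, str] = field(default_factory=dict)  # task name -> returned output

    def make_tasks(self, count: int) -> List[str]:
        """Replicate the work unit into `count` task names, e.g. name_0, name_1."""
        return [f"{self.name}_{i}" for i in range(count)]

    def report(self, task: str, output: str) -> None:
        """Record the output a volunteer's task sent back."""
        self.results[task] = output

    def canonical_result(self) -> Optional[str]:
        """Return the agreed output once a quorum of task results match, else None."""
        if not self.results:
            return None
        output, votes = Counter(self.results.values()).most_common(1)[0]
        return output if votes >= self.min_quorum else None


if __name__ == "__main__":
    wu = WorkUnit("example_wu")
    first, second = wu.make_tasks(2)
    wu.report(first, "0xBEEF")
    print(wu.canonical_result())  # None: only one result so far
    wu.report(second, "0xBEEF")
    print(wu.canonical_result())  # 0xBEEF: two matching results validate the work unit
```

With `min_quorum=1` and a single task, the one download effectively is the work unit, which is the SETI Classic case.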

Richard Haselgrove · Volunteer tester · Help desk expert · Joined: 5 Oct 06 · Posts: 5124 · Message 102086 · Posted: 10 Dec 2020, 21:41:04 UTC · In response to Message 102081
Quote: "... you can play with the project_max_concurrent settings to prevent a project from gobbling up all your threads, allowing some number of spare threads always available to another project."
You can, more particularly, play with the full app_config.xml kit of parts to constrain generic MT jobs to use less than the full CPU. MT tasks don't become more efficient by turning the parallelism volume control up to 11. In fact, I believe that efficiency generally decreases as the number of threads increases. If the threading is managed properly (both avg_ncpus and --nthreads have to be set), MT jobs can co-exist perfectly happily with single-threaded jobs from other projects.
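The point that both avg_ncpus and --nthreads have to be set can be shown with a generated `app_config.xml`. avg_ncpus is how many CPUs the BOINC client budgets for each task when scheduling; the command-line thread count is what the application itself is told to use, so the two should match. A Python sketch under assumptions: the app name `example_mt_app` is hypothetical (use the name the project actually reports), `mt` is a common plan class for multithreaded apps but may differ per project, and whether an app honors `--nthreads` is up to that project's application.

```python
import xml.etree.ElementTree as ET


def mt_app_config(app_name: str, plan_class: str, threads: int) -> str:
    """Hypothetical helper: app_config.xml capping a multithreaded app at `threads` threads.

    avg_ncpus tells the BOINC client how many CPUs to reserve per task;
    --nthreads (passed via cmdline) tells the application how many to use.
    Keeping them equal keeps the app's real usage in line with the reservation.
    """
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    ET.SubElement(version, "avg_ncpus").text = str(threads)
    ET.SubElement(version, "cmdline").text = f"--nthreads {threads}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical app name; one 6-thread task leaves 18 of 24 threads for other work.
    print(mt_app_config("example_mt_app", "mt", 6))
```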

Richard Haselgrove · Volunteer tester · Help desk expert · Joined: 5 Oct 06 · Posts: 5124 · Message 102108 · Posted: 12 Dec 2020, 19:06:34 UTC · In response to Message 102105
I'd suggest running 24 threads one week, and 4 tasks x 6 threads each another week. See which gets the most work done overall. [Suggesting you stick to divisors of 12, so that each task can fit on one 12-core CPU, because a cache memory miss on the 'wrong' CPU is likely to be costly. Might even be worth trying 2 tasks x 12 threads each.]
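The candidate layouts can be enumerated with a short Python sketch, assuming two 12-core CPUs and one thread per core: keep threads per task to a divisor of 12 so a task can stay within one CPU, plus the single 24-thread option that spans both. Choosing between them is still the week-by-week experiment suggested above; the function only lists the candidates, and its name is hypothetical.

```python
from typing import List, Tuple


def candidate_layouts(sockets: int, cores_per_socket: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: (tasks, threads_per_task) layouts that use every core.

    A layout qualifies if threads_per_task divides cores_per_socket (so each
    task can stay on one CPU) or if a single task uses all cores.
    """
    total = sockets * cores_per_socket
    layouts = []
    for threads in range(1, total + 1):
        if cores_per_socket % threads == 0 or threads == total:
            layouts.append((total // threads, threads))
    return layouts


if __name__ == "__main__":
    for tasks, threads in candidate_layouts(2, 12):
        print(f"{tasks} tasks x {threads} threads")
```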

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.