Any way to manually change the deadline of a task?
Joined: 5 Oct 06 Posts: 5094
Need to check one more thing. I've joined up, and in my initial state all four tasks are running at once. That's because the project has a preference for how many threads each task uses, and it is set to 1. I'll switch it to 4 for the next fetch and try again. Looks like I'll have time to go and fetch my newspaper before we get there...

Edit: this is the one I got: Proth Prime Search LLR (PPS). Which one were you running?
Joined: 5 Oct 06 Posts: 5094
Ah, you'd already answered my question (in your edit). I'll switch to that. But now I see... I'll try LLR (SGS) anyway.
Joined: 5 Oct 06 Posts: 5094
No, I changed it manually. It was set to 1 when I first attached.

For reference, here are the first two work requests for a new computer on a new project:

05/12/2020 12:16:25 | | [work_fetch] target work buffer: 864.00 + 8640.00 sec

Nothing wrong with that. The estimated duration matches the speed and size of those first few tasks. But after three hours, we're at

<checkpoint_elapsed_time>10853.239532</checkpoint_elapsed_time>
<fraction_done>0.050201</fraction_done>

I knew this machine was slow, but not that slow: barely 5% done in three hours. At this rate, it'll finish at about midnight Monday!
Joined: 5 Oct 06 Posts: 5094
Yes, I've got all that now. I was a bit preoccupied with getting my test started, fetching the paper, and writing my reply to David. And after all that, I think I need a bit of a lie-down... We also have to factor in <duration_correction_factor>7.767842</duration_correction_factor>; most projects have stopped using that. But at least they don't use APR (average processing rate) on top of it. I'll do the maths after I've had a break. At least it's all in the sched files, for which many thanks.
Joined: 25 May 09 Posts: 1287
05/12/2020 17:35:17 | | Fetching configuration file from http://www.primegrid.com/get_project_config.php
05/12/2020 17:35:49 | PrimeGrid | Fetching scheduler list
05/12/2020 17:35:50 | PrimeGrid | Master file download succeeded
05/12/2020 17:35:59 | PrimeGrid | Sending scheduler request: Project initialization.
05/12/2020 17:35:59 | PrimeGrid | Requesting new tasks for CPU and NVIDIA GPU
05/12/2020 17:36:00 | PrimeGrid | Scheduler request completed: got 1 new tasks
05/12/2020 17:36:00 | PrimeGrid | Project requested delay of 7 seconds
05/12/2020 17:36:02 | PrimeGrid | Started download of tpsieve_0.3.10d_windows64.exe
05/12/2020 17:36:02 | PrimeGrid | Started download of stat_primegrid.png
05/12/2020 17:36:02 | PrimeGrid | Started download of primegrid_slideshow_00.png
05/12/2020 17:36:03 | PrimeGrid | Finished download of tpsieve_0.3.10d_windows64.exe
05/12/2020 17:36:03 | PrimeGrid | Finished download of stat_primegrid.png
05/12/2020 17:36:03 | PrimeGrid | Finished download of primegrid_slideshow_00.png
05/12/2020 17:36:29 | PrimeGrid | Starting task pps_sr2sieve_137992346_0
05/12/2020 17:36:34 | PrimeGrid | Sending scheduler request: To fetch work.
05/12/2020 17:36:34 | PrimeGrid | Requesting new tasks for CPU
05/12/2020 17:36:35 | PrimeGrid | Scheduler request completed: got 11 new tasks
05/12/2020 17:36:35 | PrimeGrid | Project requested delay of 7 seconds
05/12/2020 17:36:38 | PrimeGrid | Starting task pps_sr2sieve_137989313_3
05/12/2020 17:36:38 | PrimeGrid | Starting task pps_sr2sieve_137992342_0
05/12/2020 17:36:38 | PrimeGrid | Starting task pps_sr2sieve_137992721_0

(Sorry, I didn't have the right debug messages selected before I connected to PrimeGrid and the work arrived.)

OK, so this is a "new" machine as far as PrimeGrid is concerned. It has 4 cores available for use, my cache is currently set at 1 + 0.01 days, and I'm using one core per task. As delivered, the estimated run times are of the order of 3 hours 50 minutes.
Within seconds the estimated time jumps to several days, before rapidly dropping back to about 20 hours. That figure correlates pretty well with the time I get from the elapsed time and percent complete (although the elapsed time is only a few minutes and the percent complete is about 0.5%, so the margin for error could be large).

As far as I'm aware, the server side of BOINC calculates how much work to send from figures that should give you roughly the right number of tasks, but if the "estimated" performance figures are way off from reality, the number of tasks sent out will be wrong. This sort of error might explain what Peter is seeing: the servers think his computer is about ten times faster than it really is, so they send him work based on that figure; as soon as the work gets under way, the expected runtimes are adjusted upwards and he sees far too much work in hand.

I'm going to let these tasks run through, then get another batch set to use two cores each. This may take some time....
Joined: 25 May 09 Posts: 1287
Right, 34 minutes into a task, 2.9% complete, which gives a time left of 19.5 hours, and that agrees with the remaining time given by BOINC on my computer. Remember that immediately before starting, this task had an estimated runtime of 3.67 hours. I would posit that PrimeGrid are using some rather strange figures to arrive at their flops guess, which is what is used to give the "anticipated" runtime.

(Given my cache settings and what I am now seeing, I would have expected no more than four tasks in the initial batch, not the twelve that actually arrived. I'm glad I set NNT (No New Tasks) very soon after crunching of the first batch began. However, given that PrimeGrid's figures gave an expected runtime of about four hours, the initial delivery of twelve tasks is correct as far as their end of the process is concerned.)
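The same linear extrapolation can be sketched for the figures above. Strictly, elapsed time divided by fraction done gives the projected total; the time still to run is that total minus the 34 minutes already done. This is a hedged sketch: 2.9% is a rounded display value, so the result is only good to a few percent either way.

```python
def extrapolate(elapsed_min: float, fraction_done: float):
    """Return (projected total, remaining) in hours, assuming a constant progress rate."""
    total_min = elapsed_min / fraction_done
    return total_min / 60, (total_min - elapsed_min) / 60


initial_estimate_h = 3.67  # estimate shown immediately before the task started

total_h, remaining_h = extrapolate(34, 0.029)
factor = total_h / initial_estimate_h

print(f"projected total {total_h:.1f} h, remaining {remaining_h:.1f} h")
print(f"initial estimate was low by a factor of {factor:.1f}")
```

The projected total is about 19.5 hours (about 19 still to go), roughly 5.3 times the 3.67-hour estimate the task arrived with.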
Joined: 5 Oct 06 Posts: 5094
"I would posit that Prime Grid are using some rather strange figures in getting to their flops guess."

I don't think it's that. For my first group of four, the flops estimate in the <app_version> sent by the server exactly matched (down to the units place, and I think on into the micro-flop) the benchmark figure calculated by the computer. It's a straight copy. The dodgy one is probably the rsc_fpops_est in the <workunit>. It matters, but projects have a habit of paying very little attention to it, especially when it should be changing. PrimeGrid will be searching for ever-larger primes, and naively I'd expect each search to take ever longer. The fpops_est should be growing to keep pace, but they may not have bothered.
Joined: 25 May 09 Posts: 1287
Hmmmm. If one calculated the amount of work delivered in the first batch immediately on receipt, one would expect a 1:1 relationship between the two figures, BUT soon after a task starts, the estimated remaining time jumps to something much higher, and so the hours of work in the cache go up significantly, which is exactly what Peter is complaining about. I think you might have hit on something in suggesting that the ever-increasing difficulty of finding the next prime is increasing the real runtime (in flops or hours) while the project has not adjusted for this (or hasn't adjusted correctly).
Joined: 5 Oct 06 Posts: 5094
Checking...

1) Benchmark (from client_state / host_info):
<p_fpops>2084811827.956989</p_fpops>

2) Task speed (from sched_reply / app_version):
<flops>2084811827.956989</flops>
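One way to repeat this check on another host is to pull the two values out of the files and compare them. The sketch below uses Python's standard xml.etree.ElementTree on trimmed-down, hypothetical fragments that contain only the tags quoted above; the real client_state and sched_reply files are much larger, and the root element names used here are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal fragments; only <host_info>/<p_fpops> and
# <app_version>/<flops> are taken from the post above.
client_state = """
<client_state>
  <host_info>
    <p_fpops>2084811827.956989</p_fpops>
  </host_info>
</client_state>
"""

sched_reply = """
<scheduler_reply>
  <app_version>
    <flops>2084811827.956989</flops>
  </app_version>
</scheduler_reply>
"""

# findtext() returns the element's text (a string) for a path relative to the root.
benchmark = float(ET.fromstring(client_state.strip()).findtext("host_info/p_fpops"))
app_flops = float(ET.fromstring(sched_reply.strip()).findtext("app_version/flops"))

print(f"benchmark {benchmark:.0f}, app_version flops {app_flops:.0f}")
print("straight copy" if benchmark == app_flops else "server adjusted the figure")
```

Comparing the parsed floats for exact equality is deliberate here: a straight copy should match to every printed digit.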
Joined: 25 May 09 Posts: 1287
Interesting..... I've got the value for p_fpops in the expected place, but there is no value for flops in the place you suggest. There is one in sched_request, though, and the values are the same. Well, that shows that the value for the computer's speed is being sent to the server correctly, but why am I not seeing it in sched_reply as you say? (That would be the value returned by the server, having possibly been used by it.) Can I plead confusion?
Joined: 5 Oct 06 Posts: 5094
Now, doing a similar calculation for Peter's 24-core MT tasks, using the figures from sched_request and sched_reply. The basic calculation is "size / speed", where

size is <rsc_fpops_est>6525083980042.000000</rsc_fpops_est>
speed is <flops>70852542799.840393</flops>

That's bloody fast for a single core (over 70 GFLOPS), so we'll assume it's for the whole CPU, 24 cores in parallel. So the duration is 92.094 seconds with all 24 cores working together, or 2,210.25 core-seconds. The client applies a DCF of 7.767842, so the task is estimated at 715.37 seconds, or 00:11:55. QED.

The total 170-task work fetch turns out to be 170 tasks * 92.094 sec/task * 7.767842 DCF = 121,612.99 seconds. The log says 121823 seconds, a little more. That's to take account of <on_frac>0.998370</on_frac> and <active_frac>0.999910</active_frac>: the client takes account of those little breaks in service.
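The arithmetic above can be reproduced in a few lines of Python. Dividing the batch total by on_frac × active_frac is an assumption about how the client "takes account of those little breaks in service"; it happens to reproduce the logged 121823 seconds to within a second, but it is not taken from the BOINC source. The per-core figure (about 2.95 GFLOPS) supports reading the flops value as a whole-CPU speed.

```python
rsc_fpops_est = 6525083980042.0   # task size, from <rsc_fpops_est>
flops = 70852542799.840393        # app speed, from <app_version><flops>
avg_ncpus = 24                    # threads per MT task
dcf = 7.767842                    # <duration_correction_factor>
on_frac, active_frac = 0.998370, 0.999910
n_tasks = 170

raw_s = rsc_fpops_est / flops          # seconds with all 24 cores working together
core_s = raw_s * avg_ncpus             # the same work expressed in core-seconds
per_core_gflops = flops / avg_ncpus / 1e9
est_s = raw_s * dcf                    # client's per-task estimate after applying DCF

batch_s = n_tasks * est_s
# Assumption: scale up by the fraction of time the client is running and allowed to compute.
batch_adjusted_s = batch_s / (on_frac * active_frac)

m, s = divmod(int(est_s), 60)
print(f"per-core speed {per_core_gflops:.2f} GFLOPS")
print(f"raw {raw_s:.2f} s ({core_s:.2f} core-s); with DCF {est_s:.2f} s = 00:{m:02d}:{s:02d}")
print(f"batch {batch_s:.2f} s; adjusted for on/active fractions {batch_adjusted_s:.0f} s")
```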
Joined: 5 Oct 06 Posts: 5094
Interesting.....

Peter got

<app_version>
    <app_name>llrTPS</app_name>
    <version_num>804</version_num>
    <api_version>7.11.0</api_version>
    <file_ref>
        <file_name>cllr64.3.8.23.exe</file_name>
        <open_name>primegrid_cllr.exe</open_name>
        <copy_file/>
    </file_ref>
    <file_ref>
        <file_name>llr.ini.6.07</file_name>
        <open_name>llr.ini</open_name>
        <copy_file/>
    </file_ref>
    <file_ref>
        <file_name>llr_wrapper_8.04_windows_x86_64.exe</file_name>
        <main_program/>
    </file_ref>
    <is_wrapper/>
    <platform>windows_x86_64</platform>
    <plan_class>mt</plan_class>
    <avg_ncpus>24.000000</avg_ncpus>
    <flops>70852542799.840393</flops>
    <cmdline> --nthreads 24</cmdline>
</app_version>

I don't think the server always plays it back if it thinks you already have the app, but Peter sent this in the request:

<app_version>
    <app_name>llrTPS</app_name>
    <version_num>804</version_num>
    <platform>windows_x86_64</platform>
    <avg_ncpus>24.000000</avg_ncpus>
    <flops>70852542799.840393</flops>
    <plan_class>mt</plan_class>
    <api_version>7.11.0</api_version>
    <cmdline>--nthreads 24</cmdline>
    <is_wrapper/>
</app_version>
Joined: 25 May 09 Posts: 1287
Have a look at the figures I gave. Initially the tasks were reporting a runtime of just under four hours, and there were twelve tasks using one core each, so that 48 hours' worth of work was about right. Moments after starting, the expected runtime leapt to about twenty hours, a five-fold increase, and so my cache was about five times too big. OK, not as big as yours, but certainly heading in the right direction for what you've been looking at.

My hypothesis is that PrimeGrid are using the correct figure for the performance of my computer when working out how many tasks to send me, but not the right figure for the actual amount of work the tasks will require. Well, that theory covers what I've seen, and to an extent what you've seen, but I'm not happy that it explains what happens while the task is running, which is strange: is some calculation performed during execution to get a more accurate estimate of the amount of work required, and so adjust the runtime estimate?

Also, I'm still working on one core per task, but you are using four cores. I should find out what happens in a couple of days, once I've chewed through the single-core tasks.
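The cache arithmetic in that post can be sketched the same way. The 20-hour figure is the approximate estimate seen moments after starting, and the days-to-clear calculation is a hypothetical simplification that assumes the four cores each run one task at a time, back to back, with no other work.

```python
n_tasks, cores = 12, 4
est_on_arrival_h = 3.67   # per-task estimate as delivered
est_running_h = 20.0      # approximate per-task estimate moments after starting


def days_to_clear(n: int, hours_per_task: float, n_cores: int) -> float:
    """Wall-clock days to finish n single-core tasks on n_cores (simplified model)."""
    return n * hours_per_task / n_cores / 24


work_on_arrival = n_tasks * est_on_arrival_h   # core-hours
work_running = n_tasks * est_running_h         # core-hours
inflation = work_running / work_on_arrival

print(f"work in cache: {work_on_arrival:.0f} -> {work_running:.0f} core-hours ({inflation:.1f}x)")
print(f"days to clear: {days_to_clear(n_tasks, est_on_arrival_h, cores):.2f} -> "
      f"{days_to_clear(n_tasks, est_running_h, cores):.2f}")
```

Under these assumptions the same twelve tasks go from under half a day of work to about 2.5 days, well beyond a 1 + 0.01 day buffer.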
Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.