Message boards : Server programs : Limit number of jobs in progress to be 2 at most.
Author | Message |
---|---|
Joined: 23 Oct 17 Posts: 17 |
In our project, jobs are highly compute-intensive and also require a lot of memory, so I want to limit the number of jobs in progress on a host to two. By default, the BOINC server creates as many slots as the host has cores, usually 8 slots for 8 jobs, since modern machines normally have 8 cores. I tried to follow https://boinc.berkeley.edu/trac/wiki/ProjectOptions and edited my config.xml. After the edit, the bin/start command does not read the config file properly and gives an error. Query 1: As described above, how do I cap the number of jobs in progress at 2 or 3 at any given time? My config file is below. Also, do you have to keep two files for this: config_aux.xml (to set job limits) as well as the plain config.xml? <daily_result_quota>1000</daily_result_quota> <one_result_per_user_per_wu>0</one_result_per_user_per_wu> <max_wus_to_send>50</max_wus_to_send> [b] <project> <max_jobs_in_progress> <total_limit> <jobs>3</jobs> </total_limit> </max_jobs_in_progress> </project>[/b] </config> <tasks> <task> <cmd>antique_file_deleter -d 2</cmd> <period>24 hours</period> <disabled>0</disabled> <output>antique_file_deleter.out</output> </task> <task> <cmd>db_dump -d 2 --dump_spec ../db_dump_spec.xml</cmd> <period>24 hours</period> <disabled>1</disabled> <output>db_dump.out</output> </task> <task> <cmd>run_in_ops ./update_uotd.php</cmd> <period>1 days</period> <disabled>0</disabled> <output>update_uotd.out</output> </task> <task> <cmd>run_in_ops ./update_forum_activities.php</cmd> <period>1 hour</period> <disabled>0</disabled> <output>update_forum_activities.out</output> </task> <task> <cmd>update_stats</cmd> <period>1 days</period> <disabled>0</disabled> <output>update_stats.out</output> </task> <task> <cmd>run_in_ops ./update_profile_pages.php</cmd> <period>24 hours</period> <disabled>0</disabled> <output>update_profile_pages.out</output> 
</task> <task> <cmd>run_in_ops ./team_import.php</cmd> <period>24 hours</period> <disabled>1</disabled> <output>team_import.out</output> </task> <task> <cmd>run_in_ops ./notify.php</cmd> <period>24 hours</period> <disabled>0</disabled> <output>notify.out</output> </task> <task> <cmd>run_in_ops ./badge_assign.php</cmd> <period>24 hours</period> <disabled>0</disabled> <output>badge_assign.out</output> </task> </tasks> <daemons> <daemon> <cmd>feeder -d 3 </cmd> </daemon> <daemon> <cmd>transitioner -d 3 </cmd> </daemon> <daemon> <cmd>file_deleter -d 3 </cmd> </daemon> <daemon> <cmd>sample_trivial_validator -d 3 --app sampleimage</cmd> </daemon> <daemon> <cmd>sample_assimilator -d 3 --app sampleimage</cmd> </daemon> </daemons> </boinc> Query 2: I want the input files to always remain on the server. I have applied the no_delete tag in my job_in file, but I think the files get deleted once the output is uploaded to the server. Query 3: Where do you find your output results? In the sample_results folder, right? I am confused about this as well. |
Joined: 19 Nov 16 Posts: 63 |
Try: <max_wus_in_progress> N </max_wus_in_progress> (see https://boinc.berkeley.edu/trac/wiki/ProjectOptions). Remove this section: [b] <project> <max_jobs_in_progress> <total_limit> <jobs>3</jobs> </total_limit> </max_jobs_in_progress> </project>[/b] Output results go into the upload directory first, before they are assimilated into the single work unit result. Remember that the same work is sent to multiple (usually 2) hosts to process. Once processed, the result is sent back to the server. If both results for the same work unit agree, they are assimilated into the final output. The final output ends up in sample_result. Cheers Seth |
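For reference, a minimal sketch of where the suggested option would sit in config.xml. The value 2 is an illustrative assumption and the other options are elided. As far as I recall the ProjectOptions page, this limit is multiplied by the number of CPUs the host reports, which is worth verifying there because it would change the effective cap on an 8-core client.

```xml
<boinc>
  <config>
    <!-- ... existing options unchanged ... -->
    <!-- Assumed value. Limits jobs in progress per host; as I recall the
         documentation, the effective cap is N times the host's CPU count. -->
    <max_wus_in_progress>2</max_wus_in_progress>
  </config>
  <!-- <tasks> and <daemons> sections unchanged -->
</boinc>
```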
Joined: 23 Oct 17 Posts: 17 |
Tried it and got no error, but the client still launches 8 jobs, which equals the number of cores in the client machine. Is there any other fix? <?xml version="1.0" ?> <boinc> <config> <upload_dir>/home/pitb/projects/automationTest/upload</upload_dir> <send_result_abort>1</send_result_abort> <long_name>automationTest</long_name> <sched_debug_level>3</sched_debug_level> <cache_md5_info>1</cache_md5_info> <upload_url>http://103.226.217.106/automationTest_cgi/file_upload_handler</upload_url> <disable_account_creation>0</disable_account_creation> <uldl_dir_fanout>1024</uldl_dir_fanout> <disable_web_account_creation>0</disable_web_account_creation> <download_url>http://103.226.217.106/automationTest/download</download_url> <db_user>pitb</db_user> <log_dir>/home/pitb/projects/automationTest/log_food-home</log_dir> <app_dir>/home/pitb/projects/automationTest/apps</app_dir> <download_dir>/home/pitb/projects/automationTest/download</download_dir> <fuh_debug_level>3</fuh_debug_level> <master_url>http://103.226.217.106/automationTest/</master_url> <host>food-home</host> <db_name>automationTest</db_name> <shmem_key>0x1111f565</shmem_key> <show_results>1</show_results> <key_dir>/home/pitb/projects/automationTest/keys/</key_dir> <dont_generate_upload_certificates>1</dont_generate_upload_certificates> <ignore_upload_certificates>1</ignore_upload_certificates> <db_passwd> </db_passwd> <min_sendwork_interval>6</min_sendwork_interval> <db_host> </db_host> <ignore_delay_bound/> <daily_result_quota>500</daily_result_quota> <one_result_per_user_per_wu>0</one_result_per_user_per_wu> <max_wus_to_send>50</max_wus_to_send> <max_wus_in_progress>3</max_wus_in_progress> </config> <tasks> <task> <cmd>antique_file_deleter -d 2</cmd> <period>24 hours</period> <disabled>0</disabled> <output>antique_file_deleter.out</output> </task> <task> <cmd>db_dump -d 2 --dump_spec ../db_dump_spec.xml</cmd> <period>24 hours</period> <disabled>1</disabled> <output>db_dump.out</output> </task> <task> 
<cmd>run_in_ops ./update_uotd.php</cmd> <period>1 days</period> <disabled>0</disabled> <output>update_uotd.out</output> </task> <task> <cmd>run_in_ops ./update_forum_activities.php</cmd> <period>1 hour</period> <disabled>0</disabled> <output>update_forum_activities.out</output> </task> <task> <cmd>update_stats</cmd> <period>1 days</period> <disabled>0</disabled> <output>update_stats.out</output> </task> <task> <cmd>run_in_ops ./update_profile_pages.php</cmd> <period>24 hours</period> <disabled>0</disabled> <output>update_profile_pages.out</output> </task> <task> <cmd>run_in_ops ./team_import.php</cmd> <period>24 hours</period> <disabled>1</disabled> <output>team_import.out</output> </task> <task> <cmd>run_in_ops ./notify.php</cmd> <period>24 hours</period> <disabled>0</disabled> <output>notify.out</output> </task> <task> <cmd>run_in_ops ./badge_assign.php</cmd> <period>24 hours</period> <disabled>0</disabled> <output>badge_assign.out</output> </task> </tasks> <daemons> <daemon> <cmd>feeder -d 3 </cmd> </daemon> <daemon> <cmd>transitioner -d 3 </cmd> </daemon> <daemon> <cmd>file_deleter -d 3 </cmd> </daemon> <daemon> <cmd>sample_work_generator -d 3</cmd> </daemon> <daemon> <cmd>sample_trivial_validator -d 3 --app sampleimage</cmd> </daemon> <daemon> <cmd>sample_assimilator -d 3 --app sampleimage</cmd> </daemon> </daemons> </boinc> |
Joined: 25 May 09 Posts: 1308 |
You appear to be assuming that your client computers all have the same CPU core count and RAM size; this may not be a valid assumption. That said, there may be a way around your problem: assign a number of CPU cores to each task. One of the multitude of (server-side) configuration files contains a parameter that defines the default number of CPU cores and, where appropriate, the number of GPUs assigned to each task. By setting this value to two or three you will restrict the number of concurrent CPU tasks to core_count / number_cores_required. I think this parameter is called "n_cpus", but since I am a long way from the source code just now I can't be sure. |
Joined: 23 Oct 17 Posts: 17 |
You are right, not all computers will be the same. Here are the details: each task takes 1.6 GB of memory, so a user with only 8 GB of memory but 8 cores will launch 8 jobs. That is why we want to restrict the number of tasks/workunits in progress to 2 or 1 for all users. We will definitely inform our users about the minimum requirements for volunteering their machines. Nothing is going wrong as such, but when testing our jobs on our own dedicated machines we have noticed that the machine running the jobs sometimes hangs for a couple of minutes, or even for half an hour. Before the public launch of our product, we do not want our clients to find their computers hung when they pause/suspend computing to use their machine. For example, a user who comes back to resume his own work and sees that the machine is hung would panic. All we want is for the tasks in progress to be restricted to our desired number. We will test our jobs on several types of machines and then set the number of tasks in progress that works best for all. |
Joined: 5 Oct 06 Posts: 5137 |
You probably want to have a look at Job Limits. One that catches my eye is <max_ncpus>N</max_ncpus>. If that isn't adequate, try a config_aux.xml file, described in the following section. |
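A sketch of what a config_aux.xml for this case might look like, assuming the element layout I recall from the Job Limits section of the ProjectOptions page; check the names and nesting against that page before relying on it. The limit of 2 is an illustrative value. The file is kept as a separate file alongside config.xml, not merged into it.

```xml
<?xml version="1.0" ?>
<!-- config_aux.xml: separate from config.xml (layout assumed from the docs) -->
<config>
  <max_jobs_in_progress>
    <project>
      <total_limit>
        <!-- Assumed value: at most 2 jobs in progress per host -->
        <jobs>2</jobs>
      </total_limit>
    </project>
  </max_jobs_in_progress>
</config>
```

In this layout <max_jobs_in_progress> wraps <project>, which is the reverse of the nesting tried in the first post.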
Joined: 23 Oct 17 Posts: 17 |
Perfect. If I edit my current config file, will the changes take effect for new tasks, or do I have to re-create the project? |
Joined: 5 Oct 06 Posts: 5137 |
To be honest, I haven't a clue. I'm just a reader of documentation; I've never even seen a live BOINC server, let alone operated one. I would be surprised if you had to re-create the project. I would expect you might have to stop and then restart the server's daemons after editing the config file, and you might also need to cancel any previously created workunits and create new ones. Trial and error, as always. |
Joined: 25 May 09 Posts: 1308 |
From memory, it should just be a matter of stopping the BOINC server and restarting it, although I think there are cases where one of the .xml files is read whenever a new task is generated. Give it a go and let us know; either Richard or I will then try to remember when someone asks a similar question. |
Copyright © 2025 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.