Input and output templates
Various properties of jobs, such as the number and naming of their input and output files, are described by a pair of XML documents called input and output templates. Typically the same templates are used for many jobs.
A given app usually has one input template and one output template. However, an app whose jobs can have varying numbers of files needs multiple templates, one per file "signature".
An input template file describes the job's input files, resource requirements, and scheduling parameters.
"Normal" input files are staged before job submission, and their physical names are passed as arguments to the job submission interfaces such as create_work().
There are also several special types of input files:
- Constant files are the same across multiple jobs. The physical name of the file is given in the input template, and is not passed to the job-submission interfaces. The file must be staged before jobs are created with the template. Constant files might be used for the files that make up a Docker image.
- Remote files are served from outside servers, i.e. not BOINC project servers. Their size and MD5 must be given explicitly since the files aren't locally available. There are two ways to specify remote files:
  - Pass a --remote_file URL nbytes MD5 argument to create_work(). The client will download the file from the given URL. Its physical name on the client will be "js_MD5".
  - Supply the URL, size, and MD5 in the input template. In this case the URL is of a directory; the name passed to create_work() will be appended to the URL, and will also be used as the physical name on the client. Note: in this case you're either generating an input template per job, or you're using the file like a constant file.
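The size and MD5 that a remote file requires can be computed from a local copy before the file is published. The following is a minimal Python sketch under that assumption. The helper name remote_file_args and its parameters are hypothetical; only the argument order (--remote_file URL nbytes MD5) comes from the description above.

```python
import hashlib
import os


def remote_file_args(path, url, chunk_size=1 << 20):
    """Return ['--remote_file', URL, nbytes, MD5] for one remote input file.

    `path` is a local copy of the file that will be served from `url`.
    This helper is illustrative and is not part of BOINC.
    """
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large input files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    nbytes = os.path.getsize(path)
    return ["--remote_file", url, str(nbytes), md5.hexdigest()]
```

The returned list can be appended to whatever argument list you already build for create_work(); the remaining arguments are unchanged.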
<input_template>
    <file_info>
        [ <gzip/> ]
        [ <gzipped_nbytes> N </gzipped_nbytes> ]
        [ <sticky/> ]
        [ <no_delete/> ]
        [ <report_on_rpc/> ]
        [ <physical_name>...</physical_name> ]
        [ <url>...</url> ]
        [ <url>...</url> ]
        [ <md5_cksum>...</md5_cksum> ]
        [ <nbytes>...</nbytes> ]
    </file_info>
    [ ... other files ]
    <workunit>
        <file_ref>
            <open_name>NAME</open_name>
            [ <copy_file/> ]
        </file_ref>
        [ ... other files ]
        [ <command_line>-flags xyz</command_line> ]
        [ <rsc_fpops_est>x</rsc_fpops_est> ]
        [ <rsc_fpops_bound>x</rsc_fpops_bound> ]
        [ <rsc_memory_bound>x</rsc_memory_bound> ]
        [ <rsc_disk_bound>x</rsc_disk_bound> ]
        [ <delay_bound>x</delay_bound> ]
        [ <min_quorum>x</min_quorum> ]
        [ <target_nresults>x</target_nresults> ]
        [ <max_error_results>x</max_error_results> ]
        [ <max_total_results>x</max_total_results> ]
        [ <max_success_results>x</max_success_results> ]
        [ <size_class>N</size_class> ]
    </workunit>
</input_template>
Elements must be in the given order. For compatibility with old clients, elements and tags must be on separate lines as shown.
Each <file_info> describes an input file:
- <gzip/>: transfer the file in gzipped (compressed) format to reduce network usage. You must stage the file with the --gzip option. Only 7.0+ clients can handle compressed transfers; older clients will download the file in uncompressed form.
- <gzipped_nbytes>: if <gzip/> is specified, the size of the gzipped file.
- <sticky/>: if present, the file remains on the client after the job is finished.
- <no_delete/>: if present, the file is not deleted from the server after the job is completed. Use this if the file is used as input to more than one job.
- <report_on_rpc/>: if present, report the file in each scheduler request (for sticky files). Include this for compatibility with old (pre-7.x) clients; 7.0+ clients report all sticky files.
The following is used only for constant files:
- <physical_name>: the physical name of the file.
The following are used for remote files specified in the input template:
- <url>: specifies a directory (i.e. it should end with a /) to which the file name will be appended to give the URL. If the file is replicated, you can supply more than one.
- <md5_cksum>: the file's MD5 checksum.
- <nbytes>: the file size in bytes.
The <file_ref> describes how the file is referenced by the application running on the client:
- <open_name>: the logical name of the file.
- <copy_file/>: if present, the file is copied into the job's slot directory.
The list of <file_ref> elements corresponds, in order, to the list of <file_info> elements.
The job parameters include:
- <command_line>: the command-line arguments to be passed to the main program. Note: if you're using the BOINC wrapper, use <append_cmdline_args/> in your job.xml file to pass command-line arguments from the wrapper to the wrapped application.
- <rsc_fpops_est>, <rsc_disk_bound>, and the other elements from <rsc_fpops_est> through <max_success_results>: job attributes such as how much disk space will be used. BOINC will supply reasonable defaults for these, but you should supply the correct values; otherwise, for example, BOINC might try to run the job on a host with insufficient disk space.
- <size_class>: the job's size class.
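As an illustration of the layout described above, the following Python sketch generates an input template for a given list of logical file names. It assumes "normal" input files (so each <file_info> is empty and the physical names are supplied at submission time) and assumes the names contain no XML special characters. The function build_input_template and its parameters are hypothetical; they are not part of BOINC.

```python
def build_input_template(open_names, command_line=None, rsc_disk_bound=None):
    """Return input-template text with one <file_info>/<file_ref> pair per file.

    Illustrative helper only. Names are assumed to contain no '<', '>' or '&'.
    """
    lines = ["<input_template>"]
    # One <file_info> per input file; every child element is optional.
    for _ in open_names:
        lines += ["    <file_info>", "    </file_info>"]
    lines.append("    <workunit>")
    # <file_ref> elements follow the same order as the <file_info> elements.
    for name in open_names:
        lines += [
            "        <file_ref>",
            f"            <open_name>{name}</open_name>",
            "        </file_ref>",
        ]
    # Optional job parameters, emitted in the order the template requires.
    if command_line is not None:
        lines.append(f"        <command_line>{command_line}</command_line>")
    if rsc_disk_bound is not None:
        lines.append(f"        <rsc_disk_bound>{rsc_disk_bound}</rsc_disk_bound>")
    lines += ["    </workunit>", "</input_template>"]
    # Each element is kept on its own line, as the format requires.
    return "\n".join(lines) + "\n"
```

For example, build_input_template(["in1", "in2"], command_line="-flags xyz") yields a template with two <file_info> elements followed by a <workunit> containing two <file_ref> elements and a <command_line>.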
The input template (substituted with filenames and URLs) is stored in a database field with a 64KB limit. This is enough for about 200 input files, fewer if you use long file names or multiple download URLs. If this isn't enough, you can use BOINC file compression to zip several files into a single file reference for download, and expand them on the client machine before the application runs.
An output template file describes a job's output files. It has the form
<output_template>
    <file_info>
        <name><OUTFILE_0/></name>
        <generated_locally/>
        <upload_when_present/>
        <max_nbytes>32768</max_nbytes>
        <url><UPLOAD_URL/></url>
        [ <gzip_when_done/> ]
    </file_info>
    <result>
        <file_ref>
            <file_name><OUTFILE_0/></file_name>
            <open_name>result.sah</open_name>
            [ <copy_file>0|1</copy_file> ]
            [ <optional>0|1</optional> ]
            [ <no_validate>0|1</no_validate> ]
            [ <no_delete/> ]
        </file_ref>
        [ <report_immediately/> ]
    </result>
</output_template>
Elements and tags must be on separate lines as shown. The elements include:
- <file_info>: describes an output file.
- <name>: the physical file name. Typically use <OUTFILE_0/>, <OUTFILE_1/> etc.; BOINC will replace this with a generated name based on the job name.
- <generated_locally/>: deprecated, but you need to include this to work with pre-7.0 clients.
- <gzip_when_done/>: use this to compress the output file before uploading; see FileCompression.
- <file_ref>: describes how an output file will be referenced by the application.
- <open_name>: the logical name by which the application will reference the file.
- <copy_file>: if present, the file will be generated in the slot directory, and moved to the project directory after the job has finished. Use this for legacy applications. Important: If the slot directory and the project directory are on the same filesystem, the file is moved instead of copied!
- <upload_when_present/>: always include this for output files.
- <max_nbytes>: maximum file size. If the actual size exceeds this, the file will not be uploaded, and the job will be marked as an error.
- <url>: the URL of the file upload handler. You may include this explicitly, or use <UPLOAD_URL/> to use the URL in your project's config.xml file.
- <optional>: if 0 or absent, your application must create the file; if it does not, the job will be marked as an error.
- <no_validate>: if true, don't include this file in the result validation process (relevant only if you are using the sample bitwise validator).
- <no_delete/>: if present, the file will not be deleted on the server even after the job is finished.
- <report_immediately/>: if present, clients will report this job immediately after the output files are uploaded. Otherwise they may wait up to a day. (Implemented in 6.12.27+ clients only.)
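The following Python sketch shows one way an output template for several output files might be generated. It assumes that additional output files repeat the <file_info> and <file_ref> elements with <OUTFILE_1/>, <OUTFILE_2/> and so on, and that logical names contain no XML special characters. The function build_output_template and its parameters are hypothetical; the default max_nbytes simply reuses the value from the template above.

```python
def build_output_template(open_names, max_nbytes=32768):
    """Return output-template text with one file_info/file_ref pair per file.

    Illustrative helper only. Names are assumed to contain no '<', '>' or '&'.
    """
    lines = ["<output_template>"]
    for i in range(len(open_names)):
        lines += [
            "    <file_info>",
            f"        <name><OUTFILE_{i}/></name>",
            "        <generated_locally/>",   # needed for pre-7.0 clients
            "        <upload_when_present/>",
            f"        <max_nbytes>{max_nbytes}</max_nbytes>",
            "        <url><UPLOAD_URL/></url>",
            "    </file_info>",
        ]
    lines.append("    <result>")
    for i, name in enumerate(open_names):
        lines += [
            "        <file_ref>",
            f"            <file_name><OUTFILE_{i}/></file_name>",
            f"            <open_name>{name}</open_name>",
            "        </file_ref>",
        ]
    lines += ["    </result>", "</output_template>"]
    # Each element is kept on its own line, as the format requires.
    return "\n".join(lines) + "\n"
```

The sketch applies a single max_nbytes to every file; in practice you would likely choose a bound per file based on the largest output your application can legitimately produce.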
Note: when a job is created, the name of its output template file is stored in the database. The file is read when instances of the job are created, which may happen days or weeks later. Thus, editing an output template file can affect existing jobs. If this is not desired, you must create a new output template file.
You can safely remove an input template file after creating your last job with it. However, an output template file must remain in place until every job that refers to it is completed (i.e. no more replicas will be created).
The output template, substituted with filenames and URLs, is stored in a database field with a 64KB limit. This imposes a limit of about 50 output files; the exact number depends upon the length of your filenames and URLs. If you need more files, you can use BOINC file compression to zip several files into a single file reference for upload, prior to completing each task on the client machine. Once you have run some jobs through your project, you can compare the size of the expanded XML with the 65,535-byte limit by running the following MySQL statement:
select max(length(xml_doc_in)), max(length(xml_doc_out)) from result;
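The same comparison can be made before submitting jobs, if you have the expanded template text at hand. The Python sketch below is a rough check only: it assumes the limit applies to the byte length of the text (which is what MySQL's length() reports) and that the text is encoded as UTF-8. The names XML_DOC_LIMIT and template_headroom are hypothetical.

```python
XML_DOC_LIMIT = 65535  # bytes, the field limit quoted above


def template_headroom(expanded_xml, limit=XML_DOC_LIMIT):
    """Return the number of bytes left before the expanded template hits the limit.

    A negative result means the template would not fit.
    """
    used = len(expanded_xml.encode("utf-8"))
    return limit - used
```

A small or negative headroom suggests combining several files into one compressed file, as described above.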