Changes between Version 29 and Version 30 of GpuWorkFetch


Timestamp: Mar 18, 2009, 11:14:55 AM
Author: davea

Legend:

 Unmodified:: shown without a prefix
 Removed (v29):: prefixed with -
 Added (v30):: prefixed with +
 = Work fetch and GPUs =
 
-This document describes changes to BOINC's work fetch mechanism,
-in the 6.7 client and the scheduler as of [17024].
+This document describes changes to BOINC's work fetch mechanism
+in the 6.6 client and the scheduler as of [17024].
 
 == Problems with the old work fetch policy ==
…
 == Examples ==
 
-In following, A and B are projects.
+In the following examples, the client is attached
+to projects A and B with equal resource share.
 
 === Example 1 ===
…
 
 Variation: a new project C is attached when A's job finishes.
-It should immediately share the CPU with B.
+It should immediately share the CPU 50/50 with B.
 
 === Example 3 ===
…
 After a year, B gets a GPU app.
 
-Goal: A and B immediately share the GPU.
-
-== Resource types ==
-
-New abstraction: '''processing resource type''' just "resource type".
+Goal: A and B immediately share the GPU 50/50.
+
+== The new policy ==
+
+=== Resource types ===
+
+New abstraction: '''processing resource type''' or just "resource type".
 Examples of resource types:
  * CPU
  * A coprocessor type (a kind of GPU, or the SPE processors in a Cell)
-
-A job sent to a client is associated with an app version,
-which uses some number (possibly fractional) of CPUs,
-and some number of instances of a particular coprocessor type.
-
-== Scheduler request and reply message ==
-
-New fields in the scheduler request message:
-
- '''double cpu_req_secs''':: number of CPU seconds requested
- '''double cpu_req_instances''':: send enough jobs to occupy this many CPUs
-
-And for each coprocessor type:
-
- '''double req_secs''':: number of instance-seconds requested
- '''double req_instances''':: send enough jobs to occupy this many instances
-
-The semantics: a scheduler should send jobs for a resource type
-only if the request for that type is nonzero.
-
-For compatibility with old servers, the message still has '''work_req_seconds''',
-which is the max of the req_secs values.
-
-== Per-resource-type backoff ==
-
-We need to handle the situation where e.g. there's a GPU shortfall
-but no projects are supplying GPU work
-(for either permanent or transient reasons).
-We don't want an overall work-fetch backoff from those projects.
-
-Instead, we maintain a separate backoff timer per (project, resource type).
-The backoff interval is doubled up to a limit whenever we ask for work of that type and don't get any work;
+Currently there are two resource types: CPU and NVIDIA GPUs.
+
+Summary of the new policy: it's like the old policy,
+but with a separate copy for each resource type,
+and scheduler requests can now ask for work for particular resource types.
+
+=== Per-resource-type backoff ===
+
+We need to keep track of whether projects have work for particular
+resource types,
+so that we don't keep asking them for types of work they don't have.
+
+To do this, we maintain a separate backoff timer per (project, resource type).
+The backoff interval is doubled up to a limit (1 day)
+whenever we ask for work of that type and don't get any work;
 it's cleared whenever we get a job of that type.
-
-There is still an overall backoff timer for each project.
-This is triggered by:
- * requests from the project
- * RPC failures
- * job errors
-and so on.
-
 Note: if we decide to ask a project for work for resource A,
 we may ask it for resource B as well, even if it's backed off for B.
 
-== Long-term debt ==
+This is independent of the overall backoff timer for each project,
+which is triggered by requests from the project,
+RPC failures, job errors and so on.
+
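To make the doubling rule concrete, here is a minimal C++ sketch of the per-(project, resource type) backoff; the type and constant names are illustrative assumptions, not the actual client code:

{{{
#include <algorithm>

// Hypothetical sketch of the per-(project, resource type) backoff.
struct RSC_PROJECT_BACKOFF {
    double backoff_interval;    // seconds; 0 if not backed off
    double backoff_time;        // back off until this time

    // called when we asked for work of this type and got none
    void request_failed(double now) {
        const double MIN_BACKOFF = 60;      // assumed minimum
        const double MAX_BACKOFF = 86400;   // the 1-day limit
        if (backoff_interval == 0) {
            backoff_interval = MIN_BACKOFF;
        } else {
            backoff_interval = std::min(backoff_interval*2, MAX_BACKOFF);
        }
        backoff_time = now + backoff_interval;
    }
    // called when we get a job of this type
    void got_job() {
        backoff_interval = 0;
        backoff_time = 0;
    }
    bool backed_off(double now) const {
        return now < backoff_time;
    }
};
}}}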
+=== Long-term debt ===
 
 We continue to use the idea of '''long-term debt''' (LTD),
…
  * An offset is added so that the maximum debt across all projects is zero (this ensures that when a new project is attached, it starts out debt-free).
 
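As an illustration of the offset step, a minimal sketch (the container and function names are assumptions):

{{{
#include <algorithm>
#include <vector>

// Shift all debts so the maximum is zero; a newly attached project
// (debt 0) then starts at the same level as the least-indebted project.
void normalize_debts(std::vector<double>& debts) {
    if (debts.empty()) return;
    double max_debt = *std::max_element(debts.begin(), debts.end());
    for (double& d : debts) {
        d -= max_debt;      // max becomes 0; all others <= 0
    }
}
}}}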
-
-== Client data structures ==
-
-=== RSC_WORK_FETCH ===
-
-Work-fetch state for a particular resource type.
-There are instances for CPU ('''cpu_work_fetch''') and NVIDIA GPUs ('''cuda_work_fetch''').
-Data members:
-
- '''ninstances''':: number of instances of this resource type
-
-Used/set by rr_simulation():
-
- '''double shortfall''':: shortfall for this resource
- '''double nidle''':: number of currently idle instances
-
-Member functions:
-
- '''rr_init()''':: called at the start of RR simulation.  Compute project shares for this PRSC, and clear overall and per-project shortfalls.
- '''set_nidle()''':: called by RR sim after initial job assignment.
-Set nidle to # of idle instances.
- '''accumulate_shortfall()''':: called by RR sim for each time interval during work buf period.
-{{{
-shortfall += dt*(ninstances - instances in use)
-for each project p not backed off for this PRSC
-    p->PRSC_PROJECT_DATA.accumulate_shortfall(dt)
-}}}
-
- '''select_project()''':: select the best project to request this type of work from. It's the project not backed off for this PRSC, and for which LTD + p->shortfall is largest, also taking into consideration overworked projects etc.
-
- '''accumulate_debt(dt)'''::
-for each project p:
-{{{
-x = insts of this device used by P's running jobs
-y = P's share of this device
-update P's LTD
-}}}
-
-=== RSC_PROJECT_WORK_FETCH ===
-
-State for a (resource type, project) pair.
-It has the following "persistent" members (i.e., saved in state file):
-
- '''backoff_interval''':: how long to wait before asking the project for work specifically for this PRSC;
-double this any time we ask for work for this rsc and get none (maximum 24 hours). Clear it when we ask for work for this PRSC and get some job.
- '''backoff_time''':: back off until this time
- '''debt''':: long-term debt
-
-And the following transient members (used by rr_simulation()):
-
- '''double runnable_share''':: # of instances this project should get based on resource share
-relative to the set of projects not backed off for this PRSC.
- '''instances_used''':: # of instances currently being used
-
-=== PROJECT_WORK_FETCH ===
-
-Per-project work fetch state.
-Members:
- '''overall_debt''':: weighted sum of per-resource debts
-
-=== WORK_FETCH ===
-
-Overall work-fetch state.
-
- '''PROJECT* choose_project()''':: choose a project from which to fetch work.
-
- * Do round-robin simulation
- * if a GPU is idle, choose a project to ask for that type of work (using RSC_WORK_FETCH::select_project())
- * if a CPU is idle, choose a project to ask for CPU work
- * if GPU has a shortfall, choose a project to ask for GPU work
- * if CPU has a shortfall, choose a project to ask for CPU work
- In the case where a resource type was idle, ask for only that type of work.
+=== Summary of the new policy ===
+
+Every 60 seconds, and when various events happen (e.g. jobs finish),
+the following is done.
+CI is the "connect interval" preference;
+AW is the "additional work" preference.
+
+Auxiliary functions:
+
+'''get_major_shortfall(resource)'''
+
+If the resource will have an idle instance before CI,
+return the greatest-overall-debt non-backed-off project P
+(P may be overworked).  Otherwise return NULL.
+
+'''get_minor_shortfall(resource)'''
+
+If the resource will have an idle instance between CI and CI+AW,
+return the greatest-overall-debt non-backed-off non-overworked project P.
+
+'''get_starved_project(resource)'''
+
+If any project is not overworked, not backed off, and has no runnable jobs
+for any resource, return the one with greatest overall debt.
+
+Main logic:
+ * Do a round-robin simulation of currently queued jobs.
+ * p = get_major_shortfall(NVIDIA GPU); if p != NULL, ask it for work and return
+ * ... same for other coprocessor types (we assume that coprocessors are faster, hence more important, than the CPU)
+ * ... same, for CPU
+ * p = get_minor_shortfall(NVIDIA GPU); if p != NULL, ask it for work and return
+ * ... same for other coprocessor types, then CPU
+ * p = get_starved_project(NVIDIA GPU); if p != NULL, ask it for work and return
+ * ... same for other coprocessor types, then CPU
+
+In the get_major_shortfall() case, ask only for work of that resource type.
 Otherwise ask for all types of work for which there is a shortfall.
 
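The decision order above might look roughly like this C++ sketch; the types and helper functions are assumptions based on the text, not the actual client interfaces:

{{{
#include <vector>

struct PROJECT;
struct RSC_WORK_FETCH {
    PROJECT* get_major_shortfall();   // idle instance before CI
    PROJECT* get_minor_shortfall();   // idle instance in [CI, CI+AW]
    PROJECT* get_starved_project();   // project with no runnable jobs
};

// Assumed to list coprocessor types first, CPU last
// (coprocessors are faster, hence more important).
extern std::vector<RSC_WORK_FETCH*> resource_types;
void rr_simulation();

PROJECT* choose_project() {
    rr_simulation();    // round-robin simulation of queued jobs

    // 1. Major shortfall: an instance will be idle before CI;
    //    even an overworked project may be asked.
    for (RSC_WORK_FETCH* rsc: resource_types) {
        if (PROJECT* p = rsc->get_major_shortfall()) return p;
    }
    // 2. Minor shortfall: idle between CI and CI+AW;
    //    overworked projects are skipped.
    for (RSC_WORK_FETCH* rsc: resource_types) {
        if (PROJECT* p = rsc->get_minor_shortfall()) return p;
    }
    // 3. Starvation: a project with no runnable jobs at all.
    for (RSC_WORK_FETCH* rsc: resource_types) {
        if (PROJECT* p = rsc->get_starved_project()) return p;
    }
    return 0;   // no work fetch needed
}
}}}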
-== Scheduler changes ==
+== Implementation notes ==
+
+A job sent to a client is associated with an app version,
+which uses some number (possibly fractional) of CPUs,
+and some number of instances of a particular coprocessor type.
+
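For illustration, the resource usage of an app version might be represented like this (the field names are hypothetical):

{{{
// Hypothetical sketch: resource usage of an app version.
struct APP_VERSION_USAGE {
    double avg_ncpus;       // CPUs used per job; may be fractional (e.g. 0.5)
    int coproc_type;        // coprocessor type, if any (e.g. NVIDIA GPU)
    double coproc_usage;    // instances of that type used per job
};
}}}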
+=== Scheduler request and reply message ===
+
+New fields in the scheduler request message:
+
+ '''double cpu_req_secs''':: number of CPU seconds requested
+ '''double cpu_req_instances''':: send enough jobs to occupy this many CPUs
+
+And for each coprocessor type:
+
+ '''double req_secs''':: number of instance-seconds requested
+ '''double req_instances''':: send enough jobs to occupy this many instances
+
+The semantics: a scheduler should send jobs for a resource type
+only if the request for that type is nonzero.
+
+For compatibility with old servers, the message still has '''work_req_seconds''',
+which is the max of the req_secs values.
+
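A minimal sketch of the request-side fields; the field names are from the text above, while the surrounding types and the aggregation helper are assumptions:

{{{
#include <algorithm>
#include <vector>

struct COPROC_REQUEST {
    double req_secs;         // instance-seconds requested
    double req_instances;    // occupy this many instances
};

struct SCHEDULER_REQUEST {
    double cpu_req_secs;         // CPU seconds requested
    double cpu_req_instances;    // occupy this many CPUs
    std::vector<COPROC_REQUEST> coproc_requests;  // one per coprocessor type

    // compatibility field for old servers: the max of the req_secs
    double work_req_seconds() const {
        double x = cpu_req_secs;
        for (const COPROC_REQUEST& c: coproc_requests) {
            x = std::max(x, c.req_secs);
        }
        return x;
    }
};
}}}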
+=== Client data structures ===
+
+ RSC_WORK_FETCH:: The work-fetch state for a particular resource type. There are instances for CPU ('''cpu_work_fetch''') and NVIDIA GPUs ('''cuda_work_fetch''').
+ RSC_PROJECT_WORK_FETCH:: The work-fetch state for a (resource type, project) pair.
+ PROJECT_WORK_FETCH:: Per-project work-fetch state.
+ WORK_FETCH:: Overall work-fetch state.
+
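A rough sketch of how these structures relate; the member selection is illustrative, not the full definitions:

{{{
// One per resource type (cpu_work_fetch, cuda_work_fetch).
struct RSC_WORK_FETCH {
    int ninstances;          // instances of this resource
    double shortfall;        // from rr_simulation()
};

// One per (resource type, project) pair.
struct RSC_PROJECT_WORK_FETCH {
    double backoff_time;     // per-type backoff
    double debt;             // long-term debt for this resource
};

// One per project.
struct PROJECT_WORK_FETCH {
    double overall_debt;     // weighted sum of per-resource debts
};

// Singleton: overall policy state, owning the per-resource objects.
struct WORK_FETCH {
    RSC_WORK_FETCH* cpu_work_fetch;
    RSC_WORK_FETCH* cuda_work_fetch;
};
}}}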
+=== Scheduler changes ===
 
  * WORK_REQ has fields for requests (secs, instances) of the various resource types
…
  * get_app_version(): skip app versions for resources for which we don't need more work (see the sketch below).
 
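A sketch of the skipping rule (the names here are hypothetical):

{{{
#include <vector>

struct APP_VERSION { int rsc_type; };
bool need_work_for(int rsc_type);    // nonzero request for this type?

// Return the first app version whose resource type still needs work;
// skip the rest.
APP_VERSION* get_app_version_sketch(std::vector<APP_VERSION*>& avs) {
    for (APP_VERSION* av: avs) {
        if (!need_work_for(av->rsc_type)) continue;   // skip this version
        return av;
    }
    return 0;
}
}}}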
-
 == Notes ==
 