Changes between Version 8 and Version 9 of CreditNew


Timestamp: Nov 4, 2009, 12:24:38 PM
Author: davea
Comment: --

  • CreditNew

    v8  v9
    134 134  so that the average is the same for each version.
    135 135  The adjustment is always downwards:
    136      we maintain the average PFC*(V) of PFC() for each app version V,
        136  we maintain the average PFC^mean^(V) of PFC() for each app version V,
    137 137  find the minimum X.
    138 138  An app version V's jobs are then scaled by the factor
    139      {{{
    140      S(V) = (X/PFC*(V))
    141      }}}
        139
        140   S(V) = (X/PFC^mean^(V))
        141
    142 142
    143 143  The result for a given job J
    144 144  is called "Version-Normalized Peak FLOP Count", or VNPFC(J):
    145      {{{
    146      VNPFC(J) = PFC(J) * (X/PFC*(V))
    147      }}}
        145
        146   VNPFC(J) = PFC(J) * (X/PFC^mean^(V))
    148 147
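The version normalization in the lines above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the BOINC server code: the function names `version_scale_factors` and `vnpfc`, the dictionary input, and the sample version labels are all hypothetical.

```python
def version_scale_factors(pfc_mean):
    """Return S(V) = X / PFC_mean(V) for each app version V.

    pfc_mean maps a (hypothetical) version label to its average PFC.
    X is the minimum of those averages, so every factor is <= 1
    and the adjustment is always downwards.
    """
    if not pfc_mean:
        return {}
    if any(m <= 0 for m in pfc_mean.values()):
        raise ValueError("average PFC must be positive")
    x = min(pfc_mean.values())
    return {version: x / mean for version, mean in pfc_mean.items()}


def vnpfc(pfc_job, version, scale):
    """VNPFC(J) = PFC(J) * S(V) for a job J run with app version V."""
    return pfc_job * scale[version]


# Made-up averages for two versions of the same application.
pfc_mean = {"cpu": 2.0e12, "gpu": 8.0e12}
scale = version_scale_factors(pfc_mean)
job_vnpfc = vnpfc(4.0e12, "gpu", scale)
```

With these made-up numbers the version with the smallest average keeps a factor of 1, and the other version's jobs are scaled down by the ratio of the two averages.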
    149 148  Notes:
     
    162 161     (e.g., workunit.rsc_fpops_est)
    163 162     we can normalize by this to reduce the variance,
    164         and make PFC*(V) converge more quickly.
        163     and make PFC^mean^(V) converge more quickly.
    165 164   * ''a posteriori'' estimates of job size may exist also
    166 165     (e.g., an iteration count reported by the app)
     
    204 203  then, for that app,
    205 204  hosts should get the same average granted credit per job.
    206      To ensure this, for each application A we maintain the average VNPFC*(A),
    207      and for each host H we maintain VNPFC*(H, A).
        205  To ensure this, for each application A we maintain the average VNPFC^mean^(A),
        206  and for each host H we maintain VNPFC^mean^(H, A).
    208 207  The '''claimed credit''' for a given job J is then
    209      {{{
    210      VNPFC(J) * (VNPFC*(A)/VNPFC*(H, A))
    211      }}}
        208
        209   VNPFC(J) * (VNPFC^mean^(A)/VNPFC^mean^(H, A))
        210
    212 211
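As a rough illustration of the claimed-credit formula above, the host normalization is a single ratio applied to the job's VNPFC. The function name `claimed_credit` and its parameter names are assumptions made for this sketch; the sample values are invented.

```python
def claimed_credit(vnpfc_job, vnpfc_mean_app, vnpfc_mean_host_app):
    """VNPFC(J) * (VNPFC_mean(A) / VNPFC_mean(H, A)).

    A host whose average VNPFC for the app is above the app-wide
    average has its claim scaled down; a host below the average
    has its claim scaled up.
    """
    if vnpfc_mean_host_app <= 0:
        raise ValueError("host average must be positive")
    return vnpfc_job * (vnpfc_mean_app / vnpfc_mean_host_app)


# Invented values: the host's average is twice the app-wide average,
# so its claim for this job is halved.
credit = claimed_credit(3.0e12, 2.0e12, 4.0e12)
```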
    213 212  There are some cases where hosts are not sent jobs uniformly:
     
    219 218
    220 219  This can be done by dividing
    221      each sample in the computation of VNPFC* by WU.rsc_fpops_est
        220  each sample in the computation of VNPFC^mean^ by WU.rsc_fpops_est
    222 221  (in fact, there's no reason not to always do this).
    223 222
     
    227 226     and increases the claimed credit of hosts that are more efficient
    228 227     than average.
    229       * VNPFC* is averaged over jobs, not hosts.
        228   * VNPFC^mean^ is averaged over jobs, not hosts.
    230 229
    231 230  == Computing averages ==
     
    312 311
    313 312   * One-time cheats (like claiming 1e304) can be prevented by
    314         capping VNPFC(J) at some multiple (say, 10) of VNPFC*(A).
        313     capping VNPFC(J) at some multiple (say, 10) of VNPFC^mean^(A).
    315 314   * Cherry-picking: suppose an application has two types of jobs,
    316 315    which run for 1 second and 1 hour respectively.
     
    319 318    Suppose a client systematically refuses the 1 hour jobs
    320 319    (e.g., by reporting a crash or never reporting them).
    321        Its VNPFC*(H, A) will quickly decrease,
        320    Its VNPFC^mean^(H, A) will quickly decrease,
    322 321    and soon it will be getting several thousand times more credit
    323 322    per actual work than other hosts!
     
    325 324    whenever a job errors out, times out, or fails to validate,
    326 325    set the host's error rate back to the initial default,
    327        and set its VNPFC*(H, A) to VNPFC*(A) for all apps A.
        326    and set its VNPFC^mean^(H, A) to VNPFC^mean^(A) for all apps A.
    328 327    This puts the host to a state where several dozen of its
    329 328    subsequent jobs will be replicated.
     
    335 334
    336 335  Unrelated to the credit proposal, but in a similar spirit.
    337      The server will maintain ET*(H, V), the statistics of
        336  The server will maintain ET^mean^(H, V), the statistics of
    338 337  job runtimes (normalized by wu.rsc_fpops_est) per
    339 338  host and application version.
    340 339
    341 340  The server's estimate of a job's runtime is then
    342      {{{
    343      R(J, H) = wu.rsc_fpops_est * ET*(H, V)
    344      }}}
        341
        342   R(J, H) = wu.rsc_fpops_est * ET^mean^(H, V)
        343
    345 344
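The runtime estimate above could be maintained along the following lines. This is a sketch only: the class name `RuntimeStats`, its methods, and the use of a plain running arithmetic mean are assumptions (the page's own averaging method is not visible in this diff), and the sample numbers are invented.

```python
import math


class RuntimeStats:
    """Per (host, app version) mean of runtime / rsc_fpops_est."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0  # seconds per estimated FLOP

    def add_sample(self, runtime_secs, rsc_fpops_est):
        """Fold one completed job into the running mean."""
        if rsc_fpops_est <= 0:
            raise ValueError("rsc_fpops_est must be positive")
        sample = runtime_secs / rsc_fpops_est
        self.n += 1
        self.mean += (sample - self.mean) / self.n

    def estimate_runtime(self, rsc_fpops_est):
        """R(J, H) = wu.rsc_fpops_est * ET_mean(H, V)."""
        if self.n == 0:
            raise ValueError("no samples recorded yet")
        return rsc_fpops_est * self.mean


# Invented samples: (runtime in seconds, workunit FLOP estimate).
stats = RuntimeStats()
stats.add_sample(100.0, 1.0e12)
stats.add_sample(300.0, 2.0e12)
estimate = stats.estimate_runtime(4.0e12)
```

Normalizing each sample by the workunit's FLOP estimate lets jobs of different sizes contribute to the same per-host, per-version statistic.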
    346 345  == Implementation ==