| 1 | = Workunit and result state transitions = |
| 2 | |
| 3 | The processing of workunits and results can be described in terms of transitions of their state variables. |
| 4 | |
| 5 | |
| 6 | === Workunit state variables === |
| 7 | Workunit parameters are described [JobIn here]. |
| 8 | |
| 9 | Workunit state variables are as follows: |
| 10 | |
| 11 | |
| 12 | |
| 13 | ||'''canonical_resultid'''||The ID of the canonical result for this workunit, or zero. |
| 14 | * Initially zero |
| 15 | * Set by the validator (by check_set()) |
| 16 | |
| 17 | ||||'''transition_time'''||The next time to check for state transitions for this WU. |
| 18 | * Initially now. |
| 19 | * Set to now by the scheduler when it receives a result for this WU. |
| 20 | * Set to min(current value, now + delay_bound) by the scheduler when it sends a result for this WU. |
| 21 | * Set to the minimum of (x.sent_time + wu.delay_bound) over all IN_PROGRESS results x by the transitioner when it finishes handling this WU. |
| 22 | * Set to now by the validator if it finds a canonical result, or if there is already a canonical result and some other results have validate_state = INIT, or if there is no consensus and the number of successful results is > wu.max_success_results. |
| 23 | |
| 24 | ||||'''file_delete_state'''||Indicates whether input files should be deleted. |
| 25 | * Initially INIT |
| 26 | * Set to READY by the transitioner when all results have server_state=OVER and wu.assimilate_state=DONE. Note: db_purge purges a WU and all its results when file_delete_state=DONE, so it is critical that file_delete_state be set to DONE only if all results have server_state=OVER. |
| 27 | * Set to DONE by file_deleter when it has attempted to delete files. |
| 28 | |
| 29 | ||||'''assimilate_state'''||Indicates whether the workunit should be assimilated. |
| 30 | * Initially INIT |
| 31 | * Set to READY by the transitioner if wu.assimilate_state=INIT and the WU has an error condition |
| 32 | * Set to READY by the validator when it finds a canonical result and wu.assimilate_state=INIT |
| 33 | * Set to DONE by assimilator when done |
| 34 | |
| 35 | ||||'''need_validate'''||Indicates that the workunit has a result that needs validation. |
| 36 | * Initially FALSE |
| 37 | * Set to TRUE by transitioner if the number of success results is at least wu.min_quorum and there is a success result not validated yet |
| 38 | * Set to FALSE by validator |
| 39 | |
| 40 | ||||'''error_mask'''||A bit mask for error conditions. |
| 41 | * Initially zero |
| 42 | * Transitioner sets COULDNT_SEND_RESULT if some result couldn't be sent. |
| 43 | * Transitioner sets TOO_MANY_ERROR_RESULTS if there are too many error results |
| 44 | * Transitioner sets TOO_MANY_TOTAL_RESULTS if too many total results |
| 45 | * Validator sets TOO_MANY_SUCCESS_RESULTS if no consensus and too many success results |
| 46 | |
| 47 | ||Workunit invariants: |
| 48 | |
| 49 | |
| 50 | * Eventually either canonical_resultid or error_mask is set. |
| 51 | * Eventually transition_time = infinity. |
| 52 | * Each WU is assimilated exactly once. |
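
The transition_time rules for the scheduler and transitioner can be sketched as follows. This is a minimal illustration, not the server code: the `Result` class and both function names are hypothetical, and treating the minimum over an empty set of IN_PROGRESS results as infinity is an assumption chosen to agree with the invariant above.

```python
import math
from dataclasses import dataclass

IN_PROGRESS = "IN_PROGRESS"


@dataclass
class Result:
    # Hypothetical, simplified stand-in for a result record.
    server_state: str
    sent_time: float = 0.0


def transition_time_on_send(current: float, now: float, delay_bound: float) -> float:
    """Scheduler rule when a result is sent: min(current value, now + delay_bound)."""
    return min(current, now + delay_bound)


def transition_time_after_handling(results, delay_bound: float) -> float:
    """Transitioner rule: earliest (sent_time + delay_bound) among IN_PROGRESS results.

    Assumption: with no IN_PROGRESS results, the minimum over the empty set
    is taken to be infinity, so the WU is never scheduled for another check.
    """
    deadlines = [
        r.sent_time + delay_bound
        for r in results
        if r.server_state == IN_PROGRESS
    ]
    return min(deadlines, default=math.inf)
```

Once every result has left IN_PROGRESS, the second function returns infinity, which is how this sketch reaches the "eventually transition_time = infinity" invariant.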
| 53 | |
| 54 | Notes on deletion of input files: |
| 55 | |
| 56 | |
| 57 | * Input files are eventually deleted, but only when all results have server_state=OVER (so that clients don't get download failures) and the WU has been assimilated (in case the project wants to examine input files in error cases). |
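
The input-file rule can be expressed as a single predicate. The sketch below assumes simplified, hypothetical `Workunit` and `Result` records with string-valued states; it only illustrates the condition under which the transitioner would move wu.file_delete_state from INIT to READY.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Result:
    server_state: str  # UNSENT, IN_PROGRESS, or OVER


@dataclass
class Workunit:
    assimilate_state: str = "INIT"
    file_delete_state: str = "INIT"
    results: List[Result] = field(default_factory=list)


def update_input_file_delete_state(wu: Workunit) -> str:
    """Mark input files READY for deletion only when it is safe to do so."""
    # No client can still need the input files once every result is OVER.
    all_over = all(r.server_state == "OVER" for r in wu.results)
    # The project has had its chance to inspect inputs once the WU is assimilated.
    assimilated = wu.assimilate_state == "DONE"
    if wu.file_delete_state == "INIT" and all_over and assimilated:
        wu.file_delete_state = "READY"
    return wu.file_delete_state
```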
| 58 | |
| 59 | |
| 60 | === Result state variables === |
| 61 | Result state variables are listed in the following table: |
| 62 | |
| 63 | ||'''report_deadline'''||Give up on the result (and possibly delete input files) if no reply is received by this time. |
| 64 | * Set by the scheduler to now + wu.delay_bound when it sends the result |
| 65 | |
| 66 | ||||'''server_state'''||Values: UNSENT, IN_PROGRESS, OVER |
| 67 | * Initially UNSENT |
| 68 | * Set by scheduler to IN_PROGRESS when it sends the result |
| 69 | * Set by scheduler to OVER when the result is reported in a request message from the client. |
| 70 | * Set by scheduler to OVER when it believes the host has detached from the project. |
| 71 | * Set by transitioner to OVER if now > result.report_deadline |
| 72 | * Set by transitioner to OVER if WU has error condition and result.server_state=UNSENT |
| 73 | * Set by validator to OVER if WU has canonical result and result.server_state=UNSENT |
| 74 | |
| 75 | ||||'''outcome'''||Values: SUCCESS, COULDNT_SEND, CLIENT_ERROR, NO_REPLY, DIDNT_NEED, VALIDATE_ERROR, CLIENT_DETACHED. Defined iff result.server_state=OVER |
| 76 | * Set by scheduler to SUCCESS if it gets a reply with no client error |
| 77 | * Set by scheduler to CLIENT_ERROR if it gets a reply with a client error |
| 78 | * Set by scheduler to NO_REPLY if it believes the host has detached from the project. |
| 79 | * Set by transitioner to NO_REPLY if server_state=IN_PROGRESS and now > report_deadline |
| 80 | * Set by transitioner to DIDNT_NEED if WU has error condition and result.server_state=UNSENT |
| 81 | * Set by validator to DIDNT_NEED if WU has canonical result and result.server_state=UNSENT |
| 82 | * Set by validator to VALIDATE_ERROR if outcome was initially SUCCESS, but the validator had a permanent error reading a result file, or a file had a syntax error. Prevents the validator from trying again. |
| 83 | * Set by scheduler to CLIENT_DETACHED if it gets a request indicating that the client detached, then reattached |
| 84 | |
| 85 | ||||'''client_state'''||Records the client state (DOWNLOADING, DOWNLOADED, COMPUTE_ERROR, UPLOADING, UPLOADED, ABORTED) where an error occurred. Defined if outcome is CLIENT_ERROR. |
| | |
| | ||||'''file_delete_state'''|| |
| 86 | * Initially INIT |
| 87 | * Set by transitioner to READY if this is the canonical result, and file_delete_state=INIT, and wu.assimilate_state=DONE, and all the results have server_state=OVER, and all the results with outcome=SUCCESS have validate_state!=INIT |
| 88 | * Set by transitioner to READY if wu.assimilate_state=DONE and (result.outcome=CLIENT_ERROR or result.validate_state!=INIT) |
| 89 | |
| 90 | ||||'''validate_state'''|| Defined iff result.outcome=SUCCESS |
| 91 | * Initially INIT |
| 92 | * Set by validator to VALID if outcome=SUCCESS and matches canonical result |
| 93 | * Set by validator to INVALID if outcome=SUCCESS and doesn't match canonical result |
| 94 | * Set by transitioner to NO_CHECK if the WU had an error; this avoids showing claimed credit as 'pending'. |
| 95 | * Set by validator to ERROR if outcome=SUCCESS and the validator had a permanent error trying to read an output file, or an output file had a syntax error. |
| 96 | * Set by validator to INCONCLUSIVE if check_set() didn't find a consensus in a set of results containing this one. |
| 97 | * Set by scheduler to TOO_LATE if the result was reported after the canonical result's files were deleted. |
| 98 | |
| 99 | || |
| 100 | |
| 101 | Result invariants: |
| 102 | |
| 103 | |
| 104 | * Eventually server_state = OVER. |
| 105 | * Output files are eventually deleted. |
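
The transitioner's part in driving each result to server_state = OVER can be sketched as below. The `Result` record and the `transitioner_pass` function are hypothetical simplifications; the timeout test follows the server_state rule (now > report_deadline), and the scheduler and validator transitions are deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    server_state: str = "UNSENT"
    outcome: Optional[str] = None  # defined only once server_state is OVER
    report_deadline: float = 0.0


def transitioner_pass(result: Result, now: float, wu_has_error: bool) -> Result:
    """Apply the two transitioner rules that end a result's life."""
    if result.server_state == "IN_PROGRESS" and now > result.report_deadline:
        # The client missed its deadline: give up on this result.
        result.server_state = "OVER"
        result.outcome = "NO_REPLY"
    elif result.server_state == "UNSENT" and wu_has_error:
        # The WU already failed, so there is no point in sending this result.
        result.server_state = "OVER"
        result.outcome = "DIDNT_NEED"
    return result
```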
| 106 | |
| 107 | Notes on deletion of output files: |
| 108 | * Non-canonical results can be deleted as soon as the WU is assimilated. |
| 109 | * Canonical results can be deleted only when all results have server_state=OVER and all success results are validated. |
| 110 | * If a result reply arrives after its timeout, the output files can be immediately deleted. |
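
The first two notes can be combined into one check. This is a sketch under stated assumptions: the `Result` record and the function name are hypothetical, "validated" is taken to mean validate_state != INIT, and the canonical result is also required to wait for assimilation, as in the file_delete_state rules above. The late-reply case from the third note is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Result:
    id: int
    server_state: str
    outcome: str = ""
    validate_state: str = "INIT"


def output_files_deletable(
    result: Result,
    all_results: List[Result],
    canonical_resultid: int,
    wu_assimilated: bool,
) -> bool:
    """Decide whether a result's output files may be deleted."""
    if not wu_assimilated:
        return False
    if result.id != canonical_resultid:
        # Non-canonical results are no longer needed after assimilation.
        return True
    # The canonical result must stay until nothing can be compared against it.
    all_over = all(r.server_state == "OVER" for r in all_results)
    all_validated = all(
        r.validate_state != "INIT" for r in all_results if r.outcome == "SUCCESS"
    )
    return all_over and all_validated
```

Keeping the canonical result's files until every success result has been validated is what allows late results to still be checked against it.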
| 111 | |
| 112 | How do we delete output files that arrive REALLY late (e.g. uploaded after all results have timed out, and never reported)? Possible answer: let X = create time of the oldest unassimilated WU. Any output files created before X can be deleted. |
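
The possible answer above could look like the following sketch. It is only an illustration of the proposed cutoff rule: the function name and the dictionary of file creation times are hypothetical, and treating the cutoff as infinity when no unassimilated WU exists is an assumption.

```python
import math
from typing import Dict, Iterable, List


def deletable_late_files(
    file_create_times: Dict[str, float],
    unassimilated_wu_create_times: Iterable[float],
) -> List[str]:
    """Return output files created before X, the oldest unassimilated WU's create time."""
    # Assumption: with no unassimilated WUs, every file predates the cutoff.
    cutoff = min(unassimilated_wu_create_times, default=math.inf)
    return [name for name, created in file_create_times.items() if created < cutoff]
```

Any file older than X cannot belong to a WU that is still waiting to be assimilated, which is why the sketch treats it as safe to delete.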