[wish] Allow users to temporarily turn on resend lost work units or to declare failed computers in their account

Message boards : Web interfaces : [wish] Allow users to temporarily turn on resend lost work units or to declare failed computers in their account

Jesse Viviano
Joined: 14 Feb 11
Posts: 63
United States
Message 41865 - Posted: 1 Jan 2012, 3:44:47 UTC

I hope that I am posting this in the right forum.

I think that BOINC needs two new options: one that lets a user turn on resending of lost work units for the next time the BOINC client performs an update, and one that lets a user declare that a computer has failed so that all of its incomplete work units are declared invalid. The first option would help when a computer problem, such as a rootkit or a failing hard drive, forces a complete reinstall. The second option would help recover truly lost work units when a computer has failed and the owner decides to dispose of it instead of trying to fix it.

The background is below.

A Western Digital Caviar Black 2TB drive that was barely one year old failed in my computer on Christmas Eve 2011. I have replaced it with an SSD and am waiting for another hard drive (a Seagate Barracuda XT 2TB this time) to arrive. Fortunately, I was able to make a last-minute backup of my critical data before the drive died entirely. I finally reinstalled my software, rejoined the projects I had joined before the crash, and merged the computer entries in BOINC. In Einstein@home, this caused the lost work units to be resent, because Einstein@home has resending of lost work units turned on. Other projects turn it off because it loads down the database. For those projects, I have to merge the computer entries in my account, wait for my computer to finish all work units for that project, detach, and then reattach to get the database to invalidate my results so that they can be resent to other computers. This has to be done so that projects with very long deadlines, like SETI@home, can resend work units that were sent to the failed computer. I would like a better way to deal with complete computer failure than this.
Jesse Viviano
Joined: 14 Feb 11
Posts: 63
United States
Message 41868 - Posted: 1 Jan 2012, 16:34:08 UTC

I was unable to back those up, so I cannot do that.
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15477
Netherlands
Message 41869 - Posted: 1 Jan 2012, 18:15:10 UTC - in response to Message 41865.  

The project has the option to resend lost work, but most projects leave this option off because it adds tremendously to the overhead, both in database activity and in bandwidth. Only projects with truly good servers have it on.

It won't ever be a user-settable option, as then everyone would turn it on and, with a bit of bad luck, the project would crash, since no one would ever turn it off again.

Lost work (work that is never returned) is already a calculated risk that all projects take. That is why work has a deadline: when you have not returned your work by the deadline, it is sent to a second or third computer. So hardware may break down all it wants; the projects won't suffer much from it.
Jesse Viviano
Joined: 14 Feb 11
Posts: 63
United States
Message 41871 - Posted: 1 Jan 2012, 19:32:55 UTC - in response to Message 41869.  

That is why I originally proposed a temporary option. It would be a one-shot deal: it would look for lost work units the next time that computer updates, and would turn itself off at the end of the update process.

However, I realize that my idea should be scrapped in favor of adding a button to the BOINC console to request lost work units.

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.