RAID fails while running BOINC

Message boards : Questions and problems : RAID fails while running BOINC
Hcir

Joined: 6 Jan 11
Posts: 2
United States
Message 36349 - Posted: 6 Jan 2011, 0:49:46 UTC

After years of running BOINC on my computers I switched to a system with a primary RAID01 array. Whenever I let BOINC run for more than a few hours, one of the array member disks goes out of sync and the array needs to be rebuilt/repaired. This only happens when running BOINC. I can leave the computer on for days running other programs and not have a problem. Not sure why this is happening. Thanks for any suggestions.
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15482
Netherlands
Message 36352 - Posted: 6 Jan 2011, 13:34:43 UTC - in response to Message 36349.  

Hcir

Joined: 6 Jan 11
Posts: 2
United States
Message 36362 - Posted: 7 Jan 2011, 12:33:00 UTC


Here is some more info from the BOINC startup log:



1/7/2011 7:30:47 AM Starting BOINC client version 6.10.58 for windows_x86_64
1/7/2011 7:30:47 AM log flags: file_xfer, sched_ops, task
1/7/2011 7:30:47 AM Libraries: libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.3
1/7/2011 7:30:47 AM Data directory: C:\ProgramData\BOINC
1/7/2011 7:30:47 AM Running under account Monkey
1/7/2011 7:30:47 AM Processor: 8 GenuineIntel Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz [Family 6 Model 26 Stepping 5]
1/7/2011 7:30:47 AM Processor: 256.00 KB cache
1/7/2011 7:30:47 AM Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 nx lm vmx tm2 popcnt pbe
1/7/2011 7:30:47 AM OS: Microsoft Windows 7: x64 Edition, (06.01.7600.00)
1/7/2011 7:30:47 AM Memory: 5.99 GB physical, 11.98 GB virtual
1/7/2011 7:30:47 AM Disk: 1.36 TB total, 1.23 TB free
1/7/2011 7:30:47 AM Local time is UTC -5 hours
1/7/2011 7:30:47 AM NVIDIA GPU 0: GeForce GTX 295 (driver version 26089, CUDA version 3020, compute capability 1.3, 873MB, 596 GFLOPS peak)
1/7/2011 7:30:47 AM NVIDIA GPU 1: GeForce GTX 295 (driver version 26089, CUDA version 3020, compute capability 1.3, 873MB, 596 GFLOPS peak)
1/7/2011 7:30:47 AM superlinkattechnion URL http://cbl-boinc-server2.cs.technion.ac.il/superlinkattechnion/; Computer ID 83410; resource share 100
1/7/2011 7:30:47 AM Milkyway@home URL http://milkyway.cs.rpi.edu/milkyway/; Computer ID 217519; resource share 100
1/7/2011 7:30:47 AM SETI@home URL http://setiathome.berkeley.edu/; Computer ID 5542423; resource share 100
1/7/2011 7:30:47 AM superlinkattechnion General prefs: from superlinkattechnion (last modified 13-Aug-2008 20:23:33)
1/7/2011 7:30:47 AM superlinkattechnion Host location: none
1/7/2011 7:30:47 AM superlinkattechnion General prefs: using your defaults
1/7/2011 7:30:47 AM Reading preferences override file
1/7/2011 7:30:47 AM Preferences:
1/7/2011 7:30:47 AM max memory usage when active: 3067.04MB
1/7/2011 7:30:47 AM max memory usage when idle: 4293.86MB
1/7/2011 7:30:47 AM max disk usage: 10.00GB
1/7/2011 7:30:47 AM suspend work if non-BOINC CPU load exceeds 25 %
1/7/2011 7:30:47 AM (to change preferences, visit the web site of an attached project, or select Preferences in the Manager)
1/7/2011 7:30:47 AM Milkyway@home Task de_separation_17_3s_fix_1_129506_1293428278_0 is 3.28 days overdue; you may not get credit for it. Consider aborting it.
1/7/2011 7:30:47 AM Milkyway@home Task de_nbody_model7_3_5388_1293544145_0 is 1.92 days overdue; you may not get credit for it. Consider aborting it.
1/7/2011 7:30:47 AM Milkyway@home Task de_nbody_model7_3_5448_1293544145_0 is 1.92 days overdue; you may not get credit for it. Consider aborting it.
1/7/2011 7:30:47 AM Milkyway@home Task de_nbody_model4_3_6418_1293544152_0 is 1.90 days overdue; you may not get credit for it. Consider aborting it.
1/7/2011 7:30:47 AM Milkyway@home Task de_nbody_model3_3_14538_1293566316_0 is 1.65 days overdue; you may not get credit for it. Consider aborting it.
1/7/2011 7:30:47 AM Milkyway@home Task de_separation_16_3s_fix_1_14307_1293571215_0 is 1.63 days overdue; you may not get credit for it. Consider aborting it.
1/7/2011 7:30:47 AM Milkyway@home Task de_separation_16_3s_fix_1_19551_1293567067_0 is 1.68 days overdue; you may not get credit for it. Consider aborting it.
1/7/2011 7:30:47 AM Milkyway@home Task de_separation_16_3s_fix_1_4332_1293567619_0 is 1.67 days overdue; you may not get credit for it. Consider aborting it.
1/7/2011 7:30:47 AM Not using a proxy
1/7/2011 7:30:47 AM Milkyway@home Aborting task de_separation_16_3s_fix_1_234654_1293566828_1; not started and deadline has passed
1/7/2011 7:30:47 AM Milkyway@home Aborting task de_separation_16_3s_fix_1_22997_1293568439_0; not started and deadline has passed
1/7/2011 7:30:47 AM Milkyway@home Aborting task de_separation_16_3s_fix_1_6997_1293568496_0; not started and deadline has passed
1/7/2011 7:30:47 AM Milkyway@home Aborting task de_separation_16_3s_fix_1_239315_1293569183_0; not started and deadline has passed
1/7/2011 7:30:47 AM Milkyway@home Aborting task de_separation_16_3s_fix_1_24747_1293569267_0; not started and deadline has passed
1/7/2011 7:30:47 AM Milkyway@home Aborting task de_separation_16_3s_fix_1_10702_1293569956_0; not started and deadline has passed
1/7/2011 7:30:47 AM Milkyway@home Aborting task de_separation_16_3s_fix_1_240308_1293570035_0; not started and deadline has passed
1/7/2011 7:30:47 AM Milkyway@home Aborting task de_separation_16_3s_fix_1_11280_1293570251_0; not started and deadline has passed
1/7/2011 7:30:47 AM Milkyway@home Aborting task de_separation_16_3s_fix_1_241093_1293570488_0; not started and deadline has passed
1/7/2011 7:30:47 AM Milkyway@home Aborting task de_separation_16_3s_fix_1_241614_1293570557_0; not started and deadline has passed
1/7/2011 7:30:47 AM Milkyway@home Aborting task de_separation_16_3s_fix_1_242505_1293570918_0; not started and deadline has passed
1/7/2011 7:30:47 AM Milkyway@home Aborting task de_separation_16_3s_fix_1_10352_1293569892_1; not started and deadline has passed
1/7/2011 7:30:47 AM SETI@home Restarting task ap_18se10ab_B5_P0_00366_20101221_15386.wu_1 using astropulse_v505 version 505
1/7/2011 7:30:47 AM Milkyway@home Restarting task de_separation_17_3s_fix_1_129506_1293428278_0 using milkyway version 50
1/7/2011 7:30:47 AM Milkyway@home Restarting task de_nbody_model7_3_5388_1293544145_0 using milkyway_nbody version 21
1/7/2011 7:30:47 AM Milkyway@home Restarting task de_nbody_model7_3_5448_1293544145_0 using milkyway_nbody version 21
1/7/2011 7:30:47 AM Milkyway@home Restarting task de_nbody_model4_3_6418_1293544152_0 using milkyway_nbody version 21
1/7/2011 7:30:47 AM SETI@home Restarting task 21se10aa.28858.20108.3.10.20_0 using setiathome_enhanced version 603
1/7/2011 7:30:47 AM Milkyway@home Restarting task de_nbody_model3_3_14538_1293566316_0 using milkyway_nbody version 21
1/7/2011 7:30:47 AM Milkyway@home Restarting task de_separation_16_3s_fix_1_14307_1293571215_0 using milkyway version 50
1/7/2011 7:30:47 AM Milkyway@home Restarting task de_separation_16_3s_fix_1_19551_1293567067_0 using milkyway version 50
1/7/2011 7:30:47 AM Milkyway@home Restarting task de_separation_16_3s_fix_1_4332_1293567619_0 using milkyway version 50
1/7/2011 7:30:47 AM Milkyway@home Sending scheduler request: To report completed tasks.
1/7/2011 7:30:47 AM Milkyway@home Reporting 14 completed tasks, not requesting new tasks
1/7/2011 7:30:49 AM Milkyway@home Scheduler request completed
1/7/2011 7:30:49 AM Milkyway@home Message from server: Result de_nbody_model7_3_5448_1293544145_0 is no longer usable
1/7/2011 7:30:49 AM Milkyway@home Message from server: Result de_separation_16_3s_fix_1_19551_1293567067_0 is no longer usable
1/7/2011 7:30:49 AM Milkyway@home Message from server: Result de_separation_16_3s_fix_1_4332_1293567619_0 is no longer usable
1/7/2011 7:30:49 AM Milkyway@home Message from server: Result de_nbody_model3_3_14538_1293566316_0 is no longer usable
1/7/2011 7:30:49 AM Milkyway@home Message from server: Result de_separation_16_3s_fix_1_14307_1293571215_0 is no longer usable
1/7/2011 7:30:50 AM Milkyway@home Computation for task de_nbody_model7_3_5448_1293544145_0 finished
1/7/2011 7:30:50 AM Milkyway@home Computation for task de_nbody_model3_3_14538_1293566316_0 finished
1/7/2011 7:30:50 AM Milkyway@home Computation for task de_separation_16_3s_fix_1_14307_1293571215_0 finished
1/7/2011 7:30:50 AM Milkyway@home Computation for task de_separation_16_3s_fix_1_19551_1293567067_0 finished
1/7/2011 7:30:50 AM Milkyway@home Computation for task de_separation_16_3s_fix_1_4332_1293567619_0 finished
1/7/2011 7:30:50 AM SETI@home Restarting task 08jn10ad.29491.22721.12.10.84_0 using setiathome_enhanced version 603
1/7/2011 7:30:50 AM SETI@home Restarting task 02se10ac.17424.9474.4.10.248_0 using setiathome_enhanced version 603
1/7/2011 7:30:50 AM SETI@home Starting 21se10af.28873.22317.3.10.128_0
1/7/2011 7:30:50 AM SETI@home Starting task 21se10af.28873.22317.3.10.128_0 using setiathome_enhanced version 609
1/7/2011 7:30:50 AM SETI@home Starting 08jn10ad.22881.19449.9.10.110_2
1/7/2011 7:30:50 AM SETI@home Starting task 08jn10ad.22881.19449.9.10.110_2 using setiathome_enhanced version 609
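
A log this long is easier to read once it is summarized. The following is only a rough sketch, assuming the messages keep the exact wording seen above ("Task ... is N days overdue" and "Aborting task ...; not started and deadline has passed"). The function name `summarize` and the sample list are made up for illustration and are not part of BOINC.

```python
import re
from collections import Counter

# Patterns are inferred from the wording in the log above; other BOINC
# versions may phrase these messages differently (assumption).
OVERDUE_RE = re.compile(
    r"(?P<project>\S+) Task (?P<task>\S+) is (?P<days>\d+(?:\.\d+)?) days overdue"
)
ABORTED_RE = re.compile(
    r"(?P<project>\S+) Aborting task (?P<task>\S+); not started and deadline has passed"
)


def summarize(lines):
    """Return (overdue, aborted): a list of (project, task, days) tuples
    and a Counter of aborted tasks per project."""
    overdue = []
    aborted = Counter()
    for line in lines:
        m = OVERDUE_RE.search(line)
        if m:
            overdue.append((m["project"], m["task"], float(m["days"])))
            continue
        m = ABORTED_RE.search(line)
        if m:
            aborted[m["project"]] += 1
    return overdue, aborted


# A few lines copied from the log above, used as sample input.
SAMPLE = [
    "1/7/2011 7:30:47 AM Milkyway@home Task de_separation_17_3s_fix_1_129506_1293428278_0 is 3.28 days overdue; you may not get credit for it. Consider aborting it.",
    "1/7/2011 7:30:47 AM Not using a proxy",
    "1/7/2011 7:30:47 AM Milkyway@home Aborting task de_separation_16_3s_fix_1_234654_1293566828_1; not started and deadline has passed",
    "1/7/2011 7:30:47 AM Milkyway@home Aborting task de_separation_16_3s_fix_1_22997_1293568439_0; not started and deadline has passed",
]

if __name__ == "__main__":
    overdue, aborted = summarize(SAMPLE)
    for project, task, days in overdue:
        print(project, task, days)
    for project, count in aborted.items():
        print(project, "aborted:", count)
```

Many overdue and aborted tasks would be consistent with the machine having been unable to crunch for a few days, for example while the array was being rebuilt, but the log alone does not prove that.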
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15482
Netherlands
Message 36363 - Posted: 7 Jan 2011, 12:56:53 UTC - in response to Message 36362.  
Last modified: 7 Jan 2011, 12:57:13 UTC

OK, by more information I meant all information about the RAID, since that's what you have the problem with...

So is it a software RAID or a hardware RAID?
Dedicated card or on the motherboard?
If card, what brand and model?
-- Did you check for an updated BIOS and/or firmware for the card?
If motherboard, what brand and model?
-- Did you check for an updated BIOS for the motherboard and updated firmware for the onboard RAID?
How many hard drives in your RAID01?
What size?
For the aficionados, what brand?

I see you run both Seti and Milkyway.
Does it happen when you only run Seti?
Does it happen when you only run Milkyway?
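
One way to run a single project at a time is to suspend the others with boinccmd, the command-line tool that ships with BOINC. Below is a minimal sketch, assuming boinccmd is on the PATH and is allowed to talk to the local client (it may need to be run from the BOINC data directory or given a password). The project URLs are copied from the startup log; the helper names `isolation_commands` and `run` are invented for this example.

```python
import subprocess

# Project URLs as they appear in the startup log earlier in the thread.
PROJECTS = {
    "SETI@home": "http://setiathome.berkeley.edu/",
    "Milkyway@home": "http://milkyway.cs.rpi.edu/milkyway/",
    "superlinkattechnion": "http://cbl-boinc-server2.cs.technion.ac.il/superlinkattechnion/",
}


def isolation_commands(only, projects=PROJECTS, boinccmd="boinccmd"):
    """Build boinccmd argument lists that resume `only` and suspend the rest."""
    if only not in projects:
        raise KeyError(f"unknown project: {only}")
    commands = []
    for name, url in projects.items():
        operation = "resume" if name == only else "suspend"
        commands.append([boinccmd, "--project", url, operation])
    return commands


def run(commands):
    """Execute the commands; raises CalledProcessError if boinccmd fails."""
    for argv in commands:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Print instead of executing, so the commands can be reviewed first.
    for argv in isolation_commands("SETI@home"):
        print(" ".join(argv))
```

Call `run(isolation_commands("Milkyway@home"))` to do the opposite test. The same thing can be done by hand in BOINC Manager with the Suspend button on the Projects tab.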

BOINC will write a lot to some of its files.
Can you move the BOINC Data directory (by default C:\ProgramData\BOINC in Windows 7) off the RAID and onto a separate drive outside the array, if only to check whether that fixes the problem? To do so, uninstall BOINC, move the remaining Data directory to its new location, then install BOINC again. On the third screen of the installer, click Advanced, and on the next screen change the path of the BOINC Data directory to the location you moved it to.
BlueCarbon
Joined: 1 Nov 10
Posts: 5
United States
Message 37421 - Posted: 8 Apr 2011, 3:00:05 UTC - in response to Message 36363.  
Last modified: 8 Apr 2011, 3:01:45 UTC

Try Ageless' recommendations, but I'd like to add a few things for you or anyone else having RAID issues.

I work for a hard drive company, and sometimes the drives themselves can drop from the array for different reasons. Your drives may need a firmware update, or, if you are running a cheap RAID controller, you will need to use RAID edition drives, which support TLER (Time-Limited Error Recovery), as opposed to desktop edition drives. Usually only the very high-end RAID controllers support TLER. Also test your drives using the drive manufacturer's diagnostic utility. These utilities can actually fix some problems with drives, or at least show whether the drives are good or failing. If your drive is tripping over a bad sector, this can also cause a drop, which the utility can repair if there aren't too many bad sectors. Hard drives made today have spare sectors that can replace bad ones that cannot be repaired. Don't rely on a thorough scan with the Windows Scan Disk utility.
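
To make the TLER point concrete, here is a toy model of why a drive gets dropped: the controller waits only so long for a drive that is busy retrying a bad sector. All three timing values are illustrative assumptions, not measurements from any particular drive or controller, and the function name `drive_dropped` is made up for this sketch.

```python
# Illustrative numbers only (assumptions, not vendor specifications).
CONTROLLER_TIMEOUT_S = 8.0     # how long the controller waits for a reply
TLER_DRIVE_RECOVERY_S = 7.0    # drive gives up on the sector and reports an error
DESKTOP_DRIVE_RECOVERY_S = 60.0  # drive keeps retrying the sector on its own


def drive_dropped(recovery_seconds, controller_timeout_seconds):
    """True if the controller gives up before the drive finishes error recovery."""
    return recovery_seconds > controller_timeout_seconds


if __name__ == "__main__":
    for label, recovery in [
        ("RAID edition (TLER)", TLER_DRIVE_RECOVERY_S),
        ("desktop edition", DESKTOP_DRIVE_RECOVERY_S),
    ]:
        dropped = drive_dropped(recovery, CONTROLLER_TIMEOUT_S)
        print(label, "-> dropped from array" if dropped else "-> stays in array")
```

In this model the drive with time-limited recovery answers before the controller loses patience, so the array can handle the read error itself, while the drive that retries for a long time looks dead to the controller and is marked as failed.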

Also, you may think "I only have this problem when using SETI@home, so it must be a SETI@home problem", but that may be wrong. Anything is possible, so it could be SETI@home, but that is highly unlikely. BOINC may simply be using a portion of the drive where one or more bad sectors reside, thus causing the dropouts.


Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.