Thread 'Temporarily failed upload of WU: HTTP error'

Message boards : BOINC client : Temporarily failed upload of WU: HTTP error

Mumps

Joined: 29 Aug 09
Posts: 4
United States
Message 26860 - Posted: 29 Aug 2009, 3:23:02 UTC

I'm working on SIMAP and I'm seeing a lot of errors with uploads. I've mentioned it in their forums, and the response is basically that things look fine on their end. (Which I don't doubt. :-)) I was wondering where I should look to track down why I'm having problems.

Basically, a small percentage of my uploads seem to fail part-way through. After that, every retry for that WU fails with an HTTP error, while other WUs completed on the same host upload without issue. So how would we diagnose this further to track down the root cause?

Platforms
    o Dotsch Linux (Ubuntu 8.10, fully patched) both 32 and 64 bits
    o Windows flavors including 2003 R2 Enterprise and Standard, Win XP Pro, Windows 7 x64 Edition


Clients

    o 5.10.45 (32 bit)
    o 6.6.20 (32 and 64 bit)
    o 6.6.37 (64 bit Linux only)


Sample log

boincsimap 8/27/2009 8:10:16 PM Started upload of 9082401.105499_0_0
boincsimap 8/27/2009 8:10:16 PM Started upload of 9082401.118984_0_0
--- 8/27/2009 8:10:20 PM Project communication failed: attempting access to reference site
boincsimap 8/27/2009 8:10:20 PM Temporarily failed upload of 9082401.105499_0_0: HTTP error
boincsimap 8/27/2009 8:10:20 PM Backing off 1 hr 51 min 19 sec on upload of 9082401.105499_0_0
boincsimap 8/27/2009 8:10:20 PM Temporarily failed upload of 9082401.118984_0_0: HTTP error
boincsimap 8/27/2009 8:10:20 PM Backing off 3 hr 15 min 58 sec on upload of 9082401.118984_0_0
boincsimap 8/27/2009 8:10:20 PM Started upload of 9082401.135420_0_0
boincsimap 8/27/2009 8:10:20 PM Started upload of 9082401.140142_0_0
--- 8/27/2009 8:10:22 PM Internet access OK - project servers may be temporarily down.
--- 8/27/2009 8:10:23 PM Project communication failed: attempting access to reference site
boincsimap 8/27/2009 8:10:23 PM Temporarily failed upload of 9082401.135420_0_0: HTTP error
boincsimap 8/27/2009 8:10:23 PM Backing off 3 hr 44 min 18 sec on upload of 9082401.135420_0_0
boincsimap 8/27/2009 8:10:23 PM Temporarily failed upload of 9082401.140142_0_0: HTTP error
boincsimap 8/27/2009 8:10:23 PM Backing off 3 hr 27 min 56 sec on upload of 9082401.140142_0_0
--- 8/27/2009 8:10:25 PM Internet access OK - project servers may be temporarily down.
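One way to separate a persistently stuck file from the one-off failures a busy server would cause is to tally upload starts and failures per file name from the event log. A minimal Python sketch, assuming the line format shown above (project, timestamp, then the message); the `stuck_files` threshold is a hypothetical heuristic, not something BOINC defines:

```python
import re
from collections import Counter

# Patterns for the event-log messages shown above, e.g.
# "boincsimap 8/27/2009 8:10:20 PM Temporarily failed upload of 9082401.105499_0_0: HTTP error"
FAIL_RE = re.compile(r"Temporarily failed upload of ([^\s:]+)")
START_RE = re.compile(r"Started upload of (\S+)$")


def tally_uploads(lines):
    """Count upload starts and temporary failures per file name."""
    started, failed = Counter(), Counter()
    for line in lines:
        line = line.rstrip()
        m = FAIL_RE.search(line)
        if m:
            failed[m.group(1)] += 1
            continue
        m = START_RE.search(line)
        if m:
            started[m.group(1)] += 1
    return started, failed


def stuck_files(lines, min_failures=2):
    """Hypothetical heuristic: files that failed at least min_failures times
    and whose every logged start ended in a failure."""
    started, failed = tally_uploads(lines)
    return sorted(name for name, n in failed.items()
                  if n >= min_failures and n >= started[name])
```

Feeding it the lines of the client's log file (stdoutdae.txt in the BOINC data directory) would list the files that keep failing, so they can be compared against the uploads that go through.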

And with file_xfer_debug:
boincsimap 8/27/2009 9:12:37 PM Started upload of 9082401.105499_0_0
boincsimap 8/27/2009 9:12:37 PM [file_xfer_debug] URL: http://boinc.bio.wzw.tum.de/boincsimap_ ... ad_handler
boincsimap 8/27/2009 9:12:40 PM [file_xfer_debug] FILE_XFER_SET::poll(): http op done; retval 0
boincsimap 8/27/2009 9:12:40 PM [file_xfer_debug] parsing upload response: <data_server_reply>
<status>0</status>
<file_size>382932</file_size>
</data_server_reply>
boincsimap 8/27/2009 9:12:40 PM [file_xfer_debug] parsing status: 0
--- 8/27/2009 9:12:42 PM Project communication failed: attempting access to reference site
boincsimap 8/27/2009 9:12:42 PM [file_xfer_debug] FILE_XFER_SET::poll(): http op done; retval -184
boincsimap 8/27/2009 9:12:42 PM [file_xfer_debug] file transfer status -184
boincsimap 8/27/2009 9:12:42 PM Temporarily failed upload of 9082401.105499_0_0: HTTP error
boincsimap 8/27/2009 9:12:42 PM Backing off 8 min 36 sec on upload of 9082401.105499_0_0
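The file_xfer_debug trace shows two HTTP operations: the first (retval 0) returned a <data_server_reply> with the number of bytes the server already holds, and the second, which should carry the remaining data, ended with retval -184. A sketch of that first step, parsing the reply and working out where a resumed upload would restart; `local_size` and the `resume_plan` helper are hypothetical, since the log doesn't show the local file's size:

```python
import xml.etree.ElementTree as ET


def parse_size_reply(reply_xml: str) -> int:
    """Return the byte count the upload server reports holding.

    Raises ValueError if <status> is non-zero or <file_size> is missing.
    """
    root = ET.fromstring(reply_xml)
    status = int(root.findtext("status", default="-1"))
    if status != 0:
        raise ValueError(f"server status {status}")
    size_text = root.findtext("file_size")
    if size_text is None:
        raise ValueError("no <file_size> in reply")
    return int(size_text)


def resume_plan(local_size: int, server_size: int) -> dict:
    """Hypothetical helper: offset and byte count a resumed upload would send."""
    if server_size > local_size:
        # The server claims more bytes than the client has: the two sides
        # disagree about the file, which is a case worth checking for.
        return {"offset": None, "remaining": None, "consistent": False}
    return {"offset": server_size,
            "remaining": local_size - server_size,
            "consistent": True}
```

Comparing the reported size against the actual result file on disk would show whether client and server at least agree on the restart point.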

    o Stopping and restarting BOINC hasn't helped.
    o Rebooting the systems hasn't helped.
    o I even tried to force the transfer to start from scratch by removing the <persistent_file_transfer> block from the client_state.xml file for some of the broken transfers while I had BOINC stopped. But it seems to immediately go back to where the transfer got stuck and still generates the same result. (Is there a "correct" way to tell a server to discard what's already been received and start over?)


All I've got as an option is to cancel the transfer and give up on the WU's. And seeing as I successfully calculated the WU, I hate having to do that. ;-)

Any suggestions on next steps? My nagging suspicion is that the Client and Server are not agreeing about something with the interrupted transfer. But how can we figure out why? (Maybe even permissions on their server preventing appending to the partial file?)

Thanks for any suggestions.

Les Bayliss
Help desk expert

Joined: 25 Nov 05
Posts: 1654
Australia
Message 26861 - Posted: 29 Aug 2009, 3:47:56 UTC - in response to Message 26860.  

That error usually means that the project's upload server is very busy at that moment.
It's usually transitory, so if you leave it alone, BOINC will keep on re-trying, up to a limit of 14 days.
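The growing, irregular "Backing off" intervals in the log fit a randomized exponential backoff. A generic sketch of that pattern with a give-up deadline; the 60-second floor and 4-hour cap are assumed values, not BOINC's actual constants, and `give_up_after` stands in for the 14-day limit:

```python
import random

DAY = 24 * 3600  # seconds


def backoff_delay(n_failures, min_delay, max_delay, rng):
    """Randomized exponential backoff: min_delay * 2**n, capped at max_delay,
    then scaled by a random factor in [0.5, 1.0] so clients don't retry in lockstep."""
    exponent = min(n_failures, 30)  # keep 2**n from growing without bound
    base = min(max_delay, min_delay * 2 ** exponent)
    return base * rng.uniform(0.5, 1.0)


def retry_schedule(min_delay=60.0, max_delay=4 * 3600.0,
                   give_up_after=14 * DAY, seed=0):
    """Retry times (seconds after the first failure) until the give-up deadline."""
    rng = random.Random(seed)
    t, n, times = 0.0, 0, []
    while True:
        t += backoff_delay(n, min_delay, max_delay, rng)
        if t > give_up_after:
            return times
        times.append(t)
        n += 1


if __name__ == "__main__":
    for days in (14, 90):
        print(days, "days:", len(retry_schedule(give_up_after=days * DAY)), "retries")
```

Once the delay reaches the cap, raising the deadline from 14 to 90 days just buys more retries at roughly the same spacing.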

ZPM

Joined: 14 Mar 09
Posts: 215
United States
Message 26862 - Posted: 29 Aug 2009, 3:57:18 UTC - in response to Message 26861.  

This limit can be changed, though.
Aurora Borealis

Joined: 8 Jan 06
Posts: 448
Canada
Message 26864 - Posted: 29 Aug 2009, 7:30:51 UTC - in response to Message 26861.  
Last modified: 29 Aug 2009, 7:32:39 UTC

V6.10.x should raise the limit to 90 days, since two weeks can be tight for projects that suffer a major server failure.
Gundolf Jahn

Joined: 20 Dec 07
Posts: 1069
Germany
Message 26865 - Posted: 29 Aug 2009, 8:32:17 UTC - in response to Message 26860.  

You could try switching from HTTP protocol version 1.1 to version 1.0. There are also two more log flags, <http_debug> and <http_xfer_debug> (not that I could help analysing the output :-).

To switch to HTTP 1.0, add an <options> section to your cc_config.xml (if it's not already there):
<cc_config>
<log_flags>
...
</log_flags>
<options>
<http_1_0>1</http_1_0>
</options>
</cc_config>
There are no lower case 'L' characters in that code, only digits '1'. More information about cc_config can be found on the BOINC wiki.
In BOINC Manager (Advanced view) select menu 'Advanced->Read config file' to activate the new configuration.
Hope that helps.
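If you'd rather not edit the file by hand, a small Python sketch can merge these settings into an existing cc_config.xml, creating it if needed. It assumes you pass the file's path (normally in the BOINC data directory) and that <file_xfer_debug>, <http_debug> and <http_xfer_debug> belong under <log_flags>; note that ElementTree drops any XML comments when it rewrites the file:

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Log flags and options discussed in this thread.
DEFAULT_LOG_FLAGS = ("file_xfer_debug", "http_debug", "http_xfer_debug")
DEFAULT_OPTIONS = {"http_1_0": "1"}


def _set_child(parent, tag, text):
    """Set parent/<tag> to text, creating the element only if it is missing."""
    child = parent.find(tag)
    if child is None:
        child = ET.SubElement(parent, tag)
    child.text = text


def update_cc_config(path, log_flags=DEFAULT_LOG_FLAGS, options=DEFAULT_OPTIONS):
    """Merge log flags and options into cc_config.xml, keeping existing entries."""
    if os.path.exists(path):
        root = ET.parse(path).getroot()
    else:
        root = ET.Element("cc_config")
    for section in ("log_flags", "options"):
        if root.find(section) is None:
            ET.SubElement(root, section)
    for flag in log_flags:
        _set_child(root.find("log_flags"), flag, "1")
    for name, value in options.items():
        _set_child(root.find("options"), name, value)
    ET.ElementTree(root).write(path)


if __name__ == "__main__":
    # Demo against a throwaway directory rather than a real BOINC data directory.
    demo = os.path.join(tempfile.mkdtemp(), "cc_config.xml")
    update_cc_config(demo)
    update_cc_config(demo, options={"max_file_xfers": "1"})
    with open(demo) as f:
        print(f.read())
```

Running it twice is harmless: existing elements are updated in place rather than duplicated. The new settings still need 'Advanced->Read config file' (or a client restart) to take effect.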

Regards,
Gundolf
Computers aren't everything in life. (Just kidding)
Mumps

Joined: 29 Aug 09
Posts: 4
United States
Message 26872 - Posted: 29 Aug 2009, 14:41:02 UTC - in response to Message 26861.  

Thanks for the input Les.

This has been going on for a few days now and I've been retrying transfers sporadically the entire time. Even while the host successfully uploads each new result as it finishes, these "stuck" uploads consistently fail. At the moment, one of my hosts has 9 "stuck" uploads, so I upgraded it to the 64-bit Linux version of 6.6.37 and will do some more testing. Right now, the most telling part of the log shows that the host negotiates and begins the transfer, then almost immediately logs:

[http_debug][ID#x]info: select/poll returned error
[http_debug][ID#x]info: Expire cleared
[http_debug][ID#x]info: Closing connection #y

Because SIMAP is now out of work, the strain their server normally sees during its infrequent work releases should be relieved, yet these transfers are still "stuck." Also, as Jonathan mentioned in their forums, the fact that my host gets a data_server_reply when it asks the server for the file size shows that every one of these retries does get a connection and checks the restart point. It then always fails when trying to send the actual data.
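To see which step each failing connection reached, the http_debug output can be grouped by its ID# tag and cut off at the first "select/poll returned error". A sketch; since x and y in the excerpt above are placeholders, the exact spacing and ID format of real lines is an assumption (the regex tolerates optional whitespace between the bracketed tags):

```python
import re
from collections import defaultdict

# "[http_debug][ID#x]info: ..." with optional whitespace between the parts.
HTTP_DEBUG_RE = re.compile(r"\[http_debug\]\s*\[ID#(\w+)\]\s*(.*)$")


def group_by_connection(lines):
    """Map each ID# tag to its http_debug messages, in log order."""
    events = defaultdict(list)
    for line in lines:
        m = HTTP_DEBUG_RE.search(line.rstrip())
        if m:
            events[m.group(1)].append(m.group(2))
    return dict(events)


def failed_connections(lines, marker="select/poll returned error"):
    """For IDs whose messages contain the marker, return the messages before it."""
    result = {}
    for conn_id, msgs in group_by_connection(lines).items():
        for i, msg in enumerate(msgs):
            if marker in msg:
                result[conn_id] = msgs[:i]
                break
    return result
```

The messages just before the error show how far the exchange got (for example, whether any request body had been sent) before the socket failed.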
Mumps

Send message
Joined: 29 Aug 09
Posts: 4
United States
Message 26873 - Posted: 29 Aug 2009, 14:48:43 UTC - in response to Message 26865.  

Thanks Gundolf. I'd been to the link with the options for the cc_config. That's how I knew to add the <file_xfer_debug> part. :-) Just wasn't familiar enough with the process to realize the http_debug flags would be helpful as well. The wiki tells you what each flag does, but not when it's appropriate (or helpful) to enable each one. "I'm having troubles with file transfers, maybe this flag will help..." doesn't really make the "http_debug" flags stand out. ;-)

So, it looks like it gets to the point of attempting to send the data, but then has issues with the opened connection. Time to research a bit deeper on socket communications in this Ubuntu 8.10 release... Most of my hosts are this specific install, so focusing there may be more fruitful than the smaller handful of Windows systems with the same issue. I wonder if there are any tunables that may help on the client side...

Thanks again for the leads folks!
Gundolf Jahn

Joined: 20 Dec 07
Posts: 1069
Germany
Message 26880 - Posted: 29 Aug 2009, 16:26:21 UTC - in response to Message 26873.  

Quoting Mumps: "Thanks Gundolf. I'd been to the link with the options for the cc_config. That's how I knew to add the <file_xfer_debug> part. :-)"

Okay, that link was copied along from an older post :-) since I often give the advice about <http_1_0>. By the way, you didn't mention if you tried it. It has helped me with HTTP errors while uploading. Downloading has never been an issue (on my side :-).

Regards,
Gundolf
Mumps

Send message
Joined: 29 Aug 09
Posts: 4
United States
Message 26889 - Posted: 29 Aug 2009, 21:24:18 UTC - in response to Message 26880.  

Oops. I originally had put it in my response to Les, but it seems I edited it out before posting. :-) No, it didn't resolve the problem.

I've added all those options (http_1_0, http_debug, http_xfer_debug), changed max_file_xfers to 1, and grabbed some more log output. The results are as follows:
    o Successfully connects to the server and receives the info about how much of the file the server already has.
    o Successfully opens the connection and begins sending data
    o Then gets the errors listed in my response to Les


I'm continuing to experiment a little more and should be posting some additional logs soon. But SIMAP has released more work, so their server is busy again which brings back the point Les made about the server being busy as a possible cause. So I'm hoping to see it die down again to have a higher confidence level that this really isn't the "transient" error it's flagged as. Even with their server busy, all new WU's that are completing are uploading with no errors.

The particular host I'm working with hasn't had a new upload failure since yesterday, but the 9 problem WU's it has are still hanging around. One other thing I did note is that all the trouble WU's are from a specific release of work of theirs. So it may have to do with the size of the results files as Jonathan mentioned as well. Normal results for SIMAP are around 500K. For their Superfamily tasks some of the results got up over 3,000K. But still, I would expect BOINC to successfully recover from interrupted transfers. So this is just puzzling. What can cause a failure to persist? Why can't the retry seem to get any additional data uploaded?
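To check the size theory against what's actually queued, one could list the files in client_state.xml that still carry a <persistent_file_transfer> block, largest first. A sketch under stated assumptions: the element names <file_info>, <name>, <nbytes> and the nested <persistent_file_transfer>/<num_retries> are assumed, so verify them against your own file, and read it from a copy or while BOINC is stopped:

```python
import xml.etree.ElementTree as ET


def pending_transfers(client_state_xml: str):
    """Return (name, nbytes, num_retries) for every <file_info> that has a
    <persistent_file_transfer> child, sorted largest file first.

    The element names used here are assumptions about client_state.xml.
    """
    root = ET.fromstring(client_state_xml)
    rows = []
    for fi in root.iter("file_info"):
        pft = fi.find("persistent_file_transfer")
        if pft is None:
            continue  # no transfer pending for this file
        name = fi.findtext("name", default="?")
        nbytes = float(fi.findtext("nbytes", default="0"))
        retries = int(pft.findtext("num_retries", default="0"))
        rows.append((name, nbytes, retries))
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows
```

If the stuck uploads cluster at the top of that list (the 3,000K-plus Superfamily results) while the ~500K results drain normally, that would support the size explanation.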


Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.