Very short runs

Les Bayliss
Joined: 9 Aug 12
Posts: 30
Credit: 834,550
RAC: 0
Message 901 - Posted: 5 Dec 2012, 21:02:24 UTC

In the last 12 hours or so, both of my computers have had about 8 units that ended up with very short run times, and very small upload files.
Some, at last, have had confirmation from a second run, and have started on a 3rd.

The tasks list status is Completed, validation inconclusive.

Bad batch, or an area outside of a galaxy?

Kevin
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 27 Jul 12
Posts: 507
Credit: 14,550,253
RAC: 2,913
Message 902 - Posted: 5 Dec 2012, 21:46:41 UTC - in response to Message 901.

Edge of a galaxy
____________
Regards
Kevin
-----
International Centre for Radio Astronomy Research

Les Bayliss
Message 904 - Posted: 5 Dec 2012, 23:08:09 UTC

Thanks.

Senilix
Joined: 9 Aug 12
Posts: 2
Credit: 265,163
RAC: 0
Message 905 - Posted: 6 Dec 2012, 1:14:57 UTC - in response to Message 902.
Last modified: 6 Dec 2012, 1:20:07 UTC

Quote (Message 902): "Edge of a galaxy"

Hmmm, I'm not convinced. I too encountered 2 of these short runners. For example, this one consisted of 16 pixels all crunched within seconds by my computer. The weird part is that some of my wingmen are reporting much longer runtimes per pixel (according to their stderr output file).

As Les and I are both running Windows XP x86, this might hint at an issue specific to that app ...

Kevin
Message 906 - Posted: 6 Dec 2012, 1:21:19 UTC - in response to Message 905.

I'll investigate further

Kevin
Message 907 - Posted: 6 Dec 2012, 3:19:19 UTC - in response to Message 906.

Found one in NGC3049f - Area469078

Now checking it

Kevin
Message 908 - Posted: 6 Dec 2012, 7:57:41 UTC

Looks like some bad data in NGC3049f

w1hue
Joined: 16 Aug 12
Posts: 18
Credit: 2,323,024
RAC: 5,023
Message 910 - Posted: 7 Dec 2012, 22:23:55 UTC - in response to Message 908.

One of my two machines just downloaded a whole bunch of very short runs. Should I abort the remaining ones or just let them run and see what happens?

w1hue
Message 914 - Posted: 8 Dec 2012, 23:34:26 UTC - in response to Message 910.

Apparently no one reads the boards on weekends... Four of the short WUs that downloaded yesterday have completed and validated successfully so I guess they are OK.

Les Bayliss
Message 915 - Posted: 9 Dec 2012, 21:41:13 UTC

I had some more short runs too.
2 have validated, giving 159.30 credits for 0.00 seconds run time.
On the other hand, I've got some with 150.45 credits for 10,655.19 seconds run time, 141.60 credits for 10,107.25 seconds, and 185.85 credits for 13,212.64 seconds.

It's nice getting "paid" for not doing anything (dole ? :) ) but some sanity checking seems to be in order here.

Senilix
Message 917 - Posted: 10 Dec 2012, 0:20:29 UTC - in response to Message 915.

I checked some of your results and they looked okay to me.

The reported CPU time of 0.00 sec for your WUs is way off, of course. The app uses a wrapper program that starts a new process each time a new pixel has to be processed, and the communication of that process's runtime to the BOINC client seems to be broken under some circumstances. But your stderr output files show that the application used the usual amount of CPU time for each pixel.

The credit given for each WU depends solely on the number of pixels the WU is built from; it does not depend on the WU's runtime. The WUs consist of different numbers of pixels (varying from 1 to ~19), so the amount of credit differs between WUs with different pixel counts.
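
As a rough check of that pixel-based rule, the four credit figures quoted in Message 915 can be tested against a single per-pixel value. The 8.85 credits per pixel used here is only an assumption inferred from those four figures; nobody in this thread states the actual value. A minimal sketch:

```python
from decimal import Decimal

# ASSUMPTION: 8.85 credits per pixel is inferred from the figures below.
# It is not a value stated by the project anywhere in this thread.
CREDIT_PER_PIXEL = Decimal("8.85")

# Credit figures quoted in Message 915 (the first one had 0.00 s run time).
quoted_credits = ["159.30", "150.45", "141.60", "185.85"]


def implied_pixels(credit: str) -> Decimal:
    """Return credit divided by the assumed per-pixel credit, exactly."""
    return Decimal(credit) / CREDIT_PER_PIXEL


for credit in quoted_credits:
    pixels = implied_pixels(credit)
    is_whole = pixels == pixels.to_integral_value()
    print(f"{credit} credits -> {pixels} pixels (whole number: {is_whole})")
```

Under that assumption each figure comes out as a whole pixel count (18, 17, 16 and 21), whatever run time was reported. That fits credit per pixel rather than credit per second, although 21 is a little above the ~19 mentioned above.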

Kevin
Message 919 - Posted: 10 Dec 2012, 2:24:13 UTC

It's a chicken and egg problem.

Until we do the SED calculations on them we don't know if they are good or bad pixels. Occasionally some will slip through that we can't process, as the SED will not converge. So if you get one, it's quick and you get the credit for the pixel.

Les Bayliss
Message 921 - Posted: 10 Dec 2012, 3:39:17 UTC

No worries then. :)

[PST]Howard
Joined: 15 Aug 12
Posts: 2
Credit: 305,228
RAC: 0
Message 942 - Posted: 15 Dec 2012, 16:21:52 UTC

All well and good, but so far I have had 7 WUs marked as invalid in a quorum with these short-running WUs, so I have suspended the project until this is sorted out.

Daniel Carrion
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Joined: 24 Nov 12
Posts: 159
Credit: 32,849,471
RAC: 21,930
Message 944 - Posted: 15 Dec 2012, 18:27:42 UTC - in response to Message 942.

Are the WUs you're getting invalid results for related to galaxy NGC4647f areas only or other ones as well?

kashi
Joined: 10 Aug 12
Posts: 44
Credit: 19,419,745
RAC: 6
Message 945 - Posted: 15 Dec 2012, 18:49:20 UTC
Last modified: 15 Dec 2012, 18:52:44 UTC

I haven't had any invalids recently, but I currently have 12 tasks with status "Completed, validation inconclusive". All 12 are NGC4647f.

Wingmen all have fast completion times; 6X Linux 3.2.0-4-686-pae, 4X Windows XP x86, 1X Windows Vista x86, 1X Windows 7 x86. So all 32-bit?

Les Bayliss
Message 946 - Posted: 15 Dec 2012, 20:16:20 UTC

I've been getting a variety lately for that galaxy.

At least 1 for NGC4647f_area584383_1:
Server state: Over
Outcome: Success
Client state: Done
but
Validate state: Invalid

Several (e.g. here) with:
Validate state: Valid

Several (e.g. here) with:
Validate state: Checked, but no consensus yet


These are on 2 Q6600 computers running Windows XP Pro, 32 bits. One with BOINC 6.2.18, the other with BOINC 6.10.18

I've been getting a lot of the short runs too, including just now when I was checking the BOINC version number. (Not sure which galaxy for these.)

[PST]Howard
Message 947 - Posted: 15 Dec 2012, 20:46:18 UTC - in response to Message 945.

They all are

..:: Thor ::..
Joined: 23 Nov 12
Posts: 1
Credit: 13,794,493
RAC: 0
Message 948 - Posted: 15 Dec 2012, 21:36:06 UTC

How is it possible to have such a low computation time?

Daniel Carrion
Message 949 - Posted: 16 Dec 2012, 0:35:47 UTC - in response to Message 948.
Last modified: 16 Dec 2012, 1:01:01 UTC

Good pickup re 32 bit.

I managed to catch an NGC4647f task running on a 64-bit host of mine (Linux). I copied it over to a 32-bit Windows machine I have, and here's the result:

AREA: NGC4647f_area591632

Windows 32-bit:

C:\Temp\slot>fit_sed_windows_intelx86.exe_310 1 filters.dat observations.dat
1 19
n_flux = 4
No model library at this galaxy redshift...


Linux 64-bit (the actual run):

1 19
n_flux = 4
z= 5.000000000000000E-003
optilib=starformhist_cb07_z0.0000.lbr
irlib=infrared_dce08_z0.0000.lbr
At this redshift:
purely stellar... SDSSu
purely stellar... SDSSg
purely stellar... SDSSr
purely stellar... SDSSi
purely stellar... SDSSz
Reading SFH library...
Reading IR dust emission library...
Starting fit.......
25% done... 12476 / 49904 opt. models
50% done... 24952 / 49904 opt. models
75% done... 37428 / 49904 opt. models
100% done... 49904 opt. models - fit finished
Number of random SFH models: 49904
Number of IR dust emission models: 50000
Value of df: 0.150000005960464
Total number of models: 668790423
ptot= 78640059.4422103
chi2_optical= 2.00846699361787
chi2_infrared= 0.000000000000000E+000


So I thought I'd try both the 32-bit and 64-bit versions of the client on my 64-bit Linux host:

32-bit result:

daniel@snm-boi01:/tmp/4# ./fit_sed_i686-pc-linux-gnu 1 filters.dat observations.dat
1 19
n_flux = 4
No model library at this galaxy redshift...


64-bit result:

daniel@snm-boi01:/tmp/4# ./fit_sed_x86_64-pc-linux-gnu 1 filters.dat observations.dat
1 19
n_flux = 4
z= 5.00000000000000010E-003
optilib=starformhist_cb07_z0.0000.lbr
irlib=infrared_dce08_z0.0000.lbr
At this redshift:
purely stellar... SDSSu
purely stellar... SDSSg
purely stellar... SDSSr
purely stellar... SDSSi
purely stellar... SDSSz
Reading SFH library...
Reading IR dust emission library...
Starting fit.......


Can anyone else replicate something like this?

So possible issue with 32 bit client reading from those library (.lbr) files? Maybe those library files were generated on 64 bit and something can't be read re these area pixels. Wild assumptions here at the moment due to lack of knowledge regarding this so nothing I'm saying here should be taken as 'fact'.

I wouldn't stop 32 bit clients because of this. I have been watching my 32 bit windows client and it has been processing other areas just fine.


Copyright © 2017 The International Centre for Radio Astronomy Research