Posts by marmot
1) Message boards : Number crunching : Maximum CPU % for graphics - what does it do? (Message 5289)
Posted 22 days ago by marmot
Some projects, Rosetta for example, have an application that works like a screen saver; that switch lets a user limit the amount of CPU time wasted on such applications. The irony is that the graphics app often contains static parts, which burn susceptible screens instead...


I noticed the Rosetta screen saver was using quite a bit of CPU.
I wonder if it is enough to account for the RAC loss between two identically built computers.

Going to disable BOINC screen savers.

Thanks for the hint.
2) Message boards : Number crunching : Computation Errors (Message 5288)
Posted 22 days ago by marmot
Started processing POGS files on 27 Aug. It was going great until 16 Sept, when I started getting computation errors on, I believe, all of my machines. I have always run Malwarebytes, but it may have updated to a newer version sometime in the. Here is what a failed task says.

Name J112449.6+235642_area44746008_1
Workunit 44680302
Created 25 Sep 2017, 16:31:21 UTC
Sent 25 Sep 2017, 18:39:40 UTC
Report deadline 3 Oct 2017, 18:39:40 UTC
Received 25 Sep 2017, 20:15:30 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status -185 (0xffffffffffffff47) ERR_RESULT_START
Computer ID 816854
Run time
CPU time
Validate state Invalid
Credit 0.00
Device peak FLOPS 4.78 GFLOPS
Application version fitsedwrapper v4.02

Any ideas?

Mike.


All machines failing points to a systematic issue, such as:
1) a batch of bad data on the server side
2) a short power outage that reset all your machines and corrupted the WU's (assuming your machines can reboot and return to a working state without you noticing, and assuming they don't have battery backups)
3) a malware infection that spread across your local network to all your machines
4) a common OS update that all machines received
5) a security software update on all machines that now heuristically flags the WU's as malware.

My bets would be 1) or 2).
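
Side note on the exit status quoted above: the hex value in parentheses is just -185 written as a 64-bit two's-complement number, so both forms refer to the same code (shown there as ERR_RESULT_START). A minimal Python sketch to check the conversion; the helper names are my own and not part of BOINC:

```python
def to_signed(value: int, bits: int = 64) -> int:
    """Interpret an unsigned integer as a two's-complement signed value."""
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


def to_unsigned_hex(value: int, bits: int = 64) -> str:
    """Show a (possibly negative) integer as unsigned hex of the given width."""
    return hex(value & ((1 << bits) - 1))


# The exit status from the failed task, in both notations.
print(to_signed(0xFFFFFFFFFFFFFF47))  # -185
print(to_unsigned_hex(-185))          # 0xffffffffffffff47
```

So a log that shows the long hex form and one that shows -185 are reporting the same failure.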
3) Message boards : Number crunching : Tasks not showing up (Message 5277)
Posted 18 Sep 2017 by marmot
My account says I have 28 tasks in progress, but they do not appear in my BOINC Manager task list.

My other two projects are showing and running fine. Apparently, POGS is working on my computer; the tasks are just not making themselves known.

Any help?

Steve Gaber
Oldsmar, FL



Tasks that crash for some reason on the client side, and never get reported, will show as in progress on the server until they time out.
On rare occasions I've seen this happen when BOINC.exe crashes and needs to be restarted: either all the running WU's get corrupted or, very rarely, they run to completion but cannot report because BOINC.exe isn't in RAM to receive the results.

Check whether the fitsedwrapper work units exist in any of the BOINC data\slots folders.
Check your task manager to see if they are running in RAM.
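
If you'd rather not click through every slot folder by hand, the first check can be scripted. This is only a rough sketch: it assumes the slot files have "fitsedwrapper" somewhere in their names, and you have to supply your own BOINC data directory (the function name and the path in the usage comment are assumptions, not something taken from this thread):

```python
from pathlib import Path


def find_in_slots(data_dir, needle="fitsedwrapper"):
    """Return files under <data_dir>/slots whose name contains `needle`.

    Paths are returned relative to the slots folder, e.g. "3/some_file".
    """
    slots = Path(data_dir) / "slots"
    if not slots.is_dir():
        return []  # wrong data directory, or no slots folder at all
    hits = []
    for path in sorted(slots.rglob("*")):
        if path.is_file() and needle.lower() in path.name.lower():
            hits.append(path.relative_to(slots).as_posix())
    return hits


# Usage (the path is an assumption; point it at your own data directory):
# for hit in find_in_slots(r"C:\ProgramData\BOINC"):
#     print(hit)
```

An empty result with 28 tasks "in progress" on the server would suggest the tasks are gone from the client and will simply time out.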
4) Message boards : Number crunching : Computation Errors (Message 5276)
Posted 18 Sep 2017 by marmot
You probably don't want to invest any more time, but that issue seems like something Malwarebytes coders should address from a bug report.


They probably wouldn't have a clue as to what I was talking about...


The coders would.
The support staff that answer the phone, probably not.

Use their bug tracking system.
5) Message boards : Number crunching : Some WU's are not advancing past 6.666% after 18 to 24 hours. (Message 5273)
Posted 17 Sep 2017 by marmot
Swapped the sticks one by one from one bank to the other, reseating them firmly, and the power drawn by the first bank under full load dropped from 26 watts to a steady 22, the same as the other bank.

Hopefully the only problem was that one of the sticks was slightly dislodged in shipping.


You can add "RAM diagnostic utility" to the benefits of running POGS apps.

That machine fully completed hundreds of WU from about 8 other projects over the last month, although I did notice about a 10% performance decrease from the sister machine's WU RAC in some projects.

Surprisingly, this machine with RAM issues actually outperformed the other machine on TN-Grid, so a hardware issue wasn't on my radar.
6) Message boards : Number crunching : Computation Errors (Message 5272)
Posted 17 Sep 2017 by marmot
Try whitelisting the entire BOINC data directory, not just the slots.


Tried that first -- didn't help. POGS is no longer running on that machine anyway.


You probably don't want to invest any more time, but that issue seems like something Malwarebytes coders should address from a bug report.
7) Message boards : Number crunching : Computation Errors (Message 5270)
Posted 16 Sep 2017 by marmot
I decided that to solve the problem I would have to dump either Malwarebytes or POGS. I have chosen to dump POGS on this machine. It is still running on two other machines without problems (and those machines are also running Malwarebytes...).


Try whitelisting the entire BOINC data directory, not just the slots.
8) Message boards : Number crunching : Some WU's are not advancing past 6.666% after 18 to 24 hours. (Message 5269)
Posted 16 Sep 2017 by marmot
OK, the issue is hardware. Event viewer showed a corrected memory error every few seconds and the system process was using 9% of total cycles.

Tracking the temps and watts shows one of the RAM banks drawing more power than expected; setting the banks to low power mode stops the errors, and the WU's are moving again.

Will have to swap out pieces until I find the dying stick of RAM.

This system has no DVD drive, so booting a diagnostics DVD isn't happening.
I'll have to set up a tool kit on a bootable USB.

Oh yeah... more hours of work to add to the backlog!
9) Message boards : Number crunching : Some WU's are not advancing past 6.666% after 18 to 24 hours. (Message 5267)
Posted 15 Sep 2017 by marmot
Some of the WU's freeze at 13.333%, and a few at 20.000%, but they never progress any further, even after 24 hours.
I aborted one that had over 1m CPU seconds.

It's happening to about 30% of the batch of WU's downloaded around Sept 12th.

This machine has never run POGS before, but its identical sister machine completed a couple of days of WU's without issues (the WU's that show as aborted on that machine were aborted to clear the cache for priority work from another project). This machine has been working successfully on other projects for a month now.

Are the WU's actually progressing while the progress bar fails to show the advance, or are the WU's corrupted?
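
For what it's worth, the three stall points (6.666%, 13.333%, 20.000%) line up with 1/15, 2/15 and 3/15 of the bar, which would fit a progress bar that only moves when one of 15 equal steps finishes. That is purely my reading of the numbers; I don't know how the app actually reports progress. A quick Python check of the arithmetic (the function name and the 15-step guess are my own assumptions):

```python
from fractions import Fraction


def as_step(percent_text, steps=15):
    """Map a displayed percentage to the nearest k/steps and report the gap.

    `steps=15` is a guess inferred from the stall points, not a known
    property of the application.
    """
    frac = Fraction(percent_text) / 100
    nearest = round(frac * steps)
    error = abs(frac - Fraction(nearest, steps))
    return nearest, float(error)


for shown in ("6.666", "13.333", "20.000"):
    step, error = as_step(shown)
    print(f"{shown}% is about step {step} of 15 (off by {error:.6f})")
```

If that guess is right, a WU sitting at 6.666% has finished its first step and is stuck somewhere inside the second.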






Copyright © 2017 The International Centre for Radio Astronomy Research