Help: NaNs detected on GPU

Alwaysrun

Well-known member, joined Sep 3, 2008, 784 messages, Qualicum Beach BC
I've been getting this error a few times since I installed the SMP console client, and it's a pain because the client then pauses for 24 hours saying my EUE limit has been reached. Any ideas how to fix this? If I shut down the client and restart it, sometimes it starts OK and sometimes it gives me the same error. I put my 260 back on stock settings, but I still get these random UNSTABLE_MACHINE errors. :help:

[01:53:03] Entering M.D.
[01:53:09] Working on Protein
[01:53:10] Client config found, loading data.
[01:53:10] Starting GUI Server
[01:53:10] mdrun_gpu returned
[01:53:10] NANs detected on GPU
[01:53:10]
[01:53:10] Folding@home Core Shutdown: UNSTABLE_MACHINE
[01:53:13] CoreStatus = 7A (122)
[01:53:13] Sending work to server
[01:53:13] Project: 5768 (Run 9, Clone 54, Gen 145)
[01:53:13] - Error: Could not get length of results file work/wuresults_04.dat
[01:53:13] - Error: Could not read unit 04 file. Removing from queue.
[01:53:13] - Preparing to get new work unit...
 

LCB001

Well-known member, joined Feb 19, 2008, 1,732 messages, Aylmer QC
Try deleting the work folder, queue.dat, unitinfo and FahCore_11 for the GPU client. Reboot and restart the client; it should download replacements. Sometimes it gets stuck in a loop and keeps retrying the same WU. If that doesn't work, you might have to reinstall...
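If you end up doing this cleanup over and over, a small script saves some clicking. This is only a rough sketch of the steps above: `reset_client_dir` is a hypothetical helper, and the exact file names (`unitinfo.txt`, `FahCore_11.exe`) are assumptions for a Windows GPU client install, so check what your own client folder actually contains. It defaults to a dry run that only lists what it would delete.

```python
import shutil
from pathlib import Path

# ASSUMPTION: names/extensions below are guesses for a Windows GPU client;
# edit this list to match the files in your own client folder.
STALE_ITEMS = ["work", "queue.dat", "unitinfo.txt", "FahCore_11.exe"]


def reset_client_dir(client_dir, items=STALE_ITEMS, dry_run=True):
    """Remove stale work-unit state from a STOPPED client directory.

    Returns the paths that were removed (or, in dry-run mode, would be).
    """
    base = Path(client_dir)
    removed = []
    for name in items:
        target = base / name
        if not target.exists():
            continue  # nothing to clean for this item
        removed.append(target)
        if dry_run:
            continue  # only report what would be deleted
        if target.is_dir():
            shutil.rmtree(target)  # the work folder holds the cached WU files
        else:
            target.unlink()
    return removed
```

Stop the client first, run it once with `dry_run=True` to see the list, then again with `dry_run=False`, and restart the client so it fetches fresh files.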
 

Alwaysrun

ATM, charlie, I have both my CPU and GPU running at stock, and load temps are excellent for both: CPU 44C, GPU 68C.

I'm using the 181.20 drivers from December, but I see 181.22 came out on January 22nd. Maybe I should update? Still, it's strange that this only started happening once I started using the SMP client.

LCB, I'll do as you suggested and see if a sticky wicket is gumming up the works here. BTW, I haven't gotten around to lowering the CPU usage in the SMP client yet. I don't know if that may be causing some issues, but I did unlock the cores as you suggested earlier, and I noticed the GPU client works a bit faster now that it can use those spare clock cycles (it seems it was losing the fight for CPU time against the SMP client hogging it before).

Thanks gents.
 

chrisk

Folding Captain, joined Jul 12, 2008, 7,540 messages, GTA, Ontario
I'm wondering if it's a machine ID issue... Make sure that in the advanced tabs for the GPU and CPU clients you have different machine IDs selected (e.g. GPU ID set to 1, CPU set to 3), or they can conflict with each other. Check that first; if the numbers were the same, change them, and then delete the files as LCB001 described.
 

LCB001

Did you raise the priority of the GPU client? That will eliminate the fight for CPU cycles...
 

Alwaysrun

chrisk wrote: "I am wondering if it's a machine ID issue..."
SMP is ID 1 and GPU is ID 2.

LCB001 wrote: "Did you up the priority of the GPU client, that will eliminate the fight for CPU cycles..."
Yes LCB, the core priority is set to "slightly higher" in the GPU client. Or are you talking about the slider? It's at 100%.

I'm just going to uninstall and reinstall the GPU client, I guess. In the morning I'll go through all the SMP advanced options and actually set the core usage to 98% like you suggested, LCB.

Hope it helps; having to babysit this is a pain.
 

LCB001

Sometimes it takes deleting those files several times to get rid of a bad WU, if that's what's causing this. A fresh reinstall will usually fix it, though. You might want to check whether it's trying to redo the same WU each time; if it is, it will usually clear up after a few delete cycles.

Folding wasn't always as stable as it is now. When Stanford starts fiddling with core revisions and client changes, it can get really annoying and cause problems for days until you figure out how to stabilize your system... that's part of the FUN, just ask 3.0charlie, sswilson and some of the other old-timers... :haha:
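One way to check for the same WU coming back is to scan the client log for the `Project: ... (Run ..., Clone ..., Gen ...)` lines, like the one in the first post, and count repeats. This is a sketch that assumes every log line follows that format; `repeated_work_units` is a hypothetical helper, and the log file name and location on your install are not covered here.

```python
import re
from collections import Counter

# Matches lines like "[01:53:13] Project: 5768 (Run 9, Clone 54, Gen 145)".
WU_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")


def repeated_work_units(log_lines, min_count=2):
    """Return {(project, run, clone, gen): count} for WUs logged at least min_count times."""
    counts = Counter()
    for line in log_lines:
        m = WU_RE.search(line)
        if m:
            counts[tuple(int(g) for g in m.groups())] += 1
    return {wu: n for wu, n in counts.items() if n >= min_count}
```

Depending on the client version, a single attempt may log the same Project line more than once, so compare against a normal, successful cycle before picking `min_count`.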
 

Alwaysrun

Fun... heh. Well, I uninstalled the GPU client and reinstalled it last night before I went to bed. It did two WUs, then it took a crap again and sat idle all through the night. Sucks losing 6 hours to this NaN problem. I've shut off the SMP console client and will try folding today with just the GPU to see if I can isolate the problem. I'll make special note of which project is causing this, if there is just one.
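To note which project is failing without watching the client, one option is to tally UNSTABLE_MACHINE shutdowns per project from the log. This sketch relies on an assumption drawn only from the excerpt in the first post: the `Project:` line printed while sending results comes after the `Core Shutdown: UNSTABLE_MACHINE` line. `unstable_failures_by_project` is a hypothetical helper, not part of the client.

```python
import re
from collections import Counter

PROJECT_RE = re.compile(r"Project: (\d+)")


def unstable_failures_by_project(log_lines):
    """Tally UNSTABLE_MACHINE shutdowns per project number.

    ASSUMPTION: the "Project:" line for a failed WU follows its
    "Core Shutdown: UNSTABLE_MACHINE" line, as in the excerpt above.
    """
    tally = Counter()
    pending = False  # True after a failure until its Project line is seen
    for line in log_lines:
        if "UNSTABLE_MACHINE" in line:
            pending = True
        elif pending:
            m = PROJECT_RE.search(line)
            if m:
                tally[int(m.group(1))] += 1
                pending = False
    return tally
```

If most of the failures land on a handful of 57xx project numbers, that points at the work units rather than the card or drivers.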

*Edit: Well, after a lengthy read over at the Stanford folding forums, it appears that many people are getting this error, and it's been happening for the last 3 weeks. It seems to be a few specific projects in the 57xx range. Nvidia cards are getting these, and driver version and OS don't seem to be the problem, since users with different setups are seeing the same error. People have scrubbed their work folder and other files, and neither a complete registry cleaning nor reinstalling the client fixes it. My initial thought that my new SMP install was the culprit has been disproved, as many people without SMP are getting this error as well.

I suspect certain project servers are repeatedly issuing bad WUs. Vijay made an announcement about this issue, and they are hard at work to resolve it. They have a thread on the official forums for reporting these errors, and I found people with my exact setup getting this error, so I didn't bother adding mine to the list.
Folding Forum • View topic - 57xx - NV GPUs failing all the time @ all projects

I just don't know what to do. Babysitting the client after every completion is impossible, so I guess I'll just have to bear with it until the Pande group figures this out.
 