
If you like a puzzle: lockups with a new GPU

  • 30-01-2023 7:52pm
    #1
    Moderators, Computer Games Moderators, Technology & Internet Moderators, Help & Feedback Category Moderators Posts: 25,757 CMod ✭✭✭✭


    I've upgraded to an RTX 4090, and I'm experiencing random system freezes ever since.

    It's not entirely random however. It only seems to happen when I have both the CPU and GPU under full load, and then do some other action, e.g. open a new browser tab.

    When it freezes, it's a complete freeze, with the last image still displayed. If I press the reset button, it does reset, albeit after a delay of about 5 seconds, which is an odd detail.

    It's not temps, the GPU only hits 40C under full load, as it has water blocks on both sides (VRAM hits 52C). It's not the main RAM, that passes Memtest86+ fine. I've already tried the usual things, reset BIOS to defaults, force PCIe slot to 3.0 instead of auto, run sfc scan on Windows. Nothing is captured in the logs.

    My suspicion is that it's the transient power spikes that the PSU can't handle. If I leave it running on full load on both GPU and CPU it's completely fine. The lockups only happen during active use which would prompt a spike. But the PSU is a Corsair AX1200i with three separate 8 pin PCIe cables to it. I would have thought it capable of absorbing the spikes. It's a half decade old but has never shown an issue before, including running an SLI'd pair of GTX 1080s, at 180W each.

    Am I wrong in my thinking? Or does anyone have suggestions on anything to try?



Comments

  • Registered Users, Registered Users 2 Posts: 7,476 ✭✭✭The Continental Op


    PSU is the first thing I'd think of. It might be the opposite of your logic? New app requiring a boost of power the PSU can't cope with?

    Wake me up when it's all over.



  • Moderators, Computer Games Moderators, Technology & Internet Moderators, Help & Feedback Category Moderators Posts: 25,757 CMod ✭✭✭✭Spear


    That is my thinking though, opening a new application/browser tab that triggers GPU acceleration causes a transient power spike that the PSU can't accommodate.



  • Moderators, Computer Games Moderators, Technology & Internet Moderators, Help & Feedback Category Moderators Posts: 25,757 CMod ✭✭✭✭Spear


    I've found something that's worth testing. The PSU has an option to switch between single rail and multi rail modes. The latter potentially limits the power available, and it was set to multirail. I've since switched it to single rail, so time to wait and see if it freezes again.



  • Registered Users, Registered Users 2 Posts: 113 ✭✭cornholio509


    I think going back to first principles would be the way to go. Is it drivers/software, since you can benchmark it and it works fine? I would download DDU, go into safe mode in Windows and completely wipe any GPU driver from the system, then reboot and test it out again. It is entirely possible that the GTX 1080 drivers have left something behind that is causing a lockup issue.

    Windows itself could also be an issue. I have noticed the odd system lockup on my PC. My issue has been different though, as the PC itself isn't under any heavy load. Usually it would happen when switching between tabs in the browser while installing and downloading games. It's entirely possible I had more background processes going on than I was aware of. However, no hardware or software crash logs were available for whatever happened, and no matter how hard I try I can't replicate it. So just for my own sanity I have been using CCleaner after installing anything, in case Windows just hung on a corrupted registry entry.

    Hard to say it's the PSU, as it's overkill for a single 4090. The GPU's transient load draw shouldn't even be reaching the PSU's max wattage, never mind surpassing the PSU's max transient load rating, which I think is between 10-15% for 50ms on top of the wattage the PSU is rated for. That said, there could be a fault in it. It may have seen enough power spikes and brownouts that the PSU has degraded over time and is out of spec. It rarely happens, but it does happen. So I would try everything else first before resorting to the PSU as the fault.
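
    To put rough numbers on that, here is a minimal sketch of the arithmetic. It assumes the 10-15% transient allowance guessed at above (not a figure checked against the AX1200i spec sheet) and the 1200W rating from the model name. The function name is made up for illustration.

```python
def transient_ceiling(rated_watts, overshoot_fraction):
    """Peak wattage a PSU could briefly supply under an assumed overshoot allowance."""
    return rated_watts * (1 + overshoot_fraction)


RATED_WATTS = 1200  # rating taken from the AX1200i model name

# 10% and 15% are the guessed allowance from the post, not a datasheet value.
for fraction in (0.10, 0.15):
    ceiling = transient_ceiling(RATED_WATTS, fraction)
    print(f"{fraction:.0%} overshoot -> about {ceiling:.0f} W for a brief spike")
```

    Under those assumptions the brief ceiling would sit somewhere around 1320W to 1380W, which is why a healthy unit of this size should have room to spare for a single 4090.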



  • Registered Users, Registered Users 2 Posts: 6,782 ✭✭✭Damien360


    That looks like a power supply issue to me. In the industry I work in, we use modded pc power supplies for some components and loss of 5V just below mainboard specifications causes a lock up. You may be drawing just enough power to pull the 5V supply just below the system minimum spec (typical 5.1v for my equipment). I would change the PS and given its age, despite its good history, it may be worth it anyway.



  • Moderators, Computer Games Moderators, Technology & Internet Moderators, Help & Feedback Category Moderators Posts: 25,757 CMod ✭✭✭✭Spear


    To clarify one part, the previous GPU was a 3090 (which didn't show this either), the SLI 1080s were before that, but together they pulled more watts than the 3090.

    So far it's been 90 minutes of trying to provoke it into occurring. I have BOINC maxing out the CPU and GPU, with multiple videos playing in multiple browsers, and no sign of the issue.

    From what I could find, the multi rail cap is 40A, with the GPU pulling 25A, so a spike could hit the 40A cap instead. So rather than the PSU not being able to handle the spikes, it may have been the spike hitting the multirail cap.
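
    The headroom implied by those two figures can be sketched as below. This assumes a nominal 12V rail and that the whole 25A lands on a single capped rail, which the thread doesn't confirm for this PSU's cable layout. The function name is hypothetical.

```python
RAIL_VOLTAGE = 12.0  # nominal voltage of the PCIe power rail (assumed)


def rail_headroom(cap_amps, load_amps, volts=RAIL_VOLTAGE):
    """Return (cap_watts, load_watts, spike_factor) for one rail.

    spike_factor is how many times the sustained draw a transient
    would need to reach before it hits the rail's current cap.
    """
    cap_watts = cap_amps * volts
    load_watts = load_amps * volts
    spike_factor = cap_amps / load_amps
    return cap_watts, load_watts, spike_factor


# 40A cap and 25A sustained draw are the figures quoted in the post.
cap_w, load_w, factor = rail_headroom(cap_amps=40, load_amps=25)
print(f"cap {cap_w:.0f} W, sustained load {load_w:.0f} W, "
      f"a spike of {factor:.1f}x the sustained draw reaches the cap")
```

    So under those assumptions, a transient of about 1.6 times the sustained draw would be enough to reach the cap, even though the average load sits well below it.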

    It was easily reproduced in prior attempts, so it'll take a few days to be sure.



  • Registered Users, Registered Users 2 Posts: 113 ✭✭cornholio509


    It is possible that is the issue. Then again, the power requirements for a 4090 are insane, and compared to the 3090 the transients are worse. That said, Nvidia's PSU recommendations are basically for the Founders cards. From what I can gather, AIBs like Gigabyte, Asus and MSI require 1000W-1200W. Founders cards can be run on 850W ATX 2.x power supplies. Then again it varies, as some of the AIBs say you need ATX 3.x PSUs.

    It is entirely possible the 40A cap was being hit or even surpassed for more than 50ms. Again, I am not sure if your GPU is pulling that power at any point. I do know your PSU should have a com cable and can be monitored via the Corsair Link software. What might be a good idea is to run Furmark, put a serious load on the GPU and see if it pulls more than 480W on the GPU alone. If it does, then the 40A limit is being exceeded enough to cause your problem.



  • Moderators, Computer Games Moderators, Technology & Internet Moderators, Help & Feedback Category Moderators Posts: 25,757 CMod ✭✭✭✭Spear


    It runs Furmark just fine, with total PSU power draw hitting 580W, which would seem to correspond to about 480W for the GPU. I've tried it in both single and multirail mode. It can certainly pull the normal sustained total power just fine.

    But after 4 hours of heavy load and all sorts of random activity, not a single freeze so far.



  • Registered Users, Registered Users 2 Posts: 7,476 ✭✭✭The Continental Op


    It may not be consistent load that's the issue. Obviously it can get to 580W, but the problem may be if it needs to go from 300 to 500W in an instant.




  • Moderators, Computer Games Moderators, Technology & Internet Moderators, Help & Feedback Category Moderators Posts: 25,757 CMod ✭✭✭✭Spear


    No, consistent load never triggered it. Gaming or high load programs alone couldn't set it off. It's only the combination of those along with other normal usage that did so, and it hasn't done so in 5 hours of usage since moving it to single rail.



  • Registered Users, Registered Users 2 Posts: 113 ✭✭cornholio509


    @Spear Has it done it since?

    If it hasn't, it might be worth putting the CPU under load and then running Furmark or some other heavy GPU load, just to rule it out. As you said, it could be the limitation of the multi rail mode. It would be interesting to see if the GPU could possibly trip that in single rail mode. It seems to have been an issue with Nvidia since the 3000 series that the initial ramp-up causes the problem. If it does trip it, then that answers the PSU v2.x versus v3.x question for GPUs from now on.



  • Moderators, Computer Games Moderators, Technology & Internet Moderators, Help & Feedback Category Moderators Posts: 25,757 CMod ✭✭✭✭Spear


    Nothing since then. Two days plus of my normal use pattern, and nothing so far. The 3090 never triggered it in multirail mode, but a 4090 can spike a good bit more than that can. I might try testing it a bit more over the weekend in multirail mode to see if it can be recreated.



  • Moderators, Computer Games Moderators Posts: 14,723 Mod ✭✭✭✭Dcully


    Glad it's looking positive, Spear. Nothing worse than issues with new gear.



  • Moderators, Computer Games Moderators, Technology & Internet Moderators, Help & Feedback Category Moderators Posts: 25,757 CMod ✭✭✭✭Spear


    Well, I've spent the weekend with it on multirail mode, and the same usage pattern as before, and nothing. Not a single freeze.

    So I've no idea what that was then at this point. Nothing else has changed, and yet the freezes went away the moment it was switched to single rail.

    At least it's gone though, confounding as it is.



  • Registered Users, Registered Users 2 Posts: 7,476 ✭✭✭The Continental Op


    So now it works fine on both multirail and single rail mode?




  • Moderators, Computer Games Moderators, Technology & Internet Moderators, Help & Feedback Category Moderators Posts: 25,757 CMod ✭✭✭✭Spear


    Yes, I can't recreate it at all in either mode.



  • Registered Users, Registered Users 2 Posts: 7,476 ✭✭✭The Continental Op


    Any taking apart and reseating occur that might account for it?




  • Moderators, Computer Games Moderators, Technology & Internet Moderators, Help & Feedback Category Moderators Posts: 25,757 CMod ✭✭✭✭Spear


    No, nothing was taken apart, or reseated at all. And no obvious software changes either.



  • Registered Users, Registered Users 2 Posts: 7,476 ✭✭✭The Continental Op


    Could it just be that moving the switch reseated the contacts?




  • Moderators, Computer Games Moderators, Technology & Internet Moderators, Help & Feedback Category Moderators Posts: 25,757 CMod ✭✭✭✭Spear


    There's no physical switch, it's an entirely digitally controlled PSU. It's just a select box in the iCUE software.



  • Registered Users, Registered Users 2 Posts: 113 ✭✭cornholio509


    @Spear

    Now I could be clutching at straws here. As I said in my first post, it is more than likely software. I forgot that Google has partnered with Nvidia for local AI upscaling of YouTube videos on Nvidia RTX GPUs. That was supposed to happen at the end of January this year. There was an update in January for the Google Chrome browser, and there was another update last Friday or Saturday. Looking back on the issue I had, YouTube was running in the Chrome browser. By any chance, did you tab into a YouTube video at the time of the crashes? So it might have been Google from the beginning, with a patch that wasn't stable. Friday's update probably fixed it.

    Again, I could be clutching at straws here. It could also be that whatever we were doing just hard-locked Windows itself during a system scan from Windows Defender or while Windows was installing an update.



  • Moderators, Computer Games Moderators, Technology & Internet Moderators, Help & Feedback Category Moderators Posts: 25,757 CMod ✭✭✭✭Spear


    I wouldn't rule that out entirely. Some of the tabs would have been Youtube, but in Firefox. With Chrome it would have been Twitch running. So that's accelerated video in both cases. I know Firefox did update between the freezes, but froze with both versions running. Chrome updates silently, so I can't be sure it didn't update, or update around the time I switched the rail mode. Windows update did update midway through the freezes too, but again, it froze before and after.



  • Moderators, Computer Games Moderators, Technology & Internet Moderators, Help & Feedback Category Moderators Posts: 25,757 CMod ✭✭✭✭Spear


    And the freezes are back, worse than before. They were gone for 4 weeks, and started again yesterday morning. I can't find any pattern or logic to them. My next step is the somewhat vague approach of just reseating the RAM, and maybe the CPU.



  • Registered Users, Registered Users 2 Posts: 31,218 ✭✭✭✭Lumen


    I know it's difficult to say definitely, but I doubt it's GPU power spikes, given your beefy PSU and the testing you've been doing.

    I'm running a 4090 FE with a 12600k on a 750W PSU (came with the case, but Reddit says it's a "nothing special" or "total garbage" Silverstone SX750 Gold by CWT) and it's fine. Am also running the GPU on 3 power cables.



  • Registered Users, Registered Users 2 Posts: 7,476 ✭✭✭The Continental Op


    That made me wonder: are all the connectors good? I've had factory crimped ones fail before. Sometimes you don't even notice it if you have black cables and black connectors. It's a long shot, but check for anything that looks like overheating/burning on the GPU and motherboard power connectors and blocks.




  • Moderators, Computer Games Moderators, Technology & Internet Moderators, Help & Feedback Category Moderators Posts: 25,757 CMod ✭✭✭✭Spear


    At this point, no, I don't think it's power related either. Given that it seems to have started just after installing the 4090, my guess is something was knocked/flexed slightly during installation and isn't making perfect contact. It'd be possibly random enough to freeze the machine, and wouldn't be linked to any power or software issues.



  • Registered Users, Registered Users 2 Posts: 765 ✭✭✭minitrue


    On that theory, would it have been colder overnight? Heat expansion (well the contraction) could be triggering some funny connection issue. Do you have another card to test/use in the meantime (or another machine to put the card in) as my inclination would be to send that one back (probably for warranty repair/replace rather than refund at this point).

    I always struggle to rule out the idea that any sudden appearance of a problem is software related. If everything seems identical to the last problems (and you didn't do something like roll back drivers back then) that probably isn't the case here, but assuming you didn't change anything, it might be worth trying to find a Windows update log (and maybe Nvidia have a log if the driver can auto-update) just to check if something came in.



  • Moderators, Computer Games Moderators, Technology & Internet Moderators, Help & Feedback Category Moderators Posts: 25,757 CMod ✭✭✭✭Spear


    Some kind of thermal expansion could exacerbate a poor connection.

    I've been through the logs, and nothing is getting recorded at the time it happens, the lock is total and complete and prevents anything being logged. This also makes me think hardware issue.

    The last few times I've seen it happen, the CPU was under full load, yet there has been nothing recently when only the GPU is under full load. Hence why I'm inclined to think it's a CPU or RAM contact issue. I'll break it down over the weekend and clean/reseat the bits. This is a fiddly task as it's a full water cooling loop.



  • Registered Users, Registered Users 2 Posts: 6,390 ✭✭✭Cordell


    In theory a sudden drop in load can create a voltage spike or noise, so even if the PSU can sustain high loads it can still fail to provide adequate stable and clean voltages during a transient load. But the same thing can occur in the power circuits on the motherboard and GPU.



  • Registered Users, Registered Users 2 Posts: 18,816 ✭✭✭✭K.O.Kiki


    I have a 3080 FE with a known good PSU (Corsair SF750 Platinum) and I also am starting to get random lockups.

    The GPU will stop receiving power (i.e. logo will turn off) but will POST no issue. However, sometimes it will lock up again 1-2 times if I try to run a game. Haven't checked logs to see WTH is happening.



  • Registered Users, Registered Users 2 Posts: 31,218 ✭✭✭✭Lumen


    This thread is interesting. Have you looked at HAGS settings?

    Also, could simply be a defective GPU.



  • Moderators, Computer Games Moderators, Technology & Internet Moderators, Help & Feedback Category Moderators Posts: 25,757 CMod ✭✭✭✭Spear


    I found something last night: the screws on the CPU waterblock had loosened over time, presumably through years of vibrations and thermal cycles. I've retightened them, and no lockups in the last 16 hours. It wasn't enough to disrupt contact with the waterblock, so temps didn't reflect it. But was it enough to make a pin contact borderline? I thought I was right before with the rail setting, and that seemed to make perfect sense too, so I guess time will tell over the weekend.



  • Registered Users, Registered Users 2 Posts: 18,816 ✭✭✭✭K.O.Kiki


    Update on my issue too: I think it was a loose RTX power adapter - wasn't fully clicked into the 8-pin cables. No idea how I loosened it.



  • Moderators, Computer Games Moderators, Technology & Internet Moderators, Help & Feedback Category Moderators Posts: 25,757 CMod ✭✭✭✭Spear


    I reseated the CPU and RAM, but no difference.

    But I did also take the time to swap the GPU power cables for Corsair's own 12VHPWR cable. And then I found something. The third PCI-e power cable I had to add for the 4090 wasn't seated completely. The little catch hadn't fully engaged, and it was about 1mm short on one side. I had been paranoid about connecting the power cable into the GPU itself, having seen the various melted parts images that went round. But I wasn't so careful about the other end of it. Since I've fully connected that, not a single lockup.

    A dodgy connection would explain the symptoms. There was no pattern in software or usage, because it wasn't a software or usage related problem. It was probably being triggered any time I caused any slight vibration by moving my chair or the like.

    But this did also disappear for a few weeks before, so time will tell.



  • Moderators, Computer Games Moderators, Technology & Internet Moderators, Help & Feedback Category Moderators Posts: 25,757 CMod ✭✭✭✭Spear


    It lasted two days, then locked up a few minutes ago.

    I just don't get it.



  • Moderators, Computer Games Moderators, Technology & Internet Moderators, Help & Feedback Category Moderators Posts: 25,757 CMod ✭✭✭✭Spear


    It turns out there's a worryingly vague issue described in recent nVidia drivers:

    [GeForce RTX 4090] [GeForce RTX 4080] [GeForce RTX 4070Ti] stability / TDR / black screen issues. Check for a motherboard BIOS update that states 'compatibility updates for Lovelace/4080/4090/4070Ti'. The motherboard update is in addition to any VBIOS update. Nvidia control panel setting 'Prefer Maximum Performance' may mitigate idle/monitor resume/crashing issues (workaround)

    Which leaves me to wonder if I was trying to fix something I could never solve. And there's no way an X299 board is getting a BIOS update at this stage for it.


