Chosen Solution
Dear all, this is my first time writing on this forum. I bought a pre-assembled desktop PC, to which I added a second-hand graphics card. The relevant specs are: CPU: Intel 8700K, GPU: MSI Armor GTX 1080 Ti, RAM: 16 GB, PSU: 650W gold rated. The GPU drivers are the latest from NVIDIA, 416.81.

I was not pleased with the performance or noise of the GPU's stock cooler, so I attached an Arctic Accelero 4 cooler to it. Unfortunately I did not realize that mine is an extended PCB (I have not dealt with desktops in a long time), which the cooler does not support. The die cooler is properly pasted and working, but the back-plate that is supposed to cool the other components could not be fastened properly. So I made a half hack by adding some heat sinks to the exposed VRM and VRAM chips under the cooler.

The computer works normally and can play light games, but when I try something heavy, like the new Tomb Raider or The Witcher 3, it shuts down after a while, on the order of 10 minutes. It does not produce graphical artifacts or performance dips (a solid 60 fps at 4K); the PC just suddenly reboots. The core temps look fine: 50 C for the GPU and around 60 C for the CPU, and the fans are blowing.

How do I know if it is the VRAM overheating and not something else, like the power supply? Did I damage something permanently, or can I save it by reinstalling the old cooler? Note that I ran some CUDA code and I can allocate up to 99% of the free memory (so around 10.4 GB, after removing what is used by the display) and write/read to it with a bandwidth of around 8.5 GB/s, so the VRAM seems to still be there. I am more of a software than a hardware dude, so any help to analyze or solve the situation will be very much appreciated! Cheers!
EDIT: in the event log, the failure is reported as this critical error:
Log Name: System
Source: Microsoft-Windows-Kernel-Power
Date: 18/11/2018 21:45:08
Event ID: 41
Task Category: (63)
Level: Critical
Keywords: (70368744177664),(2)
User: SYSTEM
Computer: DESKTOP-TRR31LG
Description:
The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.
Event Xml:
<Event xmlns=“http://schemas.microsoft.com/win/2004/08...>
Under load it's either the heat of your system (the system temp console can show you all of this) or a failing component (a short, RAM, something not plugged into a socket correctly). If it boots and runs, then chances are that the PSU is fine (depending on how it sounds when in use, along with its temperature). With the limited info available, I would say take it all apart (as far as you are comfortable doing) and check that everything is seated correctly, and ensure that air is getting to the places that need it.
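Since you are more of a software person, one way to gather more info before taking things apart is to log what the driver reports while the game runs, and look at the last rows after the reboot. Below is a rough sketch of that idea, not a definitive tool. It assumes `nvidia-smi` (installed with the NVIDIA driver) is on the PATH and that the card reports these fields; the function names and the log file name are made up for illustration. Note that this only sees the core temperature, power draw, clock, and fan speed. As far as I know it will not show VRAM or VRM temperatures on this card, so a clean log does not rule those out.

```python
import csv
import os
import subprocess
import time

# Field names passed to `nvidia-smi --query-gpu`. Only the core temperature
# is included; VRAM and VRM temperatures are not part of this list.
FIELDS = ["timestamp", "temperature.gpu", "power.draw", "clocks.sm", "fan.speed"]


def parse_sample(line):
    """Turn one CSV line from nvidia-smi into a dict keyed by field name."""
    values = [v.strip() for v in line.split(",")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, got {len(values)}")
    return dict(zip(FIELDS, values))


def read_sample():
    """Query GPU 0 once and return the parsed sample (assumes a single GPU)."""
    result = subprocess.run(
        ["nvidia-smi", "--id=0",
         "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_sample(result.stdout.strip().splitlines()[0])


def log_gpu(path, interval_s=1.0):
    """Append one sample per interval, forcing every row to disk."""
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        while True:
            writer.writerow(read_sample())
            f.flush()
            os.fsync(f.fileno())  # so the last row survives a sudden reboot
            time.sleep(interval_s)
```

To use it, call something like `log_gpu("gpu_log.csv")` in a terminal, start the game, and open the CSV after the crash. If the last rows show a cool core and normal clocks right up to the reboot, that at least tells you the core itself is not what is overheating.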