Windows 10 High Non-Paged Memory Usage (2020)


I don't currently have an account on the StackExchange network with enough reputation to contribute this answer directly, so I figured I'd post it here. If anyone else is stuck and desperate enough to end up on the 40th page of Google, where this will land past all of the spammy "DRIVER FIX" websites, maybe it will help. This is just going to be some dense information and no screenshots. Sorry.

The Problem

I run a Windows 10 Pro computer at home as an on-premises server. It runs Hyper-V with Windows Server 2019, as well as VMware Player hosting a few copies of Debian. It serves as the DHCP/DNS server for my network, a file server and a backup server, and it hosts Home Assistant and Node-RED for home automation. That is why it was particularly annoying when, from time to time, the entire server would slow to such a crawl that it was no longer responsive. It also meant I couldn't turn my lights on and off anymore!

I managed to catch the server one time it did this while it was almost falling over, but still vaguely usable. While I really wanted to simply restart it and get my lights working again, I decided today was as good a time as any to solve this once and for all. I started working on some simple diagnostics. The first thing I took a look at was the performance graph in Windows Task Manager. There were vaguely high levels of resource usage everywhere, but "vaguely high" does not overcommitted make, so the two things that really stood out were the disk performance graph and memory usage. The server was using 16GB/16GB of available memory, and the system disk was seeing 100% of the available transfer rate being used. I decided to ignore the disk transfer for the time being as it was likely that the increased transfer was due to the memory pressure forcing the system to swap to disk excessively. (This can be confirmed with Windows's Resource Monitor—under the "Disk" tab the "Disk Activity" section lists which specific files are seeing the disk activity and in this case it would be pagefile.sys.)

The Diagnostics & Solution

The first step then was to figure out what was using all of the available RAM. To rule out the virtual machines that were running, I stopped them all, to no real effect. Looking at the process list, the biggest individual consumer of memory was Firefox at a whopping 90MB. The sum total of all running processes came nowhere near 16GB. Then something caught my eye: the non-paged pool was almost 3GB! Generally this should be under a gigabyte. The non-paged pool is the RAM consumed by the kernel and device drivers which is not eligible to be swapped out to disk. The reason it cannot be swapped out is that it may need to be accessed in response to a hardware interrupt, during which the system cannot service a page fault to load the memory back in from disk. This isn't essential to solving the problem, however, so don't worry if this all went whoosh.
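If you'd rather read the non-paged pool size from a script than from Task Manager, here is a minimal sketch. It assumes Python is installed on the machine (it wasn't part of my setup) and calls the Win32 GetPerformanceInfo function through ctypes. The class and helper names are my own inventions for illustration; only the structure layout and the function name come from the Windows API.

```python
import ctypes
import sys


class PerformanceInformation(ctypes.Structure):
    """Mirrors the Win32 PERFORMANCE_INFORMATION structure from psapi.h."""
    _fields_ = [
        ("cb", ctypes.c_uint32),              # size of this structure in bytes
        ("CommitTotal", ctypes.c_size_t),
        ("CommitLimit", ctypes.c_size_t),
        ("CommitPeak", ctypes.c_size_t),
        ("PhysicalTotal", ctypes.c_size_t),
        ("PhysicalAvailable", ctypes.c_size_t),
        ("SystemCache", ctypes.c_size_t),
        ("KernelTotal", ctypes.c_size_t),
        ("KernelPaged", ctypes.c_size_t),
        ("KernelNonpaged", ctypes.c_size_t),  # non-paged pool, in pages
        ("PageSize", ctypes.c_size_t),        # page size, in bytes
        ("HandleCount", ctypes.c_uint32),     # open handles, system-wide
        ("ProcessCount", ctypes.c_uint32),
        ("ThreadCount", ctypes.c_uint32),
    ]


def nonpaged_pool_bytes(info):
    """KernelNonpaged is reported in pages, so multiply by the page size."""
    return info.KernelNonpaged * info.PageSize


def query_performance_info():
    """Windows only: fill the structure via psapi's GetPerformanceInfo."""
    if sys.platform != "win32":
        raise OSError("GetPerformanceInfo is only available on Windows")
    info = PerformanceInformation()
    info.cb = ctypes.sizeof(info)
    psapi = ctypes.WinDLL("psapi", use_last_error=True)
    if not psapi.GetPerformanceInfo(ctypes.byref(info), info.cb):
        raise ctypes.WinError(ctypes.get_last_error())
    return info


if __name__ == "__main__" and sys.platform == "win32":
    info = query_performance_info()
    print(f"Non-paged pool: {nonpaged_pool_bytes(info) / 2**30:.2f} GiB")
    print(f"System-wide handles: {info.HandleCount}")
```

Logging these two numbers every few minutes would be one way to catch slow growth before the machine falls over.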

Usually if the kernel is using that much RAM it's an issue of some driver with a memory leak. I'll gloss over the next steps because they've already been well documented elsewhere. See, for example, this SuperUser answer. Essentially, we need to dive into where the kernel's memory is going. You'll need some extra tools for this which are included in the Windows Driver Kit. You can ignore the dependency on Visual Studio (it does not need to be installed) as it is only required if you intend to actually build drivers. We only need some of the ancillary tools.

Running the poolmon tool included in the WDK and sorting by memory usage (by pressing b), we can start to see the biggest consumers of memory in the kernel. Sorting by the difference between allocations and frees (i.e., the pool tags that are roughly the biggest ongoing net consumers of memory; press d) shows more or less the same list. This is where my experience starts to diverge from most of the issues people present online.
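To make the two sort orders concrete, here is a small sketch of what "sort by bytes" and "sort by difference" mean. It does not read poolmon's output. The PoolTag type is hypothetical and every number in SAMPLE is invented purely for illustration; only the tag names are borrowed from this post.

```python
from dataclasses import dataclass


@dataclass
class PoolTag:
    """One row of a pool usage listing (hypothetical type for illustration)."""
    tag: str
    allocs: int      # total allocations made under this tag
    frees: int       # total frees made under this tag
    bytes_used: int  # bytes currently held under this tag

    @property
    def diff(self):
        # Outstanding allocations: allocated but not yet freed.
        return self.allocs - self.frees


def sort_by_bytes(tags):
    """Equivalent in spirit to pressing b: largest current usage first."""
    return sorted(tags, key=lambda t: t.bytes_used, reverse=True)


def sort_by_diff(tags):
    """Equivalent in spirit to pressing d: most outstanding allocations first."""
    return sorted(tags, key=lambda t: t.diff, reverse=True)


# Invented numbers, NOT measurements from my server.
SAMPLE = [
    PoolTag("File", 9_000_000, 8_990_000, 4_000_000),
    PoolTag("Proc", 910_000, 60_000, 1_600_000_000),
    PoolTag("ObCi", 5_000_000, 4_000_000, 150_000_000),
]

if __name__ == "__main__":
    print("by bytes:", [t.tag for t in sort_by_bytes(SAMPLE)])
    print("by diff: ", [t.tag for t in sort_by_diff(SAMPLE)])
```

A tag that ranks high in both orderings holds a lot of memory and also has many allocations it has not freed, which is what makes it worth a closer look.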

The biggest consumer of non-paged memory in my kernel was not a driver, but instead "Proc". Some process was consuming all the memory but not showing in Task Manager or Resource Monitor! The largest growth was ObCi. Looking at the pool tag list included in the WDK we find this is in fact representing the kernel's own tracking for objects being created in the kernel.

While this doesn't directly answer the question for us, it did give me an idea how we might locate the process which the kernel seems to be allocating all this memory to. If the kernel is creating a ton of new objects on a largely idle system, then something must be creating them and that something probably needs to be able to access the objects it's creating.

In order to actually access kernel objects, you need a handle, and those handles are process-specific, so Windows is able to report how many handles any given process has open. Back where we started in Windows Task Manager, on the Details tab, we right-click the column headers at the top and choose "Select Columns". Adding the "Handles" column and then sorting descending we find...

synergyd.exe has over 800,000 handles open.

For some context here, the next closest process was at 45,000. A handful sat around 2,000-3,000. The vast majority of processes had under a thousand handles.
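The same ranking can be pulled from a script instead of Task Manager. The sketch below is one possible approach, not what I actually did: it assumes Python and Windows PowerShell are available, asks Get-Process for each process's HandleCount as CSV, and sorts the result. The function names are made up for this example.

```python
import csv
import io
import subprocess

# HandleCount is a property of the process objects that Get-Process returns.
PS_COMMAND = (
    "Get-Process | Select-Object Name,Id,HandleCount | "
    "ConvertTo-Csv -NoTypeInformation"
)


def parse_handle_csv(text):
    """Turn the CSV text into a list of (name, pid, handle_count) tuples."""
    rows = csv.DictReader(io.StringIO(text))
    # Treat a blank HandleCount as 0 rather than failing on it.
    return [
        (row["Name"], int(row["Id"]), int(row["HandleCount"] or 0))
        for row in rows
    ]


def top_by_handles(procs, n=5):
    """Return the n processes holding the most handles, largest first."""
    return sorted(procs, key=lambda p: p[2], reverse=True)[:n]


def query_windows():
    """Windows only: run the PowerShell pipeline and parse its output."""
    result = subprocess.run(
        ["powershell", "-NoProfile", "-Command", PS_COMMAND],
        capture_output=True, text=True, check=True,
    )
    return parse_handle_csv(result.stdout)
```

On the server, `top_by_handles(query_windows())` would print the top five. A process that is one or two orders of magnitude above everything else, as synergyd.exe was here, is the one to look at first.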

Killing the Synergy server immediately resulted in memory usage dropping to around 5GB and the non-paged pool size dropping under a gigabyte.

The Conclusion

Looks like we found our culprit. The Synergy service is apparently leaking kernel object handles: allocating them somewhere and never freeing them. While I'm not versed enough in Windows' kernel architecture or API to diagnose or explain if, why or how the leaking handles relate to the huge non-paged memory usage (as handles are supposed to be pageable), it was certainly a sign that something was wrong with that process, and it was easy enough to confirm by just stopping it.

Thankfully, I didn't need Synergy anymore, as I'd actually ended up moving the server out from under my desk when a recent heat wave hit and my smart home sensors were reporting the end of the rec room with my desk as being 3-4°C warmer than the other end. It was accessed exclusively over RDP now. Uninstalling Synergy resolved all my problems. (If you're unable to solve it that way, the simplest option may be to set up a scheduled task to run something like taskkill /f /im synergyd.exe on a regular basis to forcefully end the process and release all of its memory. Note that taskkill only ends the process, so something else has to start it again afterwards.)
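For the scheduled task workaround, here is a rough sketch of how the task could be registered with the built-in schtasks tool. I never set this up myself, so treat it as untested on a real server. The task name, the six-hour interval and the helper function are all assumptions of mine; the schtasks and taskkill switches are the standard ones.

```python
import subprocess


def build_schtasks_command(image_name, task_name, every_hours=6):
    """Build a schtasks command line that kills image_name on a schedule.

    every_hours is an arbitrary illustrative default; schtasks accepts
    1 to 23 as the modifier for an HOURLY schedule.
    """
    kill_command = f"taskkill /f /im {image_name}"
    return [
        "schtasks", "/Create",
        "/TN", task_name,        # name of the scheduled task
        "/TR", kill_command,     # command the task runs
        "/SC", "HOURLY",         # schedule type
        "/MO", str(every_hours),  # run every N hours
        "/RU", "SYSTEM",         # run under the SYSTEM account
        "/F",                    # overwrite an existing task of that name
    ]


def register_task(command):
    """Windows only, from an elevated prompt: actually create the task."""
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # "KillSynergy" is a made-up task name.
    print(" ".join(build_schtasks_command("synergyd.exe", "KillSynergy")))
```

Keep in mind that the task only ends the process. Whether it comes back depends on how it is started on your machine, for example a service configured to restart on failure.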