OBS performs *worse* on a better GPU

Nimble

Member
I've lately been using OBS's Virtual Camera, and was hoping to do so at 4K60.

Up until now I've been using a GTX 1080, but I quickly realized it wasn't quite powerful enough to run OBS with a 4K60 base canvas while the Virtual Camera was being pulled by 3 applications at once, so I decided to upgrade to an RTX 2080 Ti. Before I did, I checked with a friend who already had an RTX 2080 Ti to verify that it could handle what I wanted, and it could: with a 4K60 canvas and Virtual Camera running, his 3D usage was below 30%. However, now that I have my own RTX 2080 Ti, I'm observing much worse performance, not just compared to his system, but even compared to my GTX 1080... What?

RTX 2080 TI Task Manager.png


GTX 1080 Task Manager.png


As you can see, the GTX 1080 is at 69% load while the RTX 2080 Ti is at 82% load running an instance of OBS with identical settings. To choose which GPU I'm using, I plug my main display into the GPU I want and restart my PC; Windows makes whichever GPU is driving the primary display at boot the primary GPU.

I thought the issue might be that there are 2 GPUs in the system, so I completely disabled the GTX 1080, but no dice... It's also worth noting that both GPUs are in x16 slots, and I'm running them in a system with an AMD Threadripper 1950X. Next I thought that maybe the 2080 Ti I got was defective, but when I run a basic benchmark (UserBenchmark) outside of OBS, it shows the 2080 Ti performing 70% better than my GTX 1080.

I've attached two log files: the first with the GTX 1080 as the primary GPU, and the second with the RTX 2080 Ti as the primary GPU. I'm completely baffled; any idea how this could be happening?
 

Attachments

  • 2021-04-30 11-33-52.txt
    56.8 KB
  • 2021-04-30 11-37-53.txt
    60.7 KB

R1CH

Forum Admin
Developer
Are you actually noticing any problems or are you just worried about the high % use?
 

Nimble

Member
Are you actually noticing any problems or are you just worried about the high % use?
As soon as I start pulling the Virtual Camera into different programs, the load hits 100%, I start dropping frames, and the whole PC starts to stutter.

I just double-checked with my friend using a basic test: a blank OBS canvas @ 4K60 with Virtual Camera running. His 3D usage is at 20%; when I run the same test I'm at 50%. He does have PCIe 4.0, but that shouldn't make a difference since the 2080 Ti is a PCIe 3.0 device.
 

Nimble

Member
Here's one of the FFmpeg commands I'm running that pulls the Virtual Camera:
Code:
ffmpeg -y -loglevel warning -stats `
-guess_layout_max 0 -thread_queue_size 9999 -indexmem 9999 -f dshow -rtbufsize 2147.48M -video_size 3840x2160 -framerate 60 `
-pixel_format nv12 -i video="OBS Virtual Camera":audio="ADAT (7+8) (RME Digiface USB)" `
-guess_layout_max 0 -thread_queue_size 9999 -indexmem 9999 -f dshow -rtbufsize 2147.48M -i audio="ADAT (31+32) (RME Digiface USB)" `
-guess_layout_max 0 -thread_queue_size 9999 -indexmem 9999 -f dshow -rtbufsize 2147.48M -i audio="ADAT (27+28) (RME Digiface USB)" `
-guess_layout_max 0 -thread_queue_size 9999 -indexmem 9999 -f dshow -rtbufsize 2147.48M -i audio="ADAT (5+6) (RME Digiface USB)" `
-guess_layout_max 0 -thread_queue_size 9999 -indexmem 9999 -f dshow -rtbufsize 2147.48M -i audio="ADAT (15+16) (RME Digiface USB)" `
-guess_layout_max 0 -thread_queue_size 9999 -indexmem 9999 -f dshow -rtbufsize 2147.48M -i audio="ADAT (1+2) (RME Digiface USB)" `
-guess_layout_max 0 -thread_queue_size 9999 -indexmem 9999 -f dshow -rtbufsize 2147.48M -i audio="ADAT (3+4) (RME Digiface USB)" `
-guess_layout_max 0 -thread_queue_size 9999 -indexmem 9999 -f dshow -rtbufsize 2147.48M -i audio="ADAT (21+22) (RME Digiface USB)" `
-guess_layout_max 0 -thread_queue_size 9999 -indexmem 9999 -f dshow -rtbufsize 2147.48M -i audio="ADAT (23+24) (RME Digiface USB)" `
-guess_layout_max 0 -thread_queue_size 9999 -indexmem 9999 -f dshow -rtbufsize 2147.48M -i audio="ADAT (25+26) (RME Digiface USB)" `
-guess_layout_max 0 -thread_queue_size 9999 -indexmem 9999 -f dshow -rtbufsize 2147.48M -i audio="ADAT (13+14) (RME Digiface USB)" `
-map 0 -map 1 -map 2 -map 3 -map 4 -map 5 -map 6 -map 7 -map 8 -map 9 -map 10 `
-c:v h264_nvenc -preset p1 -pix_fmt nv12 -r 60 -rc-lookahead 120 -strict_gop 1 -flags +cgop -g 120 -forced-idr 1 `
-sc_threshold 0 -force_key_frames "expr:gte(t,n_forced*2)" -b:v 288M -minrate 288M -maxrate 288M -bufsize 288M -c:a mp3 -ac 2 `
-ar 44100 -b:a 320K -vsync 1 -max_muxing_queue_size 9999 `
-f segment -segment_time 2 -segment_wrap 6000 -segment_list "C:\Users\gabri\Videos\FFmpeg\Segments\Master.m3u8" `
-segment_list_size 6000 -reset_timestamps 1 -segment_format_options max_delay=0 `
"C:\Users\gabri\Videos\FFmpeg\Segments\Master%01d.ts"

The end goal is to privately stream to YouTube at 4K60 for a permanent HQ archive, stream to Twitch at 1080p60 for others to watch, and record locally in segments to emulate OBS's Replay Buffer. The above command is the local recording command; I have two others for the streams, with the Twitch one downscaling the 4K60 feed to 1080p60. One of the handful of reasons I'm taking this approach instead of encoding with OBS directly is the unlocked audio channel count, but I also couldn't figure out a way to do two streams (one at a different resolution) plus a local recording with OBS alone. Another nice thing about FFmpeg segmentation compared to OBS's Replay Buffer is that I can constantly keep the last 3 hours recorded, yet pull any length of video from that buffer by generating a segment playlist; it's extremely versatile.
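To illustrate the segment-playlist trick: pulling a clip back out of the buffer just means generating a concat list for the segment range you want. This is a minimal sketch, not my exact script; the directory, segment indices, and output name are made up for the example, and the index-to-time math assumes the `-segment_time 2` setting from the command above (each segment ≈ 2 seconds).

```shell
#!/bin/sh
# Sketch: extract a clip from the rolling segment buffer by listing the
# desired segments in an ffmpeg concat file. Paths/indices are hypothetical.
# With -segment_time 2, segments 100..130 cover roughly one minute starting
# around t = 200 s into the buffer.
SEGDIR="${SEGDIR:-./Segments}"
START="${START:-100}"
END="${END:-130}"
LIST="clip.txt"

: > "$LIST"          # truncate/create the concat list
i="$START"
while [ "$i" -le "$END" ]; do
    # The concat demuxer expects one line per input: file 'path'
    printf "file '%s/Master%d.ts'\n" "$SEGDIR" "$i" >> "$LIST"
    i=$((i + 1))
done

# Then stitch the clip losslessly (no re-encode) with the concat demuxer:
#   ffmpeg -f concat -safe 0 -i clip.txt -c copy clip.mp4
```

Because the segments are already encoded, the final stitch is just a stream copy, so pulling even a long clip is nearly instant.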

But even when running just this one command I'm peaking at 100% usage and dropping frames. With my GTX 1080 I was able to do one pull of the Virtual Camera using this same command without hitting 100%, but I couldn't get the other two commands to run at the same time, as the load sat at a constant 100%. This is when I looked into upgrading the GPU, and I thought I had done my due diligence beforehand by testing on my friend's PC.
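For completeness, the Twitch leg is conceptually just the same dshow pull with a scale filter in front of the encoder. This is a simplified, hypothetical sketch rather than my exact command: the bitrate, GOP settings, and stream key are placeholders, and the audio is trimmed to a single input:

```shell
ffmpeg -y -loglevel warning -stats `
-f dshow -rtbufsize 2147.48M -video_size 3840x2160 -framerate 60 `
-pixel_format nv12 -i video="OBS Virtual Camera":audio="ADAT (7+8) (RME Digiface USB)" `
-vf scale=1920:1080 -c:v h264_nvenc -preset p1 -r 60 -g 120 `
-b:v 8M -maxrate 8M -bufsize 16M -c:a aac -ac 2 -ar 48000 -b:a 160k `
-f flv rtmp://live.twitch.tv/app/STREAM_KEY
```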
 

Nimble

Member
Additional steps taken:
  1. Physically removed the GTX 1080 from the system so only the 2080 Ti was present
  2. Completely removed Nvidia drivers with DDU in safe mode and re-installed them (GRD v466.27)
  3. Uninstalled OBS & all OBS plugins, restarted, and re-installed OBS
  4. Downloaded and ran 3DMark's PCIe bandwidth test
No change after these steps. Here are the 3DMark results, which seem to check out:
1619822333922.png


Guess I'm going to have to try a full system restore, though I'd rather not. Something has to be wrong; when I run OBS @ 4K60 with Virtual Camera running on my X299 (Intel 7900X) system equipped with an RTX 3090, the 3D usage is at 25%...
 

R1CH

Forum Admin
Developer
Use GPU-Z to check the frequencies your GPU is running at during heavy load. 80% usage at 200 MHz is normal if the card has downclocked to save power, but under load it should increase frequencies to handle it.
 

TryHD

Member
In the Nvidia Control Panel, set obs.exe to prefer maximum performance. That should reduce the load, because the virtual cam is nearly no load on its own, so the card won't clock high otherwise.
 

Nimble

Member
In the Nvidia Control Panel, set obs.exe to prefer maximum performance. That should reduce the load, because the virtual cam is nearly no load on its own, so the card won't clock high otherwise.
I already had "Prefer maximum performance" set in the global settings, and when looking at OBS specifically in the Nvidia Control Panel, it had adopted the global setting.

Use GPU-Z to check the frequencies your GPU is running at during heavy load. 80% usage at 200 MHz is normal if the card has downclocked to save power, but under load it should increase frequencies to handle it.
Looks to be clocking up just fine:
1619825996654.png


Went ahead and ran 3DMark's Time Spy benchmark on each GPU respectively:
GTX 1080 - https://www.3dmark.com/3dm/61199462
RTX 2080 Ti - https://www.3dmark.com/spy/20016540

As one would expect, the 2080 Ti outperformed the GTX 1080 by 68%. I can't seem to get the 2080 Ti to perform worse than the GTX 1080 in any application except OBS.
 

R1CH

Forum Admin
Developer
That's very strange. Have you forced any of the vsync options in the nvidia control panel?
 

Nimble

Member
That's very strange. Have you forced any of the vsync options in the nvidia control panel?
After almost 20 straight hours of troubleshooting, I've almost certainly found that it's actually a defect in either my CPU or motherboard.

After completely restoring Windows on my capture PC, which had no effect on performance, I took the 2080 Ti out of my capture system and put it in my desktop. The reason I didn't do this up front is that everything I have is water-cooled, and it's a big pain to shuffle things around. Nevertheless, it performed as you'd expect, with about 10% usage from OBS on a 4K60 base canvas, no scaling, no sources, and Virtual Camera running. On my capture PC running the same test, I'm at 50% GPU usage from OBS.

This whole time I thought that the 50%+ usage from my GTX 1080 was indicative of its normal performance with OBS at 4K, but after running the same test on a third PC equipped with a GTX 1050 Ti and observing 15% usage, it became apparent that I was facing a hardware issue. So much for needing to upgrade to an RTX 2080 Ti…

I removed all other PCIe devices, which changed nothing. But when I removed every stick of RAM except one, the usage was down to 20%. Still high compared to my other PCs, but significantly better than before. I moved on to testing with different RAM sticks entirely and found that certain capacities weren't recognized in specific slots.

It could be a problem with the CPU, but my intuition is telling me it's the motherboard. I ordered a brand-new replacement of the identical board (Asrock X399 Taichi); if it doesn't fix things, I can return it and move on to replacing the CPU. What I find crazy is that the computer ran well enough for me not to notice any of this until I really started pushing it; honestly, I would expect the PC not to run at all.

Anyways I’ll post an update when I replace the motherboard, fingers crossed.
 

BardiBard

Member
Maybe, if available, try updating the mainboard's BIOS first and see if that changes anything. (Don't forget to reactivate the RAM profile after the update.)
 

Nimble

Member
Maybe, if available, try updating the mainboard's BIOS first and see if that changes anything. (Don't forget to reactivate the RAM profile after the update.)
That was one of the first things I did. I was already on the latest version, but I re-flashed it anyway.

Unfortunately didn’t change anything.
 

Nimble

Member
I watched as two birds aligned perfectly, threw a rock, but somehow only managed to hit one of them.

After replacing my motherboard with an identical but new board (Asrock Taichi X399), I'm faced with the same issue, even though my old motherboard was definitely defective. I made sure of this by testing the old board again with two separate CPUs and two sets of RAM after the initial swap; some of its RAM slots are completely unresponsive. With the new board all of the slots work, but it didn't change anything with regard to OBS's performance. I honestly couldn't believe it, because the problem seemed to be RAM-related. As I was saying earlier, if I remove every stick of RAM but one, my performance drastically increases, and this continues to be true not only after replacing the motherboard with a brand-new one, but also after replacing the CPU and RAM. In fact, just to make sure I covered 100% of my bases, I replaced the SSD I was using with an M.2 drive I had on hand, which changed nothing.

At this point I'm fairly confident I'm looking at a bug of some kind, or maybe just a platform limitation. I've tested every component in multiple PCs and even replaced several with brand-new counterparts. The only change that's netted me any additional performance is switching my RAM configuration from quad-channel to dual-channel or single-channel. I get the best performance with a dual-channel configuration, in other words, with two sticks of RAM inserted. This isn't the case with my Intel-based system, which performs fine in a quad-channel configuration. But it is the case in my other X399 system, which runs the same processor (1950X) as my capture PC on the Asrock Taichi X399M, the smaller variant of the same motherboard line. Just like on my capture PC, my 3D usage is more than cut in half by switching from a quad-channel configuration to a dual-channel configuration when running OBS. I tested with 3DMark's PCIe bandwidth test, Time Spy, and other benchmarks to see whether performance was drastically different with quad vs. dual channel, and it wasn't. As far as I can tell, this seems to be limited to OBS on the X399 platform, or at least with 1st-generation Threadrippers.

For now I'm able to get good enough performance in my capture PC by using two 16GB sticks for a total of 32GB, which is the same total amount of RAM I had previously. But I was looking to upgrade to 64GB due to the massive amount of RAM my FFmpeg commands use, which poses a problem: due to the drastic decrease in performance with quad-channel memory, I'm limited to a total of 32GB. Unless it's possible for me to run two sets of dual-channel memory somehow; my motherboard has 8 slots, after all, but I'm not sure of the implications there. I've always been under the impression that quad-channel outperformed dual-channel; I've never heard of it hurting performance.

Any ideas on how I should proceed? @R1CH
 

R1CH

Forum Admin
Developer
AMD boards are very picky about which slots are used for memory. Check the board manual and it should tell you exactly which DIMM slots to populate for single / dual / quad channel.
 

Nimble

Member
AMD boards are very picky about which slots are used for memory. Check the board manual and it should tell you exactly which DIMM slots to populate for single / dual / quad channel.
Thanks for the reply,

I referenced the manual for how to install the memory while running all my tests, and I verified whether the memory was running in single, dual, or quad channel via the BIOS before boot and with CPU-Z after boot.

Unless it's possible for me to run two sets of dual-channel memory somehow; my motherboard has 8 slots, after all, but I'm not sure of the implications there.
Looks like this actually does work: if I put all 4 sticks on the same side of the CPU socket, I can run 4 DIMMs in dual-channel (in this case each stick is 8GB):
Mem new.jpg

1620274117949.png


Though from my understanding, quad-channel would be preferable in every other application...
 

Nimble

Member
Looks like this actually does work: if I put all 4 sticks on the same side of the CPU socket, I can run 4 DIMMs in dual-channel (in this case each stick is 8GB)
Just to clarify, this also shows the same performance boost / decreased 3D usage as running two sticks in dual-channel. But as soon as I run the DIMMs in quad-channel, my 3D usage doubles.

Here's a direct comparison using the same 4 sticks of RAM: all four on the same side of the socket (two sets of dual-channel) vs. two on each side of the socket (quad-channel):
Dual new.png

quad new.png


No other hardware or software changes besides repositioning the RAM; same PC, same 4 sticks of RAM. In OBS I'm just displaying a messy mosaic of my three 4K60 capture cards to emulate the worst-case scenario.
 


R1CH

Forum Admin
Developer
That's really strange. I don't know enough about the TR platform to speculate on why this might be happening, super weird.
 

Nimble

Member
That's really strange. I don't know enough about the TR platform to speculate on why this might be happening, super weird.
Spent the last few hours searching for any answers as to why a dual-channel configuration would outperform a quad-channel configuration on the X399 platform, and I came up short.

Is it not possible that this is a bug or unoptimized code within OBS?
 