M.2 for external GPU

This post was created with help from AI. It is intended for inspiration only and not as a comprehensive guide. Do your own research!

The motherboard you have might be enough

I wanted to add external GPU capability to my system and discovered something interesting: my M.2 slot provides PCIe 3.0 x4 bandwidth, while most motherboard expansion slots only offer x1. That’s four times the bandwidth of a typical x1 slot, from what initially seemed like an unconventional approach.
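The difference is easy to quantify. PCIe 3.0 runs at 8 GT/s per lane with 128b/130b encoding, which works out to roughly 0.98 GB/s of usable bandwidth per lane. A quick back-of-the-envelope calculation, just for illustration:

```python
# PCIe 3.0: 8 GT/s per lane, 128b/130b encoding
GT_PER_SEC = 8e9
ENCODING = 128 / 130

per_lane_gbps = GT_PER_SEC * ENCODING / 8 / 1e9  # bits/s -> bytes/s -> GB/s
x1 = per_lane_gbps * 1
x4 = per_lane_gbps * 4

print(f"x1: {x1:.2f} GB/s, x4: {x4:.2f} GB/s")  # x4 is 4x the x1 figure
```

Real-world throughput lands a bit below these theoretical numbers once protocol overhead is accounted for, but the 4:1 ratio holds.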

Photo of an M.2 eGPU setup sitting on top of a computer case

If you would like to do this yourself, verify that your motherboard’s M.2 slot supports PCIe (some are SATA-only) and determine how many PCIe lanes it provides.
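Your motherboard manual is the authoritative source, but on a Linux system you can also check what a device in the slot actually negotiates with `lspci -vv` and look for the `LnkCap` / `LnkSta` lines. A small sketch of pulling the numbers out of that output (the sample line here is illustrative, not from my machine):

```python
import re

# Example LnkCap line as printed by `lspci -vv` on Linux; substitute the
# actual output for the device sitting in your M.2 slot.
sample = "LnkCap: Port #9, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L1 <8us"

match = re.search(r"Speed\s+([\d.]+GT/s),\s+Width\s+x(\d+)", sample)
speed, width = match.group(1), int(match.group(2))
print(f"Link capability: {speed} per lane, {width} lanes")
```

If `LnkSta` reports a lower speed or width than `LnkCap`, the link has trained down, which is worth investigating before blaming the GPU.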

Hardware

The adapters and cables cost about $100 excluding taxes, shipping, the PSU, and GPU.

Photo of an OCuLink wire running out of the back of a computer case

The OCuLink cable exits through the back of the case; I ran it out without a proper mounting plate.

BIOS

Initial boot resulted in a black screen – the system detected the external GPU but couldn’t boot with it. I made these BIOS changes:

  • Enabled CSM (Compatibility Support Module) for legacy boot support
  • Disabled Secure Boot by clearing platform keys (can be restored later)
  • Enabled “Above 4G Decoding” for modern GPU support
  • Disabled ASPM (Active State Power Management) for the CPU PCIe controller
  • Disabled CPU PCIe ASPM Mode Control for stability
  • Disabled SR-IOV Support to avoid conflicts
  • Set M.2 Link Mode to Gen 3

OCuLink doesn’t support hot-plugging like Thunderbolt, so I power the eGPU dock first, then boot the computer with everything connected.

Multi-GPU AI performance

With both GPUs visible to the system, I tested AI workloads. Both showed VRAM usage when a model loaded, but only one did any work during inference. The culprit was the tensor-split configuration.

With llama-server, I experimented with different settings:

llama-server -m model.gguf -ngl 99 -sm row --tensor-split 0.5,0.5

For models that fit on a single GPU, single-GPU performance often exceeds multi-GPU due to reduced communication overhead and simpler execution paths.
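For context on those flags: -ngl 99 offloads all layers to the GPU, -sm row splits tensors row-wise across devices, and --tensor-split sets the proportion each GPU receives. As a rough sketch of how a split ratio maps to layer placement (a simplification: llama.cpp’s actual allocation also accounts for the KV cache and compute buffers):

```python
def split_layers(n_layers, ratios):
    """Roughly how a --tensor-split ratio divides model layers across GPUs
    (proportional allocation; a simplification of llama.cpp's behavior)."""
    total = sum(ratios)
    counts = [round(n_layers * r / total) for r in ratios]
    # Fix rounding drift so the counts sum to n_layers
    counts[-1] += n_layers - sum(counts)
    return counts

print(split_layers(32, [0.5, 0.5]))    # [16, 16]
print(split_layers(32, [0.75, 0.25]))  # [24, 8]
```

Uneven ratios like 0.75,0.25 are useful when the two GPUs have different VRAM capacities, as is common in an internal-plus-eGPU setup.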

GPU assignment in Windows

Windows assigned the external GPU as GPU 0 and the internal GPU as GPU 1. Some games default to GPU 0, so I manually assigned applications to the preferred GPU when needed.

Since my motherboard lacks integrated graphics, I couldn’t easily change the GPU numbering. The eGPU gets detected first, likely because it connects through a CPU-direct PCIe lane via the M.2 slot.
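Note that Windows’ GPU 0/GPU 1 numbering and CUDA’s device numbering are independent. By default CUDA enumerates devices “fastest first,” which may not match the PCI order; setting the documented CUDA_DEVICE_ORDER environment variable to PCI_BUS_ID makes CUDA number GPUs by bus address instead, which only affects CUDA applications, not Task Manager. A minimal sketch:

```python
import os

# Make CUDA enumerate GPUs by PCI bus address rather than "fastest first".
# Must be set before the CUDA runtime initializes (i.e. before importing
# torch, llama-cpp-python, etc.). Has no effect on Windows' own
# GPU 0 / GPU 1 numbering in Task Manager.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # e.g. pin work to the second device

print(os.environ["CUDA_DEVICE_ORDER"], os.environ["CUDA_VISIBLE_DEVICES"])
```

Which index corresponds to the eGPU under PCI_BUS_ID ordering depends on your board’s bus layout, so check with nvidia-smi first.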

Monitoring tools

Task Manager’s GPU monitoring showed minimal activity even while running llama-bench. I found nvidia-smi -l 1 more reliable for real-time monitoring.

# Single GPU testing
set CUDA_VISIBLE_DEVICES=0
llama-bench -m model.gguf -ngl 99

# Multi-GPU testing  
set CUDA_VISIBLE_DEVICES=0,1
llama-bench -m model.gguf -ngl 99 -sm row --tensor-split 0.5,0.5

Trade-offs

The BIOS changes carry some downsides. Disabling Secure Boot reduces protection against boot-level malware. Disabling ASPM increases power consumption. CSM adds boot time overhead.

The performance gains justified these compromises in my case. The M.2-to-OCuLink connection provides substantially more bandwidth than a typical x1 expansion slot, enabling multi-GPU setups in systems that otherwise couldn’t support them.

This approach requires comfort with BIOS configuration and boot troubleshooting and isn’t for the faint of heart.