M.2 for External GPU

This post was created with help from AI. Intended for inspiration only and not as a comprehensive guide. Do your own research!

The motherboard you have is probably enough

I didn’t expect this to work as well as it does. Connecting an external GPU through an M.2 slot sounds like a hack, but it actually gives you more bandwidth than most of the spare PCIe slots on a motherboard. Here’s the thing that surprised me: my M.2 slot provides PCIe 3.0 x4, while the secondary expansion slots on most consumer boards are only wired for x1. That’s four times the bandwidth.
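For rough numbers, PCIe 3.0 runs at 8 GT/s per lane, which works out to a little under 1 GB/s each way after encoding overhead. So a x1 slot tops out around 0.985 GB/s, while x4 gets you roughly 3.9 GB/s.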

Photo of an M.2 eGPU setup sitting on top of a computer case

Hardware

The adapters and cables ran me about $100, not counting taxes, shipping, or the PSU.

Photo of an OCuLink cable running out of the back of a computer case

Janky OCuLink cable running out the back of the case, since I was too lazy to get a base plate.

Getting It Working Is… Involved

You plug everything in, power it up, and get a black screen. That’s normal. The computer detects the external GPU but doesn’t know how to boot with it.

Start by changing the following BIOS settings:

  • Enable CSM (Compatibility Support Module) for legacy boot support.
  • Disable Secure Boot by clearing the platform keys. You can turn it back on later; most firmware has an option to reload the factory keys, or you can save your keys to a USB drive first and restore them from there.
  • Enable “Above 4G Decoding” for modern GPU support.
  • Disable ASPM (Active State Power Management) for the CPU PCIe controller. eGPUs don’t like power management features that can drop the connection.
  • Disable CPU PCIe ASPM Mode Control – this is crucial for stability.
  • Disable SR-IOV Support because it causes conflicts.
  • Set M.2 Link Mode to Gen 3 (or whatever your hardware supports).

Here’s something important about OCuLink: it doesn’t support hot-plugging the way Thunderbolt does. Everything has to be connected and powered on before you boot. Power the eGPU dock first, then the computer. No exceptions.
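Once it finally boots, it’s worth checking that the link actually trained at the speed you set. Assuming an NVIDIA card with nvidia-smi on your PATH, a query like this shows the current PCIe generation and lane width per GPU:

# Current PCIe generation and lane width per GPU; the eGPU should report 3 and 4 for Gen 3 x4
nvidia-smi --query-gpu=index,name,pcie.link.gen.current,pcie.link.width.current --format=csv

Note that the link can downshift to a lower generation while the card is idle, so run it under load if the numbers look low.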

Multi-GPU AI Workloads

Once I got it working, I wanted to test both GPUs for AI stuff. Both GPUs would show VRAM usage when loading a model, but only one would actually process during inference. That seems wrong – if weights are distributed across both GPUs, shouldn’t both be processing?

The issue is usually the split mode rather than anything broken. llama.cpp’s default layer split puts different layers on each GPU, so during a single inference pass the cards mostly take turns instead of working simultaneously; row split divides each tensor across both cards so they compute in parallel. With llama-server, you can try different configurations:

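# -sm row splits each tensor across the GPUs; --tensor-split sets the share of work per card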
llama-server -m model.gguf -ngl 99 -sm row --tensor-split 0.5,0.5

If your model fits on one GPU, single-GPU setups often perform better: you skip the inter-GPU communication overhead and get faster memory access and simpler execution. Multi-GPU is mainly useful when the model doesn’t fit on a single card.
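If you do want to pin llama-server to a single card for a comparison, you can hide the other one from CUDA before launching. The device index below is an assumption; check which index belongs to which card with nvidia-smi -L first.

# Expose only one card to CUDA (index 0 is an assumption; verify with nvidia-smi -L)
set CUDA_VISIBLE_DEVICES=0
llama-server -m model.gguf -ngl 99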

GPU Numbering and Gaming

Windows assigned my external GPU as GPU 0 and my main GPU as GPU 1. Some games default to GPU 0, which means they’d use the less powerful external card. You have to manually assign games to use the high-performance GPU through Windows Graphics Settings or your GPU’s control panel.

My motherboard doesn’t have integrated graphics, so there’s no easy way to change which GPU gets which number. The eGPU gets detected first, probably because it’s on a CPU-direct PCIe lane through the M.2 slot.
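Windows numbering for games is one thing, but for the CUDA side you can at least make the ordering predictable. By default CUDA enumerates devices fastest-first, which doesn’t have to match what nvidia-smi shows (nvidia-smi goes by PCI bus order). Forcing PCI bus order keeps the indices consistent:

# List the cards as nvidia-smi sees them (PCI bus order)
nvidia-smi -L

# Make CUDA enumerate in the same PCI bus order, then pick the card you want
set CUDA_DEVICE_ORDER=PCI_BUS_ID
set CUDA_VISIBLE_DEVICES=1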

Monitoring and Debugging

I found Task Manager’s GPU monitoring not that helpful for AI workloads: its default graphs track the 3D engine, and CUDA work shows up under a separate engine, so even when llama-bench reports good numbers the graphs sit near zero. Use nvidia-smi -l 1 for real-time monitoring instead.
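If you want something more readable than the full nvidia-smi dump, query mode can print just the fields you care about, refreshed every second:

# Per-GPU utilization, VRAM use, and power draw, refreshed every second
nvidia-smi --query-gpu=index,name,utilization.gpu,memory.used,power.draw --format=csv -l 1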

For testing different configurations, llama-bench is perfect:

# Test single GPU
set CUDA_VISIBLE_DEVICES=0
llama-bench -m model.gguf -ngl 99

# Test multi-GPU
set CUDA_VISIBLE_DEVICES=0,1
llama-bench -m model.gguf -ngl 99 -sm row --tensor-split 0.5,0.5
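Another way to keep everything on one card, without touching environment variables, is llama.cpp’s own split-mode flag; which device index is your main card will depend on your setup.

# Keep the whole model on a single GPU; -mg picks which one
llama-bench -m model.gguf -ngl 99 -sm none -mg 0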

Trade-offs

These BIOS changes do have downsides. Disabling Secure Boot reduces protection against boot-level malware. Disabling ASPM increases power consumption slightly. CSM adds some boot time overhead.

But the performance you get from a proper eGPU setup makes it worth it. The M.2-to-OCuLink connection provides solid bandwidth, and you can add serious GPU power to systems that couldn’t otherwise fit large graphics cards.

This isn’t plug-and-play. You need to be comfortable with BIOS settings, troubleshooting boot issues, and accepting some security trade-offs. But if you put in the time to get it right, you end up with something that performs way better than you’d expect from what looks like a janky adapter setup.