M.2 for an external GPU
This post was created with help from AI. Intended for inspiration only and not as a comprehensive guide. Do your own research!
The motherboard you have is probably enough
I didn’t expect this to work as well as it does. Connecting an external GPU through an M.2 slot sounds like a hack, but it can actually give you better performance than most spare motherboard PCIe slots. Here’s the thing that surprised me: my M.2 slot provides PCIe 3.0 x4 bandwidth, while the spare expansion slots on most consumer boards only give you x1 (the full x16 slot is usually already occupied by the main GPU). That’s literally four times the bandwidth.
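If you want to confirm what link your card actually negotiated, nvidia-smi can report the current PCIe generation and lane width (NVIDIA cards only; note that the link may idle at a lower generation until the GPU is under load):

```
# Report the PCIe generation and lane count the GPU negotiated
nvidia-smi --query-gpu=name,pcie.link.gen.current,pcie.link.width.current --format=csv
```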
Hardware
- F9G-BK7 eGPU OCuLink GPU dock (PCIe 4.0 x4). This is the dock that actually holds your GPU. The F9G series is pretty reliable, though not cheap. See also: ADT-link.
- ONE XPLAYER OCuLink cable. Don’t cheap out here. Better brands have better shielding, which matters for longer cables, and can reliably carry the faster protocol (PCIe 4.0 versus 3.0).
- chenyang OCuLink SFF-8612 to PCIe 4.0 NVMe M.2 M-Key host adapter. This converts your M.2 slot to OCuLink. It supports all the standard M.2 sizes (2230/2242/2260/2280). It gets poor reviews on Amazon but worked for me ¯\_(ツ)_/¯
- Separate power supply for the external GPU. The M.2-to-OCuLink connection doesn’t carry any power.
The adapters and cables ran me about $100 not counting taxes, shipping, or the PSU.
Janky OCuLink cable running out the back of the case, since I was too lazy to get a base plate.
Getting It Working Is… Involved
You plug everything in, power it up, and get a black screen. That’s normal. The computer detects the external GPU but doesn’t know how to boot with it.
Start by changing the following BIOS settings:
- Enable CSM (Compatibility Support Module) for legacy boot support.
- Disable Secure Boot by clearing the platform keys. You can turn it back on later: most firmwares have an option to reload the factory default keys, or you can save your keys to a USB drive first and restore them from there.
- Enable “Above 4G Decoding” for modern GPU support.
- Disable ASPM (Active State Power Management) for the CPU’s PCIe controller, often listed as “CPU PCIe ASPM Mode Control”. eGPUs don’t like power-management states that can drop the link, and this setting is crucial for stability.
- Disable SR-IOV Support because it causes conflicts.
- Set M.2 Link Mode to Gen 3 (or whatever your hardware supports).
Here’s something important about OCuLink: it doesn’t support hot-plugging the way Thunderbolt does. Everything has to be connected and powered on before you boot. Power on the eGPU dock first, then the computer. No exceptions.
Multi-GPU AI Workloads
Once I got it working, I wanted to test both GPUs for AI stuff. Both GPUs would show VRAM usage when loading a model, but only one would actually process during inference. That seems wrong – if weights are distributed across both GPUs, shouldn’t both be processing?
The issue is usually how the work is split. With the default layer split (-sm layer), llama.cpp gives each GPU a contiguous block of layers and runs them one after another, so only one GPU is busy at any given moment even though both hold weights. Row split (-sm row) splits the individual weight tensors across the GPUs so both can work on the same layer in parallel. With llama-server, you can try different configurations:
```
llama-server -m model.gguf -ngl 99 -sm row --tensor-split 0.5,0.5
```
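The 0.5,0.5 ratio assumes identical cards. The values are relative weights, so if your internal and external GPUs have different amounts of VRAM (likely in an eGPU setup), you can skew the split; the 0.7,0.3 below is a made-up example, not a recommendation:

```
# Hypothetical split for mismatched cards, e.g. a 16 GB GPU 0 and an 8 GB GPU 1.
# Adjust the ratios until neither card runs out of VRAM.
llama-server -m model.gguf -ngl 99 -sm row --tensor-split 0.7,0.3
```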
If your model fits on one GPU, a single-GPU setup often performs better: there’s no inter-GPU communication overhead, memory access is faster, and execution is simpler. Multi-GPU is mainly useful when the model doesn’t fit on a single card.
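One way to guarantee the single-GPU path is to hide the eGPU from CUDA entirely. A minimal sketch, assuming the internal card is CUDA device 1 as it was for me (check yours with nvidia-smi -L):

```
# Expose only the internal GPU (CUDA device 1 here) to llama-server
set CUDA_VISIBLE_DEVICES=1
llama-server -m model.gguf -ngl 99
```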
GPU Numbering and Gaming
Windows assigned my external GPU as GPU 0 and my main GPU as GPU 1. Some games default to GPU 0, which means they’d use the less powerful external card. You have to manually assign games to use the high-performance GPU through Windows Graphics Settings or your GPU’s control panel.
My motherboard doesn’t have integrated graphics, so there’s no easy way to change which GPU gets which number. The eGPU gets detected first, probably because it’s on a CPU-direct PCIe lane through the M.2 slot.
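Keep in mind that Task Manager’s GPU 0/GPU 1 numbering and the device order CUDA applications see are two different things. You can list what CUDA sees, and force it to enumerate in PCI bus order instead of its default “fastest first” order; this affects CUDA programs only, not which GPU games pick:

```
# List GPUs with the indices CUDA will use
nvidia-smi -L

# Enumerate CUDA devices in PCI bus order instead of fastest-first
set CUDA_DEVICE_ORDER=PCI_BUS_ID
```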
Monitoring and Debugging
I found Task Manager’s GPU monitoring not that helpful for AI workloads. Even when llama-bench shows good performance, Task Manager shows no GPU activity (its default graphs track engines like 3D and Copy, not CUDA compute). Use nvidia-smi -l 1 for real-time monitoring instead.
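If you want just the relevant numbers, nvidia-smi’s query mode prints per-GPU utilization and VRAM use in a compact CSV, refreshed every second:

```
# Per-GPU utilization and memory, one row per GPU, updated every second
nvidia-smi --query-gpu=index,name,utilization.gpu,memory.used,memory.total --format=csv -l 1
```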
For testing different configurations, llama-bench is perfect:
```
# Test single GPU
set CUDA_VISIBLE_DEVICES=0
llama-bench -m model.gguf -ngl 99

# Test multi-GPU
set CUDA_VISIBLE_DEVICES=0,1
llama-bench -m model.gguf -ngl 99 -sm row --tensor-split 0.5,0.5
```
Trade-offs
These BIOS changes do have downsides. Disabling Secure Boot reduces protection against boot-level malware. Disabling ASPM increases power consumption slightly. CSM adds some boot time overhead.
But the performance you get from a proper eGPU setup makes it worth it. The M.2 to OCuLink connection provides solid bandwidth (PCIe 3.0 x4 works out to roughly 4 GB/s each way), and you can add serious GPU power to systems that couldn’t otherwise fit large graphics cards.
This isn’t plug-and-play. You need to be comfortable with BIOS settings, troubleshooting boot issues, and accepting some security trade-offs. But if you put in the time to get it right, you end up with something that performs way better than you’d expect from what looks like a janky adapter setup.