> Some will do rocm but I don’t know if the AMD iGPUs are guaranteed to support it?

If you only care about inference, llama.cpp supports Vulkan on any iGPU with Vulkan drivers. On my laptop, whose crap BIOS doesn't allow changing any video RAM settings, the reserved "vram" is 2GB, but llama.cpp-vulkan can access 16GB of "vram" (half of physical RAM). 16GB of "vram" is sufficient to run any model that has even remotely practical execution speed on my bottom-of-the-line Ryzen 3 3250U (Picasso/Raven 2), and you can always offload some layers to the CPU to run even larger models.
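
As a rough sketch of the offloading (the model paths are just examples, and ./main is the binary the Makefile build below produces), -ngl / --n-gpu-layers sets how many layers go to the GPU, with the rest running on the CPU:

  # small model: offload everything to the iGPU
  ./main -m models/llama-7b.Q4_K_M.gguf -ngl 99 -p "Hello"

  # model too big for 16GB of "vram": offload only some layers
  ./main -m models/llama-70b.Q4_K_M.gguf -ngl 20 -p "Hello"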

Vulkan support (on Debian stable):

  apt install libvulkan1 mesa-vulkan-drivers vulkan-tools
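To check that the driver actually sees the iGPU, something like the following should list the AMD device (via Mesa's RADV driver):

  vulkaninfo --summary | grep -iA1 deviceName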
Build deps for llama.cpp:

  apt install libshaderc-dev glslang-dev libvulkan-dev
Build llama.cpp with the Vulkan back-end:

  make clean  # in case you previously built with a different back-end

  make LLAMA_VULKAN=1
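(If you're on a newer checkout where the Makefile build has been removed, the CMake route should be equivalent, as far as I know; the option was later renamed from LLAMA_VULKAN to GGML_VULKAN.)

  cmake -B build -DGGML_VULKAN=ON
  cmake --build build --config Release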
If you have more than one GPU: when running, you have to set GGML_VK_VISIBLE_DEVICES to the indices of the devices you want, e.g.,

  export GGML_VK_VISIBLE_DEVICES=0,1,2
The indices correspond to the device order in

  vulkaninfo --summary
By default llama.cpp will only use the first device it finds.
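So a full invocation pinned to one device looks something like this (the device index and model path are just examples):

  GGML_VK_VISIBLE_DEVICES=1 ./main -m models/llama-7b.Q4_K_M.gguf -ngl 99 -p "Hello"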

llama.cpp-vulkan has worked really well for me. But per benchmarks from back when Vulkan support was first released, the CUDA back-end was faster than the Vulkan back-end on NVIDIA GPUs, and the same is probably true for ROCm vs. Vulkan on AMD. On the other hand, Vulkan requires zero non-free / binary blobs and supports more devices (e.g., my iGPU is not supported by ROCm). I haven't tried it, but you can probably even mix GPUs from different manufacturers using Vulkan.


