It isn't for "running models." Inference workloads like that are faster on a mac studio, if that's the goal. Apple has faster memory.
These devices are for AI R&D. If you need to build models or fine tune them locally they're great.
That said, I run GPT-OSS 120B on mine and it's 'fine'. I spend some time waiting on it, but the fact that I can run such a large model locally at a "reasonable" speed is still kind of impressive to me.
It's REALLY fast for diffusion as well. If you're into image/video generation it's kind of awesome. All that compute really shines for workloads that aren't memory-bandwidth bound.
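If you want to poke at a similar setup, here's a minimal sketch of talking to a local model, assuming it's served behind an OpenAI-compatible endpoint (llama.cpp's server and vLLM both expose one); the port and model name below are assumptions, use whatever your server reports:

    # Sketch only: assumes a local OpenAI-compatible server (e.g. llama.cpp
    # server or vLLM). The port and model name are placeholders, not a given.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
    resp = client.chat.completions.create(
        model="gpt-oss-120b",
        messages=[{"role": "user", "content": "Give me one sentence on local inference."}],
    )
    print(resp.choices[0].message.content)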
With performance around a 5070 Ti, it's a weird choice for R&D as well. You won't be able to train models that require anywhere near 100GB of VRAM because the compute is too slow, and a 5070 Ti is under $1k.
Yeah, that's mostly fair, but it kind of misses the point. This is a professional tool for AI R&D. Not something that strives to be the cheapest possible option for the homelab. It's fine to use them in the lab, but that's not who they built it for.
If I wanted to, I could go on eBay, buy a bunch of parts, build my own system, install my own OS, compile a bunch of junk, tinker with config files for days, and then fire up an extra generator to cope with the 2-4x higher power requirements. For all that work I might save a couple of grand and end up able to do less with it. Or... I could just buy a GB10 device and turn it on.
It comes preconfigured to run headless and use the NVIDIA ecosystem. Mine has literally never had a monitor attached to it. NVIDIA has guides and playbooks, preconfigured Docker containers, and documentation to get me up and developing in minutes to hours instead of days or weeks. If it breaks I just factory reset it. On top of that it has the added benefit of 200GbE QSFP networking that would cost $1,500 on its own. If I decide I need more oomph and want a cluster, I just buy another one, connect them, and copy/paste the instructions from NVIDIA.
Not really, no it isn't, because it's deliberately gimped and doesn't support the same feature set as the datacenter GPUs[1]. So as a professional development box to e.g. write CUDA kernels before you burn valuable B200 time, it's completely useless. You're much better off getting an RTX 6000 or two, which is also gimped, but at least is much faster.
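As a rough sanity check (assuming a CUDA-enabled PyTorch install, which is just one way to do it), you can at least dump what the part actually reports before assuming any feature parity with a B200:

    # Sanity-check sketch: print what the local GPU reports. Assumes a
    # CUDA-enabled PyTorch build. This shows the compute capability (e.g.
    # sm_121 on these boxes), not which tensor-core features are actually usable.
    import torch

    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        major, minor = torch.cuda.get_device_capability(0)
        print(f"{props.name}: sm_{major}{minor}, "
              f"{props.total_memory / 1e9:.0f} GB, "
              f"{props.multi_processor_count} SMs")
    else:
        print("No CUDA device visible")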
Fair enough if that's your use case. I have to be honest with you though, I've never written CUDA code in my life and wouldn't know sm_121 from LMNOPO. :)
It does seem really shady that they'd claim it has 5th-gen Tensor Cores and then not support the full feature set. I searched through the Spark forums, and as that poster said, nobody is answering the question.
But how much is MY time worth? Every hour I spend fixing some goof-up Jimmy in I.T. made or Googling obscure incompatibilities is another hour I could have been productive.
I guarantee you the $5-a-month option is easier than what you're setting up on a DGX Spark. Which should make sense, because you can buy server hardware more cheaply in the long run.
Sorry, I misunderstood you. My comment was comparing the cost of DIY hardware with buying a GB10-based system. I thought you meant I should pay someone $5/hr to build and manage the hardware for me (presumably by outsourcing).
The cloud is fine, if you're OK with your workload being on someone else's computer. I'm not. Plus, at my usage levels, the cloud would put me in the red within two months compared to buying.
Your point about server hardware might make sense for some folks. I haven't actually looked, because in my case I'm using it as a dev system that sits on my desk. Part of its appeal is that it's small and quiet.
I COULD opt to just buy the hardware and get a real server, though, since I run my Spark headless. I just assumed the cost of colo'ing it someplace would rule that out, but I haven't actually done the math. Have you?
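The back-of-the-envelope version is simple enough, though; every number in this sketch is a placeholder assumption to swap for real quotes:

    # Breakeven sketch: all figures are placeholder assumptions, not real
    # quotes. Swap in your own hardware, power, colo, and cloud prices.
    device_cost = 4000.0      # GB10 box, USD (assumed)
    power_monthly = 15.0      # desk-side electricity (assumed)
    colo_monthly = 100.0      # colo fee for a small server (assumed)
    cloud_monthly = 600.0     # equivalent GPU rental at my usage (assumed)

    # Months until owning beats renting, on the desk vs. in a colo.
    breakeven_desk = device_cost / (cloud_monthly - power_monthly)
    breakeven_colo = device_cost / (cloud_monthly - power_monthly - colo_monthly)

    print(f"Breakeven on the desk: {breakeven_desk:.1f} months")
    print(f"Breakeven in a colo:   {breakeven_colo:.1f} months")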