As the leading GPU manufacturer for AI and machine learning, NVIDIA has recently unveiled the DGX Spark, a compact AI supercomputer designed specifically for researchers. Review units have gone out to several prominent tech reviewers, and NVIDIA’s CEO Jensen Huang personally delivered units to Elon Musk and OpenAI’s leadership team.
This device has a price tag of $4,000. Customers can buy it directly from NVIDIA or through major retailers like Amazon and Best Buy.
The DGX Spark is a small computer, roughly the size of a Mac Mini, with a refreshingly different textured finish on the front of the device. Here are the specifications of the DGX Spark:
Architecture and Performance #
Hardware Specifications #
- Architecture: Arm64
- CPU: 20 cores
  - 10x Cortex-X925 (performance cores)
  - 10x Cortex-A725 (efficiency cores)
- RAM: 128GB LPDDR5x
- Storage: 4TB
GPU Specifications #
- Model: NVIDIA GB10 (Blackwell architecture)
- Compute Capability: sm_121 (12.1)
- Memory: 128 GB (unified, shared with the CPU)
- Multi-processor Count: 48 streaming multiprocessors
So basically the DGX Spark has 128GB of unified memory, which can be used by both the CPU and the GPU, and it’s designed for both training and running models. The GB10 GPU delivers up to 1 PFLOP of sparse FP4 tensor performance, which places its AI capability roughly between an RTX 5070 and a 5070 Ti.
The unified memory offers up to 273 GB/s of memory bandwidth. That limited bandwidth is a bottleneck for AI inference performance, but the 128GB capacity enables running models that are far too large to fit into typical GPU memory.
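To see why bandwidth is the bottleneck: generating each token requires streaming essentially all of the model’s weights from memory, so bandwidth divided by weight size gives a rough ceiling on tokens per second. A back-of-the-envelope sketch (single-stream decode assumed; model sizes are illustrative):

```python
# Rough decode-speed ceiling from memory bandwidth: each generated token
# streams (approximately) all model weights, so tokens/sec <= bandwidth / size.
BANDWIDTH_GB_S = 273  # DGX Spark unified memory bandwidth

for label, params_b, bytes_per_param in [
    ("8B  @ FP16", 8, 2.0),
    ("70B @ FP8 ", 70, 1.0),
    ("70B @ FP4 ", 70, 0.5),
]:
    weights_gb = params_b * bytes_per_param
    print(f"{label}: ~{BANDWIDTH_GB_S / weights_gb:.1f} tokens/s upper bound")
```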
Connectivity #
The DGX Spark has a variety of connectivity options: four USB-C ports (the leftmost supporting up to 240W of power delivery), an HDMI port, a 10 GbE RJ-45 Ethernet port, and two QSFP ports driven by an NVIDIA ConnectX-7 NIC capable of up to 200 Gbps. The QSFP ports can be used to connect multiple DGX Spark units together for distributed training.
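In software, that pairing looks like an ordinary multi-node PyTorch setup. A minimal sketch, assuming two Sparks linked over the QSFP ports; the address, port, and rank values are illustrative:

```python
# Hypothetical two-node NCCL process group spanning two DGX Sparks.
# Run one process per machine (e.g. via torchrun); rank 1 on the second node.
import torch
import torch.distributed as dist

dist.init_process_group(
    backend="nccl",                           # GPU-to-GPU collectives
    init_method="tcp://192.168.100.1:29500",  # rank-0 Spark's address (assumed)
    world_size=2,
    rank=0,
)

t = torch.ones(1, device="cuda")
dist.all_reduce(t)  # sums across both nodes -> tensor([2.]) on each
print(t)
```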
Four things the DGX Spark does better #
1. Running more stuff locally #
Let’s say you have a desktop beast with two high-end GPUs like the RTX 4090, each with 24GB of VRAM, for a total of 48GB. Now, if you’re working with a massive model that requires, say, 60GB of memory, you’re out of luck because your GPUs can’t hold it. But with the DGX Spark’s 128GB of unified memory, it can accommodate models that need up to nearly 128GB of memory.
While a desktop with two RTX 4090s might be faster for inference on smaller models, the DGX Spark shines when it comes to handling multiple large models simultaneously. For instance, if you’re running several instances of a 70B parameter model, the DGX Spark can manage this with ease, whereas a typical desktop setup would struggle.
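A quick way to reason about what fits where is to compare estimated weight footprints, plus some headroom for KV cache and activations, against each machine’s memory. A sketch with assumed model sizes and a rough 1.2x overhead factor:

```python
# Rough "will it fit?" check. The model list and the 1.2x overhead factor
# (KV cache, activations) are illustrative assumptions.
def footprint_gb(params_billion, bytes_per_param, overhead=1.2):
    return params_billion * bytes_per_param * overhead

models = [("70B @ FP16", 70, 2.0),
          ("70B @ FP4", 70, 0.5),
          ("2x 70B @ FP4", 140, 0.5)]
rigs = [("2x RTX 4090", 48), ("DGX Spark", 128)]

for name, p, b in models:
    need = footprint_gb(p, b)
    for rig, mem in rigs:
        verdict = "fits" if need <= mem else "does not fit"
        print(f"{name} (~{need:.0f}GB) {verdict} on {rig} ({mem}GB)")
```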
2. Image generation #
The DGX Spark is particularly well-suited for image generation tasks. Open source image generation tools like ComfyUI run models (such as Stable Diffusion) that require substantial memory to store intermediate data during the generation process. The DGX Spark’s 128GB of unified memory allows it to handle these tasks more comfortably than a standard desktop setup with limited GPU memory.
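As a concrete illustration, here is a minimal image generation run with Hugging Face diffusers, the kind of pipeline a ComfyUI workflow wraps; the SDXL checkpoint and prompt are my own assumed choices, not anything tied to the Spark:

```python
# Minimal sketch: SDXL image generation with diffusers.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,  # ~7GB of weights; unified memory leaves
)                               # plenty of room for larger models or batches
pipe.to("cuda")

image = pipe("a macro photo of a dew-covered circuit board").images[0]
image.save("out.png")
```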
3. Training #
Training an LLM means giving it the ability to think the way you want it to think: feeding it your data and tailoring it to your specific use case. Training a model requires a lot of VRAM, and the DGX Spark’s unified memory architecture allows it to handle larger datasets and more complex models during the training process.
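For the fine-tuning case, here is a minimal LoRA sketch with transformers and peft; the base model, target modules, and hyperparameters are illustrative assumptions:

```python
# Hypothetical LoRA fine-tuning setup: only small adapter matrices train,
# so optimizer state stays tiny while the frozen weights sit in unified memory.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "meta-llama/Llama-3.1-8B"  # assumed base model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # attention projections only
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # a small fraction of the 8B weights
# ...from here, a standard Trainer / training loop over your dataset.
```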
4. FP4 support #
You can quantize AI models to make them smaller so they are easier to run on smaller devices. If you run a model at FP16 precision, you need a lot of memory, but you get the best quality possible. Quantizing the model to FP8 or FP4 makes it smaller and faster, but you lose some quality.
The DGX Spark is optimized for running models at FP4 precision very efficiently. NVIDIA claims that FP4 models on this device come very close to the quality of FP8 models, and it even provides a tutorial on quantizing the “DeepSeek-R1-Distill-Llama-8B” model to FP4.
This makes the DGX Spark a great device for “speculative decoding”, a technique that speeds up text generation by using a small, fast model to draft several tokens ahead, then having the larger model quickly verify or adjust them. This way, the big model doesn’t need to predict every token step by step, reducing latency while keeping output quality. (Source: Speculative Decoding.) To do that you essentially run two models at the same time, one small and one large, and the DGX Spark’s 128GB of unified memory allows it to hold both simultaneously without running out of memory.
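Hugging Face transformers exposes this as “assisted generation”: pass the small model as assistant_model and the large model verifies its drafts. A minimal sketch (both model choices are assumptions; draft and target must share a tokenizer):

```python
# Speculative decoding via transformers' assisted generation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

target_id = "meta-llama/Llama-3.1-8B-Instruct"  # large verifier (assumed)
draft_id = "meta-llama/Llama-3.2-1B-Instruct"   # small drafter (assumed)

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(
    target_id, torch_dtype=torch.bfloat16, device_map="auto")
draft = AutoModelForCausalLM.from_pretrained(
    draft_id, torch_dtype=torch.bfloat16, device_map="auto")

inputs = tokenizer("Explain unified memory in two sentences.",
                   return_tensors="pt").to(target.device)

# The draft proposes several tokens; the target checks them in one pass.
out = target.generate(**inputs, assistant_model=draft, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```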
NVIDIA is going to introduce cheaper variants of DGX Spark with OEM partners in the future.
This mini AI supercomputer is not a replacement for high-end desktops with multiple GPUs, but it is direct competition for the new generation of mini PCs like the Apple Mac Studio and the Beelink GTR9 Pro, which has a new AMD AI chip. The Beelink device also has 128GB of unified memory, but it doesn’t have the powerful Blackwell GPU that the DGX Spark has. The Beelink GTR9 Pro costs around $2,000, but as everyone knows, NVIDIA is way ahead of the game in AI and machine learning. NVIDIA’s strong software ecosystem and developer support make the DGX Spark a more attractive option for researchers and developers working on AI projects.
The NVIDIA DGX Spark is the option you want if you want things to just work and don’t want to spend time on setup and troubleshooting. NVIDIA has put a lot of work into making sure that everything works out of the box.
Who is this for? #
Personally, this doesn’t feel like a supercomputer to me; it’s more like a mini supercomputer. With a price tag of $4,000, I would need more inference speed. If you are a consumer looking for a device to run AI models locally, I would recommend waiting for the next generation of GPUs like the RTX 60 series or AMD’s next-gen AI chips. But if you are a researcher or developer who needs to run large models and do training or fine-tuning, the DGX Spark is a great option.