Question 1

Which is better for AI inference: A40 or GH200?

Accepted Answer

GH200 has 48 GB more VRAM, making it better suited for large models and long context windows. For compute-bound workloads like training, GH200 delivers 6.6× higher FP16 throughput. At $0.44/hr vs $2.29/hr, A40 is the more cost-efficient choice for inference.

Question 2

How much VRAM does the A40 have compared to the GH200?

Accepted Answer

The A40 has 48 GB of VRAM (GDDR6), while the GH200 has 96 GB (HBM3).

Question 3

Which GPU is cheaper to rent in the cloud, the A40 or GH200?

Accepted Answer

The cheapest on-demand price for the A40 is $0.44/hr, while the GH200 starts at $2.29/hr. A40 is the more affordable option.

	A40	GH200
VRAM	48 GB	96 GB
VRAM Type	GDDR6	HBM3
Memory Bandwidth	0.7 TB/s	4.0 TB/s
FP16 Performance	150 TFLOPS	990 TFLOPS
Manufacturer	NVIDIA	NVIDIA
FP8 Support	No	Yes
FP4 Support	No	No

	A40	GH200
$/hr (cheapest)	$0.44✓ best	$2.29
$/TFLOP (compute value)	$0.0029	$0.0023✓ best
$/GB VRAM (memory value)	$0.0092✓ best	$0.0239

A40 vs GH200

Verdict

Specifications

Price / Performance

Cloud Pricing

A40

GH200

Model Compatibility

A40 (1000 models)

GH200 (1000 models)

You might also compare…