Yep, AI inference workloads, but probably slower for AI training (haven't looked into it myself). According to eBay (all prices in CAD):
- A used RTX 8000 48GB (TU102 so 2nd gen card) is $3K CAD
- modded RTX 4090D 48GB is $4.7K CAD (same kind of card as in the GN video)
- Nvidia L20 48GB is $6.2K
- Tesla A40 48GB is $10K
- L40S 48GB is $12K
- Tesla V100 32GB is $1.1K (Nvidia stopped supporting it just recently; Volta is gen 1.5)
- Tesla M10 32GB is $200 (this is Maxwell so GTX 900 era)
That's why I was kicking myself for not spotting another user here selling their RTX 3090 for $800. The usual price in Vancouver is $900 to $1,200. And you get 24GB, high bandwidth, and support for SageAttention to speed up Stable Diffusion and WAN video generation.
From my understanding, the memory capacity is really there to load the bigger models, while the memory bandwidth is what drives inference speed in tokens per second. So it could be slower than an RTX 3090 due to lower bandwidth, but it won't be capped when you try to load a really big model. Once you run out of VRAM, it has to swap memory between the GPU and system RAM, which slows things down a lot. Bigger models tend to give you better quality answers, so having a card with 2x24GB helps, but since data has to go through the PCI-E bus (literally 2 GPUs on 1 board), it may not be fast. At the end of the day, we need to see benchmarks. From my experience with Stable Diffusion, this card won't help there because none of the apps support multiple GPUs in a way that parallelizes the workload. You may be able to generate multiple images at once, but this card won't speed up generating a single image.
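If it helps, here's a rough back-of-envelope for why bandwidth sets the ceiling on tokens per second. The bandwidth and model-size numbers are just my assumptions for illustration, not official specs:

```python
# Rule of thumb: during token-by-token generation the GPU re-reads
# essentially all of the model weights for every token, so tokens/sec is
# roughly capped at (memory bandwidth) / (model size in bytes).

def tokens_per_sec_ceiling(model_size_gb: float, bandwidth_gb_per_s: float) -> float:
    """Upper bound on decode speed when generation is memory-bandwidth-bound."""
    return bandwidth_gb_per_s / model_size_gb

# Example with a ~20GB quantized model; bandwidth figures are ballpark guesses.
for card, bw in [("RTX 3090 (~936 GB/s)", 936),
                 ("one GPU on a dual-GPU board (~450 GB/s, assumed)", 450)]:
    print(f"{card}: ceiling ~{tokens_per_sec_ceiling(20, bw):.0f} tok/s")
```

It ignores the KV cache and any cross-GPU traffic, so real numbers will be lower, but it shows why a higher-bandwidth 24GB card can out-run a slower 48GB one as long as the model fits.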
Not that a regular B770 won't work for local AI. You can still install Ollama or Stable Diffusion, but you have to be very careful about what kind of model you load. For Stable Diffusion, you end up picking a quantized version of the original model that fits in VRAM but either gives you a worse picture or is close enough in quality but slower. The full-fat FLUX model and the WAN video models are around 20GB, and that's not counting other stuff like the text encoders you need so it can translate prompts into image generation. Oh, also, all the speed-up tricks like SageAttention require CUDA. Not sure if you can get away with ZLUDA (a CUDA compatibility layer for AMD GPUs), but Intel for sure doesn't have anything similar.
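A quick sketch of the math involved when picking a quantization that fits; the parameter counts and overhead below are my ballpark assumptions, not exact figures:

```python
# Rough VRAM check: weights take (parameter count) x (bytes per parameter),
# plus the text encoder(s) and some scratch space. Numbers are approximate.

BYTES_PER_PARAM = {"fp16/bf16": 2.0, "fp8": 1.0, "int4/nf4": 0.5}

def weights_gb(params_billion: float, fmt: str) -> float:
    return params_billion * 1e9 * BYTES_PER_PARAM[fmt] / 1024**3

flux_params_b = 12.0   # FLUX.1 transformer, roughly 12B parameters
t5_params_b = 4.7      # T5-XXL text encoder loaded alongside it
scratch_gb = 2.0       # assumed overhead for activations/VAE

for fmt in BYTES_PER_PARAM:
    total = weights_gb(flux_params_b, fmt) + weights_gb(t5_params_b, fmt) + scratch_gb
    verdict = "fits" if total < 16 else "doesn't fit"
    print(f"{fmt}: ~{total:.1f} GB -> {verdict} on a 16GB card")
```

At fp16 that lands above 30GB, which is why you're stuck choosing between fp8/4-bit quants or offloading pieces to system RAM on a 16GB card.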
The 5070 Ti Super looks very promising because it is rumored to have 24GB of memory and it runs the latest Nvidia tech. Right now the 5070 Ti is $1.2K CAD, so if the Super is only slightly more expensive, then the Intel B60 is moot.