
Which GPU to Use for Deep Learning

Selecting a GPU is much more complicated than selecting a computer. Since we are going to use CUDA for deep learning, only NVIDIA GPUs will be considered. You have to know a lot of jargon, such as PCI-E versions (2.0, 3.0), architecture (Fermi (GF110), Kepler (GK104, GK110), Maxwell, and the coming Pascal), and product line (Tesla, GeForce, Quadro… this page gives a clear explanation of the different targets of those cards: GeForce for entertainment, Quadro for professional graphics, and Tesla for high-performance computing in workstations), and you may also need to replace the PSU (power supply unit) if it cannot power the new card.
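If you already have an NVIDIA card and are not sure which architecture family it belongs to, a quick way to check is to query its compute capability with the CUDA runtime. The sketch below is a minimal example of my own (the file name is arbitrary, and it assumes the CUDA toolkit and nvcc are installed): compute capability 2.x corresponds to Fermi, 3.x to Kepler, and 5.x to Maxwell.

```c
// Minimal sketch: list the installed NVIDIA GPUs and their compute capability.
// Compile with: nvcc query_gpu.cu -o query_gpu   (assumes the CUDA toolkit is installed)
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // Compute capability 2.x = Fermi, 3.x = Kepler, 5.x = Maxwell
        printf("Device %d: %s, compute capability %d.%d, %.0f MB global memory\n",
               i, prop.name, prop.major, prop.minor,
               prop.totalGlobalMem / (1024.0 * 1024.0));
    }
    return 0;
}
```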

For GPU comparisons for deep learning, you may get some useful information from this post. Alex claimed that the GTX 580 is better than the GTX 680, even though it has fewer CUDA cores (see the explanation on Stack Overflow referred to in the post).

The reason may lie in the answer ZygmuntZ gave to my question:

“What I know is that 600 and 700 series are tuned more for gaming than scientific computation, so personally I’d prefer 580 to 680. 580 were the best back in the day and now that they got cheaper they are probably the best value for money.”

This is confirmed by another post (which I think is a must-read for choosing a GPU), which states:

“Another important factor to consider however, is that the Fermi architecture (400 and 500 series) is quite a bit faster than the newer Kepler architecture (600 and 700 series)”

The above post further illustrates the importance of memory bandwidth and of estimating how much GPU memory you will need, so the author recommends ranking GPUs by their memory bandwidth (GB/s).
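To rank cards by GB/s without hunting through spec sheets, the theoretical peak bandwidth can be derived from the memory clock and bus width that the CUDA runtime reports. The sketch below is my own illustration, not from the post; it uses NVIDIA's usual formula (memory clock × 2 for double data rate × bus width in bytes) and again assumes the CUDA toolkit is installed.

```c
// Minimal sketch: estimate a card's theoretical peak memory bandwidth (GB/s)
// from its reported memory clock and bus width, so GPUs can be ranked by GB/s.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);  // device 0
    // memoryClockRate is in kHz; the factor 2 accounts for double-data-rate memory
    double bandwidth_gbs = 2.0 * prop.memoryClockRate * 1e3   // transfers per second
                         * (prop.memoryBusWidth / 8.0)        // bytes per transfer
                         / 1e9;                               // bytes -> GB
    printf("%s: ~%.0f GB/s theoretical peak memory bandwidth\n",
           prop.name, bandwidth_gbs);
    return 0;
}
```

For a GTX 780 Ti (384-bit bus, 3500 MHz memory clock) this works out to about 336 GB/s, matching the advertised figure.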

However, Alex’s CudaConvnet2 package has been optimized for Kepler cards, whereas the previous version was better suited to Fermi cards, so I am not sure whether this statement still holds.

But Eric Battenberg suggests the GTX 780, which is the cheapest GK110-based GPU, in this post.

“780 is 40% faster for us than 580. ~1 T flops with torch 7 convolutions. 2 T flops linear layers “

“The performance of 680s on SGEMM is slightly better than a 580. 780s are significantly better: even though their double precision performance is crippled, GK110 based chips (Titan, GTX780, GTX780Ti, Tesla K20, Tesla K40, Quadro K6000, etc.) have access to 255 registers per thread (unlike the GK104 based GTX680, GTX770, Tesla K10 that only access 63 registers per thread.) Even though the register file is the same size per core between GK104 and GK110, workloads like SGEMM that really need more state per thread do best on GK110 based chips – so I recommend you get a GK110 based card. GK110 also has more cores & more memory channels than GK104.

The cheapest GK110 based card would be the GTX780, so that’s what I’d recommend. But even if you end up with a GK104, it’d be fine for neural network stuff (even though it’s true for some workloads a 680 is slower than a 580).”
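The SGEMM numbers quoted above are easy to sanity-check on your own card. The sketch below is my own rough benchmark, not from Eric's post: it times one large single-precision matrix multiply with cuBLAS and reports the sustained GFLOPS. The matrix size N = 4096 is an arbitrary choice, and the code assumes the CUDA toolkit and cuBLAS are installed.

```c
// Minimal sketch: time a large single-precision matrix multiply with cuBLAS
// and report the sustained SGEMM GFLOPS.
// Compile with: nvcc sgemm_bench.cu -lcublas -o sgemm_bench
#include <cstdio>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    const int N = 4096;
    const size_t bytes = (size_t)N * N * sizeof(float);
    const float alpha = 1.0f, beta = 0.0f;

    float *A, *B, *C;
    cudaMalloc(&A, bytes);
    cudaMalloc(&B, bytes);
    cudaMalloc(&C, bytes);
    // Zero-fill the inputs; only throughput matters here, not the result values
    cudaMemset(A, 0, bytes);
    cudaMemset(B, 0, bytes);

    cublasHandle_t handle;
    cublasCreate(&handle);

    // Warm-up call so the timed run does not include one-time setup costs
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, N, N, N, &alpha, A, N, B, N, &beta, C, N);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, N, N, N, &alpha, A, N, B, N, &beta, C, N);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    double gflops = 2.0 * N * (double)N * N / (ms * 1e-3) / 1e9;  // 2*N^3 flops per GEMM
    printf("SGEMM %dx%d: %.1f ms, ~%.0f GFLOPS\n", N, N, ms, gflops);

    cublasDestroy(handle);
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```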

The GTX Titan series has much better double-precision floating-point performance than the GTX 780/780 Ti, but Eric Battenberg’s post says that training deep learning models does not require double precision.

The GTX 780 Ti is much more powerful than the GTX 780 (see the specs in the wiki table) while being only a little more expensive, so the GTX 780 Ti sounds like the best choice with respect to performance/price ratio.

Therefore, if you have a limited budget, the GTX 580 might be the best choice. If budget is not a concern, you may go for the GTX Titan Z (about $3,000), which is the most powerful card, or a Tesla K40 ($4,000+) for workstations. If you are somewhere in between, the GTX 780 Ti might be a good choice.

FYI, NVIDIA has a card donation program for researchers that provides high-end cards, including the Tesla K40 and Titan Black. You need to write a research statement for your application, and applicants are notified roughly every two weeks.
