Finding the ideal num_workers for PyTorch DataLoaders

One of the biggest bottlenecks in deep learning is loading data. Having fast drives and quick access to the data is important, especially if you are trying to keep a GPU (or multiple processors) saturated. PyTorch provides DataLoaders, which manage the task of getting data into your model. They can be fantastic to use, especially for large datasets, as they are very powerful and handle things such as shuffling, batching, and memory management. PyTorch's DataLoaders also work in parallel: the num_workers parameter specifies how many worker processes load your data. Figuring out the right num_workers can be difficult. One rule of thumb is to use the number of CPU cores you have available, and in many cases that works well. Sometimes it's half that number, or a quarter of it; it depends on factors such as what else the machine is doing and the type of data you are working with. The nice thing about DataLoaders is that they can be loading the next batches while your GPU is processing the current one. This is one reason why loading data into CPU memory is not a bad idea: it saves valuable GPU memory and lets your computer make use of the CPU and GPU simultaneously.
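For reference, here is a minimal sketch of what that looks like in practice. The dataset, batch size, and worker count below are just illustrative placeholders, not a recommendation:

import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Any Dataset works here; CIFAR-10 is used because it appears later in this post.
train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=transforms.ToTensor())

train_loader = DataLoader(
    train_set,
    batch_size=64,       # placeholder batch size
    shuffle=True,        # the DataLoader handles shuffling for you
    num_workers=4,       # number of worker processes loading batches in parallel
    pin_memory=True,     # speeds up host-to-GPU copies when training on a GPU
)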

The best way to tackle this is to run a basic test. One thing I can tell you for sure is that it is painfully slow to leave num_workers at its default (which is 0). You should absolutely set it to something higher. With num_workers at 0 or 1, loading a batch can take, say, 1-2 minutes; having it set correctly can get this down to a few seconds! When you are running many iterations over your model this really adds up, and it can be the primary way you speed up your training.

Here is code I used to benchmark and find my ideal num_workers:

 
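The gist is to time how long it takes to iterate over a couple of epochs' worth of batches for each candidate num_workers value. A minimal sketch looks something like the following; the dataset, batch size, and range of worker counts are placeholders to adjust for your own setup:

import time
import multiprocessing

from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# CIFAR-10 is used as an example dataset; swap in your own Dataset here.
train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=transforms.ToTensor())

# Try worker counts from 0 up to twice the number of CPU cores, in steps of 2.
for num_workers in range(0, multiprocessing.cpu_count() * 2 + 1, 2):
    loader = DataLoader(train_set, batch_size=64, shuffle=True,
                        num_workers=num_workers, pin_memory=True)
    start = time.time()
    for epoch in range(2):                  # a couple of passes is enough for a rough timing
        for data, target in loader:
            pass                            # just load the data; no model involved
    print(f"num_workers={num_workers:2d}  time={time.time() - start:.2f}s")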

Here is an example of the output on CIFAR-10 data:

Obviously there are a lot of factors that contribute to the speed at which you load data, and this is just one of them. But it is an important one. When you have multiple GPUs, it is very important to feed them as fast as they can handle, and people often fall short of this. You can see from the above output that using at least num_workers=4 is highly beneficial. I have had datasets where setting this parameter much higher was required for a drastic improvement. It's always good to check!

