
Three weeks to go: best practice tips for your PyTorch App

There are three weeks remaining in the PyTorch Virtual Summer Hack. Submissions are due by 2pm PST on September 16, 2019. (Find out what time that is in your city)

Start your submission now

Starting a submission is a good way to make sure that your project meets the requirements. There’s nothing worse than getting to the deadline, only to realize that you’re missing a key component. Remember: once you submit a project, you can keep editing it as needed at any time prior to the submission deadline. Need a primer on how the submission form works? Head over to our submission form tutorial.

Prepare your video

Don’t forget: your submission must include a video demonstrating your project. Too often, participants leave this video to the last minute without realizing that it’s an important way to showcase your work for the judges. If you need some help getting started, check out our video-making tips!

Bonus Tech Tip: Multiprocessing Best Practices for Your PyTorch Application

At this point, you may be well on your way with your project; even so, it is always good to have some best-practice tips to strengthen your submission over the last few weeks!

torch.multiprocessing is a drop-in replacement for Python’s multiprocessing module. It supports the exact same operations but extends them, so that all tensors sent through a multiprocessing.Queue have their data moved into shared memory, and only a handle is sent to the other process.

Avoiding and fighting deadlocks

There are many things that can go wrong when a new process is spawned, and the most common cause of deadlocks is background threads. If any thread holds a lock or has imported a module when fork is called, it is very likely that the subprocess will be in a corrupted state and will deadlock or fail in some other way. Note that Queue is actually a fairly complex class that spawns multiple threads to serialize, send, and receive objects, and those threads can cause the aforementioned problems too. If you find yourself in such a situation, try using a multiprocessing.queues.SimpleQueue, which doesn’t use any additional threads.

Reuse buffers passed through a Queue

Remember that each time you put a Tensor into a multiprocessing.Queue, it has to be moved into shared memory. If it’s already shared, this is a no-op; otherwise it incurs an additional memory copy that can slow down the whole process. Even if you have a pool of processes sending data to a single one, make the receiver send the buffers back; this is nearly free and lets you avoid a copy when sending the next batch.

Asynchronous multiprocess training (e.g. Hogwild)

Using torch.multiprocessing, it is possible to train a model asynchronously, with parameters either shared all the time or periodically synchronized. In the first case, we recommend sending over the whole model object; in the latter, we advise sending only the state_dict().

We recommend using multiprocessing.Queue for passing all kinds of PyTorch objects between processes. It is possible, for example, to inherit tensors and storages that are already in shared memory when using the fork start method; however, this is very bug-prone and should be used with care, and only by advanced users. Queues, even though they’re sometimes a less elegant solution, will work properly in all cases.

To read the full article on multiprocessing best practices, click here. We can’t wait to see your project!

Questions?

If you have any questions about the hackathon, please post on the discussion forum.