Deep Learning

PyTorch V1.0.1 Updates / Bug Fix Release

February 7, 2019
16 min read

PyTorch is a widely used, open-source deep learning platform for easily writing neural network layers in Python, enabling a seamless workflow from research to production.

Here are the latest updates and bug fixes in this release.

Serious

  • Higher order gradients for CPU Convolutions have been fixed (regressed in 1.0.0 under MKL-DNN setting) #15686
  • Correct gradients for non-contiguous weights in CPU Convolutions #16301
  • Fix ReLU on CPU Integer Tensors by fixing vec256 inversions #15634
  • Fix bincount for non-contiguous Tensors #15109
  • Fix torch.norm on CPU for large Tensors #15602
  • Fix eq_ to do equality on GPU (was doing greater-equal due to a typo) (#15475)
  • Workaround a CuDNN bug that gave wrong results in certain strided convolution gradient setups
    • blacklist fft algorithms for strided dgrad (#16626)
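The eq_ fix above is a classic comparison-operator typo; its effect can be illustrated with a plain-Python sketch (not PyTorch code) of elementwise equality versus greater-equal:

```python
# Elementwise comparison sketch (plain Python, not PyTorch internals):
# the typo made eq_ behave like greater-equal instead of equality.
a = [3, 2, 1]
b = [1, 2, 3]

eq = [x == y for x, y in zip(a, b)]  # intended behavior
ge = [x >= y for x, y in zip(a, b)]  # what the typo produced

print(eq)  # [False, True, False]
print(ge)  # [True, True, False]
```

Wherever values differ, the two results diverge, which is why the typo produced silently wrong comparison results on GPU.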

Correctness

  • Fix cuda native loss_ctc for varying input length (#15798)
  • This avoids NaNs in variable-length settings
  • C++ Frontend: Fix serialization (#15033)
  • Fixes a bug when (de)serializing a hierarchy of submodules in which one submodule has no parameters, but its own submodules do
  • Fix derivative for mvlgamma (#15049)
  • Fix numerical stability in log_prob for Gumbel distribution (#15878)
  • multinomial: fix detection and drawing of zero probability events (#16075)
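The Gumbel log_prob fix is about numerical stability: evaluating the log-density directly in log space rather than exponentiating first. A minimal sketch of the idea, using the standard Gumbel formulas (illustrative only, not PyTorch's actual implementation):

```python
import math

def gumbel_log_prob(x, loc=0.0, scale=1.0):
    # Log-density of the Gumbel distribution computed directly in log space:
    # log p(x) = -(z + exp(-z)) - log(scale), with z = (x - loc) / scale.
    z = (x - loc) / scale
    return -(z + math.exp(-z)) - math.log(scale)

def gumbel_log_prob_naive(x, loc=0.0, scale=1.0):
    # Naive version: compute the density, then take its log.
    # Underflows to log(0) for x far in the right tail.
    z = (x - loc) / scale
    return math.log(math.exp(-(z + math.exp(-z))) / scale)

print(gumbel_log_prob(800.0))  # -800.0, finite
# gumbel_log_prob_naive(800.0) raises a math domain error:
# exp(-800) underflows to 0 before the log is taken.
```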

Crashes

  • PyTorch binaries were crashing on AWS Lambda and a few other niche systems because CPUInfo treated certain warnings as errors. CPUInfo has been updated with the relevant fixes.
  • MKL-DNN is now statically built, to avoid conflicts with system versions
  • Allow ReadyQueue to handle empty tasks (#15791)
    • Fixes a segfault with a DataParallel + Checkpoint neural network setting
  • Avoid integer divide by zero error in index_put_ (#14984)
  • Fix for model inference crash on Win10 (#15919) (#16092)
  • Use CUDAGuard when serializing Tensors:
    • Before this change, torch.save and torch.load would initialize the CUDA context on GPU 0 if it hadn't been initialized already, even if the serialized tensors are only on GPU 1.
  • Fix error with handling scalars and rpow, for example 1 ** x, where x is a PyTorch scalar (#16687)
  • Switch to CUDA implementation instead of CuDNN if batch size >= 65536 for affine_grid (#16403)
    • CuDNN crashes when batch size >= 65536
  • [Distributed] TCP init method race condition fix (#15684)
  • [Distributed] Fix a memory leak in Gloo's CPU backend
  • [C++ Frontend] Fix LBFGS issue around using inplace ops (#16167)
  • [Hub] Fix github branch prefix v (#15552)
  • [Hub] URL download bugfix for URLs served without a Content-Length header
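The rpow fix concerns Python's reflected power operator, which is what gets invoked for expressions like `1 ** x` when the left operand is a plain number and the right operand is a library scalar. A hypothetical pure-Python `Scalar` class (illustration of the dispatch only, not PyTorch's implementation):

```python
class Scalar:
    """Hypothetical scalar wrapper illustrating reflected-operator dispatch."""

    def __init__(self, value):
        self.value = value

    def __pow__(self, other):
        # Handles scalar ** number.
        return Scalar(self.value ** other)

    def __rpow__(self, base):
        # Handles number ** scalar: Python falls back to the right
        # operand's __rpow__ when the left operand is a plain int/float.
        return Scalar(base ** self.value)

x = Scalar(3)
print((2 ** x).value)  # 8 -- dispatched through __rpow__
print((x ** 2).value)  # 9 -- dispatched through __pow__
```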

Performance

  • LibTorch binaries now ship with CuDNN enabled. Without this change, many users saw significant performance differences between LibTorch and PyTorch; this should now be fixed. #14976
  • Make btriunpack work for high dimensional batches and faster than before (#15286)
  • Improve performance of unique with inverse indices (#16145)
  • Re-enable OpenMP in binaries (got disabled because of a CMake refactor)
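The `unique` improvement concerns the inverse-indices output, which maps each input element back to its position in the list of unique values. A small pure-Python sketch of those semantics (not the optimized kernel):

```python
def unique_with_inverse(values):
    # Sorted unique values, plus for each input element the index of its
    # value within that unique list (the return_inverse semantics of
    # torch.unique, sketched in plain Python).
    uniq = sorted(set(values))
    position = {v: i for i, v in enumerate(uniq)}
    inverse = [position[v] for v in values]
    return uniq, inverse

uniq, inverse = unique_with_inverse([3, 1, 3, 2])
print(uniq)     # [1, 2, 3]
print(inverse)  # [2, 0, 2, 1]
# The inverse indices reconstruct the original input:
print([uniq[i] for i in inverse])  # [3, 1, 3, 2]
```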

Other

  • Create type hint stub files for module torch (#16089)
    • This will restore auto-complete functionality in PyCharm, VSCode, etc.
  • Fix sum_to behavior with zero dimensions (#15796)
  • Match NumPy by considering NaNs to be larger than any number when sorting (#15886)
  • Fix various error messages / settings in dynamic weight GRUs / LSTMs (#15766)
  • C++ Frontend: Make call operator on module holder call forward (#15831)
  • C++ Frontend: Add the normalize transform to the core library (#15891)
  • Fix bug in torch::load and unpack torch::optim::detail namespace (#15926)
  • Implement batched upper triangular, lower triangular (#15257)
  • Add torch.roll to documentation (#14880)
  • (better errors) Add backend checks for batch norm (#15955)
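The NaN-sorting change adopts NumPy's convention that NaN compares larger than any number, so NaNs sort to the end in ascending order. A plain-Python sketch of that ordering (illustrative, not PyTorch's sorting kernel):

```python
import math

def sort_nans_last(values):
    # Sort ascending with NaNs treated as larger than every number,
    # matching NumPy's (and now PyTorch's) sorting convention.
    # The key tuple puts non-NaN values (False, v) before NaNs (True, nan).
    return sorted(values, key=lambda v: (math.isnan(v), v))

result = sort_nans_last([float("nan"), 1.0, -2.0, float("nan"), 0.5])
print(result[:3])                              # [-2.0, 0.5, 1.0]
print(all(math.isnan(v) for v in result[3:]))  # True
```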

JIT

  • Add better support for bools in the graph fuser (#15057)
  • Allow tracing with fork/wait (#15184)
  • Improve script/no script save error (#15321)
  • Add self to Python printer reserved words (#15318)
  • Better error when torch.load-ing a JIT model (#15578)
  • Fix select after chunk op (#15672)
  • Add script standard library documentation + cleanup (#14912)
