Release Overview
TensorFlow 2.15 has been released! Highlights of this release (and 2.14) include a much simpler installation method for NVIDIA CUDA libraries for Linux, oneDNN CPU performance optimizations for Windows x64 and x86, full availability of tf.function types, an upgrade to Clang 17.0.1, and much more! For the full release notes, please check here.
Note: Release updates on the new multi-backend Keras will be published on http://keras.io starting with Keras 3.0. For more information, please check here.
TensorFlow Core
NVIDIA CUDA libraries for Linux
The tensorflow pip package has a new, optional installation method for Linux that installs necessary NVIDIA CUDA libraries through pip. As long as the NVIDIA driver is already installed on the system, you may now run pip install tensorflow[and-cuda] to install TensorFlow's NVIDIA CUDA library dependencies in the Python environment. Aside from the NVIDIA driver, no other pre-existing NVIDIA CUDA packages are necessary. In TensorFlow 2.15, CUDA has been upgraded to version 12.2.
oneDNN CPU performance optimizations
For Windows x64 & x86 packages, oneDNN optimizations are now enabled by default on X86 CPUs. These optimizations can be enabled or disabled by setting the environment variable TF_ENABLE_ONEDNN_OPTS to 1 (enable) or 0 (disable) before running TensorFlow. To fall back to default settings, simply unset the environment variable.
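As a sketch, the toggle can also be set from Python, as long as it happens before TensorFlow is imported (the variable is read once at import time):

```python
import os

# Set before `import tensorflow`; "1" enables oneDNN optimizations, "0" disables them.
os.environ["TF_ENABLE_ONEDNN_OPTS"] = "0"

# import tensorflow as tf  # imported after the toggle so the setting takes effect
```

Unsetting the variable (for example, os.environ.pop("TF_ENABLE_ONEDNN_OPTS", None)) restores the platform default.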
tf.function
tf.function types are now fully available.
- tf.types.experimental.TraceType now allows custom tf.function inputs to declare Tensor decomposition and type casting support.
- Introducing tf.types.experimental.FunctionType as the comprehensive representation of the signature of tf.function callables. It can be accessed through the function_type property of tf.functions and ConcreteFunctions. See the tf.types.experimental.FunctionType documentation for more details.
- Introducing tf.types.experimental.AtomicFunction as the fastest way to perform TF computations in Python. This capability can be accessed through the inference_fn property of ConcreteFunctions. (Does not support gradients.) See the tf.types.experimental.AtomicFunction documentation for how to call and use it.
Upgrade to Clang 17.0.1 and CUDA 12.2
TensorFlow PIP packages are now being built with Clang 17 and CUDA 12.2 to improve performance for NVIDIA Hopper-based GPUs. Moving forward, Clang 17 will be the default C++ compiler for TensorFlow. We recommend upgrading your compiler to Clang 17 when building TensorFlow from source.
TensorFlow Breaking Changes
- tf.types.experimental.GenericFunction has been renamed to tf.types.experimental.PolymorphicFunction.
Major Features and Improvements
- oneDNN CPU performance optimizations for Windows x64 & x86.
- Windows x64 & x86 packages:
- oneDNN optimizations are enabled by default on X86 CPUs
- To explicitly enable or disable oneDNN optimizations, set the environment variable TF_ENABLE_ONEDNN_OPTS to 1 (enable) or 0 (disable) before running TensorFlow. To fall back to default settings, unset the environment variable.
- oneDNN optimizations can yield slightly different numerical results than when they are disabled, due to floating-point round-off errors from different computation approaches and orders.
- To verify whether oneDNN optimizations are on, look for a message containing "oneDNN custom operations are on" in the log. If the exact phrase is not there, they are off.
- Making the tf.function type system fully available:
- tf.types.experimental.TraceType now allows custom tf.function inputs to declare Tensor decomposition and type casting support.
- Introducing tf.types.experimental.FunctionType as the comprehensive representation of the signature of tf.function callables. It can be accessed through the function_type property of tf.functions and ConcreteFunctions. See the tf.types.experimental.FunctionType documentation for more details.
- Introducing tf.types.experimental.AtomicFunction as the fastest way to perform TF computations in Python.
- Can be accessed through the inference_fn property of ConcreteFunctions.
- Does not support gradients.
- See tf.types.experimental.AtomicFunction documentation for how to call and use it.
- tf.data:
- Moved option warm_start from tf.data.experimental.OptimizationOptions to tf.data.Options.
- tf.lite:
- sub_op and mul_op support broadcasting up to 6 dimensions.
- The tflite::SignatureRunner class, which provides support for named parameters and for multiple named computations within a single TF Lite model, is no longer considered experimental. Likewise for the following signature-related methods of tflite::Interpreter:
- tflite::Interpreter::GetSignatureRunner
- tflite::Interpreter::signature_keys
- tflite::Interpreter::signature_inputs
- tflite::Interpreter::signature_outputs
- tflite::Interpreter::input_tensor_by_signature
- tflite::Interpreter::output_tensor_by_signature
- Similarly, the following signature runner functions in the TF Lite C API are no longer considered experimental:
- TfLiteInterpreterGetSignatureCount
- TfLiteInterpreterGetSignatureKey
- TfLiteInterpreterGetSignatureRunner
- TfLiteSignatureRunnerAllocateTensors
- TfLiteSignatureRunnerGetInputCount
- TfLiteSignatureRunnerGetInputName
- TfLiteSignatureRunnerGetInputTensor
- TfLiteSignatureRunnerGetOutputCount
- TfLiteSignatureRunnerGetOutputName
- TfLiteSignatureRunnerGetOutputTensor
- TfLiteSignatureRunnerInvoke
- TfLiteSignatureRunnerResizeInputTensor
- New C API function TfLiteExtensionApisVersion added to tensorflow/lite/c/c_api.h.
- Added int8 and int16x8 support for the RSQRT operator.
- Android NDK r25 is supported.
Bug Fixes and Other Changes
- Add TensorFlow Quantizer to TensorFlow pip package.
- tf.sparse.segment_sum, tf.sparse.segment_mean, tf.sparse.segment_sqrt_n (SparseSegmentSum/Mean/SqrtN[WithNumSegments])
- Added a sparse_gradient option (default=False) that makes the gradient of these functions/ops sparse (IndexedSlices) instead of dense (Tensor), using new SparseSegmentSum/Mean/SqrtNGradV2 ops.
- tf.nn.embedding_lookup_sparse
- Optimized this function for some cases by fusing internal operations.
- tf.saved_model.SaveOptions
- Provided a new experimental_skip_saver argument. If specified, it suppresses the addition of SavedModel-native save and restore ops to the SavedModel. This is useful when users already build custom save/restore ops and checkpoint formats for the model being saved, in which case the SavedModel-native save/restore ops only lengthen model serialization times.
- Add ops to tensorflow.raw_ops that were missing.
- tf.CheckpointOptions
- It now takes in a new argument called experimental_write_callbacks. These are callbacks that will be executed after a saving event finishes writing the checkpoint file.
- Added an option disable_eager_executer_streaming_enqueue to tensorflow.ConfigProto.Experimental to control the eager runtime's behavior around parallel remote function invocations. When set to True, the eager runtime is allowed to execute multiple function invocations in parallel.
- tf.constant_initializer
- It now takes a new argument called support_partition. If True, constant initializers can create sharded variables. This defaults to False, preserving existing behavior.
- tf.lite
- Added support for stablehlo.scatter.
- tf.estimator
- Removal of the tf.estimator API is in progress and is targeted for the 2.16 release.