
scikit-learn 1.0 Released

September 24, 2021

scikit-learn 1.0 Now Available

scikit-learn is an open-source machine learning library that supports supervised and unsupervised learning. According to a recent Kaggle survey, an estimated 80% of data scientists use it.

The library contains implementations of many common ML algorithms and models, including the widely used linear regression, decision tree, and gradient-boosting algorithms. It also provides tools for model fitting, data preprocessing, model selection and evaluation, and many other utilities.

This release includes some new key features as well as many improvements and bug fixes. Highlights include:

  • Keyword and positional arguments
  • Spline Transformers
  • Quantile Regressor
  • Feature Names Support
  • A more flexible plotting API
  • Online One-Class SVM
  • Histogram-based Gradient Boosting Models are now stable
  • New documentation improvements

For more details on the main highlights of the release, please refer to Release Highlights for scikit-learn 1.0.

To install the latest version (with pip):

pip install --upgrade scikit-learn

or with conda:

conda install -c conda-forge scikit-learn

Version 1.0.0

For a short description of the main highlights of the release, please refer to Release Highlights for scikit-learn 1.0.

Legend for changelogs

  • Major Feature : something big that you couldn’t do before.
  • Feature : something that you couldn’t do before.
  • Efficiency : an existing feature now may not require as much computation or memory.
  • Enhancement : a miscellaneous minor improvement.
  • Fix : something that previously didn’t work as documented – or according to reasonable expectations – should now work.
  • API Change : you will need to change your code to have the same effect in the future; or a feature will be removed in the future.

Minimal dependencies

Version 1.0.0 of scikit-learn requires Python 3.7+, NumPy 1.14.6+ and SciPy 1.1.0+. The optional minimal dependency is matplotlib 2.2.2+.
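
As a rough illustration of the minimums above, the following sketch compares version strings against the stated requirements using only the standard library. The helper names parse_version and meets_minimum are hypothetical, and the simple parser assumes plain dotted numeric versions (it ignores suffixes such as rc1); a dedicated version-parsing library is likely a safer choice in real projects.

```python
import re

# Minimum versions stated for scikit-learn 1.0.0 (matplotlib is optional).
MINIMUMS = {
    "numpy": "1.14.6",
    "scipy": "1.1.0",
    "matplotlib": "2.2.2",
}


def parse_version(version: str) -> tuple:
    """Turn '1.14.6' into (1, 14, 6), keeping only the leading digits of each field."""
    parts = []
    for field in version.split("."):
        match = re.match(r"\d+", field)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, required: str) -> bool:
    """Return True if the installed version is at least the required one."""
    a, b = parse_version(installed), parse_version(required)
    # Pad the shorter tuple with zeros so that '1.1' compares equal to '1.1.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b
```

Comparing numerically matters here: meets_minimum("1.9.3", MINIMUMS["numpy"]) is False because 9 is less than 14, even though "1.9.3" sorts after "1.14.6" as a plain string.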

Enforcing keyword-only arguments

To promote clear and unambiguous use of the library, most constructor and function parameters must now be passed as keyword arguments (i.e. using the param=value syntax) rather than positionally. Passing a keyword-only parameter positionally now raises a TypeError.
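
The mechanism behind this is Python's keyword-only parameter syntax: parameters listed after a bare * in a signature cannot be passed positionally. The sketch below uses a hypothetical ToyEstimator class (not part of scikit-learn) to show the resulting behavior; which parameters are keyword-only for a given real estimator should be checked in its documentation.

```python
class ToyEstimator:
    """Hypothetical estimator, used only to illustrate keyword-only parameters."""

    def __init__(self, n_components=2, *, tol=1e-4, max_iter=100):
        # Everything after the bare `*` must be passed as name=value.
        self.n_components = n_components
        self.tol = tol
        self.max_iter = max_iter


# Keyword arguments work as expected.
est = ToyEstimator(n_components=3, tol=1e-3)

# Passing `tol` positionally raises a TypeError.
try:
    ToyEstimator(3, 1e-3)
    raised = False
except TypeError as exc:
    raised = True
    message = str(exc)
```

Code written against older releases that relied on argument order would need to be updated to the name=value form to avoid this error.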

Changed models

The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.

  • Fix manifold.TSNE now avoids numerical underflow issues during affinity matrix computation.
  • Fix manifold.Isomap now connects disconnected components of the neighbors graph along some minimum distance pairs, instead of changing all infinite distances to zero.
  • Fix The splitting criterion of tree.DecisionTreeClassifier and tree.DecisionTreeRegressor can be impacted by a fix in the handling of rounding errors. Previously some extra spurious splits could occur.

Details are listed in the changelog below.

(While we are trying to better inform users by providing this information, we cannot assure that this list is complete.)

Changelog

sklearn.base

sklearn.calibration

sklearn.cluster

sklearn.compose

sklearn.covariance

  • Fix Adds arrays check to covariance.ledoit_wolf and covariance.ledoit_wolf_shrinkage. #20416 by Hugo Defois.
  • API Change Deprecates the following keys in cv_results_: 'mean_score', 'std_score', and 'split(k)_score' in favor of 'mean_test_score', 'std_test_score', and 'split(k)_test_score'. #20583 by Thomas Fan.
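
Code that reads these results may need to handle both key names during the deprecation period. A minimal sketch, assuming cv_results is a plain dictionary shaped like cv_results_ (the helper get_result and the sample values are hypothetical):

```python
# Deprecated key -> replacement key, as listed in the changelog entry above.
# The per-split keys appear to follow the same pattern and are omitted here.
RENAMED_KEYS = {
    "mean_score": "mean_test_score",
    "std_score": "std_test_score",
}


def get_result(cv_results: dict, new_key: str):
    """Look up a result by its new name, falling back to the deprecated name."""
    if new_key in cv_results:
        return cv_results[new_key]
    for old_key, replacement in RENAMED_KEYS.items():
        if replacement == new_key and old_key in cv_results:
            return cv_results[old_key]
    raise KeyError(new_key)


# Hypothetical results that only contain the deprecated key names.
old_style = {"mean_score": [0.81, 0.84], "std_score": [0.02, 0.03]}
mean_scores = get_result(old_style, "mean_test_score")
```

Looking up the new name first means the fallback simply stops being used once the deprecated keys are removed.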

sklearn.datasets

  • Enhancement datasets.fetch_openml now supports categories with missing values when returning a pandas dataframe. #19365 by Thomas Fan, Amanda Dsouza, and EL-ATEIF Sara.
  • Enhancement datasets.fetch_kddcup99 raises a better message when the cached file is invalid. #19669 by Thomas Fan.
  • Enhancement Replace usages of __file__ related to resource file I/O with importlib.resources to avoid the assumption that these resource files (e.g. iris.csv) already exist on a filesystem, and by extension to enable compatibility with tools such as PyOxidizer. #20297 by Jack Liu.
  • Fix Shorten data file names in the openml tests to better support installing on Windows and its default 260 character limit on file names. #20209 by Thomas Fan.
  • Fix datasets.fetch_kddcup99 returns dataframes when return_X_y=True and as_frame=True. #19011 by Thomas Fan.
  • API Change datasets.load_boston is deprecated in 1.0 and will be removed in 1.2. Alternative code snippets to load similar datasets are provided. Please refer to the docstring of the function for details. #20729 by Guillaume Lemaitre.
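
The importlib.resources change listed above can be illustrated with the standard library alone. The sketch below reads a file that ships inside a package without building a path from __file__. It assumes Python 3.9+ for importlib.resources.files, and it uses the standard-library json package purely as a stand-in because it is always available; it is not how scikit-learn's own loaders are written.

```python
from importlib import resources

# Locate a resource by package name rather than by filesystem path.
resource = resources.files("json").joinpath("__init__.py")

# Read the text through the resource API; this avoids assuming that the
# package is stored as ordinary files on disk.
source_text = resource.read_text(encoding="utf-8")
first_line = source_text.splitlines()[0]
```

Because the lookup goes through the import system, the same code can keep working when a packaging tool bundles the package in a form other than loose files.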

sklearn.decomposition

sklearn.dummy

sklearn.ensemble

sklearn.feature_extraction

sklearn.feature_selection

sklearn.inspection

sklearn.kernel_approximation

sklearn.linear_model

sklearn.manifold

  • Enhancement Implement 'auto' heuristic for the learning_rate in manifold.TSNE. It will become default in 1.2. The default initialization will change to pca in 1.2. PCA initialization will be scaled to have standard deviation 1e-4 in 1.2. #19491 by Dmitry Kobak.
  • Fix Change numerical precision to prevent underflow issues during affinity matrix computation for manifold.TSNE. #19472 by Dmitry Kobak.
  • Fix manifold.Isomap now uses scipy.sparse.csgraph.shortest_path to compute the graph shortest path. It also connects disconnected components of the neighbors graph along some minimum distance pairs, instead of changing all infinite distances to zero. #20531 by Roman Yurchak and Tom Dupre la Tour.
  • Fix Decrease the default numerical tolerance in the lobpcg call in manifold.spectral_embedding to prevent numerical instability. #21194 by Andrew Knyazev.
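
For reference, the scikit-learn documentation for manifold.TSNE describes the 'auto' learning rate as max(N / early_exaggeration / 4, 50), where N is the number of samples. The sketch below simply evaluates that expression. The function name auto_learning_rate is hypothetical, and the default early_exaggeration of 12.0 is taken from the TSNE documentation rather than from this changelog.

```python
def auto_learning_rate(n_samples: int, early_exaggeration: float = 12.0) -> float:
    """Evaluate the documented 'auto' heuristic: max(N / early_exaggeration / 4, 50)."""
    return max(n_samples / early_exaggeration / 4, 50.0)


# 1000 / 12 / 4 is roughly 20.8, so the floor of 50 applies.
small = auto_learning_rate(1_000)

# 24000 / 12 / 4 is 500.
large = auto_learning_rate(24_000)
```

In 1.0 the heuristic is opt-in, by passing learning_rate='auto' to manifold.TSNE; per the entry above it becomes the default in 1.2.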

sklearn.metrics

sklearn.mixture

sklearn.model_selection

sklearn.naive_bayes

sklearn.neighbors

sklearn.neural_network

sklearn.pipeline

sklearn.preprocessing

sklearn.svm

sklearn.tree

sklearn.utils

