### scikit-learn 1.1 Now Available

__scikit-learn__ is an open source machine learning library that supports supervised and unsupervised learning, and is used by an estimated 80% of data scientists, according to a recent Kaggle survey.

The library contains implementations of many common ML algorithms and models, including the widely-used linear regression, decision tree, and gradient-boosting algorithms. It also provides various tools for model fitting, data preprocessing, model selection and evaluation, and many other utilities.

Highlights include:

- Quantile loss in `ensemble.HistGradientBoostingRegressor`
- `get_feature_names_out` available in all transformers
- Grouping infrequent categories in `OneHotEncoder`
- Performance improvements
- MiniBatchNMF: an online version of NMF
- BisectingKMeans: divide and cluster

For more details on the main highlights of the release, please refer to __Release Highlights for scikit-learn 1.1__.

To install the latest version (with pip):

`pip install --upgrade scikit-learn`

or with conda:

`conda install -c conda-forge scikit-learn`

#### Version 1.1.0

For a short description of the main highlights of the release, please refer to __Release Highlights for scikit-learn 1.1__.

- **Major Feature**: something big that you couldn’t do before.
- **Feature**: something that you couldn’t do before.
- **Efficiency**: an existing feature now may not require as much computation or memory.
- **Enhancement**: a miscellaneous minor improvement.
- **Fix**: something that previously didn’t work as documented (or according to reasonable expectations) should now work.
- **API Change**: you will need to change your code to have the same effect in the future; or a feature will be removed in the future.

Version 1.1.0 of scikit-learn requires Python 3.8+, NumPy 1.17.3+ and SciPy 1.3.2+. The optional minimal dependency is matplotlib 3.1.2+.

**Changed models**

The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.

- **Efficiency** `cluster.KMeans` now defaults to `algorithm="lloyd"` instead of `algorithm="auto"`, which was equivalent to `algorithm="elkan"`. Lloyd’s algorithm and Elkan’s algorithm converge to the same solution, up to numerical rounding errors. In general, Lloyd’s algorithm uses much less memory and is often faster.
- **Efficiency** Fitting `tree.DecisionTreeClassifier`, `tree.DecisionTreeRegressor`, `ensemble.RandomForestClassifier`, `ensemble.RandomForestRegressor`, `ensemble.GradientBoostingClassifier`, and `ensemble.GradientBoostingRegressor` is on average 15% faster than in previous versions, thanks to a new sort algorithm for finding the best split. Models might differ because splits with tied criterion values are handled differently: both the old and the new sorting algorithms are unstable. #22868 by Thomas Fan.
- **Fix** The eigenvector initialization for `cluster.SpectralClustering` and `manifold.SpectralEmbedding` now samples from a Gaussian when using the `'amg'` or `'lobpcg'` solver. This change improves the numerical stability of the solver, but may result in a different model.
- **Fix** `feature_selection.f_regression` and `feature_selection.r_regression` now return finite scores by default instead of `np.nan` and `np.inf` for some corner cases. You can use `force_finite=False` if you really want non-finite values and the old behavior.
- **Fix** pandas DataFrames with all non-string columns, such as a MultiIndex, no longer trigger a warning when passed into an Estimator. Estimators will continue to ignore the column names in DataFrames with non-string columns. For `feature_names_in_` to be defined, columns must be all strings. #22410 by Thomas Fan.
- **Fix** `preprocessing.KBinsDiscretizer` changed its handling of bin edges slightly, which might result in a different encoding with the same data.
- **Fix** `calibration.calibration_curve` changed its handling of bin edges slightly, which might result in a different output curve given the same data.
- **Fix** `discriminant_analysis.LinearDiscriminantAnalysis` now uses the correct variance-scaling coefficient, which may result in different model behavior.
- **Fix** `feature_selection.SelectFromModel.fit` and `feature_selection.SelectFromModel.partial_fit` can now be called with `prefit=True`. `estimator_` will be a deep copy of `estimator` when `prefit=True`. #23271 by Guillaume Lemaitre.

### Changelog

- **Efficiency** Low-level routines for reductions on pairwise distances for dense float64 datasets have been refactored. The following functions and estimators now benefit from improved performance in terms of hardware scalability and speed-ups:

  - `sklearn.metrics.pairwise_distances_argmin`
  - `sklearn.metrics.pairwise_distances_argmin_min`
  - `sklearn.cluster.AffinityPropagation`
  - `sklearn.cluster.Birch`
  - `sklearn.cluster.MeanShift`
  - `sklearn.cluster.OPTICS`
  - `sklearn.cluster.SpectralClustering`
  - `sklearn.feature_selection.mutual_info_regression`
  - `sklearn.neighbors.KNeighborsClassifier`
  - `sklearn.neighbors.KNeighborsRegressor`
  - `sklearn.neighbors.RadiusNeighborsClassifier`
  - `sklearn.neighbors.RadiusNeighborsRegressor`
  - `sklearn.neighbors.LocalOutlierFactor`
  - `sklearn.neighbors.NearestNeighbors`
  - `sklearn.manifold.Isomap`
  - `sklearn.manifold.LocallyLinearEmbedding`
  - `sklearn.manifold.TSNE`
  - `sklearn.manifold.trustworthiness`
  - `sklearn.semi_supervised.LabelPropagation`
  - `sklearn.semi_supervised.LabelSpreading`

  For instance, `sklearn.neighbors.NearestNeighbors.kneighbors` and `sklearn.neighbors.NearestNeighbors.radius_neighbors` can be up to ×20 and ×5 faster than before, respectively. #21987, #22064, #22065, #22288 and #22320 by Julien Jerphanion.
- **Enhancement** All scikit-learn models now generate a more informative error message when some input contains unexpected `NaN` or infinite values. In particular, the message contains the input name (“X”, “y” or “sample_weight”). If an unexpected `NaN` value is found in `X`, the message also suggests potential solutions. #21219 by Olivier Grisel.
- **Enhancement** All scikit-learn models now generate a more informative error message when invalid hyper-parameters are set with `set_params`. #21542 by Olivier Grisel.
- **Enhancement** Removes random unique identifiers in the HTML representation. With this change, Jupyter notebooks are reproducible as long as the cells are run in the same order. #23098 by Thomas Fan.
- **Fix** Estimators with the `non_deterministic` tag set to `True` will skip both the `check_methods_sample_order_invariance` and `check_methods_subset_invariance` tests. #22318 by Zhehao Liu.
- **API Change** The option for using the log loss, aka binomial or multinomial deviance, via the `loss` parameters was made more consistent. The preferred way is to set the value to `"log_loss"`. Old option names are still valid and produce the same models, but are deprecated and will be removed in version 1.3.
  - For `ensemble.GradientBoostingClassifier`, the `loss` parameter name “deviance” is deprecated in favor of the new name “log_loss”, which is now the default. #23036 by Christian Lorentzen.
  - For `ensemble.HistGradientBoostingClassifier`, the `loss` parameter names “auto”, “binary_crossentropy” and “categorical_crossentropy” are deprecated in favor of the new name “log_loss”, which is now the default. #23040 by Christian Lorentzen.
  - For `linear_model.SGDClassifier`, the `loss` parameter name “log” is deprecated in favor of the new name “log_loss”. #23046 by Christian Lorentzen.

- **API Change** Rich HTML representation of estimators is now enabled by default in Jupyter notebooks. It can be deactivated by setting `display='text'` in `sklearn.set_config`. #22856 by Jérémie du Boisberranger.
- **Enhancement** The error message is improved when importing `model_selection.HalvingGridSearchCV`, `model_selection.HalvingRandomSearchCV`, or `impute.IterativeImputer` without importing the experimental flag. #23194 by Thomas Fan.
- **Enhancement** Added an extension in doc/conf.py to automatically generate the list of estimators that handle NaN values. #23198 by Lise Kleiber, Zhehao Liu and Chiara Marmo.

`sklearn.calibration`

- **Enhancement** `calibration.calibration_curve` accepts a parameter `pos_label` to specify the positive class label. #21032 by Guillaume Lemaitre.
- **Enhancement** `calibration.CalibratedClassifierCV.fit` now supports passing `fit_params`, which are routed to the `base_estimator`. #18170 by Benjamin Bossan.
- **Enhancement** `calibration.CalibrationDisplay` accepts a parameter `pos_label` to add this information to the plot. #21038 by Guillaume Lemaitre.
- **Fix** `calibration.calibration_curve` now handles bin edges more consistently. #14975 by Andreas Müller and #22526 by Meekail Zain.
- **API Change** The `normalize` parameter of `calibration.calibration_curve` is now deprecated and will be removed in version 1.3. It is recommended that a proper probability (i.e. a classifier’s `predict_proba` positive class) is used for `y_prob`. #23095 by Jordan Silke.

`sklearn.cluster`

- **Major Feature** Added `BisectingKMeans`, introducing the Bisecting K-Means algorithm. #20031 by Michal Krawczyk, Tom Dupre la Tour and Jérémie du Boisberranger.
- **Enhancement** `cluster.SpectralClustering` and `cluster.spectral_clustering` now include the new `'cluster_qr'` method, which clusters samples in the embedding space as an alternative to the existing `'kmeans'` and `'discrete'` methods. See `cluster.spectral_clustering` for more details. #21148 by Andrew Knyazev.
- **Enhancement** Adds `get_feature_names_out` to `cluster.Birch`, `cluster.FeatureAgglomeration`, `cluster.KMeans` and `cluster.MiniBatchKMeans`. #22255 by Thomas Fan.
- **Enhancement** `cluster.SpectralClustering` now raises consistent error messages when passed invalid values for `n_clusters`, `n_init`, `gamma`, `n_neighbors`, `eigen_tol` or `degree`. #21881 by Hugo Vassard.
- **Enhancement** `cluster.AffinityPropagation` now returns cluster centers and labels if they exist, even if the model has not fully converged. When returning these potentially degenerate cluster centers and labels, a new warning message is shown. If no cluster centers were constructed, the cluster centers remain an empty list with labels set to `-1`, and the original warning message is shown. #22217 by Meekail Zain.
- **Efficiency** In `cluster.KMeans`, the default `algorithm` is now `"lloyd"`, which is the full classical EM-style algorithm. Both `"auto"` and `"full"` are deprecated and will be removed in version 1.3; they are now aliases for `"lloyd"`. The previous default was `"auto"`, which relied on Elkan’s algorithm. The default changed because Lloyd’s algorithm uses less memory than Elkan’s, is faster on many datasets, and gives identical results. #21735 by Aurélien Geron.
- **Fix** The `init` parameter of `cluster.KMeans` now properly supports array-like input and NumPy string scalars. #22154 by Thomas Fan.

`sklearn.compose`

- **Fix** `compose.ColumnTransformer` now removes validation errors from the `__init__` and `set_params` methods. #22537 by iofall and Arisa Y.
- **Fix** The `get_feature_names_out` functionality in `compose.ColumnTransformer` was broken when columns were specified using a `slice`. This is fixed in #22775 and #22913 by randomgeek78.

`sklearn.covariance`

- **Fix** `covariance.GraphicalLassoCV` now accepts a NumPy array for the parameter `alphas`. #22493 by Guillaume Lemaitre.

`sklearn.cross_decomposition`

- **Enhancement** The `inverse_transform` method of `cross_decomposition.PLSRegression`, `cross_decomposition.PLSCanonical` and `cross_decomposition.CCA` now allows reconstruction of a `X` target when a `Y` parameter is given. #19680 by Robin Thibaut.
- **Enhancement** Adds `get_feature_names_out` to all transformers in the `cross_decomposition` module: `cross_decomposition.CCA`, `cross_decomposition.PLSSVD`, `cross_decomposition.PLSRegression`, and `cross_decomposition.PLSCanonical`. #22119 by Thomas Fan.
- **Fix** The shape of the `coef_` attribute of `cross_decomposition.CCA`, `cross_decomposition.PLSCanonical` and `cross_decomposition.PLSRegression` will change in version 1.3, from `(n_features, n_targets)` to `(n_targets, n_features)`. This makes it consistent with other linear models and lets it work with interfaces that expect a specific shape for `coef_` (e.g. `feature_selection.RFE`). #22016 by Guillaume Lemaitre.
- **API Change** Added the fitted attribute `intercept_` to `cross_decomposition.PLSCanonical`, `cross_decomposition.PLSRegression`, and `cross_decomposition.CCA`. The method `predict` is indeed equivalent to `Y = X @ coef_ + intercept_`. #22015 by Guillaume Lemaitre.

`sklearn.datasets`

- **Feature** `datasets.load_files` now accepts an ignore list and an allow list based on file extensions. #19747 by Tony Attalla and #22498 by Meekail Zain.
- **Enhancement** `datasets.make_swiss_roll` now supports the optional argument `hole`; when set to `True`, it returns the swiss-hole dataset. #21482 by Sebastian Pujalte.
- **Enhancement** `datasets.make_blobs` no longer copies data during the generation process, and therefore uses less memory. #22412 by Zhehao Liu.
- **Enhancement** `datasets.load_diabetes` now accepts the parameter `scaled`, to allow loading unscaled data. The scaled version of this dataset is now computed from the unscaled data, and can produce slightly different results than in previous versions (within a 1e-4 absolute tolerance). #16605 by Mandy Gu.
- **Enhancement** `datasets.fetch_openml` now has two optional arguments, `n_retries` and `delay`. By default, `datasets.fetch_openml` will retry 3 times in case of a network failure, with a delay between each try. #21901 by Rileran.
- **Fix** `datasets.fetch_covtype` is now concurrent-safe: data is downloaded to a temporary directory before being moved to the data directory. #23113 by Ilion Beyst.
- **API Change** `datasets.make_sparse_coded_signal` now accepts a parameter `data_transposed` to explicitly specify the shape of the matrix `X`. The default behavior, `True`, is to return a transposed matrix `X` with shape `(n_features, n_samples)`. The default value will change to `False` in version 1.3. #21425 by Gabriel Stefanini Vicente.

`sklearn.decomposition`

- **Major Feature** Added a new estimator, `decomposition.MiniBatchNMF`. It is a faster but less accurate version of non-negative matrix factorization, better suited for large datasets. #16948 by Chiara Marmo, Patricio Cerda and Jérémie du Boisberranger.
- **Enhancement** `decomposition.dict_learning`, `decomposition.dict_learning_online` and `decomposition.sparse_encode` preserve dtype for `numpy.float32`. `decomposition.DictionaryLearning`, `decomposition.MiniBatchDictionaryLearning` and `decomposition.SparseCoder` preserve dtype for `numpy.float32`. #22002 by Takeshi Oura.
- **Enhancement** `decomposition.PCA` exposes a parameter `n_oversamples` to tune `utils.randomized_svd` and get accurate results when the number of features is large. #21109 by Smile.
- **Enhancement** `decomposition.MiniBatchDictionaryLearning` and `decomposition.dict_learning_online` have been refactored. They now have a stopping criterion based on a small change of the dictionary or objective function, controlled by the new `max_iter`, `tol` and `max_no_improvement` parameters. In addition, some of their parameters and attributes are deprecated:
  - The `n_iter` parameter of both is deprecated. Use `max_iter` instead.
  - The `iter_offset`, `return_inner_stats`, `inner_stats` and `return_n_iter` parameters of `decomposition.dict_learning_online` serve internal purposes and are deprecated.
  - The `inner_stats_`, `iter_offset_` and `random_state_` attributes of `decomposition.MiniBatchDictionaryLearning` serve internal purposes and are deprecated.
  - The default value of the `batch_size` parameter of both will change from 3 to 256 in version 1.3.

- **Enhancement** `decomposition.SparsePCA` and `decomposition.MiniBatchSparsePCA` preserve dtype for `numpy.float32`. #22111 by Takeshi Oura.
- **Enhancement** `decomposition.TruncatedSVD` now allows `n_components == n_features` if `algorithm='randomized'`. #22181 by Zach Deane-Mayer.
- **Enhancement** Adds `get_feature_names_out` to all transformers in the `decomposition` module: `decomposition.DictionaryLearning`, `decomposition.FactorAnalysis`, `decomposition.FastICA`, `decomposition.IncrementalPCA`, `decomposition.KernelPCA`, `decomposition.LatentDirichletAllocation`, `decomposition.MiniBatchDictionaryLearning`, `decomposition.MiniBatchSparsePCA`, `decomposition.NMF`, `decomposition.PCA`, `decomposition.SparsePCA`, and `decomposition.TruncatedSVD`. #21334 by Thomas Fan.
- **Enhancement** `decomposition.TruncatedSVD` exposes the parameters `n_oversamples` and `power_iteration_normalizer` to tune `utils.randomized_svd`. This gives accurate results when the number of features is large, the rank of the matrix is high, or other properties of the matrix make low-rank approximation difficult. #21705 by Jay S. Stanley III.
- **Enhancement** `decomposition.PCA` exposes the parameter `power_iteration_normalizer` to tune `utils.randomized_svd` and get more accurate results when low-rank approximation is difficult. #21705 by Jay S. Stanley III.
- **Fix** `decomposition.FastICA` now validates input parameters in `fit` instead of `__init__`. #21432 by Hannah Bohle and Maren Westermann.
- **Fix** `decomposition.FastICA` now accepts `np.float32` data without silent upcasting. The dtype is preserved by `fit` and `fit_transform`, and the main fitted attributes use a dtype of the same precision as the training data. #22806 by Jihane Bennis and Olivier Grisel.
- **Fix** `decomposition.FactorAnalysis` now validates input parameters in `fit` instead of `__init__`. #21713 by Haya and Krum Arnaudov.
- **Fix** `decomposition.KernelPCA` now validates input parameters in `fit` instead of `__init__`. #21567 by Maggie Chege.
- **Fix** `decomposition.PCA` and `decomposition.IncrementalPCA` calculate precision more safely using the inverse of the covariance matrix if `self.noise_variance_` is zero. #22300 by Meekail Zain and #15948 by @sysuresh.
- **Fix** Greatly reduced peak memory usage in `decomposition.PCA` when calling `fit` or `fit_transform`. #22553 by Meekail Zain.
- **API Change** `decomposition.FastICA` now supports unit variance for whitening. The default value of its `whiten` argument will change from `True` (which behaves like `'arbitrary-variance'`) to `'unit-variance'` in version 1.3. #19490 by Facundo Ferrin and Julien Jerphanion.

`sklearn.discriminant_analysis`

- **Enhancement** Adds `get_feature_names_out` to `discriminant_analysis.LinearDiscriminantAnalysis`. #22120 by Thomas Fan.
- **Fix** `discriminant_analysis.LinearDiscriminantAnalysis` now uses the correct variance-scaling coefficient, which may result in different model behavior. #15984 by Okon Samuel and #22696 by Meekail Zain.

`sklearn.dummy`

- **Fix** `dummy.DummyRegressor` no longer overrides the `constant` parameter during `fit`. #22486 by Thomas Fan.

`sklearn.ensemble`

- **Major Feature** Added the option `loss="quantile"` to `ensemble.HistGradientBoostingRegressor` for modelling quantiles. The quantile level can be specified with the new parameter `quantile`. #21800 and #20567 by Christian Lorentzen.
- **Efficiency** `fit` of `ensemble.GradientBoostingClassifier` and `ensemble.GradientBoostingRegressor` now calls `utils.check_array` with the parameter `force_all_finite=False` for non-initial warm-start runs, as the input has already been checked. #22159 by Geoffrey Paris.
- **Enhancement** `ensemble.HistGradientBoostingClassifier` is faster for binary and, in particular, multiclass problems, thanks to the new private loss function module. #20811, #20567 and #21814 by Christian Lorentzen.
- **Enhancement** Adds support for using pre-fit models with `cv="prefit"` in `ensemble.StackingClassifier` and `ensemble.StackingRegressor`. #16748 by Siqi He and #22215 by Meekail Zain.
- **Enhancement** `ensemble.RandomForestClassifier` and `ensemble.ExtraTreesClassifier` have the new `criterion="log_loss"`, which is equivalent to `criterion="entropy"`. #23047 by Christian Lorentzen.
- **Enhancement** Adds `get_feature_names_out` to `ensemble.VotingClassifier`, `ensemble.VotingRegressor`, `ensemble.StackingClassifier`, and `ensemble.StackingRegressor`. #22695 and #22697 by Thomas Fan.
- **Enhancement** `ensemble.RandomTreesEmbedding` now has an informative `get_feature_names_out` function that includes both the tree index and the leaf index in the output feature names. #21762 by Zhehao Liu and Thomas Fan.
- **Efficiency** Fitting an `ensemble.RandomForestClassifier`, `ensemble.RandomForestRegressor`, `ensemble.ExtraTreesClassifier`, `ensemble.ExtraTreesRegressor`, or `ensemble.RandomTreesEmbedding` is now faster in a multiprocessing setting, especially for subsequent fits with `warm_start` enabled. #22106 by Pieter Gijsbers.
- **Fix** Changed the parameter `validation_fraction` in `ensemble.GradientBoostingClassifier` and `ensemble.GradientBoostingRegressor` so that an error is raised if anything other than a float is passed as an argument. #21632 by Genesis Valencia.
- **Fix** Removed a potential source of CPU oversubscription in `ensemble.HistGradientBoostingClassifier` and `ensemble.HistGradientBoostingRegressor` when CPU resource usage is limited, for instance by a cgroups quota in a Docker container. #22566 by Jérémie du Boisberranger.
- **Fix** `ensemble.HistGradientBoostingClassifier` and `ensemble.HistGradientBoostingRegressor` no longer warn when fitting on a pandas DataFrame with a non-default `scoring` parameter and early stopping enabled. #22908 by Thomas Fan.
- **Fix** Fixes the HTML repr for `ensemble.StackingClassifier` and `ensemble.StackingRegressor`. #23097 by Thomas Fan.
- **API Change** The attribute `loss_` of `ensemble.GradientBoostingClassifier` and `ensemble.GradientBoostingRegressor` has been deprecated and will be removed in version 1.3. #23079 by Christian Lorentzen.
- **API Change** Changed the default of `max_features` to 1.0 for `ensemble.RandomForestRegressor` and to `"sqrt"` for `ensemble.RandomForestClassifier`. These give the same fit results as before, but are much easier to understand. The old default value `"auto"` has been deprecated and will be removed in version 1.3. The same changes are also applied to `ensemble.ExtraTreesRegressor` and `ensemble.ExtraTreesClassifier`. #20803 by Brian Sun.
- **Efficiency** Improved the runtime performance of `ensemble.IsolationForest` by skipping repetitive input checks. #23149 by Zhehao Liu.

`sklearn.feature_extraction`

- **Feature** `feature_extraction.FeatureHasher` now supports PyPy. #23023 by Thomas Fan.
- **Fix** `feature_extraction.FeatureHasher` now validates input parameters in `transform` instead of `__init__`. #21573 by Hannah Bohle and Maren Westermann.
- **Fix** `feature_extraction.text.TfidfVectorizer` no longer creates a `feature_extraction.text.TfidfTransformer` at `__init__`, as required by our API. #21832 by Guillaume Lemaitre.

`sklearn.feature_selection`

- **Feature** Added an auto mode to `feature_selection.SequentialFeatureSelector`. If the argument `n_features_to_select` is `'auto'`, features are selected until the score improvement does not exceed the argument `tol`. The default value of `n_features_to_select` changed from `None` to `'warn'` in 1.1 and will become `'auto'` in 1.3. `None` and `'warn'` will be removed in 1.3. #20145 by murata-yu.
- **Feature** Added the ability to pass callables to the `max_features` parameter of `feature_selection.SelectFromModel`. Also introduced the new attribute `max_features_`, which is inferred from `max_features` and the data during `fit`. If `max_features` is an integer, then `max_features_ = max_features`. If `max_features` is a callable, then `max_features_ = max_features(X)`. #22356 by Meekail Zain.
- **Enhancement** `feature_selection.GenericUnivariateSelect` preserves float32 dtype. #18482 by Thierry Gameiro and Daniel Kharsa and #22370 by Meekail Zain.
- **Enhancement** Added a parameter `force_finite` to `feature_selection.f_regression` and `feature_selection.r_regression`. This parameter forces the output to be finite when a feature or the target is constant, or when the feature and target are perfectly correlated (only for the F-statistic). #17819 by Juan Carlos Alfaro Jiménez.
- **Efficiency** Improved the runtime performance of `feature_selection.chi2` with boolean arrays. #22235 by Thomas Fan.
- **Efficiency** Reduced memory usage of `feature_selection.chi2`. #21837 by Louis Wagner.

`sklearn.gaussian_process`

- **Fix** The `predict` and `sample_y` methods of `gaussian_process.GaussianProcessRegressor` now return arrays of the correct shape in single-target and multi-target cases, for both `normalize_y=False` and `normalize_y=True`. #22199 by Guillaume Lemaitre, Aidar Shakerimoff and Tenavi Nakamura-Zimmerer.
- **Fix** `gaussian_process.GaussianProcessClassifier` raises a more informative error if a `CompoundKernel` is passed via `kernel`. #22223 by MarcoM.

`sklearn.impute`

- **Enhancement** `impute.SimpleImputer` now warns with feature names when features are skipped due to the lack of any observed values in the training set. #21617 by Christian Ritter.
- **Enhancement** Added support for `pd.NA` in `impute.SimpleImputer`. #21114 by Ying Xiong.
- **Enhancement** Adds `get_feature_names_out` to `impute.SimpleImputer`, `impute.KNNImputer`, `impute.IterativeImputer`, and `impute.MissingIndicator`. #21078 by Thomas Fan.
- **API Change** The `verbose` parameter was deprecated for `impute.SimpleImputer`. A warning will always be raised upon the removal of empty columns. #21448 by Oleh Kozynets and Christian Ritter.

`sklearn.inspection`

- **Feature** Added a display to plot the decision boundary of a classifier using the method `inspection.DecisionBoundaryDisplay.from_estimator`. #16061 by Thomas Fan.
- **Enhancement** In `inspection.PartialDependenceDisplay.from_estimator`, `kind` now accepts a list of strings to specify which type of plot to draw for each feature interaction. #19438 by Guillaume Lemaitre.
- **Enhancement** `inspection.PartialDependenceDisplay.from_estimator`, `inspection.PartialDependenceDisplay.plot`, and `inspection.plot_partial_dependence` now support plotting centered Individual Conditional Expectation (cICE) and centered PDP curves, controlled by the parameter `centered`. #18310 by Johannes Elfner and Guillaume Lemaitre.

`sklearn.isotonic`

- **Enhancement** Adds `get_feature_names_out` to `isotonic.IsotonicRegression`. #22249 by Thomas Fan.

`sklearn.kernel_approximation`

- **Enhancement** Adds `get_feature_names_out` to `kernel_approximation.AdditiveChi2Sampler`, `kernel_approximation.Nystroem`, `kernel_approximation.PolynomialCountSketch`, `kernel_approximation.RBFSampler`, and `kernel_approximation.SkewedChi2Sampler`. #22137 and #22694 by Thomas Fan.

`sklearn.linear_model`

- **Feature** `linear_model.ElasticNet`, `linear_model.ElasticNetCV`, `linear_model.Lasso` and `linear_model.LassoCV` support `sample_weight` for sparse input `X`. #22808 by Christian Lorentzen.
- **Feature** `linear_model.Ridge` with `solver="lsqr"` now supports fitting sparse input with `fit_intercept=True`. #22950 by Christian Lorentzen.
- **Enhancement** `linear_model.QuantileRegressor` supports sparse input for the HiGHS-based solvers. #21086 by Venkatachalam Natchiappan. In addition, those solvers now use the CSC matrix right from the beginning, which speeds up fitting. #22206 by Christian Lorentzen.
- **Enhancement** `linear_model.LogisticRegression` is faster for `solver="lbfgs"` and `solver="newton-cg"` on binary and, in particular, multiclass problems, thanks to the new private loss function module. In the multiclass case, memory consumption has also been reduced for these solvers, because the target is now label encoded (mapped to integers) instead of label binarized (one-hot encoded). The more classes, the larger the benefit. #21808, #20567 and #21814 by Christian Lorentzen.
- **Enhancement** `linear_model.GammaRegressor`, `linear_model.PoissonRegressor` and `linear_model.TweedieRegressor` are faster for `solver="lbfgs"`. #22548, #21808 and #20567 by Christian Lorentzen.
- **Enhancement** Renamed the parameter `base_estimator` to `estimator` in `linear_model.RANSACRegressor` to improve readability and consistency. `base_estimator` is deprecated and will be removed in 1.3. #22062 by Adrian Trujillo.
- **Enhancement** `linear_model.ElasticNet` and other linear model classes using coordinate descent show error messages when non-finite parameter weights are produced. #22148 by Christian Ritter and Norbert Preining.
- **Enhancement** `linear_model.ElasticNet` and `linear_model.Lasso` now raise consistent error messages when passed invalid values for `l1_ratio`, `alpha`, `max_iter` and `tol`. #22240 by Arturo Amor.
- **Enhancement** `linear_model.BayesianRidge` and `linear_model.ARDRegression` now preserve float32 dtype. #9087 by Arthur Imbert and #22525 by Meekail Zain.
- **Enhancement** `linear_model.RidgeClassifier` now supports multilabel classification. #19689 by Guillaume Lemaitre.
- **Enhancement** `linear_model.RidgeCV` and `linear_model.RidgeClassifierCV` now raise consistent error messages when passed invalid values for `alphas`. #21606 by Arturo Amor.
- **Enhancement** `linear_model.Ridge` and `linear_model.RidgeClassifier` now raise consistent error messages when passed invalid values for `alpha`, `max_iter` and `tol`. #21341 by Arturo Amor.
- **Enhancement** `linear_model.orthogonal_mp_gram` preserves dtype for `numpy.float32`. #22002 by Takeshi Oura.
- **Fix** `linear_model.LassoLarsIC` now correctly computes AIC and BIC. An error is now raised when `n_features > n_samples` and the noise variance is not provided. #21481 by Guillaume Lemaitre and Andrés Babino.
- **Fix** `linear_model.TheilSenRegressor` now validates the input parameter `max_subpopulation` in `fit` instead of `__init__`. #21767 by Maren Westermann.
- **Fix** `linear_model.ElasticNetCV` now produces the correct warning when `l1_ratio=0`. #21724 by Yar Khine Phyo.
- **Fix** `linear_model.LogisticRegression` and `linear_model.LogisticRegressionCV` now set the `n_iter_` attribute with a shape that respects the docstring. This shape is consistent with the one obtained when using the other solvers in the one-vs-rest setting. Previously, only the maximum number of iterations across the binary sub-problems was recorded; now all of them are recorded. #21998 by Olivier Grisel.
- **Fix** The property `family` of `linear_model.TweedieRegressor` is no longer validated in `__init__`. Instead, this (private) property is deprecated in `linear_model.GammaRegressor`, `linear_model.PoissonRegressor` and `linear_model.TweedieRegressor`, and will be removed in 1.3. #22548 by Christian Lorentzen.
- **Fix** The `coef_` and `intercept_` attributes of `linear_model.LinearRegression` are now correctly computed in the presence of sample weights when the input is sparse. #22891 by Jérémie du Boisberranger.
- **Fix** The `coef_` and `intercept_` attributes of `linear_model.Ridge` with `solver="sparse_cg"` and `solver="lbfgs"` are now correctly computed in the presence of sample weights when the input is sparse. #22899 by Jérémie du Boisberranger.
- **Fix** `linear_model.SGDRegressor` and `linear_model.SGDClassifier` now compute the validation error correctly when early stopping is enabled. #23256 by Zhehao Liu.
- **API Change** `linear_model.LassoLarsIC` now exposes `noise_variance` as a parameter, in order to provide an estimate of the noise variance. This is particularly relevant when `n_features > n_samples` and the noise variance cannot be estimated from the data. #21481 by Guillaume Lemaitre.

`sklearn.manifold`

- **Feature** `manifold.Isomap` now supports radius-based neighbors via the `radius` argument. #19794 by Zhehao Liu.
- **Enhancement** `manifold.spectral_embedding` and `manifold.SpectralEmbedding` support `np.float32` dtype and will preserve this dtype. #21534 by Andrew Knyazev.
- **Enhancement** Adds `get_feature_names_out` to `manifold.Isomap` and `manifold.LocallyLinearEmbedding`. #22254 by Thomas Fan.
- **Enhancement** Added `metric_params` to the `manifold.TSNE` constructor, for additional parameters of the distance metric used in optimization. #21805 by Jeanne Dionisi and #22685 by Meekail Zain.
- **Enhancement** `manifold.trustworthiness` raises an error if `n_neighbors >= n_samples / 2`, to ensure correct support for the function. #18832 by Hong Shao Yang and #23033 by Meekail Zain.
- **Fix** `manifold.spectral_embedding` now uses a Gaussian instead of the previous uniform distribution on [0, 1] for the random initial approximations to eigenvectors in the `lobpcg` and `amg` eigen solvers, to improve their numerical stability. #21565 by Andrew Knyazev.

`sklearn.metrics`

**Feature** `metrics.r2_score` and `metrics.explained_variance_score` have a new `force_finite` parameter. Setting this parameter to `False` returns the actual non-finite score in case of perfect predictions or constant `y_true`, instead of the finite approximation (`1.0` and `0.0` respectively) returned by default. #17266 by Sylvain Marié.

**Feature** `metrics.d2_pinball_score` and `metrics.d2_absolute_error_score` calculate the D2 regression score for the pinball loss and the absolute error respectively. `metrics.d2_absolute_error_score` is a special case of `metrics.d2_pinball_score` with a fixed quantile parameter `alpha=0.5`, provided for ease of use and discovery. The D2 scores are generalizations of `r2_score` and can be interpreted as the fraction of deviance explained. #22118 by Ohad Michel.

**Enhancement** `metrics.top_k_accuracy_score` raises an improved error message when `y_true` is binary and `y_score` is 2d. #22284 by Thomas Fan.

**Enhancement** `metrics.roc_auc_score` now supports `average=None` in the multiclass case when `multi_class='ovr'`, which returns the score per class. #19158 by Nicki Skafte.

**Enhancement** Adds the `im_kw` parameter to `metrics.ConfusionMatrixDisplay.from_estimator`, `metrics.ConfusionMatrixDisplay.from_predictions`, and `metrics.ConfusionMatrixDisplay.plot`. The `im_kw` parameter is passed to the `matplotlib.pyplot.imshow` call when plotting the confusion matrix. #20753 by Thomas Fan.

**Fix** `metrics.silhouette_score` now supports integer input for precomputed distances. #22108 by Thomas Fan.

**Fix** Fixed a bug in `metrics.normalized_mutual_info_score` which could return unbounded values. #22635 by Jérémie du Boisberranger.

**Fix** Fixes `metrics.precision_recall_curve` and `metrics.average_precision_score` when true labels are all negative. #19085 by Varun Agrawal.

**API Change** `metrics.SCORERS` is now deprecated and will be removed in 1.3. Please use `metrics.get_scorer_names` to retrieve the names of all available scorers. #22866 by Adrin Jalali.

**API Change** Parameters `sample_weight` and `multioutput` of `metrics.mean_absolute_percentage_error` are now keyword-only, in accordance with SLEP009. A deprecation cycle was introduced. #21576 by Paul-Emile Dugnat.

**API Change** The `"wminkowski"` metric of `metrics.DistanceMetric` is deprecated and will be removed in version 1.3. Instead, the existing `"minkowski"` metric now takes an optional `w` parameter for weights. This deprecation aims to stay consistent with the SciPy 1.8 convention. #21873 by Yar Khine Phyo.

**API Change** `metrics.DistanceMetric` has been moved from `sklearn.neighbors` to `sklearn.metrics`. Importing it as `neighbors.DistanceMetric` is still valid for backward compatibility, but this alias will be removed in 1.3. #21177 by Julien Jerphanion.

`sklearn.mixture`

**Enhancement** `mixture.GaussianMixture` and `mixture.BayesianGaussianMixture` can now be initialized using k-means++ and random data points. #20408 by Gordon Walsh, Alberto Ceballos and Andres Rios.

**Fix** Fixed a bug so that `precisions_cholesky_` in `mixture.GaussianMixture` is correctly initialized, by taking its square root, when `precisions_init` is provided. #22058 by Guillaume Lemaitre.

**Fix** `mixture.GaussianMixture` now normalizes `weights_` more safely, preventing rounding errors when calling `mixture.GaussianMixture.sample` with `n_components=1`. #23034 by Meekail Zain.

`sklearn.model_selection`

**Enhancement** It is now possible to pass `scoring="matthews_corrcoef"` to all model selection tools with a `scoring` argument, to use the Matthews correlation coefficient (MCC). #22203 by Olivier Grisel.

**Enhancement** An error is now raised during cross-validation when the fits for all the splits failed. Similarly, an error is raised during grid search when the fits for all the models and all the splits failed. #21026 by Loïc Estève.

**Fix** `model_selection.GridSearchCV` and `model_selection.HalvingGridSearchCV` now validate input parameters in `fit` instead of `__init__`. #21880 by Mrinal Tyagi.

**Fix** `model_selection.learning_curve` now supports `partial_fit` with regressors. #22982 by Thomas Fan.

`sklearn.multiclass`

**Enhancement** `multiclass.OneVsRestClassifier` now supports a `verbose` parameter so progress on fitting can be seen. #22508 by Chris Combs.

**Fix** `multiclass.OneVsOneClassifier.predict` returns correct predictions when the inner classifier only has a `predict_proba`. #22604 by Thomas Fan.

`sklearn.neighbors`

**Enhancement** Adds `get_feature_names_out` to `neighbors.RadiusNeighborsTransformer`, `neighbors.KNeighborsTransformer` and `neighbors.NeighborhoodComponentsAnalysis`. #22212 by Meekail Zain.

**Fix** `neighbors.KernelDensity` now validates input parameters in `fit` instead of `__init__`. #21430 by Desislava Vasileva and Lucy Jimenez.

**Fix** `neighbors.KNeighborsRegressor.predict` now works properly when given an array-like input if `KNeighborsRegressor` is first constructed with a callable passed to the `weights` parameter. #22687 by Meekail Zain.

`sklearn.neural_network`

**Enhancement** `neural_network.MLPClassifier` and `neural_network.MLPRegressor` show error messages when optimizers produce non-finite parameter weights. #22150 by Christian Ritter and Norbert Preining.

**Enhancement** Adds `get_feature_names_out` to `neural_network.BernoulliRBM`. #22248 by Thomas Fan.

`sklearn.pipeline`

**Enhancement** Added support for `"passthrough"` in `pipeline.FeatureUnion`. Setting a transformer to `"passthrough"` passes the features through unchanged. #20860 by Shubhraneel Pal.

**Fix** `pipeline.Pipeline` now validates hyper-parameters in `.fit()` instead of `__init__`. #21888 by iofall and Arisa Y.

**Fix** `pipeline.FeatureUnion` no longer validates hyper-parameters in `__init__`. Validation is now handled in `.fit()` and `.fit_transform()`. #21954 by iofall and Arisa Y.

**Fix** Defines `__sklearn_is_fitted__` in `pipeline.FeatureUnion` so that it returns the correct result with `utils.validation.check_is_fitted`. #22953 by randomgeek78.

`sklearn.preprocessing`

**Feature** `preprocessing.OneHotEncoder` now supports grouping infrequent categories into a single feature. Grouping is enabled by specifying how to select infrequent categories with `min_frequency` or `max_categories`. #16018 by Thomas Fan.

**Enhancement** Adds a `subsample` parameter to `preprocessing.KBinsDiscretizer`. This allows specifying a maximum number of samples to be used while fitting the model. The option is only available when `strategy` is set to `quantile`. #21445 by Felipe Bidu and Amanda Dsouza.

**Enhancement** Adds `encoded_missing_value` to `preprocessing.OrdinalEncoder` to configure the encoded value for missing data. #21988 by Thomas Fan.

**Enhancement** Added the `get_feature_names_out` method and a new parameter `feature_names_out` to `preprocessing.FunctionTransformer`. You can set `feature_names_out` to `'one-to-one'` to use the input feature names as the output feature names, or you can set it to a callable that returns the output feature names. This is especially useful when the transformer changes the number of features. If `feature_names_out` is `None` (the default), then `get_feature_names_out` is not defined. #21569 by Aurélien Geron.

**Enhancement** Adds `get_feature_names_out` to `preprocessing.Normalizer`, `preprocessing.KernelCenterer`, `preprocessing.OrdinalEncoder`, and `preprocessing.Binarizer`. #21079 by Thomas Fan.

**Fix** `preprocessing.PowerTransformer` with `method='yeo-johnson'` better supports significantly non-Gaussian data when searching for an optimal lambda. #20653 by Thomas Fan.

**Fix** `preprocessing.LabelBinarizer` now validates input parameters in `fit` instead of `__init__`. #21434 by Krum Arnaudov.

**Fix** `preprocessing.FunctionTransformer` with `check_inverse=True` now provides an informative error message when the input has mixed dtypes. #19916 by Zhehao Liu.

**Fix** `preprocessing.KBinsDiscretizer` now handles bin edges more consistently. #14975 by Andreas Müller and #22526 by Meekail Zain.

**Fix** Adds `preprocessing.KBinsDiscretizer.get_feature_names_out` support when `encode="ordinal"`. #22735 by Thomas Fan.

`sklearn.random_projection`

**Enhancement** Adds an `inverse_transform` method and a `compute_inverse_transform` parameter to `random_projection.GaussianRandomProjection` and `random_projection.SparseRandomProjection`. When the parameter is set to `True`, the pseudo-inverse of the components is computed during `fit` and stored as `inverse_components_`. #21701 by Aurélien Geron.

**Enhancement** `random_projection.SparseRandomProjection` and `random_projection.GaussianRandomProjection` preserve dtype for `numpy.float32`. #22114 by Takeshi Oura.

**Enhancement** Adds `get_feature_names_out` to all transformers in the `sklearn.random_projection` module: `random_projection.GaussianRandomProjection` and `random_projection.SparseRandomProjection`. #21330 by Loïc Estève.

`sklearn.svm`

**Enhancement** `svm.OneClassSVM`, `svm.NuSVC`, `svm.NuSVR`, `svm.SVC` and `svm.SVR` now expose `n_iter_`, the number of iterations of the libsvm optimization routine. #21408 by Juan Martín Loyola.

**Enhancement** `svm.SVR`, `svm.SVC`, `svm.NuSVR`, `svm.OneClassSVM` and `svm.NuSVC` now raise an error when the dual-gap estimation produces non-finite parameter weights. #22149 by Christian Ritter and Norbert Preining.

**Fix** `svm.NuSVC`, `svm.NuSVR`, `svm.SVC`, `svm.SVR` and `svm.OneClassSVM` now validate input parameters in `fit` instead of `__init__`. #21436 by Haidar Almubarak.

`sklearn.tree`

**Enhancement** `tree.DecisionTreeClassifier` and `tree.ExtraTreeClassifier` have the new `criterion="log_loss"`, which is equivalent to `criterion="entropy"`. #23047 by Christian Lorentzen.

**Fix** Fixed a bug in the Poisson splitting criterion for `tree.DecisionTreeRegressor`. #22191 by Christian Lorentzen.

**API Change** Changed the default value of `max_features` to 1.0 for `tree.ExtraTreeRegressor` and to `"sqrt"` for `tree.ExtraTreeClassifier`; this does not change the fit result. The original default value `"auto"` has been deprecated and will be removed in version 1.3. Setting `max_features` to `"auto"` is also deprecated for `tree.DecisionTreeClassifier` and `tree.DecisionTreeRegressor`. #22476 by Zhehao Liu.

`sklearn.utils`

**Enhancement** `utils.check_array` and `utils.multiclass.type_of_target` now accept an `input_name` parameter, which makes the error message more informative when invalid input data is passed (e.g. with NaN or infinite values). #21219 by Olivier Grisel.

**Enhancement** `utils.check_array` returns a float ndarray with `np.nan` when passed a `Float32` or `Float64` pandas extension array with `pd.NA`. #21278 by Thomas Fan.

**Enhancement** `utils.estimator_html_repr` shows a more helpful error message when running in a Jupyter notebook that is not trusted. #21316 by Thomas Fan.

**Enhancement** `utils.estimator_html_repr` displays an arrow in the top left corner of the HTML representation to show that the elements are clickable. #21298 by Thomas Fan.

**Enhancement** `utils.check_array` with `dtype=None` returns numeric arrays when passed a pandas DataFrame with mixed dtypes. `dtype="numeric"` also infers the dtype better when the DataFrame has mixed dtypes. #22237 by Thomas Fan.

**Enhancement** `utils.check_scalar` now has better messages when displaying the type. #22218 by Thomas Fan.

**Fix** Changes the error message of the `ValidationError` raised by `utils.check_X_y` when y is None so that it is compatible with the `check_requires_y_none` estimator check. #22578 by Claudio Salvatore Arcidiacono.

**Fix** `utils.class_weight.compute_class_weight` now only requires that all classes in `y` have a weight in `class_weight`. An error is still raised when a class is present in `y` but not in `class_weight`. #22595 by Thomas Fan.

**Fix** `utils.estimator_html_repr` has an improved visualization for nested meta-estimators. #21310 by Thomas Fan.

**Fix** `utils.check_scalar` raises an error when `include_boundaries={"left", "right"}` and the boundaries are not set. #22027 by Marie Lanternier.

**Fix** `utils.metaestimators.available_if` correctly returns a bound method that can be pickled. #23077 by Thomas Fan.

**API Change** The argument of `utils.estimator_checks.check_estimator` is now called `estimator` (the previous name was `Estimator`). #22188 by Mathurin Massias.

**API Change** `utils.metaestimators.if_delegate_has_method` is deprecated and will be removed in version 1.3. Use `utils.metaestimators.available_if` instead. #22830 by Jérémie du Boisberranger.


## scikit-learn 1.1 Released


**Changed models**

The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.

**Efficiency** `cluster.KMeans` now defaults to `algorithm="lloyd"` instead of `algorithm="auto"`, which was equivalent to `algorithm="elkan"`. Lloyd's algorithm and Elkan's algorithm converge to the same solution, up to numerical rounding errors. In general, however, Lloyd's algorithm uses much less memory, and it is often faster.

**Efficiency** Fitting `tree.DecisionTreeClassifier`, `tree.DecisionTreeRegressor`, `ensemble.RandomForestClassifier`, `ensemble.RandomForestRegressor`, `ensemble.GradientBoostingClassifier`, and `ensemble.GradientBoostingRegressor` is on average 15% faster than in previous versions, thanks to a new sort algorithm for finding the best split. Models might differ because splits with tied criterion values are handled differently: both the old and the new sorting algorithms are unstable. #22868 by Thomas Fan.

**Fix** The eigenvector initialization for `cluster.SpectralClustering` and `manifold.SpectralEmbedding` now samples from a Gaussian when using the `'amg'` or `'lobpcg'` solver. This change improves the numerical stability of the solver, but may result in a different model.

**Fix** `feature_selection.f_regression` and `feature_selection.r_regression` now return finite scores by default instead of `np.nan` and `np.inf` for some corner cases. You can use `force_finite=False` if you really want non-finite values and the old behavior.

**Fix** pandas DataFrames with all non-string columns, such as a MultiIndex, no longer trigger a warning when passed into an estimator. Estimators will continue to ignore the column names in DataFrames with non-string columns. For `feature_names_in_` to be defined, the columns must all be strings. #22410 by Thomas Fan.

**Fix** `preprocessing.KBinsDiscretizer` changed its handling of bin edges slightly, which might result in a different encoding of the same data.

**Fix** `calibration.calibration_curve` changed its handling of bin edges slightly, which might result in a different output curve for the same data.

**Fix** `discriminant_analysis.LinearDiscriminantAnalysis` now uses the correct variance-scaling coefficient, which may result in different model behavior.

**Fix** `feature_selection.SelectFromModel.fit` and `feature_selection.SelectFromModel.partial_fit` can now be called with `prefit=True`. `estimator_` will be a deep copy of `estimator` when `prefit=True`. #23271 by Guillaume Lemaitre.

### Changelog

**Efficiency** Low-level routines for reductions on pairwise distances for dense float64 datasets have been refactored. The following functions and estimators now benefit from improved performance, in terms of both hardware scalability and speed-ups:

- `sklearn.metrics.pairwise_distances_argmin`
- `sklearn.metrics.pairwise_distances_argmin_min`
- `sklearn.cluster.AffinityPropagation`
- `sklearn.cluster.Birch`
- `sklearn.cluster.MeanShift`
- `sklearn.cluster.OPTICS`
- `sklearn.cluster.SpectralClustering`
- `sklearn.feature_selection.mutual_info_regression`
- `sklearn.neighbors.KNeighborsClassifier`
- `sklearn.neighbors.KNeighborsRegressor`
- `sklearn.neighbors.RadiusNeighborsClassifier`
- `sklearn.neighbors.RadiusNeighborsRegressor`
- `sklearn.neighbors.LocalOutlierFactor`
- `sklearn.neighbors.NearestNeighbors`
- `sklearn.manifold.Isomap`
- `sklearn.manifold.LocallyLinearEmbedding`
- `sklearn.manifold.TSNE`
- `sklearn.manifold.trustworthiness`
- `sklearn.semi_supervised.LabelPropagation`
- `sklearn.semi_supervised.LabelSpreading`

For instance, `sklearn.neighbors.NearestNeighbors.kneighbors` and `sklearn.neighbors.NearestNeighbors.radius_neighbors` can be up to ×20 and ×5 faster, respectively, than previously. #21987, #22064, #22065, #22288 and #22320 by Julien Jerphanion.

**Enhancement** All scikit-learn models now generate a more informative error message when some input contains unexpected `NaN` or infinite values. In particular, the message contains the input name ("X", "y" or "sample_weight") and, if an unexpected `NaN` value is found in `X`, suggests potential solutions. #21219 by Olivier Grisel.

**Enhancement** All scikit-learn models now generate a more informative error message when invalid hyper-parameters are set with `set_params`. #21542 by Olivier Grisel.

**Enhancement** Removes random unique identifiers in the HTML representation. With this change, Jupyter notebooks are reproducible as long as the cells are run in the same order. #23098 by Thomas Fan.

**Fix** Estimators with the `non_deterministic` tag set to `True` will skip both the `check_methods_sample_order_invariance` and `check_methods_subset_invariance` tests. #22318 by Zhehao Liu.

**API Change** The option for using the log loss, also known as binomial or multinomial deviance, via the `loss` parameter was made more consistent. The preferred way is to set the value to `"log_loss"`. Old option names are still valid and produce the same models, but are deprecated and will be removed in version 1.3.

- For `ensemble.GradientBoostingClassifier`, the `loss` parameter name `"deviance"` is deprecated in favor of the new name `"log_loss"`, which is now the default. #23036 by Christian Lorentzen.
- For `ensemble.HistGradientBoostingClassifier`, the `loss` parameter names `"auto"`, `"binary_crossentropy"` and `"categorical_crossentropy"` are deprecated in favor of the new name `"log_loss"`, which is now the default. #23040 by Christian Lorentzen.
- For `linear_model.SGDClassifier`, the `loss` parameter name `"log"` is deprecated in favor of the new name `"log_loss"`. #23046 by Christian Lorentzen.
**API Change** The rich HTML representation of estimators is now enabled by default in Jupyter notebooks. It can be deactivated by setting `display='text'` in `sklearn.set_config`. #22856 by Jérémie du Boisberranger.

**Enhancement** The error message is improved when importing `model_selection.HalvingGridSearchCV`, `model_selection.HalvingRandomSearchCV`, or `impute.IterativeImputer` without importing the experimental flag. #23194 by Thomas Fan.

**Enhancement** Added an extension in doc/conf.py to automatically generate the list of estimators that handle NaN values. #23198 by Lise Kleiber, Zhehao Liu and Chiara Marmo.

`sklearn.calibration`

**Enhancement** `calibration.calibration_curve` accepts a parameter `pos_label` to specify the positive class label. #21032 by Guillaume Lemaitre.

**Enhancement** `calibration.CalibratedClassifierCV.fit` now supports passing `fit_params`, which are routed to the `base_estimator`. #18170 by Benjamin Bossan.

**Enhancement** `calibration.CalibrationDisplay` accepts a parameter `pos_label` to add this information to the plot. #21038 by Guillaume Lemaitre.

**Fix** `calibration.calibration_curve` now handles bin edges more consistently. #14975 by Andreas Müller and #22526 by Meekail Zain.

**API Change** The `normalize` parameter of `calibration.calibration_curve` is now deprecated and will be removed in version 1.3. It is recommended that a proper probability (i.e. a classifier's `predict_proba` output for the positive class) is used for `y_prob`. #23095 by Jordan Silke.

`sklearn.cluster`

**Major Feature** `cluster.BisectingKMeans` introduces the Bisecting K-Means algorithm. #20031 by Michal Krawczyk, Tom Dupre la Tour and Jérémie du Boisberranger.

**Enhancement** `cluster.SpectralClustering` and `cluster.spectral_clustering` now include the new `'cluster_qr'` method, which clusters samples in the embedding space, as an alternative to the existing `'kmeans'` and `'discrete'` methods. See `cluster.spectral_clustering` for more details. #21148 by Andrew Knyazev.

**Enhancement** Adds `get_feature_names_out` to `cluster.Birch`, `cluster.FeatureAgglomeration`, `cluster.KMeans`, and `cluster.MiniBatchKMeans`. #22255 by Thomas Fan.

**Enhancement** `cluster.SpectralClustering` now raises consistent error messages when passed invalid values for `n_clusters`, `n_init`, `gamma`, `n_neighbors`, `eigen_tol` or `degree`. #21881 by Hugo Vassard.

**Enhancement** `cluster.AffinityPropagation` now returns cluster centers and labels if they exist, even if the model has not fully converged. When returning these potentially degenerate cluster centers and labels, a new warning message is shown. If no cluster centers were constructed, the cluster centers remain an empty list, the labels are set to `-1`, and the original warning message is shown. #22217 by Meekail Zain.

**Efficiency** In `cluster.KMeans`, the default `algorithm` is now `"lloyd"`, which is the full classical EM-style algorithm. Both `"auto"` and `"full"` are deprecated and will be removed in version 1.3; they are now aliases for `"lloyd"`. The previous default was `"auto"`, which relied on Elkan's algorithm. Lloyd's algorithm uses less memory than Elkan's, is faster on many datasets, and gives identical results, hence the change. #21735 by Aurélien Geron.

**Fix** The `init` parameter of `cluster.KMeans` now properly supports array-like input and NumPy string scalars. #22154 by Thomas Fan.

`sklearn.compose`

**Fix** `compose.ColumnTransformer` now removes validation errors from the `__init__` and `set_params` methods. #22537 by iofall and Arisa Y.

**Fix** The `get_feature_names_out` functionality in `compose.ColumnTransformer` was broken when columns were specified using `slice`. This is fixed in #22775 and #22913 by randomgeek78.

`sklearn.covariance`

**Fix** `covariance.GraphicalLassoCV` now accepts a NumPy array for the parameter `alphas`. #22493 by Guillaume Lemaitre.

`sklearn.cross_decomposition`

**Enhancement** The `inverse_transform` method of `cross_decomposition.PLSRegression`, `cross_decomposition.PLSCanonical` and `cross_decomposition.CCA` now also allows reconstruction of the target when a `Y` parameter is given. #19680 by Robin Thibaut.

**Enhancement** Adds `get_feature_names_out` to all transformers in the `cross_decomposition` module: `cross_decomposition.CCA`, `cross_decomposition.PLSSVD`, `cross_decomposition.PLSRegression`, and `cross_decomposition.PLSCanonical`. #22119 by Thomas Fan.

**Fix** The shape of the `coef_` attribute of `cross_decomposition.CCA`, `cross_decomposition.PLSCanonical` and `cross_decomposition.PLSRegression` will change in version 1.3, from `(n_features, n_targets)` to `(n_targets, n_features)`. This makes it consistent with other linear models and lets it work with interfaces that expect a specific shape for `coef_` (e.g. `feature_selection.RFE`). #22016 by Guillaume Lemaitre.

**API Change** Added the fitted attribute `intercept_` to `cross_decomposition.PLSCanonical`, `cross_decomposition.PLSRegression`, and `cross_decomposition.CCA`. The method `predict` is indeed equivalent to `Y = X @ coef_ + intercept_`. #22015 by Guillaume Lemaitre.

`sklearn.datasets`

**Feature** `datasets.load_files` now accepts an ignore list and an allow list based on file extensions. #19747 by Tony Attalla and #22498 by Meekail Zain.

**Enhancement** `datasets.make_swiss_roll` now supports the optional argument `hole`; when set to `True`, it returns the swiss-hole dataset. #21482 by Sebastian Pujalte.

**Enhancement** `datasets.make_blobs` no longer copies data during the generation process, and therefore uses less memory. #22412 by Zhehao Liu.

**Enhancement** `datasets.load_diabetes` now accepts the parameter `scaled`, to allow loading unscaled data. The scaled version of this dataset is now computed from the unscaled data, and can produce slightly different results than in previous versions (within a 1e-4 absolute tolerance). #16605 by Mandy Gu.

**Enhancement** `datasets.fetch_openml` now has two optional arguments, `n_retries` and `delay`. By default, `datasets.fetch_openml` will retry 3 times in case of a network failure, with a delay between each try. #21901 by Rileran.

**Fix** `datasets.fetch_covtype` is now concurrent-safe: data is downloaded to a temporary directory before being moved to the data directory. #23113 by Ilion Beyst.

**API Change** `datasets.make_sparse_coded_signal` now accepts a parameter `data_transposed` to explicitly specify the shape of matrix `X`. The default behavior, `True`, returns a transposed matrix `X` with shape `(n_features, n_samples)`. The default value will change to `False` in version 1.3. #21425 by Gabriel Stefanini Vicente.

`sklearn.decomposition`

**Major Feature** Added a new estimator, `decomposition.MiniBatchNMF`. It is a faster but less accurate version of non-negative matrix factorization, better suited for large datasets. #16948 by Chiara Marmo, Patricio Cerda and Jérémie du Boisberranger.

**Enhancement** `decomposition.dict_learning`, `decomposition.dict_learning_online` and `decomposition.sparse_encode` preserve dtype for `numpy.float32`. `decomposition.DictionaryLearning`, `decomposition.MiniBatchDictionaryLearning` and `decomposition.SparseCoder` preserve dtype for `numpy.float32`. #22002 by Takeshi Oura.

**Enhancement** `decomposition.PCA` exposes a parameter `n_oversamples` to tune `utils.randomized_svd` and get accurate results when the number of features is large. #21109 by Smile.

**Enhancement** `decomposition.MiniBatchDictionaryLearning` and `decomposition.dict_learning_online` have been refactored. They now have a stopping criterion based on a small change of the dictionary or objective function, controlled by the new `max_iter`, `tol` and `max_no_improvement` parameters. In addition, some of their parameters and attributes are deprecated:

- the `n_iter` parameter of both is deprecated. Use `max_iter` instead.
- the `iter_offset`, `return_inner_stats`, `inner_stats` and `return_n_iter` parameters of `decomposition.dict_learning_online` serve internal purposes and are deprecated.
- the `inner_stats_`, `iter_offset_` and `random_state_` attributes of `decomposition.MiniBatchDictionaryLearning` serve internal purposes and are deprecated.
- the default value of the `batch_size` parameter of both will change from 3 to 256 in version 1.3.
**Enhancement** `decomposition.SparsePCA` and `decomposition.MiniBatchSparsePCA` preserve dtype for `numpy.float32`. #22111 by Takeshi Oura.

**Enhancement** `decomposition.TruncatedSVD` now allows `n_components == n_features` if `algorithm='randomized'`. #22181 by Zach Deane-Mayer.

**Enhancement** Adds `get_feature_names_out` to all transformers in the `decomposition` module: `decomposition.DictionaryLearning`, `decomposition.FactorAnalysis`, `decomposition.FastICA`, `decomposition.IncrementalPCA`, `decomposition.KernelPCA`, `decomposition.LatentDirichletAllocation`, `decomposition.MiniBatchDictionaryLearning`, `decomposition.MiniBatchSparsePCA`, `decomposition.NMF`, `decomposition.PCA`, `decomposition.SparsePCA`, and `decomposition.TruncatedSVD`. #21334 by Thomas Fan.

**Enhancement** `decomposition.TruncatedSVD` exposes the parameters `n_oversamples` and `power_iteration_normalizer` to tune `utils.randomized_svd`. This helps get accurate results when the number of features is large, the rank of the matrix is high, or other properties of the matrix make low-rank approximation difficult. #21705 by Jay S. Stanley III.

**Enhancement** `decomposition.PCA` exposes the parameter `power_iteration_normalizer` to tune `utils.randomized_svd` and get more accurate results when low-rank approximation is difficult. #21705 by Jay S. Stanley III.

**Fix** `decomposition.FastICA` now validates input parameters in `fit` instead of `__init__`. #21432 by Hannah Bohle and Maren Westermann.

**Fix** `decomposition.FastICA` now accepts `np.float32` data without silent upcasting. The dtype is preserved by `fit` and `fit_transform`, and the main fitted attributes use a dtype of the same precision as the training data. #22806 by Jihane Bennis and Olivier Grisel.

**Fix** `decomposition.FactorAnalysis` now validates input parameters in `fit` instead of `__init__`. #21713 by Haya and Krum Arnaudov.

**Fix** `decomposition.KernelPCA` now validates input parameters in `fit` instead of `__init__`. #21567 by Maggie Chege.

**Fix** `decomposition.PCA` and `decomposition.IncrementalPCA` more safely calculate the precision using the inverse of the covariance matrix if `self.noise_variance_` is zero. #22300 by Meekail Zain and #15948 by @sysuresh.

**Fix** Greatly reduced peak memory usage in `decomposition.PCA` when calling `fit` or `fit_transform`. #22553 by Meekail Zain.

**API Change** `decomposition.FastICA` now supports unit variance for whitening. The default value of its `whiten` argument will change from `True` (which behaves like `'arbitrary-variance'`) to `'unit-variance'` in version 1.3. #19490 by Facundo Ferrin and Julien Jerphanion.

`sklearn.discriminant_analysis`

**Enhancement** Adds `get_feature_names_out` to `discriminant_analysis.LinearDiscriminantAnalysis`. #22120 by Thomas Fan.

**Fix** `discriminant_analysis.LinearDiscriminantAnalysis` now uses the correct variance-scaling coefficient, which may result in different model behavior. #15984 by Okon Samuel and #22696 by Meekail Zain.

`sklearn.dummy`

**Fix** `dummy.DummyRegressor` no longer overrides the `constant` parameter during `fit`. #22486 by Thomas Fan.

`sklearn.ensemble`

**Major Feature** Added the option `loss="quantile"` to `ensemble.HistGradientBoostingRegressor` for modelling quantiles. The quantile level can be specified with the new parameter `quantile`. #21800 and #20567 by Christian Lorentzen.

**Efficiency** `fit` of `ensemble.GradientBoostingClassifier` and `ensemble.GradientBoostingRegressor` now calls `utils.check_array` with parameter `force_all_finite=False` for non-initial warm-start runs, because the input has already been checked. #22159 by Geoffrey Paris.

**Enhancement** `ensemble.HistGradientBoostingClassifier` is faster for binary and, in particular, multiclass problems, thanks to the new private loss function module. #20811, #20567 and #21814 by Christian Lorentzen.

**Enhancement** Adds support for using pre-fit models with `cv="prefit"` in `ensemble.StackingClassifier` and `ensemble.StackingRegressor`. #16748 by Siqi He and #22215 by Meekail Zain.

**Enhancement** `ensemble.RandomForestClassifier` and `ensemble.ExtraTreesClassifier` have the new `criterion="log_loss"`, which is equivalent to `criterion="entropy"`. #23047 by Christian Lorentzen.

**Enhancement** Adds `get_feature_names_out` to `ensemble.VotingClassifier`, `ensemble.VotingRegressor`, `ensemble.StackingClassifier`, and `ensemble.StackingRegressor`. #22695 and #22697 by Thomas Fan.

**Enhancement** `ensemble.RandomTreesEmbedding` now has an informative `get_feature_names_out` function that includes both the tree index and the leaf index in the output feature names. #21762 by Zhehao Liu and Thomas Fan.

**Efficiency** Fitting an `ensemble.RandomForestClassifier`, `ensemble.RandomForestRegressor`, `ensemble.ExtraTreesClassifier`, `ensemble.ExtraTreesRegressor`, or `ensemble.RandomTreesEmbedding` is now faster in a multiprocessing setting, especially for subsequent fits with `warm_start` enabled. #22106 by Pieter Gijsbers.

**Fix** `ensemble.GradientBoostingClassifier` and `ensemble.GradientBoostingRegressor` now raise an error if anything other than a float is passed as the `validation_fraction` parameter. #21632 by Genesis Valencia.

**Fix** Removed a potential source of CPU oversubscription in `ensemble.HistGradientBoostingClassifier` and `ensemble.HistGradientBoostingRegressor` when CPU resource usage is limited, for instance by a cgroups quota in a Docker container. #22566 by Jérémie du Boisberranger.

**Fix** `ensemble.HistGradientBoostingClassifier` and `ensemble.HistGradientBoostingRegressor` no longer warn when fitting on a pandas DataFrame with a non-default `scoring` parameter and `early_stopping` enabled. #22908 by Thomas Fan.

**Fix** Fixes the HTML repr for `ensemble.StackingClassifier` and `ensemble.StackingRegressor`. #23097 by Thomas Fan.

**API Change** The attribute `loss_` of `ensemble.GradientBoostingClassifier` and `ensemble.GradientBoostingRegressor` has been deprecated and will be removed in version 1.3. #23079 by Christian Lorentzen.

**API Change** Changed the default of `max_features` to 1.0 for `ensemble.RandomForestRegressor` and to `"sqrt"` for `ensemble.RandomForestClassifier`. These give the same fit results as before but are much easier to understand. The old default value `"auto"` has been deprecated and will be removed in version 1.3. The same changes also apply to `ensemble.ExtraTreesRegressor` and `ensemble.ExtraTreesClassifier`. #20803 by Brian Sun.

**Efficiency** Improved runtime performance of `ensemble.IsolationForest` by skipping repetitive input checks. #23149 by Zhehao Liu.

`sklearn.feature_extraction`

**Feature** `feature_extraction.FeatureHasher` now supports PyPy. #23023 by Thomas Fan.

**Fix** `feature_extraction.FeatureHasher` now validates input parameters in `transform` instead of `__init__`. #21573 by Hannah Bohle and Maren Westermann.

**Fix** `feature_extraction.text.TfidfVectorizer` no longer creates a `feature_extraction.text.TfidfTransformer` at `__init__`, as required by our API. #21832 by Guillaume Lemaitre.

`sklearn.feature_selection`

**Feature** Added an auto mode to `feature_selection.SequentialFeatureSelector`. If the argument `n_features_to_select` is `'auto'`, features are selected until the score improvement does not exceed the argument `tol`. The default value of `n_features_to_select` changed from `None` to `'warn'` in 1.1 and will become `'auto'` in 1.3. `None` and `'warn'` will be removed in 1.3. #20145 by murata-yu.

**Feature** Added the ability to pass callables to the `max_features` parameter of `feature_selection.SelectFromModel`. Also introduced the new attribute `max_features_`, which is inferred from `max_features` and the data during `fit`. If `max_features` is an integer, then `max_features_ = max_features`. If `max_features` is a callable, then `max_features_ = max_features(X)`. #22356 by Meekail Zain.

**Enhancement** `feature_selection.GenericUnivariateSelect` preserves float32 dtype. #18482 by Thierry Gameiro and Daniel Kharsa and #22370 by Meekail Zain.

**Enhancement** Adds a parameter `force_finite` to `feature_selection.f_regression` and `feature_selection.r_regression`. It forces the output to be finite when a feature or the target is constant, or when the feature and target are perfectly correlated (the latter only for the F-statistic). #17819 by Juan Carlos Alfaro Jiménez.

**Efficiency** Improved runtime performance of `feature_selection.chi2` with boolean arrays. #22235 by Thomas Fan.

**Efficiency** Reduced memory usage of `feature_selection.chi2`. #21837 by Louis Wagner.

`sklearn.gaussian_process`

**Fix** The `predict` and `sample_y` methods of `gaussian_process.GaussianProcessRegressor` now return arrays of the correct shape in single-target and multi-target cases, for both `normalize_y=False` and `normalize_y=True`. #22199 by Guillaume Lemaitre, Aidar Shakerimoff and Tenavi Nakamura-Zimmerer.

**Fix** `gaussian_process.GaussianProcessClassifier` raises a more informative error if `CompoundKernel` is passed via `kernel`. #22223 by MarcoM.

`sklearn.impute`

**Enhancement** `impute.SimpleImputer` now includes the feature names in its warning when features are skipped because they have no observed values in the training set. #21617 by Christian Ritter.

**Enhancement** Added support for `pd.NA` in `impute.SimpleImputer`. #21114 by Ying Xiong.

**Enhancement** Adds `get_feature_names_out` to `impute.SimpleImputer`, `impute.KNNImputer`, `impute.IterativeImputer`, and `impute.MissingIndicator`. #21078 by Thomas Fan.

**API Change** The `verbose` parameter was deprecated for `impute.SimpleImputer`. A warning will always be raised upon the removal of empty columns. #21448 by Oleh Kozynets and Christian Ritter.

`sklearn.inspection`

**Feature** Adds a display to plot the decision boundary of a classifier, via the method `inspection.DecisionBoundaryDisplay.from_estimator`. #16061 by Thomas Fan.

**Enhancement** `inspection.PartialDependenceDisplay.from_estimator` now allows `kind` to accept a list of strings, specifying which type of plot to draw for each feature interaction. #19438 by Guillaume Lemaitre.

**Enhancement** `inspection.PartialDependenceDisplay.from_estimator`, `inspection.PartialDependenceDisplay.plot`, and `inspection.plot_partial_dependence` now support plotting centered Individual Conditional Expectation (cICE) and centered PDP curves, controlled by the parameter `centered`. #18310 by Johannes Elfner and Guillaume Lemaitre.

`sklearn.isotonic`

**Enhancement** Adds `get_feature_names_out` to `isotonic.IsotonicRegression`. #22249 by Thomas Fan.

`sklearn.kernel_approximation`

**Enhancement** Adds `get_feature_names_out` to `kernel_approximation.AdditiveChi2Sampler`, `kernel_approximation.Nystroem`, `kernel_approximation.PolynomialCountSketch`, `kernel_approximation.RBFSampler`, and `kernel_approximation.SkewedChi2Sampler`. #22137 and #22694 by Thomas Fan.

`sklearn.linear_model`

**Feature** `linear_model.ElasticNet`, `linear_model.ElasticNetCV`, `linear_model.Lasso` and `linear_model.LassoCV` support `sample_weight` for sparse input `X`. #22808 by Christian Lorentzen.

**Feature** `linear_model.Ridge` with `solver="lsqr"` now supports fitting sparse input with `fit_intercept=True`. #22950 by Christian Lorentzen.

**Enhancement** `linear_model.QuantileRegressor` supports sparse input for the highs-based solvers. #21086 by Venkatachalam Natchiappan. In addition, those solvers now use the CSC matrix right from the beginning, which speeds up fitting. #22206 by Christian Lorentzen.

**Enhancement** `linear_model.LogisticRegression` is faster for `solver="lbfgs"` and `solver="newton-cg"`, on binary and especially multiclass problems, thanks to the new private loss function module. In the multiclass case, these solvers also use less memory, because the target is now label encoded (mapped to integers) instead of label binarized (one-hot encoded). The more classes, the larger the benefit. #21808, #20567 and #21814 by Christian Lorentzen.

**Enhancement** `linear_model.GammaRegressor`, `linear_model.PoissonRegressor` and `linear_model.TweedieRegressor` are faster with the `"lbfgs"` solver. #22548, #21808 and #20567 by Christian Lorentzen.

**Enhancement** Renamed the parameter `base_estimator` to `estimator` in `linear_model.RANSACRegressor` to improve readability and consistency. `base_estimator` is deprecated and will be removed in 1.3. #22062 by Adrian Trujillo.

**Enhancement** `linear_model.ElasticNet` and other linear model classes using coordinate descent show error messages when non-finite parameter weights are produced. #22148 by Christian Ritter and Norbert Preining.

**Enhancement** `linear_model.ElasticNet` and `linear_model.Lasso` now raise consistent error messages when passed invalid values for `l1_ratio`, `alpha`, `max_iter` and `tol`. #22240 by Arturo Amor.

**Enhancement** `linear_model.BayesianRidge` and `linear_model.ARDRegression` now preserve float32 dtype. #9087 by Arthur Imbert and #22525 by Meekail Zain.

**Enhancement** `linear_model.RidgeClassifier` now supports multilabel classification. #19689 by Guillaume Lemaitre.

**Enhancement** `linear_model.RidgeCV` and `linear_model.RidgeClassifierCV` now raise consistent error messages when passed invalid values for `alphas`. #21606 by Arturo Amor.

**Enhancement** `linear_model.Ridge` and `linear_model.RidgeClassifier` now raise consistent error messages when passed invalid values for `alpha`, `max_iter` and `tol`. #21341 by Arturo Amor.

**Enhancement** `linear_model.orthogonal_mp_gram` preserves dtype for `numpy.float32`. #22002 by Takeshi Oura.

**Fix** `linear_model.LassoLarsIC` now correctly computes AIC and BIC. An error is now raised when `n_features > n_samples` and the noise variance is not provided. #21481 by Guillaume Lemaitre and Andrés Babino.

**Fix** `linear_model.TheilSenRegressor` now validates the input parameter `max_subpopulation` in `fit` instead of `__init__`. #21767 by Maren Westermann.

**Fix** `linear_model.ElasticNetCV` now produces the correct warning when `l1_ratio=0`. #21724 by Yar Khine Phyo.

**Fix** `linear_model.LogisticRegression` and `linear_model.LogisticRegressionCV` now set the `n_iter_` attribute with a shape that respects the docstring. This shape is consistent with the one obtained from the other solvers in the one-vs-rest setting. Previously, only the maximum number of iterations across the binary sub-problems was recorded; now all of them are. #21998 by Olivier Grisel.

**Fix** The property `family` of `linear_model.TweedieRegressor` is no longer validated in `__init__`. Instead, this (private) property is deprecated in `linear_model.GammaRegressor`, `linear_model.PoissonRegressor` and `linear_model.TweedieRegressor`, and will be removed in 1.3. #22548 by Christian Lorentzen.

**Fix** The `coef_` and `intercept_` attributes of `linear_model.LinearRegression` are now correctly computed in the presence of sample weights when the input is sparse. #22891 by Jérémie du Boisberranger.

**Fix** The `coef_` and `intercept_` attributes of `linear_model.Ridge` with `solver="sparse_cg"` and `solver="lbfgs"` are now correctly computed in the presence of sample weights when the input is sparse. #22899 by Jérémie du Boisberranger.

**Fix** `linear_model.SGDRegressor` and `linear_model.SGDClassifier` now compute the validation error correctly when early stopping is enabled. #23256 by Zhehao Liu.

**API Change** `linear_model.LassoLarsIC` now exposes `noise_variance` as a parameter, so that an estimate of the noise variance can be provided. This is particularly relevant when `n_features > n_samples` and the noise variance cannot be estimated from the data. #21481 by Guillaume Lemaitre.

`sklearn.manifold`

**Feature** `manifold.Isomap` now supports radius-based neighbors via the `radius` argument. #19794 by Zhehao Liu.

**Enhancement** `manifold.spectral_embedding` and `manifold.SpectralEmbedding` support the `np.float32` dtype and preserve it. #21534 by Andrew Knyazev.

**Enhancement** Adds `get_feature_names_out` to `manifold.Isomap` and `manifold.LocallyLinearEmbedding`. #22254 by Thomas Fan.

**Enhancement** Added `metric_params` to the `manifold.TSNE` constructor, for passing additional parameters to the distance metric used in the optimization. #21805 by Jeanne Dionisi and #22685 by Meekail Zain.

**Enhancement** `manifold.trustworthiness` raises an error if `n_neighbors >= n_samples / 2`, to ensure the function is only used where it is well supported. #18832 by Hong Shao Yang and #23033 by Meekail Zain.

**Fix** `manifold.spectral_embedding` now uses Gaussian random initial approximations to the eigenvectors, instead of the previous uniform ones on [0, 1], in the `lobpcg` and `amg` eigen solvers. This improves their numerical stability. #21565 by Andrew Knyazev.

`sklearn.metrics`

**Feature** `metrics.r2_score` and `metrics.explained_variance_score` have a new `force_finite` parameter. Setting it to `False` returns the actual non-finite score for perfect predictions or constant `y_true`. By default, the finite approximations `1.0` and `0.0` are returned instead. #17266 by Sylvain Marié.

**Feature** `metrics.d2_pinball_score` and `metrics.d2_absolute_error_score` calculate the D2 regression score for the pinball loss and the absolute error, respectively. `metrics.d2_absolute_error_score` is a special case of `metrics.d2_pinball_score` with a fixed quantile parameter `alpha=0.5`, for ease of use and discovery. The D2 scores are generalizations of the `r2_score` and can be interpreted as the fraction of deviance explained. #22118 by Ohad Michel.

**Enhancement** `metrics.top_k_accuracy_score` raises an improved error message when `y_true` is binary and `y_score` is 2d. #22284 by Thomas Fan.

**Enhancement** `metrics.roc_auc_score` now supports `average=None` in the multiclass case when `multiclass='ovr'`, which returns the score per class. #19158 by Nicki Skafte.

**Enhancement** Adds the `im_kw` parameter to `metrics.ConfusionMatrixDisplay.from_estimator`, `metrics.ConfusionMatrixDisplay.from_predictions`, and `metrics.ConfusionMatrixDisplay.plot`. The `im_kw` parameter is passed to the `matplotlib.pyplot.imshow` call when plotting the confusion matrix. #20753 by Thomas Fan.

**Fix** `metrics.silhouette_score` now supports integer input for precomputed distances. #22108 by Thomas Fan.

**Fix** Fixed a bug in `metrics.normalized_mutual_info_score` which could return unbounded values. #22635 by Jérémie du Boisberranger.

**Fix** Fixes `metrics.precision_recall_curve` and `metrics.average_precision_score` when all true labels are negative. #19085 by Varun Agrawal.

**API Change** `metrics.SCORERS` is now deprecated and will be removed in 1.3. Please use `metrics.get_scorer_names` to retrieve the names of all available scorers. #22866 by Adrin Jalali.

**API Change** The parameters `sample_weight` and `multioutput` of `metrics.mean_absolute_percentage_error` are now keyword-only, in accordance with SLEP009. A deprecation cycle was introduced. #21576 by Paul-Emile Dugnat.

**API Change** The `"wminkowski"` metric of `metrics.DistanceMetric` is deprecated and will be removed in version 1.3. Instead, the existing `"minkowski"` metric now takes an optional `w` parameter for weights. This deprecation keeps scikit-learn consistent with the SciPy 1.8 convention. #21873 by Yar Khine Phyo.

**API Change** `metrics.DistanceMetric` has been moved from `sklearn.neighbors` to `sklearn.metrics`. Importing it as `neighbors.DistanceMetric` still works for backward compatibility, but this alias will be removed in 1.3. #21177 by Julien Jerphanion.

`sklearn.mixture`

**Enhancement** `mixture.GaussianMixture` and `mixture.BayesianGaussianMixture` can now be initialized using k-means++ and random data points. #20408 by Gordon Walsh, Alberto Ceballos and Andres Rios.

**Fix** Fixed a bug so that `precisions_cholesky_` is correctly initialized in `mixture.GaussianMixture` when `precisions_init` is provided, by taking its square root. #22058 by Guillaume Lemaitre.

**Fix** `mixture.GaussianMixture` now normalizes `weights_` more safely, preventing rounding errors when calling `mixture.GaussianMixture.sample` with `n_components=1`. #23034 by Meekail Zain.

`sklearn.model_selection`

**Enhancement** It is now possible to pass `scoring="matthews_corrcoef"` to all model selection tools with a `scoring` argument, to use the Matthews correlation coefficient (MCC). #22203 by Olivier Grisel.

**Enhancement** Cross-validation now raises an error when the fits for all the splits fail. Similarly, grid search now raises an error when the fits for all the models and all the splits fail. #21026 by Loïc Estève.

**Fix** `model_selection.GridSearchCV` and `model_selection.HalvingGridSearchCV` now validate input parameters in `fit` instead of `__init__`. #21880 by Mrinal Tyagi.

**Fix** `model_selection.learning_curve` now supports `partial_fit` with regressors. #22982 by Thomas Fan.

`sklearn.multiclass`

**Enhancement** `multiclass.OneVsRestClassifier` now supports a `verbose` parameter, so that fitting progress can be seen. #22508 by Chris Combs.

**Fix** `multiclass.OneVsOneClassifier.predict` returns correct predictions when the inner classifier only has a `predict_proba` method. #22604 by Thomas Fan.

`sklearn.neighbors`

**Enhancement** Adds `get_feature_names_out` to `neighbors.RadiusNeighborsTransformer`, `neighbors.KNeighborsTransformer` and `neighbors.NeighborhoodComponentsAnalysis`. #22212 by Meekail Zain.

**Fix** `neighbors.KernelDensity` now validates input parameters in `fit` instead of `__init__`. #21430 by Desislava Vasileva and Lucy Jimenez.

**Fix** `neighbors.KNeighborsRegressor.predict` now works properly when given an array-like input if `KNeighborsRegressor` is first constructed with a callable passed to the `weights` parameter. #22687 by Meekail Zain.

`sklearn.neural_network`

**Enhancement** `neural_network.MLPClassifier` and `neural_network.MLPRegressor` show error messages when optimizers produce non-finite parameter weights. #22150 by Christian Ritter and Norbert Preining.

**Enhancement** Adds `get_feature_names_out` to `neural_network.BernoulliRBM`. #22248 by Thomas Fan.

`sklearn.pipeline`

**Enhancement** Added support for `"passthrough"` in `pipeline.FeatureUnion`. Setting a transformer to `"passthrough"` passes the features through unchanged. #20860 by Shubhraneel Pal.

**Fix** `pipeline.Pipeline` no longer validates hyper-parameters in `__init__`; validation now happens in `.fit()`. #21888 by iofall and Arisa Y.

**Fix** `pipeline.FeatureUnion` does not validate hyper-parameters in `__init__`. Validation is now handled in `.fit()` and `.fit_transform()`. #21954 by iofall and Arisa Y.

**Fix** Defines `__sklearn_is_fitted__` in `pipeline.FeatureUnion` so that `utils.validation.check_is_fitted` returns the correct result. #22953 by randomgeek78.

`sklearn.preprocessing`

**Feature** `preprocessing.OneHotEncoder` now supports grouping infrequent categories into a single feature. Grouping is enabled by specifying how infrequent categories are selected, with `min_frequency` or `max_categories`. #16018 by Thomas Fan.

**Enhancement** Adds a `subsample` parameter to `preprocessing.KBinsDiscretizer`. This allows specifying a maximum number of samples to be used while fitting the model. The option is only available when `strategy` is set to `"quantile"`. #21445 by Felipe Bidu and Amanda Dsouza.

**Enhancement** Adds `encoded_missing_value` to `preprocessing.OrdinalEncoder` to configure the encoded value for missing data. #21988 by Thomas Fan.

**Enhancement** Added the `get_feature_names_out` method and a new parameter `feature_names_out` to `preprocessing.FunctionTransformer`. You can set `feature_names_out` to `'one-to-one'` to use the input feature names as the output feature names. Alternatively, set it to a callable that returns the output feature names. This is especially useful when the transformer changes the number of features. If `feature_names_out` is `None` (the default), then `get_feature_names_out` is not defined. #21569 by Aurélien Geron.

**Enhancement** Adds `get_feature_names_out` to `preprocessing.Normalizer`, `preprocessing.KernelCenterer`, `preprocessing.OrdinalEncoder`, and `preprocessing.Binarizer`. #21079 by Thomas Fan.

**Fix** `preprocessing.PowerTransformer` with `method='yeo-johnson'` better supports significantly non-Gaussian data when searching for an optimal lambda. #20653 by Thomas Fan.

**Fix** `preprocessing.LabelBinarizer` now validates input parameters in `fit` instead of `__init__`. #21434 by Krum Arnaudov.

**Fix** `preprocessing.FunctionTransformer` with `check_inverse=True` now provides an informative error message when the input has mixed dtypes. #19916 by Zhehao Liu.

**Fix** `preprocessing.KBinsDiscretizer` now handles bin edges more consistently. #14975 by Andreas Müller and #22526 by Meekail Zain.

**Fix** Adds `preprocessing.KBinsDiscretizer.get_feature_names_out` support when `encode="ordinal"`. #22735 by Thomas Fan.

`sklearn.random_projection`

**Enhancement** Adds an `inverse_transform` method and a `compute_inverse_transform` parameter to `random_projection.GaussianRandomProjection` and `random_projection.SparseRandomProjection`. When the parameter is set to `True`, the pseudo-inverse of the components is computed during `fit` and stored as `inverse_components_`. #21701 by Aurélien Geron.

**Enhancement** `random_projection.SparseRandomProjection` and `random_projection.GaussianRandomProjection` preserve dtype for `numpy.float32`. #22114 by Takeshi Oura.

**Enhancement** Adds `get_feature_names_out` to all transformers in the `sklearn.random_projection` module: `random_projection.GaussianRandomProjection` and `random_projection.SparseRandomProjection`. #21330 by Loïc Estève.

`sklearn.svm`

**Enhancement** `svm.OneClassSVM`, `svm.NuSVC`, `svm.NuSVR`, `svm.SVC` and `svm.SVR` now expose `n_iter_`, the number of iterations of the libsvm optimization routine. #21408 by Juan Martín Loyola.

**Enhancement** `svm.SVR`, `svm.SVC`, `svm.NuSVR`, `svm.OneClassSVM` and `svm.NuSVC` now raise an error when the dual-gap estimation produces non-finite parameter weights. #22149 by Christian Ritter and Norbert Preining.

**Fix** `svm.NuSVC`, `svm.NuSVR`, `svm.SVC`, `svm.SVR` and `svm.OneClassSVM` now validate input parameters in `fit` instead of `__init__`. #21436 by Haidar Almubarak.

`sklearn.tree`

**Enhancement** `tree.DecisionTreeClassifier` and `tree.ExtraTreeClassifier` have the new `criterion="log_loss"`, which is equivalent to `criterion="entropy"`. #23047 by Christian Lorentzen.

**Fix** Fixed a bug in the Poisson splitting criterion for `tree.DecisionTreeRegressor`. #22191 by Christian Lorentzen.

**API Change** Changed the default value of `max_features` to 1.0 for `tree.ExtraTreeRegressor` and to `"sqrt"` for `tree.ExtraTreeClassifier`, which does not change the fit result. The original default value `"auto"` has been deprecated and will be removed in version 1.3. Setting `max_features` to `"auto"` is also deprecated for `tree.DecisionTreeClassifier` and `tree.DecisionTreeRegressor`. #22476 by Zhehao Liu.

`sklearn.utils`

**Enhancement** `utils.check_array` and `utils.multiclass.type_of_target` now accept an `input_name` parameter, which makes the error message more informative when they are passed invalid input data (e.g. with NaN or infinite values). #21219 by Olivier Grisel.

**Enhancement** `utils.check_array` returns a float ndarray with `np.nan` when passed a `Float32` or `Float64` pandas extension array with `pd.NA`. #21278 by Thomas Fan.

**Enhancement** `utils.estimator_html_repr` shows a more helpful error message when running in a Jupyter notebook that is not trusted. #21316 by Thomas Fan.

**Enhancement** `utils.estimator_html_repr` displays an arrow in the top left corner of the HTML representation to show that the elements are clickable. #21298 by Thomas Fan.

**Enhancement** `utils.check_array` with `dtype=None` returns numeric arrays when passed a pandas DataFrame with mixed dtypes. `dtype="numeric"` also infers the dtype better when the DataFrame has mixed dtypes. #22237 by Thomas Fan.

**Enhancement** `utils.check_scalar` now has better messages when displaying the type. #22218 by Thomas Fan.

**Fix** Changes the error message of the `ValidationError` raised by `utils.check_X_y` when `y` is `None`, so that it is compatible with the `check_requires_y_none` estimator check. #22578 by Claudio Salvatore Arcidiacono.

**Fix** `utils.class_weight.compute_class_weight` now only requires that all classes in `y` have a weight in `class_weight`. An error is still raised when a class is present in `y` but not in `class_weight`. #22595 by Thomas Fan.

**Fix** `utils.estimator_html_repr` has an improved visualization for nested meta-estimators. #21310 by Thomas Fan.

**Fix** `utils.check_scalar` raises an error when `include_boundaries={"left", "right"}` and the boundaries are not set. #22027 by Marie Lanternier.

**Fix** `utils.metaestimators.available_if` correctly returns a bound method that can be pickled. #23077 by Thomas Fan.

**API Change** The argument of `utils.estimator_checks.check_estimator` is now called `estimator` (the previous name was `Estimator`). #22188 by Mathurin Massias.

**API Change** `utils.metaestimators.if_delegate_has_method` is deprecated and will be removed in version 1.3. Use `utils.metaestimators.available_if` instead. #22830 by Jérémie du Boisberranger.
