When using a dataset for analysis, you must check your data to ensure it only contains finite numbers and no NaN values (Not a Number). If you try to pass a dataset that contains NaN or infinity values to a function for analysis, you will raise the error: ValueError: input contains nan, infinity or a value too large for dtype(‘float64’).

To solve this error, you can check your data set for NaN values using numpy.isnan() and for infinite values using numpy.isfinite(). You can replace NaN values using numpy.nan_to_num() if your data is in a NumPy array, or using Scikit-Learn’s SimpleImputer.

This tutorial will go through the error in detail and how to solve it with the help of code examples.


Python ValueError: input contains nan, infinity or a value too large for dtype(‘float64’)

What is a ValueError?

In Python, a value is the information stored within a particular object. You will encounter a ValueError in Python when you use a built-in operation or function that receives an argument with the right type but an inappropriate value.

What is a NaN in Python?

In Python, a NaN stands for Not a Number and represents undefined entries and missing values in a dataset.
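A short sketch of how NaN behaves in practice may help here. It uses only the standard library math module and NumPy, and the variable names are illustrative rather than taken from the examples later in this tutorial.

```python
import math

import numpy as np

# Create NaN values with the built-in float type and with NumPy
nan_builtin = float('nan')
nan_numpy = np.nan

# NaN is never equal to anything, including itself
print(nan_builtin == nan_builtin)  # False

# Use math.isnan() or np.isnan() to detect NaN reliably
print(math.isnan(nan_builtin))     # True
print(np.isnan(nan_numpy))         # True

# NaN propagates through arithmetic
nan_sum = nan_builtin + 1.0
print(math.isnan(nan_sum))         # True
```

Because NaN compares unequal to itself, a check such as value == np.nan will not find missing entries, which is why the dedicated isnan() functions are used throughout.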

What is inf in Python?

Infinity in Python is a number that is greater than every other numeric value and can either be positive or negative. Most arithmetic operations performed on an infinite value produce an infinite result, although some, such as subtracting infinity from infinity, produce NaN. Infinity is a float value; there is no way to represent infinity as an integer. We can use float() to represent infinity as follows:

pos_inf=float('inf')

neg_inf=-float('inf')

print('Positive infinity: ', pos_inf)

print('Negative infinity: ', neg_inf)
Positive infinity:  inf
Negative infinity:  -inf

We can also use the math, decimal, sympy, and numpy modules to represent infinity in Python.
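As a rough illustration of some of these alternatives, the sketch below builds infinity with the math, numpy, and decimal modules (sympy is left out to keep the dependencies small). The variable names are made up for this example.

```python
import math
from decimal import Decimal

import numpy as np

# Three further ways to write positive infinity
inf_math = math.inf
inf_numpy = np.inf
inf_decimal = Decimal('Infinity')

# math.inf and np.inf are ordinary floats equal to float('inf')
print(inf_math == float('inf'), inf_numpy == float('inf'))

# math.isinf() detects both positive and negative infinity
print(math.isinf(inf_math), math.isinf(-inf_numpy))

# Not every operation on infinity stays infinite
diff = inf_math - inf_math      # undefined, so the result is NaN
ratio = 1.0 / inf_math          # finite: 0.0
print(math.isnan(diff), ratio)
```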

Let’s look at some examples where we want to clean our data of NaN and infinity values.

Example #1: Dataset with NaN Values

In this example, we will generate a dataset consisting of random numbers and then randomly populate the dataset with NaN values. We will try to cluster the values in the dataset using the AffinityPropagation in the Scikit-Learn library.

Note: The use of the AffinityPropagation to cluster on random data is just an example to demonstrate the source of the error. The function you are trying to use may be completely different to AffinityPropagation, but the data preprocessing described in this tutorial will still apply.

The data generation looks as follows:

# Import numpy and AffinityPropagation

import numpy as np

from sklearn.cluster import AffinityPropagation

# Number of NaN values to put into data

n = 4

data = np.random.randn(20)

# Get random indices in the data

index_nan = np.random.choice(data.size, n, replace=False)

# Replace data with NaN

data.ravel()[index_nan]=np.nan

print(data)

Let’s look at the data:

[-0.0063374  -0.974195    0.94467842  0.38736788  0.84908087         nan
  1.00582645         nan  1.87585201 -0.98264992 -1.64822932  1.24843544
  0.88220504 -1.4204208   0.53238027         nan  0.83446561         nan
 -0.04655628 -1.09054183]

The data consists of twenty random values, four of which are NaN, and the rest are numerical values. Let’s try to fit the data using the AffinityPropagation() class.

af= AffinityPropagation(random_state=5).fit([data])
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

We raise the error because the AffinityPropagation.fit() cannot handle NaN, infinity or extremely large values. Our data contains NaN values, and we need to preprocess the data to replace them with suitable values.
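Before fitting, it can be useful to locate the offending entries. The helper below is a hypothetical sketch (find_non_finite is not part of NumPy or Scikit-Learn) that returns the positions of every value that is NaN or infinite.

```python
import numpy as np


def find_non_finite(values):
    """Return the indices of entries that are NaN or infinite."""
    arr = np.asarray(values, dtype=float)
    # np.isfinite is False for NaN, inf and -inf
    return np.flatnonzero(~np.isfinite(arr))


sample = np.array([0.5, np.nan, -1.2, np.inf, 2.0])
bad_positions = find_non_finite(sample)
print(bad_positions)  # [1 3]
```

An empty result means the array is safe to pass on as far as NaN and infinity are concerned.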

Solution #1: Using nan_to_num()

To check if a dataset contains NaN values, we can use the isnan() function from NumPy. If we pair this function with any(), we will check if there are any instances of NaN. We can replace the NaN values using the nan_to_num() function, which sets them to zero by default. Let’s look at the code and the clean data:

print(np.any(np.isnan(data)))

data = np.nan_to_num(data)

print(data)
True
[-0.0063374  -0.974195    0.94467842  0.38736788  0.84908087  0.
  1.00582645  0.          1.87585201 -0.98264992 -1.64822932  1.24843544
  0.88220504 -1.4204208   0.53238027  0.          0.83446561  0.
 -0.04655628 -1.09054183]

The np.any() part of the code returns True because our dataset contains at least one NaN value. The clean data has zeros in place of the NaN values. Let’s fit on the clean data:

af= AffinityPropagation(random_state=5).fit([data])

This code will execute without any errors.
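Zero is not always a sensible replacement. As a minimal sketch, assuming a NumPy version that supports the nan keyword of nan_to_num() (1.17 or later), the NaN values can instead be replaced with the mean of the remaining values. The small array here is invented for illustration.

```python
import numpy as np

values = np.array([1.0, np.nan, 3.0, np.nan, 5.0])

# Mean of the entries that are not NaN: (1 + 3 + 5) / 3 = 3.0
fill = np.nanmean(values)

# The nan keyword sets the replacement value (default is 0.0)
filled = np.nan_to_num(values, nan=fill)
print(filled)  # [1. 3. 3. 3. 5.]
```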

Solution #2: Using SimpleImputer

Scikit-Learn provides a class for imputation called SimpleImputer. We can use the SimpleImputer to replace NaN values. Because we pass the one-dimensional data as a single row ([data]), each column holds only one value, so there is nothing to average over. We therefore set the strategy parameter in the SimpleImputer to constant. First, we will generate the data:

import numpy as np

n = 4

data = np.random.randn(20)

index_nan = np.random.choice(data.size, n, replace=False)

data.ravel()[index_nan]=np.nan

print(data)

The data looks like this:

[ 1.4325319   0.61439789  0.3614522   1.38531346         nan  0.6900916
  0.50743745  0.48544145         nan         nan  0.17253557         nan
 -1.05027802  0.09648188  1.15971533  0.29005307  2.35040023  0.44103513
 -0.03235852 -0.78142219]

We can use the SimpleImputer class to fit and transform the data as follows:

from sklearn.impute import SimpleImputer

imp_mean = SimpleImputer(missing_values=np.nan, strategy='constant', fill_value=0)

imputer = imp_mean.fit([data])

data = imputer.transform([data])

print(data)

The clean data looks like this:

[[ 1.4325319   0.61439789  0.3614522   1.38531346  0.          0.6900916
   0.50743745  0.48544145  0.          0.          0.17253557  0.
  -1.05027802  0.09648188  1.15971533  0.29005307  2.35040023  0.44103513
  -0.03235852 -0.78142219]]

And we can pass the clean data to the AffinityPropagation clustering method as follows:

af= AffinityPropagation(random_state=5).fit(data)
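Wrapping the array as [data] treats it as one sample with twenty features. An alternative layout, sketched below under the assumption that each value should count as its own sample, is to reshape the array into a single column; the mean strategy then has values to average over. The seed and variable names are arbitrary choices for this sketch.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Seeded generator so the sketch is repeatable (the seed is arbitrary)
rng = np.random.default_rng(0)
values = rng.standard_normal(20)
values[rng.choice(values.size, 4, replace=False)] = np.nan

# reshape(-1, 1) gives 20 rows (samples) and 1 column (feature)
column = values.reshape(-1, 1)

imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
filled = imputer.fit_transform(column)

print(filled.shape)            # (20, 1)
print(np.isnan(filled).any())  # False
```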

We can also use the SimpleImputer class on multi-dimensional data to replace NaN values using the mean along each column. We have to set the imputation strategy to “mean”, and using the mean is only valid for numeric data. Let’s look at an example of a 3×3 nested list that contains NaN values:

from sklearn.impute import SimpleImputer

data = [[7, 2, np.nan], 
        [4, np.nan, 6], 
        [10, 5, 9]]

We can replace the NaN values as follows:

imp_mean = SimpleImputer(missing_values=np.nan, strategy='mean')

imp_mean.fit(data)

data = imp_mean.transform(data)

print(data)
[[ 7.   2.   7.5]
 [ 4.   3.5  6. ]
 [10.   5.   9. ]]

We replaced the np.nan values with the mean of the real numbers along the columns of the nested list. For example, in the third column, the real numbers are 6 and 9, so the mean is 7.5, which replaces the np.nan value in the third column.
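The column means can be checked by hand with NumPy. The sketch below recomputes them with np.nanmean(), which ignores NaN entries, and fills the gaps with np.where(); it is meant as a cross-check of the arithmetic, not as a description of how SimpleImputer works internally.

```python
import numpy as np

data = np.array([[7, 2, np.nan],
                 [4, np.nan, 6],
                 [10, 5, 9]])

# Mean of each column, ignoring NaN entries
column_means = np.nanmean(data, axis=0)
print(column_means)  # [7.  3.5 7.5]

# Fill each NaN with the mean of its own column (broadcast across rows)
filled = np.where(np.isnan(data), column_means, data)
print(filled)
```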

We can also use the other imputation strategies, median and most_frequent.
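To show how these strategies differ, here is a small sketch on an invented two-column dataset in which the median and the most frequent value of the first column are not the same.

```python
import numpy as np
from sklearn.impute import SimpleImputer

data = np.array([[1.0, 3.0],
                 [1.0, np.nan],
                 [4.0, 5.0],
                 [10.0, 5.0],
                 [np.nan, 8.0]])

median_filled = SimpleImputer(strategy='median').fit_transform(data)
mode_filled = SimpleImputer(strategy='most_frequent').fit_transform(data)

# First column: median of 1, 1, 4, 10 is 2.5; most frequent value is 1
print(median_filled[4, 0], mode_filled[4, 0])
# Second column: median of 3, 5, 5, 8 is 5; most frequent value is 5
print(median_filled[1, 1], mode_filled[1, 1])
```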

Example #2: Dataset with NaN and inf Values

This example will generate a dataset consisting of random numbers and then randomly populate the dataset with NaN and infinity values. We will try to cluster the values in the dataset using the AffinityPropagation in the Scikit-Learn library. The data generation looks as follows:

import numpy as np

from sklearn.cluster import AffinityPropagation

n = 4

data = np.random.randn(20)

index_nan = np.random.choice(data.size, n, replace=False)

index_inf = np.random.choice(data.size, n, replace=False)

data.ravel()[index_nan]=np.nan

data.ravel()[index_inf]=np.inf

print(data)
[-0.76148741         inf  0.10339756         nan         inf -0.75013509
  1.2740893          nan -1.68682986         nan  0.57540185 -2.0435754
  0.99287213         inf  0.5838198          inf -0.62896815 -0.45368201
  0.49864775 -1.08881703]

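The data consists of twenty random values. Because the NaN and infinity indices are drawn independently, an index can be picked for both, in which case infinity overwrites the NaN. In the run shown above, three values are NaN, four are infinity, and the rest are numerical values. Let’s try to fit the data using the AffinityPropagation() class.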

af= AffinityPropagation(random_state=5).fit([data])
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

We raise the error because the dataset contains NaN values and infinity values.
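To see how many values of each kind are present, a small counting helper can be used. The function below is a hypothetical sketch (count_non_finite is not a library function) built on NumPy's isnan(), isposinf(), and isneginf().

```python
import numpy as np


def count_non_finite(values):
    """Count NaN, positive infinity and negative infinity entries."""
    arr = np.asarray(values, dtype=float)
    return {
        'nan': int(np.isnan(arr).sum()),
        'posinf': int(np.isposinf(arr).sum()),
        'neginf': int(np.isneginf(arr).sum()),
    }


sample = np.array([0.1, np.inf, np.nan, -np.inf, np.inf, 2.5])
counts = count_non_finite(sample)
print(counts)  # {'nan': 1, 'posinf': 2, 'neginf': 1}
```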

Solution #1: Using nan_to_num

To check if a dataset contains NaN values, we can use the isnan() function from NumPy. If we pair this function with any(), we will check if there are any instances of NaN.

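To check if a dataset contains infinite values, we can use the isfinite() function from NumPy, which returns False for NaN and for positive or negative infinity. If we pair this function with all(), we check whether every value is finite; a result of False means at least one NaN or infinite value is present.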

We can replace the NaN and infinity values using the nan_to_num() method. The method will set NaN values to zero and infinity values to a very large number. Let’s look at the code and the clean data:

print(np.any(np.isnan(data)))

print(np.all(np.isfinite(data)))

data = np.nan_to_num(data)

print(data)
True

False

[-7.61487414e-001  1.79769313e+308  1.03397556e-001  0.00000000e+000
  1.79769313e+308 -7.50135085e-001  1.27408930e+000  0.00000000e+000
 -1.68682986e+000  0.00000000e+000  5.75401847e-001 -2.04357540e+000
  9.92872128e-001  1.79769313e+308  5.83819800e-001  1.79769313e+308
 -6.28968155e-001 -4.53682014e-001  4.98647752e-001 -1.08881703e+000]

We replaced the NaN values with zeroes and the infinity values with 1.79769313e+308. We can fit on the clean data as follows:

af= AffinityPropagation(random_state=5).fit([data])

This code will execute without any errors. If we want to replace infinity with zero rather than with a very large number, we can convert the infinity values to NaN before calling nan_to_num(). The isinf() function matches both positive and negative infinity:

data[np.isinf(data)] = np.nan

We then pass the data to nan_to_num(), which converts all the NaN values, including the former infinity values, to zeroes.
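The same result can likely be reached in one call. As a sketch, assuming a NumPy version that supports the nan, posinf, and neginf keywords of nan_to_num() (1.17 or later), each kind of value can be given its own replacement. The array is invented for this example.

```python
import numpy as np

values = np.array([1.5, np.nan, np.inf, -np.inf, -2.0])

# nan, posinf and neginf set the replacement for each kind of value
cleaned = np.nan_to_num(values, nan=0.0, posinf=0.0, neginf=0.0)
print(cleaned)
```

Here every NaN and infinite entry becomes zero, while the finite values 1.5 and -2.0 are left unchanged.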

Solution #2: Using fillna()

We can use Pandas to convert our dataset to a DataFrame, turn the infinity values into NaN with the replace() method, and then replace the NaN values using the fillna() method. First, let’s look at the data generation:

import numpy as np

import pandas as pd

from sklearn.cluster import AffinityPropagation

n = 4

data = np.random.randn(20)

index_nan = np.random.choice(data.size, n, replace=False)

index_inf = np.random.choice(data.size, n, replace=False)

data.ravel()[index_nan]=np.nan

data.ravel()[index_inf]=np.inf

print(data)
[ 0.41339801         inf         nan  0.7854321   0.23319745         nan
  0.50342482         inf -0.82102161 -0.81934623  0.23176869 -0.61882322
  0.12434801 -0.21218049         inf -1.54067848         nan  1.78086445
         inf  0.4881174 ]

The data consists of twenty random values. In this run, three are NaN (one NaN index was overwritten by infinity), four are infinity, and the rest are numerical values. We can convert the numpy array to a DataFrame as follows:

df = pd.DataFrame(data)

Once we have the DataFrame, we can use the replace method to replace the infinity values with NaN values. Then, we will call the fillna() method to replace all NaN values in the DataFrame.

df.replace([np.inf, -np.inf], np.nan, inplace=True)

df = df.fillna(0)

We can use the to_numpy() method to convert the DataFrame back to a numpy array as follows:

data = df.to_numpy()

print(data)
[[ 0.41339801]
 [ 0.        ]
 [ 0.        ]
 [ 0.7854321 ]
 [ 0.23319745]
 [ 0.        ]
 [ 0.50342482]
 [ 0.        ]
 [-0.82102161]
 [-0.81934623]
 [ 0.23176869]
 [-0.61882322]
 [ 0.12434801]
 [-0.21218049]
 [ 0.        ]
 [-1.54067848]
 [ 0.        ]
 [ 1.78086445]
 [ 0.        ]
 [ 0.4881174 ]]

We can now fit on the clean data using the AffinityPropagation class as follows:

af= AffinityPropagation(random_state=5).fit(data)

print(af.cluster_centers_)

The clustering algorithm gives us the following cluster centres:

[[ 0.        ]
 [ 0.50342482]
 [-0.81934623]
 [-1.54067848]
 [ 1.78086445]]

We can also use Pandas to drop rows with NaN values using the dropna() method, or columns with NaN values by passing axis=1. For further reading on using Pandas for data preprocessing, go to the article: Introduction to Pandas: A Complete Tutorial for Beginners.
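As a brief sketch of dropna() on an invented two-column DataFrame, the default call removes rows that contain NaN, while axis=1 removes columns that contain NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0],
                   'b': [4.0, 5.0, 6.0]})

# Default (axis=0): drop every row that contains a NaN
rows_dropped = df.dropna()

# axis=1: drop every column that contains a NaN
cols_dropped = df.dropna(axis=1)

print(rows_dropped.shape)          # (2, 2)
print(list(cols_dropped.columns))  # ['b']
```

Dropping discards data, so it tends to suit cases where only a small share of rows or columns is affected.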

Solution #3: Using SimpleImputer

Let’s look at an example of using the SimpleImputer to replace NaN and infinity values. First, we will look at the data generation:

import numpy as np

n = 4

data = np.random.randn(20)

index_nan = np.random.choice(data.size, n, replace=False)

index_inf = np.random.choice(data.size, n, replace=False)

data.ravel()[index_nan]=np.nan

data.ravel()[index_inf]=np.inf

print(data)
[-0.5318616          nan  0.12842066         inf         inf         nan
  1.24679674  0.09636847  0.67969774  1.2029146          nan  0.60090616
 -0.46642723         nan  1.58596659  0.47893738  1.52861316         inf
 -1.36273437         inf]

The data consists of twenty random values, four of which are NaN, four are infinity, and the rest are numerical values. Let’s try to use the SimpleImputer to clean our data:

from sklearn.impute import SimpleImputer

imp_mean = SimpleImputer(missing_values=np.nan, strategy='constant', fill_value=0)

imputer = imp_mean.fit([data])

data = imputer.transform([data])

print(data)
ValueError: Input contains infinity or a value too large for dtype('float64').

We raise the error because the SimpleImputer does not support infinite values. To solve this error, you can replace the np.inf values with np.nan as follows:

data[data==np.inf] = np.nan

imp_mean = SimpleImputer(missing_values=np.nan, strategy='constant', fill_value=0)

imputer = imp_mean.fit([data])

data = imputer.transform([data])

print(data)

With all infinity values replaced with NaN values, we can use the SimpleImputer to transform the data. Let’s look at the clean dataset:

[[-0.5318616   0.          0.12842066  0.          0.          0.
   1.24679674  0.09636847  0.67969774  1.2029146   0.          0.60090616
  -0.46642723  0.          1.58596659  0.47893738  1.52861316  0.
  -1.36273437  0.        ]]

Consider the case where we have multi-dimensional data with NaN and infinity values, and we want to use the SimpleImputer. In that case, we can replace the infinite values with NaN using the Pandas replace() method as follows:

import numpy as np

import pandas as pd

from sklearn.impute import SimpleImputer

data = [[7, 2, np.nan], 
        [4, np.nan, 6], 
        [10, 5, np.inf]]

df = pd.DataFrame(data)

df.replace([np.inf, -np.inf], np.nan, inplace=True)

data = df.to_numpy()

Then we can use the SimpleImputer to fit and transform the data. In this case, we will replace the missing values with the mean along the column where each NaN value occurs.

imp_mean = SimpleImputer(missing_values=np.nan, strategy='mean')

imp_mean.fit(data)

data = imp_mean.transform(data)

print(data)

The clean data looks like this:

[[ 7.   2.   6. ]
 [ 4.   3.5  6. ]
 [10.   5.   6. ]]
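The two steps, converting infinity to NaN and then imputing, can also be chained. The sketch below is one possible arrangement using Scikit-Learn's Pipeline and FunctionTransformer; the helper inf_to_nan and the step names are hypothetical and not part of the original examples.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def inf_to_nan(X):
    """Replace positive and negative infinity with NaN."""
    X = np.asarray(X, dtype=float)
    return np.where(np.isinf(X), np.nan, X)


# Hypothetical step names; any unique strings would do
cleaner = Pipeline([
    ('inf_to_nan', FunctionTransformer(inf_to_nan)),
    ('impute', SimpleImputer(missing_values=np.nan, strategy='mean')),
])

data = np.array([[7, 2, np.nan],
                 [4, np.nan, 6],
                 [10, 5, np.inf]])

clean = cleaner.fit_transform(data)
print(clean)
```

Keeping both steps in one object means the same cleaning can be applied to new data with cleaner.transform().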

Summary

Congratulations on reading to the end of this tutorial! If you pass a NaN or an infinite value to a function, you may raise the error: ValueError: input contains nan, infinity or a value too large for dtype(‘float64’). This commonly occurs as a result of not preprocessing data before analysis. To solve this error, check your data for NaN and inf values and either remove them or replace them with real numbers.

The SimpleImputer can only replace NaN values. If you try to replace infinity values with the SimpleImputer, you will raise the ValueError. Ensure that you convert all positive and negative infinity values to NaN before using the SimpleImputer.

For further reading on ValueErrors, go to the article: How to Solve Python ValueError: I/O operation on closed file.

For further reading on Scikit-learn, go to the article: How to Solve Sklearn ValueError: Unknown label type: ‘continuous’.

Go to the online courses page on Python to learn more about coding in Python for data science and machine learning.

Have fun and happy researching!