NaN stands for Not a Number. You may encounter the error ValueError: cannot convert float NaN to integer when attempting to convert a column in a Pandas DataFrame from a float to an integer, and the column contains NaN values.

You can solve this error by either dropping the rows with the NaN values or replacing the NaN values with another value that you can convert to an integer.

This tutorial will go through how to resolve the error with examples.


ValueError: cannot convert float NaN to integer

What is a ValueError?

In Python, a value is the information stored within an object. Python raises a ValueError when a built-in operation or function receives an argument that has the right type but an inappropriate value. Let’s look at an example that raises a ValueError:

value = 'string'

print(float(value))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
print(float(value))

ValueError: could not convert string to float: 'string'

The above code raises the ValueError because the value 'string' is a string that does not represent a number. The float() function can only convert strings that represent numbers, for example:

value = '5'
print(float(value))
5.0

The code does not raise an error because the float() function can convert a numerical string. The value '5' is appropriate for the float() function.
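
Because a ValueError is an ordinary exception, code can catch it instead of stopping. The following is a minimal sketch rather than part of the original example; the helper name to_float_or_none is hypothetical and chosen only for illustration.

```python
def to_float_or_none(text):
    """Return float(text), or None when text is not a numeric string.

    Hypothetical helper for illustration only.
    """
    try:
        return float(text)
    except ValueError:
        # float() received a str (the right type) with an unparseable value
        return None


converted = [to_float_or_none(s) for s in ["5", "3.14", "string"]]
print(converted)  # [5.0, 3.14, None]
```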

What is a NaN?

In Python, NaN stands for Not a Number and represents undefined entries and missing values in a dataset. NaN is a special floating-point value that has no integer equivalent. Therefore, if we try to convert a NaN to an integer, Python raises: ValueError: cannot convert float NaN to integer.
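
The behaviour can be reproduced in plain Python, without pandas. This short sketch creates a NaN with float("nan"), checks it with math.isnan(), and captures the error message raised by int(); the variable names are illustrative.

```python
import math

nan_value = float("nan")

# NaN is a float, and it compares unequal to everything, including itself
print(type(nan_value))
print(nan_value == nan_value)
print(math.isnan(nan_value))

try:
    int(nan_value)
except ValueError as err:
    # Keep the message so it can be inspected after the try block
    message = str(err)
    print(message)
```

Since NaN never equals itself, math.isnan() is the reliable way to test for it before calling int().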

Example: NaN Values in a DataFrame

You may encounter this ValueError when you attempt to convert a column in a pandas DataFrame from a float to an integer, yet the column contains NaN values. Let’s look at an example DataFrame that stores the exam results for three subjects: Physics, Chemistry, Biology. The results are on a scale of 0 to 100.

import pandas as pd

import numpy as np

df = pd.DataFrame({'Physics': [50, 60, 70, 55, 47, 90],
                   'Chemistry': [70, 75, 55, 63, np.nan, 80],
                   'Biology': [80, np.nan, 55, 70, np.nan, 66]})

print(df)

In the above program, we import both pandas and numpy and create a DataFrame to store the exam results. We then print the DataFrame to the console. Let’s run the code to see the DataFrame:

   Physics  Chemistry  Biology
0       50       70.0     80.0
1       60       75.0      NaN
2       70       55.0     55.0
3       55       63.0     70.0
4       47        NaN      NaN
5       90       80.0     66.0

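Because NaN is a float, pandas stores the Chemistry and Biology columns with the data type float64, while the Physics column, which has no missing values, is stored as int64. We can verify this using dtype: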

print(df['Physics'].dtype)

print(df['Chemistry'].dtype)

print(df['Biology'].dtype)
int64
float64
float64

Let’s try to convert the Chemistry and Biology columns from float to integer:

df['Chemistry'] = df['Chemistry'].astype(int)
df['Biology'] = df['Biology'].astype(int)
ValueError: Cannot convert non-finite values (NA or inf) to integer

The program raises the ValueError because the NaN values in the Chemistry and Biology columns cannot be converted to integer values. pandas words the message differently from plain Python, but the cause is the same.
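
Before converting, it can help to check which columns actually contain NaN values. The sketch below is one possible way to do that, using isna() together with sum() and any(); the names nan_counts and safe_columns are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Physics': [50, 60, 70, 55, 47, 90],
                   'Chemistry': [70, 75, 55, 63, np.nan, 80],
                   'Biology': [80, np.nan, 55, 70, np.nan, 66]})

# Count the missing values in each column
nan_counts = df.isna().sum()
print(nan_counts)

# Columns without any NaN can be cast to int directly
safe_columns = [col for col in df.columns if not df[col].isna().any()]
print(safe_columns)
```

Here only Physics is free of NaN values, so Chemistry and Biology need one of the solutions that follow before astype(int) will succeed.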

Solution #1: Drop Rows with NaN Values Using dropna()

To solve this error, we can remove the rows that contain NaN values from the DataFrame using the dropna() method. Let’s look at how to do this:

import pandas as pd

import numpy as np

df = pd.DataFrame({'Physics': [50, 60, 70, 55, 47, 90],
                   'Chemistry': [70, 75, 55, 63, np.nan, 80],
                   'Biology': [80, np.nan, 55, 70, np.nan, 66]})

df = df.dropna()

print(df)

df['Chemistry'] = df['Chemistry'].astype(int)

df['Biology'] = df['Biology'].astype(int)

print(df)

print(df['Chemistry'].dtype)

print(df['Biology'].dtype)

The above program drops the rows that contain NaN values, then converts the Chemistry and Biology columns to integer. It prints the DataFrame after applying dropna(), prints it again after converting the columns, and finally prints the data types of the Chemistry and Biology columns. Let’s run the program to get the output.

   Physics  Chemistry  Biology
0       50       70.0     80.0
2       70       55.0     55.0
3       55       63.0     70.0
5       90       80.0     66.0

   Physics  Chemistry  Biology
0       50         70       80
2       70         55       55
3       55         63       70
5       90         80       66

int64
int64
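
Calling dropna() with no arguments removes a row if any of its columns is NaN. When only one column needs converting, the subset parameter of dropna() limits the check to that column, so fewer rows are lost. The following is a sketch under that assumption; the variable name chem is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Physics': [50, 60, 70, 55, 47, 90],
                   'Chemistry': [70, 75, 55, 63, np.nan, 80],
                   'Biology': [80, np.nan, 55, 70, np.nan, 66]})

# Drop only the rows where Chemistry is missing; NaN in Biology is ignored
chem = df.dropna(subset=['Chemistry'])

# Chemistry no longer contains NaN, so the conversion succeeds
chem['Chemistry'] = chem['Chemistry'].astype(int)

print(chem)
print(chem['Chemistry'].dtype)
```

Row 4 is dropped because its Chemistry value is missing, while row 1 stays even though its Biology value is NaN. The Biology column remains a float column.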

Solution #2: Replacing NaN Values Using fillna()

Removing rows that contain NaN values can discard important information, because the other values in those rows are lost as well. Instead of removing the rows, we can replace the NaN values with other values. In this example, we will replace the NaN values with zeros, but the replacement can be any value that converts to an integer. Let’s look at how to use the fillna() method:

import pandas as pd

import numpy as np

df = pd.DataFrame({'Physics': [50, 60, 70, 55, 47, 90],
                   'Chemistry': [70, 75, 55, 63, np.nan, 80],
                   'Biology': [80, np.nan, 55, 70, np.nan, 66]})

df['Chemistry'] = df['Chemistry'].fillna(0)

df['Biology'] = df['Biology'].fillna(0)

df['Chemistry'] = df['Chemistry'].astype(int)

df['Biology'] = df['Biology'].astype(int)

print(df)

print(df['Chemistry'].dtype)

print(df['Biology'].dtype)

The above program returns:

   Physics  Chemistry  Biology
0       50         70       80
1       60         75        0
2       70         55       55
3       55         63       70
4       47          0        0
5       90         80       66
int64
int64

Both solutions allow us to convert the float columns to integer columns, but fillna() keeps every row, so the valid values in rows that contained a NaN are preserved.
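
The replacement does not have to be zero, nor the same for every column. fillna() also accepts a dictionary that maps column names to fill values. The sketch below assumes a placeholder of 0 for Chemistry and -1 for Biology; both values are arbitrary choices made only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Physics': [50, 60, 70, 55, 47, 90],
                   'Chemistry': [70, 75, 55, 63, np.nan, 80],
                   'Biology': [80, np.nan, 55, 70, np.nan, 66]})

# Hypothetical per-column placeholders for the missing results
fill_values = {'Chemistry': 0, 'Biology': -1}

# Fill each column with its own value, then convert every column to int
filled = df.fillna(fill_values).astype(int)

print(filled)
print(filled.dtypes)
```

All six rows are kept, and a distinct placeholder such as -1 makes the filled entries easy to tell apart from real results on the 0 to 100 scale.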

Summary

Congratulations on reading to the end of this article! Python raises the error ValueError: cannot convert float NaN to integer when you try to convert a NaN value to an integer. This commonly occurs when you try to convert a DataFrame column that contains NaN values from a float to an integer. You can solve this error by dropping the rows that contain the NaN values using dropna(), or you can use fillna() to replace the NaN values with other values that you can convert to integers.

For the solution to another common ValueError involving NaN values, go to the article: How to Solve Python ValueError: input contains nan, infinity or a value too large for dtype(‘float64’)

Go to the Python online courses page to learn more about coding in Python for data science and machine learning.

Have fun and happy researching!