DataFrame.select_dtypes(include=None, exclude=None)
The select_dtypes method of pandas dataframes returns the subset of the dataframe formed by the columns of the specified types. You can specify the types you want to include, the types you want to exclude, or both.
Types can be referenced by name (np.number, for example) or by a text string ("category", for example):
- To select all the numeric types, use np.number or "number".
- To select all columns of type object, which typically includes text columns, use object or "object".
- To select dates we can use np.datetime64, "datetime" or "datetime64".
- To select timedeltas we can use np.timedelta64, "timedelta" or "timedelta64".
- To select the pandas categorical types, use "category".
- To select the pandas datetimetz types (datetimes with a time zone), use "datetimetz" or "datetime64[ns, tz]".
- include: Type or list of types to be included in the selection.
- exclude: Type or list of types to be excluded from the selection.
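The type specifiers listed above can be tried on a small frame. The following is a minimal sketch, assuming a hypothetical toy dataframe (the column names price, units, grade, label, when and when_utc are made up for illustration) with one column per kind of type:

```python
import pandas as pd

# Hypothetical toy frame: one column per kind of dtype discussed above.
dates = ["2024-01-01", "2024-01-02", "2024-01-03"]
df = pd.DataFrame({
    "price": [1.5, 2.0, 3.25],                        # float64
    "units": [1, 2, 3],                               # integer
    "grade": pd.Categorical(["a", "b", "a"]),         # category
    "label": pd.Series(["x", "y", "z"], dtype=object),  # object (forced explicitly)
    "when": pd.to_datetime(dates),                    # datetime without time zone
    "when_utc": pd.to_datetime(dates, utc=True),      # datetime with time zone
})

# Each call returns a dataframe; we keep only the column names for display.
numeric_cols = list(df.select_dtypes(include="number").columns)
category_cols = list(df.select_dtypes(include="category").columns)
object_cols = list(df.select_dtypes(include="object").columns)
datetime_cols = list(df.select_dtypes(include="datetime").columns)
datetimetz_cols = list(df.select_dtypes(include="datetimetz").columns)

print(numeric_cols)      # price and units
print(category_cols)     # grade
print(object_cols)       # label
print(datetime_cols)     # when
print(datetimetz_cols)   # when_utc
```

Note that, in this sketch, "datetime" picks up only the column without a time zone, while the time-zone-aware column needs "datetimetz".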
The select_dtypes method always returns its result as a DataFrame, regardless of the number of columns selected.
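As a quick check of the return type, here is a minimal sketch, assuming a hypothetical two-column frame (the names a and b are illustrative only). Even when a single column matches, the result is a dataframe, unlike plain column indexing, which gives a Series:

```python
import pandas as pd

# Hypothetical frame with one numeric column and one text column.
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

only_numbers = df.select_dtypes(include="number")  # a single column matches
single_column = df["a"]                            # plain column indexing

print(type(only_numbers).__name__)   # DataFrame
print(type(single_column).__name__)  # Series
print(list(only_numbers.columns))    # ['a']
```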
If we start from the tips dataset provided by seaborn:
import numpy as np
import seaborn as sns
tips = sns.load_dataset("tips")
tips.head()
...we could select only numeric fields (regardless of their exact type) with the following code:
tips_number = tips.select_dtypes(np.number)
tips_number.head()
When both the include and the exclude parameters are provided, the columns that meet both criteria are selected, that is, those whose type is included and not excluded. For example, the tips dataset includes numeric data of types float and int:
tips.dtypes
total_bill     float64
tip            float64
sex           category
smoker        category
day           category
time          category
size             int64
dtype: object
If we include the numeric types and exclude the float type, the result includes only the "size" column:
tips_number = tips.select_dtypes(include = np.number, exclude = float)
tips_number.head()
If, on the other hand, we include only the float type but exclude the numeric types, the result is a dataframe with no columns, because float is itself a numeric type:
tips_number = tips.select_dtypes(include = float, exclude = np.number)
tips_number.head()
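The two combinations above can also be reproduced without downloading the dataset. This is a minimal sketch, assuming a hypothetical three-row frame (called mini here) that only mimics some of the column types of tips. It also shows that pandas raises a ValueError when exactly the same type is listed in both include and exclude:

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame mimicking some dtypes of the tips dataset.
mini = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01],                 # float64
    "tip": [1.01, 1.66, 3.50],                           # float64
    "sex": pd.Categorical(["Female", "Male", "Male"]),   # category
    "size": np.array([2, 3, 3], dtype="int64"),          # int64
})

# Numeric columns minus the float ones: only the integer column remains.
ints_only = mini.select_dtypes(include=np.number, exclude=float)

# Float columns minus all numeric ones: no column remains.
nothing = mini.select_dtypes(include=float, exclude=np.number)

# Listing the very same type in both parameters is rejected by pandas.
try:
    mini.select_dtypes(include=float, exclude=float)
    overlap_error = None
except ValueError as err:
    overlap_error = err

print(list(ints_only.columns))  # ['size']
print(len(nothing.columns))     # 0
print(overlap_error)
```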