# Summarizing a Single Feature

### Ordinal and Nominal Categorical Data

Categorical variables can be either ordinal (ordered) or nominal (unordered).

Examples of ordinal variables include places (1st, 2nd, 3rd) and survey responses (on a scale of 1 to 5, how much do you agree with a statement).

Examples of nominal variables include tree species, student names, and account names.
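In pandas, one way to record this distinction is the `ordered` flag on a categorical column. The following is a minimal sketch with made-up data; the column names `place` and `species` are hypothetical.

```python
import pandas as pd

# Hypothetical example data: finishing places (ordinal) and tree species (nominal)
df = pd.DataFrame({
    "place": ["2nd", "1st", "3rd", "1st"],
    "species": ["oak", "pine", "oak", "maple"],
})

# Ordinal: declare the categories together with their order
df["place"] = pd.Categorical(
    df["place"], categories=["1st", "2nd", "3rd"], ordered=True
)

# Nominal: categories with no meaningful order
df["species"] = pd.Categorical(df["species"])

# Order-based operations such as min() only make sense for the ordinal column
lowest_place = df["place"].min()
is_ordered = df["species"].cat.ordered

print(lowest_place)  # smallest category in the declared order
print(is_ordered)    # whether the species column carries an order
```

Declaring the order explicitly matters because the default ordering of strings is alphabetical, which rarely matches the intended ranking.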

### Pandas .describe() method

The pandas method `.describe()` provides summary statistics for the features in a dataset. By default, it summarizes only the numeric columns of a DataFrame. Setting `include = 'all'` includes summary statistics for both quantitative and categorical features.

`df.describe(include = 'all')`
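As a rough illustration of the difference `include = 'all'` makes, the sketch below calls `.describe()` both ways on a small made-up DataFrame. The column names `height_m` and `species` are hypothetical.

```python
import pandas as pd

# Hypothetical data: one quantitative and one categorical column
df = pd.DataFrame({
    "height_m": [4.0, 6.5, 5.0, 8.5],
    "species": ["oak", "pine", "oak", "maple"],
})

# Default call: numeric columns only (count, mean, std, min, quartiles, max)
numeric_summary = df.describe()

# include='all': adds count, unique, top, and freq for categorical columns
full_summary = df.describe(include="all")

print(numeric_summary)
print(full_summary)
```

In the combined table, statistics that do not apply to a column (for example, the mean of `species`) appear as `NaN`.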

### Central tendency statistics

To summarize the central tendency, or typical value, of a quantitative variable, we can use statistics such as the mean, median, and mode. These can be calculated using the pandas methods `.mean()`, `.median()`, and `.mode()`, respectively.

```python
# calculate mean of a column
df.column_name.mean()

# calculate median of a column
df.column_name.median()

# calculate mode of a column
df.column_name.mode()
```
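To make the three statistics concrete, here is a small sketch on made-up data. The column name `score` is hypothetical.

```python
import pandas as pd

# Hypothetical data with one unusually large value
df = pd.DataFrame({"score": [2, 3, 3, 5, 12]})

mean_score = df.score.mean()      # sum of values divided by their count
median_score = df.score.median()  # middle value of the sorted data
mode_score = df.score.mode()      # most frequent value(s), returned as a Series

print(mean_score, median_score, mode_score.tolist())
```

Here the single large value (12) pulls the mean above the median, which is one reason to look at both. Note that `.mode()` returns a Series rather than a single number, since more than one value can be tied for most frequent.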

### Spread statistics

To summarize the spread, or variation, of a quantitative variable, we can use statistics such as the range, interquartile range, variance, standard deviation, and mean absolute deviation. These can be calculated as shown.

```python
# calculate range of a column
df.column_name.max() - df.column_name.min()

# calculate IQR of a column
df.column_name.quantile(0.75) - df.column_name.quantile(0.25)

# calculate variance of a column
df.column_name.var()

# calculate standard deviation of a column
df.column_name.std()

# calculate mean absolute deviation of a column
(df.column_name - df.column_name.mean()).abs().mean()
```
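The sketch below works through the spread statistics on made-up data (the column name `score` is hypothetical). It also shows one detail worth knowing: pandas computes the sample variance and standard deviation by default (`ddof=1`), and `ddof=0` gives the population versions.

```python
import pandas as pd

# Hypothetical data with mean 5
df = pd.DataFrame({"score": [2, 4, 4, 4, 5, 5, 7, 9]})

value_range = df.score.max() - df.score.min()
iqr = df.score.quantile(0.75) - df.score.quantile(0.25)

sample_var = df.score.var()       # divides by n - 1 (pandas default, ddof=1)
pop_var = df.score.var(ddof=0)    # divides by n
pop_std = df.score.std(ddof=0)    # square root of the population variance

# Mean absolute deviation: average distance from the mean
mad = (df.score - df.score.mean()).abs().mean()

print(value_range, iqr, sample_var, pop_var, pop_std, mad)
```

The range depends only on the two most extreme values, while the IQR ignores the top and bottom quarters of the data, so the IQR is less sensitive to outliers.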

### Visualize the distribution of a quantitative/continuous feature

To inspect the distribution of a quantitative variable, we can use visualizations such as histograms and box plots. We can create these plots using the seaborn functions `histplot()` and `boxplot()`, respectively.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# create histogram
sns.histplot(x = 'column_name', data = df)
plt.show()

# create box plot
sns.boxplot(x = 'column_name', data = df)
plt.show()
```
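It can help to see the numbers these plots are built from. The following sketch uses only pandas and made-up data; it approximates what a histogram and a box plot summarize rather than reproducing seaborn's exact internals. The choice of three bins and the 1.5 × IQR outlier rule are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data with one unusually large value
scores = pd.Series([2, 4, 4, 4, 5, 5, 7, 9, 20], name="score")

# Histogram: count the observations that fall into equal-width bins
bin_counts = pd.cut(scores, bins=3).value_counts().sort_index()

# Box plot: the box spans the first to third quartile, with a line at the median
q1, median, q3 = scores.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Points more than 1.5 * IQR beyond the box are commonly drawn as outliers
outliers = scores[(scores < q1 - 1.5 * iqr) | (scores > q3 + 1.5 * iqr)]

print(bin_counts)
print(q1, median, q3, outliers.tolist())
```

A histogram shows the overall shape of the distribution, while a box plot compresses it into quartiles and makes extreme values easy to spot.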

### Summary statistics for categorical data

To summarize the distribution of a categorical/discrete feature, we can calculate the number or proportion of observations in each category using the pandas method `.value_counts()`.

```python
# calculate the number in each category
df.column_name.value_counts()

# calculate the proportion in each category
df.column_name.value_counts(normalize = True)
```
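As a quick illustration on made-up data (the column name `species` is hypothetical):

```python
import pandas as pd

# Hypothetical categorical data
df = pd.DataFrame({"species": ["oak", "pine", "oak", "maple", "oak"]})

# Number of rows in each category, most frequent first
counts = df.species.value_counts()

# Share of rows in each category; the shares sum to 1
proportions = df.species.value_counts(normalize=True)

print(counts)
print(proportions)
```

By default, `.value_counts()` leaves missing values out of the result; passing `dropna=False` counts them as their own category.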

### Visualizing categorical data

To inspect and explore categorical features, we can use visualizations such as bar charts or pie charts. The provided code demonstrates how to create these plots.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# create bar chart
sns.countplot(x = 'column_name', data = df)
plt.show()

# create pie chart
df.column_name.value_counts().plot.pie()
plt.show()
```