A correlation coefficient is a value between -1 and +1 that measures the strength and direction of a linear relationship between two variables. A value near +1 indicates a strong positive correlation, a value near -1 indicates a strong negative correlation, and a value close to 0 suggests little to no linear relationship.
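As a minimal sketch of how such a coefficient can be computed, the following implements the Pearson correlation coefficient, the most common choice for linear relationships. The function name `pearson_r` and the sample data are illustrative assumptions, not part of the original notes.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from each mean (how the variables move together).
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (how much each variable varies on its own).
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return co_deviation / math.sqrt(ss_x * ss_y)

# Hypothetical data: study hours against two made-up score columns.
hours = [1, 2, 3, 4, 5]
scores_up = [2, 4, 6, 8, 10]    # rises with hours
scores_down = [10, 8, 6, 4, 2]  # falls as hours rise

r_up = pearson_r(hours, scores_up)
r_down = pearson_r(hours, scores_down)
print(r_up, r_down)
```

With perfectly linear data like this, the first result sits at the positive end of the range and the second at the negative end.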
The median and interquartile range (IQR) are robust statistics because they are not heavily affected by outliers or skewed data, unlike the mean and standard deviation.
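One way to see this robustness is to add a single extreme value to a small dataset and compare how each statistic moves. The sketch below uses Python's `statistics` module (Python 3.8 or later for `quantiles`); the data and the helper name `summarize` are made up for illustration.

```python
import statistics

def summarize(data):
    """Return the mean, standard deviation, median, and IQR of a dataset."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(data),
        "stdev": statistics.stdev(data),  # sample standard deviation
        "median": statistics.median(data),
        "iqr": q3 - q1,
    }

# Hypothetical data, with and without one extreme value.
values = [10, 12, 13, 15, 16, 18, 20]
with_outlier = values + [200]

before = summarize(values)
after = summarize(with_outlier)

# How far each statistic moved after adding the extreme value.
for name in before:
    print(name, before[name], after[name], abs(after[name] - before[name]))
```

In this example the mean and standard deviation shift far more than the median and IQR do.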
Outliers are values that are much higher or lower than most of the data. They lie far from the rest of the distribution and can distort summary statistics such as the mean and standard deviation.
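The notes do not say how far is "far enough" to count as an outlier. A common convention, assumed here rather than taken from the notes, flags values more than 1.5 times the IQR below the first quartile or above the third quartile. The function name `iqr_outliers`, the multiplier `k`, and the data are illustrative.

```python
import statistics

def iqr_outliers(data, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed 1.5*IQR convention)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

# Hypothetical readings with one value far from the rest.
readings = [10, 12, 13, 15, 16, 18, 20, 200]
flagged = iqr_outliers(readings)
print(flagged)
```

Other thresholds, such as a fixed number of standard deviations from the mean, are also used; which one is appropriate depends on the data.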
Data can be aggregated by summarizing a numeric variable for each category in a dataset. This helps in comparing values across different groups.
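A minimal sketch of this kind of aggregation, using only the standard library: values are grouped by category and each group is summarized with its mean. The region names, amounts, and the function name `mean_by_category` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def mean_by_category(pairs):
    """Group (category, value) pairs and return the mean value per category."""
    groups = defaultdict(list)
    for category, value in pairs:
        groups[category].append(value)
    return {category: mean(values) for category, values in groups.items()}

# Hypothetical sales rows: (region, amount).
rows = [
    ("north", 120.0),
    ("south", 80.0),
    ("north", 100.0),
    ("south", 90.0),
    ("east", 50.0),
]

region_means = mean_by_category(rows)
print(region_means)
```

The same pattern works with any summary function (sum, median, count) in place of `mean`.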
Summary statistics are used to measure and describe the variables in a dataset, providing an overview of the data.
A distribution represents all possible values of a variable and how often each value occurs. It helps describe patterns in data, showing how values are spread across a dataset.
The mean, or average, represents the center of a numeric distribution. It is calculated by adding all values in a dataset and dividing the sum by the total number of values.
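The calculation can be written out directly. This is a small sketch with made-up data; the name `mean_of` is illustrative, and in practice `statistics.mean` does the same job.

```python
def mean_of(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("mean of an empty dataset is undefined")
    return sum(values) / len(values)

# Hypothetical ages; their sum is 25 and there are 5 of them.
ages = [2, 4, 4, 5, 10]
average_age = mean_of(ages)
print(average_age)
```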
The standard deviation measures how spread out values are in a numeric distribution. It summarizes the typical distance of values from the mean (formally, the square root of the average squared deviation from the mean), showing how much the data varies.
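As a sketch of the calculation: the standard deviation is the square root of the average squared deviation from the mean. Dividing by the number of values gives the population form; dividing by one less gives the sample form. The function name `std_dev`, its `sample` flag, and the data are illustrative assumptions.

```python
import math

def std_dev(values, sample=False):
    """Standard deviation; divides by n (population) or n - 1 (sample)."""
    n = len(values)
    if n < (2 if sample else 1):
        raise ValueError("not enough values")
    center = sum(values) / n
    # Squared distances from the mean, so positive and negative deviations do not cancel.
    squared_deviations = sum((v - center) ** 2 for v in values)
    divisor = n - 1 if sample else n
    return math.sqrt(squared_deviations / divisor)

# Hypothetical data with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
population_sd = std_dev(data)
sample_sd = std_dev(data, sample=True)
print(population_sd, sample_sd)
```

The standard library offers the same two forms as `statistics.pstdev` and `statistics.stdev`.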
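A skewed distribution is asymmetrical: frequencies change rapidly on one side of the peak and trail off slowly in a long tail on the other. A right-skewed distribution has its long tail toward higher values, and a left-skewed distribution has it toward lower values.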
The median represents the center of a numeric distribution by identifying the middle value when all data points are arranged in order from smallest to largest.
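A short sketch of finding the middle value. When the number of values is even there is no single middle value, and the usual convention is to average the two middle ones. The name `median_of` is illustrative; `statistics.median` behaves the same way.

```python
def median_of(values):
    """Middle value of the sorted data; averages the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_case = median_of([5, 1, 3])      # sorted: 1, 3, 5
even_case = median_of([4, 1, 3, 2])  # sorted: 1, 2, 3, 4
print(odd_case, even_case)
```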
Categorical variables can be described using frequencies, proportions, or ratios to summarize how often each category appears in a dataset.
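Frequencies and proportions can be sketched with `collections.Counter`. The color data below is made up for illustration.

```python
from collections import Counter

# Hypothetical categorical variable.
colors = ["red", "blue", "red", "green", "red", "blue"]

# Frequency: how many times each category appears.
counts = Counter(colors)

# Proportion: each frequency divided by the total number of observations.
total = sum(counts.values())
proportions = {category: count / total for category, count in counts.items()}

print(counts.most_common())
print(proportions)
```

The proportions always sum to 1, which makes them easier to compare across datasets of different sizes than raw frequencies.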
The interquartile range (IQR) measures the spread of values as the difference between the third quartile (Q3) and the first quartile (Q1), which is the range covered by the middle 50% of the data.
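A sketch of the IQR using `statistics.quantiles` (Python 3.8 or later). Quartiles can be defined in more than one way for small datasets, so the example shows both methods the module offers; the data and the helper name `iqr` are illustrative.

```python
import statistics

def iqr(data, method="exclusive"):
    """Interquartile range: Q3 minus Q1, using the given quartile method."""
    q1, _, q3 = statistics.quantiles(data, n=4, method=method)
    return q3 - q1

# Hypothetical data.
data = [10, 12, 13, 15, 16, 18, 20]

iqr_exclusive = iqr(data)                      # the module's default method
iqr_inclusive = iqr(data, method="inclusive")  # treats the data as including its own min and max
print(iqr_exclusive, iqr_inclusive)
```

The two methods can give different results on small samples, so it is worth stating which convention was used when reporting an IQR.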
Scatter plots and correlation coefficients help show relationships between two numeric variables. Scatter plots visualize the data, while correlation coefficients measure the strength and direction of the relationship.