Statistics Glossary

Quick reference for statistical terms and concepts.

A

Alpha Level (α)

The probability threshold for rejecting the null hypothesis, typically set at 0.05, which means accepting a 5% risk of a Type I error when the null hypothesis is true.

Category: Hypothesis Testing

Alternative Hypothesis (H₁)

The hypothesis that there is an effect, difference, or relationship between variables, opposing the null hypothesis.

Category: Hypothesis Testing

ANOVA (Analysis of Variance)

A statistical method used to compare means across three or more groups to determine if at least one group mean significantly differs from the others.

Category: Inferential Statistics
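
As a rough illustration of how group means are compared, the sketch below computes the one-way ANOVA F statistic by hand. The function name and the data are hypothetical, and turning F into a p-value (which needs the F-distribution) is not covered here.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return the F statistic for a one-way ANOVA over a list of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Between-group sum of squares: distance of each group mean from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]  # hypothetical measurements
f_stat = one_way_anova_f(groups)
print(round(f_stat, 2))
```

A large F suggests that the variation between group means is large relative to the variation within groups.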

B

Bar Chart

A graphical display using rectangular bars to represent categorical data, where bar heights or lengths show frequency or value.

Category: Data Visualization

Bias

A systematic error in data collection, analysis, or interpretation that causes results to deviate from the true value in a particular direction.

Category: Sampling

Binomial Distribution

A discrete probability distribution describing the number of successes in a fixed number of independent trials, each with the same probability of success.

Category: Probability Distributions
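
A minimal sketch of the binomial probability for exactly k successes in n trials. The function name and the coin-flip numbers are illustrative assumptions.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical example: exactly 3 heads in 10 fair coin flips.
prob_three_heads = binomial_pmf(3, 10, 0.5)
print(round(prob_three_heads, 4))
```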

Box Plot

A graphical representation of a data distribution built on the five-number summary (minimum, first quartile, median, third quartile, maximum), with potential outliers usually plotted as individual points.

Category: Data Visualization

C

Categorical Data

Data that can be divided into distinct groups or categories rather than measured numerically. Categories may have no inherent order (nominal) or a meaningful order (ordinal).

Category: Types of Data

Census

A complete count or survey of every member in a population, collecting data from all individuals rather than a sample.

Category: Sampling

Chi-Square (χ²)

A statistical test measuring how observed frequencies differ from expected frequencies, commonly used for testing independence and goodness of fit.

Category: Inferential Statistics
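
The comparison of observed and expected frequencies can be sketched as follows. The die-roll counts are made up for illustration, and the sketch stops at the test statistic rather than a p-value.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts from 120 rolls of a die; a fair die expects 20 per face.
observed = [18, 22, 20, 25, 15, 20]
expected = [20] * 6
chi2 = chi_square_statistic(observed, expected)
print(round(chi2, 2))
```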

Coefficient

A numerical value that represents the strength and/or direction of a relationship between variables in statistical analysis.

Category: Regression

Confidence Interval

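A range of values, calculated from sample data, that is likely to contain the true population parameter. The confidence level (for example, 95%) is the proportion of such intervals that would contain the parameter over repeated sampling.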

Category: Inferential Statistics
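
One way to compute such an interval for a mean is sketched below. It assumes a normal approximation (a z critical value); for small samples a t critical value is usually preferred. The function name and data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    center = mean(sample)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * stdev(sample) / sqrt(len(sample))
    return center - margin, center + margin

sample = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 5.1, 4.9]  # hypothetical measurements
low, high = mean_confidence_interval(sample)
print(round(low, 3), round(high, 3))
```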

Continuous Data

Numerical data that can take any value within a range, including decimals and fractions, with infinite possible values between any two points.

Category: Types of Data

Correlation

A statistical measure that describes the strength and direction of a linear relationship between two variables, ranging from -1 to +1.

Category: Descriptive Statistics
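
A minimal sketch of the Pearson correlation coefficient, which is the usual measure behind this definition. The function name and data are illustrative.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sum((x - mean_x) ** 2 for x in xs)
    dev_y = sum((y - mean_y) ** 2 for y in ys)
    return co_dev / sqrt(dev_x * dev_y)

r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])    # perfectly increasing
r_down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])  # perfectly decreasing
print(r_up, r_down)
```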

Covariance

A measure of how two variables change together, indicating whether increases in one variable correspond to increases or decreases in another.

Category: Descriptive Statistics
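
As an illustration, the sample covariance can be computed as below. The sketch assumes the n − 1 (sample) convention; the function name and data are hypothetical.

```python
def sample_covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

cov = sample_covariance([1, 2, 3, 4], [2, 4, 6, 8])
print(round(cov, 4))  # positive: the variables increase together
```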

D

Degrees of Freedom

The number of values in a statistical calculation that are free to vary, typically the sample size minus the number of constraints imposed or parameters estimated.

Category: Inferential Statistics

Dependent Variable

The variable being measured or observed in an experiment that is expected to change in response to the independent variable.

Category: Regression

Descriptive Statistics

Statistical methods used to summarize and describe the main features of a dataset, including measures of central tendency and variability.

Category: Descriptive Statistics

Discrete Data

Numerical data that can only take specific, separate values, typically whole numbers that are counted rather than measured.

Category: Types of Data

Distribution

The pattern or spread of data values across their possible range, showing how frequently different values occur.

Category: Probability Distributions

E

Effect Size

A quantitative measure of the magnitude of a phenomenon or the strength of the relationship between variables, independent of sample size.

Category: Hypothesis Testing

Expected Value

The weighted average of all possible values of a random variable, where weights are the probabilities of each outcome occurring.

Category: Probability
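
The weighted average can be sketched for a fair six-sided die. The dictionary layout (outcome mapped to probability) is an assumption made for the example.

```python
def expected_value(distribution):
    """Weighted average of outcomes, weighted by their probabilities."""
    return sum(value * prob for value, prob in distribution.items())

fair_die = {face: 1 / 6 for face in range(1, 7)}
ev = expected_value(fair_die)
print(round(ev, 2))
```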

F

F-Distribution

A continuous probability distribution that arises as the ratio of two independent chi-square variables, each divided by its degrees of freedom, used primarily in ANOVA and regression analysis.

Category: Probability Distributions

Frequency

The number of times a particular value or category occurs in a dataset, often displayed in frequency tables or histograms.

Category: Descriptive Statistics

H

Histogram

A graphical representation of data distribution using adjacent bars to show the frequency of data within consecutive intervals or bins.

Category: Data Visualization

Hypothesis

A testable statement or prediction about the relationship between variables or characteristics of a population that can be evaluated using statistical methods.

Category: Hypothesis Testing

I

Independent Variable

The variable that is manipulated or controlled in an experiment to observe its effect on the dependent variable.

Category: Regression

Inferential Statistics

Statistical methods that use sample data to draw conclusions, make predictions, or generalize findings to a larger population.

Category: Inferential Statistics

Interquartile Range (IQR)

The difference between the third quartile (Q3) and first quartile (Q1), representing the middle 50% of data values.

Category: Descriptive Statistics
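
A short sketch of the IQR using Python's standard library. Quartile conventions differ between textbooks and tools; this uses the default ("exclusive") method of `statistics.quantiles`, so other software may give slightly different values on the same data.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]  # hypothetical data
q1, q2, q3 = quantiles(data, n=4)  # three cut points: Q1, median, Q3
iqr = q3 - q1
print(q1, q3, iqr)
```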

K

Kurtosis

A measure of the 'tailedness' of a probability distribution, indicating how much data is in the tails compared to a normal distribution.

Category: Descriptive Statistics

L

Linear Regression

A statistical method that models the relationship between a dependent variable and one or more independent variables by fitting a linear equation to the data (a straight line when there is a single independent variable).

Category: Regression
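
For a single independent variable, the fitted line can be sketched with ordinary least squares. This is one common fitting method, assumed here for illustration; the function name and data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)
```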

M

Margin of Error

The half-width of a confidence interval: the amount added to and subtracted from a sample statistic to form the interval, reflecting the sampling error expected at the chosen confidence level.

Category: Inferential Statistics
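
As a hedged example, the margin of error for a sample proportion is often approximated as a critical value times the standard error. The sketch assumes a normal approximation and a simple random sample; the poll numbers are invented.

```python
from math import sqrt
from statistics import NormalDist

def proportion_margin_of_error(p_hat, n, confidence=0.95):
    """Approximate margin of error for a sample proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical poll: 50% support among 1,000 respondents.
moe = proportion_margin_of_error(0.5, 1000)
print(round(moe, 3))  # roughly plus or minus 3 percentage points
```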

Mean

The arithmetic average of a set of values, calculated by summing all values and dividing by the count.

Category: Descriptive Statistics

Median

The middle value in a dataset when values are arranged in order (the average of the two middle values when the count is even). Half the values lie at or below the median and half at or above.

Category: Descriptive Statistics
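
A quick sketch with the standard library, showing an odd and an even number of values. The data are made up.

```python
from statistics import median

odd_case = median([7, 1, 3])       # sorted: 1, 3, 7 -> middle value
even_case = median([7, 1, 3, 5])   # sorted: 1, 3, 5, 7 -> average of 3 and 5
print(odd_case, even_case)
```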

Mode

The most frequently occurring value in a dataset. A dataset can have one mode, multiple modes, or no mode.

Category: Descriptive Statistics
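
The one-mode and multiple-mode cases can be illustrated with `statistics.multimode`. When every value occurs equally often it returns all of them, which many texts describe as having no mode.

```python
from statistics import multimode

one_mode = multimode([1, 2, 2, 3])      # a single most frequent value
two_modes = multimode([1, 2, 2, 3, 3])  # a tie between two values
all_tied = multimode([1, 2, 3])         # every value occurs once
print(one_mode, two_modes, all_tied)
```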

N

Normal Distribution

A symmetric, bell-shaped probability distribution where data clusters around the mean, with predictable proportions falling within standard deviations.

Category: Probability Distributions
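
The "predictable proportions" can be checked numerically. This sketch uses the standard normal distribution (mean 0, standard deviation 1); the helper name is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k: float) -> float:
    """Proportion of values within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))  # about 0.68, 0.95, 0.997
```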

Null Hypothesis

A statement of no effect or no difference, serving as the default assumption in hypothesis testing. Denoted as H₀.

Category: Hypothesis Testing

O

Outlier

A data point that differs significantly from other observations, potentially indicating measurement error, data entry errors, or genuine extreme values.

Category: Descriptive Statistics
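
There is no single rule for flagging outliers. One common convention, assumed here, is the 1.5 × IQR rule: values more than 1.5 interquartile ranges below Q1 or above Q3 are flagged. The function name and data are hypothetical.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR], using the default quantile method."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

data = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]
outliers = iqr_outliers(data)
print(outliers)
```

A flagged value still needs judgment: it may be an error or a genuine extreme observation.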

P

P-Value

The probability of obtaining test results at least as extreme as observed, assuming the null hypothesis is true.

Category: Hypothesis Testing
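
A minimal sketch of a two-sided p-value, assuming the test statistic follows a standard normal distribution under the null hypothesis (a z-test). Other tests use other reference distributions.

```python
from statistics import NormalDist

def two_sided_p_value(z: float) -> float:
    """Probability of a result at least as extreme as z, in either direction."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

p = two_sided_p_value(1.96)
print(round(p, 3))  # close to the conventional 0.05 threshold
```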

Parameter

A numerical characteristic of a population, such as the population mean (μ) or population standard deviation (σ), typically estimated from sample data.

Category: Inferential Statistics

Percentile

A measure indicating the value below which a given percentage of observations fall, dividing data into 100 equal parts.

Category: Descriptive Statistics

Population

The complete set of all individuals, objects, or measurements that share a common characteristic and are of interest for a statistical study.

Category: Sampling

Probability

A numerical measure between 0 and 1 representing the likelihood that an event will occur, where 0 means impossible and 1 means certain.

Category: Probability

Q

Quartile

Values that divide a dataset into four equal parts, with Q1, Q2 (median), and Q3 representing the 25th, 50th, and 75th percentiles respectively.

Category: Descriptive Statistics

R

Random Sampling

A sampling method where every member of the population has an equal chance of being selected, which guards against selection bias, although any single sample can still differ from the population by chance.

Category: Sampling

Range

The simplest measure of variability, calculated as the difference between the maximum and minimum values in a dataset.

Category: Descriptive Statistics

Regression

A statistical technique for modeling and analyzing the relationship between a dependent variable and one or more independent variables.

Category: Regression

Residual

The difference between an observed value and the value predicted by a statistical model, representing the error or unexplained variation.

Category: Regression
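
Residuals are simple to compute once a model produces predictions. The model below (y = 2x + 1) and the observations are hypothetical stand-ins.

```python
def predict(x):
    """Hypothetical fitted model: y = 2x + 1."""
    return 2 * x + 1

xs = [0, 1, 2, 3]
observed = [1.2, 2.9, 5.1, 6.8]

# Residual = observed value minus predicted value.
residuals = [y - predict(x) for x, y in zip(xs, observed)]
print([round(r, 2) for r in residuals])
```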

S

Sample

A subset of individuals or observations selected from a larger population for the purpose of statistical analysis and making inferences.

Category: Sampling

Sampling Distribution

The probability distribution of a statistic (such as the sample mean) across all possible samples of a given size drawn from a specific population.

Category: Inferential Statistics

Skewness

A measure of the asymmetry of a probability distribution, indicating whether data tails extend more to the left or right of the mean.

Category: Descriptive Statistics

Standard Deviation

A measure of the amount of variation or dispersion in a set of values, showing how spread out the data is from the mean.

Category: Descriptive Statistics

Standard Error

The standard deviation of a sampling distribution, measuring the typical distance between a sample statistic and the population parameter it estimates.

Category: Inferential Statistics
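
For the sample mean, the standard error is commonly estimated as the sample standard deviation divided by the square root of the sample size. A short sketch with made-up data:

```python
from math import sqrt
from statistics import stdev

def standard_error_of_mean(sample):
    """Estimated standard error of the mean: s / sqrt(n)."""
    return stdev(sample) / sqrt(len(sample))

sample = [2, 4, 4, 4, 5, 5, 7, 9]
se = standard_error_of_mean(sample)
print(round(se, 4))
```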

Statistical Power

The probability that a statistical test will correctly reject a false null hypothesis, representing the ability to detect a real effect when one exists.

Category: Hypothesis Testing

Statistical Significance

A result is statistically significant when it would be unlikely to occur by chance alone if the null hypothesis were true, typically judged by a p-value less than the chosen alpha level.

Category: Hypothesis Testing

T

t-Distribution

A probability distribution similar to the normal distribution but with heavier tails, used when the population standard deviation is unknown and must be estimated from the sample, which matters most for small samples.

Category: Probability Distributions

t-Test

A statistical test used to compare means between one or two groups, determining whether differences are statistically significant.

Category: Hypothesis Testing
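
As an illustration of the one-group case, the sketch below computes a one-sample t statistic against a hypothesized mean. Converting it to a p-value requires the t-distribution with n − 1 degrees of freedom, which is not done here. The names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, hypothesized_mean):
    """t = (sample mean - hypothesized mean) / (s / sqrt(n))."""
    standard_error = stdev(sample) / sqrt(len(sample))
    return (mean(sample) - hypothesized_mean) / standard_error

sample = [5.1, 4.9, 5.3, 5.2, 5.0]
t_stat = one_sample_t(sample, 5.0)
print(round(t_stat, 3))
```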

Type I Error

Rejecting a true null hypothesis; a 'false positive' where you conclude there is an effect when there actually isn't one.

Category: Hypothesis Testing

Type II Error

Failing to reject a false null hypothesis; a 'false negative' where you conclude there is no effect when there actually is one.

Category: Hypothesis Testing

V

Variance

A measure of how spread out data points are from the mean. Calculated as the average of squared differences from the mean; for a sample, the sum of squared differences is usually divided by n − 1 rather than n.

Category: Descriptive Statistics
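
The calculation can be sketched in two variants: dividing by n (population variance) and by n − 1 (the usual sample estimate). The function names and data are illustrative.

```python
def population_variance(values):
    """Average squared difference from the mean (divides by n)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def sample_variance(values):
    """Sample estimate of variance (divides by n - 1)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]
pop_var = population_variance(data)
samp_var = sample_variance(data)
print(pop_var, round(samp_var, 3))
```

The standard deviation is the square root of the variance.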

Z

Z-Score

A standardized measure indicating how many standard deviations a data point is from the mean, used for comparing values across different distributions.

Category: Descriptive Statistics
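
A minimal sketch of comparing values across different distributions. The exam scores, means, and standard deviations are invented for the example.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations a value lies from the mean."""
    return (value - mean) / std_dev

# Hypothetical scores on two exams with different scales.
z_exam_a = z_score(85, mean=70, std_dev=10)
z_exam_b = z_score(60, mean=50, std_dev=5)
print(z_exam_a, z_exam_b)
```

Although the raw score on exam A is higher, the score on exam B sits further above its own mean in standard-deviation terms.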