Statistics Glossary
Quick reference for statistical terms and concepts.
A
Alpha Level (α)
The probability threshold for rejecting the null hypothesis, typically set at 0.05, representing a 5% risk of Type I error.
Alternative Hypothesis (H₁)
The hypothesis stating that there is an effect, difference, or relationship between variables, in opposition to the null hypothesis.
ANOVA (Analysis of Variance)
A statistical method used to compare means across three or more groups to determine if at least one group mean significantly differs from the others.
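For illustration, a one-way ANOVA on three made-up groups can be run with SciPy's f_oneway; the data below are purely hypothetical.

    # One-way ANOVA comparing three hypothetical groups
    from scipy import stats

    group_a = [23, 25, 27, 22, 26]
    group_b = [30, 31, 29, 33, 28]
    group_c = [24, 26, 25, 27, 23]

    f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
    print(f"F = {f_stat:.2f}, p = {p_value:.4f}")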
B
Bar Chart
A graphical display using rectangular bars to represent categorical data, where bar heights or lengths show frequency or value.
Bias
A systematic error in data collection, analysis, or interpretation that causes results to deviate from the true value in a particular direction.
Binomial Distribution
A discrete probability distribution describing the number of successes in a fixed number of independent trials, each with the same probability of success.
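As a sketch, the probability of exactly k successes can be computed directly from the formula C(n, k) · p^k · (1 − p)^(n − k); the coin-flip numbers below are illustrative.

    # Probability of exactly k successes in n independent trials
    from math import comb

    def binom_pmf(k, n, p):
        return comb(n, k) * p**k * (1 - p)**(n - k)

    print(binom_pmf(3, 10, 0.5))  # P(exactly 3 heads in 10 fair coin flips), ~0.117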
Box Plot
A graphical representation of data distribution showing the median, quartiles, and potential outliers through a five-number summary.
C
Categorical Data
Data that can be divided into distinct groups or categories without any inherent numerical value or order.
Census
A complete count or survey of every member in a population, collecting data from all individuals rather than a sample.
Chi-Square (χ²)
A statistical test measuring how observed frequencies differ from expected frequencies, commonly used for testing independence and goodness of fit.
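For example, a chi-square test of independence on a hypothetical 2x2 contingency table can be run with SciPy's chi2_contingency; the counts are made up for illustration.

    # Chi-square test of independence on a hypothetical contingency table
    from scipy.stats import chi2_contingency

    observed = [[20, 30],   # group 1: outcome A, outcome B
                [35, 15]]   # group 2: outcome A, outcome B
    chi2, p, dof, expected = chi2_contingency(observed)
    print(f"chi2 = {chi2:.2f}, p = {p:.4f}, dof = {dof}")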
Coefficient
A numerical value that represents the strength and/or direction of a relationship between variables in statistical analysis.
Confidence Interval
A range of values that likely contains the true population parameter, calculated from sample data with a specified level of confidence.
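A minimal sketch of a 95% confidence interval for a mean, using the t-distribution and a small made-up sample:

    # 95% confidence interval for a sample mean
    import statistics
    from scipy import stats

    data = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3, 12.2]
    n = len(data)
    mean = statistics.mean(data)
    se = statistics.stdev(data) / n ** 0.5       # standard error of the mean
    t_crit = stats.t.ppf(0.975, df=n - 1)        # two-sided 95% critical value
    print(f"{mean:.2f} +/- {t_crit * se:.2f}")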
Continuous Data
Numerical data that can take any value within a range, including decimals and fractions, with infinite possible values between any two points.
Correlation
A statistical measure that describes the strength and direction of a linear relationship between two variables, ranging from -1 to +1.
Covariance
A measure of how two variables change together, indicating whether increases in one variable correspond to increases or decreases in another.
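The covariance and correlation entries above can both be illustrated with NumPy on two made-up variables:

    # Sample covariance and Pearson correlation between two variables
    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    cov_xy = np.cov(x, y)[0, 1]       # sample covariance (n - 1 denominator)
    r_xy = np.corrcoef(x, y)[0, 1]    # correlation coefficient, between -1 and +1
    print(cov_xy, r_xy)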
D
Degrees of Freedom
The number of independent values that can vary in a statistical calculation, typically the sample size minus the number of constraints or estimated parameters.
Dependent Variable
The variable being measured or observed in an experiment that is expected to change in response to the independent variable.
Descriptive Statistics
Statistical methods used to summarize and describe the main features of a dataset, including measures of central tendency and variability.
Discrete Data
Numerical data that can only take specific, separate values, typically whole numbers that are counted rather than measured.
Distribution
The pattern or spread of data values across their possible range, showing how frequently different values occur.
E
Effect Size
A quantitative measure of the magnitude of a phenomenon or the strength of the relationship between variables, independent of sample size.
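One widely used effect-size measure is Cohen's d, the difference between two group means divided by their pooled standard deviation; the groups below are invented for illustration.

    # Cohen's d: standardized mean difference between two hypothetical groups
    import statistics

    group_1 = [5.1, 5.4, 4.9, 5.6, 5.2]
    group_2 = [4.3, 4.6, 4.4, 4.8, 4.5]

    m1, m2 = statistics.mean(group_1), statistics.mean(group_2)
    s1, s2 = statistics.stdev(group_1), statistics.stdev(group_2)
    n1, n2 = len(group_1), len(group_2)
    pooled_sd = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    print((m1 - m2) / pooled_sd)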
Expected Value
The weighted average of all possible values of a random variable, where weights are the probabilities of each outcome occurring.
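For example, the expected value of a fair six-sided die is the sum of each face value weighted by its probability:

    # Expected value of a fair six-sided die
    outcomes = [1, 2, 3, 4, 5, 6]
    probabilities = [1/6] * 6
    print(sum(v * p for v, p in zip(outcomes, probabilities)))  # 3.5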
F
F-Distribution
A continuous probability distribution that arises as the ratio of two chi-square-distributed variables, each divided by its degrees of freedom, used primarily in ANOVA and regression analysis.
Frequency
The number of times a particular value or category occurs in a dataset, often displayed in frequency tables or histograms.
H
Histogram
A graphical representation of data distribution using adjacent bars to show the frequency of data within consecutive intervals or bins.
Hypothesis
A testable statement or prediction about the relationship between variables or characteristics of a population that can be evaluated using statistical methods.
I
Independent Variable
The variable that is manipulated or controlled in an experiment to observe its effect on the dependent variable.
Inferential Statistics
Statistical methods that use sample data to draw conclusions, make predictions, or generalize findings to a larger population.
K
Kurtosis
A measure of the 'tailedness' of a probability distribution, indicating how much data is in the tails compared to a normal distribution.
L
Linear Regression
A statistical method that models the relationship between a dependent variable and one or more independent variables by fitting a straight line to the data.
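A minimal sketch of simple (one-predictor) linear regression using SciPy's linregress, on made-up data:

    # Fit a straight line y = slope * x + intercept
    from scipy.stats import linregress

    x = [1, 2, 3, 4, 5, 6]
    y = [2.2, 4.1, 5.9, 8.3, 9.8, 12.1]

    result = linregress(x, y)
    print(f"slope = {result.slope:.2f}, intercept = {result.intercept:.2f}, r = {result.rvalue:.3f}")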
M
Margin of Error
The range of values above and below a sample statistic in a confidence interval, representing the largest expected difference between the sample statistic and the true population value at the stated confidence level.
Mean
The arithmetic average of a set of values, calculated by summing all values and dividing by the count.
Median
The middle value in a dataset when values are arranged in order (or the average of the two middle values when the count is even). Half the values are below the median and half are above.
Mode
The most frequently occurring value in a dataset. A dataset can have one mode, multiple modes, or no mode.
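The three measures of central tendency above (mean, median, mode) can be computed with Python's standard library on a small illustrative dataset:

    # Mean, median, and mode
    import statistics

    data = [2, 3, 3, 5, 7, 8, 9]
    print(statistics.mean(data))    # ~5.29
    print(statistics.median(data))  # 5
    print(statistics.mode(data))    # 3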
N
Normal Distribution
A symmetric, bell-shaped probability distribution where data clusters around the mean, with predictable proportions falling within standard deviations.
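Those predictable proportions (the 68-95-99.7 rule) can be checked against the standard normal distribution in SciPy:

    # Proportion of a normal distribution within 1, 2, and 3 standard deviations
    from scipy.stats import norm

    for k in (1, 2, 3):
        print(f"within {k} SD: {norm.cdf(k) - norm.cdf(-k):.3f}")  # ~0.683, 0.954, 0.997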
Null Hypothesis
A statement of no effect or no difference, serving as the default assumption in hypothesis testing. Denoted as H₀.
O
Outlier
A data point that differs significantly from other observations, potentially indicating measurement error, data entry errors, or genuine extreme values.
P
P-Value
The probability of obtaining test results at least as extreme as observed, assuming the null hypothesis is true.
Parameter
A numerical characteristic of a population, such as the population mean (μ) or population standard deviation (σ), typically estimated from sample data.
Percentile
A measure indicating the value below which a given percentage of observations fall, dividing data into 100 equal parts.
Population
The complete set of all individuals, objects, or measurements that share a common characteristic and are of interest for a statistical study.
Probability
A numerical measure between 0 and 1 representing the likelihood that an event will occur, where 0 means impossible and 1 means certain.
Q
Quartile
Values that divide a dataset into four equal parts, with Q1, Q2 (median), and Q3 representing the 25th, 50th, and 75th percentiles respectively.
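Quartiles are simply the 25th, 50th, and 75th percentiles, so both entries can be illustrated with NumPy's percentile function on made-up data:

    # Q1, Q2 (median), and Q3 of a dataset
    import numpy as np

    data = [4, 7, 9, 11, 12, 15, 18, 21, 25, 30]
    q1, q2, q3 = np.percentile(data, [25, 50, 75])
    print(q1, q2, q3)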
R
Random Sampling
A sampling method where every member of the population has an equal chance of being selected, ensuring the sample represents the population fairly.
Range
The simplest measure of variability, calculated as the difference between the maximum and minimum values in a dataset.
Regression
A statistical technique for modeling and analyzing the relationship between a dependent variable and one or more independent variables.
Residual
The difference between an observed value and the value predicted by a statistical model, representing the error or unexplained variation.
S
Sample
A subset of individuals or observations selected from a larger population for the purpose of statistical analysis and making inferences.
Sampling Distribution
The probability distribution of a statistic (such as the sample mean) across all possible samples of a given size drawn from a population.
Skewness
A measure of the asymmetry of a probability distribution, indicating whether data tails extend more to the left or right of the mean.
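Skewness and kurtosis (defined above) can both be estimated from a sample with SciPy; the dataset below is made up and right-skewed by its single large value.

    # Sample skewness and excess kurtosis
    from scipy.stats import skew, kurtosis

    data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 12]
    print(skew(data))      # positive value => longer right tail
    print(kurtosis(data))  # excess kurtosis; 0 corresponds to a normal distribution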
Standard Deviation
A measure of the amount of variation or dispersion in a set of values, showing how spread out the data is from the mean.
Standard Error
The standard deviation of a sampling distribution, measuring the typical distance between a sample statistic and the population parameter it estimates.
Statistical Power
The probability that a statistical test will correctly reject a false null hypothesis, representing the ability to detect a real effect when one exists.
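Power can be approximated by simulation: repeatedly draw samples with a known true effect and count how often the test rejects the null. A rough sketch with made-up parameters (true difference of 0.5 standard deviations, 30 per group, alpha = 0.05):

    # Monte Carlo estimate of power for a two-sample t-test
    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(0)
    n_sims, rejections = 2000, 0
    for _ in range(n_sims):
        a = rng.normal(0.0, 1.0, 30)
        b = rng.normal(0.5, 1.0, 30)
        if ttest_ind(a, b).pvalue < 0.05:
            rejections += 1
    print(rejections / n_sims)  # estimated power for this scenario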
Statistical Significance
A result is statistically significant when it is unlikely to have occurred by chance alone, typically when the p-value is less than the chosen alpha level.
T
t-Distribution
A probability distribution similar to the normal distribution but with heavier tails, used when sample sizes are small or population standard deviation is unknown.
t-Test
A statistical test used to compare means between one or two groups, determining whether differences are statistically significant.
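For example, an independent two-sample t-test on two made-up groups:

    # Independent two-sample t-test
    from scipy.stats import ttest_ind

    group_a = [14.1, 15.0, 13.8, 14.6, 15.2, 14.4]
    group_b = [15.9, 16.4, 15.5, 16.8, 16.1, 15.7]

    t_stat, p_value = ttest_ind(group_a, group_b)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")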
Type I Error
Rejecting a true null hypothesis; a 'false positive' where you conclude there is an effect when there actually isn't one.
Type II Error
Failing to reject a false null hypothesis; a 'false negative' where you conclude there is no effect when there actually is one.
V
Variance
A measure of how spread out data points are from the mean, calculated as the average of the squared differences from the mean (with a divisor of n − 1 for a sample rather than a population).
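The population and sample versions differ only in that divisor, which Python's standard library exposes directly:

    # Population variance (divide by n) vs. sample variance (divide by n - 1)
    import statistics

    data = [2, 4, 4, 4, 5, 5, 7, 9]
    print(statistics.pvariance(data))  # 4.0
    print(statistics.variance(data))   # ~4.57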
Z
Z-Score
A standardized measure indicating how many standard deviations a data point is from the mean, used for comparing values across different distributions.
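As a small sketch, a z-score is just the distance from the mean expressed in standard deviations; the numbers below are illustrative.

    # Z-score of a single value relative to a dataset
    import statistics

    data = [10, 12, 14, 16, 18, 20]
    x = 19
    z = (x - statistics.mean(data)) / statistics.stdev(data)
    print(round(z, 2))  # ~1.07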