Summarizing Data with Tables
Learn to organize and summarize data using frequency tables, relative frequency distributions, and cumulative frequency tables.
Why Summarize Data in Tables?
Raw data is often overwhelming. Consider a dataset with 10,000 observations—you can’t understand it by staring at 10,000 numbers! Tables help us:
- Organize data into manageable form
- Identify patterns and common values
- Communicate findings clearly
- Prepare for visualization and further analysis
Frequency Tables for Categorical Data
A frequency table shows how often each category or value occurs.
Basic Frequency Table
A survey asked 50 people their favorite color. Raw data (abbreviated):
Blue, Red, Green, Blue, Blue, Red, Yellow, Green, Blue, …
Frequency Table:
| Color | Frequency |
|---|---|
| Blue | 18 |
| Red | 14 |
| Green | 10 |
| Yellow | 5 |
| Purple | 3 |
| Total | 50 |
Now we can immediately see that Blue is most popular, followed by Red.
Adding Relative Frequency
Relative frequency expresses each count as a proportion or percentage of the total.
| Color | Frequency | Relative Frequency | Percentage |
|---|---|---|---|
| Blue | 18 | 0.36 | 36% |
| Red | 14 | 0.28 | 28% |
| Green | 10 | 0.20 | 20% |
| Yellow | 5 | 0.10 | 10% |
| Purple | 3 | 0.06 | 6% |
| Total | 50 | 1.00 | 100% |
Interpretation: 36% of respondents prefer blue—more than a third!
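The relative frequency calculation can be sketched in a few lines of Python. This is a minimal illustration that starts from the counts in the table above; the function name `relative_frequencies` is made up for this example.

```python
# Counts copied from the favorite-color frequency table.
counts = {"Blue": 18, "Red": 14, "Green": 10, "Yellow": 5, "Purple": 3}


def relative_frequencies(counts):
    """Return each category's share of the total as a proportion."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


rel = relative_frequencies(counts)

# Print one row per color: count, proportion, percentage.
for color, share in rel.items():
    print(f"{color:<8}{counts[color]:>4}{share:>8.2f}{share:>8.0%}")
```

The proportions should sum to 1.00 (up to floating-point rounding), which is a quick way to check a hand-built table.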
Frequency Tables for Numerical Data
For numerical data, we often group values into classes (intervals or bins).
Creating Classes
Guidelines for choosing classes:
- Use 5-15 classes (fewer for small datasets)
- Make all classes the same width
- Make sure classes don’t overlap
- Include all data values
Class width = Range / Number of Classes, where Range = Maximum - Minimum. Round up to a convenient number.
Data: 25 exam scores ranging from 52 to 98.
Step 1: Choose number of classes: 5 classes
Step 2: Calculate width: (98 - 52) / 5 = 9.2, rounded up to 10
Step 3: Create classes starting at 50:
| Class | Frequency | Relative Frequency |
|---|---|---|
| 50-59 | 2 | 0.08 |
| 60-69 | 4 | 0.16 |
| 70-79 | 8 | 0.32 |
| 80-89 | 7 | 0.28 |
| 90-99 | 4 | 0.16 |
| Total | 25 | 1.00 |
Interpretation: The most common class is 70-79 (32% of students), and the distribution is fairly symmetric around it.
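The three steps can be sketched in Python as follows. The lesson does not list the 25 individual exam scores, so the `scores` list below is hypothetical and used only to show the mechanics; the helper names `class_width` and `grouped_frequencies` are also made up for this example.

```python
import math


def class_width(minimum, maximum, num_classes, round_to=1):
    """Range / number of classes, rounded up to a multiple of round_to."""
    raw = (maximum - minimum) / num_classes
    return math.ceil(raw / round_to) * round_to


def grouped_frequencies(values, start, width, num_classes):
    """Count how many values fall in each equal-width class."""
    counts = [0] * num_classes
    for v in values:
        index = int((v - start) // width)
        if not 0 <= index < num_classes:
            raise ValueError(f"{v} falls outside the classes")
        counts[index] += 1
    return counts


# Step 2 from the example: (98 - 52) / 5 = 9.2, rounded up to 10.
width = class_width(52, 98, 5, round_to=10)

# Hypothetical scores for illustration only (not the lesson's data).
scores = [52, 58, 61, 67, 70, 74, 75, 79, 83, 88, 91, 98]
counts = grouped_frequencies(scores, start=50, width=width, num_classes=5)

# Label each class the way the lesson does for whole-number scores.
for i, n in enumerate(counts):
    low = 50 + i * width
    print(f"{low}-{low + width - 1}: {n}")
```

Raising an error for out-of-range values enforces the "include all data values" guideline instead of silently dropping data.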
Class Boundaries and Midpoints
Class boundaries eliminate gaps between classes. For the class 50-59:
- Lower boundary: 49.5
- Upper boundary: 59.5
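The class midpoint is the center of each class: (lower limit + upper limit) / 2. For 50-59, that is (50 + 59) / 2 = 54.5.
| Class | Midpoint |
|---|---|
| 50-59 | 54.5 |
| 60-69 | 64.5 |
| 70-79 | 74.5 |
| 80-89 | 84.5 |
| 90-99 | 94.5 |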
Midpoints are useful for estimating the mean from grouped data.
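As a rough sketch of that estimate, each midpoint (the average of a class's lower and upper limits) can be weighted by its class frequency. The class limits and frequencies below come from the exam score table; the variable names are illustrative.

```python
# Class limits and frequencies from the exam score table.
classes = [(50, 59), (60, 69), (70, 79), (80, 89), (90, 99)]
frequencies = [2, 4, 8, 7, 4]

# Midpoint = (lower limit + upper limit) / 2.
midpoints = [(low + high) / 2 for low, high in classes]

# Grouped-data mean: treat every score in a class as if it sat at the midpoint.
weighted_sum = sum(f * m for f, m in zip(frequencies, midpoints))
estimated_mean = weighted_sum / sum(frequencies)

print(midpoints)
print(round(estimated_mean, 1))
```

The result is only an estimate, because the individual scores inside each class are replaced by the class midpoint.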
Cumulative Frequency Tables
A cumulative frequency table shows the running total of frequencies.
Cumulative Frequency
Cumulative frequency shows how many observations fall at or below the upper limit of each class.
| Class | Frequency | Cumulative Frequency |
|---|---|---|
| 50-59 | 2 | 2 |
| 60-69 | 4 | 2 + 4 = 6 |
| 70-79 | 8 | 6 + 8 = 14 |
| 80-89 | 7 | 14 + 7 = 21 |
| 90-99 | 4 | 21 + 4 = 25 |
Interpretation:
- 6 students scored 69 or below
- 14 students scored 79 or below
- 21 students scored 89 or below
Cumulative Relative Frequency
Express cumulative frequencies as proportions.
| Class | Freq | Rel. Freq | Cum. Freq | Cum. Rel. Freq |
|---|---|---|---|---|
| 50-59 | 2 | 0.08 | 2 | 0.08 |
| 60-69 | 4 | 0.16 | 6 | 0.24 |
| 70-79 | 8 | 0.32 | 14 | 0.56 |
| 80-89 | 7 | 0.28 | 21 | 0.84 |
| 90-99 | 4 | 0.16 | 25 | 1.00 |
Interpretation:
- 24% of students scored 69 or below (roughly the bottom quarter)
- 56% scored 79 or below (just over half)
- 84% scored 89 or below
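Running totals like these are easy to compute with `itertools.accumulate` from Python's standard library. The sketch below uses the frequencies from the exam score table. The final lookup of the first class whose cumulative proportion reaches 0.5 is an added illustration of how such a table can locate the class containing the median.

```python
from itertools import accumulate

labels = ["50-59", "60-69", "70-79", "80-89", "90-99"]
frequencies = [2, 4, 8, 7, 4]

# Running total of the frequencies.
cumulative = list(accumulate(frequencies))

# Divide each running total by the grand total to get proportions.
total = cumulative[-1]
cumulative_relative = [c / total for c in cumulative]

# First class whose cumulative proportion reaches one half.
median_class = next(
    label for label, p in zip(labels, cumulative_relative) if p >= 0.5
)

for label, c, p in zip(labels, cumulative, cumulative_relative):
    print(f"{label}: {c:>3} {p:.2f}")
print("Median falls in class", median_class)
```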
Two-Way Tables (Contingency Tables)
Two-way tables show the relationship between two categorical variables.
A survey asked 200 people (100 male, 100 female) about their preferred coffee type.
| | Espresso | Latte | Drip | Row Total |
|---|---|---|---|---|
| Male | 35 | 25 | 40 | 100 |
| Female | 20 | 45 | 35 | 100 |
| Column Total | 55 | 70 | 75 | 200 |
Reading the table:
- 35 males prefer espresso
- 45 females prefer latte
- 70 people total prefer latte
Joint, Marginal, and Conditional Distributions
Using the coffee preference table:
Joint Distribution (proportion in each cell):
| | Espresso | Latte | Drip |
|---|---|---|---|
| Male | 35/200 = 0.175 | 0.125 | 0.200 |
| Female | 0.100 | 0.225 | 0.175 |
Marginal Distribution (row or column totals):
- Male: 100/200 = 50%
- Female: 100/200 = 50%
- Espresso: 55/200 = 27.5%
Conditional Distribution (given one variable):
- Among males, what % prefer latte? 25/100 = 25%
- Among females, what % prefer latte? 45/100 = 45%
Insight: In this sample, females are much more likely than males to prefer lattes (45% vs 25%)!
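The three kinds of distribution differ only in what each count is divided by. Here is a minimal Python sketch using the coffee counts; the nested-dictionary layout and variable names are choices made for this example.

```python
# Counts from the coffee preference table (rows: gender, columns: coffee type).
table = {
    "Male": {"Espresso": 35, "Latte": 25, "Drip": 40},
    "Female": {"Espresso": 20, "Latte": 45, "Drip": 35},
}
grand_total = sum(sum(row.values()) for row in table.values())

# Joint: each cell divided by the grand total.
joint = {
    g: {c: n / grand_total for c, n in row.items()} for g, row in table.items()
}

# Marginal: row totals or column totals divided by the grand total.
row_marginal = {g: sum(row.values()) / grand_total for g, row in table.items()}
column_marginal = {
    c: sum(table[g][c] for g in table) / grand_total for c in table["Male"]
}

# Conditional on gender: each cell divided by its own row total.
conditional = {
    g: {c: n / sum(row.values()) for c, n in row.items()}
    for g, row in table.items()
}

print(joint["Male"]["Espresso"])
print(column_marginal["Espresso"])
print(conditional["Female"]["Latte"])
```

Conditioning on coffee type instead of gender would divide each cell by its column total.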
Common Mistakes to Avoid
1. Overlapping Classes
❌ Wrong: 50-60, 60-70, 70-80 (where does 60 go?)
✅ Correct: 50-59, 60-69, 70-79 OR use “less than” notation like 50 to under 60, 60 to under 70
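One way to see why the "under" notation removes the ambiguity is to treat each class as a half-open interval that includes its lower edge but not its upper edge. The sketch below uses `bisect_right` from Python's standard library; the helper name `class_index` is made up for this example.

```python
from bisect import bisect_right

# Edges for the classes 50 to under 60, 60 to under 70, 70 to under 80.
edges = [50, 60, 70, 80]


def class_index(value, edges):
    """Index of the half-open class [edges[i], edges[i + 1]) containing value."""
    if not edges[0] <= value < edges[-1]:
        raise ValueError(f"{value} is outside every class")
    return bisect_right(edges, value) - 1


# A score of exactly 60 belongs to exactly one class: 60 to under 70.
print(class_index(60, edges))
print(class_index(59.9, edges))
```

Every value lands in exactly one class, including values that sit exactly on an edge.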
2. Unequal Class Widths
Unless there’s a specific reason, keep class widths equal for easier interpretation.
3. Too Many or Too Few Classes
- Too few: Lose important patterns
- Too many: Table becomes as confusing as raw data
4. Missing the “Other” Category
For categorical data, include an “Other” category if responses don’t fit existing categories.
5. Not Including Totals
Always include row and column totals—they’re essential for calculations.
When to Use Each Table Type
| Table Type | Use When |
|---|---|
| Simple frequency | Summarizing one categorical variable |
| Grouped frequency | Summarizing numerical data with many values |
| Relative frequency | Comparing groups of different sizes |
| Cumulative frequency | Finding percentiles, proportion above/below a value |
| Two-way table | Exploring relationship between two categorical variables |
Real-World Application
An ER wants to understand patient arrival patterns.
Data: 500 patient arrivals over one week
Frequency Table by Time of Day:
| Time Period | Patients | Rel. Freq | Cum. Freq |
|---|---|---|---|
| 12am-4am | 45 | 9.0% | 45 |
| 4am-8am | 55 | 11.0% | 100 |
| 8am-12pm | 120 | 24.0% | 220 |
| 12pm-4pm | 95 | 19.0% | 315 |
| 4pm-8pm | 110 | 22.0% | 425 |
| 8pm-12am | 75 | 15.0% | 500 |
Insights for staffing decisions:
- Peak hours: 8am-12pm (24%) and 4pm-8pm (22%)
- Lowest: 12am-4am (9%)
- By 4pm, 63% of patients (315 of 500) have arrived
Two-Way Table by Severity:
| | Minor | Moderate | Severe | Total |
|---|---|---|---|---|
| Weekday | 180 | 100 | 45 | 325 |
| Weekend | 95 | 55 | 25 | 175 |
| Total | 275 | 155 | 70 | 500 |
Insight: Weekends have proportionally similar severity distribution (54.3% minor vs 55.4% weekday minor).
Summary
In this lesson, you learned:
- Frequency tables organize data by counting occurrences
- Relative frequency expresses counts as proportions (sum to 1.00)
- Grouped frequency tables use classes for numerical data
- Class width = Range / Number of Classes
- Cumulative frequency shows running totals (useful for percentiles)
- Two-way tables show relationships between categorical variables
- Joint, marginal, and conditional distributions extract different information
Practice Problems
1. Create a grouped frequency table for these ages (in years): 23, 45, 31, 28, 52, 38, 41, 29, 35, 47, 33, 26, 44, 39, 31, 48, 27, 36, 42, 34
Use 5 classes starting at 20.
2. Using your table from Problem 1, add relative frequency and cumulative frequency columns.
3. In a two-way table, 120 out of 200 urban residents and 60 out of 150 rural residents support a policy. Calculate:
- Joint distribution of “Urban + Support”
- Conditional distribution: % support among urban residents
- Marginal distribution: % who support overall
Answers
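1. Range = 52 - 23 = 29, Width = 29/5 = 5.8, rounded up to 6. Because the classes start at 20 rather than at the minimum (23), five classes of width 6 only reach 49, so a sixth class (50-55) is needed to include 52.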
| Class | Frequency |
|---|---|
| 20-25 | 1 |
| 26-31 | 6 |
| 32-37 | 4 |
| 38-43 | 4 |
| 44-49 | 4 |
| 50-55 | 1 |
| Total | 20 |
2.
| Class | Freq | Rel. Freq | Cum. Freq | Cum. Rel. Freq |
|---|---|---|---|---|
| 20-25 | 1 | 0.05 | 1 | 0.05 |
| 26-31 | 6 | 0.30 | 7 | 0.35 |
| 32-37 | 4 | 0.20 | 11 | 0.55 |
| 38-43 | 4 | 0.20 | 15 | 0.75 |
| 44-49 | 4 | 0.20 | 19 | 0.95 |
| 50-55 | 1 | 0.05 | 20 | 1.00 |
3.
- Total = 200 + 150 = 350
- Total support = 120 + 60 = 180
- Joint (Urban + Support) = 120/350 = 34.3%
- Conditional (Support | Urban) = 120/200 = 60%
- Marginal (Support) = 180/350 = 51.4%
Next Steps
Now that you can organize data in tables:
- Measures of Central Tendency - Summarize with numbers
- Data Visualization - Turn tables into charts
- Percentiles and Quartiles - Use cumulative frequencies