Data Visualization
Master the art of visualizing data: histograms, box plots, scatter plots, bar charts, and more. Learn to choose the right chart for your data.
Why Visualize Data?
A picture is worth a thousand numbers. Data visualization:
- Reveals patterns not obvious in tables
- Identifies outliers and unusual observations
- Shows distributions at a glance
- Communicates findings effectively to any audience
- Guides analysis by suggesting what methods to use
Choosing the Right Chart
The best visualization depends on your data type and what you want to show:
| Data Type | Purpose | Best Chart |
|---|---|---|
| Categorical | Compare frequencies | Bar chart, Pie chart |
| Numerical | Show distribution | Histogram, Box plot |
| Two numerical variables | Show relationship | Scatter plot |
| Numerical over time | Show trends | Line chart |
| Two categorical variables | Compare groups | Grouped bar chart |
Charts for Categorical Data
Bar Charts
Bar charts display frequencies or percentages for categorical data using rectangular bars.
Survey of 500 developers:
| Language | Count |
|---|---|
| Python | 175 |
| JavaScript | 150 |
| Java | 85 |
| C++ | 50 |
| Other | 40 |
Bar chart characteristics:
- Bars are separated (gaps between them)
- Bar height represents frequency or percentage
- Categories can be in any order (often descending by frequency)
- Easy to compare across categories
Best practices:
- Start the y-axis at zero (avoid misleading comparisons)
- Order bars logically (alphabetical, by size, or natural order)
- Use horizontal bars for long category names
- Limit to 5-7 categories for clarity
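As a rough illustration of the characteristics and practices above, the following Python sketch turns the survey counts into percentages and a text-only bar chart sorted by descending frequency. The function names and the `scale` parameter are made up for this example; a real chart would normally come from a plotting library.

```python
# Survey counts taken from the table above.
counts = {"Python": 175, "JavaScript": 150, "Java": 85, "C++": 50, "Other": 40}

def to_percentages(counts):
    """Convert raw counts to percentages of the total."""
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}

def text_bar_chart(counts, scale=5):
    """Return one line per category, sorted by descending count.

    Each '#' stands for `scale` respondents, so every bar starts at zero.
    """
    width = max(len(name) for name in counts)
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    lines = []
    for name, n in ordered:
        bar = "#" * (n // scale)
        lines.append(f"{name:<{width}} | {bar} {n}")
    return lines

percentages = to_percentages(counts)
for line in text_bar_chart(counts):
    print(line)
```

Because bar length is proportional to the count measured from zero, the Python bar is a little more than twice as long as the Java bar, matching the ratio 175 to 85.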
Pie Charts
Pie charts show proportions as slices of a circle.
Browser market share:
- Chrome: 65%
- Safari: 19%
- Firefox: 8%
- Edge: 5%
- Other: 3%
When pie charts work:
- Parts sum to a meaningful whole (100%)
- Few categories (ideally 5 or fewer)
- You want to emphasize proportions
Rules for pie charts:
- Categories must sum to 100%
- Use only for few categories (≤5)
- Order slices by size (largest first, clockwise)
- Avoid 3D effects (distorts perception)
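The rules above can be checked mechanically before drawing anything. This is only a sketch: `pie_chart_problems`, `slice_order`, and the `tolerance` parameter are hypothetical names chosen for illustration, and the data is the browser example from this section.

```python
# Browser market share figures from the example above (percent).
shares = {"Chrome": 65, "Safari": 19, "Firefox": 8, "Edge": 5, "Other": 3}

def pie_chart_problems(shares, max_slices=5, tolerance=0.5):
    """Return a list of reasons the data is a poor fit for a pie chart."""
    problems = []
    if abs(sum(shares.values()) - 100) > tolerance:
        problems.append("slices do not sum to 100%")
    if len(shares) > max_slices:
        problems.append(f"more than {max_slices} categories")
    return problems

def slice_order(shares):
    """Order categories largest first, as suggested for clockwise layout."""
    return sorted(shares, key=shares.get, reverse=True)

print(pie_chart_problems(shares))  # an empty list means no problems found
print(slice_order(shares))
```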
Charts for Numerical Data
Histograms
Histograms show the distribution of numerical data by grouping values into bins.
Distribution of 100 exam scores:
| Score Range | Frequency |
|---|---|
| 40-49 | 3 |
| 50-59 | 8 |
| 60-69 | 15 |
| 70-79 | 32 |
| 80-89 | 28 |
| 90-100 | 14 |
Histogram characteristics:
- Bars touch (no gaps) - emphasizes continuous nature
- x-axis: numerical values (binned)
- y-axis: frequency or relative frequency
- Shape reveals the distribution
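Binning is the step that turns raw values into the frequency table a histogram draws. The sketch below assumes integer scores and uses the same bin ranges as the table above; the `sample_scores` list is invented for illustration and is not the 100-score dataset.

```python
def bin_scores(scores, start=40, stop=100, width=10):
    """Count integer scores per bin; the final bin includes `stop` itself."""
    lows = list(range(start, stop, width))  # 40, 50, ..., 90
    labels = []
    for low in lows:
        high = stop if low == lows[-1] else low + width - 1
        labels.append(f"{low}-{high}")
    counts = dict.fromkeys(labels, 0)
    for score in scores:
        if not start <= score <= stop:
            raise ValueError(f"score {score} is outside {start}-{stop}")
        # A score equal to `stop` would index past the end, so clamp it.
        index = min((score - start) // width, len(lows) - 1)
        counts[labels[index]] += 1
    return counts

# Hypothetical scores, used only to exercise the function.
sample_scores = [45, 52, 58, 61, 67, 69, 70, 73, 75, 78, 81, 84, 88, 92, 100]
print(bin_scores(sample_scores))
```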
Reading Histogram Shapes
| Shape | Description | Example |
|---|---|---|
| Symmetric | Mirror image on both sides | Heights, IQ scores |
| Right-skewed | Tail extends to the right | Income, home prices |
| Left-skewed | Tail extends to the left | Age at death, easy test scores |
| Uniform | All bars roughly equal height | Fair die rolls |
| Bimodal | Two peaks | Mixed populations |
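Skew can also be estimated numerically. One common measure is the moment-based skewness coefficient, whose sign indicates the direction of the longer tail. The sketch below is an illustration rather than part of the lesson's method: the `threshold` of 0.5 is an arbitrary cutoff chosen for this example, and the measure says nothing about uniform or bimodal shapes.

```python
def skewness(data):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    if m2 == 0:
        return 0.0  # all values identical, so there is no tail at all
    return m3 / m2 ** 1.5

def describe_skew(data, threshold=0.5):
    """Label the direction of skew; `threshold` is an arbitrary cutoff."""
    g = skewness(data)
    if g > threshold:
        return "right-skewed"
    if g < -threshold:
        return "left-skewed"
    return "roughly symmetric"

# Hypothetical data with one large value pulling the tail to the right.
print(describe_skew([1, 2, 2, 3, 3, 3, 10]))
```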
Box Plots (Box-and-Whisker)
Box plots display the five-number summary visually.
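- Minimum: Smallest value (or, when outliers are drawn separately, the smallest value at or above the lower fence)
- Q1: First quartile (25th percentile)
- Median: Middle value (50th percentile)
- Q3: Third quartile (75th percentile)
- Maximum: Largest value (or, when outliers are drawn separately, the largest value at or below the upper fence)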
For exam scores with five-number summary: 45, 65, 78, 88, 98
Min   Q1     Median     Q3   Max
|-----|========|========|-----|
45    65       78       88    98
      |_______IQR_______|
      IQR = 88 - 65 = 23
Components:
- Box: Contains middle 50% of data (Q1 to Q3)
- Line in box: Median
- Whiskers: Extend to the smallest and largest values that lie within 1.5×IQR of the box
- Dots beyond whiskers: Outliers
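A five-number summary can be computed with Python's standard `statistics` module. Quartile conventions differ between textbooks and tools, so this sketch assumes the "inclusive" method; the `scores` list is hypothetical and was chosen so that it reproduces the summary 45, 65, 78, 88, 98 used in this section.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the 'inclusive' quartile method."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)

# Hypothetical exam scores, not the lesson's actual dataset.
scores = [45, 60, 65, 70, 78, 85, 88, 90, 98]
print(five_number_summary(scores))
```

Other quartile methods (for example, the default "exclusive" method of `statistics.quantiles`) can give slightly different Q1 and Q3 values for the same data.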
Identifying Outliers with Box Plots
Outliers are points beyond the “fences”:
Given: Q1 = 65, Q3 = 88, IQR = 23
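- Lower fence = Q1 - 1.5 × IQR = 65 - 1.5 × 23 = 65 - 34.5 = 30.5
- Upper fence = Q3 + 1.5 × IQR = 88 + 1.5 × 23 = 88 + 34.5 = 122.5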
Any score below 30.5 or above 122.5 is an outlier.
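The fence arithmetic is easy to sketch in Python. The function names and the sample list passed to `find_outliers` are illustrative assumptions; the multiplier 1.5 and the quartiles 65 and 88 come from the text.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) for the k × IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(data, q1, q3):
    """Return the values that fall outside the fences."""
    lower, upper = outlier_fences(q1, q3)
    return [x for x in data if x < lower or x > upper]

lower, upper = outlier_fences(65, 88)
print(lower, upper)  # 30.5 122.5

# Hypothetical values: 25 and 130 fall outside the fences.
print(find_outliers([25, 45, 78, 98, 130], 65, 88))  # [25, 130]
```

Since exam scores in this example cannot exceed 100, no score can be a high outlier here; only unusually low scores could be flagged.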
Box Plot vs Histogram
| Feature | Box Plot | Histogram |
|---|---|---|
| Shows exact shape | No | Yes |
| Shows outliers explicitly | Yes | Sometimes |
| Compares groups | Excellent | Difficult |
| Shows quartiles | Yes | No |
| Works for small datasets | Yes | Not well |
Stem-and-Leaf Plots
A stem-and-leaf plot is a hybrid between a table and a histogram: it shows the shape of the distribution while preserving the actual values.
Data: 23, 25, 28, 31, 32, 35, 35, 38, 41, 42, 47
Stem | Leaf
-----|------
2 | 3 5 8
3 | 1 2 5 5 8
4 | 1 2 7
Reading: “3 | 1 2 5 5 8” represents 31, 32, 35, 35, 38
Advantage: Preserves original data values
Disadvantage: Only works for small datasets
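Building the plot amounts to splitting each value into a tens digit (stem) and a ones digit (leaf). The sketch below assumes non-negative integers and skips stems that have no leaves; `stem_and_leaf` is a name chosen for this example.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Group sorted values by tens digit (stem), keeping ones digits (leaves)."""
    plot = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)

data = [23, 25, 28, 31, 32, 35, 35, 38, 41, 42, 47]
for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
```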
Dot Plots
Simple visualization showing each data point as a dot.
Quiz scores: 7, 8, 8, 8, 9, 9, 10, 10, 10, 10
7: •
8: • • •
9: • •
10: • • • •
Best for: Small datasets where you want to see individual values
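A dot plot is essentially a tally of each distinct value. As a minimal sketch (the function name `dot_plot` is an assumption for this example), Python's `collections.Counter` does the counting:

```python
from collections import Counter

def dot_plot(values):
    """Return one line per distinct value with one dot per occurrence."""
    counts = Counter(values)
    return [f"{v}: {' '.join('•' * counts[v])}" for v in sorted(counts)]

quiz_scores = [7, 8, 8, 8, 9, 9, 10, 10, 10, 10]
for line in dot_plot(quiz_scores):
    print(line)
```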
Charts for Relationships
Scatter Plots
Scatter plots show the relationship between two numerical variables.
Each point represents one person:
- x-axis: Height (inches)
- y-axis: Weight (pounds)
What to look for:
- Direction: Positive (up), Negative (down), or None
- Form: Linear, Curved, or No pattern
- Strength: How tightly points cluster around the pattern
- Outliers: Points far from the overall pattern
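Direction and strength of a linear pattern are often summarized with the Pearson correlation coefficient r, which ranges from -1 to 1. The sketch below computes it from scratch; the height and weight values are hypothetical, and the 0.1 cutoff in `direction` is an arbitrary choice for illustration. Note that r only measures linear association, so a strongly curved pattern can still produce a value near zero.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

def direction(r, cutoff=0.1):
    """Classify direction; `cutoff` is an arbitrary illustrative threshold."""
    if r > cutoff:
        return "positive"
    if r < -cutoff:
        return "negative"
    return "none"

# Hypothetical measurements for six people.
heights = [62, 64, 66, 68, 70, 72]        # inches
weights = [120, 136, 142, 155, 170, 181]  # pounds
r = pearson_r(heights, weights)
print(direction(r), round(r, 2))
```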
Describing Scatter Plot Patterns
| Pattern | Description |
|---|---|
| Positive linear | As x increases, y increases (upward slope) |
| Negative linear | As x increases, y decreases (downward slope) |
| No relationship | Points scattered randomly |
| Curved | Non-linear pattern (quadratic, exponential, etc.) |
| Clusters | Distinct groups of points |
Line Charts
Line charts show how a numerical variable changes over time.
- x-axis: Time (days, months, years)
- y-axis: Stock price
- Line connects sequential points
Best for:
- Time series data
- Showing trends
- Comparing multiple series over time
Best practices:
- Time always goes on the x-axis
- Don’t connect unrelated points
- Use markers to show actual data points
- Be careful with multiple lines (limit to 4-5)
Comparing Distributions
Side-by-Side Box Plots
Three class sections taking the same exam:
Class A: |-----|=====|=====|-----|
Class B: |---|===|========|------|
Class C: |---|========|======|---|
40 50 60 70 80 90
Comparisons:
- Class B has highest median
- Class A has smallest spread
- Class C is most symmetric
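Comparisons like these can be backed by numbers: the median locates the center of each group and the IQR measures its spread. The sketch below uses invented scores for three sections (not the data behind the plots above) and assumes the "inclusive" quartile method from Python's `statistics` module.

```python
import statistics

def compare_groups(groups):
    """Return the median and IQR of each group."""
    summary = {}
    for name, values in groups.items():
        q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        summary[name] = {"median": median, "iqr": q3 - q1}
    return summary

# Hypothetical scores for three class sections.
groups = {
    "A": [60, 64, 66, 68, 70, 72, 74],
    "B": [55, 62, 70, 78, 82, 86, 92],
    "C": [45, 55, 62, 70, 78, 85, 95],
}
summary = compare_groups(groups)
highest_median = max(summary, key=lambda name: summary[name]["median"])
smallest_spread = min(summary, key=lambda name: summary[name]["iqr"])
print(highest_median, smallest_spread)
```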
Back-to-Back Stem-and-Leaf
Male vs Female heights (inches):
Female | Stem | Male
-------|------|--------
8 6 4 | 5 |
9 7 5 | 6 | 2 4 6 8
8 4 2 | 6 | 5 7 9
| 7 | 0 2 4
Reading: Female 58 is “8 | 5” and Male 70 is “7 | 0”
Common Visualization Mistakes
1. Truncated Axes
Starting a bar chart y-axis at a non-zero value exaggerates differences.
Misleading: A y-axis from 95 to 100 makes 96→98 look like the value tripled, because the bars are drawn 1 and 3 units tall.
Correct: A y-axis from 0 to 100 shows the true proportion, an increase of about 2%.
Always start bar charts at zero!
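The distortion can be quantified by comparing drawn bar heights, which are measured from the axis baseline rather than from zero. This is a small sketch; `apparent_ratio` is a name made up for this example.

```python
def apparent_ratio(a, b, baseline=0):
    """Ratio of the drawn heights of bars b and a above the axis baseline."""
    if a <= baseline:
        raise ValueError("value a must be above the baseline")
    return (b - baseline) / (a - baseline)

# Values 96 and 98 on an axis that starts at 95: the second bar is 3x taller.
print(apparent_ratio(96, 98, baseline=95))  # 3.0

# The same values on an axis that starts at 0: the bars are nearly equal.
print(round(apparent_ratio(96, 98), 3))     # 1.021
```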
2. 3D Effects
3D adds nothing but distortion. Stick to 2D.
3. Chartjunk
Excessive decoration (pictures, shading, unnecessary gridlines) distracts from data.
4. Wrong Chart Type
- Pie chart for more than 5 categories
- Line chart for categorical data
- Bar chart for continuous data
5. Missing Labels
Always include:
- Title
- Axis labels with units
- Legend (if multiple series)
- Data source
Choosing the Right Visualization: A Decision Tree
What do you want to show?
├── Distribution of ONE variable
│ ├── Categorical → Bar chart
│ └── Numerical → Histogram or Box plot
├── Relationship between TWO variables
│ ├── Both numerical → Scatter plot
│ ├── Both categorical → Grouped bar chart
│ └── One categorical, one numerical → Box plots by group
├── Change over TIME
│ └── Line chart
└── Part-to-whole COMPOSITION
└── Pie chart or Stacked bar (if few categories)
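The tree above maps directly onto a small lookup function. This is only a sketch of the decision logic: the function name `choose_chart` and the goal keywords ("distribution", "relationship", "time", "composition") are assumptions introduced for the example.

```python
def choose_chart(goal, var_types=()):
    """Suggest a chart from the goal and the types of the variables involved.

    goal: "distribution", "relationship", "time", or "composition"
    var_types: sequence of "categorical" / "numerical", one per variable
    """
    kinds = sorted(var_types)
    if goal == "distribution":
        if kinds == ["categorical"]:
            return "Bar chart"
        if kinds == ["numerical"]:
            return "Histogram or Box plot"
    elif goal == "relationship":
        if kinds == ["numerical", "numerical"]:
            return "Scatter plot"
        if kinds == ["categorical", "categorical"]:
            return "Grouped bar chart"
        if kinds == ["categorical", "numerical"]:
            return "Box plots by group"
    elif goal == "time":
        return "Line chart"
    elif goal == "composition":
        return "Pie chart or Stacked bar"
    raise ValueError(f"no suggestion for goal={goal!r}, var_types={var_types!r}")

# Salaries (numerical) compared across departments (categorical).
print(choose_chart("relationship", ["numerical", "categorical"]))
```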
Summary
In this lesson, you learned:
- Bar charts for categorical data frequencies
- Pie charts for part-to-whole (use sparingly)
- Histograms show numerical data distribution
- Box plots display five-number summary and outliers
- Scatter plots reveal relationships between two numerical variables
- Line charts show trends over time
- Data type determines appropriate visualization
- Avoid misleading charts (truncated axes, 3D effects, chartjunk)
Practice Problems
1. You have data on customer satisfaction ratings (1-5 stars). What chart would you use?
2. You want to compare the distribution of salaries between three departments. What chart would you use?
3. You have monthly revenue data for the past two years. What chart would you use?
4. Given this histogram shape description: “Most data clustered on the right with a long tail stretching to the left.” Is this left-skewed or right-skewed?
Answers
1. Bar chart - The ratings are ordinal categorical data. A bar chart shows frequency of each rating level clearly.
2. Side-by-side box plots - Perfect for comparing distributions across groups. Shows medians, spreads, and outliers for easy comparison.
3. Line chart - Time series data is best displayed with lines connecting sequential time points.
4. Left-skewed (negatively skewed) - The tail extends to the left (lower values). Most data is clustered at higher values.
Next Steps
Now that you can visualize data:
- Percentiles and Quartiles - Understand box plot components
- Correlation Analysis - Quantify scatter plot relationships
- Histogram Generator Tool - Create histograms for your data