beginner 25 minutes

Data Visualization

Master the art of visualizing data: histograms, box plots, scatter plots, bar charts, and more. Learn to choose the right chart for your data.


Why Visualize Data?

A picture is worth a thousand numbers. Data visualization:

  • Reveals patterns not obvious in tables
  • Identifies outliers and unusual observations
  • Shows distributions at a glance
  • Communicates findings effectively to any audience
  • Guides analysis by suggesting what methods to use

Choosing the Right Chart

The best visualization depends on your data type and what you want to show:

Data Type                | Purpose             | Best Chart
-------------------------|---------------------|---------------------
Categorical              | Compare frequencies | Bar chart, Pie chart
Numerical                | Show distribution   | Histogram, Box plot
Two numerical variables  | Show relationship   | Scatter plot
Numerical over time      | Show trends         | Line chart
Categorical relationship | Compare groups      | Grouped bar chart

Charts for Categorical Data

Bar Charts

Bar charts display frequencies or percentages for categorical data using rectangular bars.

Bar Chart: Favorite Programming Languages

Survey of 500 developers:

Language   | Count
-----------|------
Python     | 175
JavaScript | 150
Java       | 85
C++        | 50
Other      | 40

Bar chart characteristics:

  • Bars are separated (gaps between them)
  • Bar height represents frequency or percentage
  • Categories can be in any order (often descending by frequency)
  • Easy to compare across categories

Best practices:

  • Start the y-axis at zero (avoid misleading comparisons)
  • Order bars logically (alphabetical, by size, or natural order)
  • Use horizontal bars for long category names
  • Limit to 5-7 categories for clarity
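
As a rough illustration of these practices, the sketch below draws the survey table as a horizontal text bar chart, sorted by descending count, with every bar starting at zero. The function name `text_bar_chart` and the `scale` parameter are made up for this example; a plotting library would normally do this job.

```python
# Survey counts taken from the table above (500 developers in total).
counts = {"Python": 175, "JavaScript": 150, "Java": 85, "C++": 50, "Other": 40}


def text_bar_chart(counts, scale=5):
    """Return one text line per category, sorted by descending count.

    Each '#' stands for `scale` responses, and every bar starts at zero,
    so bar lengths are proportional to the counts.
    """
    total = sum(counts.values())
    width = max(len(name) for name in counts)
    lines = []
    for name, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        bar = "#" * (count // scale)
        lines.append(f"{name:<{width}} | {bar} {count} ({count / total:.0%})")
    return lines


for line in text_bar_chart(counts):
    print(line)
```

Horizontal bars are used here because they leave room for category names of any length.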

Pie Charts

Pie charts show proportions as slices of a circle.

Pie Chart: Market Share

Browser market share:

  • Chrome: 65%
  • Safari: 19%
  • Firefox: 8%
  • Edge: 5%
  • Other: 3%

When pie charts work:

  • Parts sum to a meaningful whole (100%)
  • Few categories (ideally 5 or fewer)
  • You want to emphasize proportions

Rules for pie charts:

  • Categories must sum to 100%
  • Use only for few categories (≤5)
  • Order slices by size (largest first, clockwise)
  • Avoid 3D effects (distorts perception)
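
The first and third rules can be checked mechanically. The sketch below, using the browser shares from the example, refuses to build a pie unless the parts sum to 100% and returns the slices largest first with their angles. The helper name `pie_slices` is hypothetical.

```python
# Browser shares (in percent) from the example above.
shares = {"Chrome": 65, "Safari": 19, "Firefox": 8, "Edge": 5, "Other": 3}


def pie_slices(shares):
    """Return (label, percent, angle_in_degrees) tuples, largest slice first.

    Raises ValueError if the percentages do not sum to 100, because a pie
    chart only makes sense when the parts form a complete whole.
    """
    total = sum(shares.values())
    if total != 100:
        raise ValueError(f"shares sum to {total}, not 100")
    ordered = sorted(shares.items(), key=lambda kv: kv[1], reverse=True)
    return [(label, pct, pct * 360 / 100) for label, pct in ordered]


for label, pct, angle in pie_slices(shares):
    print(f"{label:<8} {pct:>3}%  {angle:6.1f} degrees")
```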

Charts for Numerical Data

Histograms

Histograms show the distribution of numerical data by grouping values into bins.

Histogram: Exam Scores

Distribution of 100 exam scores:

Score Range | Frequency
------------|----------
40-49       | 3
50-59       | 8
60-69       | 15
70-79       | 32
80-89       | 28
90-100      | 14

Histogram characteristics:

  • Bars touch (no gaps) - emphasizes continuous nature
  • x-axis: numerical values (binned)
  • y-axis: frequency or relative frequency
  • Shape reveals the distribution
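
Binning is the step that turns raw values into a histogram. Here is a minimal sketch that counts scores in the same ranges as the table above. The sample scores are invented for illustration (the lesson does not list the 100 raw scores), and `bin_scores` is a hypothetical helper.

```python
def bin_scores(scores):
    """Count scores in the bins 40-49, 50-59, 60-69, 70-79, 80-89, 90-100."""
    labels = ["40-49", "50-59", "60-69", "70-79", "80-89", "90-100"]
    counts = {label: 0 for label in labels}
    for score in scores:
        if not 40 <= score <= 100:
            raise ValueError(f"score {score} is outside 40-100")
        # Each bin is 10 points wide, except that 100 joins the last bin.
        index = min(int(score - 40) // 10, len(labels) - 1)
        counts[labels[index]] += 1
    return counts


# Hypothetical sample, invented for illustration only.
sample = [42, 55, 58, 61, 67, 70, 73, 75, 79, 82, 88, 90, 100]

for label, count in bin_scores(sample).items():
    print(f"{label:>6} | {'#' * count} {count}")
```

Changing the bin width changes the picture, so it is worth trying a few widths before settling on one.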

Reading Histogram Shapes

Shape        | Description                   | Example
-------------|-------------------------------|-------------------------------
Symmetric    | Mirror image on both sides    | Heights, IQ scores
Right-skewed | Tail extends to the right     | Income, home prices
Left-skewed  | Tail extends to the left      | Age at death, easy test scores
Uniform      | All bars roughly equal height | Fair die rolls
Bimodal      | Two peaks                     | Mixed populations
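
A common rule of thumb is that a long tail pulls the mean toward it, so the mean tends to sit above the median in right-skewed data and below it in left-skewed data. The sketch below applies that heuristic. The `tolerance` threshold and both sample lists are assumptions made for this example, and the check cannot tell uniform or bimodal shapes apart, so it complements a histogram rather than replacing one.

```python
from statistics import mean, median


def skew_direction(values, tolerance=0.05):
    """Guess the skew direction by comparing the mean with the median.

    `tolerance` is a hypothetical threshold: if the gap between mean and
    median is smaller than this fraction of the data's range, the data is
    reported as roughly symmetric.
    """
    spread = max(values) - min(values)
    if spread == 0:
        return "roughly symmetric"
    gap = (mean(values) - median(values)) / spread
    if gap > tolerance:
        return "right-skewed"
    if gap < -tolerance:
        return "left-skewed"
    return "roughly symmetric"


# Hypothetical values, invented for illustration only.
incomes_in_thousands = [30, 32, 35, 38, 40, 45, 150]
easy_test_scores = [60, 95, 96, 97, 98, 99, 100]

print(skew_direction(incomes_in_thousands))  # right-skewed
print(skew_direction(easy_test_scores))      # left-skewed
```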

Box Plots (Box-and-Whisker)

Box plots display the five-number summary visually.

Five-Number Summary
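  • Minimum: Smallest value (when outliers are drawn as separate dots, the lower whisker ends at the smallest value inside the lower fence)
  • Q1: First quartile (25th percentile)
  • Median: Middle value (50th percentile)
  • Q3: Third quartile (75th percentile)
  • Maximum: Largest value (when outliers are drawn as separate dots, the upper whisker ends at the largest value inside the upper fence)

Box Plot Anatomy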

For exam scores with five-number summary: 45, 65, 78, 88, 98

Min    Q1     Median    Q3     Max
 |-----|========|========|-----|
45    65       78       88    98
      |________IQR________|
            = 88-65 = 23

Components:

  • Box: Contains middle 50% of data (Q1 to Q3)
  • Line in box: Median
  • Whiskers: Extend to the smallest and largest values that lie within 1.5×IQR of the box
  • Dots beyond whiskers: Outliers
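
The numbers behind a box plot can be computed in a few lines. This sketch uses Python's `statistics.quantiles` with its default "exclusive" method. Quartile conventions differ between textbooks and tools, so Q1 and Q3 may come out slightly different elsewhere. The data are the eleven values from the stem-and-leaf example later in this lesson, and `five_number_summary` is a hypothetical helper name.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Quartiles use the default "exclusive" method of statistics.quantiles;
    other tools may give slightly different Q1 and Q3 for the same data.
    """
    q1, med, q3 = quantiles(values, n=4)
    return min(values), q1, med, q3, max(values)


data = [23, 25, 28, 31, 32, 35, 35, 38, 41, 42, 47]
minimum, q1, med, q3, maximum = five_number_summary(data)

print(minimum, q1, med, q3, maximum)  # 23 28.0 35.0 41.0 47
print("IQR =", q3 - q1)               # IQR = 13.0
```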

Identifying Outliers with Box Plots

Outliers are points beyond the “fences”:

$$\text{Lower fence} = Q_1 - 1.5 \times \text{IQR}$$

$$\text{Upper fence} = Q_3 + 1.5 \times \text{IQR}$$

Finding Outliers

Given: Q1 = 65, Q3 = 88, IQR = 23

  • Lower fence = $65 - 1.5(23) = 65 - 34.5 = 30.5$
  • Upper fence = $88 + 1.5(23) = 88 + 34.5 = 122.5$

Any score below 30.5 or above 122.5 is an outlier. The scores in this example range from 45 to 98, so none of them is an outlier.
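
The fence calculation translates directly into code. The sketch below reproduces the numbers above for Q1 = 65 and Q3 = 88. The function names and the extra score of 20 in the second check are made up for illustration.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) using the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, q1, q3):
    """Return the values that fall outside the fences."""
    lower, upper = outlier_fences(q1, q3)
    return [v for v in values if v < lower or v > upper]


lower, upper = outlier_fences(65, 88)
print(lower, upper)  # 30.5 122.5

# The five-number summary values from the example contain no outliers.
print(find_outliers([45, 65, 78, 88, 98], 65, 88))  # []

# A hypothetical score of 20 would fall below the lower fence.
print(find_outliers([20, 45, 65, 78, 88, 98], 65, 88))  # [20]
```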

Box Plot vs Histogram

Feature                   | Box Plot  | Histogram
--------------------------|-----------|----------
Shows exact shape         | No        | Yes
Shows outliers explicitly | Yes       | Sometimes
Compares groups           | Excellent | Difficult
Shows quartiles           | Yes       | No
Works for small datasets  | Yes       | Not well

Stem-and-Leaf Plots

A stem-and-leaf plot is a hybrid between a table and a histogram: it shows the shape of the distribution while preserving the actual values.

Stem-and-Leaf Plot

Data: 23, 25, 28, 31, 32, 35, 35, 38, 41, 42, 47

Stem | Leaf
-----|------
  2  | 3 5 8
  3  | 1 2 5 5 8
  4  | 1 2 7

Reading: “3 | 1 2 5 5 8” represents 31, 32, 35, 35, 38

Advantage: Preserves original data values

Disadvantage: Only works for small datasets
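
Building the plot is mostly a matter of splitting each value into a tens digit (the stem) and a ones digit (the leaf). A minimal sketch, assuming non-negative whole numbers and using the hypothetical helper name `stem_and_leaf`:

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the rows of a stem-and-leaf plot for non-negative integers.

    The stem is the value without its last digit; the leaf is the last digit.
    """
    leaves = defaultdict(list)
    for value in sorted(values):
        leaves[value // 10].append(value % 10)
    lines = []
    # Walk every stem in range so that empty stems still get a row.
    for stem in range(min(leaves), max(leaves) + 1):
        leaf_text = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem:>3}  | {leaf_text}".rstrip())
    return lines


data = [23, 25, 28, 31, 32, 35, 35, 38, 41, 42, 47]
for line in stem_and_leaf(data):
    print(line)
```

Keeping rows for empty stems matters because gaps in the data are part of the shape.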

Dot Plots

A dot plot is a simple visualization that shows each data point as a dot stacked above (or beside) its value.

Dot Plot

Quiz scores: 7, 8, 8, 8, 9, 9, 10, 10, 10, 10

7:  •
8:  • • •
9:  • •
10: • • • •

Best for: Small datasets where you want to see individual values
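
A dot plot is just a tally drawn with dots. The sketch below rebuilds the quiz-score plot with `collections.Counter`; the function name `dot_plot` is hypothetical.

```python
from collections import Counter


def dot_plot(values):
    """Return one text row per distinct value, with one dot per observation."""
    counts = Counter(values)
    # Pad labels so the dots line up ("7:" and "10:" get the same width).
    width = len(str(max(counts))) + 1
    lines = []
    for value in sorted(counts):
        label = f"{value}:"
        dots = " ".join("•" * counts[value])
        lines.append(f"{label:<{width}} {dots}")
    return lines


quiz_scores = [7, 8, 8, 8, 9, 9, 10, 10, 10, 10]
for line in dot_plot(quiz_scores):
    print(line)
```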

Charts for Relationships

Scatter Plots

Scatter plots show the relationship between two numerical variables.

Scatter Plot: Height vs Weight

Each point represents one person:

  • x-axis: Height (inches)
  • y-axis: Weight (pounds)

What to look for:

  • Direction: Positive (up), Negative (down), or None
  • Form: Linear, Curved, or No pattern
  • Strength: How tightly points cluster around the pattern
  • Outliers: Points far from the overall pattern

Describing Scatter Plot Patterns

Pattern         | Description
----------------|--------------------------------------------------
Positive linear | As x increases, y increases (upward slope)
Negative linear | As x increases, y decreases (downward slope)
No relationship | Points scattered randomly
Curved          | Non-linear pattern (quadratic, exponential, etc.)
Clusters        | Distinct groups of points
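
Direction and strength of a linear pattern are often summarized with the Pearson correlation coefficient r: its sign gives the direction, and values near +1 or -1 mean the points cluster tightly around a line. The sketch below computes r by hand. The height and weight values are invented for illustration. Note that r only measures linear association, so a strongly curved pattern can still produce an r near zero.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical measurements, invented for illustration only.
heights_in = [62, 64, 66, 68, 70, 72]
weights_lb = [120, 135, 140, 155, 165, 180]

r = pearson_r(heights_in, weights_lb)
direction = "positive" if r > 0 else "negative"
print(f"r = {r:.2f} ({direction})")
```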

Line Charts

Line charts show how a numerical variable changes over time.

Line Chart: Stock Price
  • x-axis: Time (days, months, years)
  • y-axis: Stock price
  • Line connects sequential points

Best for:

  • Time series data
  • Showing trends
  • Comparing multiple series over time

Best practices:

  • Time always goes on the x-axis
  • Don’t connect unrelated points
  • Use markers to show actual data points
  • Be careful with multiple lines (limit to 4-5)

Comparing Distributions

Side-by-Side Box Plots

Comparing Test Scores by Class

Three class sections taking the same exam:

Class A: |-----|=====|=====|-----|
Class B:   |---|===|========|------|
Class C: |---|========|======|---|
         40   50   60   70   80   90

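Comparisons (as drawn above):

  • Class C has the highest median; Class B has the lowest
  • Class A has the narrowest box, so the smallest IQR
  • Class A is the most symmetric: its median sits in the middle of the box and its whiskers are equal in length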

Back-to-Back Stem-and-Leaf

Comparing Two Groups

Male vs Female heights (inches):

Female      | Stem | Male
------------|------|--------------
      8 6 4 |  5   |
9 8 7 5 4 2 |  6   | 2 4 5 6 7 8 9
            |  7   | 0 2 4

Reading: Leaves are read outward from the shared stem, so “8 | 5” on the female side is 58 and “7 | 0” on the male side is 70.

Common Visualization Mistakes

1. Truncated Axes

Starting a bar chart y-axis at a non-zero value exaggerates differences.

Misleading Truncated Axis

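Misleading: With a y-axis from 95 to 100, a change from 96 to 98 makes the second bar three times as tall as the first (3 units vs. 1 unit), even though the actual increase is only about 2%.

Correct: A y-axis from 0 to 100 shows the true proportion.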

Always start bar charts at zero!
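
The distortion is easy to quantify: the drawn height of a bar is its value minus the axis starting point. This sketch compares the apparent ratio of two bars under a truncated axis and under a zero-based axis. The function name `apparent_ratio` is hypothetical.

```python
def apparent_ratio(smaller, larger, axis_start=0):
    """Return how many times taller the larger bar is drawn than the smaller.

    Drawn height is value minus axis_start, so a truncated axis inflates
    the ratio between bars.
    """
    if axis_start >= smaller:
        raise ValueError("axis must start below the smaller value")
    return (larger - axis_start) / (smaller - axis_start)


print(apparent_ratio(96, 98, axis_start=95))  # 3.0
print(round(apparent_ratio(96, 98), 3))       # 1.021
```

With the axis starting at 95, the bar for 98 is drawn three times as tall as the bar for 96. Starting at zero, it is drawn about 2% taller, which matches the real difference.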

2. 3D Effects

3D adds nothing but distortion. Stick to 2D.

3. Chartjunk

Excessive decoration (pictures, shading, unnecessary gridlines) distracts from data.

4. Wrong Chart Type

  • Pie chart for more than 5 categories
  • Line chart for categorical data
  • Bar chart for continuous data

5. Missing Labels

Always include:

  • Title
  • Axis labels with units
  • Legend (if multiple series)
  • Data source

Choosing the Right Visualization: A Decision Tree

What do you want to show?
├── Distribution of ONE variable
│   ├── Categorical → Bar chart
│   └── Numerical → Histogram or Box plot
├── Relationship between TWO variables
│   ├── Both numerical → Scatter plot
│   ├── Both categorical → Grouped bar chart
│   └── One categorical, one numerical → Box plots by group
├── Change over TIME
│   └── Line chart
└── Part-to-whole COMPOSITION
    └── Pie chart or Stacked bar (if few categories)
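
The tree above can also be written as a small lookup table, which is handy as a checklist. This is only a sketch of the tree as given here; the goal names, type names, and the function `suggest_chart` are labels invented for this example.

```python
# Each key is (goal, sorted variable types); values mirror the tree above.
CHART_RULES = {
    ("distribution", ("categorical",)): "Bar chart",
    ("distribution", ("numerical",)): "Histogram or Box plot",
    ("relationship", ("numerical", "numerical")): "Scatter plot",
    ("relationship", ("categorical", "categorical")): "Grouped bar chart",
    ("relationship", ("categorical", "numerical")): "Box plots by group",
    ("time", ()): "Line chart",
    ("composition", ()): "Pie chart or Stacked bar (if few categories)",
}


def suggest_chart(goal, *var_types):
    """Return the chart the decision tree suggests.

    goal: "distribution", "relationship", "time", or "composition"
    var_types: "categorical" or "numerical", one per variable
               (ignored for "time" and "composition")
    """
    if goal in ("time", "composition"):
        key = (goal, ())
    else:
        key = (goal, tuple(sorted(var_types)))
    try:
        return CHART_RULES[key]
    except KeyError:
        raise ValueError(f"no rule for {goal!r} with {var_types!r}") from None


print(suggest_chart("distribution", "numerical"))
print(suggest_chart("relationship", "numerical", "categorical"))
print(suggest_chart("time"))
```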

Summary

In this lesson, you learned:

  • Bar charts for categorical data frequencies
  • Pie charts for part-to-whole (use sparingly)
  • Histograms show numerical data distribution
  • Box plots display five-number summary and outliers
  • Scatter plots reveal relationships between two numerical variables
  • Line charts show trends over time
  • Data type determines appropriate visualization
  • Avoid misleading charts (truncated axes, 3D effects, chartjunk)

Practice Problems

1. You have data on customer satisfaction ratings (1-5 stars). What chart would you use?

2. You want to compare the distribution of salaries between three departments. What chart would you use?

3. You have monthly revenue data for the past two years. What chart would you use?

4. Given this histogram shape description: “Most data clustered on the right with a long tail stretching to the left.” Is this left-skewed or right-skewed?

Answers

1. Bar chart - The ratings are ordinal categorical data. A bar chart shows frequency of each rating level clearly.

2. Side-by-side box plots - Perfect for comparing distributions across groups. Shows medians, spreads, and outliers for easy comparison.

3. Line chart - Time series data is best displayed with lines connecting sequential time points.

4. Left-skewed (negatively skewed) - The tail extends to the left (lower values). Most data is clustered at higher values.
