beginner 22 minutes

Data Collection Methods

Master data collection: sampling techniques, surveys, experiments, and observational studies. Learn to design research.

Why Data Collection Matters

The quality of your statistical conclusions is only as good as the data you collect. Poor data collection leads to:

  • Biased estimates that systematically miss the truth
  • Invalid conclusions that don’t apply to your target population
  • Wasted resources on studies that can’t answer your questions
  • Misleading results that could harm decision-making

Populations and Samples

Key Definitions

Term | Definition | Example
Population | The entire group you want to study | All U.S. adults
Sample | A subset of the population you actually study | 1,500 surveyed adults
Parameter | A number describing the population (usually unknown) | True average income of all U.S. adults
Statistic | A number calculated from the sample | Average income of 1,500 surveyed adults

Why Sample?

Studying an entire population is often:

  • Impossible (can’t test every light bulb without destroying them all)
  • Impractical (surveying 330 million Americans is costly)
  • Unnecessary (a well-designed sample can provide excellent estimates)

The goal: Use sample statistics to estimate population parameters.
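
To make the parameter/statistic distinction concrete, here is a minimal Python sketch. The population is simulated (the lognormal income model and its settings are assumptions chosen only for illustration, not real data), so the "true" parameter is known here even though it usually is not in practice.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100,000 simulated incomes (not real data).
population = [rng.lognormvariate(10.8, 0.6) for _ in range(100_000)]
parameter = statistics.mean(population)  # population mean: usually unknown

# Draw a random sample of 1,500 and compute the matching statistic.
sample = rng.sample(population, 1_500)
statistic = statistics.mean(sample)  # sample mean: our estimate

print(f"Population mean (parameter): {parameter:,.0f}")
print(f"Sample mean (statistic):     {statistic:,.0f}")
```

The two numbers will typically be close but not identical; that gap is sampling error, which shrinks as the sample grows.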

Sampling Methods

Probability Sampling

Every member of the population has a known, non-zero probability of being selected. These methods allow for valid statistical inference.

1. Simple Random Sampling (SRS)

Every possible sample of size n has an equal chance of being selected.

Simple Random Sampling

Scenario: Survey 100 students from a university of 10,000.

Method:

  1. Get a list of all 10,000 students
  2. Assign each a number (1-10,000)
  3. Use a random number generator to select 100 numbers
  4. Survey those 100 students

Pros: Simple, unbiased, easy to analyze

Cons: Need complete list, may miss subgroups
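
The four steps above could be sketched in Python roughly as follows. The function name `simple_random_sample` and the seed are illustrative choices, and student ID numbers 1 to 10,000 stand in for the real list.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Return sample_size distinct ID numbers drawn from 1..population_size."""
    rng = random.Random(seed)
    # random.sample draws without replacement, so every subset of the
    # requested size is equally likely to be chosen.
    return sorted(rng.sample(range(1, population_size + 1), sample_size))

# Select 100 of the 10,000 numbered students.
selected_ids = simple_random_sample(10_000, 100, seed=2024)
print(selected_ids[:10])
```

Fixing the seed makes the selection reproducible, which helps when documenting how a sample was drawn.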

2. Stratified Sampling

Divide the population into strata (groups), then sample from each stratum.

Stratified Sampling

Scenario: Survey students about campus dining.

Method:

  1. Divide students into strata: Freshmen, Sophomores, Juniors, Seniors
  2. Randomly sample from each group proportionally (or equally)

Stratum | Population | Sample (10%)
Freshmen | 3,000 | 300
Sophomores | 2,500 | 250
Juniors | 2,300 | 230
Seniors | 2,200 | 220
Total | 10,000 | 1,000

Pros: Ensures representation of all groups, often gives more precise estimates than an SRS of the same size

Cons: Must know stratum membership in advance
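
A proportional allocation like the one in this example might be sketched as below. The stratum sizes come from the example; the function name and the use of positions 0..size-1 as stand-in student IDs are assumptions for illustration.

```python
import random

# Stratum sizes taken from the example.
strata_sizes = {
    "Freshmen": 3000,
    "Sophomores": 2500,
    "Juniors": 2300,
    "Seniors": 2200,
}

def stratified_sample(sizes, fraction, seed=None):
    """Draw the same fraction from every stratum (proportional allocation)."""
    rng = random.Random(seed)
    sample = {}
    for stratum, size in sizes.items():
        n = round(size * fraction)
        # Hypothetical IDs: positions 0..size-1 within the stratum.
        sample[stratum] = rng.sample(range(size), n)
    return sample

stratified = stratified_sample(strata_sizes, 0.10, seed=1)
for stratum, ids in stratified.items():
    print(stratum, len(ids))
```

Sampling within each stratum separately is what guarantees every class year appears in the final sample.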

3. Cluster Sampling

Divide population into clusters (often geographic), randomly select entire clusters, then sample everyone (or a random sample) within selected clusters.

Cluster Sampling

Scenario: Survey household income across a large city.

Method:

  1. Divide city into 500 neighborhoods (clusters)
  2. Randomly select 25 neighborhoods
  3. Survey all (or some) households in those 25 neighborhoods

Pros: Practical when no population list exists, reduces travel costs

Cons: Usually less precise than an SRS of the same size (households within a cluster tend to be similar)
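
As a rough sketch of this design in Python: the neighborhood frame below is entirely made up (household counts and ID strings are placeholders), and the code follows the "survey all households" variant of step 3.

```python
import random

rng = random.Random(7)  # fixed seed for a repeatable sketch

# Hypothetical frame: 500 neighborhoods, each with a made-up list of households.
neighborhoods = {
    n: [f"N{n:03d}-H{h:03d}" for h in range(rng.randint(80, 120))]
    for n in range(1, 501)
}

# Stage 1: randomly select 25 whole clusters.
chosen = rng.sample(sorted(neighborhoods), 25)

# Stage 2: survey every household in the chosen clusters.
households = [hh for n in chosen for hh in neighborhoods[n]]

print(len(chosen), "neighborhoods,", len(households), "households")
```

Only a list of neighborhoods is needed up front; household lists are required only for the 25 selected clusters.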

4. Systematic Sampling

Select every kth individual from a list, starting from a random point.

Systematic Sampling

Scenario: Quality control on a production line.

Method:

  1. Production makes 1,000 items per day, want to test 50
  2. Calculate interval: k = 1000/50 = 20
  3. Randomly pick starting point (say, item #7)
  4. Test items: 7, 27, 47, 67, 87, … (every 20th)

Pros: Easy to implement, spreads sample across time/space

Cons: Can be biased if there’s a periodic pattern
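
The procedure above can be sketched in Python as follows. The function name and its `start` parameter are illustrative; passing `start=7` reproduces the example, while leaving it out picks a random start within the first interval.

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Select every kth item number, beginning at a start in 1..k."""
    k = population_size // sample_size  # sampling interval
    if start is None:
        start = random.Random(seed).randint(1, k)  # random starting point
    return list(range(start, population_size + 1, k))[:sample_size]

items = systematic_sample(1000, 50, start=7)
print(items[:5])   # [7, 27, 47, 67, 87]
print(len(items))  # 50
```

If the line has a cycle that lines up with k (say, a fault on every 20th item), this design would catch it either always or never, which is the periodicity risk noted above.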

Non-Probability Sampling

Selection is not random, so selection probabilities are unknown. These methods are useful for exploratory research, but their results cannot be reliably generalized to the population.

1. Convenience Sampling

Sample whoever is easily available.

Convenience Sampling
  • Surveying shoppers at one mall
  • Using students in your class for a psychology study
  • Polling followers on social media

Problem: Your sample may not represent the population at all.

2. Voluntary Response Sampling

People choose to participate (e.g., online polls, call-in surveys).

3. Snowball Sampling

Existing participants recruit future participants. Useful for hard-to-reach populations.

Snowball Sampling

Scenario: Study experiences of undocumented immigrants.

Method:

  1. Find initial participants through community organizations
  2. Ask them to refer others who might participate
  3. Continue until desired sample size reached

Useful when: Population is hidden or hard to identify

Types of Studies

Observational Studies

Researchers observe and measure without intervening. No manipulation of variables.

Cross-Sectional Studies

Collect data at one point in time.

Cross-Sectional Study

Research question: Is coffee consumption associated with anxiety levels?

Method: Survey 1,000 adults today about their coffee consumption and anxiety symptoms.

Limitation: Can’t determine which came first (temporal ambiguity)

Longitudinal Studies

Follow the same subjects over time.

Longitudinal Study

Research question: Does childhood obesity predict adult health outcomes?

Method:

  • 1990: Measure BMI of 5,000 children (age 10)
  • 2000: Follow up (age 20)
  • 2010: Follow up (age 30)
  • 2020: Follow up (age 40)

Advantage: Can track changes over time

Challenge: Expensive, participant dropout

Case-Control Studies

Compare people with a condition (cases) to those without (controls).

Case-Control Study

Research question: Is smoking associated with lung cancer?

Method:

  • Cases: 500 people with lung cancer
  • Controls: 500 similar people without lung cancer
  • Compare smoking history between groups

Advantage: Efficient for rare diseases

Limitation: Relies on recall (potential bias)

Experimental Studies

Researchers actively intervene by assigning treatments to subjects.

Key Components of Experiments

1. Treatments: The conditions being compared (drug vs. placebo, new method vs. standard)

2. Experimental Units: The individuals receiving treatments (people, animals, plots of land)

3. Response Variable: The outcome being measured

4. Control Group: Group receiving no treatment or a placebo (for comparison)

5. Randomization: Random assignment of units to treatments (reduces bias)

6. Replication: Multiple units in each treatment group (allows estimation of variability)

Randomized Controlled Trial (RCT)

Research question: Does a new drug lower blood pressure?

Design:

  1. Recruit 200 patients with high blood pressure
  2. Randomly assign: 100 to drug, 100 to placebo
  3. Neither patients nor doctors know who gets what (double-blind)
  4. After 3 months, compare blood pressure between groups

Why this works:

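  • Randomization makes the groups similar on average at baseline, for both measured and unmeasured characteristics
  • Blinding keeps placebo effects equal across groups and prevents researcher expectations from influencing measurements
  • Any difference in outcomes larger than chance would explain can be attributed to the drug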
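
Step 2 of the design (random assignment) might look like this in Python. Patient IDs 1 to 200 and the function name `randomize` are placeholders for illustration.

```python
import random

def randomize(patient_ids, seed=None):
    """Randomly split patients into two equal-sized treatment groups."""
    rng = random.Random(seed)
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)  # random order, so no one chooses who gets what
    half = len(shuffled) // 2
    return {"drug": shuffled[:half], "placebo": shuffled[half:]}

groups = randomize(range(1, 201), seed=11)
print(len(groups["drug"]), len(groups["placebo"]))  # 100 100
```

In a double-blind trial the resulting assignment list would be held by someone other than the patients and the treating doctors.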

Observational vs. Experimental

Aspect | Observational | Experimental
Intervention | No | Yes
Causation | Cannot establish | Can establish
Ethics | Often more ethical | May raise ethical concerns
Real-world | More natural | More artificial
Confounding | Possible | Controlled through randomization
Example | Survey about diet and health | Clinical trial of new drug

Sources of Bias

Bias is systematic error that causes results to deviate from the truth in a consistent direction.

Selection Bias

The sample doesn’t represent the population.

Selection Bias Examples
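  • 1936 Literary Digest Poll: Mailed about 10 million ballots, received about 2.4 million responses, and predicted Landon would beat Roosevelt. Roosevelt won in a landslide. The mailing list came largely from telephone directories and automobile registrations, which overrepresented wealthier voters, and the low response rate added non-response bias.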
  • Survivor bias: Studying successful entrepreneurs while ignoring failed ones
  • Self-selection: Only people interested in a topic volunteer to participate

Response Bias

Participants don’t answer truthfully or accurately.

Causes include:

  • Social desirability: People underreport drinking, overreport exercise
  • Leading questions: “Don’t you agree that taxes are too high?”
  • Question wording: “Assistance to the poor” vs. “Welfare” get different responses
  • Recall bias: People misremember past events

Measurement Bias

The measurement process itself introduces errors.

Measurement Bias
  • Using bathroom scales that consistently read 2 lbs too high
  • Researchers who know the hypothesis unconsciously interpreting results favorably
  • Survey questions that are confusing or ambiguous

Non-Response Bias

People who don’t respond differ systematically from those who do.

Non-Response Bias

A phone survey about political opinions:

  • Only 30% of people answer the phone
  • Those who answer may be older (more landlines) or have more free time
  • Political opinions of responders may differ from non-responders
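
A small calculation shows how this can distort an estimate. All numbers below are hypothetical, chosen so the overall response rate works out to the 30% in the example; they are not survey results.

```python
# Hypothetical groups:
# (share of population, response rate, share supporting a policy)
groups = {
    "older":   (0.40, 0.45, 0.60),
    "younger": (0.60, 0.20, 0.40),
}

# True support in the whole population.
true_support = sum(share * support for share, _, support in groups.values())

# Overall fraction of people who respond.
response_rate = sum(share * rate for share, rate, _ in groups.values())

# Support among those who actually respond.
observed_support = (
    sum(share * rate * support for share, rate, support in groups.values())
    / response_rate
)

print(f"Overall response rate:     {response_rate:.0%}")
print(f"True support:              {true_support:.0%}")
print(f"Support among respondents: {observed_support:.0%}")
```

Under these assumptions, true support is about 48% but respondents show about 52%, because the group that answers more often also holds a different opinion. Collecting more responses of the same kind would not close that gap.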

Survey Design Best Practices

Writing Good Questions

DO:

  • Use clear, simple language
  • Ask about one thing at a time
  • Provide balanced response options
  • Pilot test your survey

DON’T:

  • Use leading questions
  • Use double-barreled questions (“Do you like the taste and price?”)
  • Use jargon or technical terms
  • Make assumptions (“When did you stop smoking?”)

Good vs. Bad Survey Questions

Bad: “Don’t you think the government wastes too much money?”

  • Leading, suggests expected answer

Good: “How well do you think the government manages taxpayer money?”

  • Excellent / Good / Fair / Poor / Very Poor

Bad: “How often do you exercise and eat healthy?”

  • Double-barreled (two questions in one)

Good: Two separate questions about exercise and diet

Response Formats

Type | When to Use | Example
Multiple choice | Finite, known options | “What is your major?”
Likert scale | Measure attitudes/opinions | “Strongly disagree to Strongly agree”
Rating scale | Measure intensity | “Rate your pain from 0-10”
Open-ended | Exploratory, want details | “What could we improve?”
Ranking | Relative preferences | “Rank these 5 features by importance”

Designing Your Study: A Checklist

  1. Define your research question clearly
  2. Identify the population of interest
  3. Choose study type (observational vs. experimental)
  4. Select sampling method (probability preferred)
  5. Determine sample size (larger = more precise)
  6. Design instruments (surveys, measurements)
  7. Consider potential biases and how to minimize them
  8. Plan data collection logistics
  9. Pilot test before full implementation
  10. Document everything for reproducibility
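
For step 5, one common way to see how precision improves with sample size is the approximate 95% margin of error for a proportion from a simple random sample, 1.96 × sqrt(p(1 − p)/n). The formula is a standard normal approximation rather than something from this lesson, and the function name below is illustrative.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a sample proportion (SRS assumed)."""
    # p = 0.5 is the most conservative choice: it gives the widest margin.
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 400, 1600):
    print(f"n = {n:>5}: margin of error is about {margin_of_error(n):.3f}")
```

Quadrupling the sample size only halves the margin of error, so larger samples give diminishing returns, and no sample size fixes a biased sampling method.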

Summary

In this lesson, you learned:

  • Populations are entire groups; samples are subsets we actually study
  • Probability sampling (SRS, stratified, cluster, systematic) allows valid inference
  • Non-probability sampling (convenience, voluntary response, snowball) cannot be reliably generalized to populations
  • Observational studies measure without intervention; cannot prove causation
  • Experiments with randomization can establish causation
  • Bias (selection, response, measurement, non-response) threatens validity
  • Good survey design uses clear, neutral, focused questions

Practice Problems

1. A researcher wants to study smartphone usage among teenagers. She surveys students at her local high school during lunch. What type of sampling is this, and what are potential problems?

2. A pharmaceutical company tests a new pain medication by randomly assigning 100 patients to receive the drug and 100 to receive a sugar pill. Neither patients nor researchers know who received what. What type of study is this? What features make it rigorous?

3. An online news site asks readers to vote on whether they support a new policy. 75% vote “No.” Can we conclude that 75% of the population opposes the policy? Why or why not?

Answers

1. This is convenience sampling. Problems:

  • Only represents one school (not all teenagers)
  • Students at lunch may differ from those elsewhere
  • Results cannot be generalized to all teenagers

2. This is a randomized controlled trial (RCT) that is double-blind. Rigorous features:

  • Random assignment (prevents selection bias in who gets which treatment)
  • Control group (provides comparison)
  • Double-blinding (keeps placebo effects equal across groups and limits researcher bias)
  • Replication (100 per group allows statistical analysis)

3. No! This is voluntary response sampling. People with strong opinions (especially negative) are more likely to respond. The 75% likely overestimates true opposition in the population.
