Correlation of Heights: Measuring Heredity
Exploring the Linear Relationship between Parent and Offspring Height using Pearson's Statistical Models.
This project explores the fascinating connection between parent and offspring height through the lens of statistical analysis. Using Pearson's Correlation Coefficient and scatter diagrams, it examines how heredity influences stature while highlighting Sir Francis Galton's groundbreaking concept of "Regression toward the Mean." By analyzing real-world height data, the study demonstrates how genetics and environmental factors combine to shape human growth patterns.
1. Abstract & Objective
This project investigates the degree to which height is inherited by analyzing the relationship between the heights of parents and their adult children. We aim to determine the Correlation Coefficient and visualize the data through Scatter Diagrams to verify Sir Francis Galton's theory of "Regression toward the Mean."
2. Historical Context: The Galton Study
In 1886, Francis Galton published his findings on hereditary stature. He discovered that while tall parents tend to have tall children, the children's heights are usually closer to the population average than their parents' were. This project uses his statistical framework to analyze modern data.
3. Mathematical Framework
The primary tool for this analysis is the Pearson Correlation Coefficient ($r$). This value ranges from $-1$ to $+1$ and indicates the strength and direction of the linear relationship between two variables, here parent height and child height.
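For reference, one common computational form of $r$ is sketched below, written in terms of sums over the $n$ data pairs $(x_i, y_i)$. The symbols $n$, $x_i$, and $y_i$ are notation assumed here rather than taken from the text.

$$r = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{\left[n\sum x_i^2 - \left(\sum x_i\right)^2\right]\left[n\sum y_i^2 - \left(\sum y_i\right)^2\right]}}$$

This form needs only the totals of $x$, $y$, $x^2$, $y^2$, and $xy$, which is why those columns are tabulated in Step 2.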
Interpreting the Coefficient ($r$):
- $r = +1$: Perfect positive correlation.
- $0.7 \leq r < 1$: Strong positive relationship.
- $0.3 \leq r < 0.7$: Moderate positive relationship.
- $r = 0$: No linear correlation.
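The bands above can be turned into a small lookup, sketched here in Python. The function name `describe_correlation` is hypothetical, and the labels for values between 0 and 0.3 and for negative values are assumptions, since the list above does not cover them.

```python
def describe_correlation(r: float) -> str:
    """Return a verbal label for a Pearson coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 1.0:
        return "perfect positive"
    if r >= 0.7:
        return "strong positive"
    if r >= 0.3:
        return "moderate positive"
    if r > 0.0:
        # Assumption: this band is not listed in the text.
        return "weak positive"
    if r == 0.0:
        return "no linear correlation"
    # Assumption: the text does not break down negative values.
    return "negative"
```

The cut-offs at 0.3 and 0.7 are conventions rather than hard rules, so the labels are best read as rough guidance.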
4. Investigative Methodology
Step 1: Data Collection
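Gather height data for 30-50 families: the heights of both parents and of their adult child. Combine each pair of parents into a single mid-parent height, in which the factor 1.08 scales the mother's height to the male scale. Use the formula: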
Mid-parent height ($x$): $\frac{\text{Father's height} + (1.08 \times \text{Mother's height})}{2}$
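As a minimal sketch, the mid-parent formula above could be coded as follows. The function name `mid_parent_height` and the choice of centimetres are assumptions made for illustration.

```python
def mid_parent_height(father_cm: float, mother_cm: float) -> float:
    """Mid-parent height: average of the father's height and the
    mother's height scaled by 1.08, as in the formula above."""
    if father_cm <= 0 or mother_cm <= 0:
        raise ValueError("heights must be positive")
    return (father_cm + 1.08 * mother_cm) / 2
```

For example, a father of 180 cm and a mother of 165 cm give a mid-parent height of about 179.1 cm.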
Step 2: Tabulation
| Subject | Parent Height ($x$, cm) | Child Height ($y$, cm) | $x^2$ | $y^2$ | $xy$ |
|---|---|---|---|---|---|
| 1 | 175 | 178 | 30625 | 31684 | 31150 |
| ... | [Collect 30+ data points] | ... | ... | ... | ... |
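The tabulation and the resulting coefficient can be sketched in Python as below. The helper names `table_row` and `pearson_r` are hypothetical, and the raw-sums formula in the docstring is one standard computational form of Pearson's $r$, assumed here because it matches the table's columns.

```python
import math

def table_row(x: float, y: float) -> tuple:
    """Return one row of the table: x, y, x^2, y^2, xy."""
    return (x, y, x * x, y * y, x * y)

def pearson_r(xs, ys) -> float:
    """Pearson's r from column totals:
    r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2))
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    rows = [table_row(x, y) for x, y in zip(xs, ys)]
    # Column totals, in the same order as the table columns.
    sum_x, sum_y, sum_x2, sum_y2, sum_xy = (sum(col) for col in zip(*rows))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        raise ValueError("r is undefined when x or y has no variation")
    return numerator / denominator
```

Calling `table_row(175, 178)` reproduces the first row of the table. Passing the full lists of mid-parent and child heights to `pearson_r` gives the coefficient for the collected sample.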
Step 3: Scatter Diagram
Plot the data points on a graph where the x-axis represents parent height and the y-axis represents child height. Observe the spread of points around the "Line of Best Fit."
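The "Line of Best Fit" is usually taken to be the least-squares line. The sketch below computes its slope and intercept in plain Python, leaving the plotting itself to whatever charting tool is at hand. The function name `best_fit_line` is hypothetical.

```python
def best_fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

When parent and child heights have similar spread, a fitted slope below 1 is how regression toward the mean would show up on the diagram.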
5. Analysis & Conclusion
By applying the $r$ formula to your gathered data, you will likely find a moderate to strong positive correlation ($r \approx 0.6$ to $0.8$). Such a result is consistent with height being heavily influenced by genetics. Because $r$ stays below 1, it also shows that children's heights do not track their parents' heights one-for-one, reflecting environmental factors and "regression toward the mean."
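One way to see why $r < 1$ implies regression toward the mean is to write the least-squares prediction in standardized units. The symbols $\bar{x}$ and $\bar{y}$ (sample means), $s_x$ and $s_y$ (sample standard deviations), and $\hat{y}$ (predicted child height) are notation assumed here.

$$\frac{\hat{y} - \bar{y}}{s_y} = r \cdot \frac{x - \bar{x}}{s_x}$$

For instance, with $r = 0.6$, a mid-parent height 2 standard deviations above average predicts a child height only $0.6 \times 2 = 1.2$ standard deviations above average: still tall, but closer to the mean.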

