Correlation of Heights: Measuring Heredity

Exploring the Linear Relationship between Parent and Offspring Height using Pearson's Correlation Coefficient.

This project examines the connection between parent and offspring height through statistical analysis. Using Pearson’s Correlation Coefficient and scatter diagrams, it measures how strongly stature is associated across generations and illustrates Sir Francis Galton’s concept of “Regression toward the Mean.” By analyzing real-world height data, the study shows how closely children's heights follow their parents' heights and why the match is imperfect, a pattern consistent with height being shaped by both genetics and environment.

1. Abstract & Objective

This project investigates the degree to which height is inherited by analyzing the relationship between the heights of parents and their adult children. We aim to calculate the Pearson Correlation Coefficient and visualize the data with Scatter Diagrams to test Sir Francis Galton’s theory of "Regression toward the Mean."

2. Historical Context: The Galton Study

In 1886, Francis Galton published his findings on hereditary stature. He discovered that while tall parents tend to have tall children, the children’s heights are usually closer to the population average than their parents' were. This project uses his statistical framework to analyze modern data.

3. Mathematical Framework

The primary tool for this analysis is the Pearson Correlation Coefficient ($r$), which ranges from $-1$ to $+1$. Its sign gives the direction of the linear relationship and its magnitude gives the strength.

$r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$
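As a minimal sketch (the function name `pearson_r` is hypothetical, not part of any library), the computational formula above can be evaluated directly from its five column sums. The toy values in the usage line are for illustration only and are not height measurements.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson's r computed with the sums formula from Section 3 (illustrative sketch)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    # Each bracket equals n times the sum of squared deviations from the mean.
    sxx = n * sum_x2 - sum_x ** 2
    syy = n * sum_y2 - sum_y ** 2
    if sxx <= 0 or syy <= 0:
        raise ValueError("r is undefined when either variable has zero variance")
    return numerator / math.sqrt(sxx * syy)


# Toy values for illustration only (not real height data).
print(pearson_r([1, 2, 3, 4], [2, 1, 4, 3]))  # 0.6
```

With whole-centimetre heights the sums stay exact in Python integers, but with floating-point inputs this one-pass formula can lose precision when the values are large relative to their spread; subtracting the means first avoids that. On Python 3.10 or later, `statistics.correlation` offers an independent cross-check.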

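Interpreting the Coefficient ($r$) (common rules of thumb; negative values are read the same way, with "negative" in place of "positive"):

  • $r = +1$: Perfect positive correlation.
  • $0.7 \leq r < 1$: Strong positive relationship.
  • $0.3 \leq r < 0.7$: Moderate positive relationship.
  • $0 < r < 0.3$: Weak positive relationship.
  • $r = 0$: No linear correlation.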
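The interpretation bands above can be encoded as a small lookup. This is a sketch under two stated assumptions: negative values mirror the positive bands, and magnitudes strictly between 0 and 0.3 are labeled weak. The name `describe_r` is hypothetical.

```python
def describe_r(r: float) -> str:
    """Map Pearson's r to rule-of-thumb labels (bands as listed in Section 3)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    direction = "positive" if r > 0 else "negative"
    size = abs(r)  # assumption: negative values use the same bands as positive ones
    if size == 1.0:
        return f"perfect {direction} correlation"
    if size >= 0.7:
        return f"strong {direction} relationship"
    if size >= 0.3:
        return f"moderate {direction} relationship"
    if size > 0.0:
        return f"weak {direction} relationship"  # assumption: 0 < |r| < 0.3 is "weak"
    return "no linear correlation"


print(describe_r(0.6))  # moderate positive relationship
```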

4. Investigative Methodology

Step 1: Data Collection

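Gather height data from 30 to 50 families: the father's and mother's heights (combined into a single mid-parent height, $x$) and the height of each adult child ($y$). Use the formula:

Mid-parent height ($x$): $\frac{\text{Father's height} + (1.08 \times \text{Mother's height})}{2}$

The factor 1.08 rescales the mother's height to the male scale before averaging. Galton applied the same factor to daughters' heights so that sons and daughters could be compared on one scale.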
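A minimal sketch of the mid-parent calculation follows. The function names are hypothetical; the default factor of 1.08 is the one in the formula above, and the daughter adjustment follows Galton's convention of applying that same factor to female heights. The input heights are illustrative values, not real measurements.

```python
def mid_parent_height(father_cm: float, mother_cm: float, factor: float = 1.08) -> float:
    """Average of the father's height and the mother's height scaled by `factor`."""
    return (father_cm + factor * mother_cm) / 2


def adjusted_child_height(child_cm: float, is_daughter: bool, factor: float = 1.08) -> float:
    """Put daughters on the same (male) scale as sons, following Galton's convention."""
    return child_cm * factor if is_daughter else child_cm


# Illustrative values only (not real measurements).
x = mid_parent_height(180, 165)  # (180 + 1.08 * 165) / 2
y = adjusted_child_height(160, is_daughter=True)
print(round(x, 1), round(y, 1))  # 179.1 172.8
```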

Step 2: Tabulation

| Subject | Parent Height $x$ (cm) | Child Height $y$ (cm) | $x^2$ (cm²) | $y^2$ (cm²) | $xy$ (cm²) |
|---|---|---|---|---|---|
| 1 | 175 | 178 | 30625 | 31684 | 31150 |
| ... | [Collect 30+ Data Points] | ... | ... | ... | ... |
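The tabulation step can be automated with a short sketch like the one below (the name `build_table` is hypothetical). Given the first row of the table above, it reproduces the same $x^2$, $y^2$, and $xy$ entries and accumulates the column totals that the $r$ formula needs.

```python
from typing import List, Sequence, Tuple

# One table row: (subject, x, y, x^2, y^2, xy)
Row = Tuple[int, float, float, float, float, float]


def build_table(pairs: Sequence[Tuple[float, float]]) -> Tuple[List[Row], List[float]]:
    """Build table rows from (parent, child) pairs and return the column totals."""
    rows: List[Row] = [
        (i, x, y, x * x, y * y, x * y) for i, (x, y) in enumerate(pairs, start=1)
    ]
    # Totals of x, y, x^2, y^2, xy (columns 1 to 5): the sums used in the r formula.
    totals = [sum(row[col] for row in rows) for col in range(1, 6)]
    return rows, totals


rows, totals = build_table([(175, 178)])  # first row of the table above
print(*rows[0])  # 1 175 178 30625 31684 31150
```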

Step 3: Scatter Diagram

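Plot the data points on a graph where the x-axis represents mid-parent height and the y-axis represents child height. Draw the "Line of Best Fit" (the least-squares regression line of $y$ on $x$) and observe how tightly the points cluster around it: the tighter the cluster, the closer $|r|$ is to 1.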
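Assuming the "Line of Best Fit" means the ordinary least-squares line of $y$ on $x$, its slope and intercept come from the same sums used for $r$, as in this sketch (`least_squares_line` is a hypothetical name; the toy points are illustrative only).

```python
from typing import Sequence, Tuple


def least_squares_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sxx = n * sum_x2 - sum_x ** 2  # n times the sum of squared x-deviations
    if sxx == 0:
        raise ValueError("slope is undefined when all x values are equal")
    slope = (n * sum_xy - sum_x * sum_y) / sxx
    intercept = (sum_y - slope * sum_x) / n  # the line passes through (mean x, mean y)
    return slope, intercept


slope, intercept = least_squares_line([1, 2, 3, 4], [2, 1, 4, 3])  # toy data
print(round(slope, 3), round(intercept, 3))  # 0.6 1.0
```

To draw the diagram itself, a plotting library such as matplotlib can show the points with `pyplot.scatter` and the fitted line with `pyplot.plot`.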

5. Analysis & Conclusion

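By applying the $r$ formula to your gathered data, you should expect a moderate-to-strong positive correlation (roughly $r \approx 0.6$ to $0.8$, though the exact value will vary with the sample). A clearly positive $r$ is consistent with height being strongly influenced by genetics, while $r < 1$ shows that parent height does not determine child height exactly: environmental factors and other sources of variation loosen the relationship. Regression toward the mean is a direct consequence of this imperfect correlation: because $r < 1$, children of unusually tall or short parents are, on average, closer to the population mean (in standard-deviation units) than their parents.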
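A short worked equation makes the link between $r$ and regression toward the mean explicit. It uses the standard identity that the least-squares slope equals $r\,s_y/s_x$, where $\bar{x}, \bar{y}$ are the sample means, $s_x, s_y$ the sample standard deviations, and $\hat{y}$ the predicted child height (symbols introduced here for illustration):

$\hat{y} - \bar{y} = r \, \frac{s_y}{s_x} \, (x - \bar{x}) \quad\Longleftrightarrow\quad \frac{\hat{y} - \bar{y}}{s_y} = r \cdot \frac{x - \bar{x}}{s_x}$

In standard units, the predicted child deviation is $r$ times the parent deviation, so with $0 < r < 1$ it is always smaller in magnitude. For instance, with a hypothetical $r = 0.5$, a mid-parent height 2 standard deviations above the mean predicts a child only 1 standard deviation above the mean.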

Figure: Correlation of heights infographic, a scatter plot of parent versus child height with a regression line. A visual exploration of how genetics influences human height across generations.