In a previous chapter, we introduced the idea of 'regression', or finding a line of best fit.
The three-median method is useful when it is assumed that there is a strong linear relation in the data.
The method involves several calculations, which are illustrated in the following example.
$x$ | $10$ | $13$ | $16$ | $19$ | $22$ | $25$ | $28$ | $31$ | $34$ | $37$ | $40$ | $43$ | $46$ | $49$ | $52$ | $55$ |
$y$ | $16$ | $15$ | $26$ | $23$ | $36$ | $39$ | $48$ | $61$ | $46$ | $73$ | $62$ | $70$ | $76$ | $65$ | $74$ | $94$ |
In this data set, there are $16$ observations. So, there will be $5$ data points in each of the two outer groups and $6$ in the central group.
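The split into groups can be sketched in Python. This is a minimal illustration, not part of the original method description: the function name `group_sizes` is hypothetical, and the rule for two leftover points (one to each outer group) is an assumed convention that the example above does not confirm.

```python
def group_sizes(n):
    """Return (lower, central, upper) group sizes for n data points."""
    base, extra = divmod(n, 3)
    if extra == 0:
        return base, base, base
    if extra == 1:
        # One leftover point goes to the central group, as in the example.
        return base, base + 1, base
    # Assumption: two leftover points go one to each outer group.
    return base + 1, base, base + 1


print(group_sizes(16))  # (5, 6, 5)
```

Keeping the two outer groups the same size means the lower and upper median points are found in the same way.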
The median data points are: $(16,26)$, $(32.5,53.5)$ and $(49,65)$.
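These median points can be checked with a short Python sketch. It assumes the convention this example appears to use: within each group (already sorted by $x$), the median point is the middle data point, or the average of the two middle data points when the group has an even number of points. The name `median_point` and the variable names are illustrative only.

```python
# Data from the table; the x-values run from 10 to 55 in steps of 3.
x = list(range(10, 56, 3))
y = [16, 15, 26, 23, 36, 39, 48, 61, 46, 73, 62, 70, 76, 65, 74, 94]


def median_point(xs, ys):
    """Median point of one group whose points are sorted by x.

    Assumption: the middle data point is used (or the average of the
    two middle data points when the group size is even).
    """
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return (xs[mid], ys[mid])
    return ((xs[mid - 1] + xs[mid]) / 2, (ys[mid - 1] + ys[mid]) / 2)


# Index ranges for the groups of 5, 6 and 5 points.
groups = [(0, 5), (5, 11), (11, 16)]
points = [median_point(x[a:b], y[a:b]) for a, b in groups]
print(points)  # [(16, 26), (32.5, 53.5), (49, 65)]
```

Some presentations of the method instead take the median of the $x$-values and the median of the $y$-values separately within each group, which can give different median points.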
The gradient of the regression line is the gradient of the line joining the lower and upper median points: $\frac{65-26}{49-16}=\frac{39}{33}=\frac{13}{11}$.
The equation of the line joining the lower and upper median points is $\frac{13}{11}=\frac{y-26}{x-16}$. After rearranging, this is $y=\frac{13}{11}x+\frac{78}{11}$.
So, at the central median point, where $x=32.5$, the point on the line has $y$-coordinate given by $y=\frac{13}{11}\times32.5+\frac{78}{11}$. This simplifies to $y=45.5$.
The $y$-coordinate of the central median point is $53.5$, which is $8$ units above the line. So, we move the line vertically upward by one third of this distance, that is, by $\frac{8}{3}$.
The gradient is unchanged and the new $y$-intercept is $\frac{78}{11}+\frac{8}{3}=\frac{322}{33}$, so the regression line must have the equation
$y=\frac{13x}{11}+\frac{322}{33}$
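The arithmetic in this example can be reproduced exactly with Python's `fractions` module. The sketch below is one possible way to organise the steps; the function name `three_median_line` is hypothetical, and it takes the three median points from the example as given.

```python
from fractions import Fraction as F


def three_median_line(lower, middle, upper):
    """Return (gradient, intercept) of the three-median line.

    The line through the lower and upper median points is shifted
    vertically by one third of its signed gap to the middle median point.
    """
    (x1, y1), (x2, y2), (x3, y3) = lower, middle, upper
    gradient = F(y3 - y1) / F(x3 - x1)
    outer_intercept = y1 - gradient * x1          # line through outer points
    gap = y2 - (gradient * x2 + outer_intercept)  # positive if point is above
    return gradient, outer_intercept + gap / 3


# Median points from the example; 32.5 and 53.5 are written as exact fractions.
m, c = three_median_line((16, 26), (F(65, 2), F(107, 2)), (49, 65))
print(m, c)  # 13/11 322/33
```

Working in fractions avoids rounding, so the result can be compared directly with the equation above.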
The data and the regression line are shown in the graph below.
The median points are coloured black. The blue line is the line joining the lower and upper median points. The vertical black line passes through the central median point. The red line is the three-median regression line.
It is apparent in this case that the fit is not very good. We note that the position of the line depends on just three median values, and that at most five data points are needed to find these. (In this case, only four were needed.) As a consequence, most of the information in the data set is ignored and the likelihood of a good fit is reduced accordingly.
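One way to see how well the line fits is to look at the residuals, the vertical distances from each data point to the line. The following Python sketch does this for the data in the example, using the equation $y=\frac{13x}{11}+\frac{322}{33}$ found above. The variable names are illustrative, and the largest residual is only one informal indicator of fit, chosen here for simplicity.

```python
# Data from the table; the x-values run from 10 to 55 in steps of 3.
x = list(range(10, 56, 3))
y = [16, 15, 26, 23, 36, 39, 48, 61, 46, 73, 62, 70, 76, 65, 74, 94]


def predicted(xi):
    """y-value on the three-median line y = 13x/11 + 322/33."""
    return 13 * xi / 11 + 322 / 33


# Residual = observed y minus the y-value on the line.
residuals = [yi - predicted(xi) for xi, yi in zip(x, y)]

# Find the data point that lies furthest from the line.
worst_x, worst_residual = max(zip(x, residuals), key=lambda pair: abs(pair[1]))
print(f"largest residual: {worst_residual:.2f} at x = {worst_x}")

for xi, r in zip(x, residuals):
    print(f"x = {xi:2d}  residual = {r:+.2f}")
```

Several points lie well away from the line, which is consistent with the poor fit seen in the graph.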