# Coding

A Level Edexcel

(Note: This topic is Edexcel only.)

Coding means applying the same operation to every data value to make the data easier to work with. We can add, subtract, multiply or divide by a constant, and as long as we do the same thing to each data value, the mean and standard deviation of the new set can be used to find the mean and standard deviation of the original values.

There are three skills you need to know for coding.

## Skill 1: Coding With Raw Data

To code a data point $x$, we turn it into $y=\dfrac{x-a}{b}$, where $a$ and $b$ are constants of our choosing (with $b\neq 0$). Then we can work with our new data set $y$ to get information about our original $x$.

Recall the notation: the mean of $x$ is written $\bar{x}$ and the variance of $x$ is written $\text{var}(x)$.

To get information about $x$ from our coded set $y$, we have the following formulas:

$\bar{y}=\dfrac{\bar{x}-a}{b}$

$\text{var}(y)=\dfrac{\text{var}(x)}{b^{2}}$
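
Rearranging these two formulas gives the forms that are used when converting back to the original data. As a short sketch (writing $\sigma_{x}$ and $\sigma_{y}$ for the standard deviations of $x$ and $y$, which is notation assumed here for illustration):

$\bar{x}=a+b\bar{y}$

$\text{var}(x)=b^{2}\,\text{var}(y)$

$\sigma_{x}=|b|\,\sigma_{y}$

Note that $a$ shifts the mean but does not appear in the variance, because subtracting the same number from every value does not change how spread out the values are.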

Example: Find the mean and variance of $14010,14030,14040,14060$

If we use the coding $y=\dfrac{x-14000}{10}$ we get $y=1,3,4,6$, which is much easier to work with.

So, find the mean and variance of the $y$ values:

$\bar{y}=\dfrac{1+3+4+6}{4}=3.5$

$\text{var}(y)=\dfrac{1^{2}+3^{2}+4^{2}+6^{2}}{4}-3.5^{2}=\dfrac{1+9+16+36}{4}-12.25=\dfrac{62}{4}-12.25=15.5-12.25=3.25$

Then, find the mean of the original values:

$3.5=\dfrac{\bar{x}-14000}{10}$

$\bar{x}-14000=35$

$\bar{x}=14035$

and the variance of the original values:

$3.25=\dfrac{\text{var}(x)}{10^{2}}$

$\text{var}(x)=325$

So the mean and variance of $14010,14030,14040,14060$ are $14035$ and $325$ respectively.
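
The same calculation can be checked numerically. The following is a minimal Python sketch of this example, not part of the Edexcel method itself; the helper names `code_values`, `mean` and `variance` are made up for illustration.

```python
def code_values(xs, a, b):
    """Apply the coding y = (x - a) / b to every value."""
    return [(x - a) / b for x in xs]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def variance(values):
    """Variance as 'mean of the squares minus the square of the mean'."""
    m = mean(values)
    return sum(v * v for v in values) / len(values) - m * m


xs = [14010, 14030, 14040, 14060]
a, b = 14000, 10

ys = code_values(xs, a, b)   # coded values 1, 3, 4, 6
y_mean = mean(ys)            # mean of the coded values
y_var = variance(ys)         # variance of the coded values

# Undo the coding: x-bar = a + b * y-bar and var(x) = b^2 * var(y)
x_mean = a + b * y_mean
x_var = b ** 2 * y_var
```

Here `x_mean` and `x_var` should agree with the values $14035$ and $325$ found by hand.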

## Skill 2: Coding With Summarised Data

We can also apply coding to simplify summarised data to calculate the mean and variance.

Example: Suppose we have $\sum{(x-100)}=18$ and $\sum{(x-100)^{2}}=45$ with ten data points. Find the mean and variance of $x$.

The obvious coding to try is $y=x-100$

This gives $\sum{y}=18$ and $\sum{y^{2}}=45$

So, the mean and variance of the $y$ values are:

$\bar{y}=\dfrac{18}{10}=1.8$

$\text{var}(y)=\dfrac{45}{10}-1.8^2=4.5-3.24=1.26$

This gives results of:

$\bar{x}= \bar{y} +100 = 1.8 + 100 = 101.8$

and

$\text{var}(x)= \text{var} (y) = 1.26$

(the variance of $x$ is the same as the variance of $y$, since we only subtracted $100$ from each number).
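
As a rough sketch of how this works in general, the function below takes the number of data points and the two coded sums and converts back to $x$. The name `decode_from_sums` and its parameters are hypothetical, chosen only for this illustration, and it assumes the coding has the form $y=\dfrac{x-a}{b}$ (with $b=1$ in this example).

```python
def decode_from_sums(n, sum_y, sum_y2, a, b=1):
    """Return (mean of x, variance of x) from coded sums.

    Assumes the coding y = (x - a) / b, with n data points,
    sum_y = sum of the y values and sum_y2 = sum of the y^2 values.
    """
    y_mean = sum_y / n
    y_var = sum_y2 / n - y_mean ** 2
    # Undo the coding: a shifts the mean only, b scales both
    return a + b * y_mean, b ** 2 * y_var


x_mean, x_var = decode_from_sums(n=10, sum_y=18, sum_y2=45, a=100)
print(round(x_mean, 2), round(x_var, 2))  # 101.8 1.26
```

The results are rounded before printing because decimal values such as $1.8$ are not stored exactly in floating point.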

## Skill 3: Coding With Grouped Data

We can code grouped data by coding the midpoint and calculating all our sums with the coded midpoint, then converting back at the end.

Example: Estimate the mean and variance of the heights of the teachers at a school from the following table.

Step 1: Find the midpoints.

Step 2: Choose a suitable coding for the midpoints. Here we should go for $y=\dfrac{x-1.55}{0.1}$. Now add the coded midpoints to the table.

Step 3: Proceed normally with the coded values, by adding an $fy$ column and an $fy^{2}$ column to the table.

Step 4: Calculate the mean and variance estimates for $y$.

$\bar{y}=\dfrac{87}{50}=1.74$

$\text{var}(y) =\dfrac{191}{50}-1.74^{2}=3.82-3.0276=0.7924$

Step 5: Use the coding formulas to turn these into values for $x$.

$1.74=\dfrac{\bar{x}-1.55}{0.1}$

$\bar{x}-1.55=0.174$

$\bar{x}=1.724$

$0.7924=\dfrac{\text{var}(x)}{0.1^{2}}$

$\text{var}(x)=0.007924$
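
The five steps can be sketched in Python as below. This is only an illustration under stated assumptions: the function name `grouped_estimates` is made up, and the midpoints and frequencies are small hypothetical numbers chosen so the arithmetic is easy to follow. They are not the teacher-height data from this example.

```python
def grouped_estimates(midpoints, freqs, a, b):
    """Estimate (mean, variance) of grouped data using y = (x - a) / b."""
    ys = [(m - a) / b for m in midpoints]                 # Step 2: coded midpoints
    n = sum(freqs)                                        # total frequency
    sum_fy = sum(f * y for f, y in zip(freqs, ys))        # Step 3: fy column total
    sum_fy2 = sum(f * y * y for f, y in zip(freqs, ys))   # Step 3: fy^2 column total
    y_mean = sum_fy / n                                   # Step 4: estimates for y
    y_var = sum_fy2 / n - y_mean ** 2
    return a + b * y_mean, b ** 2 * y_var                 # Step 5: convert back to x


# Hypothetical data for illustration only (NOT the table from the example)
midpoints = [10, 20, 30]
freqs = [1, 2, 1]

x_mean, x_var = grouped_estimates(midpoints, freqs, a=20, b=10)
```

For this made-up data the coded midpoints are $-1, 0, 1$, and the estimates come out as a mean of $20$ and a variance of $50$, which matches working directly with the values $10, 20, 20, 30$.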

## Coding Example Questions

Question 1: By applying a suitable coding, find the mean and standard deviation of the data set

$1000,1010,1030,1040,1070$

[4 marks]

A suitable coding would be $y=\dfrac{x-1000}{10}$, which gives $y=0,1,3,4,7$ as the new data set.

$\bar{y}=\dfrac{0+1+3+4+7}{5}=\dfrac{15}{5}=3$

$\text{var}(y)=\dfrac{0^{2}+1^{2}+3^{2}+4^{2}+7^{2}}{5}-3^{2}=\dfrac{1+9+16+49}{5}-9=\dfrac{75}{5}-9=15-9=6$

$3=\dfrac{\bar{x}-1000}{10}$

$\bar{x}-1000=30$

$\bar{x}=1030$

$6=\dfrac{\text{var}(x)}{10^{2}}$

$6=\dfrac{\text{var}(x)}{100}$

$\text{var}(x)=600$

This is the variance, but the question asks for the standard deviation, so take the square root:

$\sigma=\sqrt{600}=24.5$ ($3$ sf)
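
As an independent check, the answer can be compared with a direct calculation on the uncoded data. The sketch below uses Python's standard `statistics` module. It uses `pvariance` and `pstdev`, which divide by $n$ and so match the variance formula used on this page, whereas `statistics.variance` and `statistics.stdev` divide by $n-1$ and would give different values.

```python
import statistics

data = [1000, 1010, 1030, 1040, 1070]

x_mean = statistics.mean(data)      # mean of the original values
x_var = statistics.pvariance(data)  # variance, dividing by n
x_sd = statistics.pstdev(data)      # standard deviation, dividing by n

print(x_mean, x_var, round(x_sd, 1))
```

The mean and variance should agree with $1030$ and $600$ from the coded working, and the standard deviation rounds to $24.5$.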

Question 2: From the summary statistics below, find the mean and variance of $x$, provided that there are $8$ data points.

$\sum{\dfrac{x-4}{12}}=24$

$\sum{\left(\dfrac{x-4}{12}\right)^{2}}=120$

[2 marks]

The natural coding to use is $y=\dfrac{x-4}{12}$, which gives:

$\sum{y}=24$

$\sum{y^{2}}=120$

So:

$\bar{y}=\dfrac{24}{8}=3$

$\text{var}(y)=\dfrac{120}{8}-3^{2}=15-9=6$

Now convert back to $x$:

$3=\dfrac{\bar{x}-4}{12}$

$\bar{x}-4=36$

$\bar{x}=40$

$6=\dfrac{\text{var}(x)}{12^{2}}$

$6=\dfrac{\text{var}(x)}{144}$

$\text{var}(x)=864$

Question 3: Use a suitable coding to estimate the mean and variance of the grouped data presented below.

[6 marks]

Step 1: Find the midpoints of each class and add them to the table.

Step 2: Choose a suitable coding. In this case, there are a few possibilities, but the one shown here is $y=\dfrac{x-1725}{25}$.

Step 3: Proceed to create the rest of the table, with an $fy$ and an $fy^{2}$ column.

Step 4: Calculate the mean and variance for $y$.

$\bar{y}=\dfrac{445}{100}=4.45$

$\text{var}(y)=\dfrac{3185}{100}-4.45^{2}=31.85-19.8025=12.0475$

Step 5: Convert back into $x$ values.

$4.45=\dfrac{\bar{x}-1725}{25}$

$\bar{x}-1725=111.25$

$\bar{x}=1836.25$

$12.0475=\dfrac{\text{var}(x)}{25^{2}}$

$12.0475=\dfrac{\text{var}(x)}{625}$

$\text{var}(x)=7529.6875$ or $7530$ to $3$ significant figures.