Grouped Data

A LevelAQAEdexcelOCR

Grouped data can be represented in a histogram or a frequency polygon. We can use histograms to estimate the mean, median and standard deviation of a data set.

Make sure you are happy with the following topics before continuing.

Estimation from Histograms

(Note: for guidance on how to draw histograms, see Presenting Data.)

Since a histogram groups data into classes, it may seem impossible to answer questions such as how many data points are greater than $9$, unless $9$ is a class boundary. We can, however, estimate the answer by assuming that the values are evenly distributed across each class. Here is how to do it:

Example: Approximately how many values are greater than $12$ in this histogram?

Draw a line at $12$ on the $x$ axis. This splits the second block. The area of the histogram to the right of the line is our estimate, because the area of each block (height $\times$ width) equals its frequency. The part of the second block to the right of the line extends from $12$ to $20$ with a height of $2$, so it contributes a frequency of $2\times 8=16$. The third block has a width of $10$ and a height of $3$, so it contributes a frequency of $3\times 10=30$. In total, there are approximately $16+30=46$ values greater than $12$.
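The area-based estimate can be sketched in Python as follows. Representing each block as a (lower boundary, upper boundary, frequency density) tuple, and the name `count_above`, are illustrative choices rather than anything from the original notes. The text does not give the lower boundary of the second block, so the value $10$ below is hypothetical; any value below $12$ gives the same answer, and the first block can be left out because it lies entirely to the left of the line.

```python
def count_above(blocks, threshold):
    """Estimate how many values exceed `threshold` in a histogram.

    `blocks` is a list of (lower, upper, frequency_density) tuples
    (a hypothetical input format). Values are assumed to be spread
    evenly within each block, so the estimate is the histogram area
    to the right of `threshold`.
    """
    total = 0.0
    for lower, upper, density in blocks:
        # Only the part of the block to the right of the threshold counts
        start = max(lower, threshold)
        if upper > start:
            total += density * (upper - start)
    return total


# Second and third blocks of the example; the lower boundary 10 is hypothetical
print(count_above([(10, 20, 2), (20, 30, 3)], 12))  # 46.0
```

The same function reproduces Question 3 a) if the blocks $0$ to $5$, $5$ to $20$ and $20$ to $30$ are given frequency densities $32$, $28$ and $12$: `count_above([(0, 5, 32), (5, 20, 28), (20, 30, 12)], 15)` returns `260.0`.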

Frequency Polygon

A frequency polygon is another way to represent grouped data. It is a line graph joining the points with co-ordinates (midpoint of class, frequency).

Example:

The midpoints are $5,14,21,25,33$, so we plot the points:

$(5,9)$

$(14,15)$

$(21,17)$

$(25,9)$

$(33,4)$

and connect them with straight lines.
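Computing the points to plot is a one-line mapping from each class to (midpoint, frequency). The sketch below is an illustration; the (lower, upper, frequency) input format and the function name are assumptions. The example classes are those implied by Question 3 later in this section.

```python
def frequency_polygon_points(classes):
    """Return the (midpoint, frequency) points of a frequency polygon.

    `classes` is a list of (lower, upper, frequency) tuples
    (a hypothetical input format).
    """
    return [((lower + upper) / 2, freq) for lower, upper, freq in classes]


# Classes implied by Question 3: 0 to 5, 5 to 20, 20 to 30
print(frequency_polygon_points([(0, 5, 160), (5, 20, 420), (20, 30, 120)]))
# [(2.5, 160), (12.5, 420), (25.0, 120)]
```

Joining these points in order with straight lines gives the frequency polygon.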

Estimating the Mean and Standard Deviation from a Histogram

Previously, when we used frequency tables to find the mean and standard deviation, we worked with columns $x$, $fx$ and $fx^{2}$. A histogram still gives us the frequency $f$ of each class, but it does not tell us the individual values $x$. Instead, we use the midpoint of each class as a representative value for every data point in that class.

To estimate the mean and standard deviation from a histogram, first turn the histogram into a frequency table (the frequency of each class is its frequency density $\times$ its class width). Then add a column of the class midpoints labelled $x$, create columns for $fx$ and $fx^{2}$, and find the total of each column. Finally, use these totals in the formulas for the mean and standard deviation.

Recall the formulas:

$\text{mean}=\dfrac{\sum{fx}}{\sum{f}}$

$\text{variance}=\dfrac{\sum{fx^{2}}}{\sum{f}}-\text{mean}^{2}$

$\text{standard deviation}=\sqrt{\text{variance}}$
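The whole procedure (histogram to table, midpoints, totals, formulas) can be sketched in Python. The input format, a list of hypothetical (lower boundary, upper boundary, frequency density) tuples, and the function name are assumptions for illustration. The example uses the classes implied by Question 3 below ($0$ to $5$, $5$ to $20$ and $20$ to $30$, with frequency densities $32$, $28$ and $12$).

```python
import math


def estimate_from_histogram(blocks):
    """Estimate mean, variance and standard deviation from a histogram.

    `blocks` is a list of (lower, upper, frequency_density) tuples
    (a hypothetical input format). Every value in a class is treated
    as if it were the class midpoint.
    """
    sum_f = sum_fx = sum_fx2 = 0.0
    for lower, upper, density in blocks:
        f = density * (upper - lower)  # frequency = density x class width
        x = (lower + upper) / 2        # class midpoint
        sum_f += f
        sum_fx += f * x
        sum_fx2 += f * x ** 2
    mean = sum_fx / sum_f
    variance = sum_fx2 / sum_f - mean ** 2  # uses the unrounded mean
    return mean, variance, math.sqrt(variance)


# Classes implied by Question 3: 0 to 5, 5 to 20, 20 to 30
mean, variance, sd = estimate_from_histogram([(0, 5, 32), (5, 20, 28), (20, 30, 12)])
print(round(mean, 2), round(variance, 2), round(sd, 2))  # 12.36 49.63 7.04
```

The variance is computed from the unrounded mean. Squaring a rounded mean such as $12.4$ instead would give about $48.6$, which is why it is best to keep full precision until the final answer.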

Estimating the Median from a Histogram

To estimate the median from a histogram, we use linear interpolation. This assumes that, within each block, the data values are evenly spread across the class.

To find the median, first find $\sum{f}$ and divide it by $2$ to find the position of the median (since this is only an estimate, a non-integer position such as $15.5$ can be used directly). Next, find which block contains this position. Finally, work out how far into that block the position lies.

For example, if the median is the $7$th value within a block that contains $10$ values and has a width of $5$, then add $\dfrac{7\times 5}{10}=3.5$ to the lower class boundary of the block to find the median.
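A minimal Python sketch of this interpolation is below. It follows the convention in the text of taking $\dfrac{\sum{f}}{2}$ as the median position; the (lower, upper, frequency) input format and the function name are hypothetical.

```python
def estimate_median(blocks):
    """Estimate the median of grouped data by linear interpolation.

    `blocks` is a list of (lower, upper, frequency) tuples in increasing
    order (a hypothetical input format). The median position is taken
    as (sum of f) / 2, and values are assumed evenly spread in each block.
    """
    total = sum(f for _, _, f in blocks)
    if total <= 0:
        raise ValueError("no data")
    position = total / 2
    cumulative = 0  # number of values in the blocks already passed
    for lower, upper, f in blocks:
        if f > 0 and cumulative + f >= position:
            # fraction of the way through this block, times its width
            return lower + (position - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("frequencies must be non-negative")


# Classes implied by Question 3: the estimate should be 165/14
print(round(estimate_median([(0, 5, 160), (5, 20, 420), (20, 30, 120)]), 2))  # 11.79
```

With hypothetical classes $0$ to $5$ (frequency $2$), $5$ to $10$ (frequency $10$) and $10$ to $15$ (frequency $6$), the median is the $7$th value of the second block, and the function returns $5+3.5=8.5$, matching the example above.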

Example 1: Estimating the Mean and Standard Deviation from a Histogram

Find the mean and standard deviation of the data in the histogram below.

[6 marks]

Step 1: Create a table of the data from the histogram.

Step 2: Add columns for the midpoint ($x$), $fx$ and $fx^{2}$.

Step 3: Use the formulas to find the mean and standard deviation.

\begin{aligned}\text{mean}&=\dfrac{\sum{fx}}{\sum{f}}\\[1.2em]&=\dfrac{138.5}{31}=4.47\\[1.2em]\text{variance}&=\dfrac{\sum{fx^{2}}}{\sum{f}}-\text{mean}^{2}\\[1.2em]&=\dfrac{759.75}{31}-\left(\dfrac{138.5}{31}\right)^{2}\\[1.2em]&=4.55\\[1.2em]\text{standard deviation}&=\sqrt{\text{variance}}\\[1.2em]&=\sqrt{4.55}\\[1.2em]&=2.13\end{aligned}
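Step 3 only needs the three column totals, so its arithmetic can be checked with a short Python sketch (the function name is hypothetical). It squares the unrounded mean, which is why the variance comes out as $4.55$ rather than the $4.53$ obtained from $4.47^{2}$.

```python
import math


def summary_from_totals(sum_f, sum_fx, sum_fx2):
    """Mean, variance and standard deviation from the column totals."""
    mean = sum_fx / sum_f
    # Use the unrounded mean so the variance is not distorted by rounding
    variance = sum_fx2 / sum_f - mean ** 2
    return mean, variance, math.sqrt(variance)


mean, variance, sd = summary_from_totals(31, 138.5, 759.75)
print(round(mean, 2), round(variance, 2), round(sd, 2))  # 4.47 4.55 2.13
```

The same function applied to the totals in Question 2, `summary_from_totals(18, 162, 2430)`, gives a mean of $9$ and a variance of $54$.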

Example 2: Estimating the Median from a Histogram

Find the median of the data in the histogram from the previous example.

[3 marks]

There are $31$ data points, so the median is the $15.5$th data point (since this is an estimate, we can use the position $15.5$ directly). The first two blocks contain $15$ data points, so the median lies $0.5$ of a data point into the third block. This block contains $8$ data points and has a width of $1$, so the median lies $\dfrac{1\times 0.5}{8}=0.0625$ into the block. The block starts at $5$, so the median is approximately $5.0625$.
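The same interpolation can be written as a single formula. Here $L$ is the lower boundary of the block containing the median, $F$ the number of values in the earlier blocks, $f$ the frequency of that block and $w$ its width (these symbols are introduced for this summary only):

\begin{aligned}\text{median}&\approx L+\dfrac{\frac{1}{2}\sum{f}-F}{f}\times w\\[1.2em]&=5+\dfrac{15.5-15}{8}\times 1\\[1.2em]&=5.0625\end{aligned}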

Grouped Data Example Questions

Question 1: Create a histogram from the following table.

[4 marks]

Gold Standard Education

Question 2: If $\sum{f}=18$, $\sum{fx}=162$ and $\sum{fx^{2}}=2430$, what is the variance?

[2 marks]

\begin{aligned}\text{mean}&=\dfrac{\sum{fx}}{\sum{f}}\\[1.2em]&=\dfrac{162}{18}\\[1.2em]&=9\\[1.2em]\text{variance}&=\dfrac{\sum{fx^{2}}}{\sum{f}}-\text{mean}^{2}\\[1.2em]&=\dfrac{\sum{fx^{2}}}{\sum{f}}-9^{2}\\[1.2em]&=\dfrac{2430}{18}-81\\[1.2em]&=135-81\\[1.2em]&=54\end{aligned}

Question 3: Consider this histogram.

a) Estimate how many values are greater than $15$.

b) Turn the values in the histogram into a frequency table.

c) What is the mean and standard deviation of the data in the histogram?

d) What is the median of the data in the histogram?

[10 marks]

a) A line at $15$ would split the second block. The part of this block to the right of $15$ has a width of $5$ and a height of $28$, giving a frequency of $5\times 28=140$. The third block has a width of $10$ and a height of $12$, giving a frequency of $10\times 12=120$. Overall, the estimated number of values greater than $15$ is $140+120=260$.

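b) The frequency of each block is its width $\times$ its height:

\begin{array}{c|c}\text{Class}&\text{Frequency}\\\hline 0\text{ to }5&160\\5\text{ to }20&420\\20\text{ to }30&120\end{array}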

c) Step 1: Using the frequency table from part b), add columns for the midpoints ($x$), $fx$ and $fx^{2}$, and find the total of each column.
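Assuming the classes are $0$ to $5$, $5$ to $20$ and $20$ to $30$ (the boundaries implied by parts a) and d)), the completed table is:

\begin{array}{c|c|c|c|c}\text{Class}&f&x&fx&fx^{2}\\\hline 0\text{ to }5&160&2.5&400&1000\\5\text{ to }20&420&12.5&5250&65625\\20\text{ to }30&120&25&3000&75000\\\hline\text{Total}&700&&8650&141625\end{array}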

Step 2: Use the formulas to find the mean and standard deviation.

$\text{mean}=\dfrac{\sum{fx}}{\sum{f}}=\dfrac{8650}{700}=12.4$

$\text{variance}=\dfrac{\sum{fx^{2}}}{\sum{f}}-\text{mean}^{2}=\dfrac{141625}{700}-\left(\dfrac{8650}{700}\right)^{2}=49.6$

$\text{standard deviation}=\sqrt{\text{variance}}=\sqrt{49.6}=7.04$

d) The median is the $350$th value ($\dfrac{700}{2}=350$), which falls within the second block. Since the first block contains $160$ values, this is the $190$th value of the second block. The second block has a width of $15$ and a frequency of $420$, so the median lies

$\dfrac{190\times 15}{420}=\dfrac{95}{14}$

into the block. Adding the lower boundary of the second block, $5$, gives $5+\dfrac{95}{14}=\dfrac{165}{14}\approx 11.8$, which is our estimate of the median.
