Grouped Data
Grouped Data Revision
Grouped Data
Grouped data is represented in a histogram or frequency polygon. We can use histograms to estimate the mean, median and standard deviation of data sets.
Make sure you are happy with the following topics before continuing.
Estimation from Histograms
(Note: for guidance on how to draw histograms, see Presenting Data.)
Since histograms collate data, it may seem impossible to answer questions such like how many data points are greater than 9, unless 9 is a class boundary. We can, however, estimate the answers to these questions by assuming frequency is evenly distributed across an entire class. Here is how to do it:
Example: Approximately how many values are greater than 12 in this histogram?
Draw a line at 12 on the x axis. This will split the second block. Then, the area of the graph to the right of the line is our estimate. In this case, the second block now extends from 12 to 20, with a height of 2, so a frequency of 2\times 8=16 comes from the second block. The third block has a length of 10 and a height of 3, so gives 30 frequency. In total, there are 16+30=46 values larger than 12.
Frequency Polygon
A frequency polygon is another way to represent grouped data. It is a line graph joining the points with co-ordinates (midpoint of class, frequency).
Example:
The midpoints are 5,14,21,25,33, so we plot the points:
(5,9)
(14,15)
(21,17)
(25,9)
(33,4)
and connect them with straight lines.
Estimating the Mean and Standard Deviation from a Histogram
Previously, when we used frequency tables to find the mean and standard deviation, we looked at x, fx and fx^{2}. While we clearly still have f, it is not obvious how we should get x. This is where the idea of midpoints comes in again.
To estimate the mean and standard deviation from a histogram, first turn the histogram into a table, then add a column of the midpoints of each class labelled x. Then, create columns fx and fx^{2} and find the totals of all of the columns. Finally, use these totals in the formulas for mean and standard deviation.
Recall: The formulas:
\text{mean}=\dfrac{\sum{fx}}{\sum{f}}
\text{variance}=\dfrac{\sum{fx^{2}}}{\sum{f}}-\text{mean}^{2}
\text{standard deviation}=\sqrt{\text{variance}}
Estimating the Median from a Histogram
To estimate the median from a histogram we use linear interpolation. This is where we assume that within each block, the frequency is evenly spaced.
To find the median, first find \sum{f} and divide it by 2 to find the position of the median (since this is an estimate, if we obtain a decimal we can treat it as if it is a whole number position). Then, find which block the position falls into. Then, within that block, find where it lies.
For example, if the median is the 7th position of a block with 10 values of length 5, then you would add \dfrac{7\times 5}{10}=3.5 to the lower bound of the block to find the median.
Example 1: Estimating the Mean and Standard Deviation from a Histogram
Find the mean and standard deviation of the data in the histogram below.
[6 marks]
Step 1: Create a table of the data from the histogram.
Step 2: Add columns for the midpoint (x), fx and fx^{2}.
Step 3: Use the formulas to find the mean and standard deviation.
\begin{aligned}\text{mean}&=\dfrac{\sum{fx}}{\sum{f}}\\[1.2em]&=\dfrac{138.5}{31}=4.47\\[1.2em]\text{variance}&=\dfrac{\sum{fx^{2}}}{\sum{f}}-\text{mean}^{2}\\[1.2em]&=\dfrac{759.75}{31}-4.47^{2}\\[1.2em]&=4.55\\[1.2em]\text{standard deviation}&=\sqrt{\text{variance}}\\[1.2em]&=\sqrt{4.55}\\[1.2em]&=2.13\end{aligned}
Example 2: Estimating the Median from a Histogram
Find the median of the data in the histogram from the previous example.
[3 marks]
There are 31 data points, so the median is the 15.5th data point. We can treat the decimal like it is a whole number position for our estimate. There are 15 data points in the first two blocks, so this falls 0.5 data points into the third block. Said block contains 8 data points and has a width of 1. So we are \dfrac{1\times 0.5}{8}=0.0625, so we are 0.0625 into the block. The block starts at 5, so the median is 5.0625.
Grouped Data Example Questions
Question 1: Create a histogram from the following table.
[4 marks]
Question 2: If \sum{f}=18, \sum{fx}=162 and \sum{fx^{2}}=2430, what is the variance?
[2 marks]
Question 3: Consider this histogram.
a) Estimate how many values are greater than 15.
b) Turn the values in the histogram into a frequency table.
c) What is the mean and standard deviation of the data in the histogram?
d) What is the median of the data in the histogram?
[10 marks]
a) A line at 15 would split the second block. To the right of 15 in this block is a width of 5 and a height of 28, for a total of 5\times 28=140 frequency. The third block has a width of 10 and a height of 12, for a total of 10\times 12=120 frequency. Overall, the number of values greater than 15 is 140+120=260
b)
c) Step 1: Using the table from the second question, create a table containing totals, midpoints, fx and fx^{2}.
Step 2: Use the formulas to find the mean and standard deviation.
\text{mean}=\dfrac{\sum{fx}}{\sum{f}}=\dfrac{8650}{700}=12.4
\text{variance}=\dfrac{\sum{fx^{2}}}{\sum{f}}-\text{mean}^{2}=\dfrac{141625}{700}-12.4^{2}=49.6
\text{standard deviation}=\sqrt{\text{variance}}=\sqrt{49.6}=7.04
d) The median is the 350th value, which falls within the second block. Since 160 values are in the first block, this is the 190th value of the second block. The second block has a width of 15 and a frequency of 420. So position 190 is
\dfrac{190\times 15}{420}=\dfrac{95}{14}Adding on the original 5 from the width of the first block gives a value of \dfrac{165}{14}, which is our median.