# Data Collection

## Data Collection Revision

**Data Collection**

Data can be separated into two different categories: **primary data** and **secondary data**.

**Primary data** is data that you have collected yourself, e.g. by conducting a **survey** within your class. If you use data that someone else has collected, this is called **secondary data**, e.g. data you have found on a **website**.

When collecting **primary data**, it is important to display it in a form that is easy to **analyse** and **suitable** for the type of data you have.

**Different Types of Data**

**Qualitative Data**

Qualitative data consists of **non-numerical** values, for example the names of students in a classroom or the different items on a food menu.

**Quantitative Data**

As the name suggests, quantitative data **measures quantities using numbers**, for example the **ages** of people at a birthday party or the **weights** of a group of people.

There are two different types of quantitative data, which are:

**Discrete Data** – the values collected can only take exact values. For example, the number of people sitting in a restaurant can only be a **whole number**; it is impossible to have 37.6 people.

**Continuous Data** – the values collected can take any value within a given range. For example, the **heights** of children in a classroom could take any value within a sensible range, e.g. $1.34 \text{ m}$.

**Organising Data**

When collecting data and displaying it in a table, it is usually necessary to group the data into **classes**, which makes it easier to work with and analyse.

It is important that the classes you select **do not overlap**, whilst still covering all the possible values. This is to ensure that no data is accounted for twice or missed out.

For **discrete data**, the classes will have gaps between them, e.g. 0-5 people, 6-10 people. This is because no data value can lie between 5 and 6. However, for **continuous data** there can be no gaps, so the classes should be written using **inequalities**, e.g. $1.3 < h \le 1.4$.
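The difference between the two kinds of class can be sketched in a few lines of Python. This is a minimal illustration, not a standard method: the function names, the classes 0-5 and 6-10, and the height boundaries in metres are all hypothetical. Discrete classes are checked with inclusive whole-number bounds, while continuous classes use the inequality $\text{lower} < h \le \text{upper}$, so every height falls into exactly one class with no gaps.

```python
# A minimal sketch (hypothetical names and boundaries): placing values into classes.
# Discrete classes use whole-number bounds with gaps (0-5, 6-10);
# continuous classes use inequalities of the form lower < h <= upper.

def discrete_class(value, classes):
    """Return the label of the discrete class (lo, hi) that contains a whole number."""
    for lo, hi in classes:
        if lo <= value <= hi:
            return f"{lo}-{hi}"
    return None  # value is not covered by any class


def continuous_class(value, boundaries):
    """Return the label of the class lower < value <= upper, using consecutive boundaries."""
    for lower, upper in zip(boundaries, boundaries[1:]):
        if lower < value <= upper:
            return f"{lower} < h <= {upper}"
    return None  # value lies outside the boundaries


people_classes = [(0, 5), (6, 10)]          # hypothetical classes for numbers of people
height_boundaries = [1.2, 1.3, 1.4, 1.5]    # hypothetical boundaries in metres

print(discrete_class(5, people_classes))          # 0-5
print(discrete_class(6, people_classes))          # 6-10
print(continuous_class(1.34, height_boundaries))  # 1.3 < h <= 1.4
print(continuous_class(1.3, height_boundaries))   # 1.2 < h <= 1.3
```

Note that a height of exactly 1.3 m goes into the lower class, because each inequality includes its upper boundary only.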

**Example:** Cohen wants to conduct a **survey** about the **ages** (in whole years) of people watching a film at the cinema.

Design a **table** he could use to collect and represent this data.

So that the table can be used both to collect the data and to display it, it needs a column for the ‘**Tally**’ and a column for the ‘**Frequency**’.

As shown in the table on the right, **none of the classes overlap**; there are gaps between them because this data is discrete.

To make sure every possible value is covered, it is sometimes necessary to use classes such as ‘**… or less**’, ‘**… or over**’ or ‘**other**’. This allows you to include all possible values with a sensible number of classes.
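A tally and frequency table like Cohen's can be sketched in Python as follows. This is only an illustration: the age classes, the open-ended ‘61 or over’ class and the sample ages are hypothetical, and the tally is written in plain text with `||||/` standing for a complete group of five (four strokes crossed by a fifth).

```python
# A minimal sketch of a tally/frequency table for ages in whole years.
# The classes and the sample ages below are hypothetical.

CLASSES = [(0, 15), (16, 30), (31, 45), (46, 60), (61, None)]  # None means "or over"


def label(lo, hi):
    """Text label for a class, e.g. '16-30' or '61 or over'."""
    return f"{lo} or over" if hi is None else f"{lo}-{hi}"


def tally_marks(n):
    """Write n as tally marks: each '||||/' is a gate of five, then single strokes."""
    groups, rest = divmod(n, 5)
    parts = ["||||/"] * groups
    if rest:
        parts.append("|" * rest)
    return " ".join(parts)


def frequency_table(ages):
    """Count how many ages fall into each class (each age goes into exactly one class)."""
    counts = {label(lo, hi): 0 for lo, hi in CLASSES}
    for age in ages:
        for lo, hi in CLASSES:
            if age >= lo and (hi is None or age <= hi):
                counts[label(lo, hi)] += 1
                break
    return counts


ages = [12, 34, 35, 67, 22, 19, 41, 8, 29, 73, 50, 16]  # hypothetical survey answers
table = frequency_table(ages)

print(f"{'Age':<12}{'Tally':<12}Frequency")
for cls, freq in table.items():
    print(f"{cls:<12}{tally_marks(freq):<12}{freq}")
```

Because the classes leave no whole-number gaps and the last class is open-ended, every possible age is counted exactly once.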

**Questionnaires**

Asking people to complete a **questionnaire** is another method of collecting data.

To make sure your questionnaire can be used as a reliable source of data, it needs to have:

- No **ambiguous** questions
- No **overlapping** answer options
- **Fair** questions, i.e. not questions that are leading or that contain **bias** or **opinion**

**Example:** Create a **questionnaire** to find out how much time some students spend exercising.

To make the data collected easier to use, it is usually best to use **multiple choice questions** whose answer options are split into **classes**. As mentioned previously, it is important that these classes do not **overlap** and that they cover every possible answer.
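The two rules above (no overlaps, no gaps) can be checked mechanically. The sketch below assumes answers are recorded in whole minutes per day and that each answer option is a hypothetical `(low, high)` pair, with `high=None` for an open-ended ‘… or more’ option; it simply counts how many options contain each possible value in a chosen range.

```python
# A minimal sketch: checking that whole-number answer options neither overlap
# nor leave gaps. Options are hypothetical (low, high) pairs in minutes per day;
# high=None stands for an open-ended "... or more" option.

def check_options(options, smallest, largest):
    """Return (overlaps, gaps) for the whole numbers smallest..largest."""
    overlaps, gaps = [], []
    for value in range(smallest, largest + 1):
        # Count how many options would let this answer be ticked
        hits = sum(
            1 for low, high in options
            if value >= low and (high is None or value <= high)
        )
        if hits > 1:
            overlaps.append(value)   # could be recorded in more than one class
        elif hits == 0:
            gaps.append(value)       # could not be recorded at all
    return overlaps, gaps


good = [(0, 30), (31, 60), (61, 90), (91, None)]
bad = [(0, 30), (30, 60), (70, None)]

print(check_options(good, 0, 200))  # ([], [])
print(check_options(bad, 0, 200))   # ([30], [61, 62, 63, 64, 65, 66, 67, 68, 69])
```

In the faulty set, someone who exercises for exactly 30 minutes could tick two boxes, and someone who exercises for 65 minutes has no box to tick.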

Furthermore, it is essential to specify a **time frame** in a questionnaire like this. Otherwise the data collected will not be very useful: one person might answer with how much they exercise per **day**, while another might interpret the question as how much they exercise per **week**.

Here is an example of a good questionnaire for this scenario.

## Data Collection Example Questions

**Question 1:** Luke wants to collect some data about the time taken for some of his classmates to complete a running race.

Decide whether the data he collects will be discrete or continuous data.

Give a reason for your answer.

**[1 mark]**

The data collected will be continuous, as time is a continuous measurement: there are infinitely many possible times in any given range.

**Question 2:** A teacher wants to record the number of marks achieved in a test by their students. The maximum number of marks is 80.

Design a suitable table they could use to collect this data.

**[2 marks]**

As this is discrete data, there should be gaps between the classes, because marks can only be whole numbers. We also know that a student cannot score more than 80 marks on the test, so a class such as ‘… or over’ would be unnecessary.
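One way to build suitable classes for the marks is sketched below. The function name and the class width of 20 are hypothetical choices; the first class starts at 0 so that a score of zero is included, and the last class stops at the maximum of 80. The final line checks that every possible mark from 0 to 80 falls into exactly one class.

```python
# A minimal sketch for Question 2: whole-number classes covering marks 0 to 80.
# The class width is a hypothetical choice; the first class includes 0.

def make_classes(max_mark, width):
    """Return classes (low, high) such as 0-20, 21-40, ..., ending exactly at max_mark."""
    classes = [(0, min(width, max_mark))]
    while classes[-1][1] < max_mark:
        low = classes[-1][1] + 1
        classes.append((low, min(low + width - 1, max_mark)))
    return classes


classes = make_classes(80, 20)
for lo, hi in classes:
    print(f"{lo}-{hi}")

# Every possible mark must lie in exactly one class (no overlaps, no gaps)
print(all(sum(lo <= m <= hi for lo, hi in classes) == 1 for m in range(81)))
```

Each class would then get its own row in the table, with columns for the tally and the frequency.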

Here is an example of a suitable table to use:

**Question 3:** Megan wants to find out how much money some university students spend on coffee.

She creates this questionnaire.

Write down **two** things wrong with this questionnaire.

**[2 marks]**

- The question doesn’t include a time frame for the amount of money spent on coffee.
- The classes overlap, which could cause data to be recorded twice or the same data to be recorded in different classes. For example, if someone spent £3 on coffee, they could tick the first or second option.