# Types of Data

## Types of Data Revision

**Types of Data**

We can **classify** data in a few different ways. First, we can **classify** data by the form it takes, and then **classify** it again by how it was collected.

**Class 1: Qualitative**

**Qualitative/categorical data** is anything that isn’t a number, for example words. We usually obtain **qualitative/categorical data** by conducting a survey. Examples include:

- Football team names
- Favourite takeaways

**Class 2: Quantitative**

**Quantitative** data is numerical. There are two types of quantitative data, continuous and discrete.

**Continuous** data can take any numerical value within a range. Examples of continuous data include:

- Height
- Weight
- Running race times

**Discrete** data can only take certain exact numerical values (typically whole numbers). Examples of discrete data include:

- Number of children
- Shoe size
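
The two questions behind this classification (is the data numerical, and if so, is it restricted to certain exact values?) can be written as a short decision procedure. The Python sketch below is only an illustration: the function name `classify` and its two parameters are made up for this example, and the answers to the two questions still have to come from you.

```python
def classify(is_numerical: bool, only_exact_values: bool = False) -> str:
    """Return the type of data, given the answers to two yes/no questions.

    Both parameters are hypothetical names chosen for this sketch.
    """
    if not is_numerical:
        # Anything that isn't a number (e.g. words) is categorical.
        return "categorical"
    # Numerical data is discrete if it can only take certain exact values,
    # and continuous if it can take any value.
    return "discrete" if only_exact_values else "continuous"


# The examples from the lists above, with the two questions answered by hand.
examples = {
    "Football team names": classify(is_numerical=False),
    "Height": classify(is_numerical=True),
    "Running race times": classify(is_numerical=True),
    "Number of children": classify(is_numerical=True, only_exact_values=True),
    "Shoe size": classify(is_numerical=True, only_exact_values=True),
}

for name, kind in examples.items():
    print(f"{name}: {kind}")
```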

**Primary Data**

**Primary data** is data that you collect first-hand.

**Examples:**

- Surveying members of the public
- Measuring the heights of your classmates

**Advantages**:

- You can ensure that the data is relevant
- You can ensure your data sample is reliable

**Disadvantages:**

- Collecting data can be very time-consuming
- It can be expensive, since you may have to pay for resources (e.g. printing questionnaires) and/or for participation in the survey

**Secondary Data**

**Secondary data** is data that has already been collected by someone else.

**Examples:**

- The results of a questionnaire that have been posted on the internet
- Statistics published in a newspaper

**Advantages**:

- It takes much less time than collecting data yourself
- Secondary data is either free, or at least much cheaper than collecting the data yourself

**Disadvantages**:

- The data available might not be suitable for your purposes
- You don’t know how the data was collected, meaning you can’t be sure that it is representative or fair

**Note:** It’s important to understand that we can combine the two classifications we’ve seen. In other words, data can be continuous and primary / secondary, or categorical and primary / secondary etc.
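
One way to see that the two classifications are independent is to list every possible pairing. The Python sketch below does this; the names `Form` and `Source` are labels invented for this illustration, not standard terminology.

```python
from enum import Enum
from itertools import product


class Form(Enum):
    """What form the data takes (hypothetical label for this sketch)."""
    CATEGORICAL = "categorical"
    DISCRETE = "discrete"
    CONTINUOUS = "continuous"


class Source(Enum):
    """How the data was collected (hypothetical label for this sketch)."""
    PRIMARY = "primary"
    SECONDARY = "secondary"


# Every form can be paired with every source: 3 x 2 = 6 combinations.
combinations = [(form.value, source.value) for form, source in product(Form, Source)]

for form, source in combinations:
    print(f"{form} and {source}")
```

For instance, measuring the heights of your classmates gives data that is continuous and primary, while statistics on favourite takeaways published in a newspaper would be categorical and secondary.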

**Example 1: Types of Data**

Janet wants to learn some information about Jason. She learns three pieces of new information about him. For each one, state whether the data is discrete, continuous, or categorical.

**[3 marks]**

**a)** His hair colour.

His hair colour might be brown, blonde, or ginger, but in any case it’s not a number. Therefore, this is **categorical data**.

**b)** How many siblings he has.

The number of siblings Jason has can only be a whole number. Therefore, this is **discrete data**.

**c)** How fast he can solve a Rubik’s cube.

How fast he can solve the puzzle is given as a time, so the result could be any value. Therefore, it is **continuous data**.

**Note:** Time is continuous, even though your stopwatch may only count to the nearest hundredth of a second, for example.
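
The point about the stopwatch can be illustrated with a small Python sketch. The solve times below are made up for this example; the idea is that the underlying times are all different, and only the recording step rounds them to two decimal places.

```python
# Made-up "true" solve times in seconds (illustrative values only).
true_times = [12.3456789, 12.3449999, 12.3512345]

# A stopwatch that displays hundredths of a second rounds each time.
recorded_times = [round(t, 2) for t in true_times]

print(recorded_times)  # [12.35, 12.34, 12.35]

# Three different true times, but only two different recorded times:
# the rounding comes from the measuring device, not from time itself.
print(len(set(true_times)), len(set(recorded_times)))  # 3 2
```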

**Example 2: Types of Data**

Chidi wants to gather some data on people’s favourite food. He decides to use a survey he found online that was conducted 10 years earlier, where 200 people were asked if their favourite food was Italian, Chinese, Indian, or Thai.

**[3 marks]**

**a)** State which two of the following words describe the data Chidi is using:

**primary, secondary, categorical, discrete, continuous.**

The data is about people’s favourite type of food. Since this is not numerical, it must be **categorical**. Secondly, he is using data that was collected by someone else, so this is **secondary** data.

**b)** State one advantage and one disadvantage of Chidi choosing to use this type of data.

- One advantage of Chidi using this data is that he saves a lot of time compared to collecting it himself
- One disadvantage is that Chidi doesn’t know how the data was collected, so it might not be representative

Additionally, in this case the data is 10 years old and people’s preferences might have changed a lot in that time. Furthermore, the survey he found only gave people 4 choices – it is missing many other options.

## Types of Data Example Questions

**Question 1:** State whether the data for the following is categorical, discrete or continuous:

**a)** The heights of 12 dogs.

**b)** The lengths of 15 snakes.

**c)** The eye colours of students in a class.

**d)** The number of goals scored by members of the school’s football team.

**[4 marks]**

a) Since the heights of dogs can be of any value (including numbers that are not whole numbers), this data is **continuous**.

b) Since the lengths of snakes can be of any value (including numbers that are not whole numbers), this data is **continuous**.

c) Since the data collected will be in the form of words (blue, green, brown etc.), this data is **categorical**.

d) Since goals can only be counted in whole numbers, this data is **discrete**.

**Question 2:** Eleanor is measuring the hair length of everyone in her class. State:

**a)** Whether this data is primary or secondary.

**b)** Whether this data is categorical, discrete, or continuous.

**[2 marks]**

a) Since Eleanor is measuring the data herself, the data is **primary**.

b) She is measuring hair length, which can have any value, including values that are not whole numbers, so the data is **continuous**.

**Question 3:** Tahani says:

“People’s shoe sizes are based on the length of their feet, and since length is continuous, shoe size must also be continuous.” Explain why Tahani is wrong.

**[1 mark]**

Tahani is wrong because, although a shoe size is based on foot length, the length of a person’s foot can take any value, whereas shoe sizes can only take certain values (5, 5 and a half, 6, 6 and a half etc.). Shoe size is therefore discrete.
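
The Python sketch below illustrates how a continuous quantity can turn into a discrete one. It assumes, purely for illustration, that a foot measurement has already been converted to an "exact" size on the shoe-size scale and is then rounded to the nearest half size. The function `nearest_half_size` and the input values are made up; real sizing systems use their own conversion rules.

```python
def nearest_half_size(exact_size: float) -> float:
    """Round a value to the nearest half (hypothetical helper for this sketch)."""
    return round(exact_size * 2) / 2


# Made-up "exact" sizes: these could be any value, so they are continuous.
exact_sizes = [5.1, 5.26, 5.4, 5.74, 6.0]

# After rounding, only whole and half sizes remain, so the result is discrete.
shoe_sizes = [nearest_half_size(size) for size in exact_sizes]

print(shoe_sizes)  # [5.0, 5.5, 5.5, 5.5, 6.0]
```

Five different measurements collapse to just three shoe sizes, which is why shoe size is discrete even though it is based on a continuous length.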

**Question 4:** Michael wants to collect information from families in his town about the number of children they have. He chooses to question people directly to obtain this data.

**a)** State whether his data will be primary or secondary.

**b)** Give two advantages of choosing to use this type of data.

**[3 marks]**

a) Since Michael is collecting the data himself, it is **primary** data.

b) By collecting the data himself, he can ensure that the numbers are all accurately recorded.

A second advantage is that he can make efforts to make sure his sample is representative (he can ask people of different genders, races, ages, etc.). If he was using secondary data, he would have no control over who was being asked.

Note: other correct advantages are acceptable.

**Question 5:** Steve wants to obtain data from his 30 classmates about the performance of the striker, Harry Kane, in a recent match. Half of them are allowed to choose from the following six options:

- “The worst performance I have ever witnessed from any player ever!”
- “He had a nightmare!”
- “A below average performance.”
- “Not his fault the team lost.”
- “I wish he could play like that every week!”
- “No player in the world could have performed better than that!”

The other half of the class are asked to give him a rating out of 10.

**a)** Is the data that Steve obtains from the first half of the class categorical, discrete quantitative or continuous quantitative data?

**b)** Is the data that Steve obtains from the second half of the class qualitative, discrete quantitative or continuous quantitative data?

**c)** State two disadvantages for collecting data qualitatively in this example.

**[4 marks]**

a) Since the data that Steve collects from the first half of the class is worded data, this is **categorical data**.

b) Since the data that Steve collects from the second half of the class is a number, this is quantitative data. Since the data can only take certain values (whole numbers from 1 to 10), the data is **discrete quantitative data**.

c) The first disadvantage of collecting data in this way is that it is harder to analyse. It is much easier to analyse numerical data than worded data.

The second disadvantage is that there are only 6 options for the worded responses, whereas there are ten options for numbered responses from 1 to 10.