Statistical and probability models: Calculate central tendencies and dispersion of data
Unit 3: Measures of dispersion of ungrouped data
Dylan Busa
By the end of this unit you will be able to:
- Calculate the range, inter-quartile range and semi-inter-quartile range.
- Calculate percentiles and quartiles.
What you should know
Before you start this unit, make sure you can:
Calculate the mean, median and mode of a data set. If you need help with this, review Unit 2 in this subject outcome.
Introduction
We saw in the previous unit that a data set is not always nicely and evenly or symmetrically distributed about the mean. Sometimes it is skewed, either positively or negatively. We also saw that we can tell if a data set is skewed by looking at the value of mean – median.
While the value of mean – median might tell us whether a data set is skewed or not, it is not a very good method of working out by how much the data is skewed or how the data is distributed. The measures of central tendency do not fully describe a data set and can often be misleading without some indication of how spread out or variable the data is. For this, we need to look at how the data is actually spread out or dispersed. To do this, we use several measures of dispersion.
But what is dispersion anyway? Let’s investigate one such measure, the range, with an activity.
The range
Activity 3.1: Measure dispersion
Time required: 10 minutes
What you need:
- a pen or pencil
- a piece of paper
What to do:
Consider these two sets of data:
Data set A: [latex]\scriptsize \{8,\ 6,\ 7,\ 11,\ 13,\ 10,\ 11,\ 9,\ 9,\ 10\}[/latex]
Data set B: [latex]\scriptsize \{3,\ 1,\ 8,\ 16,\ 12,\ 1,\ 17,\ 13,\ 21,\ 18,\ 2,\ 7,\ 9,\ 12,\ 1\}[/latex]
- For each data set, calculate the mean.
- What do you notice about the means? Do you think this means that the data sets are the same?
- What are the smallest and biggest values in each data set?
- What is the difference between the biggest and smallest values in each data set?
- Represent each data set on a number line by drawing a dot above a number on the line to represent each element in the set. If two entries have the same value, draw one dot above the other. Also indicate the mean with a vertical line.
- What do you notice about how spread out the data in each set is? How does this correspond to your answer to question 4?
What did you find?
- [latex]\scriptsize {{\bar{x}}_{\text{A}}}=\displaystyle \frac{{94}}{{10}}=9.4[/latex] [latex]\scriptsize {{\bar{x}}_{\text{B}}}=\displaystyle \frac{{141}}{{15}}=9.4[/latex]
- Both means are the same. This does not, however, mean that the data sets are the same. For one, Data set A has ten values, while Data set B has [latex]\scriptsize 15[/latex].
- Data set A: smallest value is [latex]\scriptsize 6[/latex] and biggest value is [latex]\scriptsize 13[/latex].
Data set B: smallest value is [latex]\scriptsize 1[/latex] and biggest value is [latex]\scriptsize 21[/latex].
- Data set A: [latex]\scriptsize 13-6=7[/latex]
Data set B: [latex]\scriptsize 21-1=20[/latex]
- Data set A: (number line dot plot; figure not reproduced here)
Data set B: (number line dot plot; figure not reproduced here)
- The data in set B is far more spread out than in set A. It makes sense then that the difference between the biggest and smallest value in set B was greater than in set A.
The difference that we calculated between the biggest and smallest value in the data set in Activity 3.1 is called the range of the data set, and it is one of the measures of dispersion that we use to describe a data set. We saw that the data in set B was more spread out, or dispersed, than the data in set A, even though their means were the same. Therefore, a measure of dispersion (like the range) helps us to describe a data set more accurately, together with the measures of central tendency we learnt about in Unit 2.
By its nature, the range is very sensitive to outliers.
Take note!
Range
The range of a data set is the difference between the maximum and minimum values in the set.
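The calculations from Activity 3.1 can be checked with a short Python sketch. The helper name `data_range` is made up for this illustration; the data sets are the ones given in the activity.

```python
from statistics import mean

data_a = [8, 6, 7, 11, 13, 10, 11, 9, 9, 10]
data_b = [3, 1, 8, 16, 12, 1, 17, 13, 21, 18, 2, 7, 9, 12, 1]


def data_range(values):
    """Range = maximum value minus minimum value."""
    return max(values) - min(values)


# Both data sets have the same mean but very different ranges.
for name, values in (("A", data_a), ("B", data_b)):
    print(f"Data set {name}: mean = {mean(values)}, range = {data_range(values)}")
```

Because the range uses only the two extreme values, changing a single maximum or minimum changes the result, which is why it is so sensitive to outliers.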
Percentiles
Consider the following situation.
Walter and Thabisang were first year mathematics students who applied for a tutor job at a local community college. One of the criteria for being awarded the job was the applicant’s rank in their university class.
Thabisang was ranked [latex]\scriptsize 30\text{th}[/latex] in her Mathematics class at university while Walter was ranked [latex]\scriptsize 15\text{th}[/latex] in his Mathematics class at another university.
At the moment it would seem that Walter had a higher ranking. However, we have no idea how many other students were in either applicant’s class and, therefore, we cannot really compare their rankings. More information is needed to arrive at an informed conclusion.
How do their ranks compare if we include the fact that there were [latex]\scriptsize 50[/latex] students in Walter’s class and [latex]\scriptsize 150[/latex] students in Thabisang’s class?
If Thabisang was [latex]\scriptsize 30\text{th}[/latex] out of [latex]\scriptsize 150[/latex] students, this means that she was in the top [latex]\scriptsize 20\%[/latex] of students [latex]\scriptsize \left( {\displaystyle \frac{{30}}{{150}}\times 100=20\%} \right)[/latex], and [latex]\scriptsize 80\%[/latex] of the students were ranked below her.
If Walter was [latex]\scriptsize 15\text{th}[/latex] out of [latex]\scriptsize 50[/latex] students, this means that he was in the top [latex]\scriptsize 30\%[/latex] of students [latex]\scriptsize \left( {\displaystyle \frac{{15}}{{50}}\times 100=30\%} \right)[/latex], and [latex]\scriptsize 70\%[/latex] of the students were ranked below him.
So, even though Thabisang’s ranking seemed lower, in the context of her bigger class, she was actually better ranked. She was in the top [latex]\scriptsize 20\%[/latex] of students while Walter was only in the top [latex]\scriptsize 30\%[/latex] of students. Which student would you hire?
Because [latex]\scriptsize 80\%[/latex] of students performed worse than she did, we say that Thabisang was in the [latex]\scriptsize 80\text{th}[/latex] percentile. Walter was in the [latex]\scriptsize 70\text{th}[/latex] percentile because [latex]\scriptsize 70\%[/latex] of students were ranked lower than him.
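The comparison between the two applicants can be written as a small Python sketch. The function name `percent_ranked_below` is hypothetical, and the sketch follows the calculation used above, where rank 1 is the best student in the class.

```python
def percent_ranked_below(rank, class_size):
    """Percentage of the class ranked below a student (rank 1 = best)."""
    percent_in_top_group = 100 * rank / class_size
    return 100 - percent_in_top_group


thabisang = percent_ranked_below(30, 150)  # 30th out of 150 students
walter = percent_ranked_below(15, 50)      # 15th out of 50 students

print(f"Thabisang: {thabisang:.0f}th percentile")
print(f"Walter: {walter:.0f}th percentile")
```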
Percentiles divide sets of data into 100 equal parts. 100% is the basis of measure, hence the name percentile. Percentiles give us an excellent way to see how one value compares to other values in the same dataset.
Can you see that the median is really the same thing as the [latex]\scriptsize 50\text{th}[/latex] percentile? Exactly half ([latex]\scriptsize 50\%[/latex]) of the values lie above it and half lie below it.
Here is another data set: [latex]\scriptsize 14.2,\text{ }13.9,\text{ }19.8,\text{ }10.3,\text{ }13.0,\text{ }11.1[/latex]. If we arrange it in ascending order, we can find the rank of each value and its percentile (see Table 1).
Because there are six values, it is easy to divide the data into six equal groups (one value per group). This means that [latex]\scriptsize 10.3[/latex] is at the zero percentile. There are [latex]\scriptsize 0\%[/latex] of values below it and [latex]\scriptsize 100\%[/latex] of values above it.
[latex]\scriptsize 11.1[/latex] is at the [latex]\scriptsize 20\text{th}[/latex] percentile. There is one value smaller than it (one of the remaining five values or [latex]\scriptsize 20\%[/latex]) and four values greater (four of the remaining five values or [latex]\scriptsize 80\%[/latex]). [latex]\scriptsize 14.2[/latex] is at the [latex]\scriptsize 80\text{th}[/latex] percentile. There is one value greater than it and four values smaller.
The zero percentile is always the smallest value in a data set. The [latex]\scriptsize 100\text{th}[/latex] percentile is always the biggest value in a data set.
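The percentile of each value in this small data set can be found by counting how many of the other values lie below it. The Python sketch below does this for the six values; it assumes that no two values are tied, and the variable names are chosen only for this illustration.

```python
values = [14.2, 13.9, 19.8, 10.3, 13.0, 11.1]
ordered = sorted(values)
n = len(ordered)

table = []
for rank, value in enumerate(ordered, start=1):
    values_below = rank - 1                      # values smaller than this one
    percentile = 100 * values_below / (n - 1)    # share of the other n - 1 values
    table.append((rank, value, percentile))

for rank, value, percentile in table:
    print(f"rank {rank}: {value} is at percentile {percentile:.0f}")
```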
We can determine the rank ([latex]\scriptsize r[/latex]) of a value in a data set of [latex]\scriptsize n[/latex] values at any percentile ([latex]\scriptsize p[/latex]) using the formula [latex]\scriptsize r=\displaystyle \frac{p}{{100}}(n-1)+1[/latex].
In the above case we can work out what value is at the [latex]\scriptsize 60\text{th}[/latex] percentile as follows:
[latex]\scriptsize \begin{align*}r&=\displaystyle \frac{{60}}{{100}}(6-1)+1\\&=4\end{align*}[/latex]
The fourth value ([latex]\scriptsize 13.9[/latex]) is at the [latex]\scriptsize 60\text{th}[/latex] percentile.
Take note!
Percentile
The [latex]\scriptsize p\text{th}[/latex] percentile is the value, [latex]\scriptsize v[/latex], that divides a data set into two parts, such that [latex]\scriptsize p[/latex] percent of the values in the data set are less than [latex]\scriptsize v[/latex], and [latex]\scriptsize 100-p[/latex] percent of the values are greater than [latex]\scriptsize v[/latex]. Percentiles can only lie in the range [latex]\scriptsize 0\le p\le 100[/latex].
The rank of the value at the [latex]\scriptsize p\text{th}[/latex] percentile can be calculated using [latex]\scriptsize r=\displaystyle \frac{p}{{100}}(n-1)+1[/latex].
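As a minimal sketch, the rank formula can be applied in Python to the six-value data set used above. The function name `rank_at_percentile` is an assumption made for this example. The multiplication is done before the division by 100 so that whole-number ranks come out exactly.

```python
values = [14.2, 13.9, 19.8, 10.3, 13.0, 11.1]


def rank_at_percentile(p, n):
    """Rank r = p/100 * (n - 1) + 1 of the p-th percentile among n values."""
    return p * (n - 1) / 100 + 1


ordered = sorted(values)
rank = rank_at_percentile(60, len(ordered))

# Ranks start at 1, but Python list positions start at 0.
value = ordered[int(rank) - 1]
print(f"rank {rank}: the 60th percentile is {value}")
```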
Example 3.1
Determine the quartiles of the following data set.
[latex]\scriptsize \{27,\ 45,\ 11,\ 13,\ 9,\ 15,\ 31,\ 17,\ 16,\ 40,\ 12,\ 16,\ 3,\ 11\}[/latex]
Solution
Quartiles, as the word suggests, are the percentiles that cut the data into four groups, where each group contains the same number of values. Therefore, the quartiles are the [latex]\scriptsize 25\text{th}[/latex], [latex]\scriptsize 50\text{th}[/latex] and [latex]\scriptsize 75\text{th}[/latex] percentiles.
First, we need to sort the dataset into increasing order.
[latex]\scriptsize 3,\ 9,\ 11,\ 11,\ 12,\ 13,\ 15,\ 16,\ 16,\ 17,\ 27,\ 31,\ 40,\ 45[/latex]
Next, we can find the rank of the value at the [latex]\scriptsize 25\text{th}[/latex] percentile. There are [latex]\scriptsize 14[/latex] values in the dataset. Hence [latex]\scriptsize n=14[/latex].
[latex]\scriptsize \begin{align*}r&=\displaystyle \frac{p}{{100}}(n-1)+1\\&=\displaystyle \frac{{25}}{{100}}(14-1)+1\\&=4.25\end{align*}[/latex]
The rank is not a whole number. Therefore, the [latex]\scriptsize 25\text{th}[/latex] percentile lies between the fourth and the fifth value. When this happens, we take the value halfway between these two values, i.e. their mean. So, the [latex]\scriptsize 25\text{th}[/latex] percentile is [latex]\scriptsize \displaystyle \frac{{11+12}}{2}=11.5[/latex].
[latex]\scriptsize 50\text{th}[/latex] percentile:
[latex]\scriptsize \begin{align*}r&=\displaystyle \frac{p}{{100}}(n-1)+1\\&=\displaystyle \frac{{50}}{{100}}(14-1)+1\\&=7.5\end{align*}[/latex]
The [latex]\scriptsize 50\text{th}[/latex] percentile is between the seventh and eighth value: [latex]\scriptsize \displaystyle \frac{{15+16}}{2}=15.5[/latex]
[latex]\scriptsize 75\text{th}[/latex] percentile:
[latex]\scriptsize \begin{align*}r&=\displaystyle \frac{p}{{100}}(n-1)+1\\&=\displaystyle \frac{{75}}{{100}}(14-1)+1\\&=10.75\end{align*}[/latex]
The [latex]\scriptsize 75\text{th}[/latex] percentile is between the [latex]\scriptsize 10\text{th}[/latex] and [latex]\scriptsize 11\text{th}[/latex] value: [latex]\scriptsize \displaystyle \frac{{17+27}}{2}=22[/latex]
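The method of Example 3.1 can be sketched in Python as follows. The function name `value_at_percentile` is hypothetical, and the halfway rule for fractional ranks is the convention used in this unit. For comparison, the sketch also calls Python's standard `statistics.quantiles` with `method="inclusive"`, which should place the quartiles at the same ranks but interpolates linearly between the two neighbouring values instead of taking the halfway point.

```python
import math
from statistics import quantiles

data = [27, 45, 11, 13, 9, 15, 31, 17, 16, 40, 12, 16, 3, 11]


def value_at_percentile(values, p):
    """Value at the p-th percentile, using this unit's halfway rule."""
    ordered = sorted(values)
    r = p * (len(ordered) - 1) / 100 + 1       # rank formula from this unit
    lower = math.floor(r)
    if r == lower:                              # whole-number rank: take that value
        return ordered[lower - 1]
    # Fractional rank: take the mean of the two neighbouring values.
    return (ordered[lower - 1] + ordered[lower]) / 2


unit_quartiles = [value_at_percentile(data, p) for p in (25, 50, 75)]
library_quartiles = quantiles(data, n=4, method="inclusive")

print("Unit method:   ", unit_quartiles)
print("Linear method: ", library_quartiles)
```

For this data the unit's rule gives 11.5, 15.5 and 22, while the linear interpolation gives 11.25, 15.5 and 24.5. Different textbooks and software packages use different conventions, so small differences like these are expected.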
Exercise 3.1
Determine the deciles of the following data set (the deciles are the percentiles that divide a dataset into ten groups, i.e. the [latex]\scriptsize 10\text{th}[/latex], [latex]\scriptsize 20\text{th}[/latex], [latex]\scriptsize 30\text{th}[/latex], … percentiles).
[latex]\scriptsize 29, 32, 35, 46, 51, 54, 60, 68, 68, 72, 76, 78, 83, 86, 89, 91, 92, 98, 101, 109, 114, 117, 118, 123, 126, 130, 135, 139, 144[/latex]
The full solutions are at the end of the unit.
Inter-quartile range
In Example 3.1, we found the quartiles of a dataset – the [latex]\scriptsize 25\text{th}[/latex], [latex]\scriptsize 50\text{th}[/latex] and [latex]\scriptsize 75\text{th}[/latex] percentiles. We often make use of these quartiles to describe a dataset and its characteristics and so give these quartiles special names. We call the [latex]\scriptsize 25\text{th}[/latex] percentile the first or lower quartile ([latex]\scriptsize Q1[/latex]), the median or [latex]\scriptsize 50\text{th}[/latex] percentile the second quartile ([latex]\scriptsize Q2[/latex]), and the [latex]\scriptsize 75\text{th}[/latex] percentile the third or upper quartile ([latex]\scriptsize Q3[/latex]).
We can get a sense of how tightly or widely dispersed a dataset is by calculating not just the range (the difference between the biggest and smallest value or between the zero and [latex]\scriptsize 100\text{th}[/latex] percentiles), but also the difference between [latex]\scriptsize Q1[/latex] and [latex]\scriptsize Q3[/latex] (called the inter-quartile range) and comparing the answers we get.
The inter-quartile range gives us the range of the middle half of the data. We can also halve this range to get a measure called the semi-inter-quartile range.
Example 3.2
A high school has two cricket teams: a junior and a senior team. The junior team consists of [latex]\scriptsize 17[/latex] players (including reserves) and the senior team consists of [latex]\scriptsize 16[/latex] players (including reserves). The mass of each team member is given below. Use the data to answer the questions that follow.
Junior team masses (kg)
[latex]\scriptsize \{56,\ 60,\ 67,\ 45,\ 51,\ 53,\ 64,\ 49,\ 56,\ 48,\ 42,\ 51,\ 64,\ 52,\ 64,\ 49,\ 50\}[/latex]
Senior team masses (kg)
[latex]\scriptsize \{88,\ 81,\ 53,\ 62,\ 83,\ 68,\ 70,\ 62,\ 91,\ 78,\ 64,\ 74,\ 73,\ 54,\ 62,\ 62\}[/latex]
- Calculate the range of both datasets.
- Calculate the inter-quartile range of both datasets.
- Calculate the semi-inter-quartile range of both datasets.
- Which dataset is more dispersed?
Solutions
- Junior team: [latex]\scriptsize 42,\ 45,\ 48,\ 49,\ 49,\ 50,\ 51,\ 51,\ 52,\ 53,\ 56,\ 56,\ 60,\ 64,\ 64,\ 64,\ 67[/latex]
The range is [latex]\scriptsize 67-42=25[/latex]
Senior team: [latex]\scriptsize 53,\ 54,\ 62,\ 62,\ 62,\ 62,\ 64,\ 68,\ 70,\ 73,\ 74,\ 78,\ 81,\ 83,\ 88,\ 91[/latex]
The range is [latex]\scriptsize 91-53=38[/latex]
- Junior team:
[latex]\scriptsize Q1[/latex]:
[latex]\scriptsize \begin{align*}r&=\displaystyle \frac{{25}}{{100}}(17-1)+1\\&=5\end{align*}[/latex]
Therefore [latex]\scriptsize Q1=49[/latex]
[latex]\scriptsize Q3[/latex]:
[latex]\scriptsize \begin{align*}r&=\displaystyle \frac{{75}}{{100}}(17-1)+1\\&=13\end{align*}[/latex]
Therefore, [latex]\scriptsize Q3=60[/latex]
Therefore the inter-quartile range is [latex]\scriptsize Q3-Q1=60-49=11[/latex]
Senior team:
[latex]\scriptsize Q1[/latex]:
[latex]\scriptsize \begin{align*}r&=\displaystyle \frac{{25}}{{100}}(16-1)+1\\&=4.75\end{align*}[/latex]
Therefore [latex]\scriptsize Q1=\displaystyle \frac{{62+62}}{2}=62[/latex]
[latex]\scriptsize Q3[/latex]:
[latex]\scriptsize \begin{align*}r&=\displaystyle \frac{{75}}{{100}}(16-1)+1\\&=12.25\end{align*}[/latex]
Therefore, [latex]\scriptsize Q3=\displaystyle \frac{{78+81}}{2}=79.5[/latex]
Therefore the inter-quartile range is [latex]\scriptsize Q3-Q1=79.5-62=17.5[/latex]
- Junior team: semi-inter-quartile range is [latex]\scriptsize \displaystyle \frac{{11}}{2}=5.5[/latex]
Senior team: semi-inter-quartile range is [latex]\scriptsize \displaystyle \frac{{17.5}}{2}=8.75[/latex]
- By all measures the senior team data is more dispersed. It has both a greater range as well as a greater inter-quartile range. Therefore, not only is there a greater difference between the largest and smallest values but the middle [latex]\scriptsize 50\%[/latex] of the data is also more spread out over a greater range.
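The three measures from Example 3.2 can be computed together in a short Python sketch. The function names `value_at_percentile` and `dispersion_summary` are made up for this illustration, and the quartiles follow this unit's rank formula and halfway rule.

```python
import math

junior = [56, 60, 67, 45, 51, 53, 64, 49, 56, 48, 42, 51, 64, 52, 64, 49, 50]
senior = [88, 81, 53, 62, 83, 68, 70, 62, 91, 78, 64, 74, 73, 54, 62, 62]


def value_at_percentile(values, p):
    """Value at the p-th percentile, using this unit's halfway rule."""
    ordered = sorted(values)
    r = p * (len(ordered) - 1) / 100 + 1
    lower = math.floor(r)
    if r == lower:
        return ordered[lower - 1]
    return (ordered[lower - 1] + ordered[lower]) / 2


def dispersion_summary(values):
    """Range, inter-quartile range and semi-inter-quartile range."""
    q1 = value_at_percentile(values, 25)
    q3 = value_at_percentile(values, 75)
    iqr = q3 - q1
    return {"range": max(values) - min(values), "iqr": iqr, "semi_iqr": iqr / 2}


junior_summary = dispersion_summary(junior)
senior_summary = dispersion_summary(senior)
print("Junior team:", junior_summary)
print("Senior team:", senior_summary)
```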
Exercise 3.2
Class A and Class B both wrote a test out of [latex]\scriptsize 50[/latex] marks. The results of each class are given. Use the data to answer the questions that follow.
Class A: [latex]\scriptsize \{36,\ 39,\ 49,\ 18,\ 36,\ 24,\ 38,\ 36,\ 28,\ 27,\ 30,\ 42,\ 48,\ 45,\ 39,\ 21,\ 29,\ 31,\ 34\}[/latex]
Class B: [latex]\scriptsize \{34,\ 34,\ 19,\ 27,\ 36,\ 26,\ 39,\ 39,\ 29,\ 39,\ 35,\ 46,\ 41,\ 35,\ 29,\ 25,\ 35,\ 37,\ 31,\ 39,\ 42,\ 28\}[/latex]
- Calculate the mean and median for each class.
- Calculate [latex]\scriptsize Q1[/latex] and [latex]\scriptsize Q3[/latex] for each class.
- Calculate the inter-quartile range for each class.
- What would you have had to score to be in the [latex]\scriptsize 80\text{th}[/latex] percentile in each class?
- Which class did better? Explain your answer.
The full solutions are at the end of the unit.
Take note!
Quartiles
- [latex]\scriptsize Q1[/latex]: first or lower quartile ([latex]\scriptsize 25\text{th}[/latex] percentile)
- [latex]\scriptsize Q2[/latex]: second quartile (the median or [latex]\scriptsize 50\text{th}[/latex] percentile)
- [latex]\scriptsize Q3[/latex]: third or upper quartile ([latex]\scriptsize 75\text{th}[/latex] percentile)
- Inter-quartile range: [latex]\scriptsize Q3-Q1[/latex]
- Semi-inter-quartile range: [latex]\scriptsize \displaystyle \frac{{Q3-Q1}}{2}[/latex]
Summary
In this unit you have learnt the following:
- Measures of dispersion tell us how spread out or dispersed the values in a dataset are.
- The range is the difference between the biggest value and the smallest value.
- A percentile is the value below which a given percentage of the values fall. The [latex]\scriptsize 40\text{th}[/latex] percentile is the value in a data set below which [latex]\scriptsize 40\%[/latex] of the values lie and above which [latex]\scriptsize 60\%[/latex] of the values lie.
- The rank of the value at the [latex]\scriptsize p\text{th}[/latex] percentile for a dataset of [latex]\scriptsize n[/latex] values can be calculated using [latex]\scriptsize r=\displaystyle \frac{p}{{100}}(n-1)+1[/latex].
- Quartiles divide a dataset into four equal groups. The quartiles are:
- [latex]\scriptsize Q1[/latex]: first or lower quartile ([latex]\scriptsize 25\text{th}[/latex] percentile)
- [latex]\scriptsize Q2[/latex]: second quartile (the median or [latex]\scriptsize 50\text{th}[/latex] percentile)
- [latex]\scriptsize Q3[/latex]: third or upper quartile ([latex]\scriptsize 75\text{th}[/latex] percentile).
- The inter-quartile range is the difference between [latex]\scriptsize Q3[/latex] and [latex]\scriptsize Q1[/latex], and represents the range in which the middle [latex]\scriptsize 50\%[/latex] of the data lie.
- The semi-inter-quartile range is half of the inter-quartile range.
Unit 3: Assessment
Suggested time to complete: 40 minutes
- A group of [latex]\scriptsize 20[/latex] students count the number of phone calls they have each made in the past month. This is the data they collect:
[latex]\scriptsize \{11,\ 8,\ 17,\ 13,\ 9,\ 12,\ 2,\ 6,\ 15,\ 7,\ 14,\ 15,\ 1,\ 6,\ 6,\ 13,\ 19,\ 9,\ 6,\ 19\}[/latex]
Calculate the range of values in the data set.
- A company wanted to evaluate the training programme in its factory. They gave the same task to trained and untrained employees and timed each one in seconds.
Trained: [latex]\scriptsize \{121,\ 137,\ 131,\ 135,\ 130,\ 128,\ 130,\ 126,\ 132,\ 127,\ 129,\ 120,\ 118,\ 125,\ 134\}[/latex]
Untrained: [latex]\scriptsize \{135,\ 142,\ 126,\ 148,\ 145,\ 156,\ 152,\ 153,\ 149,\ 145,\ 144,\ 134,\ 139,\ 140,\ 142\}[/latex]
- Find the mean of each dataset.
- Find the median for each dataset.
- Find the inter-quartile range for both sets of data.
- Did the training programme work?
- There are 14 men working in a factory. Their ages are: [latex]\scriptsize 22,\text{ }25,\text{ }33,\text{ }35,\text{ }38,\text{ }48,\text{ }55,\text{ }55,\text{ }55,\text{ }55,\text{ }55,\text{ }56,\text{ }59,\text{ }64[/latex]
- If three men had to be retrenched, but the median had to stay the same, show the ages of the three men you would retrench.
- Find the mean age of the men in the factory using the original data.
The full solutions are at the end of the unit.
Unit 3: Solutions
Exercise 3.1
The data has already been ordered.
[latex]\scriptsize \begin{array}{l} 29,\ 32,\ 35,\ 46,\ 51,\ 54,\ 60,\ 68,\ 68,\ 72,\ 76,\ 78,\ 83,\ 86,\ 89,\ 91, \\ 92,\ 98,\ 101,\ 109,\ 114,\ 117,\ 118,\ 123,\ 126,\ 130,\ 135,\ 139,\ 144 \end{array}[/latex]
[latex]\scriptsize {{p}_{0}}[/latex]: [latex]\scriptsize 29[/latex]
[latex]\scriptsize {{p}_{{10}}}[/latex]: [latex]\scriptsize r=\displaystyle \frac{{10}}{{100}}(29-1)+1=3.8[/latex]
We get a rank of [latex]\scriptsize 3.8[/latex] so we need to find the mean of the third and fourth values in the ordered data set. The third value (rank [latex]\scriptsize =3[/latex]) is [latex]\scriptsize 35[/latex] and the fourth value (rank [latex]\scriptsize =4[/latex]) is [latex]\scriptsize 46[/latex].
[latex]\scriptsize \therefore {{p}_{{10}}}=\displaystyle \frac{{35+46}}{2}=40.5[/latex]
[latex]\scriptsize {{p}_{{20}}}[/latex]: [latex]\scriptsize r=\displaystyle \frac{{20}}{{100}}(29-1)+1=6.6[/latex]
We get a rank of [latex]\scriptsize 6.6[/latex] so we need to find the mean of the sixth and seventh values in the ordered data set. The sixth value (rank [latex]\scriptsize =6[/latex]) is [latex]\scriptsize 54[/latex] and the seventh value (rank [latex]\scriptsize =7[/latex]) is [latex]\scriptsize 60[/latex].
[latex]\scriptsize \therefore {{p}_{{20}}}=\displaystyle \frac{{54+60}}{2}=57[/latex]
[latex]\scriptsize {{p}_{{30}}}[/latex]: [latex]\scriptsize r=\displaystyle \frac{{30}}{{100}}(29-1)+1=9.4\text{ }\therefore {{p}_{{30}}}=\displaystyle \frac{{68+72}}{2}=70[/latex]
[latex]\scriptsize {{p}_{{40}}}[/latex]: [latex]\scriptsize r=\displaystyle \frac{{40}}{{100}}(29-1)+1=12.2\text{ }\therefore {{p}_{{40}}}=\displaystyle \frac{{78+83}}{2}=80.5[/latex]
[latex]\scriptsize {{p}_{{50}}}[/latex]: [latex]\scriptsize r=\displaystyle \frac{{50}}{{100}}(29-1)+1=15\text{ }\therefore {{p}_{{50}}}=89[/latex]
[latex]\scriptsize {{p}_{{60}}}[/latex]: [latex]\scriptsize r=\displaystyle \frac{{60}}{{100}}(29-1)+1=17.8\text{ }\therefore {{p}_{{60}}}=\displaystyle \frac{{92+98}}{2}=95[/latex]
[latex]\scriptsize {{p}_{{70}}}[/latex]: [latex]\scriptsize r=\displaystyle \frac{{70}}{{100}}(29-1)+1=20.6\text{ }\therefore {{p}_{{70}}}=\displaystyle \frac{{109+114}}{2}=111.5[/latex]
[latex]\scriptsize {{p}_{{80}}}[/latex]: [latex]\scriptsize r=\displaystyle \frac{{80}}{{100}}(29-1)+1=23.4\text{ }\therefore {{p}_{{80}}}=\displaystyle \frac{{118+123}}{2}=120.5[/latex]
[latex]\scriptsize {{p}_{{90}}}[/latex]: [latex]\scriptsize r=\displaystyle \frac{{90}}{{100}}(29-1)+1=26.2\text{ }\therefore {{p}_{{90}}}=\displaystyle \frac{{130+135}}{2}=132.5[/latex]
[latex]\scriptsize {{p}_{{100}}}[/latex]: [latex]\scriptsize 144[/latex]
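The deciles above can be checked with a Python sketch that loops over the percentiles 0, 10, 20, … 100. It uses the ordered list of 29 values shown in this solution. The function name `value_at_percentile` is hypothetical, and the halfway rule for fractional ranks is this unit's convention.

```python
import math

data = [29, 32, 35, 46, 51, 54, 60, 68, 68, 72, 76, 78, 83, 86, 89, 91,
        92, 98, 101, 109, 114, 117, 118, 123, 126, 130, 135, 139, 144]


def value_at_percentile(values, p):
    """Value at the p-th percentile, using this unit's halfway rule."""
    ordered = sorted(values)
    r = p * (len(ordered) - 1) / 100 + 1
    lower = math.floor(r)
    if r == lower:
        return ordered[lower - 1]
    return (ordered[lower - 1] + ordered[lower]) / 2


deciles = {p: value_at_percentile(data, p) for p in range(0, 101, 10)}
for p, value in deciles.items():
    print(f"p{p}: {value}")
```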
Exercise 3.2
Class A: [latex]\scriptsize \{36,\ 39,\ 49,\ 18,\ 36,\ 24,\ 38,\ 36,\ 28,\ 27,\ 30,\ 42,\ 48,\ 45,\ 39,\ 21,\ 29,\ 31,\ 34\}[/latex]
Class B: [latex]\scriptsize \{34,\ 34,\ 19,\ 27,\ 36,\ 26,\ 39,\ 39,\ 29,\ 39,\ 35,\ 46,\ 41,\ 35,\ 29,\ 25,\ 35,\ 37,\ 31,\ 39,\ 42,\ 28\}[/latex]
- Class A:
Mean: [latex]\scriptsize \bar{x}=\displaystyle \frac{{650}}{{19}}=34.21[/latex]
Median: [latex]\scriptsize 18, 21, 24, 27, 28, 29, 30, 31, 34, \boxed{36}, 36, 36, 38, 39, 39, 42, 45, 48, 49[/latex]
The median mark is 36.
Class B:
Mean: [latex]\scriptsize \bar{x}=\displaystyle \frac{{745}}{{22}}=33.86[/latex]
Median: [latex]\scriptsize 19, 25, 26, 27, 28, 29, 29, 31, 34, 34, \boxed{35, 35}, 35, 36, 37, 39, 39, 39, 39, 41, 42, 46[/latex]
The median is 35.
- Class A:
[latex]\scriptsize Q1[/latex]: [latex]\scriptsize r=\displaystyle \frac{{25}}{{100}}(19-1)+1=5.5[/latex]
Therefore [latex]\scriptsize Q1=\displaystyle \frac{{28+29}}{2}=28.5[/latex]
[latex]\scriptsize Q3[/latex]: [latex]\scriptsize r=\displaystyle \frac{{75}}{{100}}(19-1)+1=14.5[/latex]
Therefore [latex]\scriptsize Q3=\displaystyle \frac{{39+39}}{2}=39[/latex]
Class B:
[latex]\scriptsize Q1[/latex]: [latex]\scriptsize r=\displaystyle \frac{{25}}{{100}}(22-1)+1=6.25[/latex]
Therefore [latex]\scriptsize Q1=\displaystyle \frac{{29+29}}{2}=29[/latex]
[latex]\scriptsize Q3[/latex]: [latex]\scriptsize r=\displaystyle \frac{{75}}{{100}}(22-1)+1=16.75[/latex]
Therefore [latex]\scriptsize Q3=\displaystyle \frac{{39+39}}{2}=39[/latex]
- Class A inter-quartile range (IQR): [latex]\scriptsize Q3-Q1=39-28.5=10.5[/latex]
Class B inter-quartile range (IQR): [latex]\scriptsize Q3-Q1=39-29=10[/latex]
- Class A:
[latex]\scriptsize r=\displaystyle \frac{{80}}{{100}}(19-1)+1=15.4[/latex]
[latex]\scriptsize {{p}_{{80}}}=\displaystyle \frac{{39+42}}{2}=40.5[/latex]
If you scored more than [latex]\scriptsize 40.5[/latex] out of [latex]\scriptsize 50[/latex] you would have been in the [latex]\scriptsize 80\text{th}[/latex] percentile.
Class B:
[latex]\scriptsize r=\displaystyle \frac{{80}}{{100}}(22-1)+1=17.8[/latex]
[latex]\scriptsize {{p}_{{80}}}=\displaystyle \frac{{39+39}}{2}=39[/latex]
If you scored more than [latex]\scriptsize 39[/latex] out of [latex]\scriptsize 50[/latex] you would have been in the [latex]\scriptsize 80\text{th}[/latex] percentile.
- Class A had a higher mean. The two classes’ IQRs were more or less the same. Class A had a higher [latex]\scriptsize 80\text{th}[/latex] percentile. Therefore Class A did marginally better than Class B.
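The figures for the two classes can be cross-checked with the Python sketch below. The function names `value_at_percentile` and `summarise` are assumptions made for this example, the percentiles follow this unit's rank formula and halfway rule, and the means are rounded to two decimal places as in the solution.

```python
import math
from statistics import mean, median

class_a = [36, 39, 49, 18, 36, 24, 38, 36, 28, 27, 30, 42, 48, 45, 39, 21, 29, 31, 34]
class_b = [34, 34, 19, 27, 36, 26, 39, 39, 29, 39, 35, 46, 41, 35, 29, 25, 35, 37,
           31, 39, 42, 28]


def value_at_percentile(values, p):
    """Value at the p-th percentile, using this unit's halfway rule."""
    ordered = sorted(values)
    r = p * (len(ordered) - 1) / 100 + 1
    lower = math.floor(r)
    if r == lower:
        return ordered[lower - 1]
    return (ordered[lower - 1] + ordered[lower]) / 2


def summarise(marks):
    """Central tendency and dispersion measures asked for in the exercise."""
    q1 = value_at_percentile(marks, 25)
    q3 = value_at_percentile(marks, 75)
    return {
        "mean": round(mean(marks), 2),
        "median": median(marks),
        "Q1": q1,
        "Q3": q3,
        "IQR": q3 - q1,
        "p80": value_at_percentile(marks, 80),
    }


summary_a = summarise(class_a)
summary_b = summarise(class_b)
print("Class A:", summary_a)
print("Class B:", summary_b)
```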
Unit 3: Assessment
- Order the data: [latex]\scriptsize 1,\ 2,\ 6,\ 6,\ 6,\ 6,\ 7,\ 8,\ 9,\ 9,\ 11,\ 12,\ 13,\ 13,\ 14,\ 15,\ 15,\ 17,\ 19,\ 19[/latex]
Range: [latex]\scriptsize 19-1=18[/latex]
- Trained:
[latex]\scriptsize \bar{x}=\displaystyle \frac{{1\ 923}}{{15}}=128.2[/latex]
Untrained:
[latex]\scriptsize \bar{x}=\displaystyle \frac{{2\ 150}}{{15}}=143.3[/latex]
- Trained: [latex]\scriptsize 118, 120, 121, 125, 126, 127, 128, \boxed{129}, 130, 130, 131, 132, 134, 135, 137[/latex]
The median is [latex]\scriptsize 129[/latex] seconds
Untrained: [latex]\scriptsize 126, 134, 135, 139, 140, 142, 142, \boxed{144}, 145, 145, 148, 149, 152, 153, 156[/latex]
The median is [latex]\scriptsize 144[/latex] seconds
- Trained:
[latex]\scriptsize Q1[/latex]: [latex]\scriptsize r=\displaystyle \frac{{25}}{{100}}(15-1)+1=4.5[/latex]
Therefore [latex]\scriptsize Q1=\displaystyle \frac{{125+126}}{2}=125.5[/latex]
[latex]\scriptsize Q3[/latex]: [latex]\scriptsize r=\displaystyle \frac{{75}}{{100}}(15-1)+1=11.5[/latex]
Therefore [latex]\scriptsize Q3=\displaystyle \frac{{131+132}}{2}=131.5[/latex]
Trained IQR: [latex]\scriptsize 131.5-125.5=6[/latex]
Untrained:
[latex]\scriptsize Q1[/latex]: [latex]\scriptsize r=\displaystyle \frac{{25}}{{100}}(15-1)+1=4.5[/latex]
Therefore [latex]\scriptsize Q1=\displaystyle \frac{{139+140}}{2}=139.5[/latex]
[latex]\scriptsize Q3[/latex]: [latex]\scriptsize r=\displaystyle \frac{{75}}{{100}}(15-1)+1=11.5[/latex]
Therefore [latex]\scriptsize Q3=\displaystyle \frac{{148+149}}{2}=148.5[/latex]
Untrained IQR: [latex]\scriptsize 148.5-139.5=9[/latex]
- The programme did work. Not only was the mean time of the trained workers less than that of the untrained workers, but the spread of their times was also smaller, meaning that the trained workers performed more consistently.
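The comparison between the trained and untrained employees can be reproduced with a short Python sketch. The function names `value_at_percentile` and `iqr` are made up for this illustration, and the quartiles use this unit's rank formula and halfway rule.

```python
import math
from statistics import mean, median

trained = [121, 137, 131, 135, 130, 128, 130, 126, 132, 127, 129, 120, 118, 125, 134]
untrained = [135, 142, 126, 148, 145, 156, 152, 153, 149, 145, 144, 134, 139, 140, 142]


def value_at_percentile(values, p):
    """Value at the p-th percentile, using this unit's halfway rule."""
    ordered = sorted(values)
    r = p * (len(ordered) - 1) / 100 + 1
    lower = math.floor(r)
    if r == lower:
        return ordered[lower - 1]
    return (ordered[lower - 1] + ordered[lower]) / 2


def iqr(values):
    """Inter-quartile range: Q3 minus Q1."""
    return value_at_percentile(values, 75) - value_at_percentile(values, 25)


for name, times in (("Trained", trained), ("Untrained", untrained)):
    print(f"{name}: mean = {mean(times):.1f} s, "
          f"median = {median(times)} s, IQR = {iqr(times)} s")
```

A lower mean and median show that the trained group was faster, and a smaller IQR shows that their times were less spread out.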
- Data is already ordered: [latex]\scriptsize 22,\text{ }25,\text{ }33,\text{ }35,\text{ }38,\text{ }48,\text{ }55,\text{ }55,\text{ }55,\text{ }55,\text{ }55,\text{ }56,\text{ }59,\text{ }64[/latex]
- The median is currently [latex]\scriptsize 55[/latex]. One way to keep the median at [latex]\scriptsize 55[/latex] is to retrench one person younger than the median, one person older than the median, and a third at the median age. One combination could be the oldest and youngest workers as well as one of the workers aged [latex]\scriptsize 55[/latex]. You might have given a different answer that also keeps the median at [latex]\scriptsize 55[/latex].
[latex]\scriptsize \cancel{{22}},\text{ }25,\text{ }33,\text{ }35,\text{ }38,\text{ }48,\text{ }55,\text{ }55,\text{ }55,\text{ }55,\text{ }\cancel{{55}},\text{ }56,\text{ }59,\text{ }\cancel{{64}}[/latex]
[latex]\scriptsize 25, 33, 35, 38, 48, \boxed{55}, 55, 55, 55, 56, 59[/latex]
- [latex]\scriptsize \bar{x}=\displaystyle \frac{{655}}{{14}}=46.79[/latex]
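The retrenchment answer can be tested in Python by removing the three chosen ages and recalculating the median. As a rough sketch, the code also tries every possible choice of three workers to show that more than one answer keeps the median at 55. The variable names are chosen only for this illustration.

```python
from itertools import combinations
from statistics import median

ages = [22, 25, 33, 35, 38, 48, 55, 55, 55, 55, 55, 56, 59, 64]
original_median = median(ages)

# The combination suggested in the solution: youngest, oldest and one aged 55.
remaining = ages.copy()
for age in (22, 55, 64):
    remaining.remove(age)
print("Median after retrenchment:", median(remaining))

# Try every choice of three workers (by position in the list).
valid_choices = [
    combo
    for combo in combinations(range(len(ages)), 3)
    if median(a for i, a in enumerate(ages) if i not in combo) == original_median
]
print("Choices that keep the median:", len(valid_choices))
```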
Media Attributions
- figure1 © DHET is licensed under a CC BY (Attribution) license
- activity3.1A5a © DHET is licensed under a CC BY (Attribution) license
- activity3.1A5b © DHET is licensed under a CC BY (Attribution) license
- table1 © DHET is licensed under a CC BY (Attribution) license
- takenote © DHET is licensed under a CC BY (Attribution) license