10.07 Pareto charts

Lesson

A Pareto chart is used to identify the most significant factors in a set of categorical data. The chart combines a column graph and a line graph, and has two vertical axes, one for each graph type.

  • The columns represent the frequency for each category, or factor in a process, but could also represent cost, or some other unit of measurement. The columns are arranged in descending order, from tallest on the left to shortest on the right (i.e. most significant to least significant). The value for each column is read from the left vertical axis.
     
  • The line graph or polygon represents the cumulative percentage of the values for each column. Because the columns are in descending order, the line always rises from left to right but never gets steeper. The value of any point on the line graph is read from the right vertical axis.

Below is a Pareto chart showing some of the common reasons for failing a driving test. 

 

We can see from the chart above that the most common reasons for failing a driving test are:

  • Observations at intersections (not checking for traffic carefully in all directions)
  • Use of mirrors (not using the car's mirrors to check the position of other vehicles)

If we draw a horizontal line from the $80\%$ mark on the right vertical axis across to the line graph, and then continue that line straight down to the horizontal axis, the most important factors appear to the left of the line. In this case, the first two factors (represented by the first two points on the line graph) contribute to the majority of driving test failures. One could argue that the third factor, 'inappropriate speed', is also significant, due to its closeness to the $80\%$ mark.
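The cut-off reading described above can also be done numerically. The following is a minimal Python sketch, not part of the original lesson: the function name `vital_few` and the counts are hypothetical, invented purely for illustration, and the rule used (keep the factors whose cumulative percentage is at or below the threshold) is one reasonable interpretation of "left of the line".

```python
from itertools import accumulate


def vital_few(counts, threshold=80.0):
    """Return the factors whose cumulative percentage is at or below the threshold."""
    # Sort from most to least frequent, like the columns of a Pareto chart.
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    total = sum(count for _, count in ordered)
    running = accumulate(count for _, count in ordered)
    return [name for (name, _), cum in zip(ordered, running)
            if 100 * cum / total <= threshold]


# Hypothetical counts (assumption: NOT the driving-test data from the chart).
example = {"Factor A": 45, "Factor B": 30, "Factor C": 15, "Factor D": 6, "Factor E": 4}
print(vital_few(example))  # ['Factor A', 'Factor B']
```

Here the cumulative percentages are 45, 75, 90, 96 and 100, so only the first two factors fall at or below the $80\%$ line. As in the lesson, a factor just past the line may still deserve attention.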

Pareto charts are based on something called the Pareto principle, which says that around $80\%$ of problems in a process tend to come from only $20\%$ of factors. While these percentages are only a guide, they are common enough to be called the $80$/$20$ rule.

Process improvement teams often use Pareto charts to determine which factors in a process are causing the most problems, so they can focus their efforts on those. This is an important part of quality control, often used to improve customer service or reduce the number of defects in a product.

 

 

Worked example

A hotel collected data on customer complaints over the course of a month and organised the data into a frequency table:

| Type of complaint | Number of complaints |
|---|---|
| Reservation wait time | $33$ |
| Room cleanliness | $19$ |
| Room service time | $11$ |
| Staff attitude | $9$ |
| Noise level | $5$ |
| Other | $3$ |

The hotel wants to display this data in a Pareto chart and present it to staff so that the most significant complaints can be addressed.

The first step is to add two additional columns to the frequency table: one for cumulative frequency and the other for cumulative percentage.

The cumulative percentage is found by dividing each cumulative frequency by the total number of complaints, then multiplying by $100$. For example, in the first row, the cumulative percentage is $\frac{33}{80}\times100=41.3\%$ to $1$ decimal place.
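Written generally, with $f_i$ standing for the frequency of the $i$-th category and $n$ for the number of categories (notation introduced here for illustration, not used elsewhere in the lesson), the cumulative percentage for the $k$-th row is:

$$\text{cumulative percentage}_k=\frac{f_1+f_2+\dots+f_k}{f_1+f_2+\dots+f_n}\times100$$

For the second row (room cleanliness), this gives:

$$\frac{33+19}{80}\times100=\frac{52}{80}\times100=65.0\%$$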

| Type of complaint | Number of complaints | Cumulative frequency | Cumulative percentage |
|---|---|---|---|
| Reservation wait time | $33$ | $33$ | $41.3\%$ |
| Room cleanliness | $19$ | $52$ | $65.0\%$ |
| Room service time | $11$ | $63$ | $78.8\%$ |
| Staff attitude | $9$ | $72$ | $90.0\%$ |
| Noise level | $5$ | $77$ | $96.3\%$ |
| Other | $3$ | $80$ | $100.0\%$ |
| TOTAL | $80$ | | |

  • The number of complaints (frequency) column is used to draw the vertical bars, with its corresponding axis on the left.
  • The cumulative percentage column is used to draw the line graph, with its corresponding axis on the right. 
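
The two extra columns can be reproduced with a short script. This is a minimal Python sketch under stated assumptions: the function name `pareto_table` is hypothetical, and the `decimal` module is used so that values such as $41.25$ round half up to $41.3$ as in the table (Python's built-in float rounding would give $41.2$ here).

```python
from decimal import Decimal, ROUND_HALF_UP
from itertools import accumulate


def pareto_table(counts):
    """Return rows of (category, frequency, cumulative frequency, cumulative %)."""
    # Arrange categories from most to least frequent, as a Pareto chart requires.
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    total = sum(freq for _, freq in ordered)
    running = accumulate(freq for _, freq in ordered)
    rows = []
    for (name, freq), cum in zip(ordered, running):
        # Cumulative frequency / total * 100, rounded half up to 1 decimal place.
        pct = (Decimal(100 * cum) / Decimal(total)).quantize(
            Decimal("0.1"), rounding=ROUND_HALF_UP)
        rows.append((name, freq, cum, pct))
    return rows


# Hotel complaint data from the frequency table in the worked example.
complaints = {
    "Reservation wait time": 33,
    "Room cleanliness": 19,
    "Room service time": 11,
    "Staff attitude": 9,
    "Noise level": 5,
    "Other": 3,
}

for name, freq, cum, pct in pareto_table(complaints):
    print(f"{name:<24}{freq:>4}{cum:>5}{str(pct) + '%':>8}")
```

The second column of each row gives the bar heights (left axis) and the last column gives the points of the line graph (right axis).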

 

 

Did you know?

The Pareto chart is named after Vilfredo Pareto (1848-1923), an Italian engineer, economist and political scientist. He came up with the Pareto principle (or $80$/$20$ rule) after observing that $80\%$ of the wealth and land in Italy was owned by $20\%$ of the population. The $80$/$20$ rule turns out to be approximately true in many other situations. For example:

  • $80\%$ of sales come from $20\%$ of clients
  • $80\%$ of complaints come from $20\%$ of customers
  • $80\%$ of defects come from $20\%$ of sources

 

Practice questions

Question 1

The following table was used in a vehicle service centre to determine the main causes of engine overheating.

A Pareto chart is to be constructed from this information.

| Cause | Frequency | Cumulative frequency | Cumulative percentage (%) |
|---|---|---|---|
| Damaged radiator core | $31$ | $31$ | $44$ |
| Faulty fans | $20$ | $51$ | $72$ |
| Faulty thermostat | $8$ | $59$ | $83$ |
| Loose fan belt | $5$ | $64$ | $90$ |
| Damaged radiator fins | $4$ | $68$ | $96$ |
| Coolant leakage | $3$ | $71$ | $100$ |
| Total | $71$ | | |

  1. Which column in the table is used to create the vertical bars in the column graph?

    A. Frequency
    B. Cumulative frequency
    C. Cumulative percentage

  2. Which column in the table is used to create the line graph?

    A. Frequency
    B. Cumulative frequency
    C. Cumulative percentage

Question 2

At Pareto's Burritos, the owners regularly ask their customers if and why they are not happy with their burritos.

They created a chart for last month's feedback.

  1. How many customers in total expressed dissatisfaction last month? You can assume that the bars are in line with the labels on the left-hand $y$-axis, or exactly halfway between two labels.

  2. Using the bar section of the Pareto chart, find the percentage of customer complaints made up by the three most frequent complaints.

    Round your answer to the nearest percentage.

  3. Pareto wants to significantly improve customer satisfaction in the next month. What single change would improve customer satisfaction the most?

    A. Increasing the speed of service.
    B. Lowering the price of burritos.
    C. Adding more guacamole.
    D. Using fresher ingredients.

  4. What percentage of customer complaints would be resolved by reviewing how their chefs make their burritos? You can assume that the bars are in line with the labels on the left-hand $y$-axis, or exactly halfway between two labels.

    Round your answer to the nearest percentage.

Question 3

Bill caught the train and noted what activity each person in his carriage (excluding himself) was doing between the next two stops. The Pareto chart shows the results.

  1. How many other people were in the carriage? You can assume that each bar is either in line with a tick on the left-hand $y$-axis, or exactly halfway between ticks.

  2. Using the bar section of the Pareto chart, find the percentage of people in the carriage (excluding Bill) that make up the three most common activities. You can assume that each bar is either in line with a tick on the left-hand $y$-axis, or exactly halfway between ticks.

    Round your answer to the nearest percentage.

Outcomes

MS11-2

represents information in symbolic, graphical and tabular form

MS11-7

develops and carries out simple statistical processes to answer questions posed
