
1.01 Introduction to the data cycle and Venn diagrams

Venn diagrams

A Venn diagram is a tool used to compare and contrast the properties of two things or groups. It allows us to visualize the overlaps and differences between the two categories.

For example, in the following Venn diagram, we can see that:

A Venn diagram consists of two overlapping circles against a light blue background. The left circle is labeled Fish and contains the attributes: Scales, Gills, and Lay eggs. The right circle is labeled Dolphins and lists Breathe air and Give live birth as distinctive features. The overlapping area shared by both circles includes the characteristics Swim, Have teeth, and Have fins. An additional, unrelated text Two legs is placed outside the circles at the bottom.

Both fish and dolphins:

  • Have fins

  • Have teeth

  • Swim

Dolphins, but not fish:

  • Breathe air

  • Give birth to live young

Neither dolphins nor fish have two legs.

A Venn diagram will usually have two or three categories. Characteristics that are shared go where the circles overlap. Characteristics that none of the categories have go inside the rectangle, but outside all the circles.

A Venn diagram featuring two overlapping circles within a rectangular boundary. The left circle is labeled A and the right circle is labeled B. The overlap represents the intersection of sets A and B. The diagram is outlined in a single, solid line, using a simple black and white color scheme.
Venn diagram with two categories
A Venn diagram features three overlapping circles enclosed within a square boundary. Each circle is labeled with a different letter at the top of its respective circle: A for the top left, B for the top right, and C for the bottom circle.
Venn diagrams with three categories

Instead of listing all of the characteristics or members that belong in each region, we can just list the number of items in each region.

A Venn diagram with two overlapping circles is depicted, each labeled with a specific interest. The left circle, shaded in blue and labeled Likes running, contains the names Agostino, Darla, Foster, Isabella, and Jerome. The right circle, shaded in teal and labeled Plays an instrument, includes the names Hajime, Chauncey, Esme, and Konrad. The overlapping area, representing individuals who both like running and play an instrument, contains the names Bo, Mei, Gareth, and Linda. The diagram uses a clear, contrasting color scheme to distinguish between the different sets.
Venn diagram that sorts students by who likes running and who plays an instrument
A Venn diagram features two overlapping circles within a rectangular boundary. The circle on the left, shaded blue, is labeled Likes running and contains the number 5. The circle on the right, shaded teal, is labeled Plays an instrument and includes the number 4. The overlapping area, representing individuals who both like running and play an instrument, contains the number 3. Outside the circles, in the bottom right corner of the rectangle, is the number 1, indicating individuals not included in either set.
A Venn diagram that shows the number of students that fit in each of the four regions

Exploration

With a partner or in a group of three, draw a Venn diagram with two or three circles and label each circle with one of your names.

  1. Sort each of the characteristics into the correct region of your Venn diagram:

    • Is taking Geometry class

    • Ate breakfast

    • Is a vegetarian

    • Likes walking

    • Plays video games

    • Traveled out of state last summer

  2. Come up with six more characteristics to sort and ensure that there is at least one characteristic in each region of the Venn diagram.

  3. Rewrite the Venn diagram with the number of characteristics that are in each region instead of listing them.

Examples

Example 1

A marine biologist creates this diagram of her favorite fish:

A Venn diagram with two overlapping circles within a rectangular boundary is depicted. The left circle, shaded in blue, is labeled Freshwater fish and includes the names Goldfish, Largemouth Bass, and Betta Fish. The right circle, shaded in teal, is labeled Saltwater fish and lists Blue Tang, Clownfish, and Seahorse. The overlapping area, indicating species that can inhabit both freshwater and saltwater, contains the names Barramundi and Salmon.
a

How many of the fish can live in fresh water?

Worked Solution
Create a strategy

This will include any of the fish in the freshwater fish circle.

Apply the idea

There are 5 fish that can live in fresh water.

Reflect and check

Some of these fish can also live in salt water and some of them cannot.

b

How many of the fish are both freshwater and saltwater?

Worked Solution
Create a strategy

These fish will be in the overlap of the two circles.

Apply the idea

There are 2 fish that can live in fresh and salt water: Salmon and Barramundi.

c

How many of the fish are freshwater or saltwater, but not both?

Worked Solution
Create a strategy

There are no fish outside of the two circles, so this will be the total number of fish, minus those that are both.

Apply the idea

There are 6 fish that are either freshwater or saltwater, but not both: Goldfish, Betta Fish, Largemouth Bass, Clownfish, Blue Tang, and Seahorse.

Reflect and check

This Venn diagram sorted 8 types of fish, but there are many more types of fish that could be sorted using this Venn diagram.
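If we wanted to check these counts with technology, one option is to treat each circle as a set. The sketch below is only an illustration: the fish names are copied from the Venn diagram in this example, while the variable names (`freshwater`, `part_a`, and so on) are our own choices.

```python
# Fish names copied from the Venn diagram in Example 1.
freshwater_only = {"Goldfish", "Largemouth Bass", "Betta Fish"}
saltwater_only = {"Blue Tang", "Clownfish", "Seahorse"}
both = {"Barramundi", "Salmon"}

# Each circle is its "only" region together with the overlap.
freshwater = freshwater_only | both
saltwater = saltwater_only | both

part_a = len(freshwater)              # fish inside the freshwater circle
part_b = len(freshwater & saltwater)  # fish in the overlap of the circles
part_c = len(freshwater ^ saltwater)  # fish in exactly one of the circles

print(part_a, part_b, part_c)  # prints: 5 2 6
```

Here `|` joins two sets, `&` keeps only the members in both, and `^` keeps the members that are in one set but not the other, which matches the three regions inside the circles.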

Example 2

An ecologist listed some features of rivers and lakes. Organize the given characteristics in a Venn diagram.

| River | Lake |
| --- | --- |
| Flows across a slope | Usually situated in a basin |
| Often supports navigation | Can regulate local climate |
| Can be used for recreation and human activity | Can be used for recreation and human activity |
| Can have rapids or waterfalls | Often a habitat for diverse species |
| Support ecosystems with aquatic life | Support ecosystems with aquatic life |
| Has a defined riverbed | Contains still or standing water |
| Subject to pollution and environmental concerns | Subject to pollution and environmental concerns |

Worked Solution
Create a strategy

The characteristics should be organized in a Venn diagram with one circle representing rivers and the other representing lakes.

The overlapping section should list the shared characteristics, while the non-overlapping sections should list the unique characteristics.

Apply the idea

We can go through the list of characteristics for rivers first and check whether each one should go in the overlap or not. For ones that are in the overlap, we should cross them off from the lakes list so we don't double up.

A Venn diagram illustrates the characteristics of rivers and lakes. The left circle, shaded in blue, is labeled Rivers and lists attributes including Flows across a slope, Often supports navigation, Can have rapids or waterfalls, and Has a defined riverbed. The right circle, shaded in teal, is labeled Lakes and details features such as Usually situated in a basin, Can regulate local climate, Often a habitat for diverse species, and Contains still or standing water. The overlapping section in the middle, indicating shared characteristics, contains Support ecosystems with aquatic life, Subject to pollution and environmental concerns, and Can be used for recreation and human activity. The diagram uses contrasting colors to clearly differentiate between the two bodies of water while highlighting their similarities and differences.
Reflect and check

This Venn diagram could be helpful for someone trying to decide whether to vacation near a lake or river if they are looking to do calm canoeing and look for lots of wildlife.
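The sorting strategy used in this example (check each river characteristic against the lake list, and put matches in the overlap) could also be carried out with technology. The following is a minimal sketch, assuming the two lists are typed in exactly as given; the function name `sort_into_regions` is hypothetical.

```python
# Characteristics copied from the ecologist's two lists in Example 2.
river = [
    "Flows across a slope",
    "Often supports navigation",
    "Can be used for recreation and human activity",
    "Can have rapids or waterfalls",
    "Support ecosystems with aquatic life",
    "Has a defined riverbed",
    "Subject to pollution and environmental concerns",
]
lake = [
    "Usually situated in a basin",
    "Can regulate local climate",
    "Can be used for recreation and human activity",
    "Often a habitat for diverse species",
    "Support ecosystems with aquatic life",
    "Contains still or standing water",
    "Subject to pollution and environmental concerns",
]


def sort_into_regions(left, right):
    """Return (left only, shared, right only), keeping the original order."""
    shared = [item for item in left if item in right]
    left_only = [item for item in left if item not in right]
    right_only = [item for item in right if item not in left]
    return left_only, shared, right_only


river_only, shared, lake_only = sort_into_regions(river, lake)

print(len(river_only), len(shared), len(lake_only))  # prints: 4 3 4
```

Each shared characteristic appears only once in `shared`, which is the same as crossing it off the lakes list so we don't double up.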

Idea summary

Venn diagrams can be used to organize information in two main ways:

  1. To sort members of a population or sample into categories. They are helpful to compare the number of members that fit both categories, just one category, or neither.

  2. To sort the characteristics of two people or things. They are useful for comparing and contrasting which traits are shared and which are unique.

A Venn diagram featuring two overlapping circles within a rectangular boundary. The left circle is labeled A and the right circle is labeled B. The overlap represents the intersection of sets A and B. The diagram is outlined in a single, solid line, using a simple black and white color scheme.

The area where the circles overlap shows shared characteristics or those who fit in both categories.

The data cycle with Venn diagrams

The data cycle is the process where we formulate questions, then collect data, create data displays, and analyze and explain the results.

A data cycle with four stages. At the top, there is Formulate questions represented by a speech bubble with a question mark. To the right, Collect or acquire data is shown with an icon of a person and a magnifying glass. At the bottom, Organize and represent data is illustrated with a dot plot. To the left, Analyze and communicate results is indicated by a person with charts. Clockwise arrows are drawn from one stage to the next.

Let's first look at the types of questions we can ask where a Venn diagram would be helpful to organize the data.

Multiple-selection survey

A survey question where the respondent can pick none, one, or multiple options that are true for them

Example:

Which sport(s) do you like: Football, Cross-country, Ballet?

Since each circle represents a category, we are looking for categorical data. The number of categories will determine the number of circles, usually two or three.

A statistical question could be "Which book series are popular among students in my school?"

This could then lead to a multiple-selection survey question:

"Which book series have you read at least one book from? Select all that apply:"

  • Harry Potter

  • Lord of the Rings

  • Magic Tree House

The data could be collected and organized in a table like this, where an X means yes and a blank means no.

Harry PotterLord of the RingsMagic Tree House
AmirXX
BilalXXX
Chloe
Denel XX
EmilX
A Venn diagram with three overlapping circles within a rectangular boundary illustrates the reading interests among a group. Each circle represents a different book series. The top left circle, shaded in blue, is labeled Has read Harry Potter. The top right circle, shaded in teal, is labeled Has read Lord of the Rings. The bottom circle, shaded in pink, is labeled Has read Magic Tree House. The overlapping areas of the circles indicate individuals who have read combinations of these series, with each intersection demonstrating the overlap between the respective reading experiences. The diagram uses a clear color-coded scheme to differentiate between the individual and shared readership of these popular book series.

This Venn diagram can then be used to organize the data about the book series students have read.

Notice that:

  • A student can fit into none, one, two, or all three categories.

  • Previously, we used single-selection surveys, but now we need to allow multiple selections.

  • If we wanted to have four or more possible responses, we couldn't draw a Venn diagram with circles that shows every possible combination, although Venn diagrams can still be used in some cases.

To draw further conclusions, or in a second iteration of the data cycle, it can be helpful to organize data from a Venn diagram into a table or vice versa. We can convert between a table and a Venn diagram with two circles by matching up their parts.

A Venn diagram with 2 overlapping circles called left-handed and entered. Ask your teacher for more information.

This Venn diagram shows data for students who were asked:

  1. Are you left-handed?

  2. Did you enter the math contest?

A two-way table with columns left and right and rows entered and didn't enter. Ask your teacher for more information.

By converting the Venn diagram into a table, we can more clearly see what each of the four regions represents.

Once the display is created, we can analyze the Venn diagram or table to make conclusions about the proportion of the sample in the different regions or formulate further questions involving probabilities.

For example, once our data is organized, we could ask questions like: "If a student is right-handed, what is the probability they entered the math contest?"

Probability

The likelihood of an event occurring

\text{Probability of an event}=\dfrac{\text{Number of favorable outcomes}}{\text{Number of possible outcomes}}

Instead of comparing the proportions in the different regions, we can also compare and contrast the properties or characteristics using Venn diagrams with questions like:

  • Compare and contrast the properties of the following parallelograms.
    • Rectangle

    • Square

    • Parallelogram

  • Are squirrels and chipmunks or squirrels and rabbits more similar?

  • Compare and contrast cultural practices in the US and Canada.

Examples

Example 3

Consider this Venn diagram:

A Venn diagram features two overlapping circles within a rectangular boundary, each labeled with a type of technology ownership. The left circle, shaded blue, is labeled Owns a laptop and contains the number 30, representing individuals who own only a laptop. The right circle, shaded in teal, is labeled Owns a smartphone and includes the number 40, representing individuals who own only a smartphone. The overlapping area, indicating individuals who own both a laptop and a smartphone, contains the number 50. Outside of the circles at the bottom right corner of the rectangle, the number 10 indicates individuals who own neither a laptop nor a smartphone. The diagram is clearly organized with numerical data to represent the different groups of technology owners.
a

Write two statistical questions that could be summarized with the given Venn diagram.

Worked Solution
Create a strategy

There are two categories, "Owns a laptop" and "Owns a smartphone". The formulated questions should explore the relationship between these two categories.

Apply the idea

Two possible statistical questions are:

  1. Are laptops or smartphones more popular among 10th graders?

  2. What proportion of students don't have access to technology at home?

Reflect and check

Since the data was already collected, we could draw some conclusions to help answer these questions.

b

Write a statistical question that involves probability that could be summarized with the given Venn diagram.

Worked Solution
Create a strategy

A probability question would involve comparing a specific subgroup to the whole population.

Apply the idea

A possible statistical question could be:

"What is the probability or likelihood that a student owns both a smartphone and a laptop?"

Reflect and check

This is different from the survey questions we would ask the sample like: "Do you own a laptop?" and "Do you own a smartphone?"
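To see how the Venn diagram in this example could help answer these statistical questions, we can work with the four region counts directly. This is a rough sketch rather than a required method: the counts are read from the diagram, and the variable names are our own.

```python
from fractions import Fraction

# Region counts read from the Venn diagram in Example 3.
laptop_only = 30
smartphone_only = 40
both = 50
neither = 10

# Everyone surveyed falls in exactly one of the four regions.
total = laptop_only + smartphone_only + both + neither

# Each circle is its "only" region plus the overlap.
owns_laptop = laptop_only + both
owns_smartphone = smartphone_only + both

# Probability = favorable outcomes / possible outcomes.
p_both = Fraction(both, total)
p_neither = Fraction(neither, total)

print(total, owns_laptop, owns_smartphone)  # prints: 130 80 90
print(p_both, p_neither)                    # prints: 5/13 1/13
```

`Fraction` keeps the probabilities as exact fractions in simplest form, so 50 out of 130 is reported as 5/13.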

Example 4

Alaia wants to know more about how ice hockey and field hockey are played.

a

Formulate a statistical question that could be used to start the data cycle for Alaia and would require a Venn diagram to analyze.

Worked Solution
Create a strategy

In this case, we will be using a Venn diagram to compare and contrast features, not to collect data from a population.

Apply the idea

"Compare and contrast the features and rules of ice hockey and field hockey."

Reflect and check

If Alaia wanted to know more about the popularity of ice hockey compared to field hockey, she could have asked a question like "In Virginia, what proportion of the population has been to ice hockey games compared to field hockey games?"

b

Collect and organize data for Alaia using a Venn diagram.

Worked Solution
Create a strategy

Alaia can focus on one particular aspect of the sports such as equipment, rules, safety precautions, or league setup.

She could then do another iteration of the data cycle to look at another aspect.

Apply the idea

She can start by looking at the equipment for each sport. Here is a list she could have collected.

| Ice Hockey equipment | Field Hockey equipment |
| --- | --- |
| Helmet with face cage | Throat protector |
| Ice skates | Cleats |
| Gloves | Gloves |
| Puck | Ball |
| Goalie pads | Leg guards |
| Mouthguard | Mouthguard |
| Shin pads | Shin pads |
| Hockey stick with a curved blade | Field hockey stick (straight stick) |
| Elbow pads | Elbow pads |

She can then organize it in a Venn diagram.

A Venn diagram with two overlapping circles is used to compare equipment used in ice hockey and field hockey. The left circle, shaded blue and labeled Ice hockey equipment, includes items specific to ice hockey: Helmet with face cage, Ice skates, Puck, Goalie pads, and Hockey stick with curved blade. The right circle, shaded teal and labeled Field hockey equipment, lists items specific to field hockey: Throat protector, Cleats, Ball, Leg guard, and Field hockey stick. The overlapping area in the center, representing equipment common to both sports, contains Gloves, Mouthguard, Elbow pads, and Shin pads. This diagram efficiently categorizes and highlights the unique and shared equipment between the two sports.
Reflect and check

From this, she might note that most of the equipment the two sports share is protective padding. This could lead her to do another iteration of the data cycle, looking at the rates or types of injuries in the two sports.

Example 5

A group of tourists in Japan were asked whether they spoke Filipino or Spanish. Their results were recorded in this table:

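| | Can speak Filipino | Can speak Spanish |
| --- | --- | --- |
| Abby | Yes | No |
| Carlo | No | No |
| Thomas | Yes | No |
| Dean | No | Yes |
| Kevin | No | Yes |
| Pam | Yes | Yes |
| Jenny | Yes | Yes |
| Rose | Yes | No |
| Hiro | Yes | No |
| Keia | Yes | Yes |
| Aurora | Yes | No |
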
a

Complete this table to summarize the survey results.

| | Spanish | Not Spanish |
| --- | --- | --- |
| Filipino | | |
| Not Filipino | | |

Worked Solution
Create a strategy

We can go through the data one person at a time and use a tally to record which of the four regions each person fits in.

We could also go through and count how many people have Yes and Yes, then how many have Yes and No, then No and Yes, then No and No.

Apply the idea

We can start with a tally:

| | Spanish | Not Spanish |
| --- | --- | --- |
| Filipino | \vert\vert\vert | \cancel{\vert\vert\vert\vert} |
| Not Filipino | \vert\vert | \vert |

We can then convert to numbers:

| | Spanish | Not Spanish |
| --- | --- | --- |
| Filipino | 3 | 5 |
| Not Filipino | 2 | 1 |

Reflect and check

Review the tally counts to ensure that each response has been accurately recorded in the table. Double-check the numbers for those who speak both languages to ensure they are not double-counted in the single-language categories.

If there was a large amount of data, we could use technology and write spreadsheet formulas to do the calculations.
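As one illustration of using technology for this count, here is a short sketch that tallies the four regions. The responses are copied from the survey table in this example, with `True` standing for Yes and `False` for No; the variable names are our own, and a spreadsheet would work just as well.

```python
from collections import Counter

# (name, can speak Filipino, can speak Spanish), copied from the survey table.
responses = [
    ("Abby", True, False),
    ("Carlo", False, False),
    ("Thomas", True, False),
    ("Dean", False, True),
    ("Kevin", False, True),
    ("Pam", True, True),
    ("Jenny", True, True),
    ("Rose", True, False),
    ("Hiro", True, False),
    ("Keia", True, True),
    ("Aurora", True, False),
]

# Tally each person by their pair of answers, ignoring the name.
tally = Counter((filipino, spanish) for _, filipino, spanish in responses)

both = tally[(True, True)]
filipino_only = tally[(True, False)]
spanish_only = tally[(False, True)]
neither = tally[(False, False)]

print(both, filipino_only, spanish_only, neither)  # prints: 3 5 2 1
```

Because each person has exactly one pair of answers, nobody is counted in more than one region, and the four counts add up to the 11 people in the table.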

b

Create a Venn diagram to summarize the data.

Worked Solution
Create a strategy

The four regions in the table correspond to the four regions in the Venn diagram.

Apply the idea

We can match from the table the number of people who speak both languages (3), only Filipino (5), only Spanish (2), and neither language (1).

A Venn diagram depicts the language abilities of a group of individuals, differentiated by their ability to speak Filipino and Spanish. The diagram contains two overlapping circles within a rectangular boundary. The left circle, shaded blue, is labeled Can speak Filipino and includes the number 5, indicating individuals who only speak Filipino. The right circle, shaded teal, is labeled Can speak Spanish and contains the number 2, indicating individuals who only speak Spanish. The overlapping section of the circles, colored green, contains the number 3, representing individuals who can speak both Filipino and Spanish. In the bottom right corner outside the circles, the number 1 represents individuals who speak neither Filipino nor Spanish.
Reflect and check

We can double check we didn't miss anyone by adding up all the numbers in the Venn diagram and counting the total number of people who answered the survey. Both should be 11.

c

If a random tourist was selected from the sample, what is the probability that they speak neither language?

Worked Solution
Create a strategy

Determine the total number of tourists surveyed to establish the denominator for the probability calculation.

Use the formula for probability:

\text{Probability of an event}=\dfrac{\text{Number of favorable outcomes}}{\text{Number of possible outcomes}}

Apply the idea

There is one tourist (Carlo) who speaks neither Filipino nor Spanish out of a total of 11 tourists surveyed.

P\text{(Speaks Neither)}=\dfrac{1}{11}

Reflect and check

This was a very small sample, so it may not be representative of the population. We also cannot extend any conclusions to tourists in general as these were all tourists in Japan.

d

If a random tourist who spoke Filipino was selected from the sample, what is the probability that they do not speak Spanish?

Worked Solution
Create a strategy

Use the formula for probability:

\text{Probability of an event}=\dfrac{\text{Number of favorable outcomes}}{\text{Number of possible outcomes}}

where the favorable outcomes are tourists who speak only Filipino, and the total outcomes are all Filipino-speaking tourists.

Apply the idea

Given the survey data:

  • Tourists who can speak Filipino: 8 in total (Abby, Thomas, Pam, Jenny, Rose, Hiro, Keia, and Aurora)

  • Tourists who can speak Filipino, but not Spanish: 5 in total (Abby, Thomas, Rose, Hiro, and Aurora).

Therefore, tourists who speak only Filipino (not both languages) are 5 out of the 8 who can speak Filipino.

P\text{(Speaks Filipino, but not Spanish)}=\dfrac{5}{8}

Reflect and check

This means that \dfrac{3}{8} of the tourists who speak Filipino also speak Spanish.
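The probabilities in parts (c) and (d) can be checked with a few lines of code. This is a minimal sketch that assumes the four region counts from the Venn diagram in part (b); the variable names are our own.

```python
from fractions import Fraction

# Region counts from the Venn diagram in part (b).
both = 3
filipino_only = 5
spanish_only = 2
neither = 1

# Part (c): the possible outcomes are all tourists in the sample.
total = both + filipino_only + spanish_only + neither
p_neither = Fraction(neither, total)

# Part (d): the possible outcomes are only the Filipino-speaking tourists.
filipino_speakers = filipino_only + both
p_not_spanish = Fraction(filipino_only, filipino_speakers)
p_also_spanish = Fraction(both, filipino_speakers)

print(p_neither, p_not_spanish, p_also_spanish)  # prints: 1/11 5/8 3/8
```

Notice that the denominator changes between the two parts: part (c) divides by everyone surveyed, while part (d) divides only by the group we were told the tourist belongs to.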

Idea summary

Venn diagrams are helpful to organize data when we ask questions that involve multiple-selection surveys or comparisons.

The data cycle requires us to:

  1. Formulate a statistical question about a specific population

  2. Collect data using a sample survey, an experiment, or secondary data

  3. Organize the data into displays like Venn diagrams and tables

  4. Analyze data to draw conclusions to answer the original question

  5. Possibly repeat the cycle for a new question that came up during the process

Outcomes

G.RLT.1

The student will translate logic statements, identify conditional statements, and use and interpret Venn diagrams.

G.RLT.1d

Interpret Venn diagrams, including those representing contextual situations.
