10.2 Displaying Data Part I - Dr. Travers Page of Math
§ 10.2 Displaying Data Part I
Types of Variables
1. Categorical
2. Quantitative
Definition: A categorical variable is a variable that can take on one of a limited, and usually fixed, number of possible values.
Definition: A quantitative variable is naturally measured as a number for which meaningful arithmetic operations make sense.
Pictographs
Definition: A pictograph is a form of bar graph in which the bars are made of graphics that each represent a predetermined quantity. Pictographs are generally used to display categorical variables.
Pictographs
What’s Good
- Gives a good visual and can easily be adapted to multiple graphics
- Easy for children to understand since it includes graphics
What’s Not Good
- Tedious to draw
- When a graphic shows only a fraction of the scale amount, it can be difficult to determine exactly which amount was intended
Dot Plots
Definition: A dot plot is the representation of a set of data over a number line. The number of dots over a number represents how often that value occurs.
Example: The following are the test scores for a particular high school student in their math class over the course of an academic year.
64 73 85 74 83 71 56 83 76 85 83 87 92 84 95 92 95 92 91
Dot Plots
[Figure: dot plot titled “Grades of a High School Student”. Horizontal axis: Grades, marked from 50 to 100. Vertical axis: Frequency, marked from 1 to 3.]
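A dot plot like this one can be sketched in plain text. The following is only an illustration, not how the slide's figure was produced; the function name `dot_plot_rows` is made up for this sketch, and the scores are copied from the example as listed.

```python
from collections import Counter

# Test scores exactly as listed in the example.
scores = [64, 73, 85, 74, 83, 71, 56, 83, 76, 85,
          83, 87, 92, 84, 95, 92, 95, 92, 91]


def dot_plot_rows(data):
    """Return the text rows of a dot plot, highest frequency level first."""
    counts = Counter(data)
    low, high = min(data), max(data)
    rows = []
    for level in range(max(counts.values()), 0, -1):
        # One character per integer on the number line: a dot when the
        # value occurs at least `level` times, a blank otherwise.
        marks = "".join("•" if counts[v] >= level else " "
                        for v in range(low, high + 1))
        rows.append(f"{level} {marks}")
    return rows


for row in dot_plot_rows(scores):
    print(row)
```

Each printed row is one frequency level, so stacking the rows gives the columns of dots over the number line from the minimum to the maximum score.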
Dot Plots
What’s Good
- Gives a good idea of distribution
- Preserves all of the data points
What’s Not Good
- Tedious to plot
- Can be hard to read
- Not practical for large data sets
Distributions
Definition: A distribution is a representation of data vs. frequency. It shows all possible values and how often they occur.
Range: Highest value − lowest value. Here, the range would be 95 − 56 = 39.
Center: The central value(s) of the ordered data. It could be a value or a class, depending on the type of graph. Here, the center is the 10th value, since there are 19 data points in the set. The value we seek is 84.
Shape: How many peaks are there? Is the peak roughly in the middle or to one side? Here we have one peak, so we would say the distribution is unimodal. That peak is to the right, so the tail stretches out to the left. We would say this graph is left skewed.
We must ALWAYS have axes labeled and have a title for our graphs.
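The range and center computed above can be checked with a short sketch. The helper names are illustrative, and averaging the two middle values when the count is even is an assumption (the usual median convention), since the slides only work an odd-sized example.

```python
# Test scores as listed in the dot plot example.
scores = [64, 73, 85, 74, 83, 71, 56, 83, 76, 85,
          83, 87, 92, 84, 95, 92, 95, 92, 91]


def data_range(data):
    """Highest value minus lowest value."""
    return max(data) - min(data)


def center(data):
    """Middle value of the ordered data.

    Assumption: with an even count, average the two middle values.
    """
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(data_range(scores))  # 39
print(center(scores))      # 84
```

With 19 values, `mid` is index 9, which is the 10th value of the ordered list.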
Stem-and-Leaf Plots
Similarities to Dot Plots
- Gives idea of distribution
- Preserves data
- Not practical for large data sets
Differences from Dot Plots
- Used for quantitative variables
- Easier to read actual data elements
- Can be used for comparisons of two data sets
Stem-and-Leaf Plots
Example: Using the same data set as we did for the dot plot, construct a stem-and-leaf plot.
The first thing we need to do is order the data elements.
56 64 71 73 74 76 82 83 83 83 84 85 85 87 91 92 92 95 95
Grades for a High School Student
5 |
6 |
7 |
8 |
9 |
Stem-and-Leaf Plots
- This is the spine with the stems
- The leaves are the last digit
- List in increasing order from spine
- No commas
- Repetition is important: a leaf is written once for every time the value occurs
Grades for a High School Student
5 | 6
6 | 4
7 | 1 3 4 6
8 | 2 3 3 3 4 5 5 7
9 | 1 2 2 5 5
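The construction rules above can be sketched in a few lines. This is a minimal illustration, assuming whole-number data whose last digit is the leaf; the function name `stem_and_leaf` is hypothetical, and the data is the ordered list as printed in the example.

```python
from collections import defaultdict

# Ordered scores as printed in the stem-and-leaf example.
ordered_scores = [56, 64, 71, 73, 74, 76, 82, 83, 83, 83,
                  84, 85, 85, 87, 91, 92, 92, 95, 95]


def stem_and_leaf(data):
    """Return one text row per stem, e.g. '7 | 1 3 4 6'."""
    leaves = defaultdict(list)
    for value in sorted(data):
        # The leaf is the last digit; the stem is everything before it.
        stem, leaf = divmod(value, 10)
        leaves[stem].append(leaf)
    rows = []
    # Walk every stem from smallest to largest so the spine has no gaps.
    for stem in range(min(leaves), max(leaves) + 1):
        leaf_text = " ".join(str(leaf) for leaf in leaves[stem])
        rows.append(f"{stem} | {leaf_text}".rstrip())
    return rows


for row in stem_and_leaf(ordered_scores):
    print(row)
```

Because the stem is everything except the last digit, a three-digit value such as 100 would land on stem 10 with leaf 0.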
Analysis of Stem-and-Leaf Plot
Grades for a High School Student
5 | 6
6 | 4
7 | 1 3 4 6
8 | 2 3 3 3 4 5 5 7
9 | 1 2 2 5 5
- Same range as dot plot
- Same center as dot plot, although we would only give the class
- Shape is unimodal and skewed left
Notice that the values on the right are essentially in columns. This is what allows us to quickly see which classes have more elements.
Stem-and-Leaf Plots
What if we had a 3-digit number? Suppose the student got a 100 on the next exam.
Grades for a High School Student
 5 | 6
 6 | 4
 7 | 1 3 4 6
 8 | 2 3 3 3 4 5 5 7
 9 | 1 2 2 5 5
10 | 0
Back-to-Back Stem-and-Leaf Plots
Example: Suppose we wanted to compare the careers of Babe Ruth and Mark McGwire in terms of their yearly home run totals to determine which player was the more consistent long ball hitter. Make a back-to-back stem-and-leaf plot to make this determination.
Ruth: 54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22
McGwire: 49, 32, 33, 39, 22, 42, 9, 9, 39, 52, 58, 70, 65, 32, 29
Back-to-Back Stem-and-Leaf Plots
Ruth v. McGwire
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
We set up the graph with one set of data increasing out to the right and the other increasing out to the left. This way we have a side-by-side comparison of the data sets.
Back-to-Back Stem-and-Leaf Plots
Ruth v. McGwire
              | 0 | 9 9
              | 1 |
          5 2 | 2 | 2 9
          5 4 | 3 | 2 2 3 9 9
9 7 6 6 6 1 1 | 4 | 2 9
        9 4 4 | 5 | 2 8
            0 | 6 | 5
              | 7 | 0
Who is more consistent and why?
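The back-to-back layout can be reproduced with a small sketch. The helper names are hypothetical, and placing the first data set on the left of the spine is an assumption that follows the “Ruth v. McGwire” title.

```python
ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]
mcgwire = [49, 32, 33, 39, 22, 42, 9, 9, 39, 52, 58, 70, 65, 32, 29]


def leaves_by_stem(data):
    """Map each stem (tens) to its sorted list of leaves (ones digits)."""
    table = {}
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        table.setdefault(stem, []).append(leaf)
    return table


def back_to_back(left, right):
    """Return text rows with `left` leaves growing leftward from the spine."""
    left_leaves = leaves_by_stem(left)
    right_leaves = leaves_by_stem(right)
    first = min(min(left_leaves), min(right_leaves))
    last = max(max(left_leaves), max(right_leaves))
    parts = []
    for stem in range(first, last + 1):
        # Reverse the left side so its leaves increase away from the spine.
        left_text = " ".join(str(x) for x in reversed(left_leaves.get(stem, [])))
        right_text = " ".join(str(x) for x in right_leaves.get(stem, []))
        parts.append((left_text, stem, right_text))
    width = max(len(left_text) for left_text, _, _ in parts)
    return [f"{left_text:>{width}} | {stem} | {right_text}".rstrip()
            for left_text, stem, right_text in parts]


for row in back_to_back(ruth, mcgwire):
    print(row)
```

Sharing one spine puts both players on the same scale, so the spread of each side can be compared row by row.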
Histograms
- Used for quantitative variables
- Tracks frequency and shows distribution
- Does not preserve individual values
- Good for a large number of values
- Bars must be vertical and must touch
Histograms
Example: For our test scores example, construct a histogram and analyze the distribution.
It is easier if the values are in order, as we will be grouping them into classes.
56 64 71 73 74 76 82 83 83 83 84 85 85 87 91 92 92 95 95
Histograms
We first want to create a frequency table. This is a collection of non-overlapping classes and the frequency of observation in each of those classes. We need to determine the following, in this order:
Number of classes
The rule of thumb for the number of classes is to use the square root of the number of observations in the data set.
$$\sqrt{19} \approx 4.36$$
So, we can use 4 or 5 classes. I tend to go up to the next integer to be sure I have enough classes, so we will use 5 for our graph.
Histograms
Size of each class
We want the classes to be the same width so that the taller classes will be known to have the most elements. If they are not, then we have to find the area of each rectangle to determine relative size. To find the size, we divide the ‘range’ by the number of classes.
$$\text{size} = \frac{95 - 56 + 1}{5} = \frac{40}{5} = 8$$
So we will use 8 for the class size.
If this came out to be a decimal, we could use the decimal value for the class size or we could round here too. Remember, this is an approximation tool, so as long as we are using values close to those we calculate, we will have an accurate representation to analyze.
Histograms
Endpoints of each class
We start the smallest class with a left endpoint of 56, since that was our minimum. Then, to find the next left endpoint, add 8 to 56. Continue in this manner until we have 5 classes.
Grade Range   Frequency
56-
64-
72-
80-
88-
Histograms
Then, we subtract 1 from each left endpoint to find the right endpoint of the previous class. The last class ends at 88 + 8 − 1 = 95.
Grade Range   Frequency
56-63
64-71
72-79
80-87
88-95
Finally, we count how many elements go in each class.
Grade Range   Frequency
56-63         1
64-71         2
72-79         3
80-87         8
88-95         5
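The three steps (number of classes, class size, endpoints and counts) can be sketched as follows. The function name `frequency_table` is hypothetical, and rounding both the square root and the class size up to the next integer is an assumption made here so that every value is guaranteed a class; the slides allow either rounding or keeping the decimal.

```python
import math

# Ordered scores as printed in the histogram example.
ordered_scores = [56, 64, 71, 73, 74, 76, 82, 83, 83, 83,
                  84, 85, 85, 87, 91, 92, 92, 95, 95]


def frequency_table(data, num_classes=None):
    """Return (left, right, count) for equal-width integer classes."""
    low, high = min(data), max(data)
    if num_classes is None:
        # Rule of thumb: square root of the number of observations,
        # taken up to the next integer.
        num_classes = math.ceil(math.sqrt(len(data)))
    # Class size: (max - min + 1) / number of classes, rounded up
    # (an assumption) so that the classes cover the maximum.
    size = math.ceil((high - low + 1) / num_classes)
    table = []
    for i in range(num_classes):
        left = low + i * size
        right = left + size - 1  # one less than the next left endpoint
        count = sum(1 for value in data if left <= value <= right)
        table.append((left, right, count))
    return table


for left, right, count in frequency_table(ordered_scores):
    print(f"{left}-{right}: {count}")
```

For the 19 scores this gives 5 classes of size 8 starting at 56, matching the table above.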
Histograms
[Figure: histogram titled “Grades of a High School Student”. Horizontal axis: Grades, with classes 56-63, 64-71, 72-79, 80-87, 88-95. Vertical axis: Frequency, marked from 2 to 8.]
We see the same range and shape. Here, we’d have no choice but to give only the class for the center, as we lose the ability to see individual values.
Histograms
Example: The EPA lists most sports cars in its “two-seater” category. The table below gives the city mileage in miles per gallon. Make and analyze a histogram for the city mileage.
Model             Mileage   Model         Mileage
Acura NSX         17        Insight       57
Audi Quattro      20        S2000         20
Audi Roadster     22        Lamborghini   9
BMW M Coupe       17        Mazda         22
BMW Z3 Coupe      19        SL500         16
BMW Z3 Roadster   20        SL600         13
BMW Z8            13        SLK230        23
Corvette          18        SLK 320       20
Prowler           18        911           15
Ferrari 360       11        Boxster       19
Thunderbird       17        MR2           25
Histograms
There are 22 cars, and $4 < \sqrt{22} < 5$, so we would use 4 or 5 classes. Here I will choose 5. The size of each class would be
$$\frac{57 - 9 + 1}{5} = \frac{49}{5} = 9.8$$
So we will use 10.
Mileage   Frequency
9-18      11
19-28     10
29-38     0
39-48     0
49-58     1
Histograms
[Figure: histogram titled “MPG for Sports Cars”. Horizontal axis: MPG, with classes 9-18, 19-28, 29-38, 39-48, 49-58. Vertical axis: Frequency, marked from 3 to 12.]
Center: Boundary between the first two classes
Range: 58 − 9 = 49
Shape: Unimodal, skewed right
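Since a histogram hides the individual values, the center has to be located from the class frequencies. The sketch below is one way to do that with running totals; the function name `center_classes` is made up for illustration, and the tables are copied from the two histogram examples.

```python
def center_classes(table):
    """Return the class label(s) that hold the middle observation(s).

    `table` is a list of (label, frequency) pairs in increasing class order.
    """
    total = sum(freq for _, freq in table)
    # 1-based positions of the middle observation(s) in the ordered data.
    if total % 2 == 1:
        targets = [(total + 1) // 2]
    else:
        targets = [total // 2, total // 2 + 1]
    found = []
    for target in targets:
        running = 0
        for label, freq in table:
            running += freq
            if target <= running:
                # This class contains the target position.
                if label not in found:
                    found.append(label)
                break
    return found


mpg_table = [("9-18", 11), ("19-28", 10), ("29-38", 0),
             ("39-48", 0), ("49-58", 1)]
grade_table = [("56-63", 1), ("64-71", 2), ("72-79", 3),
               ("80-87", 8), ("88-95", 5)]

print(center_classes(mpg_table))    # ['9-18', '19-28']
print(center_classes(grade_table))  # ['80-87']
```

For the 22 cars, the 11th and 12th values fall in different classes, which is why the center sits on the boundary between the first two classes.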
Bar Graphs
These look similar to histograms but there are a few differences.
- Generally used for categorical variables
- Bars can be vertical or horizontal
- Cannot analyze distribution as with a histogram, because the classes are not necessarily in numerical order
- Can be used for comparisons
Bar Graphs
Example: The growth of the US population age 65 and over is given in the table. Create a bar graph to represent this data.
Year   Percent   Year   Percent
1900   4.1       1970   9.8
1910   4.3       1980   11.3
1920   4.7       1990   12.5
1930   5.5       2000   12.4
1940   6.9       2010   13.2
1950   8.1       2020   16.5
1960   9.2       2030   20.0
Bar Graphs
[Figure: bar graph titled “Age of Seniors by Decade”. One axis: Year, by decade from 1900 to 2030. Other axis: Percent, marked from 5 to 20.]
Bar Graphs for Comparisons
Example: Create a bar graph for the given causes of death and analyze the results. Values given are the number per 100,000 people.
Cause of Death   1970   1980   1990   2000
Cardiovascular   640    509    387    318
Cancer           199    208    216    201
Accidents        62     46     36     34
Bar Graphs for Comparisons
[Figure: grouped bar graph titled “Causes of Death”. Horizontal axis: Year (1970, 1980, 1990, 2000). Vertical axis: Number of Deaths (per 100,000), marked from 150 to 600. Legend: Cardiovascular, Cancer, Accidents.]
- The cancer and accident rates each stay roughly the same from decade to decade
- Cardiovascular disease decreases each decade and is approaching the level of cancer deaths
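The second observation can be checked directly from the table by computing the gap between the cardiovascular and cancer rates in each year. This is just a quick arithmetic sketch; the variable names are illustrative.

```python
years = [1970, 1980, 1990, 2000]

# Deaths per 100,000 people, copied from the table in the example.
deaths_per_100k = {
    "Cardiovascular": [640, 509, 387, 318],
    "Cancer": [199, 208, 216, 201],
    "Accidents": [62, 46, 36, 34],
}

# Gap between cardiovascular and cancer deaths in each year.
gaps = [cardio - cancer
        for cardio, cancer in zip(deaths_per_100k["Cardiovascular"],
                                  deaths_per_100k["Cancer"])]

for year, gap in zip(years, gaps):
    print(year, gap)
# 1970 441
# 1980 301
# 1990 171
# 2000 117
```

The gap shrinks in every decade shown, which is what the side-by-side bars make visible at a glance.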
Pie Charts
Pie charts are used to compare different categories in relation to each other. A pie chart is only good for one type of variable; we cannot do comparisons between different types of observations with one pie chart.
Example: You sit on an overpass and record the color of the first 100 cars you see. The results are as follows:
Color   Frequency
red     15
blue    21
green   18
white   22
black   19
other   5
Construct a pie chart to illustrate the relationship between the colors of these cars.
Pie Charts
We have to make sure that the size of each slice is correct in relation to the other slices. To do this, we make sure the central angle is the correct size. Since we have 100 observations, the number of observations in a category is also the percent of the circle we need for that wedge. For example, we saw 15 red cars, so we would need a central angle of 0.15(360°) = 54°.
Pie Charts
[Figure: pie chart of car colors with wedges labeled Red 15%, Blue 21%, Green 18%, White 22%, Black 19%, Other 5%.]
Pie Charts
Example: The following is a breakdown of the solid waste that made up America’s garbage in 2000. Values given represent millions of tons.
Material         Weight
Food             25.9
Glass            12.8
Metal            18.0
Paper            86.7
Plastics         24.7
Rubber           15.8
Wood             12.7
Yard Trimmings   27.7
Other            7.5
Create a pie chart to represent this data.
Pie Charts
We can’t make a pie chart with this data; at least not yet. What do we need? We need the relative frequency of each material, which is its weight divided by the total weight.
Material         Weight   Relative Frequency
Food             25.9     11.2%
Glass            12.8     5.5%
Metal            18.0     7.8%
Paper            86.7     37.4%
Plastics         24.7     10.7%
Rubber           15.8     6.8%
Wood             12.7     5.5%
Yard Trimmings   27.7     11.9%
Other            7.5      3.2%
Total            231.8
Pie Charts
Now we can find the central angles and create our pie chart.
Material         Weight   Relative Frequency   Central Angle
Food             25.9     11.2%                40.3°
Glass            12.8     5.5%                 19.8°
Metal            18.0     7.8%                 28.1°
Paper            86.7     37.4%                134.6°
Plastics         24.7     10.7%                38.5°
Rubber           15.8     6.8%                 24.5°
Wood             12.7     5.5%                 19.8°
Yard Trimmings   27.7     11.9%                42.8°
Other            7.5      3.2%                 11.5°
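The relative frequencies and central angles in the table can be reproduced with a short sketch. The function name `pie_table` is hypothetical. Computing each angle from the percentage already rounded to one decimal place is an assumption, chosen because it matches the values printed in the table.

```python
# Weights in millions of tons, copied from the example.
waste = {
    "Food": 25.9, "Glass": 12.8, "Metal": 18.0, "Paper": 86.7,
    "Plastics": 24.7, "Rubber": 15.8, "Wood": 12.7,
    "Yard Trimmings": 27.7, "Other": 7.5,
}


def pie_table(amounts):
    """Map each category to (percent of total, central angle in degrees)."""
    total = sum(amounts.values())
    table = {}
    for name, amount in amounts.items():
        # Relative frequency as a percent, rounded to one decimal place.
        percent = round(100 * amount / total, 1)
        # Central angle: that percent of the full 360 degrees.
        angle = round(percent / 100 * 360, 1)
        table[name] = (percent, angle)
    return table


for name, (percent, angle) in pie_table(waste).items():
    print(f"{name}: {percent}% -> {angle} degrees")
```

The same function applies to the car color counts: with 100 observations, 15 red cars give 15.0% and a central angle of 54.0 degrees.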
Pie Charts
[Figure: pie chart of the solid waste data, with one wedge per material (Food, Glass, Metal, Paper, Plastics, Rubber, Wood, Yard Trimmings, Other) labeled with its percentage.]