Dot Plots and Histograms

In this activity, you will use Fathom to gather data from the internet. You will look at two different techniques for capturing internet data into a collection box in Fathom. Once you have captured the data, you will look at two different ways of describing its distribution graphically: dot plots (frequency plots) and histograms.

Drag and Drop

Open a new Fathom window as shown in Figure 1.

Figure 1.

You are now going to compare annual rainfall in Eureka, California, with that near the Civic Center in Los Angeles, California. Start by visiting the Los Angeles Almanac at http://losangelesalmanac.com/ . Near the bottom of this page is a category for weather and a link for rainfall, which leads to a page with the link Total Seasonal Rainfall (Precipitation) at the Los Angeles Civic Center by Season Since 1877. This last page contains an HTML table with annual rainfall levels at the Los Angeles Civic Center dating from 1877. The first few rows are shown in Figure 2.

Figure 2.

The image in Figure 3 shows the address bar of Internet Explorer (your browser may be different).

Figure 3.

Note the Explorer icon in the address bar. Click this icon with your mouse and drag it into the Fathom window. The result is shown in Figure 5.

Figure 5.

Note that a collection box has been created. With the collection box highlighted (single-click it with your mouse), drag a Case Table into the Fathom window, as shown in Figure 6.

Figure 6.

As you can see, the Case Table contains the rainfall data from the web page. There is no guarantee that this will always work with Fathom and web pages containing HTML tables, but it is certainly something to try. It beats typing the data in by hand.
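To see roughly what happens when an HTML table is turned into cases, here is a minimal Python sketch. It is not Fathom's own import code. The page fragment, its column names, and its values are hypothetical placeholders rather than the real rainfall figures, and the class name TableParser is made up for this example.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, one list per <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # row currently being built
        self._cell = None   # text pieces of the cell currently being built

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical page fragment: placeholder values, not real rainfall figures.
SAMPLE_HTML = """
<table>
  <tr><th>Season</th><th>Total Inches</th></tr>
  <tr><td>Season A</td><td>10.0</td></tr>
  <tr><td>Season B</td><td>20.5</td></tr>
</table>
"""

parser = TableParser()
parser.feed(SAMPLE_HTML)
header, *cases = parser.rows   # first row holds the attribute names
print(header)
print(cases)
```

The first row of the table supplies the attribute names and every later row becomes one case, which is the same shape you see in the Case Table.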
 

Excel Data

Sometimes people store data in Microsoft Excel files on the internet. Should you come across one of these data sets, you can open the data in Microsoft Excel, then copy and paste the data into a Fathom Collection box. First, let's locate some data that's been stored in an Excel file. At the Humboldt State University Department of Geology, we can find a data set containing annual rainfall levels for the city of Eureka, California. The data set is located on the page http://www.humboldt.edu/~geodept/geology531/531_datasets_index.html . You will find a link there for Eureka Annual Rainfalls (Excel). Right-click this link with your mouse, then select Save Target As ... from the ensuing popup menu. Select a folder in which to save the file. You need to remember the name of the file (eureka_annual_rainfalls.xls) and the location where you saved it. Next, open Microsoft Excel and open the file eureka_annual_rainfalls.xls, the first few entries of which look like those shown in Figure 7.

Figure 7.

One key requirement for Fathom data sets is that each column of data has a "header" or "column name." In the case of the Excel data in Figure 7, the columns are named Year and Eureka Precip. You are now going to copy and paste these columns into a Fathom Collection. First, drag a new Collection Box into your Fathom window as shown in Figure 8.

Figure 8.

Next, use your mouse to highlight the Excel data columns containing the Eureka rainfall data, including the column names, as shown in Figure 9. Then select Edit->Copy to copy the data to the Clipboard.

Figure 9.

Next, return to the Fathom window and right-click the new collection icon with your mouse. A popup menu will come up as shown in Figure 10. Select Paste Cases from this popup menu.

Figure 10.

Highlight the new collection box (single-click it with your mouse), then drag a new Case Table into the Fathom window. This will allow you to examine the Eureka rainfall data, as shown in Figure 11.

Figure 11.
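The header requirement is easier to appreciate if you look at what a copy from Excel puts on the Clipboard: plain text with one row per line and a tab between columns. The following Python sketch reads such text into cases. The column names Year and Eureka Precip come from the spreadsheet described above, but the years and values are hypothetical placeholders, and this is only an illustration of the idea, not a description of how Fathom's Paste Cases works internally.

```python
import csv
import io

# Hypothetical Clipboard contents: tab-separated, header row first.
# The numbers are placeholders, not the real Eureka data.
CLIPBOARD_TEXT = "Year\tEureka Precip\n1901\t30.0\n1902\t40.5\n"

reader = csv.DictReader(io.StringIO(CLIPBOARD_TEXT), delimiter="\t")

# Each remaining row becomes one case keyed by the header names.
cases = [
    {"Year": int(row["Year"]), "Eureka Precip": float(row["Eureka Precip"])}
    for row in reader
]

print(reader.fieldnames)
print(cases)
```

Without the header row, the reader would have no names to attach to the columns, which is why the column names are worth including when you copy the data.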


 

Stacking Attributes (Variables)

You could easily make separate dot plots for each data set, but it would be better to make simultaneous plots on the same scale so that you can more easily compare the rainfall of the two cities. Fathom has a utility called Stacking Attributes that will help with this task. First, right-click the attribute Season... in the first Case Table and select Delete Attribute from the ensuing popup menu. Next, do the same to delete the InchesA... attribute. This leaves the single attribute TotalInc... in the first Case Table, as shown in Figure 12.

Figure 12.

Next, in the second Case Table, the one containing the Eureka data, single-click the attribute EurekaP... to highlight the data column, as shown in Figure 13.

Figure 13.

Now go to the Edit menu in Fathom and select Copy Attribute. Then right-click the first collection icon (not the Case Table) and select Paste Attribute from the ensuing popup menu. This will add the Eureka rainfall data to the first Case Table. You can now safely delete the second collection icon and its Case Table, as shown in Figure 14.

Figure 14.

Now you will "stack" the attributes. Single-click the collection box to highlight it. Then select Stack Attributes from the Analyze menu, as shown in Figure 15.

Figure 15.

The result is a new collection icon. If you highlight this collection icon (single-click it with your mouse), then drag a new Case Table to the Fathom window, you can inspect its contents, as shown in Figure 16.

Figure 16.

The result is a little difficult to display in an HTML page, but the idea is simple. The data in the first column (L.A. data) has been "stacked" atop the data in the second column (Eureka data). Also, a categorical variable has been formed with two values, TotalInc... and EurekaP.... If you use the scroll bar in the second Case Table, you can scroll down until you see the Eureka data, as shown in Figure 17.

Figure 17.
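If it helps to see the stacking idea spelled out, here is a minimal Python sketch of the same transformation. It is an illustration only, not Fathom's implementation. The attribute names LosAngeles and Eureka are hypothetical stand-ins for the truncated names shown in the Case Table, the values are placeholders, and skipping cases that have no value for an attribute is an assumption of this sketch.

```python
def stack_attributes(collection, attributes):
    """Turn side-by-side columns into (Group, Value) cases, one column after another."""
    stacked = []
    for name in attributes:
        for case in collection:
            value = case.get(name)
            if value is not None:   # assumption: cases with no value are skipped
                stacked.append({"Group": name, "Value": value})
    return stacked


# Hypothetical collection with placeholder values (not real rainfall data).
collection = [
    {"LosAngeles": 10.0, "Eureka": 30.0},
    {"LosAngeles": 20.5, "Eureka": 40.5},
    {"LosAngeles": 15.0},   # the two cities need not have the same number of years
]

stacked = stack_attributes(collection, ["LosAngeles", "Eureka"])
for case in stacked:
    print(case["Group"], case["Value"])
```

All of the first attribute's values appear first, followed by all of the second attribute's values, and the Group entry records which column each value came from.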


 

Dot Plot

You can now safely delete the we13 collection and its Case Table. You want to keep the "stacked" collection and its Case Table. Drag a new Graph object to the Fathom window. Drag the Value attribute from the Case Table to the horizontal axis and the Group attribute to the vertical axis. This results in "stacked" dot plots on the same scale, where one can easily compare the rainfall from the two cities, as shown in Figure 18.

Figure 18.
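A dot plot places one dot per case above its value, piling dots up when values repeat. The Python sketch below draws a rough text version for each group on a shared scale. The group names and whole-number values are hypothetical placeholders, and the functions dots_by_group and render are made up for this illustration.

```python
from collections import Counter

# Hypothetical stacked cases with placeholder whole-inch values (not real rainfall data).
stacked = [
    {"Group": "LosAngeles", "Value": 10},
    {"Group": "LosAngeles", "Value": 12},
    {"Group": "LosAngeles", "Value": 12},
    {"Group": "LosAngeles", "Value": 15},
    {"Group": "Eureka", "Value": 12},
    {"Group": "Eureka", "Value": 14},
]


def dots_by_group(cases):
    """Count, for each group, how many cases share each value."""
    groups = {}
    for case in cases:
        groups.setdefault(case["Group"], Counter())[case["Value"]] += 1
    return groups


def render(counter, low, high):
    """Draw one text row per dot level, top row first; each column is one value."""
    rows = []
    for level in range(max(counter.values()), 0, -1):
        rows.append(
            "".join("o" if counter[v] >= level else " " for v in range(low, high + 1))
        )
    return rows


groups = dots_by_group(stacked)
# Use one common scale so the two groups can be compared directly.
low = min(case["Value"] for case in stacked)
high = max(case["Value"] for case in stacked)

for name, counter in groups.items():
    print(name)
    for row in render(counter, low, high):
        print(row)
```

Because both groups are drawn against the same low and high values, a dot in a given column means the same amount of rainfall in either plot.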


 

Histogram

Changing to a histogram is a simple matter. In Figure 18, note the Dot Plot menu in the upper right corner of the graph object in the Fathom window. Click this with your mouse and select Histogram from the drop-down menu, as shown in Figure 19.

Figure 19.

The result is "stacked" histograms, as shown in Figure 20.

Figure 20.
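Where a dot plot shows every case, a histogram groups the values into bins of equal width and shows how many cases fall in each bin. The Python sketch below counts cases per bin for each group. The values are hypothetical placeholders, the bin width of 5 is chosen only for illustration, and treating each bin as including its left edge but not its right edge is an assumption of this sketch rather than a statement about Fathom.

```python
import math

# Hypothetical stacked cases with placeholder values (not real rainfall data).
stacked = [
    {"Group": "LosAngeles", "Value": 10.0},
    {"Group": "LosAngeles", "Value": 12.5},
    {"Group": "LosAngeles", "Value": 14.9},
    {"Group": "LosAngeles", "Value": 15.0},
    {"Group": "LosAngeles", "Value": 22.0},
    {"Group": "Eureka", "Value": 31.0},
    {"Group": "Eureka", "Value": 38.5},
    {"Group": "Eureka", "Value": 44.0},
]


def histogram_counts(values, bin_width, start=0.0):
    """Count values per bin; a bin covers left edge <= value < left edge + bin_width."""
    counts = {}
    for value in values:
        k = math.floor((value - start) / bin_width)   # index of the bin holding value
        left = start + k * bin_width                  # left edge of that bin
        counts[left] = counts.get(left, 0) + 1
    return dict(sorted(counts.items()))


# Split the stacked cases by group, then bin each group with the same bin edges.
by_group = {}
for case in stacked:
    by_group.setdefault(case["Group"], []).append(case["Value"])

histograms = {name: histogram_counts(vals, 5.0) for name, vals in by_group.items()}
for name, counts in histograms.items():
    print(name, counts)
```

Using the same bin width and starting point for both groups plays the same role as the shared scale in the "stacked" histograms: bars in the same position cover the same range of rainfall.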


 

Homework

Visit  Counties Ranked by Size at the U.S. Census Bureau. Perform the following tasks.
  1. Select the link to South Dakota. Drag the HTML icon for the resulting page into a Fathom window.
  2. With the collection box highlighted, drag a Case Table into the Fathom window. Delete the first case, which gives a total for the state. We only want the population totals for each county of the state. You'll also have to delete any cases at the bottom of the Case Table that contain spurious data. Delete all attributes except for the fourth, which contains the population data for each county as of July 1, 2001. Double-click the name of this attribute and change it to SouthDakota.
  3. Go back to Counties Ranked by Size and select the link for North Dakota. Perform the same tasks as in item #2, but this time name the attribute NorthDakota.
  4. Copy the attribute NorthDakota into the first Collection icon, then delete the second collection icon and its Case Table.
  5. Stack the data in the Collection icon, then use the stacked data to draw "stacked" dot plots and histograms to compare the population data for the counties of each state.
  6. Obtain a printout of your result and submit it with your homework.