This post was co-written with Diana Chin as a critique of a visualization published by Trulia. This critique was done by answering a series of structured questions relating to interactive visualization strengths and weaknesses.
1. List the source of the visualization (a bibliographic citation is preferred, including the date if you can determine it). Include the creator and what you know of their background.
- Visualization Source:
- Kolko, J. (2013, March 25). American Homes Through The Decades. Retrieved from http://www.trulia.com/trends/vis/homes-by-decade/
- Author Information:
- Jed Kolko is the Chief Economist for the company Trulia Trends. He holds a Ph.D. in Economics from Harvard University and has over 15 years of experience in publishing papers and managing research in economic development and consumer technology. He has also worked as a researcher with major companies such as TIME Magazine and CNN.
2. Who is the intended audience? What is its intended goal or purpose?
There are several audiences who may be interested in this visualization. Some home buyers who are interested in purchasing homes for their historical qualities or styles could use this tool to learn about which decades are represented in their market or where more houses from a given decade are concentrated. These buyers could also use this visualization to learn about how a home’s build date affects its listing price in their market; some home markets clearly value older homes more than others do.
An additional audience might be real estate agents (at either a local or national level) who want to learn about home distributions and prices in order to stay informed for their clients. Finally, the visualization may interest members of the general public who want a rough idea of where houses were being built at different points in time (such as seeing when the West Coast started building more rapidly).
3. What information does this visualization represent?
This visualization shows a geographical representation of where home buyers could find a higher share of homes from a particular decade. It also enables users to compare the median prices for the currently listed homes by decade across different housing markets.
4. How many data dimensions does it encode? Are the encoding mappings appropriate?
The visualization appears to encode 4 data dimensions:
- Geospatial data (longitude and latitude) of each housing market is encoded via 2D position overlaid on a map, which is an appropriate and conventional method for showing this data type. However, when the dots are too large, they are displaced from their correct position, leading to some distortion, inaccuracy, or misrepresentation.
- Each circle on the map is both sized and shaded to represent the share of houses in that market from the currently selected decade: a darker shade/value and a larger size represent a larger share of houses from that decade. Additionally, an orange color highlights the top 3 markets for the selected decade. Each method is an appropriate encoding, though the redundancy is unnecessary. These encodings alone would be weak for making accurate comparisons because, according to Few, neither method is very precise; at the very least, they give a rough approximation (Few 2009, 41-43).
- Years (binned as decades) are represented on the bar chart on the x-axis, which follows conventional placement and is an appropriate encoding method.
- The median prices of homes in a given decade and market are encoded in the bar chart by the bars’ length, which is an appropriate and ideal encoding for enabling accurate comparisons between two quantitative values; Few states that this is one of the two most precise encoding methods (Few 2009, 41). Additionally, the y-axis scale begins at zero, so there is no distortion or lie factor (Tufte 1983, 57).
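To make the lie factor point concrete, here is a minimal sketch of Tufte's ratio (size of effect shown in the graphic divided by size of effect in the data) for a bar chart. The prices and the truncated baseline are hypothetical values chosen for illustration, not figures from the Trulia visualization.

```python
def relative_change(first, second):
    """Size of effect as Tufte defines it: change relative to the first value."""
    return abs(second - first) / first


def lie_factor(shown_first, shown_second, data_first, data_second):
    """Effect shown in the graphic divided by the effect present in the data."""
    return relative_change(shown_first, shown_second) / relative_change(data_first, data_second)


def bar_length(value, baseline):
    """Drawn length of a bar when the axis starts at `baseline`."""
    return value - baseline


# Hypothetical median prices for two decades in one market.
price_a, price_b = 200_000, 300_000

# Axis starting at zero: bar lengths are proportional to the data.
zero_based = lie_factor(bar_length(price_a, 0), bar_length(price_b, 0), price_a, price_b)

# Axis truncated at a hypothetical 150,000: the difference looks exaggerated.
truncated = lie_factor(
    bar_length(price_a, 150_000), bar_length(price_b, 150_000), price_a, price_b
)

print(zero_based, truncated)
```

With a zero baseline the ratio is 1, which is the undistorted case described above; the truncated axis in this made-up example would overstate the difference fourfold.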
5. List tasks, comparisons, or evaluations it enables.
Viewers can see whether older houses are valued more highly than newer houses in a given area. For example, in Figure 2, home prices in New York City tend to increase with age (pre-1960s), but the same trend is not present in nearby Syracuse, as seen in Figure 3. This can provide good insight into whether older homes are treasured or merely tolerated. From Figure 2 and Figure 3, viewers also get an approximate sense of where homes were being built within a given decade. These two graphs show that most 1920s home construction occurred on the East Coast, with some minor activity in major West Coast cities (not accounting for homes that have since been demolished). In addition, these two graphs enable users to compare median prices across the decades.
In Figure 4, users can view the share (represented by percentage) of currently listed houses from the selected decade for a given market.
In Figure 5, viewers can find which market has the most recently built houses. In this case, Cincinnati, Ohio has the most houses built in 2010.
In contrast, Figure 6 shows that Peabody, Massachusetts has the most houses built in the pre-1900s.
6. What principles of excellence best describe why it is good?
The visualization does a fairly good job at minimizing distortions and accurately representing the data (adapted from Mackinlay, APT (A Presentation Tool), 1986). For example, the bar graph’s y-axis is set at zero so that length accurately portrays the value with a lie factor equal to 1 (Tufte 1983, 57). While circular area is not always accurately perceivable (Few 2009, 41), it is at least scaled consistently with respect to the value.
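As an illustration of what "scaled consistently" can mean for circles, the sketch below sizes a circle so that its area, rather than its radius, is proportional to the share it represents. We do not know how Trulia actually sized its circles; the function name, the pixel radius, and the sample shares are all assumptions.

```python
import math


def radius_for_share(share, max_share, max_radius=20.0):
    """Radius (in pixels) that makes circle AREA proportional to `share`.

    Area grows with the square of the radius, so the radius must grow with
    the square root of the value. `max_radius` is a hypothetical pixel size
    for the market with the largest share.
    """
    return max_radius * math.sqrt(share / max_share)


# Hypothetical shares (in percent) of listings from the selected decade.
largest = radius_for_share(40, 40)   # market with the largest share
smaller = radius_for_share(10, 40)   # market with a quarter of that share

# The area ratio matches the data ratio (10 / 40), even though the
# radius ratio does not.
area_ratio = (smaller / largest) ** 2
print(largest, smaller, area_ratio)
```

Scaling the radius directly by the value would instead inflate large circles, since a circle with twice the radius has four times the area.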
The two charts encourage comparisons effectively (Tufte 1983, 13), especially the bar chart, which is one of the most accurate encodings on Mackinlay’s scale (adapted from Mackinlay, APT (A Presentation Tool), 1986). The map invites comparisons, though this is more difficult than with the bar chart.
The visualization also minimizes “chart junk” and keeps a fairly high data-ink ratio; it is not decorated with extraneous illustrations (Tufte 1983, 51), though the redundancy of encoding (shade and size) is unnecessary.
Finally, the visualization makes a large data set easy to understand by breaking it into smaller pieces (filtered by market and decade) (Tufte 1983, 13).
7. List at least three strengths and weaknesses for the visualization.
What makes American Homes Through The Decades such a strong visualization is that it provides an uncluttered, clear, and easy-to-use interface. The bar chart encodes values by length, which makes it a strong method for encoding quantitative data such as median home prices. As Few indicates, this is one of the best methods for encoding quantitative data because it can be pre-attentively processed and because differences in length can be perceived with a high degree of precision. In addition, the visualization automatically highlights the three highest values (by percentage of homes in a market) for a given decade, denoted by the three orange circles shown in Figure 7, which makes it easier to find data that might be interesting or worth looking at.
We also found several weaknesses in this visualization. First, because the dots are large, the map is small, and overlapping circles are avoided, some dots lie over the water. As Figure 8 shows, these dots do not clearly associate with a geographic area or housing market, which forces the user to hover over a dot to see which market it represents. Second, the map has no legend that identifies the meaning of the dots’ size, color, or shade. As a result, the map is only useful for comparing percentages among markets rather than against a standard; in other words, there is no way to get a sense of a numerical value without comparing dots. Third, the visualization uses color, size, and shade to encode the same information: the percentage of houses built during the selected decade. This could confuse users, who may assume that each property encodes a separate piece of data. Moreover, none of area, shade, or color is among the strongest encodings for quantitative information; viewers tend to underestimate actual changes in area (Tufte 1983, 55; in-class lecture about Stevens’ power law).
The bar graph does not label values on the y-axis, so users must hover over the bars to find the numerical values. As a result, users can make only relative comparisons at a glance rather than exact ones. In addition, the earliest bin is “pre-1900,” which raises the question of how far back the data goes. The visualization also does not mention any adjustment for inflation.
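The underestimation of area can be sketched with Stevens' power law, which models perceived magnitude as the physical magnitude raised to an exponent. The exponent of 0.7 used below is a value often quoted for visual area; treat it as an assumption rather than a measured property of this map or its audience.

```python
def perceived_ratio(actual_ratio, exponent=0.7):
    """Perceived size ratio under Stevens' power law.

    `actual_ratio` is how many times larger one circle's area is than
    another's. The default exponent is an assumed, commonly quoted value
    for visual area; an exponent of 1.0 would mean perfectly accurate
    perception.
    """
    return actual_ratio ** exponent


# A circle with twice the area is perceived as less than twice as large.
doubled = perceived_ratio(2.0)

# Length is usually modeled with an exponent close to 1.0, which is one way
# to express why bars support more accurate comparisons than circles.
doubled_length = perceived_ratio(2.0, exponent=1.0)

print(round(doubled, 2), doubled_length)
```

Under this assumption, a market whose circle has double the area of another would read as only about 1.6 times larger, which is consistent with the point that the circles give a rough approximation at best.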
8. For interactive visualizations, which of Shneiderman's infovis tasks does it support and how?
This visualization supports at least 4 of Shneiderman’s infovis tasks:
- Details on demand (Shneiderman 1996, 4): As you can see from Figure 9, the user can see details by hovering over a circular data point, which shows the name of the housing market and the share of houses in that market that were built in the currently selected decade. Additionally, in Figure 10, you can see that the user can access details on demand by hovering over a bar in the bar chart, which reveals the exact value that the bar represents (for more precision). Finally, the visualization also shows images of houses from the selected decade.
- Relate (Shneiderman 1996, 5): The visualization lets the user make comparisons between housing markets within a selected decade (by comparing dot shade/size) and comparisons in median price by decade within a selected market (by comparing bar length).
- Filter (Shneiderman 1996, 4): The user can filter by decade (by clicking on one on the left side of the visualization) to see the share of each market represented by that decade. The user can also filter by market (by clicking on a single dot) to view median prices by decade for that single market. One thing to note is that the filtering is always active; there is no way to get an overview, such as viewing the median price by decade for multiple markets at once or viewing the map with multiple decades active.
- Select (Shneiderman 1996, 4): The viewer can select a market to view median prices (which updates a linked chart) by hovering over its dot, and clicking on a dot locks it in place so that hovering will not accidentally change the view. Hovering over a bar also selects it to show the precise value.
9. Does the visualization serve its intended purpose?
This visualization appeals to home buyers who appreciate historical, vintage houses. However, it would not satisfy users who want to make accurate comparisons or find information on the quality of housing (via reviews). On the sellers’ side, it would be a useful reference for real estate agents who want to make general comparisons of the share of houses built within a specific decade. However, it would not generalize across all of an agent’s clients because of its limitations: it does not support accurate comparisons, and it lacks other types of information, such as how many rooms a particular house has.
10. Can you suggest any improvements?
Though this visualization is fairly strong, there are opportunities to improve it. It could be powerful to compare different housing markets’ median prices rather than viewing just one at a time; selecting multiple markets could introduce a second set of bars to the bar chart, which would make comparison easy (instead of having to switch back and forth).
Referring to Figure 11, a visualization that allows comparisons between markets can reveal additional insights. Here, a user might learn that older homes are valued more (relative to other homes in the same market) in Philadelphia than in Boston (mock-up created by Matt Soave). The y-axis should be labeled (see above) so that it’s easy to understand the bars’ length without hovering. Additional dimensions could be added to the dots’ details-on-demand tooltip, such as the median price for that market and decade, or supplementary statistics such as price range or number of listings.
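One way to support the "relative to other homes in the same market" comparison is to index each decade's median price against the market's own average before plotting markets side by side. The sketch below assumes that approach; the function name, market labels, decade bins, and prices are hypothetical placeholders, not Trulia data.

```python
def relative_prices(median_by_decade):
    """Express each decade's median price as a multiple of the market's
    average across decades, so markets with different price levels can
    share one axis."""
    baseline = sum(median_by_decade.values()) / len(median_by_decade)
    return {decade: price / baseline for decade, price in median_by_decade.items()}


# Hypothetical median listing prices by decade built, for two made-up markets.
market_a = {"pre-1900": 300_000, "1950s": 200_000, "2000s": 100_000}
market_b = {"pre-1900": 400_000, "1950s": 400_000, "2000s": 400_000}

index_a = relative_prices(market_a)
index_b = relative_prices(market_b)

# In this made-up data, market A prices its oldest homes at a premium
# relative to its own average, while market B shows no age premium even
# though its absolute prices are higher.
print(index_a["pre-1900"], index_b["pre-1900"])
```

Plotting these indexed values as grouped bars would keep the length encoding while removing the overall price gap between markets, though a labeled axis would then need to state that the bars show ratios rather than dollars.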
11. Why do you like this visualization?
We like this particular visualization because it is clean, simple, and fairly easy to navigate, and it offers multiple ways to filter the data. Unlike other visualizations that take time to load after you click an item to filter, this one responds quickly, which lets users move through the data efficiently. We also like how the visualization provides photographs of the houses’ interior design for the selected decade; it adds to the user’s understanding of what they are looking at. Moreover, the generous use of color enhances the clarity of the communication.