Data Visualization
Complete Midterm Notes
Definitions, Core Elements & Key Concepts
Before you can design anything, you need to understand what visualization actually is and why it works. Start here.
The Core Definition
Data Visualization has two widely accepted definitions. They sound similar but have an important difference:
Data Visualization
The use of computer-supported, interactive, visual representations of data to amplify cognition.
→ Focus: raw data (structured numbers, tables, etc.)
Information Visualization
The use of computer-supported, interactive, visual representations of abstract data to amplify cognition.
→ Focus: abstract/conceptual data (relationships, hierarchies, etc.)
DataViz = the representation and presentation of data that exploits our visual perception abilities in order to amplify cognition. This definition breaks into three inseparable ideas: Representation, Presentation, and Amplify Cognition.
The Three Pillars of the Definition
Every visualization you ever make is built on these three ideas. Understand each one deeply.
1. Representation
Taking data as the raw material and creating a visual form to best portray its attributes. It is the choice of physical forms (shapes, lines, colors, positions) used to encode the data. Think of it as answering: "What shape does my data take?"
2. Presentation
Presentation goes beyond just showing the data. It concerns how you integrate the data representation into the overall communicated work. It includes decisions about:
- Colors and color palettes
- Layout and composition
- Annotations, labels, and titles
- Interactive features (hover tooltips, filters, etc.)
Think of it as: "How does my visual look and feel as a complete piece of work?"
3. Amplify Cognition
This is the why. Amplifying cognition means maximizing how efficiently and effectively we process information into thought, insights, and knowledge. A visualization that looks beautiful but confuses the viewer has failed. The goal is always to make the reader think better, faster, or more accurately.
DataViz = Representation + Presentation → Amplify Cognition
Art or Science?
Data visualization is both, but it leans more toward science than most people think. Doing it well requires knowledge from several traditionally separate fields:
"Getting visualization right is much more a science than an art, which we can only achieve by studying human perception."
Gestalt Laws (Theoretical Ancestry)
The Gestalt laws are psychological principles that explain how humans naturally group and perceive visual elements. They are the scientific backbone of why certain visual arrangements feel intuitive. You don't need to memorize all of them for the midterm, but know they exist and why they matter for DataViz design.
How to Make a Good Visualization
Three things must be understood and balanced:
- Properties of the data and information — what type of data is it? What story does it hold?
- Properties of pictures — what visual encodings (position, length, color, area) are most accurately perceived by humans?
- Rules to map data into pictures — the grammar of graphics, the design principles, the methodology.
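As a small, hedged illustration of the third point, a "rule" that maps data into a picture can be as simple as a linear scale that converts a data value into a position on an axis. The function name `linear_scale` and the 400-pixel axis are assumptions made for this sketch, not something from the lectures.

```python
def linear_scale(domain_min, domain_max, range_min, range_max):
    """Return a function that maps a data value to a position (e.g., pixels)."""
    if domain_max == domain_min:
        raise ValueError("domain must have non-zero width")
    span = domain_max - domain_min

    def scale(value):
        # Fraction of the way through the data domain (0.0 to 1.0 inside the domain)
        t = (value - domain_min) / span
        # The same fraction of the way through the visual range
        return range_min + t * (range_max - range_min)

    return scale


# Hypothetical example: values from 0 to 200 drawn on a 400-pixel-wide axis
x = linear_scale(0, 200, 0, 400)
positions = [x(v) for v in (0, 50, 200)]
print(positions)
```

Position is used here because it is one of the encodings listed above; the same idea applies to length or color with a different output range.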
Why Do We Visualize? The Two Core Purposes
Every visualization project is created for one of two reasons — or a blend of both. Know the difference.
Purpose 1 — Data Analysis
Using visualization to understand data and extract comprehensive information from it. The chart is a tool for you (the analyst), not necessarily for a general audience. When you visualize data to analyze it, you are exploring — looking for patterns, outliers, and hypotheses.
"The greatest value of a picture is when it forces us to notice what we never expected to see." — This is the essence of exploratory data analysis (EDA).
Advantages of Visualization for Data Analysis
- Understand large datasets faster — patterns that are invisible in a spreadsheet become obvious in a chart.
- Capture important properties — distribution shape, outliers, trends, clusters.
- Capture problems — visualization is a tool for quality control. Dirty data often shows up visually before you find it programmatically.
- Facilitate new hypotheses — a chart can suggest relationships you had not thought to test.
Purpose 2 — Communication
Using visualization to communicate information to an audience. The emphasis here is on clarity, simplicity, and emotional tone. Visualization for communication incorporates simplification (removing noise) and tonal intent (the feeling you want to create in the reader).
"Overload, clutter, and confusion are not attributes of information — they are failures of design." If the reader is confused, the designer is at fault, not the data.
The Ultimate Goal
Regardless of purpose (analysis or communication), the ultimate goal of any visualization is to make readers feel like they have become better informed about a subject.
"Visualization A is more effective than B if the information conveyed by A is more readily perceived than the information in B." — Jock Mackinlay
A Brief History of Data Visualization
DataViz is not a new trend — it has existed for centuries. Understanding its history helps you appreciate how current practice evolved.
Historical Timeline
Catalyzed by two forces: (1) powerful new technological capabilities — cheap computing, cloud storage, open data; and (2) a cultural shift toward transparency and accessibility of data. As Hal Varian (Google Chief Economist) said: "The ability to take data, understand it, process it, extract value from it, visualize it, communicate it — that's going to be a hugely important skill in the next decades."
The Four Key Principles of Data Visualization
These are the non-negotiable rules that separate good visualization from bad. Know them, apply them, and be able to explain each with an example.
Overview of the 4 Principles
Deep Dive: Principle 1 — Forms & Functions
Frank Lloyd Wright said: "Form and function should be one, joined in a spiritual union." This is the ideal for DataViz. The question is never "style or substance?" — it is always both.
Practical advice (from the lectures): When starting a project, first secure the functional aspects of the visualization (does it convey the right information accurately?), and only then explore ways to enhance its form (does it look good and engage the reader?).
Deep Dive: Principle 2 — Deliberate Design
Every single design feature in a visualization should be included for a reason:
"We're so busy thinking about if we can do things, we forget to consider whether we should." — Just because a charting tool lets you add a 3D effect or an animation doesn't mean you should.
Deep Dive: Principle 3 — Accessibility Through Intuitive Design
A visualization should be usable without a manual. If your reader needs a lengthy explanation to understand the chart, the chart has failed. Intuitive design means leveraging natural human visual perception so that the message is immediately apparent.
Clutter — too many grid lines, labels, colors, and decorations — adds cognitive load without adding information. Every element you remove that adds no informational value increases the clarity of the remaining elements.
Deep Dive: Principle 4 — Never Deceive
Visualization ethics deals with the potential deception created by visual choices. Deception can be:
- Intentional — deliberately designing a chart to mislead (e.g., a politician cherry-picking a date range to make a trend look favorable).
- Unintentional — arising from an ineffective or inappropriate representation of data (e.g., a truncated Y-axis that makes a small difference look huge).
- From ignorance — caused by a lack of understanding of visual perception (e.g., using area to encode a 1D value, making readers vastly over- or under-estimate).
Common deceptive techniques:
- Truncated Y-axis: starting a bar chart's axis above 0 exaggerates differences.
- Area vs. length confusion: scaling a bubble's radius to a 1D value misleads, because readers judge the area, which grows with the square of the radius.
- Cherry-picked timeframes: selecting a window of data that shows a trend favorable to your argument.
- Dual Y-axes: two unrelated scales can suggest false correlations when their axis ranges are manipulated.
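The area problem can be checked with a little arithmetic. The sketch below, using hypothetical values and helper names, compares a bubble whose area is proportional to the value with one whose radius is (wrongly) scaled by the value.

```python
import math


def radius_for_value(value, area_per_unit=1.0):
    """Radius of a circle whose AREA (not radius) is proportional to value."""
    return math.sqrt(value * area_per_unit / math.pi)


def area_ratio_if_radius_scaled(value_a, value_b):
    """Area ratio on screen when the radius is scaled directly by the value."""
    return (value_a / value_b) ** 2


# Hypothetical values: b is twice a
a, b = 10, 20
r_a, r_b = radius_for_value(a), radius_for_value(b)

# Correct encoding: the drawn areas keep the 2:1 ratio of the data
area_ratio_correct = (math.pi * r_b ** 2) / (math.pi * r_a ** 2)

# Radius scaled by value: the drawn areas differ by 4:1 for a 2:1 difference
area_ratio_wrong = area_ratio_if_radius_scaled(b, a)

print(round(area_ratio_correct, 6), area_ratio_wrong)
```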
Visualization Skills for the Masses (Stephen Few)
"The skills required for most effectively displaying information are not intuitive and rely largely on principles that must be learned." — This is the whole reason this course exists. Good visualization is a learned discipline, not an innate talent.
How to Build a Visualization: Two Frameworks
Both Fry's 7 Stages and Kirk's 5-Step process describe how a visualization project actually flows from data to finished product.
Framework 1 — Fry's 7 Stages of Visualizing Data (2008)
Ben Fry proposed a process model for creating data visualizations. The stages are iterative: you may loop back, re-order them, or skip some entirely in simple projects.
Framework 2 — Andy Kirk's 5-Step Methodology (2012)
This is the primary framework used throughout the course. It is more project-management-oriented than Fry's model.
Visualization Function, Tone, Factors & Users
The first and most critical step in any visualization project. Get this wrong and everything downstream is misaligned.
Clarifying the Purpose: Two Questions
- The reason for existing — What triggered this project? What is its scope and context? How much creative control do you have?
- The intended effect — What should the reader think, feel, or do after seeing this visualization?
Establishing Intent: Visualization Function
Every visualization has one of three primary functions. This is a fundamental classification you must know for the exam:
Explanatory
Goal: Convey a specific narrative to the reader.
What it is: Based around a focused story. You already know what the key finding is, and you design the chart to communicate it clearly.
Examples: A corporate dashboard showing key performance figures; a newspaper infographic explaining the complexities of an economic crisis.
→ More about visual presentation of data.
Exploratory
Goal: Provide an interface for the user to explore the data themselves.
What it is: Lacks a single, predetermined narrative. The user drives the exploration and finds their own insights.
Examples: A scatterplot matrix for multivariate correlation exploration; interactive dashboards with filters, brushing, and sorting.
→ More about visual analysis of data.
Exhibitory
Goal: Express or exhibit data as an aesthetic or emotional experience.
What it is: The intent is removed from a pure desire to inform. Data becomes the raw material for artistic self-expression.
Examples: A visualization of all adjectives in a novel; artistic renderings of city heartbeat data.
→ More about form and aesthetic than information transfer.
Explanatory vs. Exploratory: Detailed Comparison
| Dimension | Explanatory | Exploratory |
|---|---|---|
| Narrative | Based around a specific, focused narrative | Lacks a single specific narrative |
| Focus | Visual presentation of data | Visual analysis of data |
| Designer role | Creates a clear portrayal of interesting stories from the dataset | Builds a tool for users to seek personal discoveries and patterns |
| Finding | One specific finding defined beforehand | Opens up possibility for chance/serendipitous findings |
| Interactivity | Usually static or minimal | Usually highly interactive (filter, sort, brush, zoom) |
Establishing Intent: Visualization Tone
Tone is about the type of stimulus or desired emotional response you are trying to create in your reader. There are two ends of a spectrum:
Pragmatic / Analytical Tone
The reader reacts analytically. They read values, compare numbers, track trends. Emotions stay low — unless the data reveals something alarming.
Example: "We need a chart to help monitor our quarterly sales performance."
→ Think: corporate dashboards, scientific reports, financial charts.
Emotive / Abstract Tone
The goal is a personal, impactful experience. Abstract or artistic visual choices are used to create feeling, not just to transfer data.
Example: "We need to present this in a way that persuades people to care." (Chris Jordan: "I fear we aren't feeling enough to digest these huge numbers.")
→ Think: data journalism, advocacy visualizations, data art.
In emotive/abstract visualizations, you sometimes move beyond bars and straight lines toward curves, circles, and organic shapes. Abstract tone is more about creating an aesthetic that portrays a general sense of the data's story — you might not be able to read exact values, but the visual impression carries the message.
Key Factors Surrounding a Visualization Project
Beyond intent, every project is shaped by real-world constraints. The "8 hats" concept refers to the many roles a DataViz designer must wear:
Understanding the Users
Visualizations are always made for someone. The user context fundamentally changes the design. Know these five common user environments:
| User Context | Characteristics | Design Implications |
|---|---|---|
| Boardroom | Executives, high-stakes decisions, limited time | Simple, fast-reading summaries. Highlight the single most important number. High contrast. |
| One-to-One Exchange | Manager or analyst with a peer | More detail acceptable. Can support conversation and questions. |
| Large Range of Customers | Diverse backgrounds, variable expertise | Must work across knowledge levels. Clear labels, plain language. Avoid jargon. |
| Global Audience | Cross-cultural, multilingual | Mind color meanings (red ≠ danger universally), symbols, language, numeric formats. |
| Personal / Self | You are the only audience | Function over form. Quick EDA charts. No need for polished presentation. |
User-Centered Design (UCD)
Good visualization design starts with understanding the user. The four UCD tools you should know:
Physical & Cognitive Characteristics of the User
Physical Capabilities
- Color perception: ~8% of men have color vision deficiency. Never rely on color alone to encode information.
- Ergonomics: Screen size, viewing distance, and input device (mouse vs. touch) affect usability.
- Visual contrast: Low contrast is problematic for older users and those with vision impairments.
Cognitive Characteristics
- Attention & memory: Working memory is limited. A cluttered chart forces the user to use cognitive resources on navigation rather than insight.
- Recognition over recall: Users recognize familiar patterns faster than they recall abstract information. Use conventions.
- Cognitive biases: Anchoring bias (first number seen anchors all other comparisons), confirmation bias (users seek evidence supporting existing beliefs).
- Change blindness: Significant visual changes in dynamic visualizations can go unnoticed if not properly highlighted.
User Research Methods
How do you find out what your users need? User research methods are mapped across two dimensions:
Attitudinal vs. Behavioral
Attitudinal: What people say (surveys, interviews). Useful for stated preferences and opinions.
Behavioral: What people do (usability testing, analytics). Reveals actual behavior, which often differs from stated preferences.
Qualitative vs. Quantitative
Qualitative: More effective at revealing why — deep insights from small samples (interviews, usability sessions).
Quantitative: Shows what is happening and how much — statistical patterns from large samples (surveys, A/B tests, analytics).
Collaboration & Communication Contexts
Visualizations are often used in shared, multi-user settings:
- Synchronous Communication: Real-time collaboration — live dashboards in meetings, conferencing, online games. Design for simultaneous group viewing.
- Asynchronous Communication: Reports, email, social media — users interact at different times. Design must be fully self-explanatory without a presenter.
Data Preparation: Editorial Focus & the 6 Mechanisms
Data preparation is typically the most time-consuming and intensive activity in any visualization project. Get it right — everything downstream depends on it.
A. Editorial Focus
Editorial focus is the story you want to tell through the visualization — the main narrative or message you want to emphasize to the reader. It determines the direction and goal of the visualization, not just what data to display.
"What topic or question do I want readers to have answered after seeing this visualization?"
Why Do You Need Editorial Focus?
- Ensures clarity — guarantees the visualization communicates a clear message.
- Guides design decisions — determines what to emphasize and what to omit.
- Prevents information overload — stops you from adding "everything" to a single chart.
- Delivers the right insight — helps surface the finding that actually matters.
Without editorial focus: Show all products, all regions, all metrics in one chart → information overload.
With editorial focus (goal: show sales decline after 2023): Show only total sales per year, highlight 2023–2024, add annotation explaining the cause.
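A minimal sketch of what the "with editorial focus" version means for the data: collapse every region and product into one total per year and flag the years to highlight. The records, field order, and numbers below are invented for illustration only.

```python
from collections import defaultdict

# Hypothetical raw records: (year, region, product, sales)
records = [
    (2022, "North", "A", 120), (2022, "South", "B", 80),
    (2023, "North", "A", 130), (2023, "South", "B", 90),
    (2024, "North", "A", 100), (2024, "South", "B", 60),
]


def total_sales_per_year(rows):
    """Drop region and product detail; keep one total per year."""
    totals = defaultdict(int)
    for year, _region, _product, sales in rows:
        totals[year] += sales
    return dict(sorted(totals.items()))


totals = total_sales_per_year(records)

# Years the story is about; a chart would draw these in an accent color
highlight_years = {2023, 2024}
chart_rows = [
    {"year": year, "total": total, "highlight": year in highlight_years}
    for year, total in totals.items()
]
print(chart_rows)
```

The annotation explaining the cause still has to be written by the designer; the code only prepares the reduced dataset.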
The most influential data visualizations in history — from the New York Times, The Guardian, National Geographic — succeed largely because of strong editorial focus. They do not dump data; they tell a specific, focused story.
B. Preparing & Familiarizing with Data
Data is the primary raw material. Without good data, there is no compelling story to tell. A strong visualization always starts from strong data. Datasets with errors or missing values do not just slow down analysis — they can corrupt the message you are trying to deliver.
The 6 Mechanisms of Data Preparation
⚠️ Ethical Concerns in Acquisition: Ensure data is (1) obtained ethically and responsibly; (2) legally compliant with relevant regulations; (3) respects privacy and confidentiality of sensitive data; (4) used according to its license — especially if publishing or monetizing.
Completeness — Is it all there?
- Does it have all the categories needed?
- Does it cover the full time period needed?
- Are all expected fields/variables present?
- Does it contain the expected number of records?
Quality — Is it clean?
- Are there errors or incorrect values?
- Unexplained classifications or coding conventions?
- Formatting issues (unusual dates, weird ASCII characters)?
- Missing items or incomplete records?
- Duplicate rows?
- Accuracy issues — does the data appear plausible?
- Unusual values or obvious outliers that need investigation?
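Several of the completeness and quality questions above can be asked programmatically before any chart is drawn. The following is a rough sketch under assumed inputs: the schema in `EXPECTED_FIELDS`, the sample rows, and the `examine` helper are all hypothetical.

```python
EXPECTED_FIELDS = {"id", "date", "category", "value"}  # hypothetical schema

rows = [
    {"id": 1, "date": "2024-01-05", "category": "A", "value": 10.0},
    {"id": 2, "date": "2024-01-06", "category": "B", "value": None},
    {"id": 2, "date": "2024-01-06", "category": "B", "value": None},
    {"id": 3, "date": "2024-01-07", "category": "A"},
]


def examine(rows, expected_fields, expected_count=None):
    """Run basic completeness and quality checks; return a report dict."""
    report = {
        "missing_fields": [],   # (row index, fields absent from the row)
        "missing_values": [],   # (row index, fields present but empty)
        "duplicate_rows": [],   # indexes of rows identical to an earlier row
        "count_ok": None,       # stays None when no expected count is given
    }
    seen = set()
    for i, row in enumerate(rows):
        absent = sorted(expected_fields - set(row))
        if absent:
            report["missing_fields"].append((i, absent))
        empty = sorted(k for k, v in row.items() if v is None or v == "")
        if empty:
            report["missing_values"].append((i, empty))
        fingerprint = frozenset(row.items())
        if fingerprint in seen:
            report["duplicate_rows"].append(i)
        seen.add(fingerprint)
    if expected_count is not None:
        report["count_ok"] = len(rows) == expected_count
    return report


report = examine(rows, EXPECTED_FIELDS, expected_count=3)
print(report)
```

Plausibility and outlier checks are left out here because they depend on domain knowledge about what values are reasonable.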
Transformation: Is it in the right shape for analysis?
- Parsing: Split up variables (e.g., extract year from a full date string).
- Merging: Combine variables into new ones (e.g., first name + surname → full name).
- Converting: Turn qualitative/free-text data into coded values or keywords.
- Deriving: Create new values from existing ones (e.g., derive gender from title, sentiment score from text).
- Calculating: Create new metrics (e.g., percentage proportions, ratios, moving averages).
- Removing redundancy: Drop variables you have no planned use for in the visualization.
- Determining resolution: Decide how granular to show the data (see Resolution Options below).
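A few of these transformations are short enough to sketch directly. The helper names and sample inputs below are illustrative assumptions; the date format is assumed to be ISO `YYYY-MM-DD`.

```python
def parse_year(date_string):
    """Parsing: extract the year from an ISO 'YYYY-MM-DD' date string."""
    return int(date_string.split("-")[0])


def full_name(first, surname):
    """Merging: combine two variables into a new one."""
    return f"{first} {surname}"


def percentages(values):
    """Calculating: each value as a percentage of the total."""
    total = sum(values)
    return [100 * v / total for v in values]


def moving_average(values, window):
    """Calculating: mean of each run of `window` consecutive values."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


print(parse_year("2023-07-14"))
print(full_name("Ada", "Lovelace"))
print(percentages([1, 1, 2]))
print(moving_average([1, 2, 3, 4, 5], window=3))
```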
Data Types You Must Know
Understanding your data type is not academic trivia — it determines which chart types are valid and which statistics are meaningful.
| Type | Subtype | Description | Examples |
|---|---|---|---|
| Categorical | Nominal | Named groups with no inherent order. You can count and compare frequencies, but not rank or calculate averages. | Countries, gender, product category, text labels |
| Categorical | Ordinal | Named groups with a meaningful order, but the gaps between levels are not uniform or measurable. | Olympic medals (Gold/Silver/Bronze), Likert scale (Strongly Agree → Strongly Disagree), education level |
| Quantitative | Interval Scale | Numeric values where differences are meaningful, but there is no true zero. Ratios are meaningless. | Temperature in °C or °F, calendar dates (year 0 is arbitrary) |
| Quantitative | Ratio Scale | Numeric values with a true absolute zero. All arithmetic operations are valid. Ratios are meaningful. | Prices, age, distance, weight, speed, count of items |
20°C is not "twice as hot" as 10°C — temperature on the Celsius scale has no true zero, so ratios are meaningless. But $20 is twice as much as $10 — money has a true zero. This affects what calculations and visual encodings are appropriate.
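The temperature example can be verified numerically. Converting to Kelvin, a scale that does have a true zero, is a step added for this sketch and is not part of the notes above.

```python
def celsius_to_kelvin(celsius):
    """Convert to Kelvin, which is a ratio scale with a true zero."""
    return celsius + 273.15


# Naive ratio on the interval scale: arithmetically 2.0, but not meaningful
naive_ratio = 20 / 10

# The same two temperatures on a ratio scale differ by only about 3.5%
kelvin_ratio = celsius_to_kelvin(20) / celsius_to_kelvin(10)

# Differences remain meaningful on an interval scale
difference = 20 - 10

print(naive_ratio, round(kelvin_ratio, 4), difference)
```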
Resolution Options — At What Level of Detail?
One of the most important decisions in data preparation is choosing the level of resolution at which to present the data. Showing too much detail creates visual noise; too little hides important patterns.
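Changing resolution usually means aggregating the same records at a coarser level. A minimal sketch, assuming daily values keyed by ISO date strings (the data and the `aggregate` helper are hypothetical):

```python
from collections import defaultdict

# Hypothetical daily values keyed by 'YYYY-MM-DD'
daily = {
    "2024-01-30": 5,
    "2024-01-31": 7,
    "2024-02-01": 4,
    "2024-02-02": 6,
}


def aggregate(values_by_day, level):
    """Sum daily values at a coarser resolution: 'day', 'month', or 'year'."""
    # Number of leading characters of the ISO date that identify each level
    key_length = {"day": 10, "month": 7, "year": 4}[level]
    totals = defaultdict(int)
    for day, value in values_by_day.items():
        totals[day[:key_length]] += value
    return dict(totals)


monthly = aggregate(daily, "month")
yearly = aggregate(daily, "year")
print(monthly, yearly)
```

Four daily points become two monthly points or one yearly point: less noise, but also less visible day-to-day variation.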
Complete Term Reference
Every key term from the lectures defined clearly. Great for last-minute review.
Two Dimensions of Visualization Design
Step 4 of Kirk's methodology — Design Concepting — is where all earlier work (purpose, data prep, questions) gets translated into a visual artifact. Every design decision lives in one of two dimensions.
Where Does This Fit?
Kirk's 5-step methodology: Purpose & Parameters → Prepare & Explore Data → Formulate Questions → Design Concepting (you are here) → Construct & Launch.
Design Concepting asks: How do we give form to our data? The answer lives across two dimensions:
Representation
Think: What kind of chart? What shape encodes the data?
Covered by the 5 representation methods and all the chart types within them.
Presentation
Think: How does the whole thing look, feel, and behave?
Covered by color, interactivity, annotation, and architecture.
Representation is decided in two steps:
1. Choose the correct visualization method (which of the 5 categories does your data need?)
2. Choose the appropriate chart type within that method (which specific chart best fits the data and purpose?)
Five Categories of Visualization
Every chart type covered in these notes belongs to one of these five categories. Knowing which category your data falls into is the first decision in representation.
| # | Method | What It Does | Classic Example |
|---|---|---|---|
| 1 | Comparing Categorical Values | Facilitate comparisons between the relative and absolute sizes of categorical values. | Bar Chart |
| 2 | Assessing Hierarchies & Part-of-a-Whole | Show a breakdown of categorical values in relationship to a population, or as elements of hierarchical structures. | Pie Chart |
| 3 | Showing Changes over Time | Exploit temporal data to show changing trends and patterns of values over a continuous time frame. | Line Chart |
| 4 | Plotting Connections & Relationships | Assess associations, distributions, and patterns between multivariate datasets. Usually facilitates exploratory analysis. | Scatter Plot |
| 5 | Mapping Geo-Spatial Data | Plot and present datasets with geo-spatial properties. | Choropleth Map |
Comparing Categorical Values
Charts in this group allow you to compare the size, frequency, or magnitude of distinct categories against each other.
| Chart Type | Data Variables | Visual Variables | What You Get / When to Use |
|---|---|---|---|
| Bar / Column Chart | 1 categorical + 1 quantitative | Height / Length, Position | The workhorse of comparison. Compares magnitudes across discrete categories. Bars = horizontal, Columns = vertical. |
| Floating Bar (Gantt Chart) | 1 categorical-nominal + 2 quantitative | Position, Length | Shows a range of quantitative values per category (bar stretches from min to max, not from zero). Reveals variation, overlap, and outliers across categories. |
| Pixelated Bar Chart | Multiple categorical + 1 quantitative | Height, Color-hue, Symbol | Two levels of resolution in one: global bar chart view (aggregate) + detail view inside each bar (pixels/symbols). Usually interactive — hover a pixel for precise detail. |
| Histogram | 1 quantitative-interval + 1 quantitative-ratio | Height, Width | Shows frequency distribution of a continuous quantitative variable over binned intervals. Key difference from bar chart: no gaps between bars; used for continuous data, not categorical. |
| Slopegraph (Bumps / Table Chart) | 1 categorical + 2 quantitative | Position, Connection, Color-hue | Compares two (or more) quantitative values linked to the same categories. Perfect for before–after or two-point-in-time comparisons. The slope direction and steepness encode change. |
| Radial / Circular Bar Chart | Multiple categorical + 1 categorical-ordinal | Position, Color-hue, Color-saturation, Texture | Displays changes over time (each ring = time period), proportional comparisons, and multi-category compositions. Good for overview and pattern detection; not for reading precise values. |
| Glyph Chart | Multiple categorical + multiple quantitative | Shape, Size, Position, Color-hue | Uses a repeated shape (e.g., a flower) where each part encodes a variable. Not for precise reading — for relative comparisons (big, medium, small). Usually interactive for exploration. |
| Sankey Diagram | Multiple categorical + multiple quantitative | Height, Position, Link, Width, Color-hue | Shows flow — how quantities move from one stage to another through connecting ribbons. Ribbon width = magnitude of flow. Best for multi-stage processes or transformations. |
| Area Size Chart (Bubble / Circle) | 1 categorical + 1 quantitative-ratio | Area, Color-hue | Uses circle area to represent magnitude. Often used to emphasize stark inequality between categories. Area is less accurately perceived than length — use with caution for precise comparison. |
| Small Multiples (Trellis Chart) | Multiple categorical + multiple quantitative | Position + any visual variable | A grid of small identical charts, each showing one subset of the data. Exploits the eye's ability to quickly scan and compare many similar charts simultaneously. Best for many categories or time-series comparisons. |
| Word Cloud | 1 categorical + 1 quantitative-ratio | Size | Font size encodes word frequency. Color is usually decorative only. Good for early exploratory text analysis to find key terms — not for precise frequency comparison. Requires good text preprocessing. |
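The histogram row above depends on binning a continuous variable into intervals. A minimal sketch of equal-width binning follows; equal-width bins and the sample values are assumptions for this example, and other binning rules exist.

```python
def histogram(values, bin_count):
    """Count values in `bin_count` equal-width bins; return (edges, counts)."""
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("all values are identical; cannot form bins")
    width = (high - low) / bin_count
    counts = [0] * bin_count
    for v in values:
        # The maximum value would land one past the end, so clamp it
        index = min(int((v - low) / width), bin_count - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(bin_count + 1)]
    return edges, counts


# Hypothetical measurements
edges, counts = histogram([1, 2, 2, 3, 5, 8, 9, 10], bin_count=3)
print(edges, counts)
```

Because adjacent bins share an edge, the bars are drawn without gaps, which is the visual difference from a bar chart noted in the table.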
Assessing Hierarchies & Part-of-a-Whole
These charts show how a total is broken down into constituent parts, or how elements are nested within larger structural hierarchies.
| Chart Type | Data Variables | Visual Variables | What You Get / Notes |
|---|---|---|---|
| Pie Chart | 1 categorical + 1 quantitative-ratio | Angle, Area, Color-hue | Often criticized because angles and areas are harder to compare accurately than length or position. Problems arise from misuse: too many categories, 3D effects, disorganized slices. Best practice: max 3 categories, start first slice at vertical, arrange logically. |
| Stacked Bar Chart | 2 categorical + 1 quantitative-ratio | Length, Color-hue, Position, Color-saturation | Shows composition of categories using color + position. Can use absolute or normalized values. Weakness: inner segments are hard to compare accurately because they lack a shared baseline. Use ordinal ordering for ordinal data (e.g., sentiment: disagree → agree). |
| Square Pie / Waffle Chart / Unit Chart | 1 categorical + 1 quantitative-ratio | Position, Color-hue / Symbol | More accurate than pie/donut because it uses grid areas (e.g., 100 squares = 100%). Small parts remain visible and distinguishable. Stays clean even with multiple categories. Good for percentage-based narratives. |
| Treemap | Multiple categorical-nominal + 1 quantitative-ratio | Area, Position, Color-hue, Color-saturation | Nested rectangles where area encodes magnitude. Great for showing large hierarchical datasets in a compact space. Color can encode a second dimension (e.g., growth rate). Area comparison is less precise than length. |
| Circle Packing Diagram | 2 categorical + 1 quantitative-ratio | Area, Color-hue, Position | Many circles packed inside a large circle. Each circle = a category; size = quantitative value; color/position = hierarchy or grouping. Not for precise reading — for seeing relative scale and groupings. |
| Bubble Hierarchy | Multiple categorical + 1 quantitative-ratio | Area, Position, Color-hue | Similar to circle packing but with a more explicit hierarchical arrangement. Bubbles of different sizes grouped by category. |
| Tree Hierarchy (Dendrogram) | 2 categorical + 1 quantitative-ratio | Angle/Area, Position, Color-hue | A tree-shaped diagram showing parent–child relationships. The branching structure makes hierarchical relationships and depth immediately visible. |
Start the first slice at the 12 o'clock (vertical) position as a reference point. Ideally, limit the chart to three categories and order the segments logically. Avoid 3D effects, too many colors, and decorations. If you have more than three categories, use a bar chart instead.
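These pie chart rules translate into a short calculation of slice angles. In this sketch, "order logically" is interpreted as largest-first, which is only one reasonable choice; the function name and sample data are hypothetical.

```python
def pie_slices(values_by_category, max_categories=3):
    """Return (name, start_deg, end_deg) tuples, clockwise from 12 o'clock."""
    if len(values_by_category) > max_categories:
        raise ValueError("too many categories for a pie; consider a bar chart")
    total = sum(values_by_category.values())
    ordered = sorted(values_by_category.items(), key=lambda kv: kv[1], reverse=True)
    slices = []
    start = 0.0  # 0 degrees is the 12 o'clock position
    for name, value in ordered:
        sweep = 360.0 * value / total
        slices.append((name, start, start + sweep))
        start += sweep
    return slices


# Hypothetical survey result
slices = pie_slices({"No": 30, "Yes": 50, "Undecided": 20})
print(slices)
```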
Showing Changes over Time
Temporal charts exploit time as the primary axis, revealing trends, cycles, and changes in value across a continuous time frame.
| Chart Type | Data Variables | Visual Variables | What You Get / Notes |
|---|---|---|---|
| Line Chart | 1 quant-interval (time) + 1 quant-ratio + 1 categorical | Position, Slope, Color-hue | The fundamental temporal chart. Slope encodes rate of change. Multiple lines can compare categories over time. Y-axis does not need to start at 0 for line charts. |
| Sparklines | 1 quant-interval + 1 quant-ratio | Position, Slope | Line charts in miniature — Edward Tufte's "intense, word-sized graphics." Not a new chart type, just a very small line chart. Ideal for embedding trend context inside tables or dashboards where space is precious. |
| Area Chart | 1 quant-interval + 1 categorical + 1 quant-ratio | Height, Slope, Area, Color-hue | Like a line chart but with the area below the line filled. The filled area emphasizes cumulative volume. Important: the Y-axis must start at zero, because the area encoding (unlike line slope) requires a true baseline for accurate interpretation. |
| Horizon Chart | 1 quant-interval + 1 categorical + 2 quant-ratio | Height, Slope, Area, Color-hue, Color-saturation | A modified area chart that folds negative values upward and uses color to distinguish positive/negative. Allows many time series to be stacked vertically in very little space, enabling cross-series pattern comparison. |
| Stacked Area Chart | 1 quant-interval + 1 categorical + 1 quant-ratio | Height, Area, Color-hue | Multiple area charts stacked on top of each other. Shows how the composition of categories changes over time. Weakness: middle bands are hard to read accurately because they lack a shared baseline. |
| Stream Graph | 1 quant-interval + 1 categorical + 1 quant-ratio | Height, Area, Color-hue | Like a stacked area chart but without a baseline — layers flow organically around a central axis. Emphasizes peaks and troughs ("ebb and flow") over time. Not for reading precise values. Aesthetic and organic feel. |
| Candlestick Chart | 1 quant-interval + 4 quant-ratio | Position, Height, Color-hue | Used in financial data. Shows OHLC (Open, High, Low, Close) for each time period. Bar height = range from open to close; color = price up or down; wicks = high/low range. Conceptually similar to a boxplot. |
| Barcode Chart | 1 quant-interval + 3 categorical | Position, Symbol, Color-hue | A very compact visualization of event sequences over time using symbols and color. Similar space-efficiency to sparklines. Requires some familiarity to read, but packs a rich story in minimal space. |
| Flow Map | Multiple quant-interval + 1 categorical + 1 quant-ratio | Position, Height/Width, Color-hue | Like a Sankey diagram, but for change over time and/or location. Famous example: Napoleon's 1812 Russian campaign (Minard's map) where ribbon width = troops remaining. Geo-positions are roughly followed but the map is not fully detailed. |
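The candlestick row above describes a fixed mapping from four numbers to a body, two wicks, and a color. A rough sketch of that mapping, with made-up prices and hypothetical helper names:

```python
def ohlc(prices):
    """Summarize one period's ordered prices as (open, high, low, close)."""
    return prices[0], max(prices), min(prices), prices[-1]


def candlestick(open_, high, low, close):
    """Geometry of one candle: body spans open to close, wicks reach high and low."""
    body_low, body_high = min(open_, close), max(open_, close)
    return {
        "body": (body_low, body_high),
        "upper_wick": (body_high, high),
        "lower_wick": (low, body_low),
        "direction": "up" if close >= open_ else "down",  # drives the color
    }


# Hypothetical prices within one trading period, in time order
period = ohlc([100, 104, 98, 102])
candle = candlestick(*period)
print(period, candle)
```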
A line chart can have a Y-axis that does not start at zero — the slope carries the meaning. An area chart must have its Y-axis start at zero, because the filled area is what readers judge, and a truncated area creates false impressions of magnitude.
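The distortion from a truncated baseline is easy to quantify. The sketch below compares the true ratio of two values with the ratio of their drawn heights when the axis starts above zero; the values and the `apparent_ratio` helper are illustrative assumptions.

```python
def apparent_ratio(value_a, value_b, baseline=0.0):
    """Ratio of drawn heights (b relative to a) when the axis starts at baseline."""
    if value_a <= baseline:
        raise ValueError("value_a must lie above the baseline")
    return (value_b - baseline) / (value_a - baseline)


# Hypothetical values: b is 10% larger than a
true_ratio = apparent_ratio(100, 110)                     # axis starts at 0
truncated_ratio = apparent_ratio(100, 110, baseline=90)   # axis starts at 90

print(true_ratio, truncated_ratio)
```

With the axis starting at 90, a 10% difference is drawn as a filled shape twice as tall, which is why area (and bar) encodings need a zero baseline.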
Plotting Connections & Relationships
These charts assess associations, distributions, and patterns between variables — and they usually serve exploratory analysis rather than explanatory storytelling.
| Chart Type | Data Variables | Visual Variables | What You Get / Notes |
|---|---|---|---|
| Scatter Plot | 2 quantitative | Position, Color-hue | The most fundamental relationship chart. Reveals correlations, clustering, and outliers between two continuous variables. X and Y positions encode the two variables. |
| Bubble Plot | 3 quantitative + 1 categorical | Position, Area, Color-hue | A scatter plot extended with a third dimension: bubble area encodes a third quantitative variable; color encodes a category. More information in one chart — but area comparison is less accurate than position. |
| Scatter Plot Matrix | 2 quantitative + 2 categorical | Position, Area, Color-hue | A grid of scatter plots showing every pairwise variable combination simultaneously. Like small multiples applied to correlation analysis. Excellent for multivariate datasets — lets the eye quickly scan across many variable pairings to spot strong/weak relationships. |
| Heatmap / Matrix Chart | Multiple categorical + 1 quantitative-ratio | Position, Color-saturation | A matrix where cells are colored by value. Like small multiples using color as the visual variable. Fast visual scanning for patterns, ordering, and hierarchy across category combinations. Good for correlation matrices, calendar heat, and confusion matrices. |
| Parallel Sets / Parallel Coordinates | Multiple categorical + multiple quant-ratio | Position, Width, Link, Color-hue | Multiple parallel axes, each representing a variable. Each data item is drawn as a polyline crossing all axes. Reveals multi-variable relationships, patterns, and consistency. Functionally similar to Sankey — both show connections across categories. |
| Chord / Radial Network Diagram | Multiple categorical + 2 quant-ratio | Position, Connection, Width, Color-hue, Color-lightness, Symbol, Size | A circular layout where connecting ribbons/chords between categories show the strength and direction of their relationships. Used for complex, bidirectional relationships. Not constrained by X/Y axes. |
| Network Diagram | Multiple categorical-nominal + 1 quant-ratio | Position, Connection, Area, Color-hue | Nodes (entities) connected by edges (relationships). Reveals clusters, sparse connections, dominant nodes, and structural patterns. Often visually complex and "hairball-like" for large datasets — requires careful layout algorithms. |
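For the bubble plot in the table above, the quantitative variable is encoded by bubble area, so the radius has to grow with the square root of the value. A minimal sketch, assuming a hypothetical helper `bubble_radius` and an arbitrary maximum radius of 40 units:

```python
import math


def bubble_radius(value, max_value, max_radius=40.0):
    """Radius chosen so that bubble AREA (not radius) is proportional to value."""
    if value < 0 or max_value <= 0:
        raise ValueError("value must be >= 0 and max_value must be > 0")
    return max_radius * math.sqrt(value / max_value)


r_full = bubble_radius(100, 100)     # largest bubble
r_quarter = bubble_radius(25, 100)   # a quarter of the value

area_ratio = (r_quarter / r_full) ** 2
print(f"radius: {r_quarter:.1f}, area relative to largest bubble: {area_ratio:.2f}")
```

Scaling the radius directly by the value would instead make a bubble with a quarter of the value look one sixteenth the size, which exaggerates differences.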
Mapping Geo-Spatial Data
When your data has a geographic dimension, maps place it in spatial context. Geographic position itself becomes a visual variable.
| Chart Type | Data Variables | Visual Variables | What You Get / Notes |
|---|---|---|---|
| Choropleth Map | 2 quant-interval + 1 quant-ratio | Position, Color-saturation/lightness | Geographic regions (countries, provinces) colored by quantitative value using a gradient (light → dark). Popular but has a critical weakness: larger regions visually dominate even if they have smaller populations, creating potential distortion. |
| Dot Plot Map | 2 quant-interval | Position | Each data point is placed at its geographic coordinates as a dot. Simple and honest — each dot = one occurrence. Dense clusters emerge naturally without distortion from region size. |
| Bubble Plot Map | 2 quant-interval + 1 quant-ratio + 1 categorical-nominal | Position, Area, Color-hue | Combines a map with a bubble plot: dots at geographic coordinates scaled by quantitative value, colored by category. Shows "how much" per location simultaneously. |
| Isarithmic Map (Contour / Isoline) | Multiple quantitative + multiple categorical | Position, Color-hue, Color-saturation, Color-darkness | Uses contour lines or color gradients to show continuous values over geographic space (like elevation, temperature, rainfall). Familiar from weather maps and topographic maps. |
| Particle Flow Map | Multiple quantitative | Position, Direction, Thickness, Speed | Animated particles or arrows that flow across the map to encode direction, magnitude, and movement — e.g., wind patterns, ocean currents. Direction and speed are the key encodings. |
| Cartogram | 2 quant-interval + 1 quant-ratio | Position, Size | A distorted map where each region's size is proportional to a variable (e.g., population or GDP) rather than its true geographic area. Corrects the choropleth's size-dominance problem — but geographic shape is distorted. |
| Dorling Cartogram | 2 categorical + 1 quant-ratio | Position, Size, Color-hue | A cartogram variant in which each region is replaced by a circle sized by value. Geographic shapes are dropped rather than distorted, and each circle is positioned approximately where its region lies. Clean and readable. |
| Connection Map | 2 quant-interval + 1 categorical-nominal | Position, Link, Color-hue | Lines drawn between geographic locations to show connections or flows between places (e.g., flight routes, migration, trade). Line weight and color can encode quantity or category. |
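The size-dominance weakness of the choropleth, and the correction a cartogram applies, can be seen by comparing each region's share of map area with its share of the variable being shown. The figures below are invented purely for illustration, and `shares` is a hypothetical helper.

```python
# Hypothetical figures for illustration only.
regions = {
    "Region A": {"area_km2": 900_000, "population": 1_000_000},
    "Region B": {"area_km2": 100_000, "population": 9_000_000},
}


def shares(regions, key):
    """Each region's fraction of the total for the given key."""
    total = sum(r[key] for r in regions.values())
    return {name: r[key] / total for name, r in regions.items()}


area_share = shares(regions, "area_km2")    # what a choropleth gives to the eye
pop_share = shares(regions, "population")   # what a cartogram would draw instead

for name in regions:
    print(f"{name}: {area_share[name]:.0%} of map area, {pop_share[name]:.0%} of population")
```

On a choropleth, Region A fills most of the map while holding a small share of the population. A cartogram sized by population would reverse that impression.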
Three Criteria for Chart Selection
Once you know which representation method applies, use these three criteria to select the specific chart type within that method.
1. Accommodate the physical properties of the data
The chart must match what your data actually is. Ask: How many data variables do I have? Are they categorical or quantitative? Are they nominal, ordinal, interval, or ratio? The chart's requirements (listed in every chart's "Data Variables" above) must align with your dataset's structure.
Example: If you have 1 categorical + 1 quantitative → bar chart works. Add a second categorical dimension → grouped or stacked bar chart. Add a quantitative second variable → consider a dot plot or scatter plot.
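The example above can be written as a small lookup. This is only a sketch of those three cases: `suggest_chart` is a hypothetical function and is nowhere near a complete rule set.

```python
def suggest_chart(n_categorical, n_quantitative):
    """Rough lookup mirroring the three cases in the example above."""
    suggestions = {
        (1, 1): "bar chart",
        (2, 1): "grouped or stacked bar chart",
        (1, 2): "dot plot or scatter plot",
    }
    # Anything else falls back to the chart tables earlier in the notes.
    return suggestions.get((n_categorical, n_quantitative), "no suggestion: check the chart tables")


print(suggest_chart(1, 1))
print(suggest_chart(2, 1))
print(suggest_chart(1, 2))
```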
2. Facilitate the desired degree of accuracy
Different visual variables allow different levels of perceptual accuracy. If the reader needs to compare precise values, choose a chart that uses position or length (most accurate). If an impressionistic sense of scale is sufficient, area or color can be used.
→ This connects directly to the Visual Variable Ranking (Part 9 below).
3. Create an appropriate metaphor
Integrate a visual quality that conveys a deeper connection between the data, the design, and the topic. This is the most subjective criterion and requires design instinct and experience. A well-chosen metaphor makes the chart feel like it belongs to its subject.
Example: Using a flow-like stream graph to show "ebb and flow" of musical trends feels more appropriate than a stacked bar chart, even if both technically convey the same data.
"The key is not to set out to achieve an attractive and attention-grabbing work — let those qualities emerge as a by-product of good design. Focus instead on delivering the appropriate functional elements by employing the most suitable data representation."
What Are Visual Variables? How Accurately Can We Read Them?
Visual variables are the specific visual forms we assign to data to represent it. Understanding which ones humans perceive most accurately is crucial for choosing the right chart.
Definition: Visual Variable
A visual variable is the specific form we assign to data in order to represent it visually. Examples include:
- The length or height of a bar
- The position of a point on an axis
- The color (hue or saturation) of a region on a map
- The area of a bubble
- The connection between two nodes in a network
- The slope of a line between two points
Mackinlay's Perceptual Accuracy Ranking (1986)
Not all visual variables are perceived with equal accuracy. Mackinlay's ranking tells us which encodings allow the most precise comparisons, and which should be used only for general impressions.
Position > Length > Angle > Area > Color in terms of perceptual accuracy. This is why bar charts (position + length) are generally more accurate than pie charts (angle + area), and why using color alone to encode quantitative values is the least effective approach.
Visual Variable Example (from lecture)
In a scatter plot of films: X-axis position = profit, Y-axis position = review score, circle area = budget, circle color (hue) = genre. Notice how the most important variable (profit) uses the most accurate encoding (position), while less critical variables use less precise ones (area, color).
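The film example can be reproduced by pairing variables, in order of importance, with encodings in order of perceptual accuracy. A minimal sketch: the ranking list follows the order given in the summary of these notes, while `more_accurate` and `assign_encodings` are hypothetical helpers.

```python
# Perceptual accuracy ranking as listed in these notes, most accurate first.
ACCURACY_RANKING = [
    "position", "length", "angle", "area",
    "volume", "color saturation", "color hue", "shape",
]


def more_accurate(a, b):
    """Return whichever of two encodings ranks higher for accuracy."""
    return a if ACCURACY_RANKING.index(a) <= ACCURACY_RANKING.index(b) else b


def assign_encodings(variables_by_importance, available_encodings):
    """Give the most important variable the most accurate available encoding."""
    ranked = sorted(available_encodings, key=ACCURACY_RANKING.index)
    return dict(zip(variables_by_importance, ranked))


film_chart = assign_encodings(
    ["profit", "review score", "budget", "genre"],   # most important first
    ["color hue", "area", "position", "position"],   # X and Y are both position
)
print(film_chart)
print(more_accurate("angle", "length"))
```

The ranking concerns accuracy for reading quantities. Genre is categorical, so landing on color hue suits it for a second reason: hue is good at separating categories.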
The Use of Color in Data Visualization
Color is the most powerful — and most misused — tool in data visualization. It most efficiently taps into the pre-attentive processing of the visual system, but must be used carefully and with a clear purpose.
Two Key Rules of Color Use
Rule 1 — Use It Unobtrusively
Color should not mislead by implying a representation (category distinction, quantity ordering) when no such representation was intended. Using random colors just for decoration creates false visual groupings in the reader's mind.
Rule 2 — Strive for Elegance, Not Novelty
The sensible objective is elegance over attractiveness. A restrained, purposeful color palette almost always beats a flashy rainbow. As with all design layers, every color choice must serve a function.
Three Purposes of Color
Color to Represent Quantitative Data
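Color hue (red, blue, green…) has no inherent hierarchy or order of magnitude in human perception. You cannot tell which of red, blue, or green is "bigger" just from the hue alone. Therefore, quantitative values are better carried by a scheme with a built-in order: sequential (a lightness gradient), diverging (two hues around a midpoint), or traffic light (red/amber/green, with blue substituted for green for accessibility).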
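Because hue has no built-in order, a sequential scheme holds the hue fixed and varies lightness instead. A minimal sketch using Python's standard `colorsys` module; the function name, the default hue, and the lightness limits are assumptions chosen for illustration.

```python
import colorsys


def sequential_color(value, vmin, vmax, hue=0.6, light=0.9, dark=0.25):
    """Map a value to a hex color: one fixed hue, lighter for low, darker for high."""
    if vmax == vmin:
        raise ValueError("vmin and vmax must differ")
    t = (value - vmin) / (vmax - vmin)
    t = min(1.0, max(0.0, t))               # clamp values outside the range
    lightness = light + t * (dark - light)  # interpolate from light to dark
    r, g, b = colorsys.hls_to_rgb(hue, lightness, 0.6)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))


for v in (0, 50, 100):
    print(v, sequential_color(v, 0, 100))
```

A diverging scheme could be sketched the same way by using one hue below the midpoint and a second hue above it.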
Color for Categorical Variables
- Color hue is a particularly strong aid for distinguishing categorical variables — it triggers pre-attentive processing instantly.
- Use a maximum of 12 different hues for category distinction. Beyond that, the palette becomes confusing and categories become hard to tell apart.
- Be sensitive to cultural color meanings — colors carry different symbolic associations across regions (e.g., white = mourning in some Asian cultures; red = luck in China but danger in the West).
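When a dataset has more categories than the 12-hue limit allows, one common workaround is to keep the largest categories and merge the rest into a single group. That workaround is an assumption here, not something the lecture prescribes, and `cap_categories` is a hypothetical helper.

```python
from collections import Counter

MAX_HUES = 12  # the limit stated in these notes


def cap_categories(labels, max_hues=MAX_HUES, other="Other"):
    """Keep the most frequent categories; merge the remainder into one `other` group."""
    counts = Counter(labels)
    if len(counts) <= max_hues:
        return list(labels)
    # Reserve one hue for the merged group.
    keep = {name for name, _ in counts.most_common(max_hues - 1)}
    return [label if label in keep else other for label in labels]


# Hypothetical data: 20 categories, c0 the most frequent and c19 the least.
labels = [f"c{i}" for i in range(20) for _ in range(20 - i)]
capped = cap_categories(labels)
print(len(set(labels)), "categories reduced to", len(set(capped)))
```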
Color for Contrast: Foreground vs. Background
Some color combinations are inherently low-contrast and should be avoided:
- Blue on black — many people struggle to discriminate.
- Yellow on white — nearly invisible for most viewers.
- Always check your chosen palette with a color blindness simulator (e.g., Vischeck at vischeck.com) to test perceptibility for users with color vision deficiencies.
Approximately 10% of males have red-green color deficiency. If your visualization relies on red vs. green to communicate a distinction, it fails for 1 in 10 male readers. Use a color blindness simulator during design, and always include a secondary encoding (shape, pattern, label) alongside color.
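The low-contrast pairs listed above (blue on black, yellow on white) can also be checked numerically. The sketch below uses the WCAG contrast-ratio formula, which is an outside standard and not part of the lecture. The ratio runs from 1 (identical colors) to 21 (black on white), and WCAG asks for at least 4.5 for normal body text.

```python
def _linear(channel):
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg):
    """WCAG contrast ratio between two (R, G, B) colors."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


BLACK, WHITE = (0, 0, 0), (255, 255, 255)
BLUE, YELLOW = (0, 0, 255), (255, 255, 0)

print(f"blue on black:   {contrast_ratio(BLUE, BLACK):.2f}")
print(f"yellow on white: {contrast_ratio(YELLOW, WHITE):.2f}")
print(f"black on white:  {contrast_ratio(BLACK, WHITE):.2f}")
```

Both problem pairs score far below 4.5. Contrast ratio says nothing about color vision deficiency, so a simulator check is still needed.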
The Potential of Interactive Features
Interactivity transforms a static visualization into a tool. The four categories of interactive features each serve a different user need.
Manipulate Variables & Parameters
- Select: choose a specific data item or category
- Filter: include/exclude based on criteria
- Exclude: remove unwanted categories
- Modify variables: change which variables are shown on each axis
- Grouping & sorting: reorganize the data by different dimensions
- Brushing: click-drag to highlight a set of data marks across linked views
Adjust the View
- Vertical exploration: drill down through hierarchical layers of detail (e.g., country → region → city)
- Horizontal tabs/panels: switch between different views or cuts of the data
- Pan & zoom: navigate spatial or temporal data at different scales
Annotated Details
- Reveals actual data values on demand (avoid cluttering the main view with all labels)
- Provides extra detail about a specific data point, category, or event
- Supports: titles, introductions, user guides, visual annotation, legends/keys, units, data sources, and attribution
Animation
- Play / Pause / Reset controls for time-series playback
- Manually controllable time sliders: let users scrub through time at their own pace
- Chapter navigation: skip to key milestones in a narrative visualization
- Use sparingly: animation should explain, not just impress
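Filtering and brushing, from the list above, come down to simple selections over the data behind the marks. A minimal sketch with invented records: `brush` and `filter_by` are hypothetical helpers, and a real tool would wire them to mouse events.

```python
# Hypothetical records behind the marks of two linked views.
points = [
    {"id": 1, "x": 2.0, "y": 3.0, "category": "A"},
    {"id": 2, "x": 4.0, "y": 8.0, "category": "B"},
    {"id": 3, "x": 5.0, "y": 5.0, "category": "A"},
    {"id": 4, "x": 9.0, "y": 1.0, "category": "B"},
]


def brush(points, x_range, y_range):
    """Ids of the points inside a rectangular click-drag selection."""
    (x0, x1), (y0, y1) = sorted(x_range), sorted(y_range)
    return {p["id"] for p in points if x0 <= p["x"] <= x1 and y0 <= p["y"] <= y1}


def filter_by(points, **criteria):
    """Keep only the points whose fields match every criterion."""
    return [p for p in points if all(p[k] == v for k, v in criteria.items())]


selected = brush(points, x_range=(1, 5), y_range=(2, 6))
# A linked view highlights the same ids, whatever it plots on its own axes.
highlighted = [p for p in points if p["id"] in selected]
only_b = filter_by(points, category="B")

print(sorted(selected), [p["id"] for p in only_b])
```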
Layout, Placement & Organization
The final presentation layer — how all visible elements are positioned and organized in the overall design.
The Architecture Layer
Architecture and arrangement considers how to lay out the overall design: the placement, organization, and visual hierarchy of all elements (charts, legends, titles, annotations, controls).
Two Core Aims
For the Eye
Reduce the amount of work the eye has to undertake to navigate around the design and decipher the sequence and hierarchy of the display. The eye should naturally flow to the most important element first.
For the Brain
Minimize the amount of thinking and "working out" that goes on in order to understand the layout. A well-arranged design is immediately readable — the reader should never have to ask "where do I look first?"
Guiding Principle: Must Be Intuitive
The key aim of architecture and arrangement is to make the visualization intuitive to navigate. Readers should not need to read a legend to understand the layout structure. The position, size, and grouping of elements should naturally communicate their relationship to each other.
Effective annotation in a visualization includes: Titles · Introductions · User guides · Visual annotation · Legends/keys · Units · Data sources · Attribution. These create the informational scaffold that makes the visualization self-contained and trustworthy.
High-Yield Summary
Everything from this lecture in one tight review pass.
Two Dimensions of Design
- Data Representation — choosing chart type using visual variables.
- Data Presentation — color, interactivity, annotation, and layout.
5 Representation Methods — The Categories
- Comparing Categorical Values → Bar, Sankey, Histogram, Small Multiples…
- Hierarchies & Part-of-a-Whole → Pie, Treemap, Stacked Bar, Waffle…
- Changes over Time → Line, Area, Sparklines, Candlestick, Stream Graph…
- Connections & Relationships → Scatter, Heatmap, Network, Chord…
- Geo-Spatial Data → Choropleth, Cartogram, Bubble Map, Connection Map…
Chart Selection Criteria (3 steps)
- Accommodate the physical properties of the data (data variable types and count).
- Facilitate the desired degree of accuracy (choose visual variables appropriately).
- Create an appropriate metaphor (stylistic fit to the subject).
Perceptual Accuracy Order (Mackinlay)
Position > Length > Angle > Area > Volume > Color Saturation > Color Hue > Shape
Color — 3 Purposes, 2 Rules, 3 Schemes
- Purposes: Represent data · Bring data to foreground · Conform to design requirements.
- Rules: Use unobtrusively · Strive for elegance not novelty.
- Quantitative schemes: Sequential (lightness gradient) · Diverging (two hues + midpoint) · Traffic light (Red/Amber/Green — use Blue instead of Green for accessibility).
- Max 12 hues for categorical distinction. Always test for color blindness.
4 Interactive Feature Types
- Manipulate variables & parameters — filter, select, brush, sort, group.
- Adjust the view — drill down, horizontal tabs, pan/zoom.
- Annotated details — tooltips, hover/click reveals.
- Animation — play/pause/slider for temporal data.
Architecture Aim
Minimize eye travel and cognitive effort. Must be intuitive. A layout succeeds when the reader never has to figure out where to look next.
Let attractiveness emerge as a by-product of good design — not as the goal. Focus on functional representation first. Every visual variable, every color, every interactive feature should earn its place by serving the reader's comprehension.
High-Yield Summary — What to Remember for the Midterm
The most likely exam targets, organized by topic. If you only have 30 minutes left before the exam, read this section.
Must-Know Definitions
- DataViz = Representation + Presentation of data that exploits visual perception to amplify cognition.
- Difference between DataViz (raw data) and InfoViz (abstract data).
- Representation = choice of visual form. Presentation = the full delivered work including colors, layout, annotations.
- Amplify Cognition = making insight extraction faster and more accurate.
The 4 Key Principles (Number them correctly!)
1. Strive for Form and Function: both, not either/or.
2. Always justify every design choice: deliberate design; nothing is arbitrary.
3. Create accessibility through intuitive design: clutter is a design failure.
4. Never deceive the receiver: intentional or unintentional, deception is unethical.
Two Methodologies — Know Both
- Fry (2008): 7 stages — Acquire, Parse, Filter, Mine, Represent, Refine, Interact.
- Kirk (2012): 5 steps — Purpose & Parameters, Prepare & Explore, Formulate Questions, Design Concepting, Construct & Launch.
Three Visualization Functions
- Explanatory — specific narrative, you define the story, visual presentation.
- Exploratory — user-driven, no single narrative, visual analysis tool.
- Exhibition/Data Art — aesthetic self-expression, emotional impact, not pure information transfer.
Two Tones
- Pragmatic — analytical reading, value extraction, corporate/scientific.
- Emotive/Abstract — emotional impact, persuasion, art, curves and organic forms.
Four Data Types
- Nominal — categories, no order (country, gender).
- Ordinal — categories with order, uneven gaps (Likert, medals).
- Interval — numeric, no true zero, differences meaningful (dates, °C).
- Ratio — numeric, true zero, ratios meaningful (age, price, distance).
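The interval/ratio distinction can be checked with temperatures. Differences in °C are meaningful, but ratios are not, because 0 °C is not a true zero. Kelvin has one. A minimal sketch:

```python
def celsius_to_kelvin(c):
    """Kelvin has a true zero, so it is a ratio scale; Celsius is an interval scale."""
    return c + 273.15


# Differences survive the change of scale.
diff_c = 20 - 10
diff_k = celsius_to_kelvin(20) - celsius_to_kelvin(10)

# Ratios do not: "20 °C is twice as hot as 10 °C" is not a meaningful statement.
c_ratio = 20 / 10
k_ratio = celsius_to_kelvin(20) / celsius_to_kelvin(10)

print(f"difference: {diff_c} vs {diff_k:.1f}")
print(f"ratio in Celsius: {c_ratio:.2f}, ratio in Kelvin: {k_ratio:.3f}")
```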
Six Data Preparation Steps
Acquisition → Examination → Understand Data Types → Transforming for Quality → Transforming for Analysis → Consolidating.
Five Resolution Options
Full → Filtered → Aggregate → Sample → Headline.
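The five resolution options can be illustrated on a tiny dataset. This is a sketch under assumptions: the records are invented, and the reading of each option (filter by criteria, roll up by month, take every third record, report one total) is one plausible interpretation, since the notes list the names without defining them.

```python
from collections import defaultdict

# Hypothetical records: (month, region, amount).
sales = [
    ("Jan", "North", 120), ("Jan", "South", 80),
    ("Feb", "North", 150), ("Feb", "South", 60),
    ("Mar", "North", 90), ("Mar", "South", 100),
]

full = sales                                        # Full: plot every record
filtered = [row for row in sales if row[1] == "North"]  # Filtered: exclude by criteria

aggregate = defaultdict(int)                        # Aggregate: roll up by month
for month, _region, amount in sales:
    aggregate[month] += amount

sample = sales[::3]                                 # Sample: every third record
headline = sum(amount for _m, _r, amount in sales)  # Headline: one overall figure

print(len(full), len(filtered), dict(aggregate), len(sample), headline)
```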
Every concept in this course connects back to one idea: effective visualization serves the reader's ability to understand data. Whether you are choosing a chart type, a color, a tone, or a resolution level — always ask: "Does this decision help the reader understand the data better and faster?" If no, cut it.