Michael T. Gastner

Assistant Professor, Yale-NUS College

I am an applied mathematician and data scientist. My main interests are at the interface of data visualization and cartography. Currently, I am spending most of my research time on developing user-friendly methods to construct cartograms.

Curriculum Vitae

Cartograms

We live in the era of “big data”. Visualizing data is an important step when summarizing information. Geographic maps are a popular means to visualize spatial data, but conventional maps often tell a misleading story. Usually, each map region is displayed with an area proportional to its actual land area.

Unfortunately, equal-area maps usually miscommunicate statistical data. In an election, for example, the land area of a region is irrelevant. Instead, well-designed infographics should visualize the number of votes in a region.

Cartograms are infographics that rescale map areas in proportion to statistical data (e.g. population size or the number of votes).
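As a minimal sketch of what "rescaling in proportion to statistical data" means, the following Python snippet computes the area that each region should occupy on a cartogram. It covers only the target-area arithmetic, not the algorithm that deforms the map. The function name, the region names and all numbers are hypothetical and chosen for illustration.

```python
def cartogram_target_areas(land_areas, values):
    """Return the area each region should occupy on a cartogram.

    The total map area is preserved. Each region receives a share of it
    equal to its share of the statistical variable (e.g. votes).
    """
    total_area = sum(land_areas.values())
    total_value = sum(values.values())
    return {
        region: total_area * values[region] / total_value
        for region in land_areas
    }


# Hypothetical regions: land area in km^2 and number of votes.
land_areas = {"A": 600.0, "B": 300.0, "C": 100.0}
votes = {"A": 1000, "B": 3000, "C": 6000}

targets = cartogram_target_areas(land_areas, votes)
# Region C is the smallest by land area but receives the largest
# target area because it has the most votes.
```

A cartogram algorithm then has to deform the region boundaries until each region reaches its target area while neighbours stay neighbours.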

I am developing cartogram algorithms and software. The objectives are that

  • the algorithms guarantee that all areas are scaled correctly.
  • geographic neighbours remain neighbours on the cartogram.
  • cartograms are presented with an intuitive user interface.
For our latest algorithm (described in this PNAS paper), please visit our GitHub repository. To simplify cartogram generation, we have recently embedded this code into the web application go-cart.io.
Cartogram of the 2016 US presidential election. Each state is represented by an area in proportion to its number of electors. States won by the Republicans are red, those won by the Democrats blue. See our paper for more details.

Publications

Click here for a complete publication list with links to full-text papers.
Below is a selection of representative publications.

Interactivity in cartograms

I. K. Duncan, S. Tingsheng, S. T. Perrault and M. T. Gastner
Task-based effectiveness of interactive contiguous area cartograms
IEEE Trans. Vis. Comput. Graph. 27(3):2136–2152 (2021)

Cartograms are map-based data visualizations in which the area of each map region is proportional to an associated numeric data value (e.g. population or gross domestic product). Because of their distorted appearance, cartograms have often been criticised as difficult to read. We conducted an experiment to evaluate whether cartograms are more legible if they are accompanied by interactive features (animations, linked brushing, or infotips). With access to interactivity, most participants answered even complex questions about the maps correctly. Among the interactive features, animations had the strongest positive effect, so we recommend them as a minimum of interactivity when cartograms are displayed on a computer screen.

Interface used by participants
Interface used by participants during cartogram reading tasks. (1) Conventional map. (2) Cartogram of the same country. (3) Participants were informed about available interactive features. (4) When this button was pressed, the cartogram displayed on the right switched to a conventional map. (5) Infotip with statistics about the region under the pointer.

Efficient cartogram generation

M. T. Gastner, V. Seguy and P. More
Fast flow-based algorithm for creating density-equalizing map projections
Proc. Natl. Acad. Sci. U.S.A. 115(10):E2156–E2164 (2018)

On conventional maps, each region is displayed with an area proportional (or at least nearly proportional) to its geographic area in square kilometres. But equal-area maps can grossly misrepresent demographic data: densely populated cities should be given more prominence than large but sparsely populated territories. Cartograms solve this problem by rescaling map regions in proportion to, for example, population or gross domestic product. Here we describe and benchmark a fast flow-based algorithm that computes cartograms in a matter of seconds.

GDP cartogram of India
The states and union territories of India on a cartogram where the area of each region is proportional to its gross domestic product. Delhi (DL), although small in land area, appears larger on this cartogram than many other states because of its economic importance. Our algorithm only needs a few seconds on standard hardware to construct the cartogram.

Cargo shipping

P. Kaluza, A. Kölzsch, M. T. Gastner and B. Blasius
The complex network of global cargo ship movements
J. Royal Soc. Interface 7(48):1093–1103 (2010)

The global network of merchant ships plays a crucial role in human mobility, the exchange of goods and the spread of invasive species. We use information about the itineraries of 16 363 cargo ships during the year 2007 to construct a network of links between ports. We show that bulk dry carriers, container ships and oil tankers differ in their mobility patterns and networks. Container ships follow regularly repeating paths whereas bulk dry carriers and oil tankers move less predictably between ports. In the network of all ship movements, the connectivity of ports and the loads transported on the links follow heavy-tailed distributions, with systematic differences between ship types.

Map of the global cargo ship network
The routes of all cargo ships bigger than 10 000 GT during 2007. The colour scale indicates the number of journeys along each route. Ships are assumed to travel along the shortest (geodesic) paths on water.

Diffusion cartograms

M. T. Gastner and M. E. J. Newman
Diffusion-based method for producing density-equalizing maps
Proc. Natl. Acad. Sci. U.S.A. 101(20):7499–7504 (2004)

Cartograms are maps in which the sizes of geographic regions (e.g. countries, provinces) appear in proportion to their population. Such maps are invaluable for data visualization. Unfortunately, to scale regions and still have them fit together, one is normally forced to distort the regions’ shapes, potentially resulting in maps that are difficult to read. Here we present a technique based on ideas borrowed from elementary physics that suffers from none of these drawbacks.

News cartogram of the USA
In this cartogram, states of the USA have an area proportional to the frequency with which they appeared in news stories between November 1994 and April 1998. Data from the North American News Text Supplement (Linguistic Data Consortium, Philadelphia, 1998).

Teaching

YCC1122: Quantitative Reasoning

This “Common Curriculum” course aims to develop the students’ skills in logical and statistical reasoning so that they become critical and informed readers of quantitative data. The course applies the pedagogy of Team-based Learning to ensure that students who bring diverse talents and backgrounds to the course can learn together and from one another.

Students learn to criticise and question empirical claims, support them with logical arguments and address real-life problems by gathering and visually representing quantitative data. The course teaches quantitative literacy so that students grasp how algorithmic and statistical thinking is used in the natural and social sciences.

Net worth vs. age of the 50 richest Singaporeans
Example of data analysis in Quantitative Reasoning. The data (from Forbes) show the net worth of the 50 richest Singaporeans in the year 2018 versus their age. Does their wealth depend on age? We perform a linear regression of the logarithm of net worth as a function of age. Although the least-squares regression line (blue) increases, the 95% confidence interval (grey) is too wide to conclude that the line's slope is different from zero (p-value 0.31). Hence, we fail to reject the null hypothesis that there is no relationship between age and net worth.
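The reasoning in this caption can be sketched in Python as follows. This is an illustration under stated assumptions, not the analysis used in the course: the data are synthetic stand-ins (not the Forbes data), the function name is hypothetical, and the critical value 2.011 is the approximate 97.5th percentile of Student's t distribution with 48 degrees of freedom.

```python
import math
import random


def slope_with_ci(x, y, t_crit):
    """Least-squares slope of y on x with a confidence interval.

    t_crit is the two-sided critical value of Student's t distribution
    for len(x) - 2 degrees of freedom (supplied by the caller).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares and standard error of the slope.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    stderr = math.sqrt(rss / (n - 2) / sxx)
    return slope, (slope - t_crit * stderr, slope + t_crit * stderr)


# Synthetic stand-in data (NOT the Forbes data): 50 ages and net worths
# drawn independently of each other, in arbitrary units.
rng = random.Random(1)
ages = [rng.uniform(40, 95) for _ in range(50)]
net_worth = [math.exp(rng.gauss(0.5, 0.8)) for _ in range(50)]
log_worth = [math.log(w) for w in net_worth]

# Approximate critical value for a 95% interval with 50 - 2 = 48
# degrees of freedom.
slope, (low, high) = slope_with_ci(ages, log_worth, t_crit=2.011)

# If the interval contains zero, we fail to reject the null hypothesis
# of no linear relationship between age and log net worth.
slope_differs_from_zero = not (low <= 0.0 <= high)
```

Regressing the logarithm of net worth rather than net worth itself keeps a few extremely large fortunes from dominating the fit.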

YSC2210: Data Analysis and Visualization (DAVis) with R

This course teaches how to use the programming language R for analyzing and presenting statistical data. Starting from the fundamentals of R (data types, flow control), students learn how to write their own R scripts and functions. They learn how to extract data from web sites and bring the input into a shape (e.g. using regular expressions) that is suitable for further analysis.

Much of the course focuses on R’s graphics features, including network representations and geographic maps. The objective is to present data in ways that are informative, elegant and fun (e.g. as short animated video clips).

Example of a visualization project in DAVis: An animation of Singapore's age pyramid between 1960 and 2017. (Data from Data.gov.sg.)

YSC3216: Stochastic Processes and Models (SPaM)

What do stock markets, the weather, genetic mutations and the movements of a drunkard have in common? All these phenomena are subject to a certain degree of randomness. Such “stochastic processes” are a vibrant area of interdisciplinary research, ranging from mathematical finance through biology to predicting waiting times in supermarket queues.

In this course, students learn the mathematics behind the most common models of stochastic processes: Markov chains, Poisson and renewal processes, and queuing theory. Students learn how to prove the most important mathematical results and apply them to realistic problems.

Drunkard's walk
Example of a stochastic process: the random walk. A drunkard walks out of a pub (right). His house is on the same street (left), but he is so drunk that he keeps forgetting in which direction he has been going so far. At every step, he walks to the left or to the right with equal probability. How many steps does he need on average to reach home? Will he ever get there? The drunkard's walk is a basic model for many stochastic processes (e.g. share prices, population growth).
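The drunkard's walk can be explored numerically with a short Python sketch. The function name, the distance of 5 steps between pub and house, and the cap of 10 000 steps per walk are assumptions made for illustration.

```python
import random


def steps_to_home(distance, max_steps, rng):
    """Simulate one walk and return the number of steps needed to reach
    home, or None if home is not reached within max_steps.

    Home is at position 0; the pub is `distance` steps to its right.
    """
    position = distance
    for step in range(1, max_steps + 1):
        # Step left (-1) or right (+1) with equal probability.
        position += rng.choice((-1, 1))
        if position == 0:
            return step
    return None


rng = random.Random(42)
results = [
    steps_to_home(distance=5, max_steps=10_000, rng=rng)
    for _ in range(1_000)
]
arrived = [s for s in results if s is not None]
fraction_home = len(arrived) / len(results)
```

The cap on the number of steps matters: for the unbounded symmetric walk in one dimension, the standard result is that the drunkard reaches home with probability 1, but the expected number of steps is infinite, so an average over uncapped walks would not settle down.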

YSC4208: Monte Carlo Simulations in Science and Statistics (MoCaSinSS)

Monte Carlo simulations are computer experiments that solve numerical problems by using random number generators. At first glance, it may seem bizarre to use a computer, arguably the most accurate and deterministic of all human inventions, to perform random experiments. However, Monte Carlo simulations are nowadays an essential component in many quantitative studies. They are used in the natural sciences, industrial engineering, finance and statistics.

This course teaches how to write elegant and efficient Monte Carlo simulations for concrete real-world examples. Students also learn the theoretical foundations of pseudorandom number generators, Markov chain Monte Carlo and the Metropolis-Hastings algorithm.

Example of a Monte Carlo simulation. “Buffon's needle” is an experiment to estimate the value of \(\pi\) = 3.14... We randomly drop needles onto a floor with parallel strips (vertical lines at the top). Needles that cross at least one of the strips are coloured blue. Needles that fall between two strips are shown in green. By counting the numbers of blue and green needles, we can obtain an estimate of \(\pi\). The more needles we drop, the more accurate the estimate (plotted at the bottom).
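A minimal Python sketch of this experiment is given below. It assumes needles that are no longer than the strip width, in which case the probability of a crossing is 2l/(πd) for needle length l and strip width d. The function and parameter names are hypothetical, and the needle direction is sampled by rejection from the unit disc so that the simulation does not use the value of π it is trying to estimate.

```python
import math
import random


def estimate_pi(n_needles, needle_len=1.0, strip_width=1.0, seed=0):
    """Estimate pi by Buffon's needle (requires needle_len <= strip_width)."""
    rng = random.Random(seed)
    crossings = 0
    for _ in range(n_needles):
        # Distance from the needle's centre to the nearest line.
        x = rng.uniform(0.0, strip_width / 2)
        # Uniformly random direction: rejection-sample a point in the
        # unit disc and use its direction from the origin.
        while True:
            u, v = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
            r2 = u * u + v * v
            if 0.0 < r2 <= 1.0:
                break
        sin_theta = abs(v) / math.sqrt(r2)
        # The needle crosses a line if its half-extent perpendicular
        # to the lines reaches the nearest line ("blue" needle).
        if x <= (needle_len / 2) * sin_theta:
            crossings += 1
    if crossings == 0:
        raise ValueError("no crossings observed; drop more needles")
    # P(crossing) = 2 l / (pi d), so pi is approximately 2 l n / (d * crossings).
    return 2 * needle_len * n_needles / (strip_width * crossings)


pi_estimate = estimate_pi(100_000)
```

The statistical error of the estimate shrinks roughly like one over the square root of the number of needles, so each additional digit of accuracy costs about a hundred times more needles.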

Contact

Michael T. Gastner
Yale-NUS College, Division of Science
16 College Avenue West, #01-220 Singapore 138527
michael.gastner@yale-nus.edu.sg

My office is RC3-02-05L in Cendana College (2nd floor).

Michael T. Gastner's LinkedIn profile