Michael T. Gastner

Assistant Professor, Yale-NUS College

I am an applied mathematician and data scientist. My main interests are in data visualization and the mathematical modelling of complex systems. Currently, I am spending most of my research time on

Curriculum Vitae

Research

Cartograms

We live in the era of “big data”. Visualizing data is an important step when summarizing information. Geographic maps are a popular means to visualize spatial data, but conventional maps often tell a misleading story. Usually, each map region is displayed with an area proportional to its actual land area.

Unfortunately, equal-area maps usually miscommunicate statistical data. In an election, for example, the land area of a region is irrelevant. Instead, a well-designed infographic should visualize the number of votes in a region.

Cartograms are infographics that rescale map areas in proportion to statistical data, for example population size or the number of votes.
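The rescaling rule can be illustrated with a small Python sketch. It only computes the area each region should end up with, not the map deformation itself. The function name `target_areas`, the region labels and all numbers are made up for illustration, and preserving the total map area is an assumption of this example.

```python
def target_areas(land_area, statistic):
    """Return the area each region should have on a cartogram.

    Assumption of this sketch: the total map area stays the same, and each
    region receives a share of it equal to its share of the statistic.
    """
    total_area = sum(land_area.values())
    total_stat = sum(statistic.values())
    return {region: total_area * statistic[region] / total_stat
            for region in land_area}


# Made-up illustrative numbers (not real data).
land_area = {"A": 600.0, "B": 300.0, "C": 100.0}
votes = {"A": 10, "B": 30, "C": 60}

areas = target_areas(land_area, votes)
for region in land_area:
    print(region, land_area[region], "->", areas[region])
```

In this toy example, region C has the smallest land area but receives the largest area on the cartogram because it has the most votes.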

I am developing techniques to calculate cartograms that

  • scale all areas correctly,
  • keep neighbouring regions adjacent,
  • make it easy to identify each region.
For our latest code (described in this PNAS paper), please visit our GitHub repository. To simplify cartogram generation, we have recently embedded this code into the web applet go-cart.io.
Cartogram of the 2016 US presidential election. Each state is represented by an area in proportion to its number of electors. States won by the Republicans are red, those won by the Democrats blue. See our paper for more details.

Opinion formation

The voter model is a simple agent-based model to mimic opinion dynamics in social networks: a randomly chosen agent adopts the opinion of a randomly chosen neighbour. This process is repeated until a consensus emerges.
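A minimal Python sketch of this update rule is shown below. It is an illustration only: the function name `voter_model`, the adjacency-list representation, the ring network and the step limit are assumptions made for this example.

```python
import random


def voter_model(neighbours, opinions, rng=None, max_steps=1_000_000):
    """Run the basic voter model until all agents agree.

    neighbours[k] lists the neighbours of agent k.
    Returns (number of updates, final opinions); the number of updates is
    None if no consensus is reached within max_steps (hypothetical cap).
    """
    rng = rng or random.Random()
    opinions = list(opinions)
    n = len(opinions)
    for step in range(max_steps):
        if len(set(opinions)) == 1:      # consensus reached
            return step, opinions
        agent = rng.randrange(n)         # randomly chosen agent
        neighbour = rng.choice(neighbours[agent])
        opinions[agent] = opinions[neighbour]  # adopt the neighbour's opinion
    return None, opinions


# Example: 10 agents on a ring, starting with alternating opinions 0 and 1.
n = 10
ring = [[(k - 1) % n, (k + 1) % n] for k in range(n)]
start = [k % 2 for k in range(n)]
steps, final = voter_model(ring, start, rng=random.Random(1))
print("updates until consensus:", steps, "final opinion:", final[0])
```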

Although the basic voter model is theoretically intriguing, it misses an important feature of real opinion dynamics: it does not distinguish between an agent’s publicly expressed opinion and her inner conviction. A person may not feel comfortable declaring her conviction if her social circle appears to hold an opposing view.

I am working on an extension of the voter model, called the Concealed Voter Model, in which public and private opinions can differ. The main question is: to what extent does preference falsification slow down finding a consensus?

Explanation of the "Concealed Voter Model"
Explanation of the Concealed Voter Model. Each agent is represented by two nodes. One node stands for her externally revealed opinion (top) and another represents her internal (i.e. hidden) opinion (bottom). For example, the agent highlighted by the brown rectangle has an external opinion “red” and an internal opinion “blue”. External opinions can change because agents copy a random neighbour in the external layer with rate \(c\) and because they “externalize” (i.e. make a previously hidden opinion public) with rate \(e\). Internal opinions can change because every agent “internalizes” (i.e. accepts her external opinion as her inner conviction) with rate \(i\). See our paper for more details.
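The three kinds of events can be sketched as a continuous-time simulation in Python. This is a minimal illustration based only on the description above, not the code used in the paper. The function name, the complete-graph example, the event limit and the stopping rule (all external and internal opinions identical) are assumptions of this sketch.

```python
import random


def concealed_voter_model(neighbours, external, internal, c, e, i,
                          rng=None, max_events=1_000_000):
    """Simulate copying (rate c), externalization (rate e) and
    internalization (rate i) until every opinion in both layers agrees.

    Returns (time, external, internal); time is None if the hypothetical
    cap max_events is reached first.
    """
    rng = rng or random.Random()
    external, internal = list(external), list(internal)
    n = len(external)
    rate_per_agent = c + e + i
    t = 0.0
    for _ in range(max_events):
        if len(set(external) | set(internal)) == 1:
            return t, external, internal
        # Every agent has the same total rate, so the waiting time to the
        # next event is exponential with rate n * (c + e + i) and the
        # affected agent is uniformly random.
        t += rng.expovariate(n * rate_per_agent)
        agent = rng.randrange(n)
        u = rng.uniform(0.0, rate_per_agent)
        if u < c:        # copy a random neighbour's external opinion
            external[agent] = external[rng.choice(neighbours[agent])]
        elif u < c + e:  # externalize: make the hidden opinion public
            external[agent] = internal[agent]
        else:            # internalize: adopt the public opinion privately
            internal[agent] = external[agent]
    return None, external, internal


# Example: 8 agents who all know each other; initially every agent's
# internal opinion equals her external opinion (an arbitrary choice).
n_agents = 8
complete = [[j for j in range(n_agents) if j != k] for k in range(n_agents)]
initial = [k % 2 for k in range(n_agents)]
t_consensus, ext_final, int_final = concealed_voter_model(
    complete, initial, initial, c=1.0, e=1.0, i=1.0, rng=random.Random(2))
print("consensus time:", t_consensus)
```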

Publications

Click here for a complete publication list with links to full-text papers.
Below is a selection of representative publications.

Efficient cartogram generation

M. T. Gastner, V. Seguy and P. More
Fast flow-based algorithm for creating density-equalizing map projections
Proc. Natl. Acad. Sci. U.S.A. 115(10):E2156–E2164 (2018)

On conventional maps, each region is displayed with an area proportional (or at least nearly proportional) to its geographic area in square kilometres. But equal-area maps can grossly misrepresent demographic data: densely populated cities should be given more prominence than large, but sparsely populated territories. Cartograms solve this problem by rescaling map regions in proportion to, for example, population or gross domestic product. Here we describe and benchmark a fast flow-based algorithm that computes cartograms in a matter of seconds.

GDP cartogram of India
The states and union territories of India on a cartogram where the area of each region is proportional to its gross domestic product. Delhi (DL), although small in land area, appears larger on this cartogram than many other states because of its economic importance. Our algorithm only needs a few seconds on standard hardware to construct the cartogram.

Concealed voter model

M. T. Gastner, B. Oborny and M. Gulyás
Consensus time in a voter model with concealed and publicly expressed opinions
J. Stat. Mech. Theory Exp. 2018(6):063401 (2018)

In the basic voter model, it is assumed that agents can find out the true opinions of all their neighbours when they update their own opinions. Here we introduce the Concealed Voter Model, which differs from the basic voter model by adding a second, concealed layer of opinions to the public layer. By analyzing the evolution of the opinions, we derive to what extent concealment slows down consensus as a function of

  • \(c\): the rate with which agents copy external opinions,
  • \(e\): the rate of externalization,
  • \(i\): the rate of internalization.

See this figure for an explanation of \(c\), \(e\) and \(i\).

Delay of the consensus in the Concealed Voter Model
The factor \(\tau\) by which the mean consensus time in the Concealed Voter Model is prolonged compared to the “basic” voter model, where there are no concealed opinions. In this figure, we measure \(\tau\) as a function of the ratios \(\frac e c\) and \(\frac i c\). The delay is especially severe in the "zealot corner" where \(e\) is large and \(i\) is small. In this region, agents tend to be candid about their internal opinions towards the public and they maintain their internal opinions for a long time.

Cargo shipping

P. Kaluza, A. Kölzsch, M. T. Gastner and B. Blasius
The complex network of global cargo ship movements
J. Royal Soc. Interface 7(48):1093–1103 (2010)

The global network of merchant ships plays a crucial role in human mobility, the exchange of goods and the spread of invasive species. We use information about the itineraries of 16 363 cargo ships during the year 2007 to construct a network of links between ports. We show that bulk dry carriers, container ships and oil tankers differ in their mobility patterns and networks. Container ships follow regularly repeating paths whereas bulk dry carriers and oil tankers move less predictably between ports. The network of all ship movements possesses a heavy-tailed distribution for the connectivity of ports and for the loads transported on the links, with systematic differences between ship types.

Map of the global cargo ship network
The routes of all cargo ships bigger than 10 000 GT during 2007. The colour scale indicates the number of journeys along each route. Ships are assumed to travel along the shortest (geodesic) paths on water.

Diffusion cartograms

M. T. Gastner and M. E. J. Newman
Diffusion-based method for producing density-equalizing maps
Proc. Natl. Acad. Sci. U.S.A. 101(20):7499–7504 (2004)

Cartograms are maps in which the sizes of geographic regions (e.g. countries, provinces) appear in proportion to their population. Such maps are invaluable for data visualization. Unfortunately, to scale regions and still have them fit together, one is normally forced to distort the regions’ shapes, potentially resulting in maps that are difficult to read. Here we present a technique based on ideas borrowed from elementary physics that suffers from none of these drawbacks.

News cartogram of the USA
In this cartogram, states of the USA have an area proportional to the frequency with which they appeared in news stories between November 1994 and April 1998. Data from the North American News Text Supplement (Linguistic Data Consortium, Philadelphia, 1998).

Teaching

YCC1122: Quantitative Reasoning

This “Common Curriculum” course aims to develop the students’ skills in logical and statistical reasoning so that they become critical and informed readers of quantitative data. The course applies the pedagogy of Team-based Learning to ensure that students who bring diverse talents and backgrounds to the course can learn together and from one another.

Students learn to criticise and question empirical claims, support them with logical arguments and address real-life problems by gathering and visually representing quantitative data. The course teaches quantitative literacy so that students grasp how algorithmic and statistical thinking is used in the natural and social sciences.

Net worth vs. age of the 50 richest Singaporeans
Example of data analysis in Quantitative Reasoning. The data (from Forbes) show the net worth of the 50 richest Singaporeans in the year 2018 versus their age. Does their wealth depend on age? We perform a linear regression of the logarithm of net worth as a function of age. Although the least-squares regression line (blue) increases, the 95% confidence interval (grey) is too wide to conclude that the line's slope is different from zero (p-value 0.31). Hence, we fail to reject the null hypothesis that there is no relationship between age and net worth.
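The test described in this caption can be sketched in plain Python. The example below does not use the Forbes data: it generates synthetic ages and log net worths that are unrelated by construction, fits a least-squares line and reports a 95% confidence interval for the slope (which is not the same as the grey band around the line in the figure). The function name `slope_with_ci` is hypothetical, and the critical value 2.0106 is the approximate 97.5% quantile of Student's t distribution with 48 degrees of freedom, hard-coded for a sample of 50.

```python
import math
import random


def slope_with_ci(x, y, t_crit):
    """Least-squares fit y = intercept + slope * x.

    Returns (slope, intercept, (lower, upper)), where the last item is the
    confidence interval slope -/+ t_crit * standard error of the slope.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return slope, intercept, (slope - t_crit * se, slope + t_crit * se)


# Synthetic stand-in data (NOT the Forbes data): 50 ages and 50 values of
# log net worth drawn independently of age.
rng = random.Random(0)
ages = [rng.uniform(40, 95) for _ in range(50)]
log_worth = [rng.gauss(0.0, 1.0) for _ in range(50)]

T_CRIT_48 = 2.0106  # approx. t quantile for 95% CI, 50 - 2 = 48 d.o.f.
slope, intercept, (lower, upper) = slope_with_ci(ages, log_worth, T_CRIT_48)
print("slope:", slope, "95% CI:", (lower, upper))
print("interval contains zero:", lower <= 0.0 <= upper)
```

If the interval contains zero, the data do not allow us to reject the null hypothesis of a zero slope at the 5% level.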

YSC2210: Data Analysis and Visualization (DAVis) with R

This course teaches how to use the programming language R for analyzing and presenting statistical data. Starting from the fundamentals of R (data types, flow control), students learn how to write their own R scripts and functions. They learn how to extract data from web sites and bring the input into a shape (e.g. using regular expressions) that is suitable for further analysis.

Much of the course focuses on R’s graphics features, including network representations and geographic maps. The objective is to present data in ways that are informative, elegant and fun (e.g. as short animated video clips).

Example of a visualization project in DAVis: An animation of Singapore's age pyramid between 1960 and 2017. (Data from Data.gov.sg.)

YSC3216: Stochastic Processes and Models (SPaM)

What do stock markets, the weather, genetic mutations and the movements of a drunkard have in common? All these phenomena are subject to a certain degree of randomness. Such “stochastic processes” are a vibrant area of interdisciplinary research, ranging from mathematical finance and biology to the prediction of waiting times in supermarket queues.

In this course, students learn the mathematics behind the most common models of stochastic processes: Markov chains, Poisson and renewal processes, queuing theory. Students learn how to prove the most important mathematical results and apply them to realistic problems.

Drunkard's walk
Example of a stochastic process: the random walk. A drunkard walks out of a pub (right). His house is on the same street (left), but he is so drunk that he keeps forgetting in which direction he has been going so far. At every step, he walks to the left or to the right with equal probability. How many steps does he need on average to reach home? Will he ever get there? The drunkard's walk is a basic model for many stochastic processes (e.g. share prices, population growth).
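The drunkard's walk is easy to simulate. The Python sketch below assumes an unbounded street, a house at position 0, a pub a fixed number of steps away and a cap on the number of steps per walk; the function name `steps_to_home` and all numerical values are choices made for this illustration. For a symmetric walk on an unbounded line, standard theory says that the walker returns home with probability 1 although the mean number of steps is infinite, so the sample mean printed below depends strongly on the cap.

```python
import random


def steps_to_home(distance, max_steps, rng):
    """Number of steps a symmetric random walker needs to reach position 0
    when starting `distance` steps away, or None if he is still on the way
    after max_steps steps (hypothetical cap)."""
    position = distance
    for step in range(max_steps + 1):
        if position == 0:
            return step
        position += rng.choice((-1, 1))  # left or right, equally likely
    return None


rng = random.Random(3)
n_walks, distance, cap = 500, 10, 5000
results = [steps_to_home(distance, cap, rng) for _ in range(n_walks)]
finished = [r for r in results if r is not None]

print("fraction home within", cap, "steps:", len(finished) / n_walks)
print("mean steps among those who got home:", sum(finished) / len(finished))
```

Increasing `cap` lets more walkers reach home, but it also lets in ever longer walks, which is how the infinite mean shows up in a simulation.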

YSC4208: Monte Carlo Simulations in Science and Statistics (MoCaSinSS)

Monte Carlo simulations are computer experiments that solve numerical problems by using random number generators. At first glance, it may seem bizarre to use a computer, arguably the most accurate and deterministic of all human inventions, to perform random experiments. However, Monte Carlo simulations are nowadays an essential component in many quantitative studies. They are used in the natural sciences, industrial engineering, finance and statistics.

This course teaches how to write elegant and efficient Monte Carlo simulations for concrete real-world examples. Students also learn the theoretical foundations of pseudorandom number generators, Markov chain Monte Carlo and the Metropolis-Hastings algorithm.

Example of a Monte Carlo simulation. “Buffon's needle” is an experiment to estimate the value of \(\pi\) = 3.14... We randomly drop needles onto a floor with parallel strips (vertical lines at the top). Needles that cross at least one of the lines are coloured blue. Needles that lie entirely within one strip are shown in green. By counting the numbers of blue and green needles, we can obtain an estimate of \(\pi\). The more needles we drop, the more accurate the estimate (plotted at the bottom).
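A minimal Python sketch of this experiment follows. It relies on the classical result that a needle of length \(\ell\) dropped on strips of width \(d \ge \ell\) crosses a line with probability \(2\ell/(\pi d)\), so \(\pi\) can be estimated as \(2\ell N/(d\,N_\text{cross})\), where \(N\) is the number of needles and \(N_\text{cross}\) the number that cross a line. The function name `buffon_pi` and the default sizes are assumptions of this sketch. The needle's direction is drawn by rejection sampling inside the unit disc so that the simulation itself never uses the value of \(\pi\).

```python
import math
import random


def buffon_pi(n_needles, needle_length=1.0, strip_width=1.0, rng=None):
    """Estimate pi by dropping n_needles needles on vertical parallel lines
    spaced strip_width apart. Returns None if no needle crosses a line."""
    if needle_length > strip_width:
        raise ValueError("this sketch assumes needle_length <= strip_width")
    rng = rng or random.Random()
    crossings = 0
    for _ in range(n_needles):
        # Horizontal distance from the needle's centre to the nearest line.
        centre_to_line = rng.uniform(0.0, strip_width / 2)
        # Uniformly random direction: pick a point in the unit disc.
        while True:
            u, v = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
            r2 = u * u + v * v
            if 0.0 < r2 <= 1.0:
                break
        # Half of the needle's extent perpendicular to the lines.
        half_extent = 0.5 * needle_length * abs(u) / math.sqrt(r2)
        if centre_to_line <= half_extent:
            crossings += 1
    if crossings == 0:
        return None
    return 2.0 * needle_length * n_needles / (strip_width * crossings)


estimate = buffon_pi(100_000, rng=random.Random(4))
print("estimate of pi from 100000 needles:", estimate)
```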

Contact

Michael T. Gastner
Yale-NUS College, Division of Science
16 College Avenue West, #01-220 Singapore 138527
michael.gastner@yale-nus.edu.sg

My office is RC3-02-05L in Cendana College (2nd floor).

Michael T. Gastner's LinkedIn profile