# Michael T. Gastner

Assistant Professor, Yale-NUS College

I am an applied mathematician and data scientist. My main interests are in data visualization and the mathematical modelling of complex systems. Currently, I am spending most of my research time on

• developing user-friendly methods to construct cartograms and
• the mathematics of opinion formation in social networks.
Curriculum Vitae

## Research

### Cartograms

We live in the era of “big data”. Visualizing data is an important step when summarizing information. Geographic maps are a popular means to visualize spatial data, but conventional maps often tell a misleading story. Usually, each map region is displayed with an area proportional to its actual land area.

Unfortunately, equal-area maps usually miscommunicate statistical data. In an election, for example, the land area of a region is irrelevant. Instead, a well-designed infographic should visualize the number of votes in a region.

Cartograms are infographics that rescale map areas in proportion to statistical data, for example population size or the number of votes.
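
As a minimal sketch of the rescaling idea, the snippet below computes the target area of each region so that areas are proportional to a statistic while the total map area stays the same. The region names and numbers are made up for illustration, and the function name `cartogram_target_areas` is hypothetical. The snippet only covers the arithmetic of the target areas; it does not deform region boundaries, which is the hard part of constructing a cartogram.

```python
def cartogram_target_areas(land_area, statistic):
    """Return target areas proportional to `statistic`.

    The total map area is kept equal to the total land area, so only the
    share of each region changes.
    """
    total_area = sum(land_area.values())
    total_stat = sum(statistic.values())
    return {
        region: total_area * statistic[region] / total_stat
        for region in land_area
    }


# Hypothetical example data: three regions with land areas and vote counts.
land_area = {"A": 600.0, "B": 300.0, "C": 100.0}
votes = {"A": 1_000, "B": 3_000, "C": 6_000}

targets = cartogram_target_areas(land_area, votes)
for region, area in targets.items():
    print(f"{region}: land area {land_area[region]:.0f} -> target area {area:.0f}")
```

In this made-up example, the largest region by land area receives the smallest target area because it has the fewest votes.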

I am developing techniques to calculate cartograms that

• scale all areas correctly and
• make it easy to identify each region.
For our latest code (described in this PNAS paper), please visit our GitHub repository. To simplify cartogram generation, we have recently embedded this code into the web applet go-cart.io.

### Opinion formation

The voter model is a simple agent-based model to mimic opinion dynamics in social networks: a randomly chosen agent adopts the opinion of a randomly chosen neighbour. This process is repeated until a consensus emerges.
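
The update rule can be sketched in a few lines of Python. This is an illustrative simulation under assumptions that the text above does not fix: two opinions (0 and 1), a complete graph of 20 agents, and time measured as the number of update steps. The function name `voter_model_consensus_time` is hypothetical.

```python
import random


def voter_model_consensus_time(neighbours, opinions, rng):
    """Run the basic voter model until all agents agree.

    neighbours: list of neighbour lists, one per agent (a connected graph).
    opinions:   initial opinions, each 0 or 1.
    Returns (number of update steps, consensus opinion).
    """
    opinions = list(opinions)
    n = len(opinions)
    ones = sum(opinions)  # number of agents currently holding opinion 1
    steps = 0
    while 0 < ones < n:
        agent = rng.randrange(n)                   # randomly chosen agent
        neighbour = rng.choice(neighbours[agent])  # randomly chosen neighbour
        ones += opinions[neighbour] - opinions[agent]
        opinions[agent] = opinions[neighbour]      # adopt the neighbour's opinion
        steps += 1
    return steps, opinions[0]


# Assumed setup: complete graph with alternating initial opinions.
n = 20
complete_graph = [[j for j in range(n) if j != i] for i in range(n)]
initial = [i % 2 for i in range(n)]

steps, winner = voter_model_consensus_time(complete_graph, initial, random.Random(1))
print(f"Consensus on opinion {winner} after {steps} update steps")
```

On a finite connected graph, the loop ends with probability one because the two consensus states are the only absorbing states.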

Although the basic voter model is theoretically intriguing, it misses an important feature of real opinion dynamics: it does not distinguish between an agent’s publicly expressed opinion and her inner conviction. A person may not feel comfortable declaring her conviction if her social circle appears to hold an opposing view.

I am working on an extension of the voter model, called the Concealed Voter Model, in which public and private opinions can differ. The main question is: to what extent does preference falsification slow down finding a consensus?

## Publications

Below is a selection of representative publications.

### Efficient cartogram generation

M. T. Gastner, V. Seguy and P. More
Fast flow-based algorithm for creating density-equalizing map projections
Proc. Natl. Acad. Sci. U.S.A. 115(10):E2156–E2164 (2018)

On conventional maps, each region is displayed with an area proportional (or at least nearly proportional) to its geographic area in square kilometres. But equal-area maps can grossly misrepresent demographic data: densely populated cities should be given more prominence than large, but sparsely populated territories. Cartograms solve this problem by rescaling map regions in proportion to, for example, population or gross domestic product. Here we describe and benchmark a fast flow-based algorithm that computes cartograms in a matter of seconds.

### Concealed voter model

M. T. Gastner, B. Oborny and M. Gulyás
Consensus time in a voter model with concealed and publicly expressed opinions
J. Stat. Mech. Theory Exp. 2018(6):063401 (2018)

In the basic voter model, it is assumed that agents can find out the true opinions of all their neighbours when they update their own opinions. Here we introduce the Concealed Voter Model, which differs from the basic voter model by adding a second, concealed layer of opinions to the public layer. By analyzing the evolution of the opinions, we derive to what extent concealment slows down a consensus as a function of

• $$c$$: the rate at which agents copy external opinions,
• $$e$$: the rate of externalization,
• $$i$$: the rate of internalization.

See this figure for an explanation of $$c$$, $$e$$ and $$i$$.
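
The following Python sketch illustrates one possible reading of the three rates. It is not the analysis from the paper, and it rests on several assumptions that go beyond the summary above: agents sit on a complete graph, there are two opinions, each event is drawn with probability proportional to $$c$$, $$e$$ or $$i$$, "copying" means an agent's public opinion is replaced by another agent's public opinion, "externalization" means the public opinion is set to the agent's own concealed opinion, "internalization" means the concealed opinion is set to the agent's own public opinion, and consensus means that all public and concealed opinions agree. The function and parameter names are hypothetical.

```python
import random


def concealed_voter_events_to_consensus(n, copy_rate, externalize_rate,
                                        internalize_rate, rng,
                                        max_events=1_000_000):
    """Count events until every public and concealed opinion agrees.

    Returns the number of events, or None if max_events is reached first.
    """
    external = [k % 2 for k in range(n)]  # publicly expressed opinions
    internal = list(external)             # concealed opinions
    total_rate = copy_rate + externalize_rate + internalize_rate
    for event in range(max_events):
        if len(set(external)) == 1 and internal == external:
            return event
        agent = rng.randrange(n)
        u = rng.random() * total_rate
        if u < copy_rate:
            # Copy: adopt the public opinion of another random agent.
            other = rng.randrange(n - 1)
            if other >= agent:
                other += 1
            external[agent] = external[other]
        elif u < copy_rate + externalize_rate:
            # Externalize: publicly express the concealed opinion.
            external[agent] = internal[agent]
        else:
            # Internalize: accept the public opinion as inner conviction.
            internal[agent] = external[agent]
    return None


events = concealed_voter_events_to_consensus(
    n=10, copy_rate=1.0, externalize_rate=0.5, internalize_rate=0.5,
    rng=random.Random(0),
)
print(f"Consensus reached after {events} events")
```

The sketch counts events rather than continuous time. Averaging the result over many seeds, for different values of the three rates, gives a rough numerical impression of how concealment affects the time to consensus.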

### Cargo shipping

P. Kaluza, A. Kölzsch, M. T. Gastner and B. Blasius
The complex network of global cargo ship movements
J. Royal Soc. Interface 7(48):1093–1103 (2010)

The global network of merchant ships plays a crucial role in human mobility, the exchange of goods and the spread of invasive species. We use information about the itineraries of 16 363 cargo ships during the year 2007 to construct a network of links between ports. We show that bulk dry carriers, container ships and oil tankers differ in their mobility patterns and networks. Container ships follow regularly repeating paths whereas bulk dry carriers and oil tankers move less predictably between ports. The network of all ship movements possesses a heavy-tailed distribution with systematic differences between ship types.

### Diffusion cartograms

M. T. Gastner and M. E. J. Newman
Diffusion-based method for producing density-equalizing maps
Proc. Natl. Acad. Sci. U.S.A. 101(20):7499–7504 (2004)

Cartograms are maps in which the sizes of geographic regions (e.g. countries, provinces) appear in proportion to their population. Such maps are invaluable for data visualization. Unfortunately, to scale regions and still have them fit together, one is normally forced to distort the regions’ shapes, potentially resulting in maps that are difficult to read. Here we present a technique based on ideas borrowed from elementary physics that suffers from none of these drawbacks.

## Teaching

### YCC1122: Quantitative Reasoning

This “Common Curriculum” course aims to develop the students’ skills in logical and statistical reasoning so that they become critical and informed readers of quantitative data. The course applies the pedagogy of Team-based Learning to ensure that students who bring diverse talents and backgrounds to the course can learn together and from one another.

Students learn to criticise and question empirical claims, support them with logical arguments and address real-life problems by gathering and visually representing quantitative data. The course teaches quantitative literacy so that students grasp how algorithmic and statistical thinking is used in the natural and social sciences.

### YSC2210: Data Analysis and Visualization (DAVis) with R

This course teaches how to use the programming language R for analyzing and presenting statistical data. Starting from the fundamentals of R (data types, flow control), students learn how to write their own R scripts and functions. They learn how to extract data from web sites and bring the input into a shape (e.g. using regular expressions) that is suitable for further analysis.

Much of the course focuses on R’s graphics features, including network representations and geographic maps. The objective is to present data in ways that are informative, elegant and fun (e.g. as short animated video clips).

Example of a visualization project in DAVis: An animation of Singapore's age pyramid between 1960 and 2017. (Data from Data.gov.sg.)

### YSC3216: Stochastic Processes and Models (SPaM)

What do stock markets, the weather, genetic mutations and the movements of a drunkard have in common? All these phenomena are subject to a certain degree of randomness. Such “stochastic processes” are a vibrant area of interdisciplinary research, ranging from mathematical finance through biology to predicting waiting times in supermarket queues.

In this course, students learn the mathematics behind the most common models of stochastic processes: Markov chains, Poisson and renewal processes, queuing theory. Students learn how to prove the most important mathematical results and apply them to realistic problems.

### YSC4208: Monte Carlo Simulations in Science and Statistics (MoCaSinSS)

Monte Carlo simulations are computer experiments that solve numerical problems by using random number generators. At first glance, it may seem bizarre to use a computer, arguably the most accurate and deterministic of all human inventions, to perform random experiments. However, Monte Carlo simulations are nowadays an essential component in many quantitative studies. They are used in the natural sciences, industrial engineering, finance and statistics.

This course teaches how to write elegant and efficient Monte Carlo simulations for concrete real-world examples. Students also learn the theoretical foundations of pseudorandom number generators, Markov chain Monte Carlo and the Metropolis-Hastings algorithm.

Example of a Monte Carlo simulation. “Buffon's needle” is an experiment to estimate the value of $$\pi$$ = 3.14... We randomly drop needles onto a floor with parallel strips (vertical lines at the top). Needles that cross at least one of the strips are coloured blue. Needles that fall between two strips are shown in green. By counting the numbers of blue and green needles, we can obtain an estimate of $$\pi$$. The more needles we drop, the more accurate the estimate (plotted at the bottom).
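
A minimal Python sketch of Buffon's needle is shown below. It assumes that the needle is no longer than the spacing between neighbouring lines, in which case the probability of a crossing is $$2\ell/(\pi d)$$ for needle length $$\ell$$ and line spacing $$d$$. The function name `estimate_pi_buffon` and the default parameters are illustrative choices. To avoid using the value of $$\pi$$ when drawing a random direction, the sketch picks a random point in the unit disc by rejection sampling.

```python
import math
import random


def estimate_pi_buffon(num_needles, rng, needle_length=1.0, line_spacing=1.0):
    """Estimate pi by dropping needles on a floor with parallel lines.

    Assumes needle_length <= line_spacing.
    """
    if needle_length > line_spacing:
        raise ValueError("needle_length must not exceed line_spacing")
    crossings = 0
    for _ in range(num_needles):
        # Distance from the needle's centre to the nearest line.
        centre = rng.uniform(0.0, line_spacing / 2)
        # Random direction: uniform point in the unit disc (rejection sampling).
        while True:
            x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
            r2 = x * x + y * y
            if 0.0 < r2 <= 1.0:
                break
        sin_angle = abs(y) / math.sqrt(r2)  # sine of the angle to the lines
        if centre <= (needle_length / 2) * sin_angle:
            crossings += 1
    if crossings == 0:
        raise ValueError("no crossings observed; drop more needles")
    # P(crossing) = 2 * length / (pi * spacing), solved for pi.
    return 2 * needle_length * num_needles / (line_spacing * crossings)


estimate = estimate_pi_buffon(200_000, random.Random(42))
print(f"Estimate of pi from 200000 needles: {estimate:.4f}")
```

The statistical error of the estimate shrinks roughly in proportion to the inverse square root of the number of needles, so each additional correct digit requires about a hundred times more needles.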

## Contact

Michael T. Gastner
Yale-NUS College, Division of Science
16 College Avenue West, #01-220 Singapore 138527
michael.gastner@yale-nus.edu.sg

My office is RC3-02-05L in Cendana College (2nd floor).