11. Mosaic Plots

Mosaic plots are a convenient way to arrange multiple plots in a single figure, including subplots that span several rows or columns. Let’s see how they work.

11.1. Housing data

For this example, we will use the California housing data from scikit-learn.

[1]:
from sklearn.datasets import fetch_california_housing

X, y = fetch_california_housing(return_X_y=True, as_frame=True)
[2]:
X.shape, y.shape
[2]:
((20640, 8), (20640,))

11.2. Mosaic plot by string configuration

You can configure a mosaic plot with a string in which each character is the code for a subplot and each line is one row of the grid. Here, we want 3 subplots coded a, b and c. Note how a and b each take up one row and one column while c takes up two rows and one column.

[3]:
import matplotlib.pyplot as plt
import seaborn as sns

fig = plt.figure(layout='constrained', figsize=(5, 3.5))
ax = fig.subplot_mosaic('''
ac
bc
''')

X['HouseAge'].plot(kind='kde', ax=ax['a'])
sns.kdeplot(X['HouseAge'], ax=ax['b'], cumulative=True, color='g')
X['HouseAge'].plot(kind='box', ax=ax['c'])

ax['a'].set_title('PDF')
ax['b'].set_title('CDF')
[3]:
Text(0.5, 1.0, 'CDF')
_images/plot-mosaic_5_1.png
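
To make the layout described above concrete, here is a minimal sketch that builds the same a, b, c mosaic without any data and inspects what subplot_mosaic returns. The Agg backend and the variable names labels and spans are assumptions added for illustration only.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig = plt.figure(layout='constrained', figsize=(5, 3.5))
ax = fig.subplot_mosaic('''
ac
bc
''')

# subplot_mosaic returns a dict that maps each code to its Axes
labels = sorted(ax.keys())

# Each Axes records how many grid rows and columns it spans
spans = {}
for label, axes in ax.items():
    spec = axes.get_subplotspec()
    spans[label] = (len(spec.rowspan), len(spec.colspan))

print(labels)
print(spans)
```

The spans dict holds a (rows, columns) pair per code, so c should report two rows and one column while a and b report one of each.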

11.3. Mosaic plot by list configuration

We can also define the subplots more meaningfully by using names placed into a matrix (a list of lists). Instead of a, b and c, we now use pdf, cdf and box, respectively.

[4]:
fig = plt.figure(layout='constrained', figsize=(5, 3.5))
ax = fig.subplot_mosaic([
    ['pdf', 'box'],
    ['cdf', 'box']
])

X['HouseAge'].plot(kind='kde', ax=ax['pdf'])
sns.kdeplot(X['HouseAge'], ax=ax['cdf'], cumulative=True, color='g')
X['HouseAge'].plot(kind='box', ax=ax['box'])

ax['pdf'].set_title('PDF')
ax['cdf'].set_title('CDF')
[4]:
Text(0.5, 1.0, 'CDF')
_images/plot-mosaic_7_1.png
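
The list configuration can be combined with grid options. As a hedged sketch, the block below passes gridspec_kw to subplot_mosaic so that the left column is twice as wide as the box column. The 2:1 ratio and the Agg backend are assumptions chosen for illustration, not something the example above uses.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig = plt.figure(layout='constrained', figsize=(5, 3.5))
ax = fig.subplot_mosaic(
    [
        ['pdf', 'box'],
        ['cdf', 'box'],
    ],
    # gridspec_kw is forwarded to the underlying GridSpec
    gridspec_kw={'width_ratios': [2, 1]},
)

# Read the ratios back from the GridSpec shared by all subplots
grid = ax['box'].get_subplotspec().get_gridspec()
ratios = list(grid.get_width_ratios())
print(ratios)
```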

11.4. Fancy example

Here’s a fancy example of a mosaic plot where we make a scatter plot for each pair of features. The plots in the upper triangle color each point according to y: red if the point's y is above the mean of y, blue otherwise. The plots in the lower triangle color each point with a different criterion: blue if the point's y is at least one standard deviation below the mean of y, red otherwise. The cells on the diagonal are left empty.

[5]:
def get_key(r, c, x_col, y_col):
    if r == c:
        return '.'
    if c < r:
        return f'{x_col}_{y_col}_lower'
    return f'{x_col}_{y_col}_upper'

def do_plot(r, c, x_col, y_col, ax):
    k = get_key(r, c, x_col, y_col)
    if '.' == k:
        return
    elif c < r:
        X[y > y.mean() - y.std()].plot(kind='scatter', y=x_col, x=y_col, ax=ax[k], s=1, color='r')
        X[y <= y.mean() - y.std()].plot(kind='scatter', y=x_col, x=y_col, ax=ax[k], s=1, color='b')
    else:
        X[y > y.mean()].plot(kind='scatter', x=x_col, y=y_col, ax=ax[k], s=1, color='r')
        X[y <= y.mean()].plot(kind='scatter', x=x_col, y=y_col, ax=ax[k], s=1, color='b')

mosaic = [[get_key(r, c, x_col, y_col) for c, y_col in enumerate(X.columns)] for r, x_col in enumerate(X.columns)]

fig = plt.figure(layout='constrained', figsize=(20, 20))
ax = fig.subplot_mosaic(mosaic)

_ = [[do_plot(r, c, x_col, y_col, ax) for c, y_col in enumerate(X.columns)] for r, x_col in enumerate(X.columns)]
/opt/anaconda3/lib/python3.9/site-packages/pandas/plotting/_matplotlib/core.py:1114: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
  scatter = ax.scatter(
_images/plot-mosaic_9_1.png
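
To see what the layout matrix in this example looks like, here is a small sketch that applies the same get_key rule to three of the housing columns. The subset cols and the name layout are assumptions for illustration; the full example uses all columns of X. The '.' entries are matplotlib's default marker for an empty cell, which is why the diagonal stays blank.

```python
def get_key(r, c, x_col, y_col):
    # Same rule as in the example: blank diagonal, other cells named by triangle
    if r == c:
        return '.'
    if c < r:
        return f'{x_col}_{y_col}_lower'
    return f'{x_col}_{y_col}_upper'

# Assumption: a three-column subset keeps the matrix small enough to read
cols = ['MedInc', 'HouseAge', 'AveRooms']

layout = [
    [get_key(r, c, x_col, y_col) for c, y_col in enumerate(cols)]
    for r, x_col in enumerate(cols)
]

for row in layout:
    print(row)
```

Every non-diagonal key is unique, so each cell becomes its own subplot rather than merging with a neighbor.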