Boston House Price Mapping

An introduction to mapping the Boston house price dataset with ReNom TDA.

In this tutorial, we visualize the Boston house price dataset. You will learn the following:

  • How to analyse topology.


In [1]:
import numpy as np

from sklearn.cluster import DBSCAN
from sklearn.datasets import load_boston

from renom.tda.topology import Topology
from renom.tda.lens import PCA

Import the Boston house price dataset

Next, we load the Boston house price data with the load_boston function included in the scikit-learn package. Note that load_boston was removed in scikit-learn 1.2, so this tutorial requires an earlier version.

The dataset consists of 506 samples, each with 13 feature columns.

The 13 columns and the target value are as follows.

CRIM - per capita crime rate by town

ZN - proportion of residential land zoned for lots over 25,000 sq.ft.

INDUS - proportion of non-retail business acres per town.

CHAS - Charles River dummy variable (1 if tract bounds river; 0 otherwise)

NOX - nitric oxides concentration (parts per 10 million)

RM - average number of rooms per dwelling

AGE - proportion of owner-occupied units built prior to 1940

DIS - weighted distances to five Boston employment centres

RAD - index of accessibility to radial highways

TAX - full-value property-tax rate per $10,000

PTRATIO - pupil-teacher ratio by town

B - 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town

LSTAT - % lower status of the population

target - median value of owner-occupied homes in $1000's

In [2]:
bos = load_boston()
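target = bos.target
# Append the target as an extra column so that it also shapes the topology.
data = np.concatenate([bos.data, bos.target.reshape(-1, 1)], axis=1)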
data = (data - np.mean(data, axis=0)) / np.std(data, axis=0)
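
The last line of the cell above standardizes every column (z-score), so that features on large scales such as TAX do not dominate the distances. A minimal sketch of the same operation on a made-up toy matrix (the values are illustrative, not taken from the dataset):

```python
import numpy as np

# Toy matrix: 4 samples, 2 features on very different scales (made-up values).
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0],
              [4.0, 400.0]])

# Column-wise z-score: subtract each column's mean, divide by its standard deviation.
Z = (X - np.mean(X, axis=0)) / np.std(X, axis=0)

print(Z.mean(axis=0))  # approximately 0 for every column
print(Z.std(axis=0))   # approximately 1 for every column
```

After standardization the two toy columns become identical, because they differ only by a scale factor.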

Create topology instance

In [3]:
topology = Topology()

Create point cloud

In [4]:
metric = None
lens = [PCA(components=[0,1])]
topology.fit_transform(data, metric=metric, lens=lens)
projected by PCA.
finish fit_transform.
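
The PCA lens with components=[0,1] projects the data onto its first two principal components, which gives a 2-D point cloud. The sketch below illustrates that kind of projection with a plain NumPy SVD. It is not ReNom TDA's implementation, and the random matrix is only a stand-in for the standardized dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(506, 14))   # stand-in for the standardized data (assumption)
X = X - X.mean(axis=0)           # PCA works on centered columns

# Rows of Vt are the principal axes, ordered by decreasing singular value.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
point_cloud = X @ Vt[[0, 1]].T   # keep components 0 and 1

print(point_cloud.shape)  # (506, 2)
```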

Mapping to topological space

In [5]:
clusterer = DBSCAN(eps=25, min_samples=1)
# The resolution value did not survive in the source; supply your own.
topology.map(resolution=..., overlap=0.5, clusterer=clusterer)
mapping start, please wait...
created 304 nodes.
calculating cluster coordination.
calculating edge.
created 870 edges.
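
The mapping step follows the general Mapper idea: cover the 2-D point cloud with overlapping cells, cluster the original data inside each cell, turn every cluster into a node, and connect nodes that share data points. The sketch below is a simplified illustration on made-up data, not ReNom TDA's code. The function name mapper_sketch and the way overlap widens each cell are assumptions made for this example.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def mapper_sketch(data, point_cloud, resolution, overlap, clusterer):
    """Return (nodes, edges); each node is an array of row indices into data."""
    lo = point_cloud.min(axis=0)
    hi = point_cloud.max(axis=0)
    step = (hi - lo) / resolution   # side lengths of a non-overlapping cell
    pad = step * overlap            # assumed: each cell is widened by this on every side

    nodes = []
    for i in range(resolution):
        for j in range(resolution):
            cell_lo = lo + step * np.array([i, j]) - pad
            cell_hi = lo + step * np.array([i + 1, j + 1]) + pad
            inside = np.all((point_cloud >= cell_lo) & (point_cloud <= cell_hi), axis=1)
            idx = np.flatnonzero(inside)
            if idx.size == 0:
                continue
            # Cluster the original high-dimensional rows that fall in this cell.
            labels = clusterer.fit_predict(data[idx])
            for label in np.unique(labels):
                if label != -1:     # -1 marks DBSCAN noise points
                    nodes.append(idx[labels == label])

    # Two nodes are linked when they share at least one data point.
    edges = [(a, b)
             for a in range(len(nodes))
             for b in range(a + 1, len(nodes))
             if np.intersect1d(nodes[a], nodes[b]).size > 0]
    return nodes, edges

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 5))    # made-up data
point_cloud = data[:, :2]           # stand-in for the 2-D lens output
nodes, edges = mapper_sketch(data, point_cloud, resolution=5, overlap=0.5,
                             clusterer=DBSCAN(eps=25, min_samples=1))
print(len(nodes), len(edges))
```

With min_samples=1 and an eps as large as 25 on standardized data, DBSCAN puts all points of a cell into one cluster, so each non-empty cell yields exactly one node.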

Color topology & show

In [6]:
# fig_size=(10, 10) is an assumption: only the trailing "10)" of that argument survives in the source.
print("colored by target value.")
topology.color(target, dtype="numerical", ctype="rgb")
topology.show(fig_size=(10, 10), node_size=5, edge_width=1, mode=None, strength=None)

for i in range(len(bos.feature_names)):
    print("colored by %s." % bos.feature_names[i])
    topology.color(data[:, i], dtype="numerical", ctype="rgb")
    topology.show(fig_size=(10, 10), node_size=5, edge_width=1, mode=None, strength=None)
colored by target value.
colored by CRIM.
colored by ZN.
colored by INDUS.
colored by CHAS.
colored by NOX.
colored by RM.
colored by AGE.
colored by DIS.
colored by RAD.
colored by TAX.
colored by PTRATIO.
colored by B.
colored by LSTAT.


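These graphs suggest that the Boston house price is correlated with RM, PTRATIO, and LSTAT: the colorings by these features follow a pattern similar to the coloring by the target value.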
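
A visual impression like this can be cross-checked numerically with Pearson correlation coefficients. The sketch below is one possible way to do so. The helper correlations_with_target is a hypothetical name introduced for this example, and the data is made up so that the snippet runs on its own.

```python
import numpy as np

def correlations_with_target(features, target, names):
    """Pearson correlation of each feature column with the target."""
    return {name: float(np.corrcoef(features[:, i], target)[0, 1])
            for i, name in enumerate(names)}

# Made-up data: "A" drives the target, "B" is unrelated noise.
rng = np.random.default_rng(0)
features = rng.normal(size=(500, 2))
target = 3.0 * features[:, 0] + rng.normal(scale=0.1, size=500)

corr = correlations_with_target(features, target, ["A", "B"])
print(corr)
```

With the variables of this tutorial, the call would be correlations_with_target(bos.data, bos.target, bos.feature_names). Features whose coefficient is far from zero are the ones expected to show a color pattern similar to the target.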