Boston House Price Mapping

An introduction to mapping the Boston house price dataset with ReNom TDA.

In this tutorial, we visualize the Boston house price dataset. You will learn the following:

  • How to analyse topology.

Requirements

In [1]:
import numpy as np

from sklearn.cluster import DBSCAN
from sklearn.datasets import load_boston

from renom.tda.topology import Topology
from renom.tda.lens import PCA

Import Boston house price dataset

Next, we load the Boston house price data using the load_boston function included in the scikit-learn package (this function is available in scikit-learn versions before 1.2).

The dataset consists of 506 samples, each with 13 feature columns.

The 13 feature columns and the target value are as follows.

CRIM - per capita crime rate by town

ZN - proportion of residential land zoned for lots over 25,000 sq.ft.

INDUS - proportion of non-retail business acres per town.

CHAS - Charles River dummy variable (1 if tract bounds river; 0 otherwise)

NOX - nitric oxides concentration (parts per 10 million)

RM - average number of rooms per dwelling

AGE - proportion of owner-occupied units built prior to 1940

DIS - weighted distances to five Boston employment centres

RAD - index of accessibility to radial highways

TAX - full-value property-tax rate per $10,000

PTRATIO - pupil-teacher ratio by town

B - 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town

LSTAT - % lower status of the population

target - median value of owner-occupied homes in $1000's

In [2]:
bos = load_boston()
target = bos.target
data = np.concatenate([bos.data, bos.target.reshape(-1,1)], axis=1)
data = (data - np.mean(data, axis=0)) / np.std(data, axis=0)
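
The last line of the cell above standardizes every column to zero mean and unit standard deviation, so that columns measured in large units (TAX, for example) do not dominate the distances between points. A minimal sketch of the same expression on a small made-up array (the values are illustrative and not taken from the dataset):

```python
import numpy as np

# Toy array: 4 rows, 2 columns on very different scales (made-up values).
toy = np.array([[1.0, 100.0],
                [2.0, 200.0],
                [3.0, 300.0],
                [4.0, 400.0]])

# Column-wise z-score, the same expression used in the cell above.
toy_std = (toy - np.mean(toy, axis=0)) / np.std(toy, axis=0)

# After standardization each column has mean 0 and standard deviation 1.
col_means = toy_std.mean(axis=0)
col_stds = toy_std.std(axis=0)
```

Because the two toy columns differ only by a scale factor, they become identical after standardization.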

Create topology instance

In [3]:
topology = Topology()

Create point cloud

In [4]:
metric = None
lens = [PCA(components=[0,1])]
topology.fit_transform(data, metric=metric, lens=lens)
projected by PCA.
finish fit_transform.
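
The lens above projects every sample onto principal components 0 and 1, turning each 14-dimensional row into a point in the plane. The sketch below shows that kind of projection with scikit-learn's own PCA on random stand-in data. It is not ReNom TDA's internal implementation, and any extra scaling ReNom may apply to the projected points is not reproduced here.

```python
import numpy as np
from sklearn.decomposition import PCA as SkPCA  # aliased to avoid clashing with renom's PCA lens

rng = np.random.default_rng(0)
# Stand-in for the standardized data: 50 rows, 14 columns of random values.
X = rng.normal(size=(50, 14))

# Keep the first two principal components (components 0 and 1).
projected = SkPCA(n_components=2).fit_transform(X)

# Each row of X is now a 2-D point; the first axis carries the most variance.
variances = projected.var(axis=0)
```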

Mapping to topological space

In [5]:
clusterer = DBSCAN(eps=25, min_samples=1)
topology.map(resolution=25, overlap=0.5, clusterer=clusterer)
mapping start, please wait...
created 304 nodes.
calculating cluster coordination.
calculating edge.
created 870 edges.
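
To make the roles of resolution, overlap, and clusterer more concrete, here is a minimal sketch of a Mapper-style construction. It assumes the lens range is cut into a grid of `resolution` cells per axis, each cell is widened on every side by `overlap` times the cell width, the points falling in each widened cell are clustered in the original space, and two clusters are joined by an edge when they share at least one point. The function name `mapper_sketch` and this exact definition of overlap are assumptions made for illustration; ReNom TDA's implementation may differ in detail.

```python
import itertools
import numpy as np
from sklearn.cluster import DBSCAN


def mapper_sketch(points, lens_values, resolution, overlap, clusterer):
    """Return (nodes, edges); each node is a set of row indices (hypothetical helper)."""
    lo = lens_values.min(axis=0)
    hi = lens_values.max(axis=0)
    step = (hi - lo) / resolution   # cell width along each lens axis
    pad = step * overlap            # widening applied to each side of a cell
    nodes = []
    # Visit every cell of the grid laid over the lens space.
    for cell in itertools.product(range(resolution), repeat=lens_values.shape[1]):
        start = lo + step * np.array(cell) - pad
        end = lo + step * (np.array(cell) + 1) + pad
        inside = np.all((lens_values >= start) & (lens_values <= end), axis=1)
        idx = np.flatnonzero(inside)
        if idx.size == 0:
            continue
        # Cluster the cell's points in the original (not projected) space.
        labels = clusterer.fit_predict(points[idx])
        for label in set(labels) - {-1}:   # -1 is DBSCAN's noise label
            nodes.append(set(idx[labels == label]))
    # Connect two nodes whenever they have at least one point in common.
    edges = [(a, b) for a, b in itertools.combinations(range(len(nodes)), 2)
             if nodes[a] & nodes[b]]
    return nodes, edges


# Usage on random stand-in data, with the first two coordinates as the lens.
rng = np.random.default_rng(0)
pts = rng.normal(size=(200, 5))
nodes, edges = mapper_sketch(pts, pts[:, :2], resolution=5, overlap=0.5,
                             clusterer=DBSCAN(eps=25, min_samples=1))
```

With min_samples=1 DBSCAN never labels a point as noise, so every sample ends up in at least one node.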

Color topology & show

In [6]:
print("colored by target value.")
topology.color(target, dtype="numerical", ctype="rgb")
topology.show(fig_size=(10, 10), node_size=5, edge_width=1, mode=None, strength=None)

for i in range(len(bos.feature_names)):
    print("colored by %s." % bos.feature_names[i])
    topology.color(data[:, i], dtype="numerical", ctype="rgb")
    topology.show(fig_size=(10, 10), node_size=5, edge_width=1, mode=None, strength=None)
colored by target value.
../../../_images/notebooks_tda-case-study_boston-house-price-mapping_notebook_12_1.png
colored by CRIM.
../../../_images/notebooks_tda-case-study_boston-house-price-mapping_notebook_12_3.png
colored by ZN.
../../../_images/notebooks_tda-case-study_boston-house-price-mapping_notebook_12_5.png
colored by INDUS.
../../../_images/notebooks_tda-case-study_boston-house-price-mapping_notebook_12_7.png
colored by CHAS.
../../../_images/notebooks_tda-case-study_boston-house-price-mapping_notebook_12_9.png
colored by NOX.
../../../_images/notebooks_tda-case-study_boston-house-price-mapping_notebook_12_11.png
colored by RM.
../../../_images/notebooks_tda-case-study_boston-house-price-mapping_notebook_12_13.png
colored by AGE.
../../../_images/notebooks_tda-case-study_boston-house-price-mapping_notebook_12_15.png
colored by DIS.
../../../_images/notebooks_tda-case-study_boston-house-price-mapping_notebook_12_17.png
colored by RAD.
../../../_images/notebooks_tda-case-study_boston-house-price-mapping_notebook_12_19.png
colored by TAX.
../../../_images/notebooks_tda-case-study_boston-house-price-mapping_notebook_12_21.png
colored by PTRATIO.
../../../_images/notebooks_tda-case-study_boston-house-price-mapping_notebook_12_23.png
colored by B.
../../../_images/notebooks_tda-case-study_boston-house-price-mapping_notebook_12_25.png
colored by LSTAT.
../../../_images/notebooks_tda-case-study_boston-house-price-mapping_notebook_12_27.png
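
One way to read these pictures is that each node's color summarizes the chosen column over the samples inside that node. The sketch below assumes the summary is the mean and that colors run linearly from blue (low) to red (high). Both choices, and the helper name `node_colors`, are assumptions for illustration and are not confirmed details of topology.color.

```python
import numpy as np


def node_colors(nodes, values):
    """Map each node to an (r, g, b) tuple from the mean of its members' values."""
    means = np.array([values[list(members)].mean() for members in nodes])
    span = means.max() - means.min()
    # Normalize to [0, 1]; if all means are equal, place every node at the midpoint.
    t = (means - means.min()) / span if span > 0 else np.full(len(means), 0.5)
    # Linear blend from blue (t = 0) to red (t = 1).
    return [(float(x), 0.0, float(1.0 - x)) for x in t]


# Three toy nodes over four samples (made-up values).
toy_nodes = [[0, 1], [1, 2], [3]]
toy_values = np.array([10.0, 20.0, 30.0, 50.0])
colors = node_colors(toy_nodes, toy_values)
```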

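Conclusion

These graphs suggest that the Boston house price correlates with RM, PTRATIO, and LSTAT: the colorings for these three features follow the same pattern across the topology as the coloring by target value.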
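
A visual reading like this can be cross-checked numerically. The sketch below ranks columns by the absolute Pearson correlation with a target. The helper name `rank_by_correlation` is hypothetical, and the demonstration uses synthetic data rather than the housing data, so it shows the method only and reports no result for the dataset.

```python
import numpy as np


def rank_by_correlation(features, target, names):
    """Return (name, r) pairs sorted by absolute Pearson correlation with target."""
    pairs = [(name, float(np.corrcoef(features[:, i], target)[0, 1]))
             for i, name in enumerate(names)]
    return sorted(pairs, key=lambda p: abs(p[1]), reverse=True)


# Synthetic demonstration: column "a" drives the target, column "b" is pure noise.
rng = np.random.default_rng(0)
a = rng.normal(size=300)
b = rng.normal(size=300)
y = 2.0 * a + 0.1 * rng.normal(size=300)
ranking = rank_by_correlation(np.column_stack([a, b]), y, ["a", "b"])
```

With the variables from this tutorial, the equivalent call would be rank_by_correlation(bos.data, bos.target, bos.feature_names).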