Boston House Price Mapping
An introduction to mapping the Boston house price dataset with ReNom TDA.
In this tutorial, we visualize the Boston house price dataset. You will learn:
- How to analyze the topology of a dataset.
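import numpy as np
from sklearn.cluster import DBSCAN
# load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this tutorial needs scikit-learn < 1.2.
from sklearn.datasets import load_boston
from renom.tda.topology import Topology
from renom.tda.lens import PCA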
Import the Boston house price dataset
Next, we load the Boston house price data. To do this, we use the load_boston function from the datasets
module included in the scikit-learn package.
The Boston house price dataset consists of 506 samples, each with 13 feature columns.
The 13 columns and the target value are:
CRIM - per capita crime rate by town
ZN - proportion of residential land zoned for lots over 25,000 sq.ft.
INDUS - proportion of non-retail business acres per town
CHAS - Charles River dummy variable (1 if tract bounds river; 0 otherwise)
NOX - nitric oxides concentration (parts per 10 million)
RM - average number of rooms per dwelling
AGE - proportion of owner-occupied units built prior to 1940
DIS - weighted distances to five Boston employment centres
RAD - index of accessibility to radial highways
TAX - full-value property-tax rate per $10,000
PTRATIO - pupil-teacher ratio by town
B - 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
LSTAT - percentage of lower status of the population
target - median value of owner-occupied homes in $1000s
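# bos.data has shape (506, 13); bos.target has shape (506,).
bos = load_boston()
target = bos.target
# Append the target as a 14th column so it is included in the point cloud.
data = np.concatenate([bos.data, bos.target.reshape(-1, 1)], axis=1)
# Standardize each column to zero mean and unit variance.
data = (data - np.mean(data, axis=0)) / np.std(data, axis=0)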
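The standardization step above applies z = (x - mean) / std separately to every column, including the appended target. After this step, features measured on very different scales (TAX in dollars, NOX in parts per 10 million) contribute comparably to distances. The sketch below isolates that step on a small made-up array. The function name `standardize` and the toy values are hypothetical.

```python
import numpy as np

def standardize(x: np.ndarray) -> np.ndarray:
    """Z-score each column: subtract the column mean, divide by the column std."""
    # np.std defaults to ddof=0 (population std), matching the tutorial code.
    return (x - np.mean(x, axis=0)) / np.std(x, axis=0)

# Hypothetical toy data: 4 samples, 2 columns on very different scales.
toy = np.array([[1.0, 100.0],
                [2.0, 200.0],
                [3.0, 300.0],
                [4.0, 400.0]])

z = standardize(toy)
print(z.mean(axis=0))  # approximately [0, 0]
print(z.std(axis=0))   # approximately [1, 1]
```

Both toy columns become identical after scaling, because they differ only in scale. A constant column would cause a division by zero.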
Create a topology instance
topology = Topology()
Create a point cloud
metric = None
# Use the first two principal components as the lens.
lens = [PCA(components=[0, 1])]
topology.fit_transform(data, metric=metric, lens=lens)
projected by PCA. finish fit_transform.
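With `PCA(components=[0,1])` as the lens, each sample is projected onto its first two principal components, which yields the 2-D point cloud used by the mapping step. This tutorial does not show how ReNom computes the lens internally. The sketch below is a plain NumPy equivalent that assumes the lens is standard PCA: center the data, then project onto the top right singular vectors. The helper `pca_lens` and the random input matrix are hypothetical.

```python
import numpy as np

def pca_lens(x: np.ndarray, components=(0, 1)) -> np.ndarray:
    """Project rows of x onto the selected principal components (hypothetical helper)."""
    centered = x - x.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[list(components)].T

rng = np.random.default_rng(0)
# Hypothetical stand-in for the standardized 506 x 14 Boston matrix.
x = rng.normal(size=(506, 14))
projected = pca_lens(x)
print(projected.shape)  # (506, 2)
```

The first projected column always has at least as much variance as the second, and the two columns are uncorrelated.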
Mapping to topological space
# Clusterer used to form the nodes of the graph.
clusterer = DBSCAN(eps=25, min_samples=1)
topology.map(resolution=25, overlap=0.5, clusterer=clusterer)
mapping start, please wait...
created 304 nodes.
calculating cluster coordination.
calculating edge.
created 870 edges.
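The parameters of `map` (resolution, overlap, clusterer) match those of the standard Mapper construction, which this sketch assumes ReNom implements. The construction works in four steps:

1. Each lens axis is split into `resolution` overlapping intervals.
2. The points in each resulting hypercube are clustered.
3. Every cluster becomes a node.
4. Two nodes are joined by an edge when they share a sample.

The interval layout below is one common convention and may differ from ReNom's. Clustering is written out as connected components of the eps-neighbourhood graph, which is what DBSCAN with `min_samples=1` reduces to. By default the sketch clusters the lens coordinates; the tutorial does not state which space ReNom clusters in. All function names are hypothetical.

```python
import itertools
import numpy as np

def intervals(lo, hi, resolution, overlap):
    """Split [lo, hi] into `resolution` intervals; neighbours share `overlap` of a width."""
    width = (hi - lo) / ((resolution - 1) * (1 - overlap) + 1)
    step = width * (1 - overlap)
    # Snap the last upper bound to hi so round-off cannot drop the maximum point.
    return [(lo + i * step, hi if i == resolution - 1 else lo + i * step + width)
            for i in range(resolution)]

def eps_components(points, eps):
    """Connected components of the eps-neighbourhood graph (DBSCAN with min_samples=1)."""
    labels = [-1] * len(points)
    current = 0
    for start in range(len(points)):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            dists = np.linalg.norm(points - points[i], axis=1)
            for j in np.flatnonzero(dists <= eps):
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

def mapper_graph(lens, resolution, overlap, eps, points=None):
    """Return (nodes, edges); each node is a frozenset of sample indices."""
    points = lens if points is None else points
    axes = [intervals(lens[:, d].min(), lens[:, d].max(), resolution, overlap)
            for d in range(lens.shape[1])]
    nodes = []
    for cube in itertools.product(*axes):  # one interval per lens axis
        inside = np.ones(len(lens), dtype=bool)
        for d, (a, b) in enumerate(cube):
            inside &= (lens[:, d] >= a) & (lens[:, d] <= b)
        idx = np.flatnonzero(inside)
        if idx.size == 0:
            continue
        labels = eps_components(points[idx], eps)
        for lab in set(labels):
            nodes.append(frozenset(int(i) for i, lbl in zip(idx, labels) if lbl == lab))
    # Connect two nodes whenever their member sets overlap.
    edges = [(a, b) for a, b in itertools.combinations(range(len(nodes)), 2)
             if nodes[a] & nodes[b]]
    return nodes, edges

# Hypothetical 1-D lens values for five samples.
lens = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
nodes, edges = mapper_graph(lens, resolution=2, overlap=0.5, eps=1.0)
print(nodes)  # [frozenset({0, 1, 2}), frozenset({2, 3, 4})]
print(edges)  # [(0, 1)]
```

Sample 2 falls in both overlapping intervals, so the two nodes share it and are joined by an edge. This is how the overlap parameter creates the edges reported in the output above. The pairwise-distance loop is quadratic per hypercube, which is acceptable for a few hundred samples.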
Color the topology and show it
print("colored by target value.")
topology.color(target, dtype="numerical", ctype="rgb")
topology.show(fig_size=(10, 10), node_size=5, edge_width=1, mode=None, strength=None)

# Repeat the coloring for each of the 13 feature columns.
for i, name in enumerate(bos.feature_names):
    print("colored by %s." % name)
    topology.color(data[:, i], dtype="numerical", ctype="rgb")
    topology.show(fig_size=(10, 10), node_size=5, edge_width=1, mode=None, strength=None)
colored by target value.
colored by CRIM.
colored by ZN.
colored by INDUS.
colored by CHAS.
colored by NOX.
colored by RM.
colored by AGE.
colored by DIS.
colored by RAD.
colored by TAX.
colored by PTRATIO.
colored by B.
colored by LSTAT.
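Each call to `topology.color` turns one value per sample into one color per node. The tutorial does not document ReNom's exact rule, so the following is an assumption: average the values of the samples in each node, rescale the averages to [0, 1], and map them onto a color ramp. The helper `node_colors` and its linear blue-to-red ramp are hypothetical illustrations, not the actual `ctype="rgb"` colormap.

```python
import numpy as np

def node_colors(nodes, values):
    """Average `values` over each node's members and map to (r, g, b) in [0, 1]."""
    means = np.array([np.mean([values[i] for i in members]) for members in nodes])
    span = means.max() - means.min()
    t = (means - means.min()) / span if span > 0 else np.zeros_like(means)
    # Low values -> blue (0, 0, 1), high values -> red (1, 0, 0).
    return np.stack([t, np.zeros_like(t), 1 - t], axis=1)

nodes = [{0, 1}, {1, 2}, {3}]                # hypothetical node memberships
prices = np.array([10.0, 20.0, 30.0, 50.0])  # hypothetical per-sample target values
colors = node_colors(nodes, prices)
print(colors)
```

Averaging over node members means a node's color summarizes every sample it contains. Regions of the graph with similar colors therefore group samples with similar values of the chosen column.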
These graphs show that the Boston house price is correlated with RM, PTRATIO, and LSTAT.
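This visual reading can be cross-checked numerically by computing the Pearson correlation between each feature column and the target. The sketch assumes the layout built earlier, with feature columns followed by the target as the last column. Because the real data is not reproduced here, a small synthetic matrix with made-up column names stands in for it, so the printed numbers say nothing about the Boston dataset. The helper `feature_target_correlations` is hypothetical.

```python
import numpy as np

def feature_target_correlations(data, names):
    """Pearson r between each named feature column and the last column (the target)."""
    target = data[:, -1]
    return {name: float(np.corrcoef(data[:, i], target)[0, 1])
            for i, name in enumerate(names)}

rng = np.random.default_rng(0)
n = 506
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# Hypothetical target: rises with x1, falls with x2, plus noise.
y = 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.5, size=n)
data = np.column_stack([x1, x2, y])

corr = feature_target_correlations(data, ["x1", "x2"])
for name, r in sorted(corr.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {r:+.2f}")
```

Standardization does not change Pearson correlation, so it makes no difference whether the target column is standardized. Correlation captures only linear, one-feature-at-a-time relationships. The topological graph can expose non-linear structure that these coefficients miss.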