Boston House Price Mapping

An introduction to mapping the Boston house price dataset with ReNom TDA.

In this tutorial, we visualize the Boston house price dataset. You will learn the following:

  • How to map and analyse the topology of a dataset.

Requirements

In [1]:
import numpy as np

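# Note: load_boston was removed in scikit-learn 1.2, so this notebook needs an earlier scikit-learn version.
from sklearn.datasets import load_boston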

from renom_tda.topology import Topology
from renom_tda.lens import PCA

Import the Boston house price dataset

Next, we load the Boston house price data. To do this, we use the load_boston function included in the scikit-learn package.

The Boston house price dataset consists of 506 samples, each with 13 feature columns.

The 13 feature columns and the target value are as follows:

CRIM - per capita crime rate by town

ZN - proportion of residential land zoned for lots over 25,000 sq.ft.

INDUS - proportion of non-retail business acres per town.

CHAS - Charles River dummy variable (1 if tract bounds river; 0 otherwise)

NOX - nitric oxides concentration (parts per 10 million)

RM - average number of rooms per dwelling

AGE - proportion of owner-occupied units built prior to 1940

DIS - weighted distances to five Boston employment centres

RAD - index of accessibility to radial highways

TAX - full-value property-tax rate per $10,000

PTRATIO - pupil-teacher ratio by town

B - 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town

LSTAT - % lower status of the population

target - median value of owner-occupied homes in $1000s (MEDV)

In [2]:
bos = load_boston()
target = bos.target
data = bos.data

Create a topology instance

In [3]:
topology = Topology()

Load data

In [4]:
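# Names of the 13 feature columns, in the order returned by load_boston
columns = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE", "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]
# standardize=True rescales each column before the analysis
topology.load_data(data, number_data_columns=columns, standardize=True)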
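Standardization matters here because the columns live on very different scales (TAX is in the hundreds, NOX is below 1), so without it the large-valued columns would dominate any distance computation. The sketch below assumes that standardize=True means per-column z-scoring; ReNom TDA may differ in details such as using the sample rather than the population standard deviation. zscore_columns is a hypothetical helper, not part of ReNom.

```python
import numpy as np


def zscore_columns(x):
    """Hypothetical helper: rescale each column to mean 0 and standard
    deviation 1 (population std). Constant columns become all zeros
    instead of triggering a division by zero."""
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # avoid dividing by zero
    return (x - mean) / std
```

Applied to the notebook's array, zscore_columns(data) would give every column a comparable spread before the lens and clustering steps.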

Create point cloud

In [5]:
metric = None
# Use the first two principal components as the lens
lens = [PCA(components=[0,1])]
topology.fit_transform(metric=metric, lens=lens)
projected by PCA.
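The PCA lens turns the 13-dimensional data into a 2-D point cloud. The sketch below assumes that PCA(components=[0,1]) keeps the first two principal components, counted from 0 in order of decreasing explained variance; ReNom's own implementation may differ in details such as the sign of each axis. pca_project is a hypothetical helper built on NumPy's SVD.

```python
import numpy as np


def pca_project(x, components=(0, 1)):
    """Hypothetical helper: project the rows of x onto the selected
    principal components, ordered by decreasing explained variance."""
    centered = x - x.mean(axis=0)
    # Rows of vt are the principal axes, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[list(components)].T
```

The result has one row per sample and one column per selected component, so the two columns can serve directly as the coordinates of the point cloud.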

Mapping to topological space

In [6]:
topology.map(resolution=25, overlap=0.5, eps=0.3, min_samples=1)
created 275 nodes.
created 711 edges.
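The map step follows the general Mapper recipe: cover the lens space with overlapping hypercubes, cluster the samples that fall in each hypercube, turn each cluster into a node, and connect nodes that share samples. The sketch below is a minimal illustration under stated assumptions, not ReNom's implementation: it assumes that resolution is the number of intervals per lens axis, that overlap is the fraction of an interval's width shared with its neighbour, and that clustering uses DBSCAN-style eps on the original data. With min_samples=1 every point is a core point, so DBSCAN reduces to connected components of the "within eps" graph, which is what eps_components computes. All function names here are hypothetical.

```python
import itertools
import numpy as np


def cover_intervals(lo, hi, resolution, overlap):
    """Split [lo, hi] into `resolution` equal intervals, then widen each so
    that neighbouring intervals share `overlap` times the base width."""
    width = (hi - lo) / resolution
    pad = width * overlap / 2
    starts = lo + width * np.arange(resolution)
    return [(s - pad, s + width + pad) for s in starts]


def eps_components(points, eps):
    """Connected components of the graph linking points within distance eps.
    This matches DBSCAN with min_samples=1, where every point is a core point."""
    n = len(points)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    labels = np.full(n, -1)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first flood fill of one component
            i = stack.pop()
            for j in np.flatnonzero((dists[i] <= eps) & (labels < 0)):
                labels[j] = current
                stack.append(j)
        current += 1
    return labels


def mapper(data, lens, resolution, overlap, eps):
    """Return (nodes, edges): nodes are sets of sample indices and edges
    link pairs of nodes that share at least one sample."""
    axes = [cover_intervals(lens[:, k].min(), lens[:, k].max(), resolution, overlap)
            for k in range(lens.shape[1])]
    nodes = []
    for cube in itertools.product(*axes):  # one hypercube per interval combination
        mask = np.ones(len(data), dtype=bool)
        for k, (a, b) in enumerate(cube):
            mask &= (lens[:, k] >= a) & (lens[:, k] <= b)
        idx = np.flatnonzero(mask)
        if idx.size == 0:
            continue
        labels = eps_components(data[idx], eps)  # cluster in the original space
        for c in np.unique(labels):
            nodes.append(set(idx[labels == c].tolist()))
    edges = {(i, j) for i, j in itertools.combinations(range(len(nodes)), 2)
             if nodes[i] & nodes[j]}
    return nodes, edges
```

Calling mapper on standardized data with a hypothetical 2-D lens array lens_2d, using resolution=25, overlap=0.5 and eps=0.3, mirrors the topology.map call above, but the node and edge counts need not match ReNom's 275 nodes and 711 edges, since its cover and clustering details may differ. Because the intervals overlap, one sample can belong to several nodes, and those shared samples are exactly what produce the edges.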

Color topology & show

In [13]:
print("colored by target values.")
topology.color(target, color_method="mean", color_type="rgb")
topology.show(fig_size=(10,10), node_size=10, edge_width=0.5)
colored by target values.
[Figure: topology colored by target values]
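Each node contains several samples, so a per-sample value such as the target has to be summarized into a single color per node. The sketch below assumes that color_method="mean" averages the values of the samples in each node; the linear blue-to-red ramp in to_rgb is a hypothetical stand-in, not ReNom's actual colormap. Nodes are represented as sets of sample indices, and both helpers are hypothetical.

```python
import numpy as np


def node_means(nodes, values):
    """Average the per-sample values over the samples in each node."""
    return np.array([values[list(members)].mean() for members in nodes])


def to_rgb(means):
    """Hypothetical linear ramp: lowest node mean -> blue, highest -> red.
    If all means are equal, every node is drawn blue."""
    span = means.max() - means.min()
    t = (means - means.min()) / span if span > 0 else np.zeros_like(means)
    return np.stack([t, np.zeros_like(t), 1.0 - t], axis=1)
```

For example, nodes [{0, 1}, {2}] with per-sample values [1.0, 3.0, 10.0] have means 2.0 and 10.0, so the first node is drawn blue and the second red.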
In [12]:
print("colored by RM values.")
# Column 5 of the feature data is RM (average number of rooms per dwelling)
topology.color(topology.number_data[:, 5], color_method="mean", color_type="rgb")
topology.show(fig_size=(10,10), node_size=10, edge_width=0.5)
colored by RM values.
[Figure: topology colored by RM values]
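The two images above show nearly the same color pattern: nodes with a high average RM tend to be the nodes with a high average target value.
This suggests that RM is positively correlated with the target value.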
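The visual comparison can be backed up with a number. A simple check is the Pearson correlation between each feature column and the target, computed per sample rather than per node. pearson_by_column is a hypothetical helper built on np.corrcoef. Pearson correlation is unchanged by z-score standardization, so it does not matter whether the raw or the standardized columns are used.

```python
import numpy as np


def pearson_by_column(data, target, columns):
    """Hypothetical helper: Pearson correlation between each feature column
    and the target, keyed by column name."""
    return {name: float(np.corrcoef(data[:, k], target)[0, 1])
            for k, name in enumerate(columns)}
```

With the notebook's variables, pearson_by_column(data, target, columns)["RM"] gives the correlation between RM and the median house value; the printed value is not reproduced here.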

Search from values

In [10]:
# Search condition: samples whose target value is greater than 24
search_dicts = [{
    "data_type": "number",
    "operator": ">",
    "column": "target",
    "value": 24
}]

topology.color(target, color_method="mean", color_type="rgb")
node_index = topology.search_from_values(search_dicts=search_dicts, target=target, search_type="column")
topology.show(fig_size=(10,10), node_size=10, edge_width=0.5)
[Figure: topology colored by target values after searching for target > 24]
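The search condition can be read as a filter over samples followed by a lookup of the nodes those samples belong to. The sketch below shows one plausible reading of search_from_values: select the samples whose value satisfies the operator, then return every node that contains at least one of them. Whether ReNom uses this "any member matches" rule or, for example, compares node means is an assumption, and the operator table and both helpers are hypothetical.

```python
import operator
import numpy as np

# Hypothetical operator table mirroring the "operator" field of a search dict.
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


def matching_samples(values, op, threshold):
    """Indices of the samples whose value satisfies `op threshold`."""
    return np.flatnonzero(OPS[op](values, threshold))


def nodes_containing(nodes, sample_idx):
    """Indices of the nodes (sets of sample indices) that contain at least
    one of the matching samples."""
    wanted = set(sample_idx.tolist())
    return [i for i, members in enumerate(nodes) if members & wanted]
```

For the search above, matching_samples(target, ">", 24) would select the houses with a median value above 24 (that is, $24,000), and nodes_containing would map them onto the topology.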