# Boston House Price Mapping

An introduction to mapping the Boston house price dataset with ReNom TDA.

In this tutorial, we visualize the Boston house price dataset. You will learn the following:

• How to analyze the topology of a dataset.

## Requirements

In [1]:

import numpy as np
from sklearn.datasets import load_boston

from renom_tda.topology import Topology
from renom_tda.lens import PCA


## Import the Boston house price dataset

Next, we load the Boston house price data using the `load_boston` function from scikit-learn's `sklearn.datasets` module. Note that `load_boston` was removed in scikit-learn 1.2, so this cell requires an earlier version.

The dataset consists of 506 samples, each with 13 feature columns.

The 13 feature columns and the target value are as follows.

CRIM - per capita crime rate by town

ZN - proportion of residential land zoned for lots over 25,000 sq.ft.

INDUS - proportion of non-retail business acres per town.

CHAS - Charles River dummy variable (1 if tract bounds river; 0 otherwise)

NOX - nitric oxides concentration (parts per 10 million)

RM - average number of rooms per dwelling

AGE - proportion of owner-occupied units built prior to 1940

DIS - weighted distances to five Boston employment centres

RAD - index of accessibility to radial highways

TAX - full-value property-tax rate per \$10,000

PTRATIO - pupil-teacher ratio by town

B - 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town

LSTAT - percentage of lower-status population

target - median value of owner-occupied homes in \$1000s

In [2]:

bos = load_boston()
target = bos.target
data = bos.data


## Create topology instance

In [3]:

topology = Topology()


In [4]:

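columns = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE", "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]
# Assumption: this loading step is missing from the original cell; check the signature against your ReNom TDA version.
topology.load_data(data, number_data_columns=columns, standardize=True)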
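
Later cells pick a single feature out of the loaded array by its position, so it helps to confirm which position a column name maps to. A minimal sketch using only the list above (the variable name `rm_index` is introduced here for illustration):

```python
columns = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
           "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]

# Look up the zero-based position of a feature by its name.
rm_index = columns.index("RM")
print(rm_index)  # 5
```

This is the index used when a later cell slices column 5 to color the topology by RM.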


## Create point cloud

In [5]:

metric = None
lens = [PCA(components=[0,1])]
topology.fit_transform(metric=metric, lens=lens)

projected by PCA.
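
To make the projection step more concrete, the following is a rough sketch of what a two-component PCA lens computes, written with plain NumPy. It is not ReNom TDA's implementation: the function name `pca_lens`, the standardization step, and the random stand-in data are all assumptions made for illustration.

```python
import numpy as np

def pca_lens(data, components=(0, 1)):
    """Standardize the columns, then project onto the chosen principal components."""
    data = np.asarray(data, dtype=float)
    std = data.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    z = (data - data.mean(axis=0)) / std
    # Rows of vt are the principal axes, ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return z @ vt[list(components)].T

# Synthetic stand-in with the same shape as the Boston data (506 rows, 13 columns).
rng = np.random.default_rng(0)
demo = rng.normal(size=(506, 13))

point_cloud = pca_lens(demo)
print(point_cloud.shape)  # (506, 2)
```

Each row of the result is one sample placed in a two-dimensional plane, which is the point cloud that the mapping step then covers and clusters.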


## Mapping to topological space

In [6]:

topology.map(resolution=25, overlap=0.5, eps=0.3, min_samples=1)

created 275 nodes.
created 711 edges.
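
The idea behind the mapping step can be sketched in one dimension. In a generic Mapper-style construction, the projected values are covered by `resolution` overlapping intervals, the points in each interval become nodes, and two nodes are joined by an edge when they share a point. The sketch below is a simplification under stated assumptions: it treats each interval as a single node (a real Mapper clusters inside each interval, which is presumably what `eps` and `min_samples` control), and the way `overlap` widens an interval is one possible convention, not necessarily the one ReNom TDA uses. All function names are hypothetical.

```python
def cover_intervals(lo, hi, resolution, overlap):
    """Split [lo, hi] into `resolution` intervals, each widened by `overlap` of its width."""
    width = (hi - lo) / resolution
    pad = width * overlap
    return [(lo + i * width - pad, lo + (i + 1) * width + pad)
            for i in range(resolution)]

def members(values, interval):
    """Indices of the values that fall inside the interval."""
    a, b = interval
    return [i for i, v in enumerate(values) if a <= v <= b]

# Made-up one-dimensional lens values.
values = [0.0, 0.1, 0.24, 0.26, 0.5, 0.74, 0.9, 1.0]

intervals = cover_intervals(0.0, 1.0, resolution=4, overlap=0.5)
bins = [members(values, iv) for iv in intervals]

# Two nodes are connected when they share at least one point.
edges = [(i, j)
         for i in range(len(bins))
         for j in range(i + 1, len(bins))
         if set(bins[i]) & set(bins[j])]

print(bins)   # [[0, 1, 2, 3], [2, 3, 4], [4, 5], [5, 6, 7]]
print(edges)  # [(0, 1), (1, 2), (2, 3)]
```

A higher resolution gives more, smaller intervals and therefore more nodes; a larger overlap makes neighboring intervals share more points and therefore produces more edges.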


## Color topology & show

In [13]:

print("colored by target values.")
topology.color(target, color_method="mean", color_type="rgb")
topology.show(fig_size=(10,10), node_size=10, edge_width=0.5)

colored by target values.
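
As a rough illustration of coloring by mean, the sketch below assigns each node the average of the values of the points it contains and rescales the result to the range 0 to 1 so it can be passed to a color map. This is an assumption about what `color_method="mean"` amounts to, not a description of ReNom TDA's internals; the helper names and the sample data are made up.

```python
def node_means(node_members, values):
    """Average the values of the points that belong to each node."""
    return [sum(values[i] for i in m) / len(m) for m in node_members]

def normalize(xs):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(xs), max(xs)
    span = (hi - lo) or 1.0  # guard against all-equal input
    return [(x - lo) / span for x in xs]

# Made-up example: three nodes over four points.
node_members = [[0, 1], [1, 2], [3]]
values = [10.0, 20.0, 30.0, 50.0]

means = node_means(node_members, values)
scaled = normalize(means)
print(means)  # [15.0, 25.0, 50.0]
```

Nodes with a high scaled value would be drawn at one end of the color scale and nodes with a low value at the other.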

In [12]:

print("colored by RM values.")
topology.color(topology.number_data[:, 5], color_method="mean", color_type="rgb")
topology.show(fig_size=(10,10), node_size=10, edge_width=0.5)

colored by RM values.

The two images above show nearly the same color pattern, which suggests that RM values are correlated with the target values.
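
A visual match between two colorings can be checked numerically with a Pearson correlation coefficient. The sketch below is a minimal pure-Python version; the function name `pearson` and the five sample values are made up for illustration and are not taken from the dataset.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Made-up values standing in for room counts and house prices.
rm_demo = [5.0, 6.0, 6.5, 7.0, 8.0]
price_demo = [15.0, 21.0, 24.0, 30.0, 45.0]

r = pearson(rm_demo, price_demo)
print(round(r, 3))
```

With the real data, the same function could be called as `pearson(data[:, 5], target)`; a value close to 1 would support the visual impression.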

## Search from values

In [10]:

search_dicts = [{
    "data_type": "number",
    "operator": ">",
    "column": "target",
    "value": 24
}]

topology.color(target, color_method="mean", color_type="rgb")
node_index = topology.search_from_values(search_dicts=search_dicts, target=target, search_type="column")
topology.show(fig_size=(10,10), node_size=10, edge_width=0.5)
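
The search condition above reads as "target greater than 24". The sketch below shows one way such a list of condition dictionaries could be evaluated against a table of columns. It only illustrates the idea: the helper `matching_rows`, the set of supported operators, and the sample values are assumptions, not ReNom TDA's search logic.

```python
import operator

# Comparison operators this sketch understands (a hypothetical set).
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
}

def matching_rows(search_dicts, table):
    """Return the indices of rows that satisfy every condition (logical AND)."""
    n_rows = len(next(iter(table.values())))
    hits = []
    for i in range(n_rows):
        if all(OPERATORS[d["operator"]](table[d["column"]][i], d["value"])
               for d in search_dicts):
            hits.append(i)
    return hits

search_dicts = [{
    "data_type": "number",
    "operator": ">",
    "column": "target",
    "value": 24,
}]

# Made-up target values for four rows.
demo_table = {"target": [20.0, 24.0, 25.0, 30.0]}

rows = matching_rows(search_dicts, demo_table)
print(rows)  # [2, 3]
```

Only rows strictly above the threshold match, so the row with a target of exactly 24 is excluded.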