# How to Cluster a Point Cloud

An introduction to clustering point cloud data.

In this tutorial, we visualize the iris dataset as a point cloud and cluster it.

• How to cluster a point cloud.

## Requirements

```python
import numpy as np

import matplotlib.pyplot as plt

from sklearn.datasets import load_iris
from sklearn.cluster import KMeans, DBSCAN
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

from renom_tda.topology import Topology
from renom_tda.lens import PCA
```

## Dataset

Next, we load the iris dataset using the `load_iris` function from scikit-learn's `sklearn.datasets` module.

The iris dataset consists of 150 samples, each with 4 features (columns) and a label for one of 3 species.

```python
iris = load_iris()

data = iris.data
target = iris.target
```
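To confirm the shape described above, here is a quick check that uses scikit-learn alone, independent of ReNom TDA:

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
data, target = iris.data, iris.target

# 150 samples (rows) x 4 features (columns)
print(data.shape)
# Names of the 4 feature columns as provided by scikit-learn
print(iris.feature_names)
# Number of samples per class label (0, 1, 2)
print(np.bincount(target).tolist())
```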


## Define a topology instance

Next, we create a `Topology` instance and load the iris data into it with `load_data`.

```python
topology = Topology()
topology.load_data(data)
```


## Create a point cloud

Next, we create a point cloud by projecting the data onto a 2- or 3-dimensional space.

The `fit_transform` method performs the projection and takes two parameters, `metric` and `lens`. The metric specifies how the distance between data points is measured, and the lens specifies the axes of the projected space.

This tutorial uses `metric=None` and a PCA lens that keeps the first two principal components (`components=[0, 1]`).

```python
metric = None
lens = [PCA(components=[0, 1])]
topology.fit_transform(metric=metric, lens=lens)
```

Output:

```
projected by PCA.
```
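Conceptually, a PCA lens maps each 4-dimensional sample to its coordinates along the leading principal axes. The NumPy sketch below shows that projection on synthetic data. `pca_project` is a hypothetical helper, and it is an assumption that renom_tda's `PCA` lens computes this same standard projection internally.

```python
import numpy as np


def pca_project(data, components=(0, 1)):
    """Project rows of `data` onto the selected principal components.

    Hypothetical plain-NumPy sketch of a PCA lens; not renom_tda's code.
    """
    centered = data - data.mean(axis=0)  # PCA operates on mean-centered data
    # Rows of vt are the principal axes, ordered by decreasing singular value
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axes = vt[list(components)]          # shape: (len(components), n_features)
    return centered @ axes.T             # shape: (n_samples, len(components))


# Synthetic 4-feature data with very different spreads per feature
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4)) * np.array([3.0, 1.0, 0.5, 0.1])
projected = pca_project(X, components=[0, 1])
print(projected.shape)
```

The first column always carries at least as much variance as the second, which is why components 0 and 1 give the most spread-out 2-D view.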


## Colorize the point cloud

Next, we color the point cloud by the iris class labels using the `color_point_cloud` method.

```python
topology.color_point_cloud(target, normalize=True)
```
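One plausible reading of `normalize=True`, stated here only as an assumption, is min-max scaling of the color values into the range [0, 1] so that each value can index a colormap. A minimal sketch of that scaling, with `normalize_values` as a hypothetical helper:

```python
import numpy as np


def normalize_values(values):
    """Min-max scale values into [0, 1] (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:  # constant input: avoid division by zero
        return np.zeros_like(values)
    return (values - values.min()) / span


# Iris-style class labels 0, 1, 2
target = np.array([0, 0, 1, 2, 2])
print(normalize_values(target).tolist())  # [0.0, 0.0, 0.5, 1.0, 1.0]
```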


## Unsupervised clustering of the point cloud

Next, we cluster the point cloud. With the `unsupervised_clustering_point_cloud` method, the `clusterer` argument is an instance of a clustering class that implements a `fit` method, such as scikit-learn's `KMeans`.

```python
clusterer = KMeans(n_clusters=3)
topology.unsupervised_clustering_point_cloud(clusterer=clusterer)
topology.show_point_cloud()
```
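To make the "object with a `fit` method" requirement concrete, here is a minimal k-means (Lloyd's algorithm) written from scratch in the scikit-learn style. `SimpleKMeans` is a hypothetical class for illustration. The `labels_` and `cluster_centers_` attribute names follow scikit-learn's convention, and whether renom_tda reads `labels_` from the fitted object is an assumption.

```python
import numpy as np


class SimpleKMeans:
    """Minimal k-means (Lloyd's algorithm) with a scikit-learn-style `fit`.

    Hypothetical illustration; use sklearn.cluster.KMeans in practice.
    """

    def __init__(self, n_clusters=3, max_iter=100, seed=0):
        self.n_clusters = n_clusters
        self.max_iter = max_iter
        self.seed = seed

    @staticmethod
    def _assign(X, centers):
        # Index of the nearest center for every point
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        return dists.argmin(axis=1)

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        rng = np.random.default_rng(self.seed)
        # Start from k distinct data points chosen at random
        centers = X[rng.choice(len(X), self.n_clusters, replace=False)]
        for _ in range(self.max_iter):
            labels = self._assign(X, centers)
            # Move each center to the mean of its points; keep it if it has none
            new_centers = np.array([
                X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
                for k in range(self.n_clusters)
            ])
            converged = np.allclose(new_centers, centers)
            centers = new_centers
            if converged:
                break
        self.cluster_centers_ = centers
        self.labels_ = self._assign(X, centers)
        return self


# Two well-separated blobs in 2-D, 20 points each
rng = np.random.default_rng(1)
points = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.1, size=(20, 2)),
    rng.normal(loc=(5.0, 5.0), scale=0.1, size=(20, 2)),
])
model = SimpleKMeans(n_clusters=2).fit(points)
print(model.labels_)
```

Note that the cluster IDs themselves (0 or 1) are arbitrary; only the grouping is meaningful.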


Another example, using DBSCAN:

```python
clusterer = DBSCAN(eps=0.1, min_samples=2)
topology.unsupervised_clustering_point_cloud(clusterer=clusterer)
topology.show_point_cloud()
```
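Unlike k-means, DBSCAN does not take a number of clusters. It groups points that have at least `min_samples` neighbors (counting the point itself) within distance `eps`, and labels points that belong to no dense region as noise with `-1`. The toy example below applies scikit-learn's `DBSCAN` directly to made-up 2-D points with the same parameters as above:

```python
import numpy as np
from sklearn.cluster import DBSCAN

points = np.array([
    [0.00, 0.00], [0.05, 0.00], [0.00, 0.05],  # dense group 1
    [1.00, 1.00], [1.05, 1.00], [1.00, 1.05],  # dense group 2
    [5.00, 5.00],                              # isolated point
])
labels = DBSCAN(eps=0.1, min_samples=2).fit(points).labels_
print(labels.tolist())  # [0, 0, 0, 1, 1, 1, -1]
```

Because `eps` is a distance, a sensible value depends on the scale of the projected coordinates; the same `eps` can give very different results on differently scaled data.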


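## Supervised clustering of the point cloud

With the `supervised_clustering_point_cloud` method, the `clusterer` argument is an instance of a classifier class that implements both `fit` and `predict` methods. The `target` argument supplies the labels, and `train_size=0.8` uses 80% of the samples (120 of 150) for training. The examples below use k-nearest neighbors, a support vector machine, and a random forest.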

```python
clusterer = KNeighborsClassifier(n_neighbors=3)
topology.supervised_clustering_point_cloud(clusterer=clusterer, target=target, train_size=0.8)
topology.show_point_cloud()
```

```python
clusterer = SVC()
topology.supervised_clustering_point_cloud(clusterer=clusterer, target=target, train_size=0.8)
topology.show_point_cloud()
```

```python
clusterer = RandomForestClassifier()
topology.supervised_clustering_point_cloud(clusterer=clusterer, target=target, train_size=0.8)
topology.show_point_cloud()
```
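Any object exposing `fit(X, y)` and `predict(X)` satisfies the interface described above. As a sketch, here is a hypothetical nearest-centroid classifier written in plain NumPy; `CentroidClassifier` is illustrative only, and it has not been tested with renom_tda.

```python
import numpy as np


class CentroidClassifier:
    """Hypothetical minimal classifier exposing `fit` and `predict`."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # One centroid per class: the mean of that class's training points
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Distance from every point to every centroid; pick the closest class
        dists = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[dists.argmin(axis=1)]


# Three labeled blobs of 50 points each, centered at (0,0), (3,3), (6,6)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=m, scale=0.2, size=(50, 2)) for m in (0.0, 3.0, 6.0)])
y = np.repeat([0, 1, 2], 50)

clf = CentroidClassifier().fit(X, y)
new_points = np.array([[0.1, -0.1], [2.9, 3.2], [6.1, 5.8]])
print(clf.predict(new_points).tolist())  # [0, 1, 2]
```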


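## Ground-truth labels

For comparison, we color the point cloud by the true class labels again and display it.

```python
topology.color_point_cloud(target, normalize=True)
topology.show_point_cloud()
```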
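Comparing the unsupervised results with this plot by eye works, but cluster IDs from KMeans or DBSCAN are arbitrary, so a cluster labeled 2 may correspond to species 0. scikit-learn's `adjusted_rand_score` measures agreement between two labelings regardless of how the IDs are permuted. The lists below are made up for illustration; retrieving cluster labels from the `Topology` object is not covered in this tutorial.

```python
from sklearn.metrics import adjusted_rand_score

true_labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
# Same grouping as true_labels, but with the cluster IDs permuted
permuted = [2, 2, 2, 0, 0, 0, 1, 1, 1]
# One point (index 2) assigned to the wrong group
one_wrong = [0, 0, 1, 1, 1, 1, 2, 2, 2]

print(adjusted_rand_score(true_labels, permuted))   # 1.0
print(adjusted_rand_score(true_labels, one_wrong))  # below 1.0
```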


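## Show the train and test indices

The sample indices used for training and testing in the supervised clustering step are stored in `train_index` and `test_index`. The split appears to be random, so your indices may differ.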

```python
topology.train_index
```

Output:

```
array([  0,   1,   2,   3,   6,   7,   8,   9,  11,  13,  14,  16,  17,
        18,  19,  20,  21,  23,  24,  25,  26,  27,  28,  30,  31,  32,
        33,  34,  35,  36,  38,  39,  40,  41,  42,  43,  45,  46,  47,
        49,  51,  52,  53,  55,  56,  57,  59,  60,  61,  63,  64,  65,
        66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,  78,
        79,  80,  81,  82,  83,  85,  87,  89,  90,  92,  93,  94,  95,
        96,  97,  98,  99, 100, 101, 103, 105, 106, 108, 109, 111, 112,
       113, 114, 115, 116, 117, 119, 120, 121, 122, 124, 125, 126, 127,
       128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 142,
       143, 144, 147])
```

```python
topology.test_index
```

Output:

```
array([  4,   5,  10,  12,  15,  22,  29,  37,  44,  48,  50,  54,  58,
        62,  84,  86,  88,  91, 102, 104, 107, 110, 118, 123, 140, 141,
       145, 146, 148, 149])
```
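With `train_size=0.8`, the 150 samples split into 120 training and 30 test indices, as the outputs above show. The sketch below reproduces that kind of split with NumPy; it is an assumption that renom_tda draws a random permutation in this way, and the actual indices will differ.

```python
import numpy as np

n_samples, train_size = 150, 0.8
rng = np.random.default_rng(0)
perm = rng.permutation(n_samples)
n_train = int(round(train_size * n_samples))  # 120 of 150 samples
train_index = np.sort(perm[:n_train])
test_index = np.sort(perm[n_train:])

print(len(train_index), len(test_index))  # 120 30
# The two index sets are disjoint and together cover every sample
assert np.intersect1d(train_index, test_index).size == 0
assert np.array_equal(np.union1d(train_index, test_index), np.arange(n_samples))
```

The same two checks can be applied to `topology.train_index` and `topology.test_index`.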