How to Cluster a Point Cloud

An introduction to clustering point cloud data.

In this tutorial, we visualize the iris dataset as a point cloud and cluster it.

  • How to cluster a point cloud.

Requirements

In [1]:
import numpy as np

import matplotlib.pyplot as plt

from sklearn.datasets import load_iris
from sklearn.cluster import KMeans, DBSCAN
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

from renom_tda.topology import Topology
from renom_tda.lens import PCA

Dataset

First, we load the iris dataset with the load_iris function from the scikit-learn package.

The iris dataset consists of 150 samples, each with 4 features.

In [2]:
iris = load_iris()

data = iris.data
target = iris.target
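As a quick sanity check of the dimensions stated above, the following sketch inspects the arrays returned by load_iris. It uses only scikit-learn and NumPy, and the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris

iris_check = load_iris()

# 150 samples with 4 features each.
print(iris_check.data.shape)

# The target holds one integer class label per sample.
class_labels = np.unique(iris_check.target)
class_counts = np.bincount(iris_check.target)
print(class_labels.tolist(), class_counts.tolist())
```

The three classes are balanced, which makes the dataset convenient for comparing clustering results against the true labels later on.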

Define topology instance

Next, we create a Topology instance.

In [3]:
topology = Topology()

Load data

Next, we load the data into the Topology instance with the load_data function.

In [4]:
topology.load_data(data)

Create point cloud

Next, we create a point cloud by projecting the data onto a 2- or 3-dimensional space.

We use the fit_transform function to project the data. It takes two parameters, metric and lens.

The metric defines how the distance between data points is measured. The lens defines the axes of the projected space.

This tutorial uses metric None and a PCA lens.

In [5]:
metric = None
lens = [PCA(components=[0, 1])]
topology.fit_transform(metric=metric, lens=lens)
projected by PCA.
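To make the idea of a PCA lens more concrete, here is a minimal NumPy sketch that projects data onto its first two principal components. It is not the renom_tda implementation (which may, for example, rescale the result); the helper name pca_project and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def pca_project(data, components=(0, 1)):
    """Project the rows of `data` onto the selected principal components."""
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal axes, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[list(components)].T

# Synthetic 4-column data standing in for the iris features.
rng = np.random.default_rng(0)
sample = rng.normal(size=(150, 4)) * np.array([3.0, 2.0, 1.0, 0.5])

point_cloud = pca_project(sample, components=(0, 1))
print(point_cloud.shape)
```

Each row of point_cloud is one data point expressed in the two directions of largest variance, which is what makes the result suitable for a 2D scatter plot.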

Colorize point cloud

Next, we color the point cloud using the color_point_cloud function.

In [6]:
topology.color_point_cloud(target, normalize=True)
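The normalize=True flag suggests that the values are rescaled before being mapped to colors. A plausible reading, sketched below as an assumption and not taken from the renom_tda source, is min-max scaling to the range [0, 1]. The helper name normalize_values is hypothetical.

```python
import numpy as np

def normalize_values(values):
    """Min-max scale values to [0, 1]; constant input maps to all zeros."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        return np.zeros_like(values)
    return (values - values.min()) / span

labels = np.array([0, 0, 1, 2, 2])
scaled = normalize_values(labels)
print(scaled.tolist())  # [0.0, 0.0, 0.5, 1.0, 1.0]
```

With the three iris classes, such a scaling would place each class at a distinct position on a color map.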

Unsupervised clustering point cloud

Next, we cluster the point cloud.
The unsupervised_clustering_point_cloud method takes a clusterer argument: an instance of a clustering class that provides a "fit" function.

In [7]:
clusterer = KMeans(n_clusters=3)
topology.unsupervised_clustering_point_cloud(clusterer=clusterer)
topology.show_point_cloud()
[Figure: point cloud colored by KMeans cluster (n_clusters=3)]
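Because the requirement is only that the clusterer provides a "fit" function, any object following the scikit-learn convention should in principle be usable. The sketch below shows a hypothetical minimal clusterer, TinyKMeans, that exposes fit and stores its result in labels_. Whether renom_tda reads labels_ in exactly this way is an assumption based on the scikit-learn convention, and the synthetic data is for illustration only.

```python
import numpy as np

class TinyKMeans:
    """Hypothetical minimal clusterer with a scikit-learn style fit/labels_ interface."""

    def __init__(self, n_clusters=3, n_iter=20, seed=0):
        self.n_clusters = n_clusters
        self.n_iter = n_iter
        self.seed = seed

    def _assign(self, X, centers):
        # Index of the nearest center for every point.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        return dists.argmin(axis=1)

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        rng = np.random.default_rng(self.seed)
        # Start from randomly chosen data points.
        centers = X[rng.choice(len(X), size=self.n_clusters, replace=False)]
        for _ in range(self.n_iter):
            labels = self._assign(X, centers)
            for k in range(self.n_clusters):
                if np.any(labels == k):
                    centers[k] = X[labels == k].mean(axis=0)
        self.cluster_centers_ = centers
        self.labels_ = self._assign(X, centers)
        return self

# Synthetic 2D points standing in for a projected point cloud.
rng = np.random.default_rng(1)
points = rng.normal(size=(90, 2))

tiny = TinyKMeans(n_clusters=3).fit(points)
print(tiny.labels_[:10])
```

In practice the scikit-learn classes shown in this tutorial are the better choice; the sketch only illustrates the interface.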

Another example, using DBSCAN:

In [8]:
clusterer = DBSCAN(eps=0.1, min_samples=2)
topology.unsupervised_clustering_point_cloud(clusterer=clusterer)
topology.show_point_cloud()
[Figure: point cloud colored by DBSCAN cluster (eps=0.1, min_samples=2)]
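Unlike KMeans, DBSCAN does not take a number of clusters. It groups points that lie within eps of each other and labels points outside any dense region as noise, which scikit-learn marks with -1. The toy coordinates below are made up for illustration; eps and min_samples match the cell above.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two tight groups and one isolated point.
toy_points = np.array([
    [0.0, 0.0], [0.0, 0.05], [0.05, 0.0],
    [1.0, 1.0], [1.0, 1.05],
    [5.0, 5.0],
])

toy_labels = DBSCAN(eps=0.1, min_samples=2).fit(toy_points).labels_
print(toy_labels.tolist())  # [0, 0, 0, 1, 1, -1]
```

When many points end up with the label -1, increasing eps is a common first adjustment.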

Supervised clustering point cloud

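The supervised_clustering_point_cloud method takes a clusterer argument: an instance of a classifier class that provides "fit" and "predict" functions. The target argument supplies the labels, and train_size=0.8 requests that 80% of the samples be used for training.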

In [9]:
clusterer = KNeighborsClassifier(n_neighbors=3)
topology.supervised_clustering_point_cloud(clusterer=clusterer, target=target, train_size=0.8)
topology.show_point_cloud()
[Figure: point cloud colored by KNeighborsClassifier prediction (n_neighbors=3)]

In [10]:
clusterer = SVC()
topology.supervised_clustering_point_cloud(clusterer=clusterer, target=target, train_size=0.8)
topology.show_point_cloud()
[Figure: point cloud colored by SVC prediction]

In [11]:
clusterer = RandomForestClassifier()
topology.supervised_clustering_point_cloud(clusterer=clusterer, target=target, train_size=0.8)
topology.show_point_cloud()
[Figure: point cloud colored by RandomForestClassifier prediction]
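The supervised step can be pictured as: split the samples into a training and a test part, fit the classifier on the training part, and predict labels for the rest. The sketch below illustrates that flow with NumPy only. It is not the renom_tda implementation; NearestNeighbor and split_indices are hypothetical helpers, and the data is synthetic.

```python
import numpy as np

class NearestNeighbor:
    """Hypothetical 1-nearest-neighbor classifier with fit and predict."""

    def fit(self, X, y):
        self.X_ = np.asarray(X, dtype=float)
        self.y_ = np.asarray(y)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        dists = np.linalg.norm(X[:, None, :] - self.X_[None, :, :], axis=2)
        return self.y_[dists.argmin(axis=1)]

def split_indices(n_samples, train_size=0.8, seed=0):
    """Return sorted, disjoint train and test index arrays."""
    order = np.random.default_rng(seed).permutation(n_samples)
    n_train = int(round(n_samples * train_size))
    return np.sort(order[:n_train]), np.sort(order[n_train:])

# Three well-separated synthetic groups of 50 points each.
rng = np.random.default_rng(1)
group_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
y = np.repeat(np.arange(3), 50)
X = group_centers[y] + rng.normal(scale=0.5, size=(150, 2))

train_idx, test_idx = split_indices(len(X), train_size=0.8)
model = NearestNeighbor().fit(X[train_idx], y[train_idx])
predicted = model.predict(X[test_idx])
accuracy = (predicted == y[test_idx]).mean()
print(len(train_idx), len(test_idx), accuracy)
```

Any scikit-learn classifier, such as the ones used in the cells above, offers the same fit and predict pair.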

True labels

In [12]:
topology.color_point_cloud(target, normalize=True)
topology.show_point_cloud()
[Figure: point cloud colored by the true iris labels]

Show train and test indices

In [13]:
topology.train_index
Out[13]:
array([  0,   1,   2,   3,   6,   7,   8,   9,  11,  13,  14,  16,  17,
        18,  19,  20,  21,  23,  24,  25,  26,  27,  28,  30,  31,  32,
        33,  34,  35,  36,  38,  39,  40,  41,  42,  43,  45,  46,  47,
        49,  51,  52,  53,  55,  56,  57,  59,  60,  61,  63,  64,  65,
        66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,  78,
        79,  80,  81,  82,  83,  85,  87,  89,  90,  92,  93,  94,  95,
        96,  97,  98,  99, 100, 101, 103, 105, 106, 108, 109, 111, 112,
       113, 114, 115, 116, 117, 119, 120, 121, 122, 124, 125, 126, 127,
       128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 142,
       143, 144, 147])
In [14]:
topology.test_index
Out[14]:
array([  4,   5,  10,  12,  15,  22,  29,  37,  44,  48,  50,  54,  58,
        62,  84,  86,  88,  91, 102, 104, 107, 110, 118, 123, 140, 141,
       145, 146, 148, 149])
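The two arrays above split the 150 sample indices into 120 training and 30 test indices, consistent with train_size=0.8. The sketch below checks this with NumPy, using the test indices copied from Out[14]; the variable names are illustrative.

```python
import numpy as np

# Test indices copied from Out[14] above.
test_index = np.array([
    4, 5, 10, 12, 15, 22, 29, 37, 44, 48, 50, 54, 58,
    62, 84, 86, 88, 91, 102, 104, 107, 110, 118, 123, 140, 141,
    145, 146, 148, 149,
])

all_index = np.arange(150)

# The training indices are the remaining sample indices.
remaining_index = np.setdiff1d(all_index, test_index)

print(len(remaining_index), len(test_index))  # 120 30
print(len(test_index) / len(all_index))       # 0.2
```

The complement computed here matches the train_index array shown in Out[13].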