# Search Categorical Data

An introduction to searching a topology.

In this tutorial, we visualize the iris dataset and search it by category. You will learn the following points:

- How to create a topology using the ReNom TDA module.
- How to search a topology with categorical data.

## Requirements

```
In [1]:
```

```
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import load_iris
from renom.tda.topology import SearchableTopology
from renom.tda.lens import PCA
```

## Dataset

Next, we load the iris dataset. To do this, we use the `load_iris` function included in the scikit-learn package.

The iris dataset consists of 150 samples, and each sample has 4 columns (features).

```
In [2]:
```

```
iris = load_iris()
data = iris.data
target = iris.target
```

## Create label data

We create the categorical data.

In this tutorial, we use the names of the iris species.

```
In [3]:
```

```
setosa = ["setosa"] * 50
versicolor = ["versicolor"] * 50
versinica = ["versinica"] * 50
species = setosa + versicolor + versinica
```
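The hand-built list above relies on the iris samples being stored grouped by class, 50 per species. As a minimal sketch of an alternative, the labels can be derived from the integer targets instead, so they stay aligned with the data whatever the row order is. The names `target_names` and `species_from_target` and the stand-in `target` list are illustrative assumptions; with the real dataset, `iris.target_names` and `iris.target` hold the same information. Note that this sketch uses the spelling "virginica" from scikit-learn, whereas the tutorial code writes the third label as "versinica".

```python
# Species names in the order scikit-learn's iris dataset encodes them (0, 1, 2).
target_names = ["setosa", "versicolor", "virginica"]

# Stand-in for iris.target: 50 samples per class, grouped by class.
target = [0] * 50 + [1] * 50 + [2] * 50

# Look up the name for each integer label.
species_from_target = [target_names[t] for t in target]

print(len(species_from_target))
print(species_from_target[::50])
```

Either spelling works for searching, as long as the string passed to the search matches the registered label.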

## Define topology instance

Next, we define a topology instance.

ReNom TDA has two types of topology class, Topology and SearchableTopology.

The Topology class has four basic functions: fit_transform, map, color and show.

The SearchableTopology class extends the Topology class. It adds three more functions: regist_categorical_data, search and get_hypercubes.

SearchableTopology can search data by its categorical data.

For more information, see the API reference.

```
In [4]:
```

```
topology = SearchableTopology()
```

## Register categorical data

Next, we register the categorical data with the topology instance.

This is the data we can search later.

```
In [5]:
```

```
topology.regist_categorical_data(np.array(species).reshape(-1, 1))
```

```
In [6]:
```

```
topology.categorical_data[::50]
```

```
Out[6]:
```

```
array([['setosa'],
       ['versicolor'],
       ['versinica']],
      dtype='<U10')
```
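The reshape and the `[::50]` slice used above can be reproduced with NumPy alone. This is a small sketch, independent of ReNom TDA, showing that `reshape(-1, 1)` turns the flat list of 150 labels into a single column, and that a stride of 50 picks the first row of each species block. The variable names `column` and `sample` are illustrative.

```python
import numpy as np

# Same labels as in the tutorial, including its spelling of the third species.
species = ["setosa"] * 50 + ["versicolor"] * 50 + ["versinica"] * 50

# -1 lets NumPy infer the row count, so the result has shape (150, 1).
column = np.array(species).reshape(-1, 1)
print(column.shape)

# Take every 50th row: rows 0, 50 and 100, one per species block.
sample = column[::50]
print(sample.ravel().tolist())
```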

## Create point cloud

Next, we create a point cloud that is projected onto a 2- or 3-dimensional space.

We use the fit_transform function to project the data, with two parameters: metric and lens.

Metric is how the distance between data points is measured. Lens is the axis of the projected space.

This tutorial uses metric None and lens PCA. This means dimensionality reduction with ordinary PCA.

```
In [7]:
```

```
metric = None
lens = [PCA(components=[0, 1])]
topology.fit_transform(data, metric=metric, lens=lens)
```

```
projected by PCA.
finish fit_transform.
```
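To make the projection step more concrete, here is a minimal NumPy sketch of what projecting onto the first two principal components means. It assumes that `PCA(components=[0, 1])` selects the first and second principal axes; ReNom TDA may additionally scale or normalize the result, so this illustrates the idea rather than the library's implementation. The function `pca_project` and the random `toy` data are hypothetical stand-ins.

```python
import numpy as np

def pca_project(data, components=(0, 1)):
    """Project the rows of `data` onto the selected principal components."""
    # PCA works on mean-centered data.
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[list(components)].T

# Toy data with the same shape as the iris feature matrix (150 x 4).
rng = np.random.default_rng(0)
toy = rng.normal(size=(150, 4))

projected = pca_project(toy)
print(projected.shape)
```

Each row of `projected` is one point of the 2-dimensional point cloud, and the first column carries at least as much variance as the second.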

## Mapping to topological space

Next, we create the topology.

We use the map function to map the point cloud to a topological space.

We set three parameters: resolution, overlap and clusterer.

Resolution is the number of divisions. It affects the number of nodes.

Overlap controls how easily nodes connect to each other.

Clusterer is the clustering method applied to the data inside each node.

```
In [8]:
```

```
clusterer = DBSCAN(eps=0.5, min_samples=3)
topology.map(resolution=15, overlap=0.5, clusterer=clusterer)
```

```
mapping start, please wait...
created 64 nodes.
calculating cluster coordination.
calculating edge.
created 160 edges.
```
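The roles of resolution and overlap can be illustrated with a one-dimensional cover. The sketch below shows one common way to build overlapping intervals in a Mapper-style algorithm; the exact formula ReNom TDA uses is not stated in this tutorial, so the padding rule here is an assumption, and `cover_intervals` and `containing` are hypothetical helper names.

```python
def cover_intervals(low, high, resolution, overlap):
    """Split [low, high] into `resolution` intervals, each widened by `overlap`."""
    step = (high - low) / resolution
    pad = step * overlap  # assumed rule: pad each side by a fraction of the step
    return [
        (low + i * step - pad, low + (i + 1) * step + pad)
        for i in range(resolution)
    ]

def containing(point, intervals):
    """Return the indices of all intervals that contain `point`."""
    return [i for i, (lo, hi) in enumerate(intervals) if lo <= point <= hi]

overlapping = cover_intervals(0.0, 1.0, resolution=4, overlap=0.5)
disjoint = cover_intervals(0.0, 1.0, resolution=4, overlap=0.0)

print(containing(0.3, overlapping))
print(containing(0.3, disjoint))
```

With overlap, the point 0.3 falls into two neighbouring intervals, so the clusters built from those intervals share data and can be joined by an edge. Without overlap it belongs to only one interval.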

## Color topology

Next, we color the topology using the color function.

In this tutorial, the topology is colored by the iris label values.

We can choose whether dtype is categorical or numerical, and whether ctype is rgb or gray.

```
In [9]:
```

```
topology.color(target, dtype="categorical", ctype="rgb", normalized=False)
topology.show(fig_size=(10, 10), node_size=10, edge_width=2)
```

## Search node & show

Next, we search for nodes that contain “setosa” data.

Nodes that contain “setosa” are colored by target value.

```
In [10]:
```

```
topology.search("setosa")
topology.show(fig_size=(10, 10), node_size=10, edge_width=2)
```

```
setosa is in [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49] data.
setosa is in [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22] node.
```
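The two lines of output above list first the data indices and then the node indices that match. A minimal sketch of that two-step lookup, assuming an exact string match and a hypothetical list `nodes` that records which data indices each node contains, might look as follows. It does not reproduce ReNom TDA's internals, and the name `search_label` is illustrative.

```python
def search_label(value, labels, nodes):
    """Return (data indices, node indices) whose label equals `value`."""
    # Step 1: find the data points carrying the label.
    data_index = [i for i, label in enumerate(labels) if label == value]
    hits = set(data_index)
    # Step 2: keep every node that contains at least one of those points.
    node_index = [n for n, members in enumerate(nodes) if hits.intersection(members)]
    return data_index, node_index

# Toy example: six data points and four nodes with overlapping membership.
labels = ["setosa"] * 3 + ["versicolor"] * 3
nodes = [[0, 1], [1, 2], [2, 3], [4, 5]]  # hypothetical node contents

data_index, node_index = search_label("setosa", labels, nodes)
print(data_index)
print(node_index)
```

Here the node holding data points 2 and 3 is reported even though it also contains a versicolor sample, because a node matches as soon as any of its members does.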