# Baseball data mapping

An introduction to mapping baseball data using ReNom TDA.

In this tutorial, we visualize baseball data using the ReNom TDA module. You will learn the following:

• How to analyse the topology of a dataset.

## Requirement

In [1]:

import numpy as np

import pandas as pd

from renom_tda.topology import Topology
from renom_tda.lens import PCA


## Import baseball data

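We use the 2016 hitter stats from https://github.com/nyk510/baseball_dataset/tree/master/data and compute the sabermetric measures below from them.
Abbreviations: H = hits, 1B/2B/3B/HR = singles/doubles/triples/home runs, AB = at-bats, PA = plate appearances, BB = walks, HBP = hit by pitch, SF = sacrifice flies, K = strikeouts, SB = stolen bases, CS = caught stealing, DP = grounded into double plays, TB = total bases, AVG = batting average (H / AB).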
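• OPS (On-base Plus Slugging)

$$\mathrm{OPS} = \mathrm{OBP} + \mathrm{SLG}$$
$$\mathrm{OBP} = \frac{\mathrm{H} + \mathrm{BB} + \mathrm{HBP}}{\mathrm{AB} + \mathrm{BB} + \mathrm{HBP} + \mathrm{SF}}$$
$$\mathrm{SLG} = \frac{\mathrm{1B} + 2 \times \mathrm{2B} + 3 \times \mathrm{3B} + 4 \times \mathrm{HR}}{\mathrm{AB}}$$

• IsoP (Isolated Power)

$$\mathrm{IsoP} = \mathrm{SLG} - \mathrm{AVG}$$

• BABIP (Batting Average on Balls In Play)

$$\mathrm{BABIP} = \frac{\mathrm{H} - \mathrm{HR}}{\mathrm{AB} - \mathrm{K} - \mathrm{HR} + \mathrm{SF}}$$

• BB/K
• PA/K
• AB/HR
• SecA (Secondary Average)

$$\mathrm{SecA} = \frac{\mathrm{TB} - \mathrm{H} + \mathrm{BB} + \mathrm{SB} - \mathrm{CS}}{\mathrm{AB}}$$

• TA (Total Average)

$$\mathrm{TA} = \frac{\mathrm{TB} + \mathrm{BB} + \mathrm{HBP} + \mathrm{SB} - \mathrm{CS}}{\mathrm{AB} - \mathrm{H} + \mathrm{CS} + \mathrm{DP}}$$

• PS (Power-Speed Number)

$$\mathrm{PS} = \frac{2 \times \mathrm{HR} \times \mathrm{SB}}{\mathrm{HR} + \mathrm{SB}}$$

• RC27 (Runs Created per 27 outs)

$$\mathrm{RC} = \frac{(2.4C + A)(3C + B)}{9C} - 0.9C$$
$$A = \mathrm{H} + \mathrm{BB} + \mathrm{HBP} - \mathrm{CS} - \mathrm{DP}$$
$$B = \mathrm{TB} + 0.26\,(\mathrm{BB} + \mathrm{HBP}) + 0.53\,\mathrm{SF} + 0.64\,\mathrm{SB} - 0.03\,\mathrm{K}$$
$$C = \mathrm{AB} + \mathrm{BB} + \mathrm{HBP} + \mathrm{SF}$$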
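The formulas above translate directly into code. The sketch below computes them for one hitter from raw counting stats. The dictionary keys (H, 2B, AB, PA, and so on) are hypothetical and may not match the column names in hitter_metrics.csv; PA is taken as an input rather than derived. It stops at RC, because the section does not spell out how RC is rescaled to 27 outs.

```python
def sabermetrics(s):
    """Compute the metrics listed above from one hitter's counting stats.

    `s` is a dict keyed by hypothetical abbreviations (H, 2B, 3B, HR, AB, PA,
    BB, HBP, SF, K, SB, CS, DP). Ratios with a zero denominator become None.
    """
    def ratio(num, den):
        return num / den if den else None

    singles = s["H"] - s["2B"] - s["3B"] - s["HR"]
    tb = singles + 2 * s["2B"] + 3 * s["3B"] + 4 * s["HR"]  # total bases
    avg = ratio(s["H"], s["AB"])
    obp = ratio(s["H"] + s["BB"] + s["HBP"], s["AB"] + s["BB"] + s["HBP"] + s["SF"])
    slg = ratio(tb, s["AB"])

    # A, B, C terms of the Runs Created formula above
    a = s["H"] + s["BB"] + s["HBP"] - s["CS"] - s["DP"]
    b = tb + 0.26 * (s["BB"] + s["HBP"]) + 0.53 * s["SF"] + 0.64 * s["SB"] - 0.03 * s["K"]
    c = s["AB"] + s["BB"] + s["HBP"] + s["SF"]

    return {
        "OPS": obp + slg if obp is not None and slg is not None else None,
        "IsoP": slg - avg if slg is not None and avg is not None else None,
        "BABIP": ratio(s["H"] - s["HR"], s["AB"] - s["K"] - s["HR"] + s["SF"]),
        "BB/K": ratio(s["BB"], s["K"]),
        "PA/K": ratio(s["PA"], s["K"]),
        "AB/HR": ratio(s["AB"], s["HR"]),
        "SecA": ratio(tb - s["H"] + s["BB"] + s["SB"] - s["CS"], s["AB"]),
        "TA": ratio(tb + s["BB"] + s["HBP"] + s["SB"] - s["CS"],
                    s["AB"] - s["H"] + s["CS"] + s["DP"]),
        "PS": ratio(2 * s["HR"] * s["SB"], s["HR"] + s["SB"]),
        "RC": (2.4 * c + a) * (3 * c + b) / (9 * c) - 0.9 * c if c else None,
    }


# Made-up season line, for illustration only
example = {"H": 150, "2B": 30, "3B": 2, "HR": 25, "AB": 500, "PA": 575, "BB": 60,
           "HBP": 5, "SF": 5, "K": 100, "SB": 10, "CS": 3, "DP": 10}
print(sabermetrics(example)["AB/HR"])  # 20.0
```

Returning None for zero denominators keeps players with, say, no home runs in the table instead of crashing the whole computation.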
In [2]:

file_path = "hitter_metrics.csv"
pdata = pd.read_csv(file_path)  # DataFrame used in the following cells


## Extract text data & number data

We separate the text columns, such as team name and player name, from the numeric columns.

In [3]:

# Text columns (e.g. team and player names) and numeric columns (the metrics)
text_data = np.array(pdata.select_dtypes(include="object"))
number_data = np.array(pdata.select_dtypes(include="number"))
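The cell above relies on pandas dtypes to tell text columns from numeric ones. For readers without the CSV at hand, this stdlib-only sketch mimics the idea on a tiny in-memory table: a column counts as numeric if every value parses as a float, otherwise it is text. The sample rows and column names are made up, and this is only a rough stand-in for pandas' dtype inference.

```python
import csv
import io


def split_columns(rows):
    """Split parsed CSV rows into (text_rows, number_rows).

    A column is numeric only if every one of its values can be parsed by float().
    """
    def is_number(value):
        try:
            float(value)
            return True
        except ValueError:
            return False

    n_cols = len(rows[0])
    numeric = [all(is_number(r[j]) for r in rows) for j in range(n_cols)]
    text = [[r[j] for j in range(n_cols) if not numeric[j]] for r in rows]
    number = [[float(r[j]) for j in range(n_cols) if numeric[j]] for r in rows]
    return text, number


sample = io.StringIO(
    "team,player,OPS,HR\n"
    "Team A,Player X,0.912,31\n"
    "Team B,Player Y,0.701,8\n"
)
reader = csv.reader(sample)
next(reader)  # skip the header row
text_rows, number_rows = split_columns(list(reader))
print(text_rows)    # [['Team A', 'Player X'], ['Team B', 'Player Y']]
print(number_rows)  # [[0.912, 31.0], [0.701, 8.0]]
```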


## Create topology instance

In [4]:

topology = Topology()


If you want to standardize the data, set the standardize argument to True.

In [5]:

topology.load_data(number_data, text_data=text_data, standardize=True)
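Standardization usually means rescaling each column to zero mean and unit variance (z-scores), so that metrics on very different scales, for example OPS versus AB/HR, contribute comparably to distances. We have not verified exactly which formula load_data applies; the stdlib sketch below shows the common z-score version with the population standard deviation, operating on a list of columns.

```python
from statistics import mean, pstdev


def standardize(columns):
    """Return z-scores (x - mean) / population std for each column.

    Constant columns (std == 0) map to all zeros to avoid dividing by zero.
    """
    result = []
    for col in columns:
        mu, sigma = mean(col), pstdev(col)
        result.append([(x - mu) / sigma if sigma else 0.0 for x in col])
    return result


z = standardize([[0.9, 0.7, 0.8], [30, 10, 20]])
print([round(v, 4) for v in z[1]])  # [1.2247, -1.2247, 0.0]
```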


## Create point cloud

In [6]:

metric = None  # no explicit distance metric is passed
lens = [PCA(components=[0, 1])]  # lens: first two principal components
topology.fit_transform(metric=metric, lens=lens)

projected by PCA.
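PCA(components=[0,1]) uses the first two principal components as the lens, so each player becomes a point in a 2D point cloud. The sketch below illustrates that projection in plain Python: it centers the data, builds the sample covariance matrix, and finds the leading eigenvectors by power iteration with deflation. It demonstrates the technique only and is not ReNom's implementation.

```python
import math
import random


def pca_project(rows, n_components=2, iters=500, seed=0):
    """Project rows onto their top principal components (illustration only).

    Eigenvectors of the sample covariance matrix are found by power iteration,
    deflating after each one; fine for small inputs, not for real workloads.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]

    def matvec(m, v):
        return [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]

    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.uniform(-1, 1) for _ in range(d)]
        for c in components:  # start orthogonal to components found so far
            dot = sum(a * b for a, b in zip(v, c))
            v = [a - dot * b for a, b in zip(v, c)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = matvec(cov, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:  # no variance left in the remaining directions
                break
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, matvec(cov, v)))  # Rayleigh quotient
        # Deflate: remove this component so the next search finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
        components.append(v)
    return [[sum(a * b for a, b in zip(r, c)) for c in components] for r in centered]


points = pca_project([[1, 2.1], [2, 3.9], [3, 6.2], [4, 7.8], [5, 10.1]])
print(len(points), len(points[0]))  # 5 2
```

The first projected coordinate carries the largest variance and the second the next largest, which is why the first two components give the most spread-out 2D view of the data.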


## Mapping to Topological Space

In [7]:

topology.map(resolution=25, overlap=0.7, eps=0.3, min_samples=1)

created 145 nodes.
created 457 edges.
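The map call builds a Mapper-style graph. Roughly: the lens space is covered by overlapping regions (controlled by resolution and overlap), the points in each region are clustered (the eps and min_samples names suggest DBSCAN-style clustering), every cluster becomes a node, and nodes that share a point are joined by an edge. The sketch below implements that idea for a 1D lens. The interval formula and the clustering rule are common textbook choices and are assumptions here, not ReNom's exact definitions.

```python
import math
from itertools import combinations


def cover_intervals(lo, hi, resolution, overlap):
    """Cover [lo, hi] with `resolution` equal intervals; neighbors share `overlap` of their length.

    Assumes hi > lo and 0 <= overlap < 1.
    """
    # Solve L + (resolution - 1) * L * (1 - overlap) = hi - lo for the length L.
    length = (hi - lo) / (1 + (resolution - 1) * (1 - overlap))
    step = length * (1 - overlap)
    intervals = [(lo + k * step, lo + k * step + length) for k in range(resolution)]
    intervals[-1] = (intervals[-1][0], hi)  # guard against rounding at the top end
    return intervals


def eps_clusters(members, points, eps):
    """Connected components of the graph joining points at distance <= eps.

    This is what DBSCAN reduces to with min_samples=1 (every point is a core point).
    """
    remaining = set(members)
    clusters = []
    while remaining:
        stack = [remaining.pop()]
        cluster = set(stack)
        while stack:
            i = stack.pop()
            near = {j for j in remaining if math.dist(points[i], points[j]) <= eps}
            remaining -= near
            cluster |= near
            stack.extend(near)
        clusters.append(cluster)
    return clusters


def mapper_1d(points, lens, resolution, overlap, eps):
    """Minimal Mapper graph for a 1D lens: returns (nodes, edges)."""
    nodes = []
    for a, b in cover_intervals(min(lens), max(lens), resolution, overlap):
        members = [i for i, f in enumerate(lens) if a <= f <= b]
        nodes.extend(eps_clusters(members, points, eps))
    # Two nodes are linked when they share at least one data point.
    edges = [(u, v) for u, v in combinations(range(len(nodes)), 2) if nodes[u] & nodes[v]]
    return nodes, edges


pts = [(0.0,), (0.2,), (0.4,), (0.6,), (0.8,)]
nodes, edges = mapper_1d(pts, [p[0] for p in pts], resolution=2, overlap=0.5, eps=0.25)
print(len(nodes), edges)  # 2 [(0, 1)]
```

With a 2D lens such as the PCA projection, the same idea uses a grid of overlapping rectangles instead of intervals. Larger overlap values create more shared points and therefore more edges.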


## Colorize & show

Next, we color each node by the mean of the first numeric column (OPS) over the players it contains, and show the topology.

In [8]:

target = topology.number_data[:, 0]
topology.color(target, color_method="mean", color_type="rgb")
topology.show(fig_size=(10,10), node_size=10, edge_width=0.5)
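With color_method="mean", each node is presumably colored by the average target value of the players it contains. The sketch below computes those node means and maps them onto a simple blue (low) to red (high) RGB ramp; the linear ramp and the hex output are assumptions, and ReNom's actual color map may differ.

```python
def node_colors(nodes, target):
    """Color each node by the mean target value of its members.

    nodes: list of sets of row indices; target: per-row values (e.g. OPS).
    Returns hex colors on a linear blue (low) to red (high) ramp.
    """
    means = [sum(target[i] for i in node) / len(node) for node in nodes]
    lo, hi = min(means), max(means)
    colors = []
    for m in means:
        t = (m - lo) / (hi - lo) if hi > lo else 0.5  # position on the ramp, 0..1
        red, blue = round(255 * t), round(255 * (1 - t))
        colors.append(f"#{red:02x}00{blue:02x}")
    return colors


print(node_colors([{0, 1}, {1, 2}], [0.6, 0.8, 1.0]))  # ['#0000ff', '#ff0000']
```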


## Search for a player

In [9]:

# Find the nodes containing a player whose name (text column 1) matches "大谷" (Ohtani)
search_dicts = [{
    "data_type": "text",
    "operator": "like",
    "column": 1,
    "value": "大谷"
}]

# Color by the first numeric column, then search and show the result
target = topology.number_data[:, 0]
topology.color(target, color_method="mean", color_type="rgb")
node_index = topology.search_from_values(search_dicts=search_dicts, target=None, search_type="index")
topology.show(fig_size=(10,10), node_size=10, edge_width=0.5)
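A text search with the "like" operator selects rows whose text column matches the value, and reports the nodes that contain those rows. The sketch below mirrors that with a plain substring test standing in for "like"; whether ReNom's "like" is a substring match, an SQL-style pattern, or something else is an assumption here, and the team and player rows are made up.

```python
def search_text_like(text_rows, nodes, column, value):
    """Return indices of nodes containing at least one row whose text
    column `column` contains `value` as a substring (assumed "like" semantics)."""
    hits = {i for i, row in enumerate(text_rows) if value in row[column]}
    return [n for n, members in enumerate(nodes) if members & hits]


text_rows = [["Team A", "Player X"], ["Team B", "Player Y"], ["Team A", "Player Z"]]
nodes = [{0, 1}, {1}, {2}]
print(search_text_like(text_rows, nodes, column=0, value="Team A"))  # [0, 2]
print(search_text_like(text_rows, nodes, column=1, value="Y"))       # [0, 1]
```

Because Mapper nodes overlap, a single player can appear in several nodes, so one matching row may light up more than one node.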


## Search for a team

In [10]:

# Find the nodes containing players whose team (text column 0) matches "ヤクルト" (Yakult)
search_dicts = [{
    "data_type": "text",
    "operator": "like",
    "column": 0,
    "value": "ヤクルト"
}]

# Color by the first numeric column, then search and show the result
target = topology.number_data[:, 0]
topology.color(target, color_method="mean", color_type="rgb")
node_index = topology.search_from_values(search_dicts=search_dicts, target=None, search_type="index")
topology.show(fig_size=(10,10), node_size=10, edge_width=0.5)


## Search by input data value

In [11]:

# Find the nodes containing players whose first numeric column (OPS) is greater than 0.9
search_dicts = [{
    "data_type": "number",
    "operator": ">",
    "column": 0,
    "value": 0.9
}]

# Color by the first numeric column, then search and show the result
target = topology.number_data[:, 0]
topology.color(target, color_method="mean", color_type="rgb")
node_index = topology.search_from_values(search_dicts=search_dicts, target=None, search_type="index")
topology.show(fig_size=(10,10), node_size=10, edge_width=0.5)

The colored nodes contain players whose OPS is greater than 0.9.
Because search_from_values returns the indexes of the matching nodes, you can display the node IDs:
In [12]:

node_index

Out[12]:

[137, 138, 139, 140, 141, 142, 143, 144]
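For numeric searches the operator string is a comparison. One natural way to support several operators is a lookup table from strings to functions in Python's operator module, as in this sketch. The set of operator strings and the assumption that comparisons run on the raw values are ours, not taken from ReNom, and the OPS values are made up.

```python
import operator

# Hypothetical mapping from operator strings to comparison functions
COMPARISONS = {
    ">": operator.gt, ">=": operator.ge, "<": operator.lt,
    "<=": operator.le, "=": operator.eq, "!=": operator.ne,
}


def search_number(number_rows, nodes, column, op, value):
    """Return indices of nodes containing a row where `row[column] op value` holds."""
    compare = COMPARISONS[op]
    hits = {i for i, row in enumerate(number_rows) if compare(row[column], value)}
    return [n for n, members in enumerate(nodes) if members & hits]


number_rows = [[0.95], [0.72], [0.91]]  # column 0: OPS (made-up values)
nodes = [{0, 1}, {1}, {2}]
print(search_number(number_rows, nodes, column=0, op=">", value=0.9))  # [0, 2]
```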


## Output CSV file

A Topology instance can write the data belonging to a set of node indexes to a CSV file.
If text_data_columns and number_data_columns are not None, you can include a header row in the output by passing skip_header=False.
In [13]:

topology.output_csv_from_node_ids("output.csv", node_ids=node_index, skip_header=True)
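Conceptually, exporting by node IDs means collecting the rows that belong to the selected nodes and writing them out. The sketch below does this with the standard csv module. The column layout (text columns first, then numeric), the deduplication of rows shared by overlapping nodes, and the optional header are assumptions for illustration and may differ from what output_csv_from_node_ids actually writes.

```python
import csv
import io


def rows_for_nodes(nodes, node_ids):
    """Sorted, de-duplicated row indices in the given nodes (Mapper nodes can overlap)."""
    return sorted(set().union(*(nodes[n] for n in node_ids)))


def write_node_csv(f, nodes, node_ids, text_rows, number_rows, header=None):
    """Write the selected rows to file-like `f`; a header is written only if given."""
    writer = csv.writer(f)
    if header is not None:  # loosely analogous to skip_header=False
        writer.writerow(header)
    for i in rows_for_nodes(nodes, node_ids):
        writer.writerow(text_rows[i] + number_rows[i])


buf = io.StringIO()
write_node_csv(buf, [{0, 1}, {1, 2}], [0, 1],
               [["Team A", "Player X"], ["Team B", "Player Y"], ["Team A", "Player Z"]],
               [[0.95], [0.72], [0.91]],
               header=["team", "player", "OPS"])
print(buf.getvalue())
```

When writing to a real file instead of a StringIO, open it with newline="" so the csv module controls line endings.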