Utilize the data of Oracle Database for AI development

Utilize data stored in the database software most adopted by companies for custom AI development

In order to convey useful information to people who aim for AI development within the company, I will leave it as a reminder log of technical points I noticed and release it for use.

Keywords: Oracle Database, Pyhon, SQL, AI development, cx_Oracle

Overview

We will use the relational database which is the company's master data store effectively and speed up AI development.

Image of conventional AI development

· According to the task, extract data from the relational database as the storage destination of master data, convert it to CSV file

· Save CSV file on the AI development environment

· AI development using Python know-how

Conventional problem

Because the requirements for the AI development environment rapidly change, I think that Docker is often used. However, Docker will need you not to save the data inside the container. For this reason, combining AI development through access to a relational database via a network is an important technique for efficient AI development.

· Data extraction, placement of CSV file, etc.

· Studying Python and preparing data preprocessing

Solving conventional problems

From Python programs, direct IO to relational database data.

· Simple IO from Python program on Docker container

· Eliminate the complexity of storage management using Docker

· Preprocessing of AI data using knowledge of common SQL sentences

Prerequisites

As a relational database, an Oracle Database with high shir within the enterprise

· The version of each software is determined by referring to "ReNom" of AI software GRID company he usually uses

· In the case of a driver, decide your own condition from the usage trend at the time of Web search

environment

· Information on the survivor of the titanic of kaggle is used as a sample. Download "train.csv" from Data tag

[Data Source] ( https://www.kaggle.com/c/titanic )

· Create a table on Oracle named "titanic_train" that can be accessed via the network and place the data

· Imported downloaded data in advance

Container environment

· Ubunts 16.04 LTS · Python: 3.6 · pandas: 0.20.3 · cx_Oracle: 6.1 ( https://qiita.com/domekichi/items/5ee06a9421e3e90d0897 )

sample

Install the Oracle client and cx_Orcle to the Jupyter server.

Import cx_Oracle and os

In [1]:
import cx_Oracle
import os
import numpy as np

Set Japanese shift JIS for environment variables for Oracel Database.

In [2]:
os.environ["NLS_LANG"] = "JAPANESE_JAPAN.JA16SJISTILDE"

Set the connection string to Oracle Database. (User ID, password, server network address, PORT number, service name)

In [3]:
connection = cx_Oracle.connect("(User ID)/(password)/(server IP address):(PORT number)/(service name)")
con = connection.cursor()

Set the SQL statement for the data to be acquired.

In [4]:
sql = 'select * from titanic_train'

Execute the SQL statement.

In [5]:
con.execute(sql)
Out[5]:
<cx_Oracle.Cursor on <cx_Oracle.Connection to system@//10.43.144.7:1521/ORCLPDB1>>

Export all of the execution results, Convert to numpy format matrix, And show results

In [6]:
row = con.fetchall()
rownp = np.array(row)
rownp
Out[6]:
array([['1', '0', '3', ..., '7', ' ', 'S'],
       ['2', '1', '1', ..., '71', 'C85', 'C'],
       ['3', '1', '3', ..., '8', ' ', 'S'],
       ...,
       ['889', '0', '3', ..., '23', ' ', 'S'],
       ['890', '1', '1', ..., '30', 'C148', 'C'],
       ['891', '0', '3', ..., '8', ' ', 'Q']],
      dtype='<U82')