1. How to Use the ReNomRG GUI Tool
1.1. Start the Application
ReNomRG is a single-page web application. If the installation completed successfully, you can run the application from any directory with the following commands.
cd workspace # The workspace can be any directory.
renom_rg # This command starts the ReNomRG GUI server.
The renom_rg command accepts the following arguments.
--host : Specifies the server address.
--port : Specifies the port number of the server.
For example, the following command runs ReNomRG on port 8888.
renom_rg --port 8888 # Running ReNomRG with port 8888
Once the application server is running, open a web browser and enter the server address in the address bar.
The application then appears.
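The command line above can also be assembled from a script. The sketch below only builds the argument list from the two documented options and derives an address to open in the browser. The helper names `build_command` and `server_url`, the `http://` scheme, and the `localhost` host value are assumptions made for illustration; they are not part of ReNomRG.

```python
def build_command(host=None, port=None):
    """Return the renom_rg argument list using only the documented options."""
    command = ["renom_rg"]
    if host is not None:
        command += ["--host", str(host)]
    if port is not None:
        command += ["--port", str(port)]
    return command


def server_url(host, port):
    """Assumed browser address for the running server (plain HTTP is an assumption)."""
    return f"http://{host}:{port}/"


# Example: the same invocation as "renom_rg --port 8888".
command = build_command(port=8888)
print(" ".join(command))
print(server_url("localhost", 8888))
```

The resulting list could be passed to `subprocess.Popen` from inside the workspace directory, which keeps the server running while the script continues.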
1.2. Place your dataset
When the server starts, files and directories are created in the directory where the server is running.
The directory structure is shown below.
<server_start_directory>
└── alembic.ini               # database setting file.
└── alembic
|   └── versions              # database migration files.
|   └── env.py                # database environment file.
└── storage
|   └── storage.db            # default database (sqlite3).
|   └── trained_weight        # weights for regression models.
└── datasrc
|   └── data.pickle           # pickle data for train & validation (if data is in pickle format).
|   └── data.csv              # csv data for train & validation (if data is in csv format).
|   └── prediction_set
|       └── pred.pickle       # pickle data for prediction (if data is in pickle format).
|       └── pred.csv          # csv data for prediction (if data is in csv format).
└── scripts
    └── userdefmodel.py       # script for a user-defined model (any file name may be used).
If your data is in pickle format, the train/validation data must be named “data.pickle” and the prediction data must be named “pred.pickle”. If your data is in csv format, the train/validation data must be named “data.csv” and the prediction data must be named “pred.csv”.
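The naming rules above can be checked before the server is started. The following is a minimal sketch that maps a data format to the file locations the server expects, assuming that `prediction_set` sits inside `datasrc` as the directory listing suggests. The function names `expected_paths` and `missing_files` and the `workspace` directory name are hypothetical.

```python
from pathlib import Path

# Required file names per data format, as stated in the text above.
FILE_NAMES = {
    "pickle": ("data.pickle", "pred.pickle"),
    "csv": ("data.csv", "pred.csv"),
}


def expected_paths(workspace, data_format):
    """Return (train/validation path, prediction path) for the given format."""
    train_name, pred_name = FILE_NAMES[data_format]
    datasrc = Path(workspace) / "datasrc"
    # Assumption: prediction data lives in datasrc/prediction_set.
    return datasrc / train_name, datasrc / "prediction_set" / pred_name


def missing_files(workspace, data_format):
    """List the expected files that do not exist yet."""
    return [p for p in expected_paths(workspace, data_format) if not p.is_file()]


train_path, pred_path = expected_paths("workspace", "csv")
print(train_path.as_posix())
print(pred_path.as_posix())
```

Calling `missing_files("workspace", "csv")` returns an empty list once both files are in place.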
1.2.1. Format of the data
The supported input file formats are pickled pandas.DataFrame objects and csv data.
Data in csv format must follow this layout:
The first row must contain only the column header names, beginning in the second column.
The first column must contain only sequential row numbers, beginning with 1 in the second row.
Please refer to the image below for a sample.
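As an illustration of the two csv rules, the sketch below writes a small table with Python's standard `csv` module. It assumes that the first cell of the header row is left empty, which is one reading of "starting from the second column". The function name `write_renomrg_csv` and the column names `x1`, `x2`, and `y` are made up for the example.

```python
import csv
import io


def write_renomrg_csv(columns, rows, stream):
    """Write rows with an empty first header cell and 1-based row numbers."""
    writer = csv.writer(stream, lineterminator="\n")
    # Header row: column names start in the second column (first cell assumed empty).
    writer.writerow([""] + list(columns))
    # Data rows: the first column holds sequential numbers starting from 1.
    for number, row in enumerate(rows, start=1):
        writer.writerow([number] + list(row))


buffer = io.StringIO()
write_renomrg_csv(["x1", "x2", "y"], [[0.5, 1.2, 10.0], [0.7, 0.9, 12.5]], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To produce a real file, pass a handle opened with `open("datasrc/data.csv", "w", newline="")` instead of the in-memory buffer.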
1.3. Create Regression Model
The server and the dataset are now prepared. Let’s build a regression model.
To build a model, you have to specify a dataset and hyperparameters, as described in the following subsections.
1.3.1. Create Dataset
To train a machine learning model, you need a training dataset and a validation dataset. The training dataset is used to train the model, and the validation dataset is used to evaluate how accurate the model's predictions are. In ReNomRG, the training dataset and the validation dataset are randomly sampled from the data in the datasrc directory.
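Random sampling into training and validation sets can be pictured with the short sketch below. It is not ReNomRG's implementation; it only shows one common way to split row indices at random. The function name `split_indices` and its `train_ratio` and `seed` parameters are hypothetical.

```python
import random


def split_indices(num_rows, train_ratio, seed=0):
    """Shuffle row indices and cut them into training and validation parts."""
    indices = list(range(num_rows))
    # A seeded generator makes the split reproducible between runs.
    random.Random(seed).shuffle(indices)
    num_train = int(round(num_rows * train_ratio))
    return indices[:num_train], indices[num_train:]


train_idx, valid_idx = split_indices(num_rows=10, train_ratio=0.8)
print(len(train_idx), len(valid_idx))
```

Every row ends up in exactly one of the two sets, so the validation data is never seen during training.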
As shown in the figure above, you can create a dataset from the data in datasrc. Once a dataset is created, its contents will not change. To begin, press the New button.
The following page appears.
As you can see, you can specify the dataset name, description, ratio of training data, feature scaling, and features. After filling in all fields, push the Confirm button to confirm the dataset.
The following graph then appears. You can check the total number of data points, the ratio of training data in the dataset, and the histogram of the objective variable. To save the dataset, push the Save button. You can review the datasets you have created on the dataset page. To go to the dataset page, please follow the figure below.
When you click a dataset row, you can check the number of data points it contains, the amount of teacher data for each variable, and the histogram of the objective variable.
1.3.2. Hyper parameter setting
Everything needed is now in place. Let’s create a model and train it. To create a model, press the + New button. The hyper parameter settings for the model appear as shown in the figure below.
As you can see in the figure above, you can specify the following parameters:
Dataset Name: Dataset for training.
Architecture: Regression algorithm.
C-GCNN selects variables for convolution based on the correlation coefficient between variables.
Kernel-GCNN selects variables for convolution based on similarities between variables obtained from a Gaussian kernel.
DBSCAN-GCNN selects variables for convolution based on the Euclidean distance between variables.
Random Forest is an ensemble learning algorithm that uses multiple models (decision trees).
XGBoost is an ensemble learning algorithm based on gradient boosting of decision trees.
Training loop setting: Batch size and number of training epochs (Batch Size, Total Epoch).
Graph Convolution Params: Number of neighbors is a parameter of graph convolution. It is the number of neighboring variables used when the data are expanded as if they were images.
Random Forest (XGBoost) Params: Number of trees is the number of decision trees. Maximum Depth is the maximum depth of each decision tree.
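To make the role of the number of neighbors more concrete, the sketch below picks, for each variable, the other variables with the strongest correlation to it, in the spirit of the C-GCNN description. This is an illustration only and not ReNomRG's code: ranking by the absolute Pearson coefficient is an assumption, the function names are hypothetical, and the sample values are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)  # assumes neither variable is constant


def neighbors_by_correlation(columns, num_neighbors):
    """For each variable, list the num_neighbors most correlated other variables."""
    result = {}
    for name in columns:
        others = [other for other in columns if other != name]
        # Assumption: strength is measured by the absolute correlation.
        others.sort(key=lambda o: abs(pearson(columns[name], columns[o])), reverse=True)
        result[name] = others[:num_neighbors]
    return result


columns = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.5],  # nearly proportional to "a"
    "c": [4.0, 1.0, 3.0, 2.0],
}
neighbors = neighbors_by_correlation(columns, num_neighbors=1)
print(neighbors["a"], neighbors["b"])
```

A larger number of neighbors lets each convolution see more related variables, at the cost of including weaker relationships.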
1.3.3. Training Model
When the hyper parameter setting is completed, press the [Run] button to start the training. When training begins, the model is displayed in the model list and a progress bar appears.
1.4. Uninstall ReNomRG
ReNomRG can be uninstalled with the following pip command.
pip uninstall renom_rg
(Please check the Product page for other detailed operation methods.) https://www.renom.jp/notebooks/product/renom_rg/about_renom_rg/notebook.html