# dHSIC

A method for detecting non-linear correlations.

## Introduction to dHSIC

## Limitation of Pearson's Correlation Coefficient

```
In [37]:
```

```
import numpy as np
import matplotlib.pyplot as plt
#Generate data
x = np.random.randn(1000)
y = x * 3
#plot data
plt.scatter(x, y)
plt.show()
#Calculate Pearson's correlation coefficient
print(np.corrcoef(x, y))
```

```
[[ 1. 1.]
[ 1. 1.]]
```

Note that the output of `numpy.corrcoef(x, y)` is arranged as `array([[xx, xy], [yx, yy]])`, where `xx` is the correlation of `x` with itself, `xy` is the correlation between `x` and `y`, and similarly for `yx` and `yy`.
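For instance, with perfectly anti-correlated data the off-diagonal entries are -1. A small NumPy check, independent of the tutorial's data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
y = -x  # perfectly anti-correlated with x

c = np.corrcoef(x, y)
print(c.shape)            # (2, 2): [[xx, xy], [yx, yy]]
print(round(c[0, 1], 3))  # -1.0, the x-y entry
```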

The result shows that Pearson's correlation coefficient between `x` and `y` is 1, which means that the two data series are perfectly linearly correlated.

Next, we calculate Pearson's correlation coefficient for different data. This time, the data follow y = x^2.

```
In [38]:
```

```
import numpy as np
import matplotlib.pyplot as plt
#Generate data
x = np.random.randn(1000)
y = x ** 2
#plot data
plt.scatter(x, y)
plt.show()
#Calculate Pearson's correlation coefficient
print(np.corrcoef(x, y))
```

```
[[ 1. -0.02372486]
[-0.02372486 1. ]]
```

Here, Pearson's correlation coefficient was almost zero, which means that the two data series are not *linearly* correlated. However, it does not mean that they are uncorrelated: even when Pearson's correlation coefficient rules out a linear correlation, the series may still be correlated in a non-linear way.

It makes no sense to claim that these series are not mutually related: given the value of x, we can instantly compute y, so x and y are clearly dependent in some way. In practice, only a limited number of data sets exhibit linear correlations; when real data sets are correlated at all, the correlation is often non-linear. Here lies a big problem: as long as we rely on Pearson's correlation coefficient, we cannot detect non-linear correlations, which, as this example shows, are far more common in the real world than linear ones.
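To see why Pearson's coefficient fails here, recall its definition r = cov(x, y) / (σ_x σ_y). The following sketch (plain NumPy, illustrative only) computes it by hand for y = x²; since x is symmetric around zero, E[x · x²] = E[x³] = 0, so the covariance, and hence r, vanishes even though y is fully determined by x:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100000)
y = x ** 2

# Pearson's r = covariance / (std_x * std_y).
cov = np.mean((x - x.mean()) * (y - y.mean()))
r = cov / (x.std() * y.std())
print(round(r, 3))  # near zero despite the exact functional relationship
```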

## Detection of non-linear correlations

Now, our interest is in detecting non-linear correlations. Several methods, such as the maximal information coefficient (MIC) and HSIC, have been suggested. In this tutorial, we use dHSIC, which is derived from HSIC. The original HSIC cannot detect correlations among more than two variables; dHSIC extends HSIC to remove this limitation on the number of variables. For more details, see citation 3.
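To make the idea behind HSIC concrete, here is a minimal sketch of the (biased) HSIC statistic for two variables with Gaussian kernels, following citation 1. This is an illustrative implementation, not the one in renom_dp, and the bandwidth `sigma` is an arbitrary choice:

```python
import numpy as np

def gaussian_gram(z, sigma=1.0):
    # Pairwise squared distances -> Gaussian kernel Gram matrix.
    d2 = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def hsic_biased(x, y, sigma=1.0):
    # Biased HSIC estimator: trace(K H L H) / n^2, with H the centering matrix.
    n = x.shape[0]
    K = gaussian_gram(x, sigma)
    L = gaussian_gram(y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / n ** 2

rng = np.random.default_rng(0)
x = rng.standard_normal((300, 1))
y_dep = x ** 2                           # non-linearly dependent on x
y_ind = rng.standard_normal((300, 1))    # independent of x
print(hsic_biased(x, y_dep), hsic_biased(x, y_ind))
```

A larger statistic indicates stronger dependence; dHSIC generalizes this construction from two to d variables.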

Now, we use dHSIC to detect non-linear correlations.

```
In [39]:
```

```
from renom_dp.dHSIC import dHSIC_test
x = np.random.randn(1000, 1)
y = x ** 2
plt.scatter(x, y)
print(dHSIC_test([x, y]))
```

```
0.0
```

The output of `dHSIC_test` is the probability (p-value) of the joint independence test, so a value close to zero means that independence is rejected, i.e. a correlation is detected. Of course, other correlations can also be detected. Below are some examples.

### In the case of y=x^3+2

```
In [40]:
```

```
from renom_dp.dHSIC import dHSIC_test
x = np.random.randn(1000, 1)
y = x ** 3 + 2
plt.scatter(x, y)
print(dHSIC_test([x, y]))
```

```
0.0
```

### In the case of y=e^x

```
In [41]:
```

```
from renom_dp.dHSIC import dHSIC_test
x = np.random.randn(1000, 1)
y = np.exp(x)
plt.scatter(x, y)
plt.show()
print(dHSIC_test([x, y]))
```

```
0.0
```

In all of the examples above, the correlation was detected. Real data usually contain noise, however, so next we add Gaussian noise to y and check whether the correlation can still be detected.

```
In [42]:
```

```
from renom_dp.dHSIC import dHSIC_test
x = np.random.randn(1000, 1)
y = x ** 3 + np.random.randn(1000, 1) * 3
plt.scatter(x, y)
plt.show()
print(dHSIC_test([x, y]))
```

```
0.0
```

Even with the noise added, the correlation was detected.

Next, we would like to check how reliable the results are. To this end, we vary the intensity of the noise and compute the dHSIC output at each level.

```
In [19]:
```

```
from renom_dp.dHSIC import dHSIC_test
dHSIC_result = []
for n in range(100):
    temp = []
    for i in range(10):
        x = np.random.randn(1000, 1)
        y = x ** 3 + np.random.randn(1000, 1) * n
        temp.append(dHSIC_test([x, y]))
    dHSIC_result.append(np.average(temp))
plt.plot(dHSIC_result)
plt.xlabel("Noise Intensity")
plt.ylabel("dHSIC_test output")
plt.show()
```

To reduce the effect of chance, we repeated each calculation ten times and plotted the average. The graph shows that the output probability increases as the noise gets stronger, i.e. the dependence becomes harder to detect. We now proceed to the next step: applications of dHSIC.

## Applications of dHSIC

There are many possible applications for dHSIC.

Sometimes, we need to select some explanatory variables from many candidates. This is one of the possible applications of dHSIC.

```
In [8]:
```

```
from renom_dp.dHSIC import dHSIC_test
x1 = np.random.randn(1000, 1)
x2 = np.random.randn(1000, 1)
x3 = np.random.randn(1000, 1)
x4 = np.random.randn(1000, 1)
x5 = np.random.randn(1000, 1)
y = x1 ** 3 + x2 ** 2 + np.random.randn(1000, 1) * 3
print("x1:{}".format(dHSIC_test([x1, y])))
print("x2:{}".format(dHSIC_test([x2, y])))
print("x3:{}".format(dHSIC_test([x3, y])))
print("x4:{}".format(dHSIC_test([x4, y])))
print("x5:{}".format(dHSIC_test([x5, y])))
```

```
x1:0.0
x2:3.3306690738754696e-16
x3:0.14150768560369287
x4:0.5367135121365071
x5:0.6508547196457113
```

The combinations (x1, y) and (x2, y) yield much smaller dHSIC probabilities than the others. This result suggests that x1 and x2 are explanatory variables of y; in fact, y was calculated from x1 and x2. In this way, dHSIC can be used to select plausible explanatory variables from many candidates.

Here is another example. When y = x1 * x2, it is natural that x1 and y (or x2 and y) show no pairwise correlation: for example, when x1 is large, y can be either large or small depending on the value of x2, and vice versa. However, y and the combination of x1 and x2 must be correlated. Let us try!

```
In [20]:
```

```
from renom_dp.dHSIC import dHSIC_test
x1 = np.random.randn(1000, 1)
x2 = np.random.randn(1000, 1)
y = x1 * x2 + np.random.randn(1000, 1) * 3
print("x1:{}".format(dHSIC_test([x1, y])))
print("x2:{}".format(dHSIC_test([x2, y])))
print("x1&x2:{}".format(dHSIC_test([x1, x2, y])))
print("x1x2:{}".format(dHSIC_test([np.concatenate((x1, x2), axis = 1), y])))
```

```
x1:0.26645226359592644
x2:0.5707624912362048
x1&x2:0.000586965474501322
x1x2:4.9920888878784986e-05
```
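As expected, the pairwise probabilities are large while those for the joint combinations are tiny. The pairwise claim can also be checked directly with Pearson's correlation coefficient; the following sketch uses plain NumPy (without the noise term, for clarity):

```python
import numpy as np

# For independent zero-mean x1 and x2, E[x1 * (x1*x2)] = E[x1**2] * E[x2] = 0,
# so y = x1 * x2 is uncorrelated with each factor taken alone.
rng = np.random.default_rng(1)
x1 = rng.standard_normal(100000)
x2 = rng.standard_normal(100000)
y = x1 * x2

r1 = np.corrcoef(x1, y)[0, 1]
r2 = np.corrcoef(x2, y)[0, 1]
print(round(r1, 3), round(r2, 3))  # both close to 0
```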

## Usage of dHSIC_test()

Now we will see how to use the `dHSIC_test` function.

First, we need to import the function. NumPy is not strictly required, but it is very useful.

```
In [24]:
```

```
import numpy as np
from renom_dp.dHSIC import dHSIC_test
```

```
In [25]:
```

```
x1 = np.random.randn(1000, 1)
x2 = np.random.randn(1000, 1)
x3 = x1 * 3 + x2 * 1
```

Next, we pass the data to the `dHSIC_test` function.

```
In [26]:
```

```
dHSIC_test([x1, x2])
```

```
Out[26]:
```

```
0.078981160413225782
```

Note that the input should be a list of 2-dimensional arrays (matrices).

We can also pass three arrays as input, as follows.

```
In [27]:
```

```
dHSIC_test([x1, x2, x3])
```

```
Out[27]:
```

```
0.0
```

Alternatively, we can pass a list of NumPy arrays whose shapes are (n, 2), (n, 3), and so on.

```
In [28]:
```

```
x12 = np.concatenate((x1, x2), axis = 1)
dHSIC_test([x12, x3])
```

```
Out[28]:
```

```
0.0
```

Now we look at the available options.

Calculating the probability requires a statistical test. In this implementation, three statistical methods are available: the Gamma test, the Permutation test, and the Bootstrap test. See the table below for the characteristics of each method. The default is "Gamma".

| Test | Time | Input |
|------|------|-------|
| Gamma | Fast | "Gamma" |
| Permutation | Slow | "Permutation" |
| Bootstrap | Slow | "Bootstrap" |

```
In [29]:
```

```
print(dHSIC_test([x12, x3], method = "Gamma"))
print(dHSIC_test([x12, x3], method = "Permutation"))
print(dHSIC_test([x12, x3], method = "Bootstrap"))
```

```
0.0
0.01
0.01
```

When we use the Permutation or Bootstrap method, we can specify the number of shuffled data sets used in each test with the `samplesize` argument. The default is 99. As this number increases, the output becomes more reliable, but the computation consumes more resources.

```
In [30]:
```

```
print(dHSIC_test([x12, x3], method = "Permutation", samplesize = 100))
print(dHSIC_test([x12, x3], method = "Bootstrap", samplesize = 999))
```

```
0.00990099009901
0.001
```
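The idea behind the Permutation option can be sketched as follows: shuffle one variable to destroy any dependence, recompute a dependence statistic, and report the fraction of shuffled data sets whose statistic is at least as extreme as the observed one. The sketch below uses the absolute covariance as a toy statistic instead of dHSIC; `perm_pvalue` is a hypothetical helper, not the library's implementation:

```python
import numpy as np

def perm_pvalue(x, y, stat, samplesize=99, seed=0):
    # Permutation test: shuffling y breaks any dependence on x, so the
    # shuffled statistics approximate the null distribution.
    rng = np.random.default_rng(seed)
    observed = stat(x, y)
    hits = sum(stat(x, rng.permutation(y)) >= observed for _ in range(samplesize))
    return (hits + 1) / (samplesize + 1)

# Strongly dependent toy data; |covariance| as the dependence statistic.
rng = np.random.default_rng(0)
x = rng.standard_normal(200)
y = 3 * x + rng.standard_normal(200)
p = perm_pvalue(x, y, lambda a, b: abs(np.mean((a - a.mean()) * (b - b.mean()))))
print(p)
```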

We can use any kernel. The Gaussian kernel and the Laplacian kernel (spelled "Laprasian" in this implementation), which are often used, are pre-implemented. The default is the Gaussian kernel.

```
In [31]:
```

```
print(dHSIC_test([x12, x3], kernel = "Gaussian"))
print(dHSIC_test([x12, x3], kernel = "Laprasian"))
```

```
0.0
0.0
```

We can also define our own kernel function and pass it to `dHSIC_test`.

```
In [32]:
```

```
def my_kernel(x, y):
    return np.exp(-np.sum(np.abs(x - y) ** 2))
my_kernel(np.array([1, 2]), np.array([3, 4]))
```

```
Out[32]:
```

```
0.00033546262790251185
```

```
In [33]:
```

```
print(dHSIC_test([x12, x3], kernel = my_kernel))
```

```
0.0
```

Note that the inputs to a user-defined kernel function are arrays, not floats, while its output should be a float.
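Before passing a custom kernel to `dHSIC_test`, one might verify these requirements with a quick check; `check_kernel` below is a hypothetical helper, not part of the library:

```python
import numpy as np

def check_kernel(k):
    # Verify the kernel accepts array inputs and returns a scalar float-like value.
    out = k(np.array([0.0, 1.0]), np.array([1.0, 0.0]))
    return np.ndim(out) == 0 and isinstance(float(out), float)

def my_kernel(x, y):
    return np.exp(-np.sum(np.abs(x - y) ** 2))

print(check_kernel(my_kernel))  # True
```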

We can get detailed information by passing a dictionary to the `detail` argument.

First, we define a dictionary to receive the detailed information.

```
In [34]:
```

```
detail = {}
```

Next, just pass the dictionary to the function.

```
In [35]:
```

```
dHSIC_test([x12, x3], detail = detail)
print(detail)
```

```
{'critical value': 0.52771129667280015, 'statistic': 45.164293934647823, 'dHSIC': 0.045164293934647826}
```

We can also change the significance level with the `alpha` argument; a smaller `alpha` yields a larger critical value.

```
In [36]:
```

```
dHSIC_test([x12, x3], detail = detail, alpha = 0.01)
print(detail)
```

```
{'critical value': 0.63138353631016109, 'statistic': 45.164293934647823, 'dHSIC': 0.045164293934647826}
```

## Citations

- Gretton, A., Bousquet, O., Smola, A., and Schölkopf, B. (2005). Measuring statistical dependence with Hilbert-Schmidt norms. In Algorithmic learning theory, pp. 63–77. Springer-Verlag.
- Gretton, A., Fukumizu, K., Teo, C. H., Song, L., Schölkopf, B., and Smola, A. J. (2007), A kernel statistical test of independence. Advances in Neural Information Processing Systems (NIPS 20), pp. 585–592.
- Pfister, N., Bühlmann, P., Schölkopf, B. and Peters, J. (2018). Kernel-based tests for joint independence. J. R. Stat. Soc. B, 80: 5-31. doi:10.1111/rssb.12235