edo1z blog

A blog about programming and related topics

TensorFlow - Local Response Normalization (LRN)

Reference: Using Local Contrast Normalization (LCN) with Theano

There are many normalization methods; representative ones include:

- Global Contrast Normalization (GCN)
- Local Contrast Normalization (LCN)
- Local Response Normalization (LRN)
- ZCA whitening
- Local mean subtraction

As normalization layers inside a CNN, LCN and LRN are the ones commonly used. LCN, as the name suggests, normalizes contrast within a local region of a feature map. Since this operation is confined to a single feature map, it is applied independently to every feature map. LRN, by contrast, normalizes across different feature maps at the same position. Both have a few hyperparameters but no learnable parameters, so backpropagation through them is straightforward.

Reference: Using Local Response Normalization (LRN) with Theano

In short, LRN "normalizes across multiple feature maps at the same position (pixel)." As the original paper states, LRN is a form of "brightness normalization"; the key point is that, unlike LCN, it does not subtract the mean brightness down to zero.

$$b^i_{x,y} = a^i_{x,y} \Big/ \left( k + \alpha \sum_{j=\max(0,\, i-n/2)}^{\min(N-1,\, i+n/2)} \left( a^j_{x,y} \right)^2 \right)^{\beta}$$

Here k, n, α, and β are the hyperparameters; \(a^i_{x,y}\) denotes the pixel at (x, y) in the i-th feature map, and N is the total number of feature maps. The summation means: for the i-th feature map, take the sum of squares over its n neighboring feature maps.
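As a numeric sketch of the formula at a single pixel position (x, y), with hypothetical values (a[i] is the i-th feature map's value at that position):

```python
import numpy as np

a = np.array([230.0, 210.0, 190.0])       # N = 3 feature maps at one (x, y)
k, alpha, beta, n = 1.0, 0.001 / 9.0, 0.75, 5

N = a.shape[0]
b = np.empty_like(a)
for i in range(N):
    lo = max(0, i - n // 2)
    hi = min(N - 1, i + n // 2)
    # sum of squares over the n neighboring feature maps (inclusive window)
    sqr_sum = np.sum(a[lo:hi + 1] ** 2)
    b[i] = a[i] / (k + alpha * sqr_sum) ** beta

print(b)   # every value shrunk by the shared denominator
```

Since n = 5 covers all three maps here, every channel shares the same denominator.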

Reference: tf.nn.local_response_normalization(input, depth_radius=None, bias=None, alpha=None, beta=None, name=None)

Local Response Normalization. The 4-D input tensor is treated as a 3-D array of 1-D vectors (along the last dimension), and each vector is normalized independently. Within a given vector, each component is divided by the weighted, squared sum of inputs within depth_radius. In detail,


sqr_sum[a, b, c, d] =
    sum(input[a, b, c, d - depth_radius : d + depth_radius + 1] ** 2)
output = input / (bias + alpha * sqr_sum) ** beta

Usage example

norm1 = tf.nn.lrn(pool1, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75, name='norm1')

Writing the function in Python 3

def lrn(input, depth_radius, bias, alpha, beta):
    # Move channels to the front: (H, W, C) -> (C, H, W)
    input_t = input.transpose([2, 0, 1])
    sqr_sum = np.zeros(input_t.shape)
    for i in range(input_t.shape[0]):
        # Channel window [i - depth_radius, i + depth_radius], clipped to valid range
        start_idx = max(i - depth_radius, 0)
        end_idx = i + depth_radius + 1
        sqr_sum[i] = np.sum(input_t[start_idx:end_idx] ** 2, axis=0)
    # Normalize and restore the (H, W, C) layout
    return (input_t / (bias + alpha * sqr_sum) ** beta).transpose(1, 2, 0)
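The per-channel loop can also be vectorized. A sketch (same channels-last (H, W, C) layout as above) using a cumulative sum of squares along the channel axis, so each channel's windowed square-sum is a difference of two cumulative sums:

```python
import numpy as np

def lrn_vec(x, depth_radius, bias, alpha, beta):
    # Cumulative sum of squares along the channel axis, with a leading zero
    # so window sums become cs[hi] - cs[lo].
    sq = x.astype(np.float64) ** 2
    zeros = np.zeros(x.shape[:-1] + (1,))
    cs = np.concatenate([zeros, np.cumsum(sq, axis=-1)], axis=-1)
    c = np.arange(x.shape[-1])
    lo = np.maximum(c - depth_radius, 0)
    hi = np.minimum(c + depth_radius + 1, x.shape[-1])
    sqr_sum = cs[..., hi] - cs[..., lo]   # windowed square-sums, shape (H, W, C)
    return x / (bias + alpha * sqr_sum) ** beta
```

`lrn_vec` is not from the original post; it is an equivalent reformulation that should match the loop version up to floating-point rounding.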

Trying it out

import tensorflow as tf
import numpy as np
from PIL import Image

input = np.array([
    [
        [230,230,230],[210,210,210],[190,190,190],[170,170,170],[150,150,150]
    ],
    [
        [230,230,230],[210,210,210],[190,190,190],[170,170,170],[150,150,150]
    ],
    [
        [230,230,230],[210,210,210],[190,190,190],[170,170,170],[150,150,150]
    ],
    [
        [230,230,230],[210,210,210],[190,190,190],[170,170,170],[150,150,150]
    ],
    [
        [230,230,230],[210,210,210],[190,190,190],[170,170,170],[150,150,150]
    ],
])

depth_radius = 2
bias = 1.0
alpha = 0.001 / 9.0
beta = 0.75

def lrn(input, depth_radius, bias, alpha, beta):
    # Move channels to the front: (H, W, C) -> (C, H, W)
    input_t = input.transpose([2, 0, 1])
    sqr_sum = np.zeros(input_t.shape)
    for i in range(input_t.shape[0]):
        # Channel window [i - depth_radius, i + depth_radius], clipped to valid range
        start_idx = max(i - depth_radius, 0)
        end_idx = i + depth_radius + 1
        sqr_sum[i] = np.sum(input_t[start_idx:end_idx] ** 2, axis=0)
    # Normalize and restore the (H, W, C) layout
    return (input_t / (bias + alpha * sqr_sum) ** beta).transpose(1, 2, 0)

output = lrn(input, depth_radius, bias, alpha, beta)
print(output)
Image.fromarray(np.uint8(output)).save('./img/lrn.jpg')

Result

[[[ 25.64542621  25.64542621  25.64542621]
  [ 26.62530279  26.62530279  26.62530279]
  [ 27.69886502  27.69886502  27.69886502]
  [ 28.8700044   28.8700044   28.8700044 ]
  [ 30.13193797  30.13193797  30.13193797]]

 [[ 25.64542621  25.64542621  25.64542621]
  [ 26.62530279  26.62530279  26.62530279]
  [ 27.69886502  27.69886502  27.69886502]
  [ 28.8700044   28.8700044   28.8700044 ]
  [ 30.13193797  30.13193797  30.13193797]]

 [[ 25.64542621  25.64542621  25.64542621]
  [ 26.62530279  26.62530279  26.62530279]
  [ 27.69886502  27.69886502  27.69886502]
  [ 28.8700044   28.8700044   28.8700044 ]
  [ 30.13193797  30.13193797  30.13193797]]

 [[ 25.64542621  25.64542621  25.64542621]
  [ 26.62530279  26.62530279  26.62530279]
  [ 27.69886502  27.69886502  27.69886502]
  [ 28.8700044   28.8700044   28.8700044 ]
  [ 30.13193797  30.13193797  30.13193797]]

 [[ 25.64542621  25.64542621  25.64542621]
  [ 26.62530279  26.62530279  26.62530279]
  [ 27.69886502  27.69886502  27.69886502]
  [ 28.8700044   28.8700044   28.8700044 ]
  [ 30.13193797  30.13193797  30.13193797]]]

Trying tf.nn.lrn

Code

import tensorflow as tf
import numpy as np
from PIL import Image

input = np.array([
    [
        [230,230,230],[210,210,210],[190,190,190],[170,170,170],[150,150,150]
    ],
    [
        [230,230,230],[210,210,210],[190,190,190],[170,170,170],[150,150,150]
    ],
    [
        [230,230,230],[210,210,210],[190,190,190],[170,170,170],[150,150,150]
    ],
    [
        [230,230,230],[210,210,210],[190,190,190],[170,170,170],[150,150,150]
    ],
    [
        [230,230,230],[210,210,210],[190,190,190],[170,170,170],[150,150,150]
    ],
])

depth_radius = 2
bias = 1.0
alpha = 0.001 / 9.0
beta = 0.75

input_for_tf = np.zeros([1, input.shape[0], input.shape[1], input.shape[2]])
input_for_tf[0] = input
output2 = tf.nn.lrn(input_for_tf, depth_radius, bias=bias, alpha=alpha, beta=beta)
with tf.Session() as sess:
    out = sess.run(output2)
    print(out)
    Image.fromarray(np.uint8(out[0])).show()
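As a side note, the batch dimension can be added more directly than by allocating a zeros array and assigning into it. A small sketch:

```python
import numpy as np

# np.newaxis prepends the batch dimension in one step.
input = np.full((5, 5, 3), 230.0)
input_for_tf = input[np.newaxis]   # shape becomes (1, 5, 5, 3)
print(input_for_tf.shape)          # → (1, 5, 5, 3)
```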

Result

[[[[ 25.64542389  25.64542389  25.64542389]
   [ 26.62529945  26.62529945  26.62529945]
   [ 27.6988678   27.6988678   27.6988678 ]
   [ 28.87000465  28.87000465  28.87000465]
   [ 30.13193893  30.13193893  30.13193893]]

  [[ 25.64542389  25.64542389  25.64542389]
   [ 26.62529945  26.62529945  26.62529945]
   [ 27.6988678   27.6988678   27.6988678 ]
   [ 28.87000465  28.87000465  28.87000465]
   [ 30.13193893  30.13193893  30.13193893]]

  [[ 25.64542389  25.64542389  25.64542389]
   [ 26.62529945  26.62529945  26.62529945]
   [ 27.6988678   27.6988678   27.6988678 ]
   [ 28.87000465  28.87000465  28.87000465]
   [ 30.13193893  30.13193893  30.13193893]]

  [[ 25.64542389  25.64542389  25.64542389]
   [ 26.62529945  26.62529945  26.62529945]
   [ 27.6988678   27.6988678   27.6988678 ]
   [ 28.87000465  28.87000465  28.87000465]
   [ 30.13193893  30.13193893  30.13193893]]

  [[ 25.64542389  25.64542389  25.64542389]
   [ 26.62529945  26.62529945  26.62529945]
   [ 27.6988678   27.6988678   27.6988678 ]
   [ 28.87000465  28.87000465  28.87000465]
   [ 30.13193893  30.13193893  30.13193893]]]]

Oh, they're almost exactly the same. Let's also try a different array.

Trying an arbitrary array

The array

input = np.zeros([1, 5, 5, 3])
num = 1
for h in range(5):
    for w in range(5):
        for c in range(3):
            input[0][h][w][c] = num
            num += 1
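The nested loops above can be replaced by a one-liner; a sketch producing the same array:

```python
import numpy as np

# Values 1..75 laid out in (batch, height, width, channel) order.
input = np.arange(1, 76, dtype=np.float64).reshape(1, 5, 5, 3)
print(input[0, 0, 0])   # → [1. 2. 3.]
```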

Result

[[[[  1.   2.   3.]
   [  4.   5.   6.]
   [  7.   8.   9.]
   [ 10.  11.  12.]
   [ 13.  14.  15.]]

  [[ 16.  17.  18.]
   [ 19.  20.  21.]
   [ 22.  23.  24.]
   [ 25.  26.  27.]
   [ 28.  29.  30.]]

  [[ 31.  32.  33.]
   [ 34.  35.  36.]
   [ 37.  38.  39.]
   [ 40.  41.  42.]
   [ 43.  44.  45.]]

  [[ 46.  47.  48.]
   [ 49.  50.  51.]
   [ 52.  53.  54.]
   [ 55.  56.  57.]
   [ 58.  59.  60.]]

  [[ 61.  62.  63.]
   [ 64.  65.  66.]
   [ 67.  68.  69.]
   [ 70.  71.  72.]
   [ 73.  74.  75.]]]]

Result from the custom function

[[[  0.99883492   1.99766984   2.99650476]
  [  3.97452398   4.96815498   5.96178597]
  [  6.88892644   7.87305879   8.85719114]
  [  9.70624045  10.6768645   11.64748854]
  [ 12.39542092  13.34891484  14.30240875]]

 [[ 14.93127874  15.86448366  16.79768858]
  [ 17.29503693  18.20530203  19.11556713]
  [ 19.47436707  20.35956557  21.24476407]
  [ 21.46299164  22.32151131  23.18003097]
  [ 23.25997335  24.09068669  24.92140002]]

 [[ 24.86882105  25.67104108  26.47326112]
  [ 26.29652922  27.06995655  27.84338388]
  [ 27.55264319  28.29730922  29.04197525]
  [ 28.64841279  29.36462311  30.08083343]
  [ 29.59606986  30.28435055  30.97263125]]

 [[ 30.40824252  31.06929128  31.73034003]
  [ 31.09750348  31.7321464   32.36678933]
  [ 31.67603931  32.28519391  32.89434852]
  [ 32.15542323  32.74006729  33.32471135]
  [ 32.54647174  33.10761781  33.66876387]]

 [[ 32.85916694  33.39784181  33.93651668]
  [ 33.10262795  33.61985651  34.13708507]
  [ 33.28511771  33.78191052  34.27870332]
  [ 33.41407404  33.89141795  34.36876187]
  [ 33.49615612  33.95500757  34.41385903]]]

Result from tf.nn.lrn

[[[[  0.99883491   1.99766982   2.99650478]
   [  3.97452402   4.96815491   5.96178627]
   [  6.88892651   7.8730588    8.85719109]
   [  9.70624065  10.67686462  11.64748859]
   [ 12.39542103  13.34891415  14.30240822]]

  [[ 14.93127823  15.86448288  16.79768753]
   [ 17.29503632  18.20530128  19.11556625]
   [ 19.47436714  20.35956573  21.24476433]
   [ 21.46299171  22.32151222  23.18003082]
   [ 23.25997543  24.09068871  24.92140198]]

  [[ 24.86882019  25.67103958  26.47325897]
   [ 26.29652977  27.06995773  27.8433857 ]
   [ 27.55264282  28.29730988  29.04197502]
   [ 28.6484127   29.36462212  30.08083344]
   [ 29.59606934  30.28435135  30.97263145]]

  [[ 30.40824318  31.06929207  31.73034096]
   [ 31.09750175  31.73214531  32.36678696]
   [ 31.67603874  32.2851944   32.89434814]
   [ 32.15542221  32.74006653  33.32471085]
   [ 32.54647446  33.10762024  33.66876602]]

  [[ 32.85916519  33.39783859  33.93651581]
   [ 33.1026268   33.61985397  34.13708496]
   [ 33.2851181   33.78191376  34.2787056 ]
   [ 33.41407394  33.89141846  34.36876297]
   [ 33.49615479  33.95500946  34.41386032]]]]

Almost identical.
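The tiny discrepancies are expected: tf.nn.lrn computes in float32, while the NumPy function runs in float64. A sketch (no TensorFlow needed) that reruns the same computation at both precisions to show the gap sits at the float32 rounding level:

```python
import numpy as np

def lrn_np(x, depth_radius, bias, alpha, beta):
    # Same custom LRN as above, but dtype-preserving: all arithmetic stays
    # in the dtype of the input array.
    t = x.transpose(2, 0, 1)
    sqr_sum = np.zeros_like(t)
    for i in range(t.shape[0]):
        lo = max(i - depth_radius, 0)
        sqr_sum[i] = np.sum(t[lo:i + depth_radius + 1] ** 2, axis=0)
    one = x.dtype.type   # cast the scalars to the array's dtype
    return (t / (one(bias) + one(alpha) * sqr_sum) ** one(beta)).transpose(1, 2, 0)

x64 = np.arange(1, 76, dtype=np.float64).reshape(5, 5, 3)
out64 = lrn_np(x64, 2, 1.0, 0.001 / 9.0, 0.75)
out32 = lrn_np(x64.astype(np.float32), 2, 1.0, 0.001 / 9.0, 0.75)
print(np.max(np.abs(out64 - out32)))   # tiny, at the float32 rounding level
```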

Applying LRN to a large photo and looking at the result

Here is the image.

Code

import tensorflow as tf
import numpy as np
from PIL import Image

fpath = './img/sample_pic.jpg'
jpg = tf.read_file(fpath)
img = tf.image.decode_jpeg(jpg, channels=3)
input = tf.cast(tf.reshape(img, [1, 600, 800, 3]), dtype=tf.float32)

depth_radius = 2
bias = 1.0
alpha = 0.001 / 9.0
beta = 0.75

output = tf.nn.lrn(input, depth_radius, bias=bias, alpha=alpha, beta=beta)
with tf.Session() as sess:
    out = sess.run(output)
    print(out)
    Image.fromarray(np.uint8(out[0])).save('./img/lrn_tf.jpg')

Result

Hats off to whoever first noticed that putting this into a CNN improves results.