Using TensorFlow as an example.
=======================================
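When a neural network (TensorFlow as the example) is trained on a GPU, the memory it needs consists roughly of the following parts:
1. The memory taken by TensorFlow's own compute libraries once they are loaded into GPU memory. They are usually loaded automatically when the session is initialized, and their size depends on the TensorFlow version. As a rough general estimate, about 450MB, although the TensorFlow 1.14 measurement below came to only 104MB.
2. The size of the model itself (its parameters).
3. The optimizer's own state. An optimizer that keeps one state tensor per parameter (e.g. SGD with momentum) needs as much memory as the model itself. One that keeps two state tensors per parameter (e.g. RMSProp with momentum, or Adam) needs twice the model size. Plain SGD keeps none. Note that RMSProp is still a first-order method: the factor of two comes from its two per-parameter slot variables, not from second-order information.
4. Temporary tensors produced in the forward pass that must be kept until the backward pass uses them to compute gradients. Their size depends on the shape of each layer (it has to be worked out from the model's layer-by-layer shapes), and also on batch_size and on the size of the input data input_data.
5. Tensors that the user adds during training to monitor the run or for other purposes. Examples are tensors produced by normalizing the gradients of some layers or computing their mean/std/max/min during the backward pass, and statistics (mean/std/max/min) of a layer's output computed during the forward pass.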
--------------------------------------------
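To estimate the GPU memory a model needs for training, a baseline is usually the sum of parts 1, 2 and 3. Part 4 is hard to compute, because it also depends on the input size and batch_size, and the output shapes of different layers vary widely. Part 5 is usually small and can mostly be ignored.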
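The sum of parts 1 to 3 can be written as a small helper. This is a minimal sketch under stated assumptions: parameters are float32 (4 bytes each), 1 MB = 1024 × 1024 bytes as in the rest of this post, and the slot counts in the table are typical values for stock TF1 optimizers rather than anything measured here. The library overhead (part 1) is an input you measure yourself.

```python
# Sketch: "static" GPU memory of parts 1-3.
# Assumptions: float32 parameters, 1 MB = 1024**2 bytes, and the slot
# counts below (typical for stock TF1 optimizers, not measured here).

SLOTS_PER_PARAM = {
    "sgd": 0,       # plain gradient descent keeps no per-parameter state
    "momentum": 1,  # one velocity buffer per parameter
    "rmsprop": 2,   # TF1 RMSProp (non-centered): "rms" and "momentum" slots
    "adam": 2,      # first and second moment estimates (plus two tiny scalars)
}

MB = 1024 ** 2


def static_memory_mb(num_params, optimizer, lib_overhead_mb, bytes_per_elem=4):
    """Return (part2_mb, part3_mb, total_mb) where total = part1 + part2 + part3."""
    params_mb = num_params * bytes_per_elem / MB            # part 2
    optimizer_mb = params_mb * SLOTS_PER_PARAM[optimizer]   # part 3
    total_mb = lib_overhead_mb + params_mb + optimizer_mb   # parts 1 + 2 + 3
    return params_mb, optimizer_mb, total_mb


if __name__ == "__main__":
    # Numbers from this post: 677943 parameters, RMSProp, 104MB library overhead.
    print(static_memory_mb(677943, "rmsprop", 104))
```

Because parts 2 and 3 scale with the same parameter count, the optimizer choice shows up as a simple multiplier on the model size.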
PS:
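A note on this: when training fails with an out-of-memory error, we can often fix it by reducing batch_size, which works by shrinking part 4. But sometimes even batch_size = 1 is not enough; that means parts 1, 2 and 3 already take up (nearly) all of the available GPU memory.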
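The reasoning in this note can be made concrete with a simple linear model: total memory ≈ fixed part (parts 1 to 3) + batch_size × per-sample part (part 4). This is a sketch under the simplifying assumption that part 4 grows exactly linearly with batch_size; it ignores allocator rounding, cuDNN workspaces and part 5. All three inputs are hypothetical numbers you would supply.

```python
# Sketch: largest batch size that fits, assuming
#   total_mb = fixed_mb + batch_size * per_sample_mb
# where fixed_mb covers parts 1-3 and per_sample_mb is part 4 for one sample.

def max_batch_size(capacity_mb, fixed_mb, per_sample_mb):
    """Return the largest batch size that fits, or 0 if even batch_size=1 does not."""
    if per_sample_mb <= 0:
        raise ValueError("per_sample_mb must be positive")
    free_mb = capacity_mb - fixed_mb
    if free_mb < per_sample_mb:
        # Parts 1-3 (plus a single sample) already exhaust the card:
        # shrinking batch_size cannot help here.
        return 0
    return int(free_mb // per_sample_mb)


if __name__ == "__main__":
    print(max_batch_size(capacity_mb=8000, fixed_mb=112, per_sample_mb=10))
```

A result of 0 corresponds to the second case in the note: no batch size, not even 1, fits.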
--------------------------------------------
For part 1, here is the situation with TensorFlow 1.14:
Checking GPU memory usage:
After the library is loaded, it occupies 104MB of GPU memory.
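One way to take such a measurement is to read used GPU memory with nvidia-smi before and after the session is created. This is a sketch, not the author's method: it assumes nvidia-smi is on the PATH, and that the TF1 session is created with `tf.ConfigProto(gpu_options=tf.GPUOptions(allow_growth=True))`, since by default TF1 pre-allocates most of the card and the reading would then reflect that reservation instead of the library size.

```python
import subprocess


def parse_memory_used(csv_text):
    """Parse `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits`
    output into a list of used-memory values in MiB, one per GPU."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]


def query_gpu_memory_used():
    """Run nvidia-smi and return used memory (MiB) for each GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_memory_used(out)
```

Usage: call `query_gpu_memory_used()` once before creating the `tf.Session` and once after; the difference approximates part 1 for that TensorFlow version.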
--------------------------------------------
For parts 2 and 3, we use the following example (only the modified file of the project is shown):
https://gitee.com/devilmaycry812839668/paac/blob/master/actor_learner.py
In its __init__ method we add the following code:
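# Print the optimizer's own variables (its slot variables)
for var in self.optimizer_variables:
    print(var)
print("="*30)
# Print all global variables: model parameters plus optimizer slot variables
for var in tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES):
    print(var)
print("="*30)
# Flatten and concatenate every global variable to count the total number of elements
all_var = tf.concat([tf.reshape(var, [-1]) for var in tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)], axis=0)
print(all_var.shape)
print("="*30)
# List the optimizer's slot names, then look up each slot for every global variable
print(self.optimizer.get_slot_names())
for name in self.optimizer.get_slot_names():
    for var in tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES):
        # get_slot returns None when var has no slot with this name
        print(self.optimizer.get_slot(var, name))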
Output:
<tf.Variable 'local_learning_1/conv1_weights/OptimizerVariables:0' shape=(8, 8, 4, 16) dtype=float32_ref>
<tf.Variable 'local_learning_1/conv1_weights/OptimizerVariables_1:0' shape=(8, 8, 4, 16) dtype=float32_ref>
<tf.Variable 'local_learning_1/conv1_biases/OptimizerVariables:0' shape=(16,) dtype=float32_ref>
<tf.Variable 'local_learning_1/conv1_biases/OptimizerVariables_1:0' shape=(16,) dtype=float32_ref>
<tf.Variable 'local_learning_1/conv2_weights/OptimizerVariables:0' shape=(4, 4, 16, 32) dtype=float32_ref>
<tf.Variable 'local_learning_1/conv2_weights/OptimizerVariables_1:0' shape=(4, 4, 16, 32) dtype=float32_ref>
<tf.Variable 'local_learning_1/conv2_biases/OptimizerVariables:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'local_learning_1/conv2_biases/OptimizerVariables_1:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'local_learning_1/fc3_weights/OptimizerVariables:0' shape=(2592, 256) dtype=float32_ref>
<tf.Variable 'local_learning_1/fc3_weights/OptimizerVariables_1:0' shape=(2592, 256) dtype=float32_ref>
<tf.Variable 'local_learning_1/fc3_biases/OptimizerVariables:0' shape=(256,) dtype=float32_ref>
<tf.Variable 'local_learning_1/fc3_biases/OptimizerVariables_1:0' shape=(256,) dtype=float32_ref>
<tf.Variable 'local_learning_2/actor_output_weights/OptimizerVariables:0' shape=(256, 6) dtype=float32_ref>
<tf.Variable 'local_learning_2/actor_output_weights/OptimizerVariables_1:0' shape=(256, 6) dtype=float32_ref>
<tf.Variable 'local_learning_2/actor_output_biases/OptimizerVariables:0' shape=(6,) dtype=float32_ref>
<tf.Variable 'local_learning_2/actor_output_biases/OptimizerVariables_1:0' shape=(6,) dtype=float32_ref>
<tf.Variable 'local_learning_2/critic_output_weights/OptimizerVariables:0' shape=(256, 1) dtype=float32_ref>
<tf.Variable 'local_learning_2/critic_output_weights/OptimizerVariables_1:0' shape=(256, 1) dtype=float32_ref>
<tf.Variable 'local_learning_2/critic_output_biases/OptimizerVariables:0' shape=(1,) dtype=float32_ref>
<tf.Variable 'local_learning_2/critic_output_biases/OptimizerVariables_1:0' shape=(1,) dtype=float32_ref>
==============================
<tf.Variable 'local_learning_1/conv1_weights:0' shape=(8, 8, 4, 16) dtype=float32_ref>
<tf.Variable 'local_learning_1/conv1_biases:0' shape=(16,) dtype=float32_ref>
<tf.Variable 'local_learning_1/conv2_weights:0' shape=(4, 4, 16, 32) dtype=float32_ref>
<tf.Variable 'local_learning_1/conv2_biases:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'local_learning_1/fc3_weights:0' shape=(2592, 256) dtype=float32_ref>
<tf.Variable 'local_learning_1/fc3_biases:0' shape=(256,) dtype=float32_ref>
<tf.Variable 'local_learning_2/actor_output_weights:0' shape=(256, 6) dtype=float32_ref>
<tf.Variable 'local_learning_2/actor_output_biases:0' shape=(6,) dtype=float32_ref>
<tf.Variable 'local_learning_2/critic_output_weights:0' shape=(256, 1) dtype=float32_ref>
<tf.Variable 'local_learning_2/critic_output_biases:0' shape=(1,) dtype=float32_ref>
<tf.Variable 'local_learning_1/conv1_weights/OptimizerVariables:0' shape=(8, 8, 4, 16) dtype=float32_ref>
<tf.Variable 'local_learning_1/conv1_weights/OptimizerVariables_1:0' shape=(8, 8, 4, 16) dtype=float32_ref>
<tf.Variable 'local_learning_1/conv1_biases/OptimizerVariables:0' shape=(16,) dtype=float32_ref>
<tf.Variable 'local_learning_1/conv1_biases/OptimizerVariables_1:0' shape=(16,) dtype=float32_ref>
<tf.Variable 'local_learning_1/conv2_weights/OptimizerVariables:0' shape=(4, 4, 16, 32) dtype=float32_ref>
<tf.Variable 'local_learning_1/conv2_weights/OptimizerVariables_1:0' shape=(4, 4, 16, 32) dtype=float32_ref>
<tf.Variable 'local_learning_1/conv2_biases/OptimizerVariables:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'local_learning_1/conv2_biases/OptimizerVariables_1:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'local_learning_1/fc3_weights/OptimizerVariables:0' shape=(2592, 256) dtype=float32_ref>
<tf.Variable 'local_learning_1/fc3_weights/OptimizerVariables_1:0' shape=(2592, 256) dtype=float32_ref>
<tf.Variable 'local_learning_1/fc3_biases/OptimizerVariables:0' shape=(256,) dtype=float32_ref>
<tf.Variable 'local_learning_1/fc3_biases/OptimizerVariables_1:0' shape=(256,) dtype=float32_ref>
<tf.Variable 'local_learning_2/actor_output_weights/OptimizerVariables:0' shape=(256, 6) dtype=float32_ref>
<tf.Variable 'local_learning_2/actor_output_weights/OptimizerVariables_1:0' shape=(256, 6) dtype=float32_ref>
<tf.Variable 'local_learning_2/actor_output_biases/OptimizerVariables:0' shape=(6,) dtype=float32_ref>
<tf.Variable 'local_learning_2/actor_output_biases/OptimizerVariables_1:0' shape=(6,) dtype=float32_ref>
<tf.Variable 'local_learning_2/critic_output_weights/OptimizerVariables:0' shape=(256, 1) dtype=float32_ref>
<tf.Variable 'local_learning_2/critic_output_weights/OptimizerVariables_1:0' shape=(256, 1) dtype=float32_ref>
<tf.Variable 'local_learning_2/critic_output_biases/OptimizerVariables:0' shape=(1,) dtype=float32_ref>
<tf.Variable 'local_learning_2/critic_output_biases/OptimizerVariables_1:0' shape=(1,) dtype=float32_ref>
==============================
(2033829,)
==============================
['momentum', 'rms']
<tf.Variable 'local_learning_1/conv1_weights/OptimizerVariables_1:0' shape=(8, 8, 4, 16) dtype=float32_ref>
<tf.Variable 'local_learning_1/conv1_biases/OptimizerVariables_1:0' shape=(16,) dtype=float32_ref>
<tf.Variable 'local_learning_1/conv2_weights/OptimizerVariables_1:0' shape=(4, 4, 16, 32) dtype=float32_ref>
<tf.Variable 'local_learning_1/conv2_biases/OptimizerVariables_1:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'local_learning_1/fc3_weights/OptimizerVariables_1:0' shape=(2592, 256) dtype=float32_ref>
<tf.Variable 'local_learning_1/fc3_biases/OptimizerVariables_1:0' shape=(256,) dtype=float32_ref>
<tf.Variable 'local_learning_2/actor_output_weights/OptimizerVariables_1:0' shape=(256, 6) dtype=float32_ref>
<tf.Variable 'local_learning_2/actor_output_biases/OptimizerVariables_1:0' shape=(6,) dtype=float32_ref>
<tf.Variable 'local_learning_2/critic_output_weights/OptimizerVariables_1:0' shape=(256, 1) dtype=float32_ref>
<tf.Variable 'local_learning_2/critic_output_biases/OptimizerVariables_1:0' shape=(1,) dtype=float32_ref>
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
<tf.Variable 'local_learning_1/conv1_weights/OptimizerVariables:0' shape=(8, 8, 4, 16) dtype=float32_ref>
<tf.Variable 'local_learning_1/conv1_biases/OptimizerVariables:0' shape=(16,) dtype=float32_ref>
<tf.Variable 'local_learning_1/conv2_weights/OptimizerVariables:0' shape=(4, 4, 16, 32) dtype=float32_ref>
<tf.Variable 'local_learning_1/conv2_biases/OptimizerVariables:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'local_learning_1/fc3_weights/OptimizerVariables:0' shape=(2592, 256) dtype=float32_ref>
<tf.Variable 'local_learning_1/fc3_biases/OptimizerVariables:0' shape=(256,) dtype=float32_ref>
<tf.Variable 'local_learning_2/actor_output_weights/OptimizerVariables:0' shape=(256, 6) dtype=float32_ref>
<tf.Variable 'local_learning_2/actor_output_biases/OptimizerVariables:0' shape=(6,) dtype=float32_ref>
<tf.Variable 'local_learning_2/critic_output_weights/OptimizerVariables:0' shape=(256, 1) dtype=float32_ref>
<tf.Variable 'local_learning_2/critic_output_biases/OptimizerVariables:0' shape=(1,) dtype=float32_ref>
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
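From the output we can see that part 2 is the model parameters, i.e. 10 Variables, and part 3 is the optimizer's variables for those parameters, i.e. 20 Variables. (The None lines come from calling get_slot on variables that are themselves slot variables; get_slot returns None when no such slot exists.) The 10 model-parameter Variables hold 677943 float32 elements in total, so they take (MB):
677943 × 4 B / 1024² ≈ 2.5861 MB
The optimizer variables are twice the size of the model parameters, so all Variables together are 3× the model parameters, which matches the 2033829 elements printed above (MB):
2033829 × 4 B / 1024² ≈ 7.7584 MB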
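The arithmetic above can be reproduced directly from the variable shapes printed in the output. The sketch below assumes float32 (4 bytes per element) and two slots per parameter ("momentum" and "rms"), both of which match the output shown.

```python
from math import prod

# Shapes of the 10 model-parameter Variables, copied from the output above.
MODEL_SHAPES = [
    (8, 8, 4, 16), (16,),     # conv1 weights, biases
    (4, 4, 16, 32), (32,),    # conv2 weights, biases
    (2592, 256), (256,),      # fc3 weights, biases
    (256, 6), (6,),           # actor output weights, biases
    (256, 1), (1,),           # critic output weights, biases
]

BYTES_PER_FLOAT32 = 4
MB = 1024 ** 2   # this post uses 1 MB = 1024 * 1024 bytes
NUM_SLOTS = 2    # "momentum" and "rms"


def count_elements(shapes):
    """Total number of scalar elements across all shapes."""
    return sum(prod(s) for s in shapes)


num_params = count_elements(MODEL_SHAPES)          # 677943
total_elems = num_params * (1 + NUM_SLOTS)         # 2033829, same as all_var.shape
params_mb = num_params * BYTES_PER_FLOAT32 / MB    # about 2.5861
total_mb = total_elems * BYTES_PER_FLOAT32 / MB    # about 7.7584
print(num_params, total_elems, round(params_mb, 4), round(total_mb, 4))
```

Almost all of the parameters sit in fc3 (2592 × 256 = 663552), which is typical for this kind of conv-then-dense network.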
Model parameters, 10 Variables:
<tf.Variable 'local_learning_1/conv1_weights:0' shape=(8, 8, 4, 16) dtype=float32_ref>
<tf.Variable 'local_learning_1/conv1_biases:0' shape=(16,) dtype=float32_ref>
<tf.Variable 'local_learning_1/conv2_weights:0' shape=(4, 4, 16, 32) dtype=float32_ref>
<tf.Variable 'local_learning_1/conv2_biases:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'local_learning_1/fc3_weights:0' shape=(2592, 256) dtype=float32_ref>
<tf.Variable 'local_learning_1/fc3_biases:0' shape=(256,) dtype=float32_ref>
<tf.Variable 'local_learning_2/actor_output_weights:0' shape=(256, 6) dtype=float32_ref>
<tf.Variable 'local_learning_2/actor_output_biases:0' shape=(6,) dtype=float32_ref>
<tf.Variable 'local_learning_2/critic_output_weights:0' shape=(256, 1) dtype=float32_ref>
<tf.Variable 'local_learning_2/critic_output_biases:0' shape=(1,) dtype=float32_ref>
Optimizer variables (two slots per model parameter), 20 Variables:
<tf.Variable 'local_learning_1/conv1_weights/OptimizerVariables:0' shape=(8, 8, 4, 16) dtype=float32_ref>
<tf.Variable 'local_learning_1/conv1_weights/OptimizerVariables_1:0' shape=(8, 8, 4, 16) dtype=float32_ref>
<tf.Variable 'local_learning_1/conv1_biases/OptimizerVariables:0' shape=(16,) dtype=float32_ref>
<tf.Variable 'local_learning_1/conv1_biases/OptimizerVariables_1:0' shape=(16,) dtype=float32_ref>
<tf.Variable 'local_learning_1/conv2_weights/OptimizerVariables:0' shape=(4, 4, 16, 32) dtype=float32_ref>
<tf.Variable 'local_learning_1/conv2_weights/OptimizerVariables_1:0' shape=(4, 4, 16, 32) dtype=float32_ref>
<tf.Variable 'local_learning_1/conv2_biases/OptimizerVariables:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'local_learning_1/conv2_biases/OptimizerVariables_1:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'local_learning_1/fc3_weights/OptimizerVariables:0' shape=(2592, 256) dtype=float32_ref>
<tf.Variable 'local_learning_1/fc3_weights/OptimizerVariables_1:0' shape=(2592, 256) dtype=float32_ref>
<tf.Variable 'local_learning_1/fc3_biases/OptimizerVariables:0' shape=(256,) dtype=float32_ref>
<tf.Variable 'local_learning_1/fc3_biases/OptimizerVariables_1:0' shape=(256,) dtype=float32_ref>
<tf.Variable 'local_learning_2/actor_output_weights/OptimizerVariables:0' shape=(256, 6) dtype=float32_ref>
<tf.Variable 'local_learning_2/actor_output_weights/OptimizerVariables_1:0' shape=(256, 6) dtype=float32_ref>
<tf.Variable 'local_learning_2/actor_output_biases/OptimizerVariables:0' shape=(6,) dtype=float32_ref>
<tf.Variable 'local_learning_2/actor_output_biases/OptimizerVariables_1:0' shape=(6,) dtype=float32_ref>
<tf.Variable 'local_learning_2/critic_output_weights/OptimizerVariables:0' shape=(256, 1) dtype=float32_ref>
<tf.Variable 'local_learning_2/critic_output_weights/OptimizerVariables_1:0' shape=(256, 1) dtype=float32_ref>
<tf.Variable 'local_learning_2/critic_output_biases/OptimizerVariables:0' shape=(1,) dtype=float32_ref>
<tf.Variable 'local_learning_2/critic_output_biases/OptimizerVariables_1:0' shape=(1,) dtype=float32_ref>
Of these 20 optimizer variables, the ones in slot ['momentum'] are:
<tf.Variable 'local_learning_1/conv1_weights/OptimizerVariables_1:0' shape=(8, 8, 4, 16) dtype=float32_ref>
<tf.Variable 'local_learning_1/conv1_biases/OptimizerVariables_1:0' shape=(16,) dtype=float32_ref>
<tf.Variable 'local_learning_1/conv2_weights/OptimizerVariables_1:0' shape=(4, 4, 16, 32) dtype=float32_ref>
<tf.Variable 'local_learning_1/conv2_biases/OptimizerVariables_1:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'local_learning_1/fc3_weights/OptimizerVariables_1:0' shape=(2592, 256) dtype=float32_ref>
<tf.Variable 'local_learning_1/fc3_biases/OptimizerVariables_1:0' shape=(256,) dtype=float32_ref>
<tf.Variable 'local_learning_2/actor_output_weights/OptimizerVariables_1:0' shape=(256, 6) dtype=float32_ref>
<tf.Variable 'local_learning_2/actor_output_biases/OptimizerVariables_1:0' shape=(6,) dtype=float32_ref>
<tf.Variable 'local_learning_2/critic_output_weights/OptimizerVariables_1:0' shape=(256, 1) dtype=float32_ref>
<tf.Variable 'local_learning_2/critic_output_biases/OptimizerVariables_1:0' shape=(1,) dtype=float32_ref>
Of these 20 optimizer variables, the ones in slot ['rms'] are:
<tf.Variable 'local_learning_1/conv1_weights/OptimizerVariables:0' shape=(8, 8, 4, 16) dtype=float32_ref>
<tf.Variable 'local_learning_1/conv1_biases/OptimizerVariables:0' shape=(16,) dtype=float32_ref>
<tf.Variable 'local_learning_1/conv2_weights/OptimizerVariables:0' shape=(4, 4, 16, 32) dtype=float32_ref>
<tf.Variable 'local_learning_1/conv2_biases/OptimizerVariables:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'local_learning_1/fc3_weights/OptimizerVariables:0' shape=(2592, 256) dtype=float32_ref>
<tf.Variable 'local_learning_1/fc3_biases/OptimizerVariables:0' shape=(256,) dtype=float32_ref>
<tf.Variable 'local_learning_2/actor_output_weights/OptimizerVariables:0' shape=(256, 6) dtype=float32_ref>
<tf.Variable 'local_learning_2/actor_output_biases/OptimizerVariables:0' shape=(6,) dtype=float32_ref>
<tf.Variable 'local_learning_2/critic_output_weights/OptimizerVariables:0' shape=(256, 1) dtype=float32_ref>
<tf.Variable 'local_learning_2/critic_output_biases/OptimizerVariables:0' shape=(1,) dtype=float32_ref>
Checking whether the optimizer's variables are trainable:
Query result:
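The query code and its result are not reproduced here. A minimal sketch of one way to run this check in TF1, assuming the names come from `tf.global_variables()` and `tf.trainable_variables()`, is to list global variables that are not in the trainable set. In stock TF1 optimizers slot variables are created with trainable=False, so one would expect the 20 OptimizerVariables to show up in this list. The helper name is hypothetical and works on plain name strings so it can run without a graph.

```python
def non_trainable_names(global_names, trainable_names):
    """Names that are global but not trainable, in their original order."""
    trainable = set(trainable_names)
    return [n for n in global_names if n not in trainable]


# In a TF1 graph one could call (illustration, not the author's code):
#   non_trainable_names([v.name for v in tf.global_variables()],
#                       [v.name for v in tf.trainable_variables()])
```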
------------------------------------------------------
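So in this example, parts 2 and 3 together take 7.7584MB.
Part 1 takes 104MB. Now let's run the example project and look at the total GPU memory used during training:
This shows that even a model with very small parameters (7.7584MB for model plus optimizer variables) can use a large amount of GPU memory at run time, most of it taken by parts 4 and 5. Next we set batch_size to 1 and check again:
Launch command for this run: python3 train.py -g pong -df logs/ -ec 1 -ew 1 --max_local_steps 1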
PS:
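This example shows that even when a model is very small (model parameters plus optimizer variables total only 7.7584MB), the libraries loaded into GPU memory for computation, the temporary tensors produced during training and the monitoring tensors can together take far more space, on the order of a hundred times the parameter size. The minimum GPU memory needed for training is therefore not determined by the parameter size alone, and the temporary tensors can be larger than the parameters. Reducing batch_size shrinks these temporary tensors, but only to a limited extent: in this example, lowering the batch size freed 832 - 592 = 240MB.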
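A rough idea of how part 4 scales with batch size can be had by summing the layer output sizes of the example network. This sketch assumes an 84×84×4 input, VALID padding, and strides 4 and 2 for conv1 and conv2; these are assumptions, chosen because they reproduce the 2592 = 9 × 9 × 32 inputs of fc3 seen in the output. It counts only layer outputs, so it is a lower bound: real usage also includes gradients, intermediate copies, cuDNN workspaces and allocator overhead, and it should not be read as a prediction of the 240MB difference above.

```python
from math import prod


def conv_out(size, kernel, stride):
    """Spatial output size of a VALID-padded convolution."""
    return (size - kernel) // stride + 1


def activation_bytes_per_sample(h=84, w=84, c=4, bytes_per_elem=4):
    """Sum of layer output sizes for one sample (input shape and strides assumed)."""
    h1, w1 = conv_out(h, 8, 4), conv_out(w, 8, 4)     # conv1: 8x8 kernel, stride 4
    h2, w2 = conv_out(h1, 4, 2), conv_out(w1, 4, 2)   # conv2: 4x4 kernel, stride 2
    shapes = [
        (h, w, c),      # input
        (h1, w1, 16),   # conv1 output
        (h2, w2, 32),   # conv2 output, flattened into fc3's 2592 inputs
        (256,),         # fc3 output
        (6,),           # actor output
        (1,),           # critic output
    ]
    return sum(prod(s) for s in shapes) * bytes_per_elem


def activation_mb(batch_size, **kwargs):
    """Lower-bound activation memory (MB) for a batch; linear in batch_size."""
    return batch_size * activation_bytes_per_sample(**kwargs) / 1024 ** 2
```

The linear dependence on batch_size is why reducing it frees memory in part 4 but can never go below the fixed cost of parts 1 to 3.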
------------------------------------------------------
RMSProp optimizer:
Reference:
https://www.coder.work/article/93009
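The reason RMSProp doubles the optimizer state is visible in its update rule. Below is a sketch of one step for a single scalar parameter, written to follow the update documented for TF1's RMSPropOptimizer as we understand it (non-centered, defaults decay=0.9, momentum=0.0, epsilon=1e-10; the "rms" slot starts at 1). The two state values rms and mom correspond to the "rms" and "momentum" slots in the output above; only gradients are used, so it is a first-order method.

```python
import math


def rmsprop_step(var, grad, rms, mom, lr=0.01, decay=0.9, momentum=0.0, eps=1e-10):
    """One non-centered RMSProp step for a scalar parameter.

    rms: running mean of squared gradients ("rms" slot)
    mom: accumulated update ("momentum" slot)
    Returns the updated (var, rms, mom).
    """
    rms = decay * rms + (1.0 - decay) * grad * grad
    mom = momentum * mom + lr * grad / math.sqrt(rms + eps)
    var = var - mom
    return var, rms, mom


if __name__ == "__main__":
    # rms slot initialized to 1.0 and momentum slot to 0.0
    print(rmsprop_step(1.0, 0.5, 1.0, 0.0, lr=0.1))
```

Each parameter needs its own rms and mom values, which is exactly the 2× model-size optimizer state counted in part 3.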
=======================================