PyTorch Data Parallelism: Multi-GPU Training

Basic steps
- Determine the device, checking whether a usable GPU is available (`"cuda:0"` refers to the first visible GPU; otherwise fall back to the CPU):

```python
import torch
import torch.nn as nn  # needed later for nn.DataParallel

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
```
- Define and instantiate the model and DataLoader as usual (a sketch follows).
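This step's code isn't shown above, so here is a minimal sketch consistent with the run log at the end of the section (100 random samples, batch size 30, 5 input features, 2 outputs). `RandomDataset`, `Model`, and the size constants are illustrative assumptions; `rand_loader` matches the name the loop below expects, and `forward()` prints its sizes so the per-GPU split shows up in the log:

```python
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

input_size = 5    # features per sample (matches the log's [*, 5] inputs)
output_size = 2   # model output width (matches the log's [*, 2] outputs)
batch_size = 30   # full batch seen "Outside" the model
data_size = 100   # 3 full batches of 30 plus a final batch of 10

class RandomDataset(Dataset):
    """Toy dataset of random tensors, sized to reproduce the log below."""
    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len

class Model(nn.Module):
    """A single linear layer that prints its per-replica input/output sizes."""
    def __init__(self, input_size, output_size):
        super().__init__()
        self.fc = nn.Linear(input_size, output_size)

    def forward(self, input):
        output = self.fc(input)
        print("In Model: input size", input.size(),
              "output size", output.size())
        return output

rand_loader = DataLoader(dataset=RandomDataset(input_size, data_size),
                         batch_size=batch_size, shuffle=True)
model = Model(input_size, output_size)
```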
- If more than one GPU is detected, wrap the model in `nn.DataParallel`, which splits each input batch along dimension 0, runs a replica of the model on every visible GPU, and gathers the outputs back on the first device:
```python
if torch.cuda.device_count() > 1:
    print("Let's use", torch.cuda.device_count(), "GPUs!")
    model = nn.DataParallel(model)
```
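If only a subset of the visible GPUs should be used, `nn.DataParallel` also accepts a `device_ids` argument; a small sketch, with the GPU indices chosen arbitrarily:

```python
# Run replicas only on GPUs 0 and 1; outputs are gathered on
# output_device, which defaults to device_ids[0].
model = nn.DataParallel(model, device_ids=[0, 1])
```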
- Move the model onto the device:

```python
model.to(device)
```
- Run the model (inside the loop, move each input batch onto the device as well):
```python
for data in rand_loader:
    input = data.to(device)
    output = model(input)
    print("Outside: input size", input.size(),
          "output_size", output.size())
```

- The result (here with two GPUs):
```
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])
```

Each full batch of 30 is split into two chunks of 15, one per GPU, inside the model, while the loop outside still sees the whole batch of 30. The final batch holds the remaining 10 samples, split 5 and 5.
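A practical follow-up when checkpointing: `nn.DataParallel` stores the wrapped network under `.module`, and the wrapper's `state_dict` keys carry a `module.` prefix. A minimal sketch, assuming you want a checkpoint loadable into an unwrapped model (`"checkpoint.pth"` is just an example filename):

```python
# Save the underlying model's weights so the checkpoint keys have no
# "module." prefix and load cleanly into a plain (unwrapped) Model.
target = model.module if isinstance(model, nn.DataParallel) else model
torch.save(target.state_dict(), "checkpoint.pth")
```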