YOLOv3 Model Training Strategies

Use either a two-stage or a one-stage training strategy:
(1) Two-stage training strategy:
　　First stage: Restore the darknet53_body weights from the COCO checkpoint, then train only the yolov3_head with a large learning rate such as 1e-3 until the loss drops to a low level.
　　Second stage: Restore the weights from the first stage, then train the whole model with a small learning rate such as 1e-4 or smaller. At this stage, remember to restore the optimizer parameters as well if you use an optimizer such as Adam.
(2) One-stage training strategy:
　　Restore the whole weight file except the last three convolution layers (Conv_6, Conv_14, Conv_22). In this case, watch out for possible NaN loss values.
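
The two restore strategies above mostly come down to choosing which variables to load from the checkpoint. The following is a minimal, framework-free sketch of that selection. The variable names, the scope prefixes, the strategy labels, and the assumption that Conv_6, Conv_14 and Conv_22 live under the yolov3_head scope are illustrative, not taken from a specific code base; in practice the resulting list would be handed to whatever checkpoint loader the training framework provides.

```python
# Hypothetical scope names; adjust them to the variable names in your own graph.
BODY_SCOPE = "darknet53_body"
HEAD_SCOPE = "yolov3_head"
# The three output convolutions named in the text, assumed to sit under the head scope.
OUTPUT_LAYERS = {"Conv_6", "Conv_14", "Conv_22"}


def is_output_layer(name):
    """True if the variable belongs to one of the three excluded head layers."""
    parts = name.split("/")
    return HEAD_SCOPE in parts and any(p in OUTPUT_LAYERS for p in parts)


def select_restore_vars(var_names, strategy):
    """Return the variable names to restore for the given (hypothetical) strategy label."""
    if strategy == "two_stage_first":
        # First stage: only the backbone weights come from the COCO checkpoint.
        return [n for n in var_names if BODY_SCOPE in n.split("/")]
    if strategy == "two_stage_second":
        # Second stage: everything comes from the first-stage checkpoint.
        return list(var_names)
    if strategy == "one_stage":
        # One stage: everything except the last three convolution layers.
        return [n for n in var_names if not is_output_layer(n)]
    raise ValueError("unknown strategy: " + strategy)


# Made-up variable names, for illustration only.
example_vars = [
    "yolov3/darknet53_body/Conv_6/weights",
    "yolov3/yolov3_head/Conv_5/weights",
    "yolov3/yolov3_head/Conv_6/weights",
    "yolov3/yolov3_head/Conv_22/biases",
]
```

Matching on whole path segments rather than substrings keeps a backbone layer that happens to be called Conv_6 from being excluded by accident.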
Some other useful training tricks to add:
　　Cosine decay of the learning rate (SGDR)
　　Multi-scale training
　　Label smoothing
　　Mix-up data augmentation
　　Focal loss
　　These are all good training tricks, but that does not mean they are guaranteed to improve performance. Choose the ones that suit your own task.
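
For reference, each of the tricks listed above can be written as a small formula. The sketch below uses plain Python with scalars and lists instead of tensors; all function names and default values are assumptions chosen for illustration. The cosine schedule is shown without the warm restarts of full SGDR, and the focal loss is a simplified binary form with a single alpha rather than a per-class weight.

```python
import math
import random


def cosine_decay_lr(step, total_steps, lr_init=1e-4, lr_end=1e-6):
    """Cosine decay from lr_init to lr_end over total_steps (no restarts)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return lr_end + 0.5 * (lr_init - lr_end) * (1.0 + math.cos(math.pi * progress))


def pick_train_size(rng, min_size=320, max_size=608, stride=32):
    """Multi-scale training: draw a square input size that is a multiple of stride."""
    return rng.randrange(min_size, max_size + 1, stride)


def smooth_labels(one_hot, epsilon=0.01):
    """Label smoothing: move epsilon of the probability mass to a uniform distribution."""
    num_classes = len(one_hot)
    return [y * (1.0 - epsilon) + epsilon / num_classes for y in one_hot]


def mixup_pixels(image_a, image_b, lam):
    """Mix-up: blend two flattened images with weight lam (often drawn from a Beta distribution)."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(image_a, image_b)]


def focal_bce(target, prob, alpha=1.0, gamma=2.0, eps=1e-7):
    """Binary focal loss: cross-entropy scaled down for well-classified examples."""
    prob = min(max(prob, eps), 1.0 - eps)
    p_t = prob if target == 1 else 1.0 - prob
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


# Usage: a new input size would typically be drawn every few batches.
rng = random.Random(0)
train_size = pick_train_size(rng)
```

With gamma set to 0 the focal loss reduces to ordinary binary cross-entropy, which makes it easy to compare the two on the same task before committing to one.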
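Loss NaN? Use a larger number of warm-up epochs or a smaller learning rate, and retry a few times. When fine-tuning the whole model, the Adam optimizer sometimes produces NaN; try the momentum optimizer instead.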
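
As a rough illustration of the warm-up advice, the sketch below ramps the learning rate linearly and reports the first non-finite loss so a run can be stopped early and retried with a gentler setting. It counts warm-up in steps rather than epochs, and the function names and the linear ramp are assumptions, not a prescribed schedule.

```python
import math


def warmup_lr(step, warmup_steps, base_lr):
    """Linearly ramp the learning rate up to base_lr over warmup_steps."""
    if warmup_steps <= 0 or step >= warmup_steps:
        return base_lr
    return base_lr * (step + 1) / warmup_steps


def first_bad_step(losses):
    """Return the index of the first NaN or infinite loss, or None if all are finite."""
    for step, loss in enumerate(losses):
        if not math.isfinite(loss):
            return step
    return None
```

If first_bad_step reports an early index, lengthening warmup_steps or lowering base_lr is the first thing to try.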