Tropical Geography ›› 2020, Vol. 40 ›› Issue (2): 303-313. doi: 10.13284/j.cnki.rddl.003229

• Special issue: "Geospatial Intelligence Technology and Applications"

Semantic Segmentation of Remote Sensing Images Using the DeepLabv3+ Algorithm with a Dual Attention Mechanism

Liu Wenxiang, Shu Yuanzhong, Tang Xiaomin, Liu Jinmei

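  1. Provincial Key Laboratory of Image Processing, School of Information Engineering, Nanchang Hangkong University, Nanchang 330063
  • Received: 2019-10-11 Revised: 2020-02-17 Online: 2020-03-10 Published: 2020-05-15
  • Corresponding author: Shu Yuanzhong E-mail: au0888@qq.com
  • About the author: Liu Wenxiang (born 1994), male, from Gao'an, Jiangxi; master's degree; main research interests are machine vision and applications of artificial intelligence algorithms. E-mail: 1609192581@qq.com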

Remote Sensing Image Segmentation Using Dual Attention Mechanism Deeplabv3+ Algorithm

Liu Wenxiang, Shu Yuanzhong, Tang Xiaomin, Liu Jinmei

  1. School of Information Engineering, Nanchang Hangkong University, Nanchang 330063, China
  • Received: 2019-10-11 Revised: 2020-02-17 Online: 2020-03-10 Published: 2020-05-15
  • Contact: Shu Yuanzhong E-mail: au0888@qq.com

Abstract (translated from the Chinese):

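On remote sensing imagery, the DeepLabv3+ network shows several defects: slow fitting, imprecise segmentation of edge targets, and intra-class inconsistency and holes in the segmentation of large-scale targets. To address these, a Dual Attention Mechanism Module (DAMM) is introduced into the network, and two network models are designed and implemented that connect the DAMM structure with the ASPP (Atrous Spatial Pyramid Pooling) layer either in series or in parallel. In the series connection, the feature map is first fed into the DAMM and then passed through the ASPP structure. In the parallel connection, the dual attention layer and the ASPP layer are placed side by side: the network processes the feature map extracted by the backbone in both branches at once and then fuses the feature information produced by the two layers. Both improved methods were validated on the INRIA Aerial Image high-resolution remote sensing dataset. The results show that both the series and the parallel networks effectively mitigate the shortcomings of DeepLabv3+. The parallel network performs better and improves on the defects of the original network more noticeably, reaching an mIoU of 85.44% on the test dataset, 1.8% higher than DeepLabv3+, whereas the series network improves it by 1.12%. The parallel network better meets the needs of this study and provides a unified remedy for the above problems of the DeepLabv3+ network.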
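The mIoU figures quoted above are class-averaged intersection-over-union scores. As a rough illustration of how such a score is usually computed from label maps, here is a minimal NumPy sketch. The function name `mean_iou`, its arguments, and the choice to skip classes absent from both maps are assumptions made for this example and are not taken from the paper's evaluation code.

```python
import numpy as np


def mean_iou(y_true, y_pred, num_classes):
    """Mean Intersection over Union between two integer label maps.

    y_true, y_pred: arrays of class indices in [0, num_classes) with equal shape.
    Classes that occur in neither map are skipped (an assumption of this sketch).
    """
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    cm = np.bincount(
        num_classes * y_true + y_pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)
    intersection = np.diag(cm)
    # Union = predicted pixels + ground-truth pixels - pixels counted twice.
    union = cm.sum(axis=0) + cm.sum(axis=1) - intersection
    present = union > 0
    return float((intersection[present] / union[present]).mean())


# Toy 2x2 label maps with two classes (values are illustrative only).
truth = [[0, 0], [1, 1]]
pred = [[0, 1], [1, 1]]
score = mean_iou(truth, pred, num_classes=2)
```

In this toy case class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.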

Key words: remote sensing image, deep learning, DeepLabv3+, attention mechanism, semantic segmentation

Abstract:

Deep learning-based remote sensing image processing is a promising way to characterize large volumes of remote sensing imagery and complex scenes. However, deep-learning algorithms applied to remote sensing images have certain shortcomings; for example, the popular DeepLabv3+ network fits slowly, segments edge targets inaccurately, and produces inconsistencies and holes when segmenting large-scale targets. We therefore propose introducing a Dual Attention Mechanism Module (DAMM) into DeepLabv3+ to address these deficiencies. We designed two network models that connect the DAMM structure to the Atrous Spatial Pyramid Pooling (ASPP) layer in series or in parallel. In the serial connection, the feature map is first sent to the DAMM and then passed through the ASPP structure; the result is then fused with middle- and low-level feature information in the decoder layer and restored to the original image resolution. In the parallel connection, the DAMM and ASPP layers process the feature map extracted by the backbone network in parallel, and their outputs are subsequently fused; the fused feature map is restored to the original resolution by the decoder. The two improved methods were verified on the INRIA Aerial Image high-resolution remote sensing dataset. The results showed that both the serial and the parallel networks effectively mitigated the shortcomings of DeepLabv3+. The parallel network performed better and improved on the defects of the original network more noticeably: it achieved a mean Intersection over Union (mIoU) of 85.44% on the test dataset, 1.8% higher than DeepLabv3+, whereas the serial network improved on DeepLabv3+ by 1.12%. The effects of the position and channel attention mechanisms in the DAMM structure were also examined.
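The difference between the two connection schemes can be sketched in a few lines. The following is only an illustration of the wiring, not the authors' network: `toy_damm` and `toy_aspp` are hypothetical stand-ins for the learned modules, feature maps are plain `(channels, height, width)` arrays, and fusing the parallel branches by channel concatenation is an assumption, since the abstract does not say how the fusion is done.

```python
import numpy as np


def serial_head(features, damm, aspp):
    """Series wiring: backbone features -> DAMM -> ASPP."""
    return aspp(damm(features))


def parallel_head(features, damm, aspp):
    """Parallel wiring: both modules see the same backbone features.

    Fusion by concatenation along the channel axis is an assumption.
    """
    return np.concatenate([damm(features), aspp(features)], axis=0)


def toy_damm(x):
    # Hypothetical stand-in: a real DAMM re-weights features with attention.
    return 2.0 * x


def toy_aspp(x):
    # Hypothetical stand-in: append a globally pooled context copy per channel,
    # loosely mimicking how ASPP adds multi-scale context.
    context = np.broadcast_to(x.mean(axis=(1, 2), keepdims=True), x.shape)
    return np.concatenate([x, context], axis=0)


backbone_features = np.arange(48, dtype=float).reshape(3, 4, 4)
serial_out = serial_head(backbone_features, toy_damm, toy_aspp)      # shape (6, 4, 4)
parallel_out = parallel_head(backbone_features, toy_damm, toy_aspp)  # shape (9, 4, 4)
```

In both cases the output would then go to the decoder, which merges it with lower-level features and upsamples to the input resolution; that stage is omitted here.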
The ablation study showed that both the channel and the position attention mechanisms improved the performance of the DeepLabv3+ model. On the test set, the channel and position attention mechanisms increased mIoU by 0.95% and 1.32%, respectively. The experiments revealed that the position attention mechanism had the greater effect on edge target segmentation, that the channel attention mechanism had the greater effect on holes in large-scale targets, and that both mechanisms accelerated network fitting during training. The improved DeepLabv3+ algorithm can provide a scientific basis and reference for the semantic segmentation of big-data remote sensing images.
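To make the two attention branches concrete, the sketch below shows one common way position and channel attention are formulated: self-attention over spatial locations and over channels, each with a residual connection. The abstract does not give the DAMM equations, so this formulation is an assumption. The learned projection convolutions are omitted, `gamma` and `beta` are hypothetical stand-ins for learned scale parameters, and summing the two branches is also assumed.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def position_attention(x, gamma=1.0):
    """Let every spatial location aggregate features from all other locations.

    x: feature map of shape (C, H, W).
    """
    c, h, w = x.shape
    flat = x.reshape(c, h * w)               # (C, N) with N = H * W
    attn = softmax(flat.T @ flat, axis=-1)   # (N, N); row i weights all positions
    out = flat @ attn.T                      # (C, N) weighted sum over positions
    return gamma * out.reshape(c, h, w) + x  # residual connection


def channel_attention(x, beta=1.0):
    """Let every channel aggregate features from all other channels."""
    c, h, w = x.shape
    flat = x.reshape(c, h * w)               # (C, N)
    attn = softmax(flat @ flat.T, axis=-1)   # (C, C); row i weights all channels
    out = attn @ flat                        # (C, N) weighted sum over channels
    return beta * out.reshape(c, h, w) + x   # residual connection


def dual_attention(x, gamma=1.0, beta=1.0):
    """Combine both branches; fusing them by addition is an assumption."""
    return position_attention(x, gamma) + channel_attention(x, beta)


features = np.random.default_rng(0).normal(size=(4, 3, 5))
attended = dual_attention(features)  # same shape as the input: (4, 3, 5)
```

An ablation like the one described above would correspond to running only one of the two branches, for example calling `position_attention` or `channel_attention` alone in place of `dual_attention`.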

Key words: remote sensing image, deep learning, DeepLabv3+, attention mechanism, semantic segmentation

CLC Number:

  • TP751