TẠP CHÍ KHOA HỌC, Số 42, tháng 9 năm 2020

SEMANTIC SEGMENTATION OF CILIA USING
FULLY-CONVOLUTIONAL DENSE NETWORKS

Nguyễn Trung Kiên
Phòng Khảo thí và Đảm bảo chất lượng (Office of Testing and Quality Assurance)
Email: kiennt@dhhp.edu.vn
Received: 29/5/2020
Reviewed: 19/6/2020
Accepted for publication: 03/7/2020

Abstract: Cilia are hairlike structures protruding from nearly every cell in the body. In diseases known as ciliopathies, where cilia function is disrupted, patients can present with a wide spectrum of disorders. However, most techniques for assessing ciliary motion rely on manual identification and tracking of cilia. This annotation is tedious and error-prone, and more analytical techniques impose strong assumptions such as periodic motion of beating cilia. Deep convolutional networks are able to encode complex features while remaining sensitive enough to segment cilia that exhibit a variety of motion patterns. We compare fully convolutional DenseNets to a baseline model without dense connections for the novel task of cilia segmentation from frames of biopsy videos. Trained on 235 labeled images, our fully convolutional DenseNet achieves an overall pixel accuracy of 90.15%, which is 13% more accurate than U-Net. We highlight the advantageous properties of DenseNets for semantic segmentation on small biomedical datasets.

Keywords: Cilia, Ciliopathies, Semantic Segmentation, DenseNets, Fully Convolutional DenseNets

1. INTRODUCTION
Cilia help to regulate the respiratory system by generating fluid flow to deliver nutrients and propel out mucus and foreign matter. These hairlike structures are vital in reproduction, homeostasis, and organ development. Defects in ciliary motion have been associated with a wide range of diseases such as birth defects, sinopulmonary infection, and congenital heart disease [22] [5]. Assessment of ciliary motion
from biopsy videos relies on accurate detection and segmentation of cilia from the surrounding cell, a task that even in recent ciliary motion analysis pipelines remains manual labor [16] [21].
Therefore, accurate and fully automated
segmentation of motile cilia provides
clinical significance in the identification
and further study of ciliopathies and their
underlying motion patterns.
Current methods of ciliopathy
identification and diagnosis rely on
an ensemble of techniques. Electron
microscopy (EM) can elucidate structural defects that connect to certain ciliopathies, but some ciliopathies present without any discernible structural deformities [19] [23]. Examining ciliary beat pattern from videos of ciliary biopsies is among the most promising methods, but this is usually conducted manually; consequently, it is tedious and time-consuming, and the eventual conclusions are highly subjective [17].
Furthermore, there is little consensus on
deterministic definitions of ciliary motion
phenotypes, ruling out any possibility of
cross-institutional collaboration.
While [16] proposed an initial ciliary
motion quantification pipeline, it was
limited in its reliance on manually-
selected regions of interest on which to
operate. Therefore, a fast and reliable
method automating the region-of-interest
selection process is desirable to clinicians
and researchers, accelerating the adoption
of objective ciliary motion measures and
bringing an end-to-end ciliary motion
analysis pipeline closer to fruition. To our
knowledge, no computational pipeline
for semantic cilia segmentation from
microscopy images has been proposed
using deep learning techniques.
Deep learning approaches to semantic segmentation have achieved state-of-the-art performance on many benchmark datasets in biomedical imaging. The task
of cilia segmentation is challenging due
to their size, shape, and orientation. A
single cilium can present itself in only a
few pixels in an image depending on its
orientation to the camera perspective.
In addition, the hairlike shape of cilia
can easily be mistaken for extraneous
recording artifacts such as poor focus,
inconsistent lighting, and blur from a
shaky camera perspective.
Another challenge in cilia detection is the assumption of, and reliance on, temporal information about cilia. A model for
automatic detection of cilia should ideally
be able to segment and classify cilia that
do not display a regular motion; diseased
ciliary motion patterns may show little, if
any, motility, thus the segmentation model
should be able to identify cilia from a
single frame of microscopy video.
Fully convolutional networks (FCNs) comprise a feature extractor that feeds into an upsampling path to recover an image of the predicted segmentation mask [15] [18]. Shortcut or skip connections can be introduced between the downsampling and upsampling paths to pass feature information to deeper layers. These skip connections allow deep networks to be trained, as the gradient can propagate more easily from deep layers.
This type of architecture has been applied
to biomedical imaging data as an end-to-
end model for segmentation tasks [18].
75TẠP CHÍ KHOA HỌC, Số 42, tháng 9 năm 2020
DenseNets [8] extend the idea of
shortcut connections by concatenating the
feature maps of every layer in the same
dense block, similar to Inception style
networks [21], effectively maximizing
the information flow between layers.
While densely connected layers add more parameters per layer due to the increased number of shortcut connections, the overall number of parameters is reduced because fewer feature maps are needed in each layer. This property allows DenseNets to be very deep while remaining extremely parameter efficient. We propose that this attribute also makes DenseNets well suited for small datasets, because dense connections naturally reduce overfitting and, in some cases, eliminate the need for heavy regularization [26].
Our main contributions in this paper are as follows:
– We establish an accurate and reliable computational model for semantic segmentation of cilia from a single frame of microscopy video.
– We compare the performance of
fully convolutional networks with varying
numbers of dense connections to a baseline
FCN without dense connections.
– We highlight the advantageous
properties of DenseNets that make
this architecture extremely suitable for
biomedical data with few labeled examples.
2. FULLY CONVOLUTIONAL DENSE
NETWORKS
In DenseNets, each layer has direct access to the gradients from the loss and to the original input. The number of feature maps in each layer is controlled by a growth parameter k. We implement a version of the Tiramisu network [11] with a growth rate of k = 16 and train several models with different depths and hyper-parameter settings.
2.1 Network Architecture
The network comprises dense blocks in both the downsampling and upsampling paths. Dense blocks are followed by transition down blocks in the downsampling path and preceded by transition up blocks in the upsampling path.
More layers are stacked in each dense block until the bottleneck dense block in the middle of the network, after which the number of layers decreases in each subsequent dense block. Skip connections connect dense blocks in the downsampling and upsampling paths, facilitating information flow from earlier layers so that high-level features can be reused in deeper layers. A convolutional layer is added before the first block and after the last block. An overview of the architecture for a fully convolutional DenseNet with 74 layers is shown in Figure 1.

Figure 1: Diagram of FC-DenseNet 74. Yellow circles represent the concatenation operation and dashed lines represent skip connections.
2.2. Dense Blocks
Whereas residual blocks [6] sum an identity mapping with a nonlinear transformation, dense blocks (DB) use the concatenation operation to maximize feature reuse between all layers in a DB and to ease the flow of gradients to earlier layers during backpropagation. Layers in a DB are iteratively concatenated, so a block of L layers has L(L+1)/2 connections instead of just L. This means the number of feature maps in a DenseNet increases linearly with depth. Each layer in a DB comprises
a Batch Normalization [10] followed by
a 3×3 convolution with Rectified Linear
Units [7]. Dropout [20] and weight
decay regularization are added to control
overfitting.
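As an illustration, a dense block as described above can be sketched in Keras, the framework used in Section 2.4. This is a minimal sketch, not the paper's implementation; the function name, layer count, and input channel width are our own placeholders:

```python
import tensorflow as tf
from tensorflow.keras import layers


def dense_block(x, num_layers, growth_rate=16, dropout=0.2):
    # Each layer: Batch Normalization -> 3x3 convolution with ReLU -> Dropout,
    # as described in Section 2.2; its k new feature maps are concatenated
    # onto all earlier feature maps in the block.
    for _ in range(num_layers):
        y = layers.BatchNormalization()(x)
        y = layers.Conv2D(growth_rate, 3, padding="same", activation="relu")(y)
        y = layers.Dropout(dropout)(y)
        x = layers.Concatenate()([x, y])  # feature reuse across the block
    return x


inputs = tf.keras.Input(shape=(224, 224, 32))  # 32 input channels is illustrative
outputs = dense_block(inputs, num_layers=4)
model = tf.keras.Model(inputs, outputs)
# Channels grow linearly with depth: 32 + 4 * 16 = 96
```

Note how the concatenation makes the channel count grow by the growth rate k with every layer, while each layer itself only produces k feature maps.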
2.3. Transition Blocks
Transition down (TD) blocks reduce
dimensionality of the feature maps.
To this end, TD blocks contain a 1 × 1
convolution with Batch Normalization,
ReLU activations, and Dropout. In the
original Tiramisu paper [11], 2 × 2 max pooling is used in the TD blocks; however, we find that using a stride of 2 instead obtains better results. In FC-DenseNet, transition up (TU) and dense blocks replace the upsampling path of an FCN. TU blocks consist of 3 × 3 transposed convolutions with stride 2 to match the stride of the downsampling path.
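Under the same assumptions as before (illustrative names and channel counts, not the paper's code), the transition blocks might be sketched as follows, with the strided 1 × 1 convolution standing in for max pooling as noted above:

```python
import tensorflow as tf
from tensorflow.keras import layers


def transition_down(x, dropout=0.2):
    # 1x1 convolution with BN, ReLU, and Dropout; stride 2 replaces
    # the 2x2 max pooling of the original Tiramisu (see text).
    n_ch = x.shape[-1]
    x = layers.BatchNormalization()(x)
    x = layers.Conv2D(n_ch, 1, strides=2, activation="relu")(x)
    return layers.Dropout(dropout)(x)


def transition_up(x):
    # 3x3 transposed convolution with stride 2 to undo the downsampling.
    n_ch = x.shape[-1]
    return layers.Conv2DTranspose(n_ch, 3, strides=2, padding="same")(x)


inp = tf.keras.Input(shape=(224, 224, 48))
down = transition_down(inp)   # spatial dimensions halved: 112 x 112
up = transition_up(down)      # spatial dimensions restored: 224 x 224
model = tf.keras.Model(inp, [down, up])
```

In the full network a TU output would be concatenated with the matching skip connection from the downsampling path before entering the next dense block.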
2.4 Training
We train FC-DenseNets with different numbers of layers to study the optimal tradeoff between speed and accuracy. The Adam [12] optimizer was used to minimize the loss function, and we vary several regularization settings: dropout, L2 weight decay, and learning rate annealing. All model weights were initialized with He uniform initialization [7]. Each model was trained for 100 epochs with a batch size of 4 on two Titan X GPUs, and all models were implemented in TensorFlow with Keras [1, 3].
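The training setup just described might be configured in Keras roughly as follows. The tiny stand-in model, image size, learning rate, and synthetic data are our placeholders, not the paper's settings:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, regularizers

# Minimal stand-in model: He uniform initialization [7] plus L2 weight decay
# on a convolution, ending in a per-pixel softmax over 4 classes.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(16, 16, 1)),
    layers.Conv2D(8, 3, padding="same", activation="relu",
                  kernel_initializer="he_uniform",
                  kernel_regularizer=regularizers.l2(1e-4)),
    layers.Conv2D(4, 1, activation="softmax"),
])

# Adam optimizer minimizing sparse categorical cross entropy (Secs. 2.4, 3.2).
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="sparse_categorical_crossentropy")

# One step on random data, just to show the fit call; real training ran
# 100 epochs with batch size 4.
x = np.random.rand(4, 16, 16, 1).astype("float32")
y = np.random.randint(0, 4, size=(4, 16, 16))
history = model.fit(x, y, epochs=1, verbose=0)
```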
3. EXPERIMENTS
3.1 Cilia Data
325 grayscale videos from two
separate video cohorts of nasal brush
biopsies were collected from 149
patients (data from [16]; see that work for details on patient recruitment, nasal biopsy extraction, and spectroscopic techniques and technologies). Each video depicted
cilia along with varying levels of
recording artifacts such as extraneous
camera movement, uneven lighting, or
poor focus. A cilium structure observed vertically to the camera's perspective appears very different from a cilium lining the side of the cell body. To address this, we separate cilia annotations into side cilia and top cilia class labels. Four class annotations (side cilia, top cilia, cell body, and background) were manually segmented using ITK-SNAP [25].
Only the first frame of each video is
used so that the model avoids relying on
any information about ciliary motion in
the temporal dimension for segmentation.
The pixels were normalized by subtracting the mean pixel value and dividing by the standard deviation. The images varied in resolution, so all were resized to 224×224 pixels and transformed by random flips to augment the data. We set aside 75 of the 325 images as a holdout test set and randomly chose samples from the remaining 250 for training and validation using a 60/40% train-validation split.
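The preprocessing and split just described can be sketched in NumPy. Resizing is omitted (it needs an image library), and the function names, stand-in image IDs, and fixed seed are ours:

```python
import numpy as np


def normalize(img):
    # Zero-mean, unit-variance normalization, as described above.
    return (img - img.mean()) / img.std()


def random_flip(img, rng):
    # Random horizontal/vertical flips for data augmentation.
    if rng.random() < 0.5:
        img = np.flip(img, axis=1)
    if rng.random() < 0.5:
        img = np.flip(img, axis=0)
    return img


rng = np.random.default_rng(0)
perm = rng.permutation(325)   # stand-in ids for the 325 first frames
test_ids = perm[:75]          # holdout test set
rest = perm[75:]              # remaining 250 images
train_ids = rest[:150]        # 60% of the remaining 250
val_ids = rest[150:]          # 40% of the remaining 250
```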
3.2 Evaluation Metric
Because the task is multi-class
segmentation, we train the optimizer to
minimize sparse categorical cross entropy
to account for the skew in distribution of
pixel class labels. We evaluate models on
overall pixel classification accuracy, using
the class with the highest probability for
each pixel as the predicted pixel class.
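This overall pixel accuracy can be computed directly from the per-pixel class probabilities; a small NumPy sketch with a function name of our choosing:

```python
import numpy as np


def pixel_accuracy(probs, labels):
    """Overall pixel accuracy.

    probs: (H, W, C) per-class probabilities for each pixel.
    labels: (H, W) integer class ids.
    The predicted class for each pixel is the argmax over channels.
    """
    preds = probs.argmax(axis=-1)
    return float((preds == labels).mean())


# Tiny worked example: 4 pixels, 3 classes, one pixel misclassified.
probs = np.eye(3)[[0, 1, 2, 0]].reshape(2, 2, 3)
labels = np.array([[0, 1], [2, 1]])
acc = pixel_accuracy(probs, labels)  # 3 of 4 pixels correct -> 0.75
```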
Categorical cross entropy is defined as:

$$L_{cce} = -\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{c} y_{ij}\,\log(p_{ij})$$

where $i$ indexes samples, $j$ indexes the classes, $n$ is the number of samples, $c$ is the number of classes, $y_{ij}$ is the sample label, and $p_{ij} \in (0,1)$ such that $\sum_{j} p_{ij} = 1$.
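A direct NumPy computation of this loss, under the convention that the labels are one-hot encoded (the function name and epsilon guard are ours):

```python
import numpy as np


def categorical_cross_entropy(y, p, eps=1e-12):
    """Mean cross entropy over n samples.

    y: (n, c) one-hot labels.
    p: (n, c) predicted probabilities, each row summing to 1.
    eps guards against log(0).
    """
    return float(-(y * np.log(p + eps)).sum(axis=1).mean())


y = np.array([[1.0, 0.0], [0.0, 1.0]])
p = np.array([[0.9, 0.1], [0.2, 0.8]])
loss = categorical_cross_entropy(y, p)  # -(ln 0.9 + ln 0.8) / 2
```

Confident predictions on the correct class drive the loss toward zero, which is what the optimizer minimizes during training.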
3.3. Baseline
We select a U-Net [18] architecture as our baseline, as it incorporates skip connections between the downsampling and upsampling paths, similar to our DenseNet model. The baseline model uses weights pre-trained on ImageNet [4].
Model           | Parameters | Pre-trained | Dropout | Decay  | Accuracy
U-Net           | 30 M       | Yes         | 0.3     | 0.001  | 76.90%
FC-DenseNet 50  | 1.7 M      | No          | 0.2     | 0.001  | 85.23%
FC-DenseNet 74  | 3.7 M      | No          | 0.1     | 0.0001 | 87.59%
FC-DenseNet 103 | 9.6 M      | No          | 0.1     | None   | 89.27%
FC-DenseNet 136 | 33 M       | No          | None    | None   | 90.15%
Table 1: Table of results. Parameter counts are in millions.
Table 1 reports experiments with five network models. Because FC-DenseNet is noted for its outstanding efficiency, we train it at several different depths to evaluate the effect of the number of layers on the overall performance of the network.
3.4. Results
The best model attains, on average, 90.15% overall pixel accuracy on the holdout test set. This is about 13% more accurate than the U-Net baseline. Even the FC-DenseNet with 50 layers is 8% more accurate than the baseline. This difference in performance is considerable, especially since FC-DenseNet 50 has 10× fewer parameters than the baseline. In addition, despite achieving better performance, none of the DenseNet models were pre-trained on ImageNet. DenseNets also needed less regularization and converged to a minimum faster than the baseline model.
Figure 2: Top row depicts grayscale frames of microscopy images. Middle row depicts
ground truth masks. Bottom row is the predicted masks. Black represents the background
class; dark gray represents the cell body; medium gray represents the side cilia; light gray
represents the top cilia.
3.5 Benefits of Dense Connections
These results show that dense connections are extremely efficient in facilitating the flow of information and gradients between layers. This allows the network to reuse feature maps and acts as a natural regularizer by making information from earlier layers available to later layers.
This is similar to the implicit deep supervision that shorter connections to the loss function provide, with the added benefit of being less complicated than deeply-supervised nets (DSN). Huang et al. [9] show that each individual layer in ResNets [6] contributes little to the overall model and can be randomly dropped during training. This redundancy in features is avoided in DenseNets through their dense connections, which act as a kind of inherent multi-scale supervision [14]. Because there is no need to relearn redundant feature maps in a DenseNet, there is less need for regularization, and in our experiments we found that too much regularization even degraded accuracy.
Sometimes the FC-DenseNet correctly predicted background cilia even when they were not annotated in the ground truth (Figure 2, 2nd column). Additionally, there is some ambiguity in the distinction between side and top cilia in the manual annotations. Having demonstrated the effectiveness of segmentation from a single frame, subsequent frames of the video could be taken into account to alleviate some of this ambiguity and further refine predictions by incorporating the temporal dimension.
Unlike most fully convolutional
networks, FC-DenseNets achieve excellent
results without the need for pre-trained
weights or post-processing steps such
as Conditional Random Fields (CRFs) to fine-tune the segmentation masks [2, 13].
However, since our model is trainable end-
to-end, using transfer learning [24] and
pre-training or CRF post-processing might
well boost model performance even more.
4. CONCLUSIONS
In this paper, we demonstrate the efficacy
of a fully convolutional DenseNet on the
challenging task of cilia segmentation and
explore the implicit regularizing properties
inherent to this network architecture.
We also highlight the advantageous properties of fully convolutional DenseNets for semantic segmentation on biomedical datasets with few labels.
References
1. Abadi, M., et al. (2016), TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. CoRR abs/1603.04467.
2. Chen, L., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A. (2016), DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. CoRR abs/1606.00915.
3. Chollet, F. (2015), Keras, https://github.com/fchollet/keras.
4. Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L. (2009), ImageNet: A Large-Scale Hierarchical Image Database. Conference on Computer Vision and Pattern Recognition.
5. Drozdzal, M., Vorontsov, E., Chartrand, G.,
Kadoury, S., Pal, C. (2016), The Importance of Skip
Connections in Biomedical Image Segmentation.
CoRR abs/1608.04117.
6. He, K., Zhang, X., Ren, S., Sun, J. (2015),
Deep Residual Learning for Image Recognition.
CoRR abs/1512.03385.
7. He, K., Zhang, X., Ren, S., Sun, J. (2015), Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. CoRR abs/1502.01852.
8. Huang, G., Liu, Z., Weinberger, K. (2016),
Densely Connected Convolutional Networks.
CoRR abs/1608.06993.
9. Huang, G., Sun, Y., Liu, Z., Sedra, D.,
Weinberger, K. (2016), Deep Networks with
Stochastic Depth. CoRR abs/1603.09382.
10. Ioffe, S., Szegedy, C. (2015), Batch
Normalization: Accelerating Deep Network
Training by Reducing Internal Covariate Shift.
CoRR abs/1502.03167.
11. Jégou, S., Drozdzal, M., Vázquez, D., Romero, A., Bengio, Y. (2016), The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation. CoRR abs/1611.09326.
12. Kingma, D., Ba, J. (2014), Adam: A Method for Stochastic Optimization. CoRR abs/1412.6980.
13. Krähenbühl, P., Koltun, V. (2012), Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials. CoRR abs/1210.5644.
14. Lee, C.-Y., Xie, S., Gallagher, P., Zhang,
Z., Tu, Z. (2014), Deeply-Supervised Nets. CoRR
abs/1409.5185.
15. Long, J., Shelhamer, E., Darrell, T. (2014),
Fully Convolutional Networks for Semantic
Segmentation. CoRR abs/1411.4038.
16. Quinn, S., Zahid, M.J., Durkin, J.R., Francis, R.J., Lo, C.W., Chennubhotla, S.C. (2015), Automated identification of abnormal respiratory ciliary motion in nasal biopsies. Science Translational Medicine, 7(299), 299ra124.
17. Raidt, J. et al. (2014), Ciliary beat pattern
and frequency in genetic variants of primary
ciliary dyskinesia. European Respiratory Journal,
pp. erj00520–2014.
18. Ronneberger, O., Fischer, P., Brox, T. (2015),
U-Net: Convolutional Networks for Biomedical
Image Segmentation. CoRR abs/1505.04597.
19. Stannard, W.A., Chilvers, M.A., Rutman,
A.R., Williams, C.D., O’Callaghan, C. (2010),
Diagnostic testing of patients suspected of primary
ciliary dyskinesia. American journal of respiratory
and critical care medicine, 181(4), pp. 307–314.
20. Srivastava, N., Hinton, G., Krizhevsky, A.,
Sutskever, I., Salakhutdinov, R. (2014), Dropout:
A Simple Way to Prevent Neural Networks from
Overfitting. Journal of Machine Learning Research
15, 1929–1958.
21. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z. (2015), Rethinking the Inception Architecture for Computer Vision. CoRR abs/1512.00567.
22. Waters, A.M., Beales, P.L. (2011),
Ciliopathies: an expanding disease spectrum.
Pediatric Nephrology 26(7), 1039–1056.
23. Walker, W.T., Jackson, C.L., Lackie, P.M., Hogg, C., Lucas, J.S. (2012), Nitric oxide in primary ciliary dyskinesia. European Respiratory Journal 40(4), 1024–1032.
24. Yosinski, J., Clune, J., Bengio, Y., Lipson,
H. (2014), How transferable are features in deep
neural networks? CoRR abs/1411.1792.
25. Yushkevich, P., et al. (2006), User-Guided 3D Active Contour Segmentation of Anatomical Structures: Significantly Improved Efficiency and Reliability. Neuroimage 31(3), 1116–1128.
26. Xu, Weiwen (2019), Cilia Segmentation
in Medical Videos with Fourier Convolutional
Neural Network. Diss. University of Georgia.