<pre id="dnv3r"><output id="dnv3r"><delect id="dnv3r"></delect></output></pre>
<address id="dnv3r"><pre id="dnv3r"><p id="dnv3r"></p></pre></address>

<pre id="dnv3r"><output id="dnv3r"><delect id="dnv3r"></delect></output></pre><output id="dnv3r"><delect id="dnv3r"><menuitem id="dnv3r"></menuitem></delect></output>
<pre id="dnv3r"><output id="dnv3r"></output></pre>

<noframes id="dnv3r">

<pre id="dnv3r"></pre>

<p id="dnv3r"></p>
<p id="dnv3r"><delect id="dnv3r"></delect></p>

<pre id="dnv3r"></pre>

<pre id="dnv3r"><output id="dnv3r"></output></pre>
<p id="dnv3r"><output id="dnv3r"></output></p>

<p id="dnv3r"></p>
<address id="dnv3r"><p id="dnv3r"><p id="dnv3r"></p></p></address>
<p id="dnv3r"><delect id="dnv3r"></delect></p>
<pre id="dnv3r"></pre>
<p id="dnv3r"></p>

<p id="dnv3r"></p>

<pre id="dnv3r"></pre>
<p id="dnv3r"></p>

<p id="dnv3r"><output id="dnv3r"><delect id="dnv3r"></delect></output></p><p id="dnv3r"><output id="dnv3r"></output></p>
<p id="dnv3r"><output id="dnv3r"><menuitem id="dnv3r"></menuitem></output></p><pre id="dnv3r"><output id="dnv3r"></output></pre>
<p id="dnv3r"></p>

<p id="dnv3r"><p id="dnv3r"><output id="dnv3r"></output></p></p>
<noframes id="dnv3r">

<p id="dnv3r"></p>

<p id="dnv3r"></p>
<p id="dnv3r"></p>

<output id="dnv3r"></output>

<p id="dnv3r"><output id="dnv3r"></output></p>

<pre id="dnv3r"><output id="dnv3r"><delect id="dnv3r"></delect></output></pre>
<p id="dnv3r"><output id="dnv3r"></output></p>
<output id="dnv3r"><delect id="dnv3r"><menuitem id="dnv3r"></menuitem></delect></output>
<pre id="dnv3r"><p id="dnv3r"></p></pre><pre id="dnv3r"><output id="dnv3r"></output></pre>

<pre id="dnv3r"></pre>

<pre id="dnv3r"></pre>
<pre id="dnv3r"></pre>

<p id="dnv3r"></p>

<pre id="dnv3r"></pre>

<p id="dnv3r"></p>
<p id="dnv3r"></p><p id="dnv3r"></p>
<p id="dnv3r"><delect id="dnv3r"></delect></p>
<pre id="dnv3r"></pre>

<noframes id="dnv3r">

<pre id="dnv3r"><output id="dnv3r"><menuitem id="dnv3r"></menuitem></output></pre>

<p id="dnv3r"></p>

<p id="dnv3r"></p>

<pre id="dnv3r"><p id="dnv3r"><delect id="dnv3r"></delect></p></pre>

<pre id="dnv3r"></pre>

<pre id="dnv3r"><output id="dnv3r"></output></pre>

<p id="dnv3r"></p>

<pre id="dnv3r"></pre>

<p id="dnv3r"><p id="dnv3r"><delect id="dnv3r"></delect></p></p>
<p id="dnv3r"><delect id="dnv3r"><listing id="dnv3r"></listing></delect></p><p id="dnv3r"></p>

<p id="dnv3r"></p>

<noframes id="dnv3r"><pre id="dnv3r"><output id="dnv3r"></output></pre><address id="dnv3r"><p id="dnv3r"></p></address><noframes id="dnv3r"><p id="dnv3r"><output id="dnv3r"></output></p>

<p id="dnv3r"></p><pre id="dnv3r"></pre>

<p id="dnv3r"><delect id="dnv3r"></delect></p>
<pre id="dnv3r"><p id="dnv3r"></p></pre>

<p id="dnv3r"><output id="dnv3r"></output></p>
<pre id="dnv3r"></pre>
<p id="dnv3r"><delect id="dnv3r"></delect></p>
<pre id="dnv3r"><delect id="dnv3r"></delect></pre><p id="dnv3r"></p>

<output id="dnv3r"></output>

<noframes id="dnv3r"><output id="dnv3r"></output>

<p id="dnv3r"><output id="dnv3r"></output></p>

<pre id="dnv3r"><output id="dnv3r"></output></pre>

<noframes id="dnv3r"><p id="dnv3r"></p>

<p id="dnv3r"><output id="dnv3r"></output></p>
<noframes id="dnv3r"><output id="dnv3r"></output>

<pre id="dnv3r"></pre>

<p id="dnv3r"><output id="dnv3r"></output></p>

<p id="dnv3r"></p>

<pre id="dnv3r"></pre><pre id="dnv3r"><output id="dnv3r"></output></pre>

Selected Published Papers

To promote the applications of semantic segmentation, quality evaluation is important for assessing different algorithms and guiding their development and optimization. In this paper, we establish a subjective semantic segmentation quality assessment database based on the stimulus-comparison method. Given that the database reflects the relative quality of pairs of semantic segmentation results, we adopt a robust regression mapping model to explore the relationship between subjective assessment and objective distance. With the help of the regression model, we can examine whether objective metrics coincide with subjective judgement. In addition, we propose a novel Relative Quality Prediction Network (RQPN), based on a Siamese CNN, as a new objective metric. The metric is trained on our subjective assessment database and can be applied to evaluate the performance of semantic segmentation algorithms, even ones that were not used to build the database. Experiments show the advantages and reliability of our database and demonstrate that the results predicted by RQPN are more consistent with subjective assessment than existing objective metrics.
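As a rough illustration of the Siamese idea behind RQPN, the following minimal PyTorch sketch scores a pair of segmentation results with a shared-weight branch; all layer sizes and the three-channel input format are assumptions for illustration, not the paper's actual architecture.

    import torch
    import torch.nn as nn

    class SiameseSketch(nn.Module):
        def __init__(self):
            super().__init__()
            # Shared branch: both segmentation results pass through the same
            # weights, so the network learns a comparison, not an absolute
            # score for a single input.
            self.branch = nn.Sequential(
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
            self.head = nn.Linear(32, 1)  # maps the embedding pair to a score

        def forward(self, seg_a, seg_b):
            fa, fb = self.branch(seg_a), self.branch(seg_b)
            return self.head(torch.cat([fa, fb], dim=1))  # relative quality

    net = SiameseSketch()
    print(net(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)))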
Image compression is essential for remote sensing due to the large volume of remote sensing imagery produced and the limited transmission and storage capacity of systems. Classification, one of the most important applications, may be affected by the distortion introduced during compression. Hence, we perform a quantitative study of the effects of compression on remote sensing image classification and propose a method to estimate remote sensing image classification accuracy based on fractal analysis. Multiscale feature extraction is performed and a multiple kernel learning approach is proposed accordingly. The experimental results on our established database indicate that the classification accuracy predicted by our method exhibits high consistency with the ground truth, and our method shows its superiority when compared with other classical reference algorithms.
Inspired by the characteristics of the human visual system, a novel method is proposed for detecting visually salient regions on 3D point clouds. First, the local distinctness of each point is evaluated based on its difference from its local surroundings. Then the point cloud is decomposed into small clusters and the initial global rarity value of each cluster is calculated; a random walk ranking method is then used to propagate cluster-level global rarity refinement to each point in all clusters. Finally, an optimization framework is proposed to integrate both the local distinctness and the global rarity values to obtain the final saliency detection result for the point cloud. We compare the proposed method with several relevant algorithms and apply it to computer graphics applications such as interest point detection, viewpoint selection and mesh simplification. The experimental results demonstrate the superior performance of the proposed method.
Reinforcement learning (RL) has shown its advantages in image captioning by directly optimizing the non-differentiable evaluation metric as the reward. However, due to the reward hacking problem in RL, maximizing the reward may not lead to better caption quality, especially in terms of propositional content and distinctiveness. In this work, we propose to use meta learning to exploit supervision from the ground truth while optimizing the reward function in RL. To improve the propositional content and the distinctiveness of the generated captions, the proposed model seeks a globally optimal solution by taking gradient steps toward the supervision task and the reinforcement task simultaneously. Experimental results on MS COCO validate the effectiveness of our approach when compared with state-of-the-art methods.
In this paper, we propose a novel two-stream neural network for video saliency prediction. Unlike traditional methods built on hand-crafted feature extraction and integration, our method automatically learns saliency-related spatiotemporal features from human fixations without any pre-processing, post-processing or manual tuning. Video frames are routed through the spatial stream network to compute a static, or color, saliency map for each frame. For temporal, or dynamic, saliency maps, a new two-stage temporal stream network is proposed, composed of a pre-trained 2D-CNN model (SF-Net) to extract saliency-related features and a shallow 3D-CNN model (Te-Net) to process these features. This design reduces the need for video gaze data, improves training efficiency and achieves high performance. A fusion network is adopted to combine the outputs of both streams and generate the final saliency maps. In addition, a Convolutional Gaussian Priors (CGP) layer is proposed to learn the bias phenomenon in viewing behavior and further improve prediction performance. The proposed method is compared with state-of-the-art saliency models on two public video saliency benchmark datasets. Results demonstrate that our model achieves advanced performance on video saliency prediction.
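The fusion step can be pictured with a minimal sketch like the following, where a 1x1 convolution merges a static and a dynamic map into the final prediction; the shapes, the 1x1 fusion layer and the sigmoid output are assumptions rather than the paper's exact fusion network.

    import torch
    import torch.nn as nn

    spatial = torch.rand(1, 1, 64, 64)     # static saliency map of a frame
    temporal = torch.rand(1, 1, 64, 64)    # dynamic saliency map of the frame
    fuse = nn.Conv2d(2, 1, kernel_size=1)  # learned per-pixel combination
    final = torch.sigmoid(fuse(torch.cat([spatial, temporal], dim=1)))
    print(final.shape)                     # torch.Size([1, 1, 64, 64])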
In this paper, an optimized Coding Tree Unit (CTU)-level rate control approach is proposed for low-delay High Efficiency Video Coding (H.265/HEVC). Unlike the traditional explicitly estimated rate-distortion (R-D) model, the distributions of the estimated R-D model's parameters are considered. Accordingly, we formulate CTU-level rate allocation for H.265/HEVC as a decision-making problem whose objective is to minimize the sum of the CTUs' expected distortions under a given rate constraint. Moreover, a two-stage bisection-based method is proposed to solve the optimization problem and serve as the rate control algorithm for H.265/HEVC. The R-D performance improvements in Bjøntegaard delta bit rate (BD-BR) demonstrate the advantages of the proposed method.
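A minimal sketch of the bisection idea: search for a Lagrange multiplier so that the summed CTU rates meet the budget. The hyperbolic per-CTU rate model below is a toy placeholder, not the paper's estimated R-D model, and only a single bisection stage is shown.

    def bisect_lambda(ctu_rate_models, budget, lo=1e-6, hi=1e6, iters=50):
        # Each model's rate decreases as lam grows, so the total rate is
        # monotone in lam and bisection converges.
        for _ in range(iters):
            mid = (lo + hi) / 2.0
            total = sum(rate(mid) for rate in ctu_rate_models)
            if total > budget:
                lo = mid   # over budget: move toward stronger compression
            else:
                hi = mid
        return (lo + hi) / 2.0

    # Toy hyperbolic rate models for three CTUs.
    ctus = [lambda lam, a=a: a / (1.0 + lam) for a in (1200.0, 800.0, 500.0)]
    print(bisect_lambda(ctus, budget=1000.0))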
Dictionary learning has emerged as a powerful tool for a range of image processing applications, and a proper dictionary is always key to the final achievable performance. In this paper, a class-oriented discriminative dictionary learning (CODDL) method is presented for image classification. It takes comprehensive account of multiple optimization objectives, emphasizing class discrimination in both the dictionary atoms and the representation coefficients. The atoms of the learned dictionary are grouped into class-level sub-dictionaries, while the sparse representation coefficients of an input sample are encouraged to concentrate on the sub-dictionary of the class it belongs to. Based on the learned class-oriented discriminative dictionary, the structured representation coefficients can then be used for image classification with a simple and efficient classification scheme. The superior performance of the proposed algorithm is demonstrated through extensive experiments.
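The kind of simple classification scheme the abstract alludes to can be sketched as follows: sparsely code a sample over the full dictionary, then assign the class whose sub-dictionary reconstructs it best. The random atoms and sparsity level below are toy stand-ins for a learned CODDL dictionary.

    import numpy as np
    from sklearn.linear_model import orthogonal_mp

    rng = np.random.default_rng(0)
    n_classes, atoms_per_class, dim = 3, 8, 20
    D = rng.standard_normal((dim, n_classes * atoms_per_class))
    D /= np.linalg.norm(D, axis=0)                 # unit-norm atoms
    x = rng.standard_normal(dim)                   # toy test sample

    coef = orthogonal_mp(D, x, n_nonzero_coefs=5)  # sparse code of x
    residuals = []
    for c in range(n_classes):
        sub = slice(c * atoms_per_class, (c + 1) * atoms_per_class)
        residuals.append(np.linalg.norm(x - D[:, sub] @ coef[sub]))
    print(int(np.argmin(residuals)))               # class with best residual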
With the increasing popularity of micro-video sharing, where people shoot short videos effortlessly and share their daily stories on social media platforms, micro-video recommendation has attracted extensive research efforts to provide users with micro-videos that interest them. In this paper, we explore the hypothesis that not only do users have multi-modal interests, but micro-videos also have multi-modal targeted audience segments. Accordingly, we propose a novel framework, the User-Video Co-Attention Network (UVCAN), which learns multi-modal information from both the user and the micro-video side using an attention mechanism. In addition, UVCAN reasons about attention in a stacked attention network fashion for both user and micro-video. Extensive experiments on two datasets collected from Toffee show that UVCAN outperforms state-of-the-art recommendation methods, demonstrating the effectiveness of the proposed framework.
Spectral decorrelation has been considered an important approach in multispectral image compression to remove the redundancy among bands. In this paper, we propose a novel adaptive spectral decorrelation (ASD) method based on clustering analysis for Moderate Resolution Imaging Spectroradiometer (MODIS) image compression. The remote sensing image bands are divided into clusters by a method based on density peak clustering. Then a reversible Karhunen-Loève transform and polynomial least square estimation are employed to reduce the band redundancy, achieving effective spectral decorrelation. As shown by the experimental results, our method achieves remarkable bit savings compared with state-of-the-art algorithms on the MODIS image dataset.
In this paper, we present an end-to-end multi-level fusion based framework for 3D object detection from a single monocular image. The network is composed of two parts: one for 2D region proposal generation and the other for the simultaneous prediction of objects' 2D locations, orientations, dimensions, and 3D locations. With the help of a stand-alone module that estimates the disparity and computes the 3D point cloud, we introduce a multi-level fusion scheme. First, we encode the disparity information as a front-view feature representation and fuse it with the RGB image to enhance the input. Second, features extracted from the original input and from the point cloud are combined to boost object detection. For 3D localization, we introduce an extra stream that predicts location information directly from the point cloud and add it to the aforementioned location prediction. The proposed algorithm outputs both 2D and 3D object detection results in an end-to-end fashion with only a single RGB image as input. The experimental results on the challenging KITTI benchmark demonstrate that our algorithm significantly outperforms state-of-the-art monocular methods.
In this paper, a novel deep architecture named BraidNet is proposed for person re-identification. BraidNet has a specially designed WConv layer, and the cascaded WConv structure learns to extract comparison features of two images that are robust to misalignments and color differences across cameras. Furthermore, a Channel Scaling layer is designed to optimize the scaling factor of each input channel, which helps mitigate the zero-gradient problem in the training phase. To address the imbalance between negative and positive training samples, a Sample Rate Learning strategy is proposed to adaptively update the ratio between positive and negative samples in each batch. Experiments conducted on the CUHK03-Detected, CUHK03-Labeled, CUHK01, Market-1501 and DukeMTMC-reID datasets demonstrate that our method achieves competitive performance compared with state-of-the-art methods.
In this paper, a novel image captioning approach is proposed to describe the content of images. Inspired by the visual processing of our cognitive system, we propose a visual-semantic LSTM model that locates the attended objects with their low-level features in the visual cell and then successively extracts high-level semantic features in the semantic cell. In addition, a state perturbation term is introduced into the word sampling strategy of the REINFORCE-based method to explore proper vocabularies during training. Experimental results on MS COCO and Flickr30K validate the effectiveness of our approach when compared with state-of-the-art methods.
Predicting the scanpath produced when a certain stimulus is presented plays an important role in modeling visual attention and search. This paper presents a model that integrates a convolutional neural network and long short-term memory (LSTM) to generate realistic scanpaths. The core of the proposed model is a dual LSTM unit, i.e., an inhibition of return LSTM (IOR-LSTM) and a region of interest LSTM (ROI-LSTM), capturing IOR dynamics and gaze shift behavior simultaneously. The IOR-LSTM simulates visual working memory to adaptively integrate and forget scene information, while the ROI-LSTM is responsible for predicting the next ROI given the inhibited image features. Experimental results indicate that the proposed architecture achieves superior performance in predicting scanpaths.
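A minimal sketch of the dual-LSTM step: one cell keeps an inhibition-of-return memory that gates the scene features, and a second cell predicts the next region of interest from the gated features. All dimensions, the sigmoid gate and the (x, y) output head are assumptions, not the paper's configuration.

    import torch
    import torch.nn as nn

    feat_dim, hid = 64, 32
    ior_cell = nn.LSTMCell(feat_dim, hid)  # tracks visited regions (IOR)
    roi_cell = nn.LSTMCell(feat_dim, hid)  # predicts the next ROI
    inhibit = nn.Linear(hid, feat_dim)     # IOR state -> feature gate
    to_xy = nn.Linear(hid, 2)              # ROI state -> fixation (x, y)

    feats = torch.rand(1, feat_dim)        # features at the current fixation
    h_i = c_i = h_r = c_r = torch.zeros(1, hid)
    for _ in range(5):                     # roll out a 5-fixation scanpath
        h_i, c_i = ior_cell(feats, (h_i, c_i))
        gated = feats * torch.sigmoid(inhibit(h_i))  # inhibited features
        h_r, c_r = roi_cell(gated, (h_r, c_r))
        print(to_xy(h_r))                  # predicted next ROI location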
Emerging virtual reality (VR) applications pose significant challenges for coding 360-degree videos. To compress this kind of video, each picture is first projected to a 2D plane (e.g., an equirectangular projection map) to fit the input of existing video coding systems, and an inverse projection is performed at the display side before viewport rendering. However, such a projection introduces different levels of distortion depending on location, which makes the rate-distortion optimization process in video coding inefficient. In this paper, we consider the distortion in the spherical domain and analyze its influence on the rate-distortion optimization process. We then derive the optimal rate-distortion relationship in the spherical domain and present its optimal solution based on HEVC/H.265. Experimental results show that the proposed method brings up to 11.5% bit savings compared with the current HEVC/H.265 anchor for 360-degree video coding.
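The location dependence can be illustrated with the standard latitude weight used for equirectangular frames (as in WS-PSNR): rows near the poles are stretched by the projection, so their distortion should count less on the sphere. This sketch shows only that weighting, not the paper's spherical-domain RDO derivation.

    import math

    def erp_row_weight(j, height):
        # Cosine of the latitude of row j in an equirectangular frame.
        return math.cos((j + 0.5 - height / 2.0) * math.pi / height)

    H = 8
    for j in range(H):
        print(j, round(erp_row_weight(j, H), 3))  # near-pole rows weigh less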
By analyzing the photographer's intention towards a photo, we can better understand what the photographer wants to convey to viewers, which helps improve image analysis and understanding. In this paper, a novel method is presented to improve saliency detection by exploring the relationship between the photographer's intention and the viewer's attention. An intention rate is derived to quantify the intention and is integrated with traditional saliency detection in a unified framework. We evaluate the proposed scheme with several classic saliency detection algorithms on different datasets. The experiments demonstrate the clear improvements brought by our method.
Thermal cameras can capture images invariant to illumination conditions. However, thermal facial images are difficult for human examiners to recognize. In this letter, an end-to-end framework consisting of a generative network and a detector network is proposed to translate thermal facial images into visible ones. The generative network generates visible images from the thermal ones. The detector locates important facial landmarks on the visible faces and helps the generative network generate more realistic images that are easier to recognize. As demonstrated in the experiments, the faces generated by our method have good visual quality and maintain identity-preserving features.
How to exploit the various features of users and points of interest (POIs) for accurate POI recommendation is an important question in location-based social networks (LBSNs). In this paper, a novel POI recommendation framework named RecNet is proposed, built on a deep neural network (DNN) to incorporate various features in LBSNs and learn their joint influence on user behavior. More specifically, co-visiting, geographical and categorical influences in LBSNs are exploited to alleviate the data sparsity issue in POI recommendation and are converted into feature vector representations of POIs and users via feature embedding. The embedded POIs and users are then fed pairwise into a DNN to adaptively learn high-order interactions between features. Our method is evaluated on two publicly available LBSN datasets, and the experimental results show that RecNet outperforms state-of-the-art algorithms for POI recommendation.
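The embed-then-DNN pattern can be sketched minimally as below: user and POI ids are embedded, concatenated pairwise, and scored by an MLP. All sizes are toy values, and the co-visiting, geographical and categorical features described in the abstract are omitted for brevity.

    import torch
    import torch.nn as nn

    n_users, n_pois, dim = 1000, 5000, 32
    user_emb = nn.Embedding(n_users, dim)
    poi_emb = nn.Embedding(n_pois, dim)
    mlp = nn.Sequential(nn.Linear(2 * dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def score(user_id, poi_id):
        u = user_emb(torch.tensor([user_id]))
        p = poi_emb(torch.tensor([poi_id]))
        return mlp(torch.cat([u, p], dim=1))  # higher = more likely visit

    print(score(42, 1234))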
In this paper, we review recent advances in the omnidirectional video processing pipeline, including projection and evaluation. Unlike traditional video, omnidirectional video, also called panoramic video or 360-degree video, lies in the spherical domain, so specialized tools are necessary. For this type of video, each picture must be projected to a 2D plane for encoding and decoding, adapting it to the input of existing video coding systems. The coding impact of the projection and the accuracy of the evaluation method are therefore central to this pipeline. Recent advances, such as projection methods that benefit video coding, specialized video quality evaluation metrics and optimized transmission methods, are presented and classified in this paper. In addition, the coding performance under different projection methods is reported, and future trends in omnidirectional video processing are discussed.
Achieving stable video quality is an important task in video compression. In this paper, we present a multi-layer quantization control method that compresses video to a given target quality based on the hierarchical partition structure of H.265/HEVC. We first model the rate-quality characteristics of different video sequences using ν-support vector regression with a Gaussian radial basis function as the kernel. Then, we propose a Kalman filter-based multi-layer quantization control method for quality-constrained video coding. The advantage of our approach in H.265/HEVC is demonstrated through experimental results: compared with other H.265/HEVC rate control algorithms and the fixed-QP scheme, the proposed method achieves more stable video quality.
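Fitting a rate-quality curve with ν-SVR and an RBF kernel, as the abstract describes, can be sketched with scikit-learn; the sample points and hyperparameters below are toy values, not measurements from the paper.

    import numpy as np
    from sklearn.svm import NuSVR

    rates = np.array([[0.5], [1.0], [2.0], [4.0], [8.0]])  # bitrates (toy)
    quality = np.array([32.0, 35.1, 38.0, 40.6, 42.9])     # PSNR in dB (toy)

    model = NuSVR(kernel='rbf', nu=0.5, C=10.0).fit(rates, quality)
    print(model.predict([[3.0]]))  # predicted quality at an unseen rate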

Publication List on DBLP

Copyright © 2016 Lab. for Intelligent Information Processing (LabIIP). All rights reserved.
