Papers

Memos on papers related to ML, CV, and NLP.

CV

Recognition

Wide Residual Networks
Densely Connected Convolutional Networks
Deep Pyramidal Residual Networks with Separated Stochastic Depth
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size
Dual Path Networks
CondenseNet: An Efficient DenseNet using Learned Group Convolutions
Recurrent Models of Visual Attention

Detection (Including Instance Segmentation)

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
SSD: Single Shot MultiBox Detector
Feature Pyramid Networks for Object Detection
DSSD : Deconvolutional Single Shot Detector
Speed/accuracy trade-offs for modern convolutional object detectors
Focal Loss for Dense Object Detection
DetNet: A Backbone network for Object Detection
Light-Head R-CNN: In Defense of Two-Stage Object Detector
Fully Convolutional Instance-aware Semantic Segmentation
Mask R-CNN
Fast and accurate object detection in high resolution 4K and 8K video using GPUs
Revisiting RCNN: On Awakening the Classification Power of Faster RCNN

Pedestrian detection

Faster R-CNN with Densenet for scale aware pedestrian detection vis-a-vis head negative suppression

Semantic Segmentation

Fully Convolutional Networks for Semantic Segmentation
SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
U-Net: Convolutional Networks for Biomedical Image Segmentation

Captioning

Self-critical Sequence Training for Image Captioning
Show and Tell: A Neural Image Caption Generator
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Deep visual-semantic alignments for generating image descriptions

GAN

Generative Adversarial Nets
Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
Image-to-Image Translation with Conditional Adversarial Networks
cGAN-based Manga Colorization Using a Single Training Image
Learning from Simulated and Unsupervised Images through Adversarial Training

Robust Reading

TextBoxes++: A Single-Shot Oriented Scene Text Detector
Synthetic data generation for Indic handwritten text recognition
Reading Scene Text with Attention Convolutional Sequence Modeling

Visualization

SmoothGrad: removing noise by adding noise

Video

Tracking

Improving Online Multiple Object tracking with Deep Metric Learning
Simple Online and Realtime Tracking

Detection

Mobile Video Object Detection with Temporally-Aware Feature Maps
Towards High Performance Video Object Detection for Mobiles

Else

Multiple Frames Matching for Object Discovery in Video
Unsupervised Learning of Video Representations using LSTMs
Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?
DeepMark: One-Shot Clothing Detection

3D

PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space

Else

FlowNet: Learning Optical Flow with Convolutional Networks
FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks
Revisiting Unreasonable Effectiveness of Data in Deep Learning Era
Learning to Compose Domain-Specific Transformations for Data Augmentation
Spatial Transformer Networks

NLP

NMT

Effective Approaches to Attention-based Neural Machine Translation
Neural Machine Translation by Jointly Learning to Align and Translate
Sequence to Sequence Learning with Neural Networks
Attention Is All You Need

ML

Positive-Unlabeled Learning with Non-Negative Risk Estimator

ELSE

Unsupervised Deep Embedding for Clustering Analysis
Attention-Based Models for Speech Recognition
Bridging the Gaps Between Residual Learning, Recurrent Neural Networks and Visual Cortex
What’s your ML Test Score? A rubric for ML production systems
Multimodal Emoji Prediction
Born Again Neural Networks
Digital Auditor: A Framework for Matching Duplicate Invoices
Pedestrian Detection: A Benchmark
