papers

This repository summarizes papers, especially those related to machine learning.

Papers

Memos on papers related to ML, CV, and NLP.

CV

Recognition

  • Wide Residual Networks
  • Densely Connected Convolutional Networks
  • Deep Pyramidal Residual Networks with Separated Stochastic Depth
  • SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size
  • Dual Path Networks
  • CondenseNet: An Efficient DenseNet using Learned Group Convolutions
  • Recurrent Models of Visual Attention

Detection (Including Instance Segmentation)

  • Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
  • SSD: Single Shot MultiBox Detector
  • Feature Pyramid Networks for Object Detection
  • DSSD : Deconvolutional Single Shot Detector
  • Speed/accuracy trade-offs for modern convolutional object detectors
  • Focal Loss for Dense Object Detection
  • DetNet: A Backbone network for Object Detection
  • Light-Head R-CNN: In Defense of Two-Stage Object Detector
  • Fully Convolutional Instance-aware Semantic Segmentation
  • Mask R-CNN
  • Fast and accurate object detection in high resolution 4K and 8K video using GPUs
  • Revisiting RCNN: On Awakening the Classification Power of Faster RCNN

Pedestrian detection

  • Faster R-CNN with Densenet for scale aware pedestrian detection vis-a-vis hard negative suppression

Semantic Segmentation

  • Fully Convolutional Networks for Semantic Segmentation
  • SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
  • U-Net: Convolutional Networks for Biomedical Image Segmentation

Captioning

  • Self-critical Sequence Training for Image Captioning
  • Show and Tell: A Neural Image Caption Generator
  • Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
  • Deep visual-semantic alignments for generating image descriptions

GAN

  • Generative Adversarial Nets
  • Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
  • Image-to-Image Translation with Conditional Adversarial Networks
  • cGAN-based Manga Colorization Using a Single Training Image
  • Learning from Simulated and Unsupervised Images through Adversarial Training

Robust Reading

Visualization

  • SmoothGrad: removing noise by adding noise

Video

Tracking

  • Improving Online Multiple Object tracking with Deep Metric Learning
  • Simple Online and Realtime Tracking

Detection

  • Mobile Video Object Detection with Temporally-Aware Feature Maps
  • Towards High Performance Video Object Detection for Mobiles

Else

  • Multiple Frames Matching for Object Discovery in Video
  • Unsupervised Learning of Video Representations using LSTMs
  • Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?
  • DeepMark: One-Shot Clothing Detection

3D

  • PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space

Else

NLP

NMT

  • Effective Approaches to Attention-based Neural Machine Translation
  • Neural Machine Translation by Jointly Learning to Align and Translate
  • Sequence to Sequence Learning with Neural Networks
  • Attention Is All You Need

ML

  • Positive-Unlabeled Learning with Non-Negative Risk Estimator

ELSE

  • Unsupervised Deep Embedding for Clustering Analysis
  • Attention-Based Models for Speech Recognition
  • Bridging the Gaps Between Residual Learning, Recurrent Neural Networks and Visual Cortex
  • What’s your ML Test Score? A rubric for ML production systems
  • Multimodal Emoji Prediction
  • Born Again Neural Networks
  • Digital Auditor: A Framework for Matching Duplicate Invoices
  • Pedestrian Detection: A Benchmark