Deep-Object-Detection

Illustrated Object Detection & Network Architectures

Inspired by awesome object detection, Deep Object Detection offers an easy way to understand object detection; the original notes are in Chinese.

Contents

  • Illustrated Network Architectures
    • LeNet_AlexNet
    • LeNet_AlexNet_Keras Code Implementation
    • VGG16 Network and Code Implementation
    • VGG19 Network and Code Implementation
    • Resnet
    • Inception-v4: 2016
    • SqueezeNet:2016
    • DenseNet:2016
    • Xception:2016
    • ResNeXt:2016
    • ROR: 2016
    • MobileNet-v1:2017
    • ShuffleNet:2017
    • SENet : 2017
    • MobileNet-V2:2018
    • ShuffleNet-V2: 2018
    • MobileNet-V3: 2019
    • EfficientNet: 2019
    • Transformer in Transformer: 2021
    • ViT-Image Recognition at Scale: 2021
    • Perceiver: 2021
  • Illustrated Object_Detection Frameworks
    • Multi-stage Object Detection
      • RCNN : 2014
      • SPPnet : 2014
      • FCN : 2015
      • Fast R-CNN : 2015
      • Faster R-CNN : 2015
      • FPN : 2016
      • Mask R-CNN : 2017
      • Soft-NMS : 2017
      • Segmentation is all you need : 2019
    • Single Stage Object Detection
      • DenseBox : 2015
      • SSD : 2016
      • YoLov2 : 2016
      • RetinaNet : 2017
      • YoLov3 : 2018
      • M2Det : 2019
      • CornerNet-Lite : 2019
  • Illustrated Action Classification
    • 🍋 MLAD 📅 2021.03.04v1 😊 University of Central Florida
  • Object_Detection Datasets
    • General Dataset
    • Animal
    • Plant
    • Food
    • Transportation
    • Scene
    • Face

Illustrated Network Architectures

LeNet_AlexNet

LeNet_AlexNet_Keras Code Implementation

LeNet-Keras for MNIST handwritten digit image classification

LeNet-Keras restructure

Accuracy: 98.54%

===================================

AlexNet-Keras for oxflower17 image classification

AlexNet-Keras restructure: the modified network reaches val_acc ~80% and overfits

===================================

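VGG16 Network and Code Implementation

VGG16 Keras official code implementation

VGG16-Keras oxflower17 object classification: the modified network reaches val_acc ~86.4% and overfits

VGG19 Network and Code Implementation

VGG19 Keras official code implementation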

Resnet

===================================

Inception-v4: 2016

  • Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning

SqueezeNet:2016

  • SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size

DenseNet:2016

  • DenseNet : Densely Connected Convolutional Networks

  • DenseNet- Github

    • Layers inside a Dense Block are connected by concatenation (concat) rather than element-wise addition (add)
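The difference between concat and add can be sketched in a few lines of plain Python. This is only an illustration: feature maps are modeled as lists with one value per channel, and the names `concat_features`, `add_features`, and `dense_block_channels`, as well as the example widths (64 input channels, growth rate 32), are assumptions made for this sketch, not taken from the DenseNet code.

```python
def concat_features(a, b):
    """Dense Block style: stack the channels of b after the channels of a."""
    return a + b  # list concatenation stands in for channel-wise concat


def add_features(a, b):
    """Residual style: element-wise sum, so channel counts must match."""
    if len(a) != len(b):
        raise ValueError("element-wise add needs equal channel counts")
    return [x + y for x, y in zip(a, b)]


def dense_block_channels(in_channels, growth_rate, num_layers):
    """Return the input width of each layer and the block's output width."""
    widths = []
    channels = in_channels
    for _ in range(num_layers):
        widths.append(channels)   # this layer sees every earlier output
        channels += growth_rate   # and contributes growth_rate new channels
    return widths, channels


x = [1.0, 2.0]      # toy feature map with 2 channels
y = [10.0, 20.0]    # toy layer output with 2 channels
print(concat_features(x, y))            # channel count grows to 4
print(add_features(x, y))               # channel count stays at 2
print(dense_block_channels(64, 32, 4))  # widths grow layer by layer
```

With concat, the channel count grows with every layer in the block, which is why each layer only needs to add a small number of new channels.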

Xception:2016

  • Xception: Deep Learning with Depthwise Separable Convolutions

ResNeXt:2016

  • ResNeXt: Aggregated Residual Transformations for Deep Neural Networks

ROR: 2016

  • ROR - Residual Networks of Residual Networks: Multilevel Residual Networks

MobileNet-v1:2017

ShuffleNet:2017

SENet : 2017

  • SENet Squeeze-and-Excitation Networks

MobileNet-V2:2018

  • MobileNetV2 : Inverted Residuals and Linear Bottlenecks

  • Illustrated MobileNetV2:

ShuffleNet-V2: 2018

  • ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design

MobileNet-V3: 2019

EfficientNet: 2019

  • EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

Transformer in Transformer: 2021

ViT-Image Recognition at Scale: 2021

Perceiver: 2021

  • Perceiver : General Perception with Iterative Attention

ViT image classification - Keras code

=============================

Illustrated Object_Detection Frameworks

General Documents

2010

2011

2013

2014

2017

2018

===========================

Multi-stage Object Detection

RCNN : 2014

SPPnet : 2014

FCN : 2015

Fast R-CNN : 2015

Faster R-CNN : 2015

FPN : 2016

Mask R-CNN : 2017

Soft-NMS : 2017

Segmentation is all you need : 2019

============================

Single Stage Object Detection

DenseBox : 2015

SSD : 2016

  • SSD: Single Shot MultiBox Detector - ECCV

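    • Workflow:

      • The feature extraction network is VGG-16; bounding boxes and classes are predicted from a pyramid of feature maps
    • Network architecture:

    • Loss function:

      • The sum of a Smooth L1 localization loss and a multi-class softmax classification loss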
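The loss described above can be sketched for a single matched box as follows. This is a minimal illustration in plain Python, not the SSD reference implementation: the function names are made up for this example, the weighting factor `alpha` is an assumption (set to 1.0 so that the result is a plain sum), and matching of default boxes and hard negative mining are left out.

```python
import math


def smooth_l1(x):
    """Quadratic near zero, linear for large errors."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def localization_loss(pred_offsets, target_offsets):
    """Smooth L1 summed over the box offsets (e.g. cx, cy, w, h)."""
    return sum(smooth_l1(p - t) for p, t in zip(pred_offsets, target_offsets))


def softmax_cross_entropy(logits, target_class):
    """Multi-class softmax loss, computed with a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[target_class]


def ssd_box_loss(pred_offsets, target_offsets, logits, target_class, alpha=1.0):
    """Classification loss plus alpha times localization loss for one box."""
    conf = softmax_cross_entropy(logits, target_class)
    loc = localization_loss(pred_offsets, target_offsets)
    return conf + alpha * loc


# Hypothetical values: 4 box offsets and 3 class logits for one default box.
print(ssd_box_loss([0.5, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0],
                   [2.0, 0.5, 0.1], 0))
```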

YoLov2 : 2016

  • YOLOv2 YOLO9000: Better, Faster, Stronger

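    • Workflow:

      • Pre-train a CNN on an image classification task

      • Split the image into grid cells; if the center of an object falls inside a cell, that cell is "responsible" for detecting the object

        Each cell predicts (a) bounding box locations, (b) a confidence score, and (c) object class probabilities conditioned on an object being present in the bounding box

      • Modify the last layer of the pre-trained CNN to output the prediction tensor

    • Network architecture:

    • Loss function:

      • Two parts: bounding box regression and conditional class probabilities, both computed as a sum of squared errors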
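The cell assignment and the squared-error loss described above can be sketched like this. It is a simplified illustration under stated assumptions: the function names, the 416x416 image, and the 13x13 grid are example choices, and confidence terms, anchor boxes, and loss weighting are omitted.

```python
def responsible_cell(center_x, center_y, image_w, image_h, grid_size):
    """Return (row, col) of the grid cell that contains the object center."""
    col = min(int(center_x / image_w * grid_size), grid_size - 1)
    row = min(int(center_y / image_h * grid_size), grid_size - 1)
    return row, col


def sum_squared_error(pred, target):
    """Sum of squared differences between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def cell_loss(pred_box, true_box, pred_class_probs, true_class_probs):
    """Box regression term plus conditional class probability term."""
    box_term = sum_squared_error(pred_box, true_box)
    class_term = sum_squared_error(pred_class_probs, true_class_probs)
    return box_term + class_term


# Hypothetical object centered at (208, 104) in a 416x416 image, 13x13 grid.
print(responsible_cell(208, 104, 416, 416, 13))
print(cell_loss([0.5, 0.5, 1.0, 1.0], [0.5, 0.5, 1.0, 2.0],
                [1.0, 0.0], [0.0, 1.0]))
```

Clamping to `grid_size - 1` keeps a center that lies exactly on the right or bottom image border inside the last cell.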

RetinaNet : 2017

  • RetinaNet:Focal Loss for Dense Object Detection

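    • Workflow:

      • Focal loss assigns more weight to hard, easily misclassified cases (backgrounds with noisy textures or partial objects) and down-weights easy cases (obviously empty background)

      • The feature extraction network is ResNet; a feature pyramid improves detection performance

    • Network architecture: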
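The weighting behavior of focal loss described in the workflow can be sketched for a single binary prediction. The formula is FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t); the function name and the default values gamma = 2 and alpha = 0.25 are choices made for this sketch, and a real detector would sum this over all anchors and classes.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for predicted probability p and label y (0 or 1)."""
    p_t = p if y == 1 else 1.0 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy background example contributes far less than a hard one.
easy_background = focal_loss(0.01, 0)   # confident and correct
hard_background = focal_loss(0.90, 0)   # confident and wrong
print(easy_background, hard_background)
```

Setting `gamma=0` removes the modulating factor, which reduces the expression to an alpha-weighted cross-entropy.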

YoLov3 : 2018

  • YOLOv3: An Incremental Improvement

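    • Bounding box prediction uses dimension clusters

      • Each box has 4 coordinates

      • During training, a sum of squared error loss is used

      • The bounding box objectness score is predicted with logistic regression

      • The classifier uses logistic regression with a binary cross-entropy loss

    • Borrows ideas from FPN

    • Convolutional feature extraction network

      • Alternating 3x3 and 1x1 convolutional layers

      • Borrows from ResNet and uses shortcut connections, which start from either a convolutional layer or an earlier shortcut layer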
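The four box coordinates and the logistic outputs mentioned above can be sketched as follows. This is a minimal illustration, assuming the usual YOLO parameterization (sigmoid offsets inside a grid cell, exponential scaling of a prior box from the dimension clusters); the function names and all numeric values are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function used for offsets, objectness, and class scores."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(tx, ty, tw, th, cell_x, cell_y, prior_w, prior_h):
    """Turn raw outputs into a box center and size in grid units."""
    bx = sigmoid(tx) + cell_x    # center stays inside the predicting cell
    by = sigmoid(ty) + cell_y
    bw = prior_w * math.exp(tw)  # prior box size comes from clustering
    bh = prior_h * math.exp(th)
    return bx, by, bw, bh


def binary_cross_entropy(p, y):
    """Per-class loss; each class is scored independently."""
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


# Hypothetical raw outputs for the cell at column 3, row 4.
print(decode_box(0.0, 0.0, 0.0, 0.0, 3, 4, 2.0, 5.0))
print(binary_cross_entropy(sigmoid(1.5), 1))
```

Because each class gets its own logistic output, one box can carry several labels at once, which a softmax would not allow.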

M2Det : 2019

CornerNet-Lite : 2019

  • CornerNet-Lite : Efficient Keypoint Based Object Detection
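    • CornerNet-Saccade: processes pixels of the feature map, with multiple detections per crop; for offline processing
    • CornerNet-Squeeze: the backbone network uses SqueezeNet in an hourglass architecture; for real-time processing

Reference: Summary of Object Detection Algorithms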

=============================

Illustrated Action Classification

🍋 MLAD 📅 2021.03.04v1 😊 University of Central Florida

Network

=============================

Object_Detection Datasets

Not every dataset listed here is guaranteed to include complete object detection annotations.

General Dataset

Animal

Stanford Dogs 🐶 Dataset : Over 20,000 images of 120 dog breeds

  • Context

    The Stanford Dogs dataset contains images of 120 breeds of dogs from around the world. This dataset has been built using images and annotations from ImageNet for the task of fine-grained image categorization, a challenging problem as certain dog breeds have near identical features or differ in colour and age.

    Sourced from ImageNet; used for fine-grained image classification

  • Content

    • Number of categories: 120
    • Number of images: 20,580
    • Annotations: Class labels, Bounding boxes

Honey Bee pollen : High resolution images of individual bees on the ramp

  • Context

    This image dataset has been created from videos captured at the entrance of a bee colony in June 2017 at the Bee facility of the Gurabo Agricultural Experimental Station of the University of Puerto Rico.

    Identify whether a honey bee 🐝 is carrying pollen or not

  • Content

    • images/ contains images of pollen-bearing and non-pollen-bearing honey bees.

      • The prefix of each image name defines its class: e.g. NP1268-15r.jpg for non-pollen and P7797-103r.jpg for pollen-bearing bees.
      • The numbers correspond to the frame and item number respectively. Note that they are not numbered sequentially.
    • Read-skimage.ipynb: a Jupyter notebook with a simple script to load the data and create the dataset using the skimage library.
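The file naming rule above can be turned into labels with a small parser. This is a sketch based only on the two example names given in the list; the function name `parse_bee_filename` and the label strings are made up for illustration, and the pattern deliberately ignores everything after the item number.

```python
import re

# "NP" must be tried before "P" so that non-pollen names are not misread.
_NAME_PATTERN = re.compile(r"^(NP|P)(\d+)-(\d+)")


def parse_bee_filename(filename):
    """Return (label, frame_number, item_number) for one image file name."""
    match = _NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    prefix, frame, item = match.groups()
    label = "pollen" if prefix == "P" else "non-pollen"
    return label, int(frame), int(item)


print(parse_bee_filename("NP1268-15r.jpg"))
print(parse_bee_filename("P7797-103r.jpg"))
```

Since the frame and item numbers are not sequential, it is safer to key any lookup table on the full file name than on a running index.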

Plant

Flowers Recognition : This dataset contains 4242 labeled images of flowers.

  • Context

    This dataset contains 4242 images of flowers. The data was collected from Flickr, Google Images, and Yandex Images. You can use this dataset to recognize plants from photos.

  • Content

    • five classes: chamomile, tulip, rose, sunflower, dandelion
    • about 800 photos per class
    • resolution: about 320x240 pixels

VGG - 17 Category Flower Dataset

  • Context

    • 17-category flower dataset with 80 images for each class
  • Content

    • The datasplits used in this paper are specified in datasplits.mat

    • There are 3 separate splits. The results in the paper are averaged over the 3 splits.

    • Each split has a training file (trn1,trn2,trn3), a validation file (val1, val2, val3) and a testfile (tst1, tst2 or tst3).

VGG - 102 Category Flower Dataset

  • Context

    • 102 category dataset, consisting of 102 flower categories
    • Each class consists of between 40 and 258 images
  • Content

    • The datasplits used in this paper are specified in setid.mat.

    • The results in the paper are produced on a 103 category database. The two categories labeled Petunia have since been merged, as they are the same.

    • There is a training file (trnid), a validation file (valid) and a testfile (tstid).

Fruits 360 dataset : A dataset with 65429 images of 95 fruits

  • Context

    The following fruits are included: Apples (different varieties: Golden, Red Yellow, Granny Smith, Red, Red Delicious), Apricot, Avocado, Avocado ripe, Banana (Yellow, Red, Lady Finger), Cactus fruit, Cantaloupe (2 varieties), Carambula, Cherry (different varieties, Rainier), Cherry Wax (Yellow, Red, Black), Chestnut, Clementine, Cocos, Dates, Granadilla, Grape (Blue, Pink, White (different varieties)), Grapefruit (Pink, White), Guava, Hazelnut, Huckleberry, Kiwi, Kaki, Kumsquats, Lemon (normal, Meyer), Lime, Lychee, Mandarine, Mango, Mangostan, Maracuja, Melon Piel de Sapo, Mulberry, Nectarine, Orange, Papaya, Passion fruit, Peach (different varieties), Pepino, Pear (different varieties, Abate, Kaiser, Monster, Williams), Physalis (normal, with Husk), Pineapple (normal, Mini), Pitahaya Red, Plums (different varieties), Pomegranate, Pomelo Sweetie, Quince, Rambutan, Raspberry, Redcurrant, Salak, Strawberry (normal, Wedge), Tamarillo, Tangelo, Tomato (different varieties, Maroon, Cherry Red), Walnut.

  • Content

    • Total number of images: 65429.
      • Training set size: 48905 images (one fruit per image).
      • Test set size: 16421 images (one fruit per image).
      • Multi-fruits set size: 103 images (more than one fruit (or fruit class) per image)
    • Number of classes: 95 (fruits).
    • Image size: 100x100 pixels.
  • GitHub download: Fruits-360 dataset

Plant Seedlings Classification : Determine the species of a seedling from an image

V2 Plant Seedlings Dataset : Images of crop and weed seedlings at different growth stages

  • Context

    • The V1 version of this dataset was used in the Plant Seedling Classification playground competition here on Kaggle. This is the V2 version. Some samples in V1 contained multiple plants. The dataset’s creators have now removed those samples.
  • Content

    • This dataset contains 5,539 images of crop and weed seedlings.
    • The images are grouped into 12 classes as shown in the above pictures. These classes represent common plant species in Danish agriculture. Each class contains RGB images that show plants at different growth stages.
    • The images are in various sizes and are in png format.

Food

UEC Food-256 Japan Food

  • Context

    • The dataset "UEC FOOD 256" contains food photos of 256 kinds. Each food photo has a bounding box indicating the location of the food item in the photo.

    • Most of the food categories in this dataset are popular foods in Japan and other countries.

  • Content

    • [1-256] : directory names correspond to food ID.

    • [1-256]/*.jpg : food photo files (some photos are duplicated in two or more directories, since they include two or more food items)

    • [1-256]/bb_info.txt: bounding box information for the photo files in each directory

    • category.txt : food list including the correspondences between food IDs and food names in English

    • category_ja.txt : food list including the correspondences between food IDs and food names in Japanese

    • multiple_food.txt: the list representing food photos including two or more food items
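A bb_info.txt file can be read with a few lines of Python. The layout assumed here (a header line followed by rows of `image_id x1 y1 x2 y2` separated by whitespace) is an assumption that should be checked against the downloaded files; the function name `parse_bb_info` and the sample values are hypothetical.

```python
def parse_bb_info(text):
    """Map each image id to a list of (x1, y1, x2, y2) boxes.

    Assumes whitespace-separated rows of five integers; any other line,
    such as a header, is skipped.
    """
    boxes = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 5 or not all(p.isdigit() for p in parts):
            continue  # header, blank line, or unexpected row
        image_id, x1, y1, x2, y2 = (int(p) for p in parts)
        boxes.setdefault(image_id, []).append((x1, y1, x2, y2))
    return boxes


# Hypothetical file content, for illustration only.
sample = "img x1 y1 x2 y2\n1 0 10 200 150\n2 5 5 100 120\n"
print(parse_bb_info(sample))
```

Storing a list per image id keeps the structure usable for photos that contain more than one food item.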

FoodDD: Food Detection Dataset, paper

NutriNet: A Deep Learning Food and Drink Image Recognition System for Dietary Assessment

ChineseFoodNet: A large-scale Image Dataset for Chinese Food Recognition - 2017

Yummly-28K - 2017

  • Content

    • 27,638 recipes in total.
    • Each recipe contains one recipe image, the ingredients, the cuisine and the course information.
    • There are 16 kinds of cuisines (e.g., "American", "Italian" and "Mexican") and 13 kinds of recipe courses (e.g., "Main Dishes", "Desserts" and "Lunch and Snacks").

VireoFood-172 dataset, paper - 2016

Dishes: a restaurant-oriented food dataset - 2015

Transportation

Boat types recognition : About 1,500 pictures of boats classified in 9 categories

  • Context

    This dataset is used in this blog post https://clorichel.com/blog/2018/11/10/machine-learning-and-object-detection/ where you'll train an image recognition model with TensorFlow to find just about anything in pictures and videos.

  • Content

    About 1,500 pictures of boats of various sizes, classified into the following types: buoy, cruise ship, ferry boat, freight boat, gondola, inflatable boat, kayak, paper boat, sailboat.

Scene

Intel Image Classification : Image Scene Classification of Multiclass

  • Context

    Image data of natural scenes around the world

  • Content

    • This dataset contains around 25k images of size 150x150 distributed across 6 categories. {'buildings' -> 0, 'forest' -> 1, 'glacier' -> 2, 'mountain' -> 3, 'sea' -> 4, 'street' -> 5 }

    • The Train, Test and Prediction data are provided in separate zip files. There are around 14k images in Train, 3k in Test and 7k in Prediction. This data was initially published on https://datahack.analyticsvidhya.com by Intel to host an image classification challenge.
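The category mapping listed above is easy to carry around as a dictionary. The sketch below copies the six name-to-id pairs from the list; the helper `decode_prediction` and the example score vector are hypothetical additions for illustration.

```python
# Category name to class id, as listed in the dataset description.
LABELS = {
    "buildings": 0,
    "forest": 1,
    "glacier": 2,
    "mountain": 3,
    "sea": 4,
    "street": 5,
}

# Inverse mapping for turning model outputs back into names.
ID_TO_NAME = {class_id: name for name, class_id in LABELS.items()}


def decode_prediction(scores):
    """Return the category name with the highest score."""
    best_id = max(range(len(scores)), key=scores.__getitem__)
    return ID_TO_NAME[best_id]


# Hypothetical classifier output with one score per category.
print(decode_prediction([0.10, 0.70, 0.05, 0.05, 0.05, 0.05]))
```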

Face

CelebFaces Attributes (CelebA) Dataset : Over 200K images of celebrities with 40 binary attribute annotations