Deep-Object-Detection
Illustrated object detection and network architectures. Inspired by awesome object detection, Deep-Object-Detection aims to make these topics easy to understand for Chinese readers.
Table of Contents
- Illustrated Network Architectures
- LeNet_AlexNet
- LeNet_AlexNet Keras Implementation
- VGG16 Network and Implementation
- VGG19 Network and Implementation
- Resnet
- Inception-v4: 2016
- SqueezeNet:2016
- DenseNet:2016
- Xception:2016
- ResNeXt:2016
- ROR: 2016
- MobileNet-v1:2017
- ShuffleNet:2017
- SENet : 2017
- MobileNet-V2:2018
- ShuffleNet-V2: 2018
- MobileNet-V3: 2019
- EfficientNet: 2019
- Transformer in Transformer: 2021
- ViT-Image Recognition at Scale: 2021
- Perceiver: 2021
- Illustrated Object Detection Frameworks
- Multi-stage Object Detection
- RCNN : 2014
- SPPnet : 2014
- FCN : 2015
- Fast R-CNN : 2015
- Faster R-CNN : 2015
- FPN : 2016
- Mask R-CNN : 2017
- Soft-NMS : 2017
- Segmentation is all you need : 2019
- Single Stage Object Detection
- DenseBox : 2015
- SSD : 2016
- YoLov2 : 2016
- RetinaNet : 2017
- YoLov3 : 2018
- M2Det : 2019
- CornerNet-Lite : 2019
- Illustrated Action Classification
- :lemon: MLAD :date: 2021.03.04v1 :blush: University of Central Florida
- Object Detection Datasets
- General Dataset
- Animal
- Plant
- Food
- Transportation
- Scene
- Face
Illustrated Network Architectures
LeNet_AlexNet
LeNet_AlexNet Keras Implementation
LeNet-Keras for MNIST handwritten digit image classification
LeNet-Keras restructure
Accuracy: 98.54%
===================================
AlexNet-Keras for oxflower17 image classification
AlexNet-Keras restructure: the modified network reaches val_acc ~80% and overfits
===================================
VGG16 Network and Implementation
VGG16-Keras oxflower17 object classification: the modified network reaches val_acc ~86.4% and overfits
VGG19 Network and Implementation
Resnet
- ResNet: Deep Residual Learning for Image Recognition - CVPR
- Residual block and shortcut connection:
- Residual network architecture:
- References on shortcut connections in residual networks:
- 1995 - Neural networks for pattern recognition - Bishop
- 1996 - Pattern recognition and neural networks - Ripley
- 1999 - Modern applied statistics with s-plus - Venables & Ripley
- Convolutional Neural Networks at Constrained Time Cost
- Experiments show that simply deepening a network can lead to higher training error
===================================
Inception-v4: 2016
- Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
SqueezeNet:2016
- SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size
DenseNet:2016
- DenseNet: Densely Connected Convolutional Networks
- Within a Dense Block, layers are connected by concatenation (concat) rather than element-wise addition (add)
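The difference between concatenation and element-wise addition can be sketched in a few lines of plain Python. This is a toy illustration, not the DenseNet implementation: feature maps are reduced to flat lists of channel values, and `toy_layer` is a hypothetical stand-in for the real BN-ReLU-Conv layer.

```python
def toy_layer(features, growth_rate):
    """Hypothetical stand-in for a conv layer: emits `growth_rate` new channels."""
    mean = sum(features) / len(features)
    return [mean] * growth_rate


def dense_block(x, num_layers, growth_rate):
    """DenseNet-style block: each layer sees the concat of all earlier outputs."""
    features = list(x)
    for _ in range(num_layers):
        new_channels = toy_layer(features, growth_rate)
        features = features + new_channels  # concat along the channel axis
    return features


def residual_add(x, fx):
    """ResNet-style merge: element-wise add, channel count stays the same."""
    return [a + b for a, b in zip(x, fx)]


x = [1.0, 2.0, 3.0, 4.0]
dense_out = dense_block(x, num_layers=3, growth_rate=2)
res_out = residual_add(x, [0.5, 0.5, 0.5, 0.5])

print(len(x), len(dense_out), len(res_out))
```

With concatenation the channel count grows by the growth rate at every layer (4 input channels become 4 + 3 x 2 = 10 here), while addition keeps it fixed at 4.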
Xception:2016
- Xception: Deep Learning with Depthwise Separable Convolutions
ResNeXt:2016
- ResNeXt: Aggregated Residual Transformations for Deep Neural Networks
ROR: 2016
- ROR - Residual Networks of Residual Networks: Multilevel Residual Networks
MobileNet-v1:2017
- MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
- MobileNetV1 illustrated:
- References:
ShuffleNet:2017
- ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
- ShuffleNet unit illustrated:
- Code:
SENet : 2017
- SENet Squeeze-and-Excitation Networks
MobileNet-V2:2018
- MobileNetV2: Inverted Residuals and Linear Bottlenecks
- MobileNetV2 illustrated:
ShuffleNet-V2: 2018
- ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design
MobileNet-V3: 2019
- MobileNet V3: Searching for MobileNetV3
EfficientNet: 2019
- EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
Transformer in Transformer: 2021
ViT-Image Recognition at Scale: 2021
-
Vision Transformers: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Perceiver: 2021
- Perceiver : General Perception with Iterative Attention
ViT image classification - Keras code
=============================
Illustrated Object Detection Frameworks
General Papers
2010
2011
2013
- OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks
- Sliding window detector on an image pyramid
- OverFeat algorithm pipeline:
2014
-
VGG: Very Deep Convolutional Networks for Large-Scale Image Recognition
-
SPP: Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
2017
2018
===========================
Multi-stage Object Detection
RCNN : 2014
- Region-Based Convolutional Networks for Accurate Object Detection and Segmentation
- v5 Rich feature hierarchies for accurate object detection and semantic segmentation - CVPR
- Region proposals are scale-normalized before being classified with a ConvNet
SPPnet : 2014
FCN : 2015
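- FCN - Fully Convolutional Networks for Semantic Segmentation - CVPR
- The fully convolutional network converts the last three fully connected layers into convolutional layers, using multi-channel kernels with the same spatial size as their input; this lets the input image size vary
- Network structure for semantic segmentation:
- Extract feature maps from different pooling layers and upsample them
- Upsampling uses deconvolution (transposed convolution), which leaves the upsampled output lacking fine detail
- Skip architecture for feature-map fusion: element-wise (per-pixel) addition (the add function in Keras)
- Resize the feature map to the original image size for per-pixel prediction
- Problem definition of semantic segmentation:
- Per-pixel classification
- The last convolutional layer is 1x1x21 (20 VOC object classes + 1 background class)
- code: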
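The upsample-then-add skip fusion described for FCN can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the repository's Keras code: score maps are single-channel nested lists, and the 2x2 all-ones kernel with stride 2 is a hypothetical choice that makes the transposed convolution behave like nearest-neighbor upsampling.

```python
def transposed_conv2d(x, kernel, stride):
    """Single-channel transposed convolution: scatter each input value
    through the kernel into a larger output grid."""
    h, w = len(x), len(x[0])
    k = len(kernel)
    out_h = (h - 1) * stride + k
    out_w = (w - 1) * stride + k
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(h):
        for j in range(w):
            for di in range(k):
                for dj in range(k):
                    out[i * stride + di][j * stride + dj] += x[i][j] * kernel[di][dj]
    return out


def fuse_add(a, b):
    """Skip fusion: element-wise addition of two same-sized score maps."""
    return [[p + q for p, q in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


coarse = [[1.0, 2.0], [3.0, 4.0]]           # score map from a deep, low-resolution layer
fine = [[0.5] * 4 for _ in range(4)]        # score map from a shallower pooling layer
kernel = [[1.0, 1.0], [1.0, 1.0]]           # assumed kernel (learned in a real network)

upsampled = transposed_conv2d(coarse, kernel, stride=2)
fused = fuse_add(upsampled, fine)
for row in fused:
    print(row)
```

In a real FCN the kernel weights are learned (often initialized to bilinear interpolation), and each of the 21 class channels is upsampled and fused in the same way.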
Fast R-CNN : 2015
-
Fast R-CNN - ICCV
Faster R-CNN : 2015
- Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks - NIPS
- RPN(Region Proposal Network) & Anchor Box
FPN : 2016
- Feature Pyramid Networks for Object Detection
- The idea comes from feature pyramids in traditional computer vision; deep learning detectors had avoided them because they are compute- and memory-intensive
- Bottom-up pathway (the feed-forward pass): the deepest layer of each stage should have the strongest features
Mask R-CNN : 2017
Soft-NMS : 2017
Segmentation is all you need : 2019
============================
Single Stage Object Detection
DenseBox : 2015
SSD : 2016
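- SSD: Single Shot MultiBox Detector - ECCV
- Workflow:
- The feature extraction network is VGG-16; bounding boxes and classes are predicted from a pyramid of feature maps
- Network architecture:
- Loss function:
- The sum of a Smooth L1 localization loss and a multi-class Softmax classification loss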
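The SSD loss described above, a Smooth L1 localization term plus a Softmax classification term, can be sketched for a single matched default box as follows. This is a simplified illustration: the function names and the `alpha` weighting parameter are assumptions for the example, and a full implementation would also handle box matching, hard negative mining, and normalization by the number of matched boxes.

```python
import math


def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear for large errors."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def localization_loss(pred_box, target_box):
    """Sum of Smooth L1 over the 4 box offsets of one matched box."""
    return sum(smooth_l1(p - t) for p, t in zip(pred_box, target_box))


def softmax_cross_entropy(logits, label):
    """Multi-class Softmax loss, computed with the log-sum-exp trick."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def ssd_loss(pred_box, target_box, logits, label, alpha=1.0):
    """Classification loss plus alpha-weighted localization loss."""
    return softmax_cross_entropy(logits, label) + alpha * localization_loss(pred_box, target_box)


loc = localization_loss([0.1, 0.2, 2.0, 0.0], [0.0, 0.0, 0.0, 0.0])
conf = softmax_cross_entropy([0.0, 0.0, 0.0], label=0)
total = ssd_loss([0.1, 0.2, 2.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0], label=0)
print(loc, conf, total)
```

The large offset error (2.0) contributes linearly (1.5) rather than quadratically, which is what makes Smooth L1 less sensitive to outliers than a squared error.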
YoLov2 : 2016
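- YOLOv2 / YOLO9000: Better, Faster, Stronger
- Workflow:
- Pre-train a CNN on an image classification task
- Split the image into grid cells; if an object's center falls inside a cell, that cell is "responsible" for detecting the object. Each cell predicts (a) bounding box locations, (b) a confidence score, and (c) class probabilities conditioned on an object being present in the bounding box
- Modify the last layer of the pre-trained CNN to output the prediction tensor
- Network architecture:
- Loss function:
- Two parts: bounding box regression and conditional class probability, both using a sum of squared errors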
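The "responsible cell" rule and the size of the prediction tensor can be sketched as below. The helper names are hypothetical, and the 416x416 input, 13x13 grid, 5 anchor boxes, and 20 classes are assumed example values in the style of YOLOv2 on VOC, not settings taken from this repository.

```python
def responsible_cell(center_x, center_y, img_w, img_h, grid_size):
    """Return (row, col) of the grid cell that contains the object's center."""
    col = min(int(center_x / img_w * grid_size), grid_size - 1)
    row = min(int(center_y / img_h * grid_size), grid_size - 1)
    return row, col


def prediction_depth(num_anchors, num_classes):
    """Values predicted per cell: each anchor box carries 4 box coordinates,
    1 confidence score, and num_classes conditional class probabilities."""
    return num_anchors * (4 + 1 + num_classes)


# Object centered at pixel (208, 100) in a 416x416 image with a 13x13 grid.
row, col = responsible_cell(208, 100, 416, 416, 13)
depth = prediction_depth(num_anchors=5, num_classes=20)
print("responsible cell:", (row, col))
print("prediction tensor shape:", (13, 13, depth))
```

Only the cell at `(row, col)` is trained to predict this object; all other cells treat it as background for that ground-truth box.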
RetinaNet : 2017
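- RetinaNet: Focal Loss for Dense Object Detection
- Workflow:
- Focal loss assigns more weight to hard, easily misclassified examples (such as background with noisy texture, or partial objects) and down-weights easy examples (such as obviously empty background)
- The feature extraction network is ResNet; a feature pyramid improves detection performance
- Network architecture: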
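The re-weighting behavior of focal loss can be sketched for a single binary prediction. This follows the form FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t); the defaults gamma=2 and alpha=0.25 are assumed here as commonly used values, and the function name is chosen for this example.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one binary prediction.

    p: predicted probability of the positive class, y: ground-truth label (0 or 1).
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(p, y):
    """Plain binary cross-entropy, for comparison."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


easy = focal_loss(0.9, 1)   # confident and correct
hard = focal_loss(0.1, 1)   # confident and wrong
print("easy: CE=%.4f FL=%.6f" % (cross_entropy(0.9, 1), easy))
print("hard: CE=%.4f FL=%.6f" % (cross_entropy(0.1, 1), hard))
```

The factor (1 - p_t)^gamma shrinks the loss of well-classified examples far more than that of misclassified ones, so the many easy background boxes no longer dominate training.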
YoLov3 : 2018
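- YOLOv3: An Incremental Improvement
- Bounding box prediction uses dimension clusters (anchor priors)
- Each box has 4 coordinates
- Training uses a sum of squared error loss
- The objectness score of each bbox is predicted with logistic regression
- The classifier uses logistic regression, with a binary cross-entropy loss
- Borrows from FPN
- Feature extraction convolutional network
- Alternating 3x3 and 1x1 convolutional layers
- Borrows from ResNet and uses shortcut connections, which originate from either a convolutional layer or an earlier shortcut layer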
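The box parameterization and the logistic classifier can be sketched as follows. The decoding follows the YOLOv3 formulas (b_x = sigmoid(t_x) + c_x, b_w = p_w * exp(t_w), and likewise for y and h); the function names, the example cell offset, and the prior size are assumptions for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(t, cell_xy, prior_wh):
    """Turn the 4 raw network outputs into a box in grid units.

    t: (tx, ty, tw, th) raw outputs; cell_xy: top-left corner of the grid cell;
    prior_wh: width/height of the anchor prior obtained from dimension clusters.
    """
    tx, ty, tw, th = t
    cx, cy = cell_xy
    pw, ph = prior_wh
    bx = sigmoid(tx) + cx          # center stays inside the responsible cell
    by = sigmoid(ty) + cy
    bw = pw * math.exp(tw)         # prior size scaled by the prediction
    bh = ph * math.exp(th)
    return bx, by, bw, bh


def binary_cross_entropy(probs, targets, eps=1e-12):
    """Sum of independent per-class logistic losses (multi-label setting)."""
    return -sum(
        t * math.log(max(p, eps)) + (1 - t) * math.log(max(1.0 - p, eps))
        for p, t in zip(probs, targets)
    )


box = decode_box((0.0, 0.0, 0.0, 0.0), cell_xy=(6, 3), prior_wh=(2.0, 3.0))
class_probs = [sigmoid(z) for z in (0.0, 0.0)]
loss = binary_cross_entropy(class_probs, [1, 0])
print(box, loss)
```

Using one sigmoid per class instead of a single softmax lets a box carry several labels at once, which is the reason given in the YOLOv3 paper for this choice.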
M2Det : 2019
CornerNet-Lite : 2019
- CornerNet-Lite : Efficient Keypoint Based Object Detection
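- CornerNet-Saccade: processes pixels of the feature map and produces multiple detections per crop; suited to offline processing
- CornerNet-Squeeze: the backbone uses SqueezeNet within an hourglass architecture; suited to real-time processing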
=============================
Illustrated Action Classification
:lemon: MLAD :date: 2021.03.04v1 :blush: University of Central Florida
Network
=============================
Object Detection Datasets
Not every dataset listed here is guaranteed to include complete object detection annotations.
General Dataset
-
- 14,197,122 images
-
PASCAL Visual Object Classes Challenge 2008 (VOC2008), VOC-2012
-
-
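A dataset of about 9 million image URLs, annotated with image-level labels and bounding boxes spanning thousands of classes.
-
The dataset contains a training set of 9,011,219 images, a validation set of 41,260 images, and a test set of 125,436 images.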
-
-
- Corel5K image set: 5,000 images in total covering 50 semantic topics, such as buses, dinosaurs, and beaches.
Animal
Stanford Dogs 🐶 Dataset : Over 20,000 images of 120 dog breeds
-
Context
The Stanford Dogs dataset contains images of 120 breeds of dogs from around the world. This dataset has been built using images and annotation from ImageNet for the task of fine-grained image categorization. It was originally collected for fine-grain image categorization, a challenging problem as certain dog breeds have near identical features or differ in colour and age.
Sourced from ImageNet; used for fine-grained image classification.
-
Content
- Number of categories: 120
- Number of images: 20,580
- Annotations: Class labels, Bounding boxes
Honey Bee pollen : High resolution images of individual bees on the ramp
-
Context
This image dataset has been created from videos captured at the entrance of a bee colony in June 2017 at the Bee facility of the Gurabo Agricultural Experimental Station of the University of Puerto Rico.
Task: classify whether a honey bee 🐝 is carrying pollen or not.
-
Content
- images/ contains images of pollen-bearing and non-pollen-bearing honey bees.
- The prefix of each image name defines its class: e.g. NP1268-15r.jpg for non-pollen and P7797-103r.jpg for pollen-bearing bees.
- The numbers correspond to frame and item number respectively; note that they are not numbered sequentially.
- Read-skimage.ipynb is a Jupyter notebook with a simple script that loads the data and creates the dataset using the skimage library.
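The filename convention of the honey bee dataset (class prefix, then frame number and item number) can be parsed with a short helper. This is a sketch based only on the two example names given above; the function name and the returned label strings are assumptions, and other files in the dataset may need extra handling.

```python
import os
import re

# "NP" must be tried before "P", otherwise every name would match as pollen.
_NAME_PATTERN = re.compile(r"^(NP|P)(\d+)-(\d+)")


def parse_bee_filename(path):
    """Return (label, frame_number, item_number) for an image file name."""
    name = os.path.basename(path)
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError("unexpected file name: %r" % name)
    prefix, frame, item = match.groups()
    label = "non-pollen" if prefix == "NP" else "pollen"
    return label, int(frame), int(item)


print(parse_bee_filename("images/NP1268-15r.jpg"))
print(parse_bee_filename("images/P7797-103r.jpg"))
```

Because the frame and item numbers are not sequential, they are best treated as identifiers rather than as indices into an array.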
Plant
Flowers Recognition : This dataset contains 4242 labeled images of flowers.
-
Context
This dataset contains 4242 images of flowers. The data was collected from Flickr, Google Images, and Yandex Images. You can use this dataset to recognize plants from photos.
-
Content
- five classes: chamomile, tulip, rose, sunflower, dandelion
- about 800 photos per class
- resolution: about 320x240 pixels
VGG - 17 Category Flower Dataset
-
Context
- 17-category flower dataset with 80 images per category
-
Content
-
The datasplits used in this paper are specified in datasplits.mat
-
There are 3 separate splits. The results in the paper are averaged over the 3 splits.
-
Each split has a training file (trn1,trn2,trn3), a validation file (val1, val2, val3) and a testfile (tst1, tst2 or tst3).
-
VGG - 102 Category Flower Dataset
-
Context
- A dataset of 102 flower categories
- Each class consists of between 40 and 258 images
-
Content
-
The datasplits used in this paper are specified in setid.mat.
-
The results in the paper were produced on a 103-category database. The two categories labeled Petunia have since been merged, since they are the same.
-
There is a training file (trnid), a validation file (valid) and a testfile (tstid).
-
Fruits 360 dataset : A dataset with 65429 images of 95 fruits
-
Context
The following fruits are included: Apples (different varieties: Golden, Red Yellow, Granny Smith, Red, Red Delicious), Apricot, Avocado, Avocado ripe, Banana (Yellow, Red, Lady Finger), Cactus fruit, Cantaloupe (2 varieties), Carambula, Cherry (different varieties, Rainier), Cherry Wax (Yellow, Red, Black), Chestnut, Clementine, Cocos, Dates, Granadilla, Grape (Blue, Pink, White (different varieties)), Grapefruit (Pink, White), Guava, Hazelnut, Huckleberry, Kiwi, Kaki, Kumsquats, Lemon (normal, Meyer), Lime, Lychee, Mandarine, Mango, Mangostan, Maracuja, Melon Piel de Sapo, Mulberry, Nectarine, Orange, Papaya, Passion fruit, Peach (different varieties), Pepino, Pear (different varieties, Abate, Kaiser, Monster, Williams), Physalis (normal, with Husk), Pineapple (normal, Mini), Pitahaya Red, Plums (different varieties), Pomegranate, Pomelo Sweetie, Quince, Rambutan, Raspberry, Redcurrant, Salak, Strawberry (normal, Wedge), Tamarillo, Tangelo, Tomato (different varieties, Maroon, Cherry Red), Walnut.
-
Content
- Total number of images: 65429.
- Training set size: 48905 images (one fruit per image).
- Test set size: 16421 images (one fruit per image).
- Multi-fruits set size: 103 images (more than one fruit (or fruit class) per image)
- Number of classes: 95 (fruits).
- Image size: 100x100 pixels.
Plant Seedlings Classification : Determine the species of a seedling from an image
-
Context
- a dataset containing images of approximately 960 unique plants belonging to 12 species at several growth stages
-
Content
V2 Plant Seedlings Dataset : Images of crop and weed seedlings at different growth stages
-
Context
- The V1 version of this dataset was used in the Plant Seedling Classification playground competition here on Kaggle. This is the V2 version. Some samples in V1 contained multiple plants. The dataset’s creators have now removed those samples.
-
Content
- This dataset contains 5,539 images of crop and weed seedlings.
- The images are grouped into 12 classes as shown in the above pictures. These classes represent common plant species in Danish agriculture. Each class contains rgb images that show plants at different growth stages.
- The images are in various sizes and are in png format.
Food
-
Context
-
The "UEC FOOD 256" dataset contains food photos of 256 kinds. Each food photo has a bounding box indicating the location of the food item in the photo.
-
Most of the food categories in this dataset are popular foods in Japan and other countries.
-
-
Content
-
[1-256] : directory names correspond to food ID.
-
[1-256]/*.jpg : food photo files (some photos are duplicated in two or more directories, since they include two or more food items.)
-
[1-256]/bb_info.txt: bounding box information for the photo files in each directory
-
category.txt : food list including the correspondences between food IDs and food names in English
-
category_ja.txt : food list including the correspondences between food IDs and food names in Japanese
-
multiple_food.txt: the list representing food photos including two or more food items
-
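Reading the per-directory bounding box files can be sketched as below. The file layout is an assumption, not something stated above: the sketch assumes each bb_info.txt has a header line followed by rows of five whitespace-separated integers (image ID, x1, y1, x2, y2). Check an actual file before relying on it. The function name is hypothetical.

```python
def parse_bb_info(text):
    """Parse the assumed bb_info.txt layout into {image_id: [(x1, y1, x2, y2), ...]}."""
    boxes = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip the header and any line that is not five integers.
        if len(parts) != 5 or not all(p.isdigit() for p in parts):
            continue
        img_id, x1, y1, x2, y2 = (int(p) for p in parts)
        boxes.setdefault(img_id, []).append((x1, y1, x2, y2))
    return boxes


# Made-up sample content in the assumed layout (not real dataset values).
sample = """img x1 y1 x2 y2
1 10 20 110 220
1 5 5 50 50
2 0 0 30 40
"""

bb = parse_bb_info(sample)
print(bb)
```

For a real directory, the text would come from something like `open("1/bb_info.txt").read()`, and the image for ID `n` would be the matching `n.jpg` in the same directory, again assuming that naming holds.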
FoodDD: Food Detection Dataset, paper
NutriNet: A Deep Learning Food and Drink Image Recognition System for Dietary Assessment
ChineseFoodNet: A large-scale Image Dataset for Chinese Food Recognition - 2017
- Content
- 27,638 recipes in total.
- Each recipe contains one recipe image, the ingredients, the cuisine and the course information.
- There are 16 kinds of cuisines (e.g., "American", "Italian" and "Mexican") and 13 kinds of recipe courses (e.g., "Main Dishes", "Desserts" and "Lunch and Snacks").
VireoFood-172 dataset, paper - 2016
Dishes: a restaurant-oriented food dataset - 2015
Transportation
Boat types recognition : About 1,500 pictures of boats classified in 9 categories
-
Context
This dataset is used in this blog post https://clorichel.com/blog/2018/11/10/machine-learning-and-object-detection/ where you'll train an image recognition model with TensorFlow to find just about anything in pictures and videos.
-
Content
About 1,500 pictures of boats of various sizes, classified into the following types: buoy, cruise ship, ferry boat, freight boat, gondola, inflatable boat, kayak, paper boat, sailboat.
Scene
Intel Image Classification : Image Scene Classification of Multiclass
-
Context
Image data of natural scenes around the world
-
Content
-
This data contains around 25k images of size 150x150 distributed under 6 categories. {'buildings' -> 0, 'forest' -> 1, 'glacier' -> 2, 'mountain' -> 3, 'sea' -> 4, 'street' -> 5 }
-
The Train, Test and Prediction data are provided in separate zip files. There are around 14k images in Train, 3k in Test and 7k in Prediction. This data was initially published on https://datahack.analyticsvidhya.com by Intel to host an image classification challenge.
-
