Kevinnunu/add LMDBDumper and TextRecogLMDBConfigGenerator, update prepare_dataset.py script

Open KevinNuNu opened this issue 3 years ago • 1 comment

Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help it get feedback more easily. If you do not understand some items, don't worry: just make the pull request and seek help from the maintainers.

Motivation

Add LMDBDumper and TextRecogLMDBConfigGenerator so that they can be used together with RecogLMDBDataset and LoadImageFromLMDB.

Modification

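1. The tools/dataset_converters/prepare_dataset.py script gains a --lmdb argument (supported only for the textrecog task). The parser_cfg step in mmocr/datasets/preparers/data_preparer.py checks whether the LMDB format is required, then reads and updates the original config from dataset_zoo, so the original config file no longer has to be edited by hand.
2. mmocr/datasets/preparers/dumpers/dumpers.py: fixed JsonDumper producing output files in which Chinese text was not displayed correctly.
3. mmocr/datasets/preparers/dumpers/dumpers.py: added LMDBDumper.
4. mmocr/datasets/preparers/config_generator.py: added TextRecogLMDBConfigGenerator. The output file name and the dataset_name inside the file both get a '_lmdb' suffix appended to the original name to tell them apart. For example: image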
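On item 2 above: the PR description does not say how the JsonDumper fix was made. One common cause of unreadable Chinese in JSON output is the `ensure_ascii=True` default of Python's `json` module, which escapes every non-ASCII character. The sketch below illustrates that behavior with the standard library only. It is not the MMOCR code, and `dump_json` and `sample` are hypothetical names used for illustration.

```python
import json


def dump_json(data, ensure_ascii):
    """Serialize data to a JSON string (hypothetical stand-in for a dumper's write step)."""
    return json.dumps(data, ensure_ascii=ensure_ascii)


# Hypothetical annotation record containing Chinese text.
sample = {"text": "中文"}

# json's default: non-ASCII characters are written as \uXXXX escapes.
escaped = dump_json(sample, ensure_ascii=True)

# With ensure_ascii=False the characters are kept as-is.
readable = dump_json(sample, ensure_ascii=False)

print(escaped)   # {"text": "\u4e2d\u6587"}
print(readable)  # {"text": "中文"}
```

Both strings decode to the same data, so the difference only affects how the file reads in an editor. When writing such output to disk, the file would also need to be opened with `encoding="utf-8"`.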

BC-breaking (Optional)

Does the modification introduce changes that break the backward-compatibility of the downstream repositories? If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.

Use cases (Optional)

If this PR introduces a new feature, it is better to list some use cases here, and update the documentation.

Checklist

Before PR:

  • [ ] I have read and followed the workflow indicated in the CONTRIBUTING.md to create this PR.
  • [ ] Pre-commit or linting tools indicated in CONTRIBUTING.md are used to fix the potential lint issues.
  • [ ] Bug fixes are covered by unit tests, the case that causes the bug should be added in the unit tests.
  • [ ] New functionalities are covered by complete unit tests. If not, please add more unit tests to ensure correctness.
  • [ ] The documentation has been modified accordingly, including docstring or example tutorials.

After PR:

  • [ ] If the modification has potential influence on downstream or other related projects, this PR should be tested with some of those projects.
  • [ ] CLA has been signed and all committers have signed the CLA in this PR.

KevinNuNu avatar Jan 12 '23 06:01 KevinNuNu

Codecov Report

Base: 88.13% // Head: 87.46% // Decreases project coverage by -0.68% :warning:

Coverage data is based on head (aadcdd4) compared to base (9b0f1da). Patch coverage: 12.12% of modified lines in pull request are covered.

:exclamation: Current head aadcdd4 differs from pull request most recent head 7c80662. Consider uploading reports for the commit 7c80662 to get more accurate results

Additional details and impacted files
@@             Coverage Diff             @@
##           dev-1.x    #1673      +/-   ##
===========================================
- Coverage    88.13%   87.46%   -0.68%     
===========================================
  Files          176      176              
  Lines        11012    11109      +97     
  Branches      1555     1573      +18     
===========================================
+ Hits          9705     9716      +11     
- Misses        1017     1103      +86     
  Partials       290      290              
Flag         Coverage Δ
unittests    87.46% <12.12%> (-0.68%) :arrow_down:

Flags with carried forward coverage won't be shown.

Impacted Files                                  Coverage Δ
mmocr/datasets/preparers/config_generator.py    25.88% <0.00%> (-2.69%) :arrow_down:
mmocr/datasets/preparers/data_preparer.py       53.48% <5.00%> (-14.70%) :arrow_down:
mmocr/datasets/preparers/dumpers/dumpers.py     32.95% <15.71%> (-67.05%) :arrow_down:


codecov[bot] avatar Feb 02 '23 08:02 codecov[bot]