fix(html): use 'start' attribute when parsing ordered lists from HTML docs

Open ceberam opened this issue 11 months ago • 1 comments

Description

This PR is about leveraging the start attribute in HTML <ol> tags (ordered lists) when parsing HTML documents.

In HTML documents, items in ordered lists are currently always parsed with numeric markers that start at 1, each followed by a period. This keeps the export to markdown simple, since the notation corresponds to the basic markdown syntax.

However, HTML allows an ordered list to start at another number, indicated by the start attribute. Even though the basic markdown syntax recommends always starting at 1, markdown flavors like GitHub Flavored Markdown usually accept other starting numbers.

In this PR, the HTML backend creates ordered lists starting with the number indicated in the start attribute, if it exists.
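
To illustrate the intended behavior, here is a minimal sketch using only the Python standard library. It is not the code in this PR and does not use the docling API: the class OrderedListToMarkdown and the function ol_to_markdown are hypothetical names, the sketch handles only flat lists with explicitly closed <li> tags, and falling back to 1 for a missing or non-integer start value is an assumption.

```python
from html.parser import HTMLParser


class OrderedListToMarkdown(HTMLParser):
    """Hypothetical helper: renders flat <ol> lists as numbered markdown lines."""

    def __init__(self):
        super().__init__()
        self.lines = []        # finished markdown lines
        self._counter = None   # next marker number; None outside an <ol>
        self._buffer = None    # text pieces of the current <li>; None outside an <li>

    def handle_starttag(self, tag, attrs):
        if tag == "ol":
            start = dict(attrs).get("start")
            try:
                # Assumption: a missing or non-integer start falls back to 1.
                self._counter = int(start) if start is not None else 1
            except ValueError:
                self._counter = 1
        elif tag == "li" and self._counter is not None:
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "li" and self._buffer is not None:
            text = "".join(self._buffer).strip()
            self.lines.append(f"{self._counter}. {text}")
            self._counter += 1
            self._buffer = None
        elif tag == "ol":
            self._counter = None


def ol_to_markdown(html: str) -> str:
    """Convert the ordered lists found in an HTML snippet to markdown list lines."""
    parser = OrderedListToMarkdown()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)


print(ol_to_markdown('<ol start="42"><li>foo</li><li>bar</li></ol>'))
```

With this sketch, a list declared with start="42" yields the markers 42. and 43., while a list without the attribute still starts at 1.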

Issue resolved by this Pull Request: Resolves #1058

Checklist:

  • [x] Documentation has been updated, if necessary.
  • [x] Examples have been added, if necessary.
  • [x] Tests have been added, if necessary.

ceberam avatar Feb 26 '25 15:02 ceberam

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • [X] title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:

mergify[bot] avatar Feb 26 '25 15:02 mergify[bot]