python-markdownify

Add rowspan support for HTML tables

Open · ExcitingFrog opened this issue 4 months ago · 0 comments


Problem Statement

Previously, python-markdownify did not handle HTML tables with rowspan attributes correctly. A cell with rowspan > 1 occupies its column in the following rows, so those rows contain fewer <td> elements in the HTML. Because the converter emitted one Markdown cell per HTML cell, those rows came out short, producing a malformed table with incorrect column alignment.

Example of the problem:

<table>
    <tr>
        <th>Name</th>
        <th>Department</th>
        <th>Age</th>
    </tr>
    <tr>
        <td rowspan="2">John</td>
        <td>IT</td>
        <td>30</td>
    </tr>
    <tr>
        <td>Management</td>
        <td>31</td>
    </tr>
</table>

Previous (incorrect) output:

| Name | Department | Age |
| --- | --- | --- |
| John | IT | 30 |
| Management | 31 |  <!-- Missing cell, causing misalignment -->
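The short row comes straight from the markup: the third <tr> holds only two cells, because its first column is still occupied by John's rowspan="2" cell. Emitting one Markdown cell per HTML cell therefore yields a row with one cell too few. The stdlib-only sketch below counts cells per <tr> for the example table; `CellCounter` is a hypothetical helper for illustration, not part of markdownify.

```python
from html.parser import HTMLParser


class CellCounter(HTMLParser):
    """Count <td>/<th> elements per <tr>, ignoring rowspan/colspan."""

    def __init__(self):
        super().__init__()
        self.counts = []  # one entry per <tr>, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.counts.append(0)
        elif tag in ("td", "th") and self.counts:
            self.counts[-1] += 1


html = """<table>
    <tr><th>Name</th><th>Department</th><th>Age</th></tr>
    <tr><td rowspan="2">John</td><td>IT</td><td>30</td></tr>
    <tr><td>Management</td><td>31</td></tr>
</table>"""

counter = CellCounter()
counter.feed(html)
print(counter.counts)  # [3, 3, 2]: the last row is one cell short
```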

Solution

This PR adds rowspan support by:

  1. Detection Logic: Added a _table_has_rowspan() method that detects tables containing rowspan attributes
  2. Grid Algorithm: Implemented a _build_rowspan_cells() method that:
    • Tracks which columns are occupied by rowspan cells from previous rows
    • Calculates where empty placeholder cells must be inserted
    • Handles complex scenarios with multiple rowspan cells and nested table structures
  3. Backward Compatibility: Tables without rowspan continue to use the original optimized logic
  4. Empty Cell Generation: In each row covered by a rowspan cell, an empty cell (|  |) is inserted at the column that cell occupies
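The grid step can be illustrated with a small, self-contained sketch. This is not the PR's _build_rowspan_cells(); it is a minimal version of the same idea under stated assumptions: `fill_rowspans` and its input of `(text, rowspan)` tuples are hypothetical, colspan is ignored, and every row is assumed to be well-formed. A per-column counter records how many more rows each rowspan cell still covers, and a placeholder is emitted wherever that counter is positive.

```python
def fill_rowspans(rows):
    """Insert empty placeholders where earlier rowspan cells extend.

    rows: list of HTML rows; each row is a list of (text, rowspan) tuples as
    they appear in the markup, i.e. without the positions covered by earlier
    rowspans. Returns rows of plain strings, one entry per grid column.
    Hypothetical helper: colspan is ignored and rows are assumed well-formed.
    """
    pending = {}  # column index -> number of further rows still covered
    grid = []
    for row in rows:
        out = []
        cells = iter(row)
        col = 0
        while True:
            if pending.get(col, 0) > 0:
                # Column is occupied by a rowspan cell from a previous row.
                out.append("")
                pending[col] -= 1
                col += 1
                continue
            cell = next(cells, None)
            if cell is None:
                break
            text, rowspan = cell
            out.append(text)
            if rowspan > 1:
                pending[col] = rowspan - 1
            col += 1
        grid.append(out)
    return grid


rows = [
    [("Name", 1), ("Department", 1), ("Age", 1)],
    [("John", 2), ("IT", 1), ("30", 1)],
    [("Management", 1), ("31", 1)],
]
print(fill_rowspans(rows))
# [['Name', 'Department', 'Age'], ['John', 'IT', '30'], ['', 'Management', '31']]
```

Keeping one counter per column is what lets several rowspan cells in the same row, and spans longer than two rows, work without special cases.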

New (correct) output:

| Name | Department | Age |
| --- | --- | --- |
| John | IT | 30 |
|  | Management | 31 |  <!-- Proper empty cell for rowspan -->
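The placeholder shows up as |  | (two spaces) because an empty cell keeps the single space of padding on each side. The sketch below renders a filled grid with a hypothetical `to_markdown_row` helper and reproduces the expected output used in the tests; it is an illustration, not markdownify's own rendering code.

```python
def to_markdown_row(cells):
    """Join cells as '| a | b |'; an empty cell becomes '|  |'."""
    return "| " + " | ".join(cells) + " |"


# Grid after placeholder insertion ("" marks the cell covered by the rowspan).
grid = [
    ["Name", "Department", "Age"],
    ["John", "IT", "30"],
    ["", "Management", "31"],
]

lines = [to_markdown_row(grid[0]), to_markdown_row(["---"] * len(grid[0]))]
lines += [to_markdown_row(row) for row in grid[1:]]
table = "\n".join(lines)
print(table)
```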

Implementation Details

Core Changes in markdownify/__init__.py:

  • convert_tr() method: Enhanced to detect and handle rowspan tables
  • _table_has_rowspan() method: Efficient detection of tables with rowspan attributes
  • _build_rowspan_cells() method: Algorithm to calculate empty cell placement for each row
  • Grid tracking: Maintains occupied column positions across table rows
  • Column counting: Accurate calculation of total columns including rowspan effects
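A detection pass of the kind _table_has_rowspan() performs can be sketched with the standard library's html.parser. The real method presumably works on markdownify's BeautifulSoup tree instead; `RowspanScanner` and `table_has_rowspan` here are hypothetical names, and treating rowspan="1" or a non-numeric value as "no span" is an assumption of this sketch.

```python
from html.parser import HTMLParser


class RowspanScanner(HTMLParser):
    """Record whether any <td>/<th> spans more than one row."""

    def __init__(self):
        super().__init__()
        self.has_rowspan = False

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag not in ("td", "th"):
            return
        value = dict(attrs).get("rowspan")
        # Assumption: rowspan="1", a bare attribute, or a non-numeric value
        # does not count as spanning.
        if value is not None and value.strip().isdigit() and int(value) > 1:
            self.has_rowspan = True


def table_has_rowspan(html):
    scanner = RowspanScanner()
    scanner.feed(html)
    return scanner.has_rowspan


print(table_has_rowspan('<table><tr><td rowspan="2">a</td></tr></table>'))  # True
print(table_has_rowspan('<table><tr><td>a</td></tr></table>'))  # False
```

A cheap check like this is what allows tables without rowspan to stay on the existing code path.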

Key Features:

  • Simple rowspan: Basic single-column row spanning
  • Complex rowspan: A cell spanning three or more consecutive rows (rowspan > 2)
  • Mixed scenarios: Rowspan combined with colspan attributes
  • Multiple rowspan: Multiple rowspan cells in the same row
  • Table headers: Proper handling of rowspan in <thead> sections
  • Backward compatibility: No impact on tables without rowspan
  • Performance: Rowspan processing only runs for tables that contain a rowspan attribute
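Rowspan combined with colspan, and several rowspan cells in one row, can be handled uniformly by recording which (row, column) positions are already occupied. The sketch below is an alternative formulation for illustration, not the PR's algorithm: `build_grid` and its `(text, rowspan, colspan)` tuples are hypothetical, and it assumes a spanning cell's text belongs in its top-left position with the remaining covered positions left empty.

```python
def build_grid(rows):
    """Place cells with rowspan/colspan into a rectangular grid.

    rows: list of HTML rows, each a list of (text, rowspan, colspan) tuples.
    Returns a list of rows of strings; positions covered by a span are "".
    Hypothetical helper for illustration only.
    """
    taken = {}  # (row, col) -> text of the cell placed there ("" if spanned)
    for r, row in enumerate(rows):
        col = 0
        for text, rowspan, colspan in row:
            # Skip positions already claimed by spans from earlier rows.
            while (r, col) in taken:
                col += 1
            # Claim the whole rowspan x colspan block for this cell.
            for dr in range(rowspan):
                for dc in range(colspan):
                    taken[(r + dr, col + dc)] = text if dr == dc == 0 else ""
            col += colspan
    n_cols = max((c for _, c in taken), default=-1) + 1
    return [[taken.get((r, c), "") for c in range(n_cols)] for r in range(len(rows))]


# Made-up header that uses both rowspan and colspan.
rows = [
    [("Name", 2, 1), ("Contact", 1, 2)],
    [("Email", 1, 1), ("Phone", 1, 1)],
    [("John", 1, 1), ("john@example.com", 1, 1), ("555-0100", 1, 1)],
]
for row in build_grid(rows):
    print(row)
```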

Testing

New Test Coverage in tests/test_tables.py:

Added comprehensive test cases covering various rowspan scenarios:

  1. Simple rowspan: Basic two-row spanning functionality
  2. Complex rowspan: Multi-row spanning (rowspan="3")
  3. Rowspan + colspan: Combined row and column spanning
  4. Multiple rowspan: Multiple rowspan cells in the same row
  5. Thead rowspan: Rowspan in table headers with colspan

Test Examples:

# Simple rowspan test (passes with this PR applied)
from markdownify import markdownify as md

table_with_simple_rowspan = """<table>
    <tr><th>Name</th><th>Department</th><th>Age</th></tr>
    <tr><td rowspan="2">John</td><td>IT</td><td>30</td></tr>
    <tr><td>Management</td><td>31</td></tr>
</table>"""

# Expected output with proper empty cell placement
expected = '\n\n| Name | Department | Age |\n| --- | --- | --- |\n| John | IT | 30 |\n|  | Management | 31 |\n\n'

assert md(table_with_simple_rowspan) == expected

Test Integration:

  • Integrated approach: Rowspan tests are added to the existing test_table() and test_table_infer_header() functions
  • Format consistency: New tests follow the same format and style as the existing table tests
  • Both modes: Each case is checked in the default mode and with table_infer_header=True
  • Regression prevention: All 83 tests in the suite continue to pass

Compatibility

  • Backward compatible: No breaking changes to existing functionality
  • API unchanged: No new parameters or configuration options required
  • Performance: Minimal overhead for tables without rowspan
  • Edge cases: Handles malformed HTML gracefully
  • Option support: Works correctly with all existing markdownify options

Files Changed

  • markdownify/__init__.py: Core rowspan implementation (+~100 lines)
  • tests/test_tables.py: Comprehensive test coverage (+~50 lines)

Testing Results

83 passed in 0.10s

All tests pass, with no regressions in existing behavior. The new rowspan cases run as part of test_table and test_table_infer_header.

------------------------------- test result -------------------------------

===================================================== test session starts ======================================================
platform linux -- Python 3.12.7, pytest-8.3.4, pluggy-1.5.0 -- xxxx
cachedir: .pytest_cache
rootdir: xxxx
configfile: pyproject.toml
plugins: anyio-4.8.0
collected 2 items

tests/test_tables.py::test_table PASSED [ 50%]
tests/test_tables.py::test_table_infer_header PASSED [100%]

====================================================== 2 passed in 0.06s =======================================================

===================================================== test session starts ======================================================
platform linux -- Python 3.12.7, pytest-8.3.4, pluggy-1.5.0 -- xxxx
cachedir: .pytest_cache
rootdir: xxxxx
configfile: pyproject.toml
plugins: anyio-4.8.0
collected 83 items

tests/test_advanced.py::test_chomp PASSED [ 1%]
tests/test_advanced.py::test_nested PASSED [ 2%]
tests/test_advanced.py::test_ignore_comments PASSED [ 3%]
tests/test_advanced.py::test_ignore_comments_with_other_tags PASSED [ 4%]
tests/test_advanced.py::test_code_with_tricky_content PASSED [ 6%]
tests/test_advanced.py::test_special_tags PASSED [ 7%]
tests/test_args.py::test_strip PASSED [ 8%]
tests/test_args.py::test_do_not_strip PASSED [ 9%]
tests/test_args.py::test_convert PASSED [ 10%]
tests/test_args.py::test_do_not_convert PASSED [ 12%]
tests/test_args.py::test_strip_document PASSED [ 13%]
tests/test_args.py::test_strip_pre PASSED [ 14%]
tests/test_basic.py::test_single_tag PASSED [ 15%]
tests/test_basic.py::test_soup PASSED [ 16%]
tests/test_basic.py::test_whitespace PASSED [ 18%]
tests/test_conversions.py::test_a PASSED [ 19%]
tests/test_conversions.py::test_a_spaces PASSED [ 20%]
tests/test_conversions.py::test_a_with_title PASSED [ 21%]
tests/test_conversions.py::test_a_shortcut PASSED [ 22%]
tests/test_conversions.py::test_a_no_autolinks PASSED [ 24%]
tests/test_conversions.py::test_a_in_code PASSED [ 25%]
tests/test_conversions.py::test_b PASSED [ 26%]
tests/test_conversions.py::test_b_spaces PASSED [ 27%]
tests/test_conversions.py::test_blockquote PASSED [ 28%]
tests/test_conversions.py::test_blockquote_with_nested_paragraph PASSED [ 30%]
tests/test_conversions.py::test_blockquote_with_paragraph PASSED [ 31%]
tests/test_conversions.py::test_blockquote_nested PASSED [ 32%]
tests/test_conversions.py::test_br PASSED [ 33%]
tests/test_conversions.py::test_code PASSED [ 34%]
tests/test_conversions.py::test_dl PASSED [ 36%]
tests/test_conversions.py::test_del PASSED [ 37%]
tests/test_conversions.py::test_div_section_article PASSED [ 38%]
tests/test_conversions.py::test_em PASSED [ 39%]
tests/test_conversions.py::test_figcaption PASSED [ 40%]
tests/test_conversions.py::test_header_with_space PASSED [ 42%]
tests/test_conversions.py::test_h1 PASSED [ 43%]
tests/test_conversions.py::test_h2 PASSED [ 44%]
tests/test_conversions.py::test_hn PASSED [ 45%]
tests/test_conversions.py::test_hn_chained PASSED [ 46%]
tests/test_conversions.py::test_hn_nested_tag_heading_style PASSED [ 48%]
tests/test_conversions.py::test_hn_nested_simple_tag PASSED [ 49%]
tests/test_conversions.py::test_hn_nested_img PASSED [ 50%]
tests/test_conversions.py::test_hn_atx_headings PASSED [ 51%]
tests/test_conversions.py::test_hn_atx_closed_headings PASSED [ 53%]
tests/test_conversions.py::test_hn_newlines PASSED [ 54%]
tests/test_conversions.py::test_head PASSED [ 55%]
tests/test_conversions.py::test_hr PASSED [ 56%]
tests/test_conversions.py::test_i PASSED [ 57%]
tests/test_conversions.py::test_img PASSED [ 59%]
tests/test_conversions.py::test_video PASSED [ 60%]
tests/test_conversions.py::test_kbd PASSED [ 61%]
tests/test_conversions.py::test_p PASSED [ 62%]
tests/test_conversions.py::test_pre PASSED [ 63%]
tests/test_conversions.py::test_q PASSED [ 65%]
tests/test_conversions.py::test_script PASSED [ 66%]
tests/test_conversions.py::test_style PASSED [ 67%]
tests/test_conversions.py::test_s PASSED [ 68%]
tests/test_conversions.py::test_samp PASSED [ 69%]
tests/test_conversions.py::test_strong PASSED [ 71%]
tests/test_conversions.py::test_strong_em_symbol PASSED [ 72%]
tests/test_conversions.py::test_sub PASSED [ 73%]
tests/test_conversions.py::test_sup PASSED [ 74%]
tests/test_conversions.py::test_lang PASSED [ 75%]
tests/test_conversions.py::test_lang_callback PASSED [ 77%]
tests/test_conversions.py::test_spaces PASSED [ 78%]
tests/test_custom_converter.py::test_custom_conversion_functions PASSED [ 79%]
tests/test_custom_converter.py::test_soup PASSED [ 80%]
tests/test_escaping.py::test_asterisks PASSED [ 81%]
tests/test_escaping.py::test_underscore PASSED [ 83%]
tests/test_escaping.py::test_xml_entities PASSED [ 84%]
tests/test_escaping.py::test_named_entities PASSED [ 85%]
tests/test_escaping.py::test_hexadecimal_entities PASSED [ 86%]
tests/test_escaping.py::test_single_escaping_entities PASSED [ 87%]
tests/test_escaping.py::test_misc PASSED [ 89%]
tests/test_lists.py::test_ol PASSED [ 90%]
tests/test_lists.py::test_nested_ols PASSED [ 91%]
tests/test_lists.py::test_ul PASSED [ 92%]
tests/test_lists.py::test_inline_ul PASSED [ 93%]
tests/test_lists.py::test_nested_uls PASSED [ 95%]
tests/test_lists.py::test_bullets PASSED [ 96%]
tests/test_lists.py::test_li_text PASSED [ 97%]
tests/test_tables.py::test_table PASSED [ 98%]
tests/test_tables.py::test_table_infer_header PASSED [100%]

====================================================== 83 passed in 0.10s ======================================================

ExcitingFrog, Sep 05 '25 08:09