Add rowspan support for HTML tables
Problem Statement
Previously, python-markdownify did not properly handle HTML tables with rowspan attributes. When encountering table cells with rowspan > 1, the resulting Markdown table would have missing cells in subsequent rows, leading to malformed table structure and incorrect column alignment.
Example of the problem:
<table>
<tr>
<th>Name</th>
<th>Department</th>
<th>Age</th>
</tr>
<tr>
<td rowspan="2">John</td>
<td>IT</td>
<td>30</td>
</tr>
<tr>
<td>Management</td>
<td>31</td>
</tr>
</table>
Previous (incorrect) output:
| Name | Department | Age |
| --- | --- | --- |
| John | IT | 30 |
| Management | 31 | <!-- Missing cell, causing misalignment -->
Solution
This PR implements comprehensive rowspan support by:
- Detection Logic: Added _table_has_rowspan() method to detect tables containing rowspan attributes
- Grid Algorithm: Implemented _build_rowspan_cells() method that:
  - Tracks which columns are occupied by rowspan cells from previous rows
  - Calculates the correct placement of empty placeholder cells
  - Handles complex scenarios with multiple rowspan cells and nested table structures
- Backward Compatibility: Tables without rowspan continue to use the original optimized logic
- Empty Cell Generation: Properly formatted empty cells (| |) are inserted where rowspan cells span multiple rows
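The grid-tracking idea can be illustrated with a small standalone sketch. This is not the code from this PR: the function name fill_rowspan_placeholders and the (text, rowspan) tuple input are assumptions made for illustration, and the real _build_rowspan_cells() works on parsed HTML elements rather than tuples.

```python
from typing import Dict, List, Tuple

Cell = Tuple[str, int]  # (cell text, rowspan); hypothetical input shape


def fill_rowspan_placeholders(rows: List[List[Cell]]) -> List[List[str]]:
    """Return each row's cell text, with '' where a cell from above spans down."""
    remaining: Dict[int, int] = {}  # column index -> upcoming rows still covered
    result: List[List[str]] = []
    for row in rows:
        out: List[str] = []
        queue = list(row)
        col = 0
        # Keep going until this row's cells are placed and no covered
        # column remains at or to the right of the current position.
        while queue or any(c >= col for c in remaining):
            if remaining.get(col, 0) > 0:
                out.append("")  # placeholder under a spanning cell
                remaining[col] -= 1
                if remaining[col] == 0:
                    del remaining[col]
            elif queue:
                text, rowspan = queue.pop(0)
                out.append(text)
                if rowspan > 1:
                    remaining[col] = rowspan - 1
            else:
                out.append("")  # gap before a covered column further right
            col += 1
        result.append(out)
    return result


rows = [
    [("Name", 1), ("Department", 1), ("Age", 1)],
    [("John", 2), ("IT", 1), ("30", 1)],
    [("Management", 1), ("31", 1)],
]
print(fill_rowspan_placeholders(rows)[2])  # ['', 'Management', '31']
```

The third row gains an empty first cell because column 0 is still occupied by the "John" cell from the row above.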
New (correct) output:
| Name | Department | Age |
| --- | --- | --- |
| John | IT | 30 |
| | Management | 31 | <!-- Proper empty cell for rowspan -->
Implementation Details
Core Changes in markdownify/__init__.py:
- convert_tr() method: Enhanced to detect and handle rowspan tables
- _table_has_rowspan() method: Efficient detection of tables with rowspan attributes
- _build_rowspan_cells() method: Algorithm to calculate empty cell placement for each row
- Grid tracking: Maintains occupied column positions across table rows
- Column counting: Accurate calculation of total columns including rowspan effects
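As a rough illustration of the detection step, the sketch below checks whether any td or th cell declares a rowspan greater than 1. It uses the standard library's html.parser as a stand-in so that it runs without dependencies; the names _RowspanDetector and table_has_rowspan are hypothetical, and the PR's _table_has_rowspan() method may be implemented differently.

```python
from html.parser import HTMLParser


class _RowspanDetector(HTMLParser):
    """Sets .found when a td/th start tag carries a numeric rowspan > 1."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            value = dict(attrs).get("rowspan")
            # Ignore missing, empty, or non-numeric values instead of raising.
            if value and value.strip().isdigit() and int(value) > 1:
                self.found = True


def table_has_rowspan(html: str) -> bool:
    detector = _RowspanDetector()
    detector.feed(html)
    return detector.found


print(table_has_rowspan('<table><tr><td rowspan="2">John</td></tr></table>'))  # True
print(table_has_rowspan('<table><tr><td>John</td></tr></table>'))  # False
```

A check like this lets tables without rowspan skip the grid bookkeeping entirely.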
Key Features:
- ✅ Simple rowspan: Basic single-column row spanning
- ✅ Complex rowspan: Multiple consecutive rows (rowspan > 2)
- ✅ Mixed scenarios: Rowspan combined with colspan attributes
- ✅ Multiple rowspan: Multiple rowspan cells in the same row
- ✅ Table headers: Proper handling of rowspan in <thead> sections
- ✅ Backward compatibility: No impact on tables without rowspan
- ✅ Performance: Rowspan processing only activates when needed
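For the mixed rowspan and colspan case, one way to reason about the table width is to count columns while carrying spans from earlier rows forward. The sketch below is an assumption-laden illustration, not the PR's implementation: count_columns is a hypothetical helper, each cell is given as a (rowspan, colspan) pair, and overlapping spans in malformed tables are not handled.

```python
from typing import Dict, List, Tuple


def count_columns(rows: List[List[Tuple[int, int]]]) -> int:
    """Return the table width; each cell is a (rowspan, colspan) pair."""
    covered: Dict[int, int] = {}  # column -> rows (from the current one) still occupied
    width = 0
    for row in rows:
        # Spans that continue past the current row carry over to the next one.
        carried = {c: n - 1 for c, n in covered.items() if n > 1}
        col = 0
        for rowspan, colspan in row:
            while col in covered:  # skip columns occupied from above
                col += 1
            if rowspan > 1:
                for offset in range(colspan):
                    carried[col + offset] = rowspan - 1
            col += colspan
        # A covered column to the right of the last cell still counts.
        row_width = max([col] + [c + 1 for c in covered])
        width = max(width, row_width)
        covered = carried
    return width


# A 2x2 cell (rowspan=2, colspan=2) followed by one plain cell, then one plain cell.
print(count_columns([[(2, 2), (1, 1)], [(1, 1)]]))  # 3
```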
Testing
New Test Coverage in tests/test_tables.py:
Added comprehensive test cases covering various rowspan scenarios:
- Simple rowspan: Basic two-row spanning functionality
- Complex rowspan: Multi-row spanning (rowspan="3")
- Rowspan + colspan: Combined row and column spanning
- Multiple rowspan: Multiple rowspan cells in the same row
- Thead rowspan: Rowspan in table headers with colspan
Test Examples:
# Simple rowspan test
table_with_simple_rowspan = """<table>
<tr><th>Name</th><th>Department</th><th>Age</th></tr>
<tr><td rowspan="2">John</td><td>IT</td><td>30</td></tr>
<tr><td>Management</td><td>31</td></tr>
</table>"""
# Expected output with proper empty cell placement
expected = '\n\n| Name | Department | Age |\n| --- | --- | --- |\n| John | IT | 30 |\n| | Management | 31 |\n\n'
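To show how the expected string above is shaped, here is a minimal renderer sketch that turns rows of cell text into that format. The helper name render_markdown_table is hypothetical and the code does not call markdownify; it only assumes, based on the expected output, that an empty placeholder cell renders as " |" while a non-empty cell renders as " text |".

```python
from typing import List


def render_markdown_table(rows: List[List[str]]) -> str:
    """Render rows of cell text (first row is the header) as a Markdown table."""

    def render_row(cells: List[str]) -> str:
        # Empty placeholder cells collapse to ' |', matching '| | Management | 31 |'.
        return "|" + "".join(f" {cell} |" if cell else " |" for cell in cells)

    header, *body = rows
    lines = [render_row(header), render_row(["---"] * len(header))]
    lines += [render_row(row) for row in body]
    return "\n\n" + "\n".join(lines) + "\n\n"


table_rows = [
    ["Name", "Department", "Age"],
    ["John", "IT", "30"],
    ["", "Management", "31"],  # '' marks the cell covered by John's rowspan
]
print(render_markdown_table(table_rows))
```

The rendered string for table_rows is the same as the expected value in the test example above.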
Test Integration:
- Integrated approach: Rowspan tests are integrated into existing test_table() and test_table_infer_header() functions
- Format consistency: New tests follow the same format and style as existing table tests
- Full coverage: Tests both normal and table_infer_header=True modes
- Regression prevention: All existing tests (83 total) continue to pass
Compatibility
- ✅ Backward compatible: No breaking changes to existing functionality
- ✅ API unchanged: No new parameters or configuration options required
- ✅ Performance: Minimal overhead for tables without rowspan
- ✅ Edge cases: Handles malformed HTML gracefully
- ✅ Option support: Works correctly with all existing markdownify options
Files Changed
- markdownify/__init__.py: Core rowspan implementation (+~100 lines)
- tests/test_tables.py: Comprehensive test coverage (+~50 lines)
Testing Results
83 passed in 0.10s
All existing tests pass, confirming no regressions. New rowspan functionality is fully tested and validated.
-------------------------------test result----------------------------
===================================================== test session starts ======================================================
platform linux -- Python 3.12.7, pytest-8.3.4, pluggy-1.5.0 -- xxxx
cachedir: .pytest_cache
rootdir: xxxx
configfile: pyproject.toml
plugins: anyio-4.8.0
collected 2 items

tests/test_tables.py::test_table PASSED [ 50%]
tests/test_tables.py::test_table_infer_header PASSED [100%]

====================================================== 2 passed in 0.06s =======================================================
===================================================== test session starts ======================================================
platform linux -- Python 3.12.7, pytest-8.3.4, pluggy-1.5.0 -- xxxx
cachedir: .pytest_cache
rootdir: xxxxx
configfile: pyproject.toml
plugins: anyio-4.8.0
collected 83 items

tests/test_advanced.py::test_chomp PASSED [ 1%]
tests/test_advanced.py::test_nested PASSED [ 2%]
tests/test_advanced.py::test_ignore_comments PASSED [ 3%]
tests/test_advanced.py::test_ignore_comments_with_other_tags PASSED [ 4%]
tests/test_advanced.py::test_code_with_tricky_content PASSED [ 6%]
tests/test_advanced.py::test_special_tags PASSED [ 7%]
tests/test_args.py::test_strip PASSED [ 8%]
tests/test_args.py::test_do_not_strip PASSED [ 9%]
tests/test_args.py::test_convert PASSED [ 10%]
tests/test_args.py::test_do_not_convert PASSED [ 12%]
tests/test_args.py::test_strip_document PASSED [ 13%]
tests/test_args.py::test_strip_pre PASSED [ 14%]
tests/test_basic.py::test_single_tag PASSED [ 15%]
tests/test_basic.py::test_soup PASSED [ 16%]
tests/test_basic.py::test_whitespace PASSED [ 18%]
tests/test_conversions.py::test_a PASSED [ 19%]
tests/test_conversions.py::test_a_spaces PASSED [ 20%]
tests/test_conversions.py::test_a_with_title PASSED [ 21%]
tests/test_conversions.py::test_a_shortcut PASSED [ 22%]
tests/test_conversions.py::test_a_no_autolinks PASSED [ 24%]
tests/test_conversions.py::test_a_in_code PASSED [ 25%]
tests/test_conversions.py::test_b PASSED [ 26%]
tests/test_conversions.py::test_b_spaces PASSED [ 27%]
tests/test_conversions.py::test_blockquote PASSED [ 28%]
tests/test_conversions.py::test_blockquote_with_nested_paragraph PASSED [ 30%]
tests/test_conversions.py::test_blockquote_with_paragraph PASSED [ 31%]
tests/test_conversions.py::test_blockquote_nested PASSED [ 32%]
tests/test_conversions.py::test_br PASSED [ 33%]
tests/test_conversions.py::test_code PASSED [ 34%]
tests/test_conversions.py::test_dl PASSED [ 36%]
tests/test_conversions.py::test_del PASSED [ 37%]
tests/test_conversions.py::test_div_section_article PASSED [ 38%]
tests/test_conversions.py::test_em PASSED [ 39%]
tests/test_conversions.py::test_figcaption PASSED [ 40%]
tests/test_conversions.py::test_header_with_space PASSED [ 42%]
tests/test_conversions.py::test_h1 PASSED [ 43%]
tests/test_conversions.py::test_h2 PASSED [ 44%]
tests/test_conversions.py::test_hn PASSED [ 45%]
tests/test_conversions.py::test_hn_chained PASSED [ 46%]
tests/test_conversions.py::test_hn_nested_tag_heading_style PASSED [ 48%]
tests/test_conversions.py::test_hn_nested_simple_tag PASSED [ 49%]
tests/test_conversions.py::test_hn_nested_img PASSED [ 50%]
tests/test_conversions.py::test_hn_atx_headings PASSED [ 51%]
tests/test_conversions.py::test_hn_atx_closed_headings PASSED [ 53%]
tests/test_conversions.py::test_hn_newlines PASSED [ 54%]
tests/test_conversions.py::test_head PASSED [ 55%]
tests/test_conversions.py::test_hr PASSED [ 56%]
tests/test_conversions.py::test_i PASSED [ 57%]
tests/test_conversions.py::test_img PASSED [ 59%]
tests/test_conversions.py::test_video PASSED [ 60%]
tests/test_conversions.py::test_kbd PASSED [ 61%]
tests/test_conversions.py::test_p PASSED [ 62%]
tests/test_conversions.py::test_pre PASSED [ 63%]
tests/test_conversions.py::test_q PASSED [ 65%]
tests/test_conversions.py::test_script PASSED [ 66%]
tests/test_conversions.py::test_style PASSED [ 67%]
tests/test_conversions.py::test_s PASSED [ 68%]
tests/test_conversions.py::test_samp PASSED [ 69%]
tests/test_conversions.py::test_strong PASSED [ 71%]
tests/test_conversions.py::test_strong_em_symbol PASSED [ 72%]
tests/test_conversions.py::test_sub PASSED [ 73%]
tests/test_conversions.py::test_sup PASSED [ 74%]
tests/test_conversions.py::test_lang PASSED [ 75%]
tests/test_conversions.py::test_lang_callback PASSED [ 77%]
tests/test_conversions.py::test_spaces PASSED [ 78%]
tests/test_custom_converter.py::test_custom_conversion_functions PASSED [ 79%]
tests/test_custom_converter.py::test_soup PASSED [ 80%]
tests/test_escaping.py::test_asterisks PASSED [ 81%]
tests/test_escaping.py::test_underscore PASSED [ 83%]
tests/test_escaping.py::test_xml_entities PASSED [ 84%]
tests/test_escaping.py::test_named_entities PASSED [ 85%]
tests/test_escaping.py::test_hexadecimal_entities PASSED [ 86%]
tests/test_escaping.py::test_single_escaping_entities PASSED [ 87%]
tests/test_escaping.py::test_misc PASSED [ 89%]
tests/test_lists.py::test_ol PASSED [ 90%]
tests/test_lists.py::test_nested_ols PASSED [ 91%]
tests/test_lists.py::test_ul PASSED [ 92%]
tests/test_lists.py::test_inline_ul PASSED [ 93%]
tests/test_lists.py::test_nested_uls PASSED [ 95%]
tests/test_lists.py::test_bullets PASSED [ 96%]
tests/test_lists.py::test_li_text PASSED [ 97%]
tests/test_tables.py::test_table PASSED [ 98%]
tests/test_tables.py::test_table_infer_header PASSED [100%]

====================================================== 83 passed in 0.10s ======================================================