Part of the merged cells' text gets cut off when merging
When I merge cells whose text spans multiple cells, across both rows and columns, only the text from the first cell is transferred. I assume I have to do something like the combine_headers function, but I am having trouble finding out how to access the other cells. I have added a picture of a table similar to the one giving me problems, as well as my code and results. Any help with this would be greatly appreciated!
import pandas as pd
from textractcaller.t_call import call_textract, Textract_Features
from trp import Document
from trp.trp2 import TDocumentSchema
from trp.t_pipeline import order_blocks_by_geo

textract_json = call_textract(input_document=documentName, features=[Textract_Features.TABLES])
t_doc = TDocumentSchema().load(textract_json)
ordered_doc = order_blocks_by_geo(t_doc)
trp_doc = Document(TDocumentSchema().dump(ordered_doc))
table_index = 1
dataframes = []

def combine_headers(top_h, mid_h, bottom_h):
    # Prefix columns 4 and 5 of the bottom header with the top and middle header text.
    try:
        bottom_h[4] = top_h[4] + " " + mid_h[4] + " " + bottom_h[4]
        bottom_h[5] = top_h[4] + " " + mid_h[4] + " " + bottom_h[5]
    except IndexError:
        # Tables with fewer than six columns are left unchanged.
        pass

for page in trp_doc.pages:
    for table in page.tables:
        table_data = []
        headers = table.get_header_field_names()
        # Only handle tables that have all three header rows.
        if len(headers) >= 3:
            print("Statement headers: " + repr(headers))
            top_header = headers[0]
            middle_header = headers[1]
            bottom_header = headers[2]
            combine_headers(top_header, middle_header, bottom_header)
            for r, row in enumerate(table.rows_without_header):
                table_data.append([])
                for c, cell in enumerate(row.cells):
                    # mergedText is where the truncated values come from.
                    table_data[r].append(cell.mergedText)
            if len(table_data) > 0:
                df = pd.DataFrame(table_data, columns=bottom_header)
                print(df.to_markdown())
Table:

As you can see below, in the headers the text after "Local (Up" gets cut off because it runs into the next cell. The same happens with all of the length class rows, which lose the "pages)" part, and with the extra-long books row.
Results:
| | Length Class | Category Class | Codes | Codes | Distribution Local (Up To Mark Up Factor | Distribution Local (Up To Cost Factor |
|---|---|---|---|---|---|---|
| 0 | Short Books (0 100 | Children's | Non-fiction Fiction | 011-- 012-- | 1.10 | 1.00 |
| 1 | Short Books (0 100 | Mystery | Non-fiction Fiction | 021-- 022-- | 1.55 | 1.15 |
| 2 | Short Books (0 100 | Romance | Non-fiction Fiction | 031-- 032-- | 1.40 | 1.00 |
| 3 | | | | | | |
| 4 | Medium Books (101 500 | Children's | Non-fiction Fiction | 211-- 212-- | 1.05 | 0.95 |
| 5 | Medium Books (101 500 | Mystery | Non-fiction Fiction | 221-- 222-- | 1.50 | 0.70 |
| 6 | Medium Books (101 500 | Romance | Non-fiction Fiction | 231-- 232-- | 1.40 | 0.75 |
| 7 | | | | | | |
| 8 | Long Books (501 - 1,000 | Children's | Non-fiction Fiction | 311-- 312-- | 1.10 | 0.65 |
| 9 | Long Books (501 - 1,000 | Mystery | Non-fiction Fiction | 321-- 322-- | 1.55 | 0.90 |
| 10 | Long Books (501 - 1,000 | Romance | Non-fiction Fiction | 331-- 332-- | 1.25 | 0.70 |
| 11 | | | | | | |
| 12 | Extra-Long (Over 1,000 | Extra-Long (Over 1,000 | Non-fiction Fiction | 401-- 402-- | 2.45 | 1.15 |
| 13 | | | | | | |
Hey @GradyMellin, I am also facing the same issue. Did you find a workaround for this?
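On the question of how to reach the other cells: one possible direction, not a confirmed fix, is to read them from the raw Textract response instead of going through `mergedText`. The sketch below assumes the response contains `MERGED_CELL` blocks whose `CHILD` relationships list the underlying `CELL` blocks, and that each `CELL` in turn lists its `WORD` blocks. The helper names `_child_ids` and `merged_cell_texts` are made up for illustration and are not part of the response parser.

```python
from typing import Dict, List


def _child_ids(block: dict) -> List[str]:
    """Collect the Ids from a block's CHILD relationships (may be absent)."""
    ids: List[str] = []
    for rel in block.get("Relationships") or []:
        if rel.get("Type") == "CHILD":
            ids.extend(rel.get("Ids") or [])
    return ids


def merged_cell_texts(blocks: List[dict]) -> List[dict]:
    """For every MERGED_CELL block, join the words of all of its child cells.

    Hypothetical helper: it works on the plain list of block dicts from the
    raw JSON response and does not touch the trp Document object.
    """
    by_id: Dict[str, dict] = {b["Id"]: b for b in blocks}
    results = []
    for block in blocks:
        if block.get("BlockType") != "MERGED_CELL":
            continue
        # Child cells, put into reading order (top to bottom, left to right).
        cells = [by_id[i] for i in _child_ids(block) if i in by_id]
        cells.sort(key=lambda c: (c.get("RowIndex", 0), c.get("ColumnIndex", 0)))
        words = []
        for cell in cells:
            for word_id in _child_ids(cell):
                child = by_id.get(word_id)
                # Skip anything that is not a word, e.g. selection elements.
                if child and child.get("BlockType") == "WORD":
                    words.append(child.get("Text", ""))
        results.append({
            "row": block.get("RowIndex"),
            "column": block.get("ColumnIndex"),
            "row_span": block.get("RowSpan"),
            "column_span": block.get("ColumnSpan"),
            "text": " ".join(words),
        })
    return results
```

Assuming `textract_json` is the plain response dict, this would be called as `merged_cell_texts(textract_json["Blocks"])`. The returned row, column and span values could then be used to overwrite the truncated entries in `table_data`. Note that the row and column indices are only unique within a single table, so a response with several tables would need the results grouped per table first.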