AngleSharp.Css

GetInnerText behaves differently from HTML innerText for tables

Open timothy3001 opened this issue 1 year ago • 2 comments

Prerequisites

  • [X] Can you reproduce the problem in a MWE?
  • [X] Are you running the latest version of AngleSharp.Css?
  • [X] Did you check the FAQs to see if that helps you?
  • [X] Are you reporting to the correct repository? (there are multiple AngleSharp libraries, e.g., AngleSharp.Xml for Xml support)
  • [X] Did you perform a search in the issues?

Description

When using GetInnerText, the returned text is missing the line breaks between table rows.

With the browser's "innerText", the line breaks after each table row are present.

I also tried adding a "<br>" after a "</tr>" element, but it is ignored. Everything between "</tr>" and "<tr>" seems to be ignored.

Thanks a lot for the awesome project!

Steps to Reproduce

Set up a simple AngleSharp example with a configuration like the following:

IConfiguration config = Configuration
    .Default
    .WithCss(new CssParserOptions
    {
        IsToleratingInvalidSelectors = true,
        IsIncludingUnknownDeclarations = true,
        IsIncludingUnknownRules = true,
    })
    .WithRenderDevice(new DefaultRenderDevice
    {
        DeviceHeight = 768,
        DeviceWidth = 1024,
    });

Then parse the following HTML:

<html>
	<head>
	</head>
	<body>
		<h2>Test</h2>
		<table>
			<tbody>
				<tr>
				</tr>
				<tr>
					<td>Titel: </td>
					<td>Herr</td>
				</tr>
				<tr>
					<td>Vorname: </td>
					<td>Horst</td>
				</tr>
				<tr>
					<td>Nachname: </td>
					<td>Hammer</td>
				</tr>
			</tbody>
		</table>
	</body>
</html>
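For completeness, here is a minimal sketch of the parse-and-print step. It assumes the AngleSharp and AngleSharp.Css packages are referenced, that the namespaces in the using directives match the installed versions, and that the program uses top-level statements. The HTML is a shortened copy of the markup above.

```csharp
using System;
using AngleSharp;
using AngleSharp.Css;
using AngleSharp.Css.Parser;
using AngleSharp.Dom;

// Same configuration as in the report.
IConfiguration config = Configuration
    .Default
    .WithCss(new CssParserOptions
    {
        IsToleratingInvalidSelectors = true,
        IsIncludingUnknownDeclarations = true,
        IsIncludingUnknownRules = true,
    })
    .WithRenderDevice(new DefaultRenderDevice
    {
        DeviceHeight = 768,
        DeviceWidth = 1024,
    });

// Shortened version of the HTML from the report.
string html = @"<html><head></head><body>
<h2>Test</h2>
<table><tbody>
<tr></tr>
<tr><td>Titel: </td><td>Herr</td></tr>
<tr><td>Vorname: </td><td>Horst</td></tr>
<tr><td>Nachname: </td><td>Hammer</td></tr>
</tbody></table>
</body></html>";

var context = BrowsingContext.New(config);

// Parse the markup from an in-memory string instead of a URL.
var document = await context.OpenAsync(req => req.Content(html));

// GetInnerText is the AngleSharp.Css extension method under discussion.
Console.WriteLine(document.Body?.GetInnerText());
```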

Expected Behavior

The result of document.body.innerText in the Chrome DevTools console:

Test

Titel:	Herr
Vorname:	Horst
Nachname:	Hammer

Actual Behavior

The result from AngleSharp's GetInnerText:

Test





Titel: Herr Vorname: Horst Nachname: Hammer 

Possible Solution / Known Workarounds

No response
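One untested idea, not confirmed by anyone in this thread: build the table text manually from the DOM instead of relying on GetInnerText. The sketch below uses only core AngleSharp DOM members (QuerySelectorAll, Children, LocalName, TextContent). TableTextWorkaround and TableToText are hypothetical names, not part of AngleSharp. It joins cells with tabs and rows with newlines, roughly mirroring the Chrome output shown above. It does not handle nested tables, which would have their rows counted twice.

```csharp
using System;
using System.Linq;
using AngleSharp.Dom;

// Hypothetical helper, not part of AngleSharp or AngleSharp.Css.
public static class TableTextWorkaround
{
    public static string TableToText(IElement table)
    {
        var lines = table
            .QuerySelectorAll("tr")
            // One line per row: trimmed cell texts separated by tabs.
            .Select(row => string.Join("\t", row.Children
                .Where(cell => cell.LocalName == "td" || cell.LocalName == "th")
                .Select(cell => cell.TextContent.Trim())))
            // Skip rows without any cell text, such as the empty first <tr>.
            .Where(line => line.Length > 0);

        return string.Join("\n", lines);
    }
}
```

A call could look like TableTextWorkaround.TableToText(document.QuerySelector("table")), after checking that the query actually returned an element.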

timothy3001, Jul 29 '24 12:07

Preserving the table layout in the output would definitely be nice. I don't think we currently respect display being set to table.

This could certainly be improved, but I am not sure whether it is (or should be) classified as a bug. IIRC we pretty much follow the spec.

FlorianRappl, Jul 29 '24 13:07

Oh OK. Since browsers handle tables differently, I thought this was out of spec.

But of course, feel free to change this to an improvement, a feature request, or something similar.

timothy3001, Jul 30 '24 07:07