EmailReplyParser

Why do all the $quoteHeadersRegex patterns have beginning (^) and end ($) anchors?

Open davidnagli opened this issue 2 years ago • 1 comment

I am using this to parse emails, but my emails are raw, unparsed HTML (I plan to display them on a webpage later and want to preserve their formatting). The problem is that this package uses /^\s*(On(?:(?!.*On\b|\bwrote:)[\s\S])+wrote:)$/m to match quote headers. Unless I strip out all the tags (which I don't want to do), EmailReplyParser fails to match the quote headers, because with the surrounding tags the header text no longer starts at the beginning of a line or ends at the end of one.

Here's a simple example:

<p>On Wednesday, March 22, 2023, 3:25 PM, XXX <XXXX.com> wrote:</p>
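The failure can be reproduced outside the library. The following is a minimal sketch in Python rather than PHP, assuming the pattern quoted above is the only one involved and that PHP's /m flag corresponds to `re.MULTILINE`; the variable names are illustrative, not the library's.

```python
import re

# The quote-header pattern from the issue; /m becomes re.MULTILINE.
QUOTE_HEADER = re.compile(
    r"^\s*(On(?:(?!.*On\b|\bwrote:)[\s\S])+wrote:)$", re.MULTILINE
)

plain = "On Wednesday, March 22, 2023, 3:25 PM, XXX <XXXX.com> wrote:"
html = "<p>" + plain + "</p>"

# Plain text: "On" sits at the start of the line and "wrote:" at the end.
print(QUOTE_HEADER.search(plain) is not None)  # True

# HTML: "<p>" precedes "On" and "</p>" follows "wrote:", so neither
# ^ nor $ can be satisfied.
print(QUOTE_HEADER.search(html) is not None)   # False
```

Either anchor alone is enough to block the match here: the opening tag defeats ^ and the closing tag defeats $.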

To get around this, I removed the ^ and $ from the regular expression, which fixed the problem. But I was wondering whether there was some original motivation for having them there in the first place. I don't want to remove something on my end that will break things for me down the line.
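One plausible motivation for the anchors, offered as an assumption rather than the maintainers' stated reason, is to avoid false positives on ordinary sentences that happen to contain "On ... wrote:". The sketch below uses Python to compare the anchored pattern with an unanchored copy; `UNANCHORED` and the sample sentence are hypothetical.

```python
import re

ANCHORED = re.compile(
    r"^\s*(On(?:(?!.*On\b|\bwrote:)[\s\S])+wrote:)$", re.MULTILINE
)
# Same pattern with ^, \s* and $ removed (the workaround described above).
UNANCHORED = re.compile(r"(On(?:(?!.*On\b|\bwrote:)[\s\S])+wrote:)")

header = "On Wednesday, March 22, 2023, 3:25 PM, XXX <XXXX.com> wrote:"
html = "<p>" + header + "</p>"

# The unanchored pattern finds the header inside the <p> element.
print(UNANCHORED.search(html).group(1) == header)  # True

# It also fires on reply text that is not a quote header at all.
prose = "Thanks! On Monday my manager wrote: please ship it today."
print(UNANCHORED.search(prose) is not None)  # True
print(ANCHORED.search(prose) is not None)    # False
```

If the parser treats everything after a matched header as quoted text, a mid-sentence match like this could cut off part of the actual reply.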

Is there a reason for the ^ and $ (beginning and end matching)?

If yes, I suppose there's another workaround where I can use the end of the previous HTML tag "/>" as the "beginning".
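A sketch of that tag-boundary idea, in Python for illustration: the start anchor is widened to "start of line or the `>` that closes a tag" and the end anchor to "end of line or the `<` that opens the next tag". The `(?:^|>)` and `(?=<|$)` pieces are assumptions of this sketch, not part of the library.

```python
import re

# Hypothetical tag-aware variant: a header may begin right after a tag's
# closing ">" and may end right before the next tag's opening "<".
TAG_AWARE = re.compile(
    r"(?:^|>)\s*(On(?:(?!.*On\b|\bwrote:)[\s\S])+wrote:)\s*(?=<|$)",
    re.MULTILINE,
)

header = "On Wednesday, March 22, 2023, 3:25 PM, XXX <XXXX.com> wrote:"
html = "<p>" + header + "</p>"
prose = "Thanks! On Monday my manager wrote: please ship it today."

match = TAG_AWARE.search(html)
print(match.group(1) == header)             # True
print(TAG_AWARE.search(prose) is not None)  # False
```

This keeps most of the protection the original anchors give, but it can still misfire when a sentence inside a tag both starts with "On" and runs straight into another tag after "wrote:".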

davidnagli, Mar 22 '23 20:03

I'm not actually using this library, but the rationale is probably this:

EmailReplyParser is a PHP library for parsing plain text email content
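If the pattern is meant for plain text, one option is to derive a text-only copy of the HTML just for header detection and keep the original HTML for display. The following is a rough Python sketch using the standard library's `html.parser`; `TextExtractor`, `html_to_text`, and the choice of block-level tags are assumptions for illustration, and the address is written with HTML entities, as it would normally appear in an HTML body.

```python
import re
from html.parser import HTMLParser

QUOTE_HEADER = re.compile(
    r"^\s*(On(?:(?!.*On\b|\bwrote:)[\s\S])+wrote:)$", re.MULTILINE
)

class TextExtractor(HTMLParser):
    """Collects text content, turning block-level tags into line breaks."""
    BLOCK_TAGS = {"p", "div", "br", "li", "tr", "blockquote"}

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        # Character references such as &lt; arrive already decoded.
        self.parts.append(data)

def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)

html = "<p>On Wednesday, March 22, 2023, 3:25 PM, XXX &lt;XXXX.com&gt; wrote:</p>"
text = html_to_text(html)
print(QUOTE_HEADER.search(text).group(1))
```

Because each block-level tag becomes a line break, the header ends up on its own line and the original anchored pattern applies unchanged.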

osdiab, Dec 06 '24 07:12