
Regex works in RegexBuddy but not in OpenSearchServer

Open mzeidhassan opened this issue 9 years ago • 1 comments

First, thanks for the great product. I just came across it yesterday and I think it's really amazing.

I am using the web crawler to extract some items from an e-commerce site, but my regex doesn't work in the HTMLSourceParser.

Here is the source HTML:

tentLoaded",i,!1),s[p]("load",r,!1)):(l[d]("onreadystatechange",o),s[d]("onload",r)),u("mark",["firstbyte",a()],null,"api");var h=0},{}]},{},["loader"]);</script> <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale = 1.0"> <title>HP Envy 17-N Notebook - Intel Core i7, 1 TB, 17.3 Inch, 8 GB, Windows</title> <meta name="robots" content="index,follow"><meta property="og:locale" content="en_US"><meta property="og:url" content="http://uae.com/ae-en/hp-envy-17-n-notebook-intel-core-i7-1-tb-17-3-inch-8-gb-windows-silver-9293703/i/"><meta property="og:title" content="HP Envy 17-N Notebook - Intel Core i7, 1 TB, 17.3 Inch, 8 GB, Windows, Silver"><meta property="og:type" content="product">"

This regex works just fine everywhere

.*?

but it doesn't work in OSS. Which regex flavor does OSS use, and what should I use to extract the data? I hope you can help me with this.

Thanks

mzeidhassan, Mar 08 '17 06:03

Thanks for your support.

You should use captures. This pattern should work:

(.*?)
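To illustrate the difference a capture group makes, here is a minimal sketch in Python. It is not OpenSearchServer code, and Python's `re` module is only a stand-in for whatever regex engine OSS uses. The `<title>` delimiters around `.*?` are an assumption, since the patterns in this thread appear without their surrounding text, and `extract_title` is a hypothetical helper that mimics an extractor returning only what the first group captures.

```python
import re

# Shortened excerpt of the HTML quoted in the question.
html = (
    '<meta name="viewport" content="width=device-width, initial-scale=1.0"> '
    '<title>HP Envy 17-N Notebook - Intel Core i7, 1 TB, 17.3 Inch, 8 GB, Windows</title> '
    '<meta name="robots" content="index,follow">'
)

# Hypothetical patterns: the <title> delimiters are assumed for illustration.
without_capture = re.compile(r"<title>.*?</title>")
with_capture = re.compile(r"<title>(.*?)</title>")


def extract_title(pattern, text):
    """Return the text of the first capture group, or None.

    None is returned both when the pattern does not match and when
    the pattern matches but defines no capture group.
    """
    match = pattern.search(text)
    if match is None or pattern.groups == 0:
        return None
    return match.group(1)


if __name__ == "__main__":
    print(extract_title(without_capture, html))
    print(extract_title(with_capture, html))
```

Both patterns match the page, but only the one with parentheses gives an extractor a group to read from, which would explain a pattern that matches in a regex tester yet yields nothing when used for extraction.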

emmanuel-keller, Mar 08 '17 08:03