docconv
Unable to parse HTML
I was trying to extract content from several document formats. It works for the other formats (PDF, DOC, etc.) but not for HTML files.
Below is a minimal example with a sample HTML file.
main.go
package main

import (
	"fmt"
	"log"

	"code.sajari.com/docconv"
)

func main() {
	// Attempt to read file
	txt, err := docconv.ConvertPath("test.html")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(txt.Body)
}
test.html
<!DOCTYPE html>
<html>
<body>
<h1>This is heading 1</h1>
<h2>This is heading 2</h2>
<h3>This is heading 3</h3>
<h4>This is heading 4</h4>
<h5>This is heading 5</h5>
<h6>This is heading 6</h6>
</body>
</html>
Currently the output is blank.
I also noticed that there has been no release since February 2019, so code.sajari.com may be serving an older version of the library. Is there any way to publish a pre-release version, or to configure CI to do that?
I have the same problem on Ubuntu x64 and on macOS (ARM, M1 Mac): no errors, but also no metadata or content.