feat: Print the URL list to a txt file with `output-file`
upd: The URL list of all requests is saved to the ./output/output.txt file (see the sketch below).
feat: Print query results as a JSON array with `output-json`
upg: Dependencies upgraded.
upd: The new flags were added to the README.
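For context, here is roughly what the two output flags produce. This is only a minimal sketch under assumptions (a hypothetical `Result` struct and field set), not the actual implementation in this PR:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// Result is a hypothetical per-request record; the real field set in crab may differ.
type Result struct {
	Status   int    `json:"status"`
	URL      string `json:"url"`
	Time     int64  `json:"time"`
	Duration int64  `json:"duration"`
}

func main() {
	// Sample data standing in for crawl results.
	results := []Result{
		{Status: 200, URL: "https://example.com/", Time: 1612345678, Duration: 120},
		{Status: 404, URL: "https://example.com/missing", Time: 1612345679, Duration: 95},
	}

	if err := os.MkdirAll("./output", 0o755); err != nil {
		panic(err)
	}

	// --output-file: write one tab-separated line per request to a txt file.
	txt, err := os.Create("./output/output.txt")
	if err != nil {
		panic(err)
	}
	defer txt.Close()
	for _, r := range results {
		fmt.Fprintf(txt, "%d\t%s\t%d\t%d\n", r.Status, r.URL, r.Time, r.Duration)
	}

	// --output-json: write all collected results as a single JSON array.
	data, err := json.MarshalIndent(results, "", "  ")
	if err != nil {
		panic(err)
	}
	if err := os.WriteFile("./output/output.json", data, 0o644); err != nil {
		panic(err)
	}
}
```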
I also checked it with 17k rows of data. It worked flawlessly.
I hope I didn't write bad code. I recently started working to improve my Go skills. Thanks @atomicptr
This meets the requirement described here: https://github.com/atomicptr/crab/issues/4
When I get to a computer, I will ask about the necessary corrections and what some of the requirements mean in detail. Thanks @atomicptr
I also updated the README and help sections, so that problem won't come up again. Thanks for the information and collaboration. @atomicptr
Best regards from Turkey
There seems to be a problem with the records after scanning large XML files :(
This seems like it could solve the problem:
// Only write the tab-separated row when every expected field is present;
// otherwise the type assertions below would panic on a missing field.
if data["status"] != nil && data["url"] != nil && data["time"] != nil && data["duration"] != nil {
	status = data["status"].(float64)
	url = data["url"].(string)
	time = data["time"].(float64)
	duration = data["duration"].(float64)
	_, err = file.WriteString(fmt.Sprintf("%d\t%s\t%d\t%d\n", int(status), url, int(time), int(duration)))
} else {
	// Fall back to writing the raw message so nothing is lost.
	_, err = file.WriteString(message + "\n")
}
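As an extra hardening idea on top of the snippet above (my suggestion, not code from the PR): comma-ok type assertions also cover the case where a field exists but holds an unexpected type, so nothing can panic. A small self-contained sketch, assuming the message arrives as a JSON string with the same status/url/time/duration fields:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// formatRow returns a tab-separated row when all expected fields are present
// with the expected types; otherwise it returns the raw message unchanged.
func formatRow(message string) string {
	var data map[string]interface{}
	if err := json.Unmarshal([]byte(message), &data); err != nil {
		return message
	}

	// Comma-ok assertions: ok is false when a key is missing, nil,
	// or holds a value of another type, so none of these can panic.
	status, ok1 := data["status"].(float64)
	url, ok2 := data["url"].(string)
	time, ok3 := data["time"].(float64)
	duration, ok4 := data["duration"].(float64)
	if ok1 && ok2 && ok3 && ok4 {
		return fmt.Sprintf("%d\t%s\t%d\t%d", int(status), url, int(time), int(duration))
	}
	return message
}

func main() {
	fmt.Println(formatRow(`{"status":200,"url":"https://example.com/","time":1612345678,"duration":120}`))
	fmt.Println(formatRow(`{"level":"error","message":"request timed out"}`))
}
```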
I may need some time to make sure the code works.
This time the problem is solved :)
I thought it would be good to relocate the error message so it can be parsed.
I'm sending the final version like this.
The command I used for testing (the sitemap contains about 17,000 pages):
crab crawl:sitemap https://www.agroworlddergisi.com/sitemap_index.xml --num-workers=500 --http-timeout=60s --output-file ./output/output.txt --output-json ./output/output.json
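In case it helps with verification, here is one way the JSON output could be consumed afterwards. This is only a sketch; it assumes each element of output.json carries at least `status` and `url` fields, which may not match the actual field names:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// entry mirrors the fields this sketch expects in each element of output.json.
type entry struct {
	Status int    `json:"status"`
	URL    string `json:"url"`
}

func main() {
	raw, err := os.ReadFile("./output/output.json")
	if err != nil {
		panic(err)
	}

	var entries []entry
	if err := json.Unmarshal(raw, &entries); err != nil {
		panic(err)
	}

	// List every crawled URL that did not return HTTP 200.
	bad := 0
	for _, e := range entries {
		if e.Status != 200 {
			fmt.Printf("%d\t%s\n", e.Status, e.URL)
			bad++
		}
	}
	fmt.Printf("%d of %d URLs returned a non-200 status\n", bad, len(entries))
}
```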
Thanks for the changes, I found a few more things but otherwise it looks great :)
I have completed the final required changes. I am updating the code and sending it. @atomicptr
whoops something went wrong here :eyes:
I kinda fucked something up by using a gh cli tool, not quite sure what yet, although I merged the changes with #11
Thank you very much for the PR! :)
Thanks for the fixes and the merge. @atomicptr