feat: Printing the url list to txt file with `output-file`

Open ramazansancar opened this issue 1 year ago • 7 comments

- upd: The URL list of all requests is saved in the ./output/output.txt file.
- feat: Print query results as an array with `output-json`.
- upg: Dependencies upgraded.
- upd: Readme: new flags added.

I also checked it with 17k rows of data. It worked flawlessly.

I hope I didn't write bad code. I recently started working with Go to improve myself. Thanks @atomicptr

ramazansancar avatar May 11 '24 00:05 ramazansancar

The requirement here is met. https://github.com/atomicptr/crab/issues/4

ramazansancar avatar May 11 '24 00:05 ramazansancar

When I get to the computer, I will ask about the necessary corrections and what some of the requirements mean in detail. Thanks @atomicptr

ramazansancar avatar May 11 '24 10:05 ramazansancar

I also updated the readme and help sections. There will be no such problem. Thanks for the information and collaboration. @atomicptr

Best regards from Turkey

ramazansancar avatar May 15 '24 09:05 ramazansancar

There seems to be a problem with the records after scanning large xml files :(

image

Seems like it could solve this problem:

// Write a tab-separated record only when all four fields are present;
// a missing field would make the type assertions below panic.
if data["status"] != nil && data["url"] != nil && data["time"] != nil && data["duration"] != nil {
	status = data["status"].(float64)
	url = data["url"].(string)
	time = data["time"].(float64)
	duration = data["duration"].(float64)

	_, err = fmt.Fprintf(file, "%d\t%s\t%d\t%d\n", int(status), url, int(time), int(duration))
} else {
	// Fall back to the raw message for lines that are not structured records.
	_, err = file.WriteString(message + "\n")
}
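
A more defensive variant of the same idea, as a minimal sketch only: it uses comma-ok type assertions, so a missing or wrongly typed field falls back to the raw message instead of panicking. The helper name `formatRecord` and the sample inputs are hypothetical and not taken from crab; only the field names (`status`, `url`, `time`, `duration`) come from the snippet above, and the assumption is that each log message is a JSON object.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// formatRecord is a hypothetical helper (not part of crab). It turns a JSON
// log message into a tab-separated line, or returns the message unchanged
// when it is not JSON or lacks any of the expected fields.
func formatRecord(message string) string {
	var data map[string]any
	if err := json.Unmarshal([]byte(message), &data); err != nil {
		return message // not JSON: keep the raw line
	}

	// Comma-ok assertions report false instead of panicking when a key is
	// missing or holds a value of a different type.
	status, okStatus := data["status"].(float64)
	url, okURL := data["url"].(string)
	ts, okTime := data["time"].(float64)
	duration, okDuration := data["duration"].(float64)
	if !(okStatus && okURL && okTime && okDuration) {
		return message
	}

	return fmt.Sprintf("%d\t%s\t%d\t%d", int(status), url, int64(ts), int64(duration))
}

func main() {
	// Sample inputs are made up for illustration.
	fmt.Println(formatRecord(`{"status":200,"url":"https://example.com/","time":1715770000,"duration":120}`))
	fmt.Println(formatRecord("not a JSON log line"))
}
```

Keeping the formatting in a pure function like this makes it easy to check against a handful of sample lines before running it on a large sitemap.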

I may need some time to make sure the code works.

ramazansancar avatar May 15 '24 11:05 ramazansancar

This time the problem is solved :)

image

I thought it would be good to relocate the error message so it can be parsed.

image

I'm sending the final version this way.

The command I used for testing (the sitemap contains about 17,000 pages):

crab crawl:sitemap https://www.agroworlddergisi.com/sitemap_index.xml --num-workers=500 --http-timeout=60s --output-file ./output/output.txt --output-json ./output/output.json

ramazansancar avatar May 15 '24 13:05 ramazansancar

Thanks for the changes, I found a few more things but otherwise it looks great :)

atomicptr avatar May 22 '24 12:05 atomicptr

I have completed the final changes required. I am updating the code and sending it. @atomicptr

ramazansancar avatar May 22 '24 13:05 ramazansancar

whoops something went wrong here :eyes:

atomicptr avatar May 26 '24 11:05 atomicptr

I kinda fucked something up by using a gh CLI tool, not quite sure what yet, although I merged the changes with #11.

Thank you very much for the PR! :)

atomicptr avatar May 26 '24 11:05 atomicptr

Thanks for the fixes and the merge. @atomicptr

ramazansancar avatar May 26 '24 14:05 ramazansancar