SerpScrap
serpscrap.SerpScrap() returns None for some keywords
Hi. Does anybody have an idea why I get data for some keywords but None for others?
For example, "dog food":
import serpscrap
keywords = ['dog food']
config = serpscrap.Config()
config.set('scrape_urls', True)
scrap = serpscrap.SerpScrap()
scrap.init(config=config.get(), keywords=keywords)
scrap.as_csv('/tmp/output')
2019-09-22 11:55:14,988 - root - INFO - Going to scrape 2 keywords with 1 proxies by using 1 threads.
2019-09-22 11:55:14,990 - scrapcore.scraping - INFO - [+] SelScrape[localhost][search-type:normal][https://www.google.com/search?] using search engine "google". Num keywords=1, num pages for keyword=[1]
2019-09-22 11:55:24,286 - scrapcore.scraper.selenium - INFO - https://www.google.com/search?
2019-09-22 11:55:55,364 - scrapcore.scraping - INFO - [google]SelScrape localhost - Keyword: "dog food" with [1, 2] pages, slept 22 seconds before scraping. 1/1 already scraped
2019-09-22 11:55:56,767 - scrapcore.scraper.selenium - INFO - Requesting the next page
2/2 keywords processed.
2019-09-22 11:56:01,961 - root - INFO - Scraping URL: https://www.mypetneedsthat.com/best-dry-dog-foods-guide/
2019-09-22 11:56:02,681 - root - INFO - Scraping URL: https://www.businessinsider.com/best-dog-food
2019-09-22 11:56:02,686 - root - INFO - Scraping URL: https://www.akc.org/expert-advice/nutrition/best-dog-food-choosing-whats-right-for-your-dog/
2019-09-22 11:56:02,689 - root - INFO - Scraping URL: https://www.amazon.com/Best-Sellers-Pet-Supplies-Dry-Dog-Food/zgbs/pet-supplies/2975360011
2019-09-22 11:56:02,690 - root - INFO - Scraping URL: https://www.chewy.com/b/food-332
2019-09-22 11:56:26,122 - root - INFO - Scraping URL: https://www.petco.com/shop/en/petcostore/category/dog/dog-food
2019-09-22 11:56:26,123 - root - INFO - Scraping URL: https://www.petflow.com/dog/food
2019-09-22 11:56:26,843 - root - INFO - Scraping URL: https://www.dogfoodadvisor.com/
2019-09-22 11:56:27,735 - root - INFO - Scraping URL: https://www.petsmart.com/dog/food/dry-food/
2019-09-22 11:56:27,737 - root - INFO - Scraping URL: https://www.petsmart.com/dog/food/
2019-09-22 11:56:27,738 - root - INFO - Scraping URL: https://www.purina.com/dogs/dog-food
2019-09-22 11:56:28,635 - root - INFO - Scraping URL: https://www.youtube.com/watch?v=fBABfWqSN2I
2019-09-22 11:56:31,757 - root - INFO - Scraping URL: https://www.youtube.com/watch?v=7P85BMCCboI
2019-09-22 11:56:36,807 - root - INFO - Scraping URL: https://www.youtube.com/watch?v=az0ktsWYydw
2019-09-22 11:56:39,645 - root - INFO - Scraping URL: https://www.youtube.com/watch?v=njJ99wPByy4
2019-09-22 11:56:42,571 - root - INFO - Scraping URL: https://nypost.com/video/homeless-man-and-his-dog-reuniting-is-pure-joy/
2019-09-22 11:56:45,156 - root - INFO - Scraping URL: /aclk?sa=l&ai=DChcSEwjRyYG5h-TkAhUM1WQKHSiFASYYABAAGgJwag&sig=AOD64_2IRYpCakgEzR3BK1oqeuLCVa3mjA&adurl=&rct=j&q=
2019-09-22 11:56:45,157 - root - INFO - Scraping URL: https://www.purina.com/dogs/dog-food
2019-09-22 11:56:45,867 - root - INFO - Scraping URL: https://en.wikipedia.org/wiki/Dog_food
2019-09-22 11:56:45,872 - root - INFO - Scraping URL: https://www.hillspet.com/dog-food
2019-09-22 11:56:45,876 - root - INFO - Scraping URL: https://www.smithsfoodanddrug.com/pl/dog-food/11103
2019-09-22 11:57:10,321 - root - INFO - Scraping URL: https://www.canidae.com/dog-food/
2019-09-22 11:57:10,325 - root - INFO - Scraping URL: https://www.petcarerx.com/dog/food-nutrition
2019-09-22 11:57:11,222 - root - INFO - Scraping URL: https://www.businessinsider.com/best-dog-food
2019-09-22 11:57:11,223 - root - INFO - Scraping URL: https://www.tractorsupply.com/tsc/catalog/dog-food
2019-09-22 11:57:12,249 - root - INFO - Scraping URL: https://www.thehonestkitchen.com/dog-food
2019-09-22 11:57:12,253 - root - INFO - Scraping URL: https://www.boxed.com/products/category/418/dog-food
2019-09-22 11:57:13,171 - root - INFO - Scraping URL: https://lifesabundance.com/category/dogfood.aspx
2019-09-22 11:57:13,174 - root - INFO - Scraping URL: //www.googleadservices.com/pagead/aclk?sa=L&ai=DChcSEwj5_NHFh-TkAhWTr-wKHSgSDVMYABAAGgJwag&ohost=www.google.com&cid=CAASEuRoai4G0R8MNbToVnZKzozmNA&sig=AOD64_10tA_ESFCwAHTPgPUTDsInBgYwEQ&adurl=&rct=j&q=
2019-09-22 11:57:13,178 - root - INFO - Scraping URL: https://freshpet.com/why-freshpet/
2019-09-22 11:57:13,901 - root - INFO - Scraping URL: https://pet-food.thecomparizone.com/?var1=82002114870&var2=381760664839&var4&var5=b&var7=1234567890&utm_source=google&utm_medium=cpc
None
Traceback (most recent call last):
File "C:\Users\rot\Anaconda3\lib\site-packages\serpscrap\csv_writer.py", line 14, in write
w.writerow(row)
File "C:\Users\rot\Anaconda3\lib\csv.py", line 155, in writerow
return self.writer.writerow(self._dict_to_list(rowdict))
File "C:\Users\rot\Anaconda3\lib\csv.py", line 151, in _dict_to_list
+ ", ".join([repr(x) for x in wrong_fields]))
ValueError: dict contains fields not in fieldnames: 'url', 'encoding', 'meta_robots', 'meta_title', 'text_raw', 'last_modified', 'status'
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\serpscrap\csv_writer.py in write(self, file_name, my_dict)
13 for row in my_dict[0:]:
---> 14 w.writerow(row)
15 except Exception:
~\Anaconda3\lib\csv.py in writerow(self, rowdict)
154 def writerow(self, rowdict):
--> 155 return self.writer.writerow(self._dict_to_list(rowdict))
156
~\Anaconda3\lib\csv.py in _dict_to_list(self, rowdict)
150 raise ValueError("dict contains fields not in fieldnames: "
--> 151 + ", ".join([repr(x) for x in wrong_fields]))
152 return (rowdict.get(key, self.restval) for key in self.fieldnames)
ValueError: dict contains fields not in fieldnames: 'url', 'encoding', 'meta_robots', 'meta_title', 'text_raw', 'last_modified', 'status'
During handling of the above exception, another exception occurred:
Exception Traceback (most recent call last)
<ipython-input-16-3f66e8511348> in <module>
8 scrap = serpscrap.SerpScrap()
9 scrap.init(config=config.get(), keywords=keywords)
---> 10 scrap.as_csv('/tmp/output')
~\Anaconda3\lib\site-packages\serpscrap\serpscrap.py in as_csv(self, file_path)
146 writer = CsvWriter()
147 self.results = self.run()
--> 148 writer.write(file_path + '.csv', self.results)
149
150 def scrap_serps(self):
~\Anaconda3\lib\site-packages\serpscrap\csv_writer.py in write(self, file_name, my_dict)
15 except Exception:
16 print(traceback.print_exc())
---> 17 raise Exception
Exception:
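For context on what the traceback shows: `csv.DictWriter` raises this `ValueError` when a row dict contains keys that were not in the `fieldnames` passed to the writer. Here the writer's fieldnames apparently come from one kind of result row, while scraped-page rows carry extra keys (`'url'`, `'encoding'`, `'meta_robots'`, …). A hedged workaround sketch, not the serpscrap API: call `scrap.run()` yourself, drop `None` entries, and write the CSV using the union of keys across all rows. The `results` list and the helper name `write_results` below are invented for illustration.

```python
import csv

# Illustrative stand-in for what scrap.run() might return: rows with
# different key sets, plus a None from a failed scrape.
results = [
    {'query': 'dog food', 'serp_title': 'Best Dog Food', 'serp_url': 'https://example.com'},
    {'url': 'https://example.com', 'status': '200', 'encoding': 'utf-8'},
    None,
]

def write_results(file_name, results):
    rows = [r for r in results if r]                    # drop None entries
    fieldnames = sorted({k for r in rows for k in r})   # union of all keys
    with open(file_name, 'w', newline='', encoding='utf-8') as f:
        # restval='' fills columns a given row doesn't have
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval='')
        writer.writeheader()
        writer.writerows(rows)

write_results('output.csv', results)
```

With the union of keys as fieldnames, no row can contain an unexpected field, so the `ValueError` cannot occur; missing values simply come out as empty cells.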
Many thanks!