
serpscrap.SerpScrap() returns None for some keywords

Open GefenPuravida opened this issue 6 years ago • 0 comments

Hi. Does anybody have an idea why I get data for some keywords but not for others?

For example, with the keyword "dog food":

import serpscrap

keywords = ['dog food']

config = serpscrap.Config()
config.set('scrape_urls', True)

scrap = serpscrap.SerpScrap()
scrap.init(config=config.get(), keywords=keywords)
scrap.as_csv('/tmp/output')

Output:

2019-09-22 11:55:14,988 - root - INFO - 
                Going to scrape 2 keywords with 1
                proxies by using 1 threads.
2019-09-22 11:55:14,990 - scrapcore.scraping - INFO - 
        [+] SelScrape[localhost][search-type:normal][https://www.google.com/search?] using search engine "google".
        Num keywords=1, num pages for keyword=[1]
        
2019-09-22 11:55:24,286 - scrapcore.scraper.selenium - INFO - https://www.google.com/search?
2019-09-22 11:55:55,364 - scrapcore.scraping - INFO - 
            [google]SelScrape localhost - Keyword: "dog food" with [1, 2] pages,
            slept 22 seconds before scraping. 1/1 already scraped
            
2019-09-22 11:55:56,767 - scrapcore.scraper.selenium - INFO - Requesting the next page
2/2 keywords processed.
2019-09-22 11:56:01,961 - root - INFO - Scraping URL: https://www.mypetneedsthat.com/best-dry-dog-foods-guide/
2019-09-22 11:56:02,681 - root - INFO - Scraping URL: https://www.businessinsider.com/best-dog-food
2019-09-22 11:56:02,686 - root - INFO - Scraping URL: https://www.akc.org/expert-advice/nutrition/best-dog-food-choosing-whats-right-for-your-dog/
2019-09-22 11:56:02,689 - root - INFO - Scraping URL: https://www.amazon.com/Best-Sellers-Pet-Supplies-Dry-Dog-Food/zgbs/pet-supplies/2975360011
2019-09-22 11:56:02,690 - root - INFO - Scraping URL: https://www.chewy.com/b/food-332
2019-09-22 11:56:26,122 - root - INFO - Scraping URL: https://www.petco.com/shop/en/petcostore/category/dog/dog-food
2019-09-22 11:56:26,123 - root - INFO - Scraping URL: https://www.petflow.com/dog/food
2019-09-22 11:56:26,843 - root - INFO - Scraping URL: https://www.dogfoodadvisor.com/
2019-09-22 11:56:27,735 - root - INFO - Scraping URL: https://www.petsmart.com/dog/food/dry-food/
2019-09-22 11:56:27,737 - root - INFO - Scraping URL: https://www.petsmart.com/dog/food/
2019-09-22 11:56:27,738 - root - INFO - Scraping URL: https://www.purina.com/dogs/dog-food
2019-09-22 11:56:28,635 - root - INFO - Scraping URL: https://www.youtube.com/watch?v=fBABfWqSN2I
2019-09-22 11:56:31,757 - root - INFO - Scraping URL: https://www.youtube.com/watch?v=7P85BMCCboI
2019-09-22 11:56:36,807 - root - INFO - Scraping URL: https://www.youtube.com/watch?v=az0ktsWYydw
2019-09-22 11:56:39,645 - root - INFO - Scraping URL: https://www.youtube.com/watch?v=njJ99wPByy4
2019-09-22 11:56:42,571 - root - INFO - Scraping URL: https://nypost.com/video/homeless-man-and-his-dog-reuniting-is-pure-joy/
2019-09-22 11:56:45,156 - root - INFO - Scraping URL: /aclk?sa=l&ai=DChcSEwjRyYG5h-TkAhUM1WQKHSiFASYYABAAGgJwag&sig=AOD64_2IRYpCakgEzR3BK1oqeuLCVa3mjA&adurl=&rct=j&q=
2019-09-22 11:56:45,157 - root - INFO - Scraping URL: https://www.purina.com/dogs/dog-food
2019-09-22 11:56:45,867 - root - INFO - Scraping URL: https://en.wikipedia.org/wiki/Dog_food
2019-09-22 11:56:45,872 - root - INFO - Scraping URL: https://www.hillspet.com/dog-food
2019-09-22 11:56:45,876 - root - INFO - Scraping URL: https://www.smithsfoodanddrug.com/pl/dog-food/11103
2019-09-22 11:57:10,321 - root - INFO - Scraping URL: https://www.canidae.com/dog-food/
2019-09-22 11:57:10,325 - root - INFO - Scraping URL: https://www.petcarerx.com/dog/food-nutrition
2019-09-22 11:57:11,222 - root - INFO - Scraping URL: https://www.businessinsider.com/best-dog-food
2019-09-22 11:57:11,223 - root - INFO - Scraping URL: https://www.tractorsupply.com/tsc/catalog/dog-food
2019-09-22 11:57:12,249 - root - INFO - Scraping URL: https://www.thehonestkitchen.com/dog-food
2019-09-22 11:57:12,253 - root - INFO - Scraping URL: https://www.boxed.com/products/category/418/dog-food
2019-09-22 11:57:13,171 - root - INFO - Scraping URL: https://lifesabundance.com/category/dogfood.aspx
2019-09-22 11:57:13,174 - root - INFO - Scraping URL: //www.googleadservices.com/pagead/aclk?sa=L&ai=DChcSEwj5_NHFh-TkAhWTr-wKHSgSDVMYABAAGgJwag&ohost=www.google.com&cid=CAASEuRoai4G0R8MNbToVnZKzozmNA&sig=AOD64_10tA_ESFCwAHTPgPUTDsInBgYwEQ&adurl=&rct=j&q=
2019-09-22 11:57:13,178 - root - INFO - Scraping URL: https://freshpet.com/why-freshpet/
2019-09-22 11:57:13,901 - root - INFO - Scraping URL: https://pet-food.thecomparizone.com/?var1=82002114870&var2=381760664839&var4&var5=b&var7=1234567890&utm_source=google&utm_medium=cpc
None
Traceback (most recent call last):
  File "C:\Users\rot\Anaconda3\lib\site-packages\serpscrap\csv_writer.py", line 14, in write
    w.writerow(row)
  File "C:\Users\rot\Anaconda3\lib\csv.py", line 155, in writerow
    return self.writer.writerow(self._dict_to_list(rowdict))
  File "C:\Users\rot\Anaconda3\lib\csv.py", line 151, in _dict_to_list
    + ", ".join([repr(x) for x in wrong_fields]))
ValueError: dict contains fields not in fieldnames: 'url', 'encoding', 'meta_robots', 'meta_title', 'text_raw', 'last_modified', 'status'
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~\Anaconda3\lib\site-packages\serpscrap\csv_writer.py in write(self, file_name, my_dict)
     13                 for row in my_dict[0:]:
---> 14                     w.writerow(row)
     15         except Exception:

~\Anaconda3\lib\csv.py in writerow(self, rowdict)
    154     def writerow(self, rowdict):
--> 155         return self.writer.writerow(self._dict_to_list(rowdict))
    156 

~\Anaconda3\lib\csv.py in _dict_to_list(self, rowdict)
    150                 raise ValueError("dict contains fields not in fieldnames: "
--> 151                                  + ", ".join([repr(x) for x in wrong_fields]))
    152         return (rowdict.get(key, self.restval) for key in self.fieldnames)

ValueError: dict contains fields not in fieldnames: 'url', 'encoding', 'meta_robots', 'meta_title', 'text_raw', 'last_modified', 'status'

During handling of the above exception, another exception occurred:

Exception                                 Traceback (most recent call last)
<ipython-input-16-3f66e8511348> in <module>
      8 scrap = serpscrap.SerpScrap()
      9 scrap.init(config=config.get(), keywords=keywords)
---> 10 scrap.as_csv('/tmp/output')

~\Anaconda3\lib\site-packages\serpscrap\serpscrap.py in as_csv(self, file_path)
    146         writer = CsvWriter()
    147         self.results = self.run()
--> 148         writer.write(file_path + '.csv', self.results)
    149 
    150     def scrap_serps(self):

~\Anaconda3\lib\site-packages\serpscrap\csv_writer.py in write(self, file_name, my_dict)
     15         except Exception:
     16             print(traceback.print_exc())
---> 17             raise Exception

Exception: 
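Judging from the traceback, the rows that `as_csv()` passes to `CsvWriter.write` (via `self.results = self.run()`) do not all share the same keys. The CSV header that the writer builds apparently lacks the fields added by URL scraping (`scrape_urls`), namely 'url', 'encoding', 'meta_robots', 'meta_title', 'text_raw', 'last_modified' and 'status', so `csv.DictWriter` raises `ValueError` on the first such row. The lone `None` also comes from `csv_writer.py`: `print(traceback.print_exc())` prints the traceback and then prints the return value of `print_exc()`, which is always `None`. The bare `raise Exception` on the next line is what produces the empty `Exception:` at the end. The sketch below is a hypothetical stand-in for that writer, not serpscrap's actual code. `union_fieldnames`, `write_rows` and `write_csv` are illustrative names, and the sample rows are made up apart from the `url`/`status` keys taken from the error message. The sketch builds the header from every key in every row and re-raises the original error unchanged.

```python
import csv
import io
import traceback


def union_fieldnames(rows):
    """Return every key seen in any row, in first-seen order."""
    seen = {}
    for row in rows:
        for key in row:
            seen.setdefault(key, None)  # dicts keep insertion order (Python 3.7+)
    return list(seen)


def write_rows(f, rows):
    """Write dict rows whose key sets may differ; missing columns become ''."""
    writer = csv.DictWriter(f, fieldnames=union_fieldnames(rows), restval='')
    writer.writeheader()
    writer.writerows(rows)


def write_csv(file_name, rows):
    """Hypothetical stand-in for serpscrap's CsvWriter.write(file_name, my_dict)."""
    try:
        with open(file_name, 'w', newline='', encoding='utf-8') as f:
            write_rows(f, rows)
    except Exception:
        traceback.print_exc()  # already prints; wrapping it in print() adds 'None'
        raise  # keep the original exception type and message


if __name__ == '__main__':
    # Hypothetical rows: a SERP-style row and a scraped-URL row with different keys
    rows = [
        {'keyword': 'dog food', 'serp_rank': 1},
        {'url': 'https://en.wikipedia.org/wiki/Dog_food', 'status': '200'},
    ]
    # Taking fieldnames from the first row only reproduces the reported ValueError
    try:
        csv.DictWriter(io.StringIO(), fieldnames=list(rows[0])).writerows(rows)
    except ValueError as err:
        print('naive writer:', err)
    # Using the union of all keys writes both rows
    buf = io.StringIO()
    write_rows(buf, rows)
    print(buf.getvalue())
```

If serpscrap itself can't be patched, the same idea might be applied to the list returned by `scrap.run()` (the call `as_csv` makes internally, per the traceback), e.g. `write_csv('/tmp/output.csv', scrap.run())`. This assumes `run()` returns a list of dicts, as the `DictWriter` error suggests. Passing `extrasaction='ignore'` to `DictWriter` would also silence the error, but it would silently drop the scraped-URL columns.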

Many thanks!

GefenPuravida, Sep 22 '19 09:09