response from _get_results(query) contains NoneType which leads to parsing Fail #35

stRudolph · 2023-02-09T09:13:31Z

Hi Matt,

While trying to scrape Google, I followed your blog post on scraping Google in 3 lines and got the following error:

AttributeError                            Traceback (most recent call last)
Cell In[2], line 1
----> 1 results = seo.get_serps("stupid")
      2 print(results)
File c:\Users\stephan.rudolph\Coding\testenv\Lib\site-packages\ecommercetools\seo\google_search.py:144, in get_serps(query, output)
    133 """Return the first 10 Google search results for a given query.
    134 
    135 Args:
   (...)
    140     results (dict): Results of query.
    141 """
    143 response = _get_results(query)
--> 144 results = _parse_search_results(response)
    146 if results:
    147     if output == "dataframe":

File c:\Users\stephan.rudolph\Coding\testenv\Lib\site-packages\ecommercetools\seo\google_search.py:124, in _parse_search_results(response)
    118 output = []
    120 for result in results:
    121     item = {
    122         'title': result.find(css_identifier_title, first=True).text,
    123         'link': result.find(css_identifier_link, first=True).attrs['href'],
--> 124         'text': result.find(css_identifier_text, first=True).text
...
    125     }
    127     output.append(item)
    129 return output

AttributeError: 'NoneType' object has no attribute 'text'

Then I tried your other blog post, scrape with python, which does not rely on the ecommercetools package, and followed it to the letter.
Here is the interesting part:

results = google_search("stupid")
results

yields normal output. Rerunning this Jupyter cell with a different keyword

results = google_search("allergy")
results

yields

AttributeError                            Traceback (most recent call last)
Cell In[9], line 1
----> 1 results = google_search("allergy")
      2 results

Cell In[8], line 3, in google_search(query)
      1 def google_search(query):
      2     response = get_results(query)
----> 3     return parse_results(response)

Cell In[7], line 17, in parse_results(response)
     10 output = []
     12 for result in results:
     14     item = {
     15         'title': result.find(css_identifier_title, first=True).text,
     16         'link': result.find(css_identifier_link, first=True).attrs['href'],
---> 17         'text': result.find(css_identifier_text, first=True).text
     18     }
     20     output.append(item)
     22 return output

AttributeError: 'NoneType' object has no attribute 'text'

So sometimes result.find(css_identifier_text, first=True) returns None instead of an element, and calling .text on it fails.
I have no idea under which circumstances this None arises, but the behavior is as follows:
seo.get_serps() from ecommercetools consistently throws the error, while the "hand written" equivalent is keyword sensitive, e.g. "allergy" throws the error, "keyword sensitive" does not.
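
For illustration, here is a minimal sketch of how the parsing step could tolerate a missing element instead of raising. It is not the package's actual code: the helper names safe_text, safe_attr and parse_result are hypothetical, and it assumes only what the tracebacks show, namely that result.find(selector, first=True) returns either an object with .text and .attrs, or None when nothing matches.

```python
from typing import Any, Dict, Optional


def safe_text(result: Any, selector: str) -> Optional[str]:
    """Return the text of the first match for selector, or None if nothing matched."""
    element = result.find(selector, first=True)
    return element.text if element is not None else None


def safe_attr(result: Any, selector: str, attr: str) -> Optional[str]:
    """Return an attribute of the first match for selector, or None if absent."""
    element = result.find(selector, first=True)
    return element.attrs.get(attr) if element is not None else None


def parse_result(result: Any, css_title: str, css_link: str,
                 css_text: str) -> Dict[str, Optional[str]]:
    """Build one result item; fields without a matching element become None."""
    return {
        'title': safe_text(result, css_title),
        'link': safe_attr(result, css_link, 'href'),
        'text': safe_text(result, css_text),
    }
```

Inside the loop, item = parse_result(result, css_identifier_title, css_identifier_link, css_identifier_text) would then yield 'text': None for results that have no snippet, rather than aborting the whole query.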


stRudolph changed the title ~~response from _get_results(query) yields NoneType which leads to parsing Fail~~ response from _get_results(query) contains NoneType which leads to parsing Fail Feb 9, 2023
