response from _get_results(query) contains NoneType which leads to parsing Fail #35

Open
stRudolph opened this issue Feb 9, 2023 · 0 comments

stRudolph commented Feb 9, 2023

Hi Matt,

While trying to scrape Google, I followed your blog post on scraping Google in three lines and got the following error:

AttributeError                            Traceback (most recent call last)
Cell In[2], line 1
----> 1 results = seo.get_serps("stupid")
      2 print(results)
File c:\Users\stephan.rudolph\Coding\testenv\Lib\site-packages\ecommercetools\seo\google_search.py:144, in get_serps(query, output)
    133 """Return the first 10 Google search results for a given query.
    134 
    135 Args:
   (...)
    140     results (dict): Results of query.
    141 """
    143 response = _get_results(query)
--> 144 results = _parse_search_results(response)
    146 if results:
    147     if output == "dataframe":

File c:\Users\stephan.rudolph\Coding\testenv\Lib\site-packages\ecommercetools\seo\google_search.py:124, in _parse_search_results(response)
    118 output = []
    120 for result in results:
    121     item = {
    122         'title': result.find(css_identifier_title, first=True).text,
    123         'link': result.find(css_identifier_link, first=True).attrs['href'],
--> 124         'text': result.find(css_identifier_text, first=True).text
...
    125     }
    127     output.append(item)
    129 return output

AttributeError: 'NoneType' object has no attribute 'text'

Then I tried your other blog post on scraping with Python, which does not rely on the ecommercetools package, and followed it to the letter.
Here is the interesting part:

results = google_search("stupid")
results

yields normal output. Rerunning this Jupyter cell with the keyword

results = google_search("allergy")
results

yields

AttributeError                            Traceback (most recent call last)
Cell In[9], line 1
----> 1 results = google_search("allergy")
      2 results

Cell In[8], line 3, in google_search(query)
      1 def google_search(query):
      2     response = get_results(query)
----> 3     return parse_results(response)

Cell In[7], line 17, in parse_results(response)
     10 output = []
     12 for result in results:
     14     item = {
     15         'title': result.find(css_identifier_title, first=True).text,
     16         'link': result.find(css_identifier_link, first=True).attrs['href'],
---> 17         'text': result.find(css_identifier_text, first=True).text
     18     }
     20     output.append(item)
     22 return output

AttributeError: 'NoneType' object has no attribute 'text'

So it seems that result.find(css_identifier_text, first=True) sometimes returns None instead of an element, even though first=True is set.
I have no idea under which circumstances this None arises, but the behavior is as follows:
seo.get_serps() from ecommercetools consistently throws the error, while the hand-written equivalent is keyword sensitive, e.g. "allergy" throws the error, "keyword sensitive" does not.
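
A possible workaround, sketched under assumptions: if the result objects behave like requests_html elements, find(selector, first=True) returns None when the selector matches nothing, so each lookup can be checked before .text or .attrs is accessed. The helper names first_text, first_attr and parse_result are hypothetical and are not part of ecommercetools or the blog post code.

```python
def first_text(result, selector):
    """Return the text of the first element matching selector, or None."""
    element = result.find(selector, first=True)
    # find(..., first=True) is assumed to return None when nothing matches
    return element.text if element is not None else None


def first_attr(result, selector, name):
    """Return attribute `name` of the first matching element, or None."""
    element = result.find(selector, first=True)
    return element.attrs.get(name) if element is not None else None


def parse_result(result, css_title, css_link, css_text):
    """Build one result dict, leaving missing fields as None instead of raising."""
    return {
        "title": first_text(result, css_title),
        "link": first_attr(result, css_link, "href"),
        "text": first_text(result, css_text),
    }
```

Inside the existing loop this would be called as parse_result(result, css_identifier_title, css_identifier_link, css_identifier_text). It only hides the symptom: results without a matching text element end up with 'text': None, and why the selector fails to match for some queries remains open.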

stRudolph changed the title from "response from _get_results(query) yields NoneType which leads to parsing Fail" to "response from _get_results(query) contains NoneType which leads to parsing Fail" on Feb 9, 2023