'utf-8' codec can't decode bytes in position 31608-31609 #60

Uziel9999 · 2020-08-04T23:41:51Z

I am running the script in a Jupyter notebook on Windows 10 with the latest version of Anaconda. I used the example from this site, but pointed it at an actual company website.

What am I doing incorrectly?

Input:
from seoanalyzer import analyze
site = 'http://www.site.com'
sitemap = None
output = analyze(site, sitemap)
print(output)

Results:

UnicodeDecodeError Traceback (most recent call last)
in
4 sitemap = None
5
----> 6 output = analyze(site, sitemap)
7 print(output)

C:\ProgramData\Anaconda3\lib\site-packages\seoanalyzer\analyzer.py in analyze(url, sitemap_url)
15 site = Website(url, sitemap_url)
16
---> 17 site.crawl()
18
19 for p in site.crawled_pages:

C:\ProgramData\Anaconda3\lib\site-packages\seoanalyzer\website.py in crawl(self)
63 continue
64
---> 65 page.analyze()
66
67 self.content_hashes[page.content_hash].add(page.url)

C:\ProgramData\Anaconda3\lib\site-packages\seoanalyzer\page.py in analyze(self, raw_html)
170 return
171 else:
--> 172 raw_html = page.data.decode('utf-8')
173
174 self.content_hash = hashlib.sha1(raw_html.encode('utf-8')).hexdigest()

UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 31608-31609: invalid continuation byte
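
The traceback ends at a strict `decode('utf-8')` call, which raises as soon as the response body contains a byte sequence that is not valid UTF-8 (for instance, a page saved in Windows-1252 or Latin-1). The sketch below reproduces that failure mode on sample bytes and shows one tolerant way to decode them. It is only an illustration: `decode_html`, the fallback encoding list, and the sample bytes are hypothetical and not part of seoanalyzer.

```python
# Hypothetical stand-in for page.data: text encoded as Latin-1, which is not valid UTF-8.
raw = "café résumé".encode("latin-1")


def decode_html(data: bytes, encodings=("utf-8", "cp1252")) -> str:
    """Try each candidate encoding in order (assumed list, adjust as needed).

    If none of them decodes cleanly, fall back to UTF-8 and substitute
    U+FFFD for every undecodable byte instead of raising.
    """
    for enc in encodings:
        try:
            return data.decode(enc)
        except UnicodeDecodeError:
            continue
    return data.decode("utf-8", errors="replace")


try:
    raw.decode("utf-8")  # same kind of strict call as in the traceback
except UnicodeDecodeError as exc:
    print("strict decode failed:", exc.reason)

print(decode_html(raw))  # café résumé
```

Guessing a fallback encoding can produce wrong characters when the guess is wrong, so `errors="replace"` on its own may be the safer choice if only the overall page text matters.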
