**Describe the bug**
When crawling websites that have non-ASCII characters in the URL (for example the character `é`), I get this error:

```
UnicodeEncodeError: 'ascii' codec can't encode characters when calling self._output(request.encode('ascii'))
```

**To Reproduce**
Steps to reproduce the behavior:

```
seoanalyze https://www.archi-graph.com/
```

**Expected behavior**
The program should run as normal.

**Desktop (please complete the following information):**

**Smartphone (please complete the following information):**
N/A

**Additional context**
I propose a fix that sanitizes all URLs passed to the `get` method in the `http` module.
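The error comes from Python's standard library encoding the HTTP request line as ASCII, so any non-ASCII character in the URL fails before the request is even sent. A minimal illustration of the underlying failure (the path `/café` is an assumed example, not from the report):

```python
# http.client encodes the request line with the 'ascii' codec, so a
# non-ASCII path character raises UnicodeEncodeError at request time.
try:
    "/café".encode("ascii")
except UnicodeEncodeError as e:
    print(e)  # 'ascii' codec can't encode character '\xe9' ...
```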
```python
import certifi
import urllib3
from urllib import parse


class Http():
    def __init__(self):
        user_agent = {'User-Agent': 'Mozilla/5.0'}
        self.http = urllib3.PoolManager(
            timeout=urllib3.Timeout(connect=1.0, read=2.0),
            cert_reqs='CERT_REQUIRED',
            ca_certs=certifi.where(),
            headers=user_agent
        )

    def get(self, url):
        sanitized_url = self.sanitize_url(url)
        return self.http.request('GET', sanitized_url)

    @staticmethod
    def sanitize_url(url):
        # Percent-encode non-ASCII characters in the path component
        scheme, netloc, path, query, fragment = parse.urlsplit(url)
        path = parse.quote(path)
        sanitized_url = parse.urlunsplit((scheme, netloc, path, query, fragment))
        return sanitized_url


http = Http()
```
Adding the `sanitize_url` static method fixes the issue described above.

Tested successfully by running `seoanalyze https://www.archi-graph.com/` on the command line.
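For reference, the sanitization step can be exercised on its own. This is a standalone sketch of the same `urllib.parse` logic used in the fix; the URL containing `é` is an assumed example rather than one from the report:

```python
from urllib import parse


def sanitize_url(url):
    # Split the URL, percent-encode the path (quote keeps '/' safe by
    # default), and reassemble it into an all-ASCII URL.
    scheme, netloc, path, query, fragment = parse.urlsplit(url)
    path = parse.quote(path)
    return parse.urlunsplit((scheme, netloc, path, query, fragment))


print(sanitize_url("https://example.com/café-page"))
# → https://example.com/caf%C3%A9-page
```

Because `parse.quote` leaves characters that are already percent-encoded or ASCII untouched apart from reserved ones, already-clean URLs pass through with their paths intact.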
Nice. Thank you for this. I can get your fix dropped in the next release.