I am scraping a few dozen URLs using splashr.
The code runs and completes fine when executed directly from RStudio Server on my DigitalOcean droplet. However, when the same script runs from a cron job, it always fails while fetching the 24th URL, with this error:
Error in curl::curl_fetch_memory(url, handle = handle) : Recv failure: Connection reset by peer
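For context, the loop is essentially the following (the URL vector and Splash host are placeholders standing in for my actual values):

```r
library(splashr)
library(purrr)

# Connect to the Splash instance running alongside R on the droplet
sp <- splash("localhost")

# Placeholder URLs; the real list is a few dozen entries
urls <- c("https://example.com/page1", "https://example.com/page2")

# Render each page through Splash; this is the call that eventually fails
pages <- map(urls, ~ render_html(sp, .x))
```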
Even when the script succeeds when run directly from RStudio, I see this error for the first 14 scrapes:
QNetworkReplyImplPrivate::error: Internal problem, this method must only be called once.
But it completes OK.
Is there some memory management or garbage collection I'm supposed to be doing between scrapes? What would account for a direct run succeeding while the same script fails under cron? In short, how do I avoid the curl error above?
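For reference, this is the kind of retry wrapper I'm considering as a workaround, not something I've confirmed works (`safe_render` is a hypothetical helper I'd write myself):

```r
# Sketch: retry a render, forcing garbage collection and pausing
# between attempts in case Splash needs time to recover.
safe_render <- function(sp, url, tries = 3, wait = 5) {
  for (i in seq_len(tries)) {
    result <- tryCatch(render_html(sp, url), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    message(sprintf("Attempt %d failed for %s: %s",
                    i, url, conditionMessage(result)))
    gc()             # explicit garbage collection between attempts
    Sys.sleep(wait)  # back off before retrying
  }
  NULL  # give up after `tries` failed attempts
}
```

But I'd rather understand the root cause than paper over it with retries.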