unable to get splashr to render content #13

Open
david-jankoski opened this issue Feb 2, 2018 · 1 comment

Comments

@david-jankoski

Hey Bob,

Thank you for making this package (and all your others). I'm a big fan and use it regularly to scrape stuff. It is so much easier and faster than the Selenium route. This might be the wrong channel for this question, so please feel free to ignore/close this issue.

Here's my problem:
Recently Spencer Graves asked on the R-help mailing list how to scrape this site:
https://www.battleforthenet.com/scoreboard/

At first I thought it would be a breeze with splashr, so I tried:

library("splashr")
url <- "https://www.battleforthenet.com/scoreboard"

page <- splashr::render_html(url = url, wait = 10)

res <- 
  page %>% 
  rvest::html_nodes("#senate") %>% 
  rvest::html_nodes(".politicians")

# returns
> {xml_nodeset (0)}

But this doesn't seem to work. Trying to see what's going on:

res <- 
  page %>% 
  rvest::html_nodes("#senate") %>% 
  xml2::html_structure()

# returns
> [[1]]
<div#senate .politicians>
  {text}
  <h2>
    {text}
    <em>
      {text}
    {text}
  {text}
  <team-legend>
  <p>
    {text}
  {text}
  <politician-card [v-for, :politician, v-if]>

I'm not very knowledgeable about web-dev things, so I apologize if this is something obvious. I'm trying to understand what's going on and why this does not work with splashr. My best guess is that there is some kind of secret JS mumbo jumbo going on that manages to keep the real content away from splashr...
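
The `html_structure()` output may already hold a clue: `v-for`, `v-if` and `:politician` are Vue.js template directives, and they normally disappear once Vue has rendered the component. Seeing them on `<politician-card>` suggests the page's JavaScript had not finished (or did not run) when the HTML was captured. Below is a minimal sketch of checking for such leftover directives with xml2. The HTML fragment, the `senators` name and the text inside it are hypothetical stand-ins modelled on the structure above, not the site's actual markup.

```r
library(xml2)

# Hypothetical fragment modelled on the html_structure() output above;
# the attribute values and text are made up for illustration.
html <- '<div id="senate" class="politicians">
  <h2>Senate <em>scoreboard</em></h2>
  <team-legend></team-legend>
  <p>Some text</p>
  <politician-card v-for="p in senators" :politician="p" v-if="p"></politician-card>
</div>'

doc <- read_html(html)

# Direct element children of the #senate container
senate <- xml_find_first(doc, "//*[@id='senate']")
children <- xml_children(senate)

# xml_attr() returns NA where the attribute is absent, so anything
# non-NA is a node that still carries an unrendered v-for directive
templated <- children[!is.na(xml_attr(children, "v-for"))]

xml_name(templated)
length(templated)
```

If `templated` is non-empty on the real `page` object, the selector is matching the template placeholder rather than rendered cards, which would explain the empty node set.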

I would be thankful if you could just point me in some direction on how to do this.
Thanks again for all your work!
david

@david-jankoski
Author

PS: I tried "seeing" what splashr sees and, as expected, the table is not really there.

[Screenshot: senate_page]
