Time to switch to BOLDv5 #110

Open
nickschurch opened this issue Dec 4, 2024 · 9 comments

@nickschurch

nickschurch commented Dec 4, 2024

I'm trying to use this package to get a bunch of COI sequences from BOLD. I know the code works because it sometimes returns results, but mostly I'm plagued by intermittent warnings:
"Warning: Content was type '' when it should've been type 'text/html; charset=utf-8'" or, for larger groups of sequences, occasionally "The request timed out, see 'If a request times out'", which returns only partial output.

For example:

> y <- bold_tax_name("Archaeognatha")
Warning: Content was type '' when it should've been type 'text/html; charset=utf-8'

Is this normal? Is something wrong with the BOLD servers? Can I set the timeout length, or the maximum number of sequences to return, or something?
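
While the cause of the intermittent warnings is unclear, one common workaround is to retry the same request a few times with a growing pause between attempts. The following is a minimal sketch, not part of the package: `with_retries` and its arguments are hypothetical names, and it assumes that a warning means the call returned nothing usable, so warnings are treated like errors. The stub `flaky` stands in for a real query so the example runs without contacting BOLD.

```r
# Hypothetical helper (not part of the bold package): call `fetch` until it
# completes without a warning or error, or until `max_tries` is reached.
with_retries <- function(fetch, max_tries = 3, wait_seconds = 1) {
  stopifnot(max_tries >= 1)
  for (attempt in seq_len(max_tries)) {
    # Capture warnings and errors as condition objects instead of failing.
    result <- tryCatch(fetch(), warning = function(w) w, error = function(e) e)
    if (!inherits(result, "condition")) {
      return(result)
    }
    message("Attempt ", attempt, " failed: ", conditionMessage(result))
    if (attempt < max_tries) {
      # Exponential backoff: wait, 2 * wait, 4 * wait, ...
      Sys.sleep(wait_seconds * 2^(attempt - 1))
    }
  }
  stop("All ", max_tries, " attempts failed; last message: ",
       conditionMessage(result))
}

# Stub that warns on the first two calls, then succeeds.
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) warning("Content was type ''")
  "sequences"
}
out <- with_retries(flaky, max_tries = 5, wait_seconds = 0)
```

In real use the `fetch` argument would wrap the actual query, for example `function() bold_tax_name("Archaeognatha")`. Retrying only hides transient failures; it does not show whether the final result is complete.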

@salix-d
Collaborator

salix-d commented Dec 4, 2024

Is there a time when it happens more often?

I can try reaching out to them, because if it can happen when only asking for one species, it's definitely on their server side.
I know they were working on a new version of the API, might be related to that 🤔

@salix-d
Collaborator

salix-d commented Dec 4, 2024

Oooooooooooh, the new API is finally out! 😮
Had asked them to let me know so I could update 😢

Well, I'll check how much has changed and see how long it will take to update.

@salix-d salix-d self-assigned this Dec 4, 2024
@salix-d salix-d added API-related API-side problem needing work around critical Something to fix URGENTLY labels Dec 4, 2024
@salix-d salix-d added this to the v1.5 milestone Dec 4, 2024
@salix-d salix-d changed the title Nothing but errors and warnings.... Time to switch to BOLDv5 Dec 4, 2024
@nickschurch
Author

Is there a time when it happens more often?

I can try reaching out to them, because if it can happen when only asking for one species, it's definitely on their server side. I know they were working on a new version of the API, might be related to that 🤔

Yesterday afternoon was particularly bad, evening not so much and I managed to get most of what I was after (all COI-5p sequences for Animalia), but I'm pretty sure it's not all of it which is annoying for reproducibility.

@nickschurch
Author

nickschurch commented Dec 5, 2024

BOLD is really odd. When I use the web interface to get records for the order Diptera, it lists 6,660,909 records. When I try to get these via the R tool, it times out, but I can loop over all the families under Diptera and query those individually. When I do, I get 6,241,574 records, failures for two families (Heterocheilidae & Braulidae), and timeouts for three families (Perissommatidae, Neminidae, Teratomyzidae).
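
The per-family loop described above can be sketched as follows, recording an outcome for every family so that failures and partial results are listed rather than lost. This is an illustration under assumptions, not the code actually used: `fetch_by_family` is a hypothetical helper, `fetch` is whatever function performs the query, and the stub and its family names "A" to "C" are made up so the example runs offline.

```r
# Hypothetical helper: query each family separately and keep, per family,
# the records, a status, and any warning or error messages.
fetch_by_family <- function(families, fetch) {
  results <- lapply(families, function(fam) {
    messages_seen <- character()
    records <- tryCatch(
      withCallingHandlers(
        fetch(fam),
        warning = function(w) {
          # Note the warning, then resume so partial output is still returned.
          messages_seen <<- c(messages_seen, conditionMessage(w))
          invokeRestart("muffleWarning")
        }
      ),
      error = function(e) {
        messages_seen <<- c(messages_seen, conditionMessage(e))
        NULL
      }
    )
    status <- if (is.null(records)) {
      "failed"
    } else if (length(messages_seen) > 0) {
      "warned"
    } else {
      "ok"
    }
    list(records = records, status = status, messages = messages_seen)
  })
  names(results) <- families
  results
}

# Made-up stub: "A" succeeds, "B" warns but returns something, "C" errors.
stub_fetch <- function(fam) {
  if (fam == "A") return(data.frame(id = 1:3))
  if (fam == "B") {
    warning("The request timed out")
    return(data.frame(id = 1))
  }
  stop("request failed")
}
res <- fetch_by_family(c("A", "B", "C"), stub_fetch)
statuses <- vapply(res, function(x) x$status, character(1))
```

With the real package, `fetch` could be something like `function(fam) bold_seq(fam, marker = "COI-5P")`. Families whose status is not "ok" are the ones to re-query or check by hand.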

Searching for the failed families on the website search interface returns 1 & 3 sequences respectively. Searching for the three families that are timing out returns no hits, despite the fact that I know they are in the database. I think this is because there are no public records: following the "sub-taxa" links from Animalia > Arthropoda > Insecta > Diptera lists the families (e.g., https://v4.boldsystems.org/index.php/Taxbrowser_Taxonpage?taxid=532727), but reports that there is no public data available for this family.

Using this tool to try and pull data for this family (where there is no public data) with bold_seq("Perissommatidae", marker = "COI-5P") returns 5056 sequences before timing out and saying it has only returned partial results. The head of these results is weird though: the first one is Scyliorhinus canicula, a catshark(!!), not an insect at all. Something very weird is going on here.

See, this is why I hate this restricted kind of database and semi-closed-source system.

@salix-d
Collaborator

salix-d commented Dec 5, 2024

I see their taxonomy browser is still using v4. There was some inconsistency between the taxonomy API and the seq/specimen one. They might still be working on fixing those tbh.

Regardless, it's pretty weird that it returns an unrelated taxon, I thought maybe they share a rank name, but not even x_x
ACTUALLY, that's down to the way their API (v4) works! When there's no match for the taxon, it returns matches to the other parameters, in this case "marker=COI-5P", hence the huge result that makes no sense.
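
Given that fallback behaviour, one defensive step is to check that the returned records really belong to the requested taxon before using them. The sketch below is hypothetical: `keep_matching_taxon` is not a package function, and it assumes the result is a data frame with a taxonomy column such as `family_name`, which depends on which function produced it. The demo rows and their `processid` values are made up.

```r
# Hypothetical check: keep only rows whose taxonomy column equals the
# requested taxon, and warn when nothing matches at all.
keep_matching_taxon <- function(records, taxon, rank_column = "family_name") {
  if (!rank_column %in% names(records)) {
    stop("Column '", rank_column, "' not found; cannot verify the taxon.")
  }
  matches <- !is.na(records[[rank_column]]) & records[[rank_column]] == taxon
  if (!any(matches)) {
    warning("No returned record belongs to '", taxon,
            "'; the taxon filter was probably ignored.")
  }
  records[matches, , drop = FALSE]
}

# Made-up rows: one from the requested family, one unrelated.
demo <- data.frame(
  processid = c("DEMO1", "DEMO2"),
  family_name = c("Perissommatidae", "Scyliorhinidae"),
  stringsAsFactors = FALSE
)
checked <- keep_matching_taxon(demo, "Perissommatidae")
```

If every row is dropped, the query most likely matched only the other parameters, as in the marker-only result described above.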

@salix-d
Collaborator

salix-d commented Dec 5, 2024

Well, it seems they decided to do their own package https://github.com/boldsystems-central/BOLDconnectR/ ...

@salix-d
Collaborator

salix-d commented Dec 5, 2024

Searching for the failed families on the website search interface returns 1 & 3 sequences respectively. Searching for the three families that are timing out returns no hits, despite the fact that I know they are in the database. I think this is because there are no public records: following the "sub-taxa" links from Animalia > Arthropoda > Insecta > Diptera lists the families (e.g., https://v4.boldsystems.org/index.php/Taxbrowser_Taxonpage?taxid=532727), but reports that there is no public data available for this family.

Well, on their new web site, there's only one specimen for that one.
https://portal.boldsystems.org/result?query=Perissommatidae[tax]

@salix-d
Collaborator

salix-d commented Dec 5, 2024

ACTUALLY, that's down to the way their API (v4) works! When there's no match for the taxon, it returns matches to the other parameters, in this case "marker=COI-5P", hence the huge result that makes no sense.

That's partially why they redid the whole API.
If you do try their new package, I'd be curious to hear if it provides what you need from them.

@nickschurch
Author

I'll give it a go and see and let you know.

The code is working and eventually giving me useful sequences. I just have no idea whether the result is complete at the end, after iterating over different taxonomic levels and retrying multiple times to get around the BOLD database HTTP errors. And if it's not complete, I've got no idea what's missing.
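
One rough way to see what might be missing is to tabulate expected counts against retrieved counts per taxon. The sketch below is only an illustration: `completeness_report` is a hypothetical helper, and the expected counts are assumed to come from some independent source such as the website search. The numbers in the demo are the ones quoted earlier in this thread; website totals may include non-public records, so a gap is not necessarily an error.

```r
# Hypothetical helper: compare named vectors of expected and retrieved
# record counts; taxa absent from `retrieved` count as zero.
completeness_report <- function(expected, retrieved) {
  taxa <- union(names(expected), names(retrieved))
  exp_n <- unname(expected[taxa])
  got_n <- unname(retrieved[taxa])
  got_n[is.na(got_n)] <- 0
  data.frame(
    taxon = taxa,
    expected = exp_n,
    retrieved = got_n,
    missing = exp_n - got_n,
    stringsAsFactors = FALSE
  )
}

# Counts quoted in this thread: website totals versus what the loop returned.
expected <- c(Diptera = 6660909, Heterocheilidae = 1, Braulidae = 3)
retrieved <- c(Diptera = 6241574)
report <- completeness_report(expected, retrieved)
```

Rows with a non-zero `missing` value point at the taxa to re-query or to check against the portal.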
