Implement Dropdown Selector for NIH Controlled Vocabulary in Keywords #267

Open
Saixel opened this issue Apr 9, 2024 · 7 comments · May be fixed by #318
Labels: FY25 Sprint 4, FY25 Sprint 5, FY25 Sprint 6, FY25 Sprint 7 (2024-09-25 - 2024-10-09), FY25 Sprint 8 (2024-10-09 - 2024-10-23), FY25 Sprint 9 (2024-10-23 - 2024-11-06), NIH CAFE (Issues associated with the NIH CAFE project), Size: 0.5, Type: Feature

Comments

@Saixel

Saixel commented Apr 9, 2024

Background

As part of our ongoing collaboration with the CAFE project team, we have identified a need to refine the user experience in selecting controlled vocabulary terms for dataset keywords. This initiative aims to align the Dataverse metadata input process with standard vocabularies and enhance data discoverability and consistency.

Feature Request

Implement a selector (dropdown, box options, widget, etc.) to allow users to select and add terms from the NIH controlled vocabulary glossary as keywords.

Current State:

  • The 'Keyword' metadata section allows for manual text entry, with the option to add additional input fields.
  • There is a textual prompt directing users to the NIH glossary website for keyword selection.

Desired Functionality:

  • Replace the manual entry system with a dropdown selector or a similar UI component.
  • Dynamically populate options within the selector based on the NIH controlled vocabulary glossary.

Justification

The CAFE project team requires a more standardized and error-proof method for keyword selection to ensure metadata quality and consistency. This enhancement will support users in accurately tagging datasets, thus facilitating better data curation and searchability.

Implementation Considerations

  • Explore the use of "Controlled Vocabulary URL" for dynamic term loading from the NIH glossary.
  • Consider integrating the prepared resource, which lists each keyword with a description and a URL, possibly as a CSV or similar format, to load the selector options.
  • The selector UI must be intuitive and should support multiple term additions as per dataset requirements.
  • Backend integration must ensure correct storage and handling of the selected vocabulary terms.
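
As a minimal sketch of the CSV idea above, the prepared list could be parsed and checked before it is turned into selector options. The column names (`keyword`, `description`, `url`), the `Term` class, and the sample rows are assumptions for illustration; the real file may be laid out differently.

```python
import csv
import io
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Term:
    keyword: str
    description: str
    url: str


def load_terms(csv_text: str) -> list[Term]:
    """Parse the CSV and reject rows that would make poor selector options."""
    terms: list[Term] = []
    seen: set[str] = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        keyword = (row.get("keyword") or "").strip()
        url = (row.get("url") or "").strip()
        if not keyword:
            raise ValueError("row with empty keyword")
        if keyword.lower() in seen:
            raise ValueError(f"duplicate keyword: {keyword}")
        if urlparse(url).scheme not in ("http", "https"):
            raise ValueError(f"bad URL for {keyword}: {url!r}")
        seen.add(keyword.lower())
        terms.append(Term(keyword, (row.get("description") or "").strip(), url))
    return terms


# Hypothetical sample rows; the real list has almost 300 terms.
SAMPLE = """keyword,description,url
Air Pollution,Example description,https://example.org/glossary/air-pollution
Heat Stress,Another example,https://example.org/glossary/heat-stress
"""

terms = load_terms(SAMPLE)
print([t.keyword for t in terms])
```

Rejecting duplicates and malformed URLs up front keeps the selector free of ambiguous or broken entries, whichever UI component ends up displaying the terms.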

Additional Context

This request is driven by user feedback and the project's commitment to improving data quality and curation practices within the CAFE project's use of Dataverse. We have already compiled a comprehensive list, which includes each keyword, its description, and the associated URL, ready to be utilized for the selector feature.

@Saixel Saixel added Type: Feature NIH CAFE Issues associated with the NIH CAFE project labels Apr 9, 2024
@Saixel Saixel self-assigned this Apr 9, 2024
@pdurbin
Member

pdurbin commented Apr 10, 2024

Related:

However, I checked with @Saixel and he plans to implement this using a custom metadata block for the CAFE project rather than attempting to modify the keyword field in the citation block (which is what the issue above is about).

He said there are almost 300 controlled vocabulary values.

@scolapasta

First, this should be moved to the harvard dataverse repo, as it should not require any code in the core.

Second, we're wondering about this:
"Dynamically populate options within the selector based on the NIH controlled vocabulary glossary"

Is the idea to have these values read from an existing API? If so, we would use the external CV functionality, and the best next step would be a spike to use this API and make sure there is no unexpected behavior. (That spike would likely be a size 10 for someone who already has experience with the external CV functionality.)
If not, and it's just using our external CV functionality, then all that needs to be done is to add the values to the appropriate TSV file, which can be sized as a 3.
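
For the TSV route, a rough sketch of generating the vocabulary rows might look like the following. The `#controlledVocabulary` column layout is written as I recall it from the Dataverse metadata customization documentation and should be verified there; the field name `cafeKeyword` and the two values are hypothetical.

```python
# Header of the controlled vocabulary section (assumed layout, verify against the docs).
CV_HEADER = "#controlledVocabulary\tDatasetField\tValue\tidentifier\tdisplayOrder"


def controlled_vocabulary_section(field_name: str, values: list[str]) -> str:
    """Build one TSV row per vocabulary value, in display order."""
    lines = [CV_HEADER]
    for order, value in enumerate(values):
        if "\t" in value or "\n" in value:
            raise ValueError(f"value contains a tab or newline: {value!r}")
        # Leading empty cell, then field name, value, empty identifier, order.
        lines.append("\t".join(["", field_name, value, "", str(order)]))
    return "\n".join(lines) + "\n"


section = controlled_vocabulary_section("cafeKeyword", ["Air Pollution", "Heat Stress"])
print(section, end="")
```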

@Saixel Saixel transferred this issue from IQSS/dataverse Apr 24, 2024
@Saixel
Author

Saixel commented Apr 24, 2024

Related:

However, I checked with @Saixel and he plans to implement this using a custom metadata block for the CAFE project rather than attempting to modify the keyword field in the citation block (which is what the issue above is about).

He said there are almost 300 controlled vocabulary values.

@pdurbin Thanks for pointing out the related issue. My initial approach was to use a custom metadata block to avoid changing the current keyword block structure. However, I see that a comment in IQSS/dataverse#10288 suggests handling a similar case by implementing an autocomplete function. Our goal is to present a list of options for keyword selection from the prepared terms in a CSV, so either a dropdown or autocomplete could be a viable solution. If it's okay with you, we can dig deeper into this topic as we work on this implementation.

@Saixel
Author

Saixel commented Apr 24, 2024

First, this should be moved to the harvard dataverse repo, as it should not require any code in the core.

Second, we're wondering about this: "Dynamically populate options within the selector based on the NIH controlled vocabulary glossary"

Is the idea to have these values read from an existing API? If so, we would use the external CV functionality, and the best next step would be a spike to use this API and make sure there is no unexpected behavior. (That spike would likely be a size 10 for someone who already has experience with the external CV functionality.) If not, and it's just using our external CV functionality, then all that needs to be done is to add the values to the appropriate TSV file, which can be sized as a 3.

@scolapasta The issue has been moved to the Harvard Dataverse repo as per your guidance (thanks for pointing this out). Regarding the "Dynamically populate options within the selector based on the NIH controlled vocabulary glossary" feature, I'd like to clarify that we don't have an API. Instead, we have a CSV with a list of almost 300 terms. If we can use the external CV functionality you mentioned for this purpose, I would appreciate any documentation or pointers to existing implementations to explore and test this further.

@pdurbin
Member

pdurbin commented Apr 25, 2024

I'd like to clarify that we don't have an API. Instead, we have a CSV with a list of almost 300 terms. If we can use the external CV functionality you mentioned for this purpose

I would recommend playing around with configuring Author Affiliation to look up from ROR. For config advice, please see IQSS/dataverse#10331 (comment)

That said, this feature depends on an external API (like the ROR API). So you'd need to build and host that API somehow.

It might be easier to use the database and put the 300 values in a controlled vocabulary. But if you have a plan for how to build an API and where to host it, it should be doable. 😄
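
To give a sense of what "build and host an API" could involve, here is a minimal sketch of a term lookup service using only the Python standard library. The `/terms?q=` route, the response shape, and the sample terms are all assumptions; this is not the format that the external CV scripts or the ROR API use, so an adapter would likely still be needed.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical sample data; in practice this would be loaded from the CSV.
TERMS = [
    {"keyword": "Air Pollution", "url": "https://example.org/glossary/air-pollution"},
    {"keyword": "Heat Stress", "url": "https://example.org/glossary/heat-stress"},
]


def search_terms(terms, query, limit=10):
    """Case-insensitive substring match on the keyword, capped at `limit`."""
    needle = query.strip().lower()
    if not needle:
        return []
    return [t for t in terms if needle in t["keyword"].lower()][:limit]


class TermLookupHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        parsed = urlparse(self.path)
        if parsed.path != "/terms":
            self.send_error(404)
            return
        query = parse_qs(parsed.query).get("q", [""])[0]
        body = json.dumps(search_terms(TERMS, query)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=8000):
    """Create (but do not start) a local server for the lookup handler."""
    return ThreadingHTTPServer(("127.0.0.1", port), TermLookupHandler)
```

Calling `make_server().serve_forever()` would start it locally, after which a request to `/terms?q=heat` should return the matching terms as JSON. Hosting, caching, and access control are left out of this sketch.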

@Saixel Saixel added the Size: 3 A percentage of a sprint. label Apr 25, 2024
@Saixel Saixel added Size: 10 A percentage of a sprint. and removed Size: 3 A percentage of a sprint. labels Jul 17, 2024
@Saixel
Author

Saixel commented Jul 31, 2024

After further discussion, we've decided to expedite the NIH controlled vocabulary integration by creating a new custom metadata block with a dropdown for CCH terms, using our prepared list. This approach will help us avoid the complexity and longer development time of modifying the existing keyword metadata block.
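
A rough sketch of what the block definition for such a dropdown field could look like is below. The column names follow the `#metadataBlock` and `#datasetField` layout as I recall it from the Dataverse metadata customization documentation, so treat them as an assumption to verify; the block name `cafe_cch`, the field name `cchTerm`, and the titles are hypothetical. The vocabulary values themselves would go in a separate `#controlledVocabulary` section of the same file.

```python
BLOCK_COLUMNS = ["name", "dataverseAlias", "displayName", "blockURI"]
FIELD_COLUMNS = [
    "name", "title", "description", "watermark", "fieldType", "displayOrder",
    "displayFormat", "advancedSearchField", "allowControlledVocabulary",
    "allowmultiples", "facetable", "displayoncreate", "required", "parent",
    "metadatablock_id", "termURI",
]


def tsv_row(columns, values):
    """Emit a data row: empty first cell, then one cell per column."""
    unknown = set(values) - set(columns)
    if unknown:
        raise KeyError(f"unknown columns: {sorted(unknown)}")
    return "\t".join([""] + [values.get(col, "") for col in columns])


def block_definition(block, field):
    return "\n".join([
        "\t".join(["#metadataBlock"] + BLOCK_COLUMNS),
        tsv_row(BLOCK_COLUMNS, block),
        "\t".join(["#datasetField"] + FIELD_COLUMNS),
        tsv_row(FIELD_COLUMNS, field),
    ]) + "\n"


definition = block_definition(
    {"name": "cafe_cch", "displayName": "CCH Terms"},
    {
        "name": "cchTerm",
        "title": "CCH Term",
        "description": "Term selected from the prepared CCH list.",
        "fieldType": "text",
        "displayOrder": "0",
        "advancedSearchField": "TRUE",
        "allowControlledVocabulary": "TRUE",  # renders as a dropdown
        "allowmultiples": "TRUE",             # several terms per dataset
        "facetable": "TRUE",
        "displayoncreate": "TRUE",
        "required": "FALSE",
        "metadatablock_id": "cafe_cch",
    },
)
print(definition, end="")
```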

@Saixel Saixel added Size: 33 A percentage of a sprint. and removed Size: 10 A percentage of a sprint. labels Aug 14, 2024
@cmbz cmbz added the FY25 Sprint 4 FY25 Sprint 4 label Aug 14, 2024
@cmbz cmbz added the FY25 Sprint 5 FY25 sprint 5 label Aug 28, 2024
@Saixel Saixel added Size: 10 A percentage of a sprint. and removed Size: 33 A percentage of a sprint. labels Sep 11, 2024
@cmbz cmbz added the FY25 Sprint 6 FY25 Sprint 6 label Sep 11, 2024
@Saixel Saixel added the Status: Needs Input Applied to issues in need of input from someone currently unavailable label Sep 13, 2024
@cmbz cmbz added the FY25 Sprint 7 FY25 Sprint 7 (2024-09-25 - 2024-10-09) label Sep 25, 2024
@pdurbin pdurbin removed the Status: Needs Input Applied to issues in need of input from someone currently unavailable label Oct 7, 2024
@Saixel Saixel added Size: 3 A percentage of a sprint. and removed Size: 10 A percentage of a sprint. labels Oct 9, 2024
@cmbz cmbz added the FY25 Sprint 8 FY25 Sprint 8 (2024-10-09 - 2024-10-23) label Oct 9, 2024
@jggautier
Collaborator

jggautier commented Oct 15, 2024

Hi all. In a Zoom meeting today, Sonia, Emily, Alexis, Ceilyn and I talked about this, and I was asked to write in this GitHub issue what I mentioned during the meeting.

There's evidence that other collection administrators (in addition to our colleagues at CAFE) would like their depositors and curators to be able to choose values from a list, like a drop down menu, instead of typing in terms and pasting in term URIs, as well as being able to enter their own values if nothing in the list is appropriate.

So letting collection admins adjust a field, like the keyword field, so that it suggests terms from a particular vocabulary, would be a helpful feature for other groups who manage collections.

But this can be more complex and take longer, like @Saixel wrote earlier in this issue. So in some cases where the collection is within an installation that has other collections with different needs, like collections in Harvard Dataverse, custom metadata blocks have been created, like what's being discussed in this GitHub issue.

In other cases, the collection admins ask depositors to enter metadata in a user interface that's separate from the one that the Dataverse repository uses, where a field like the keyword field has been changed to let depositors select terms suggested from a particular vocabulary and let depositors enter their own terms. And then Dataverse APIs are used to push that metadata to the Dataverse installation.
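
As an illustration of that second approach, a separate interface could turn the selected terms into a JSON payload for the Dataverse API. This is a sketch under assumptions: the child field names (`keywordValue`, `keywordTermURI`) are as I recall them from the citation block and may differ by Dataverse version, and the sample term is hypothetical. The payload is only built here, not sent.

```python
import json


def keyword_fields_payload(terms):
    """Build a 'fields' payload with one keyword entry per (keyword, term URI) pair."""
    values = []
    for keyword, term_uri in terms:
        values.append({
            "keywordValue": {"typeName": "keywordValue", "value": keyword},
            # Assumed child field name; check the installation's citation block.
            "keywordTermURI": {"typeName": "keywordTermURI", "value": term_uri},
        })
    return {"fields": [{"typeName": "keyword", "value": values}]}


payload = keyword_fields_payload(
    [("Heat Stress", "https://example.org/glossary/heat-stress")]
)
print(json.dumps(payload, indent=2))
```

If memory serves, a payload of this shape is what the native API's `editMetadata` endpoint for datasets accepts, sent with an API token, but the exact route and headers should be confirmed in the API guide before relying on this.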

There are challenges with both of these approaches, too. In one of the GitHub issues I used to track work on one of CAFE's custom metadata blocks, I wrote about how fields in CAFE's custom metadata blocks overlap with fields in other metadata blocks, and that we'd want to resolve this design debt eventually.

@Saixel Saixel added Size: 0.5 and removed Size: 3 A percentage of a sprint. labels Oct 23, 2024
@Saixel Saixel pinned this issue Oct 23, 2024
@cmbz cmbz added the FY25 Sprint 9 FY25 Sprint 9 (2024-10-23 - 2024-11-06) label Oct 23, 2024

5 participants