StudyResult.sourceDataSet extends InformationEntity.derivedFrom...I have a question about that. #159

Open
larrybabb opened this issue Jul 1, 2024 · 1 comment
Labels
data modeling Questions, proposals, considerations concerning the core data model. help wanted Extra attention is needed question Further information is requested VA type definition Base ticket for defining and scoping a VA type

Comments

@larrybabb
Contributor

@mbrush InformationEntity.derivedFrom is an unordered array of InformationEntity objects. However, StudyResult.sourceDataset appears to be designed to override the array nature of its parent derivedFrom property and make it a single DataSet (not an array of DataSets).

I have referenced this here in the CohortAlleleFrequencyStudyResult schema (which is a direct copy of the sourceDataSet from the StudyResult).

While I get the idea of using derivedFrom to represent the dataset from which the StudyResult was obtained, I think we need to weigh whether

  1. sourceDataset should be some type of RecordMetadata type
  2. InformationEntity.derivedFrom should NOT be an array but rather a single source (which in turn would have its own derivedFrom)
  3. sourceDataset should NOT be extending derivedFrom to begin with

I'm in favor of #2.

For now, I am going to make StudyResult.sourceDataset an array of DataSet types and assume folks will only put 1 entry in the array. But this is not a reasonable final solution IMO.
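As a rough illustration of the interim shape described above, here is a minimal Python sketch. It is not the actual schema: the dataclasses, the `id` field, and the validation in `__post_init__` are assumptions made for the example; only the names InformationEntity, derivedFrom, DataSet, StudyResult, and sourceDataSet come from this thread.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InformationEntity:
    id: str  # hypothetical identifier field, for illustration only
    # Parent property: an unordered array of InformationEntity objects.
    derivedFrom: List["InformationEntity"] = field(default_factory=list)


@dataclass
class DataSet(InformationEntity):
    pass


@dataclass
class StudyResult(InformationEntity):
    # Interim approach: keep the array shape of derivedFrom and only
    # narrow the item type to DataSet. A single-source result simply
    # carries a one-element list.
    sourceDataSet: List[DataSet] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Assumed check: every entry must be a DataSet.
        for item in self.sourceDataSet:
            if not isinstance(item, DataSet):
                raise TypeError("sourceDataSet entries must be DataSet objects")


# A StudyResult derived from exactly one DataSet.
single_source = StudyResult(id="sr:1", sourceDataSet=[DataSet(id="ds:1")])
```

Keeping the list shape means the child property stays structurally compatible with the parent array; the "only one entry" expectation is a convention here, not something the type enforces.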

@mbrush
Contributor

mbrush commented Jul 4, 2024

Hmm, I actually favor the solution you implemented: make StudyResult.sourceDatasets an array of DataSets, and if everything in the StudyResult was derived from a single DataSet, then there will be only one member of this array.

I don't think your solution 2 above is right, because the use case for allowing multiple values here isn't to track a linear trail of 1:1 derivations, as your comments imply. The idea is that an InformationEntity can be derived from multiple direct 'source' InformationEntities. E.g., a CAF StudyResult may include data about its focusAllele that was pulled from two distinct DataSets produced by a given study.
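To make the contrast concrete, here is a small Python sketch of the two shapes under discussion. The dict layout, the `id` values, and both helper functions are hypothetical and only meant to illustrate the distinction; they are not taken from the schema.

```python
# Multiple direct sources: one StudyResult, two sibling DataSets.
caf_study_result = {
    "type": "CohortAlleleFrequencyStudyResult",
    "focusAllele": "allele:example",  # placeholder value
    "sourceDataSet": [
        {"type": "DataSet", "id": "dataset:A"},
        {"type": "DataSet", "id": "dataset:B"},
    ],
}

# Linear 1:1 trail (the shape solution 2 would allow): each entity
# points at a single source, which points at its own single source.
linear_trail = {
    "id": "result:1",
    "derivedFrom": {"id": "dataset:A", "derivedFrom": {"id": "dataset:raw"}},
}


def direct_sources(study_result):
    """Return the ids of all DataSets the result was directly derived from."""
    return [ds["id"] for ds in study_result.get("sourceDataSet", [])]


def trail(entity):
    """Walk a single-valued derivedFrom chain and return the ids in order."""
    ids = []
    while entity is not None:
        ids.append(entity["id"])
        entity = entity.get("derivedFrom")
    return ids
```

With a single-valued derivedFrom, `dataset:B` would have no place in the record, because it is a sibling source of `dataset:A` rather than an ancestor of it.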

I don't think your solution 1 is right because the sourceDataSet property is about the derivation of the information content found in a StudyResult, not about specific concrete serializations of the StudyResult (which is what the RecordMetadata object is for).
