Compare header and schema #127
Labels: complexity:high, enhancement, function:read_resource
Update: this can now be defined in `fieldMatch` #216

It is possible for an (invalid) Data Package to have discrepancies between the schema and the actual data. For example, the schema may define more or fewer columns than the file, or list them in a different order. `read_resource()` will silently let those through when the data types of the switched columns are compatible, which can lead to issues for the user (e.g. lat/lon are silently switched). Only when the data types are incompatible will readr return a parsing issue.

To avoid passing these issues silently, `read_resource()` should compare the header of the file with the schema and raise an error if they are not exactly the same. This implements the following spec:

Implementation considerations:
- Only compare when `replace_null(dialect$header, TRUE)` is `TRUE` (i.e. when the dialect does not set `header` to false). It might be useful to define `dialect_header` and reuse it here: frictionless-r/R/read_resource.R, line 356 in 421c22f
- The specs say that case should NOT be considered, so both the field names and `col_names` should be lowercased before comparing.
- To allow comparison, the header line of the file should be read separately from the main `read_delim()` call. `read_lines()` could be used, but `delim` and `encoding`/`locale` might have to be passed too.
- A resource can contain multiple files (e.g. `observations_1`, `observations_2`). Either all files are read and compared, or only the last one, cf. `add_resource()`.
- On a mismatch (different field names, different order, more or fewer fields), an error should be returned, similar to `check_schema()`: frictionless-r/R/check_schema.R, lines 65 to 69 in 421c22f
- Add a Validation section to explain what we validate:
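The implementation considerations could be combined roughly as in the following R sketch. It is an illustration under stated assumptions, not the planned implementation. `read_header()`, `check_header()` and `compare_headers()` are hypothetical names, not frictionless-r functions. Base R `readLines()`/`scan()` stand in for `readr::read_lines()` so the example has no dependencies. Only the `header` and `delimiter` dialect properties are honoured (no `quoteChar`, BOM or other `locale` handling). All files of a multi-file resource are checked. The error message is illustrative and does not reproduce the format `check_schema()` uses.

```r
# Sketch only: read_header(), check_header() and compare_headers() are
# hypothetical helpers, not part of the frictionless-r API.

# Read and split only the first line of a delimited file.
# Base R stand-in for readr::read_lines(); delimiter and encoding must be
# passed explicitly because the header is read outside read_delim().
read_header <- function(path, delim = ",", encoding = "UTF-8") {
  con <- file(path, encoding = encoding)
  on.exit(close(con))
  first_line <- readLines(con, n = 1, warn = FALSE)
  # scan() splits on the delimiter and removes surrounding double quotes;
  # na.strings = character(0) keeps a column literally named "NA" as text
  scan(
    text = first_line, what = character(), sep = delim, quote = "\"",
    na.strings = character(0), quiet = TRUE
  )
}

# Fail unless header and schema field names match in name, order and
# number, ignoring case.
check_header <- function(header, field_names, path = "file") {
  if (!identical(tolower(header), tolower(field_names))) {
    stop(
      sprintf(
        "Field names in the schema must match the header of `%s`.\nSchema: %s\nHeader: %s",
        path,
        paste(field_names, collapse = ", "),
        paste(header, collapse = ", ")
      ),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Compare every file of a resource, but only when the dialect declares a
# header row (header defaults to TRUE when absent, like
# replace_null(dialect$header, TRUE)).
compare_headers <- function(paths, field_names, dialect = list()) {
  has_header <- if (is.null(dialect$header)) TRUE else dialect$header
  if (!isTRUE(has_header)) {
    return(invisible(TRUE))
  }
  delim <- if (is.null(dialect$delimiter)) "," else dialect$delimiter
  for (path in paths) {
    check_header(read_header(path, delim = delim), field_names, path)
  }
  invisible(TRUE)
}

# Usage: a file whose columns are in a different order than the schema
tmp <- tempfile(fileext = ".csv")
writeLines(c("lon,lat", "4.35,50.85"), tmp)
try(compare_headers(tmp, c("lat", "lon")))  # reports the mismatch as an error
```

A single `identical(tolower(header), tolower(field_names))` test covers all three mismatch cases at once: different names, a different order, and more or fewer fields.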