[Feat] Import CSV files with column headers and correct column data types #4450

cmbarton · 2024-10-04T18:49:43Z

CSV files are the most common and widely readable file format for tabular data. But GRASS has no easy way to import them correctly.

db.in.ogr now imports CSV files with correct column headers (see #943). But it imports all columns as strings, regardless of the data in the columns. Because tables imported this way do not have any numeric columns they cannot be used to generate points or to connect with an existing spatial dataset via a common cat field

v.in.ascii can import CSV files and create vector points. It does correctly recognize data types in CSV columns--including a CAT column. But for unknown reasons it does not import the column headers. Of course, you can create new column headers, but this becomes a PITA with many columns--especially as it requires an arcane format to do so.

A grass extension v.in.csv does some of this. However, it requires a separate Python package, pyproj, to do so. It also does not automatically recognize data types in columns like v.in.ascii does, although you can specify which columns are numeric (integer or real). It also seems to assume that the CSV table only has point coordinates in latlon.

So db.in.ogr needs to recognize column data types, or allow a user to specify them, and v.in.ascii needs to recognize column headers. Ideally, perhaps, would be a combination of these--something that could import a CSV table correctly and optionally create vector points from that table, optionally identify a CAT column, and optionally allow a user to define column data types to override the automatic data types recognition.

Related issue: #943

neteler · 2024-10-04T21:43:49Z

Did you see https://grass.osgeo.org/grass-stable/manuals/db.in.ogr.html#import-csv-file (using .cvst file for column type recognition)?

cmbarton · 2024-10-04T21:51:41Z

Yes. I've done that. It is equally a pain, especially with multi-column tables and sometimes does not work correctly. Most other data analytical programs manage to open and read these ASCII text files pretty well. v.in.ascii comes very close and is only missing recognizing column names for creating points. If it had the ability to do this and the option to import the table without creating points, it would cover >95% of the cases.

rouault · 2024-10-05T16:44:22Z

The OGR CSV driver has a AUTODETECT_TYPE=YES open option

$ printf "id,float_val,my_str\n1,2.5,foo\n" > test.csv
$ ogrinfo test.csv -oo AUTODETECT_TYPE=YES  -al -q

Layer name: test
OGRFeature(test):1
  id (Integer) = 1
  float_val (Real) = 2.5
  my_str (String) = foo

cmbarton · 2024-10-05T17:31:31Z

Interesting. Could this be a flag in db.in.ogr? To test, is this something that can be entered in the ogr_doo argument?

rouault · 2024-10-05T18:11:33Z

To test, is this something that can be entered in the ogr_doo argument?

don't know the GRASS side of things, but from https://grass.osgeo.org/grass84/manuals/v.in.ogr.html, I assume this should be gdal_doo=AUTODETECT_TYPE=YES

cmbarton · 2024-10-06T02:00:49Z

Yes. This works!. It should be the default in db.in.ogr, perhaps with a flag to disable it and use a *.cvst instead.

Too bad that we can't do this with v.in.ogr, since it won't accept cvs files with defined x and y coord. columns.

But if v.in.ascii can just keep the header field, then that would be a good alternative too.

ecodiv · 2024-10-06T20:11:51Z

@cmbarton I fully agree with your points. One thing is that the column names are stored in the history of the vector layer.
I remember I used that in a Python script to rename the names of the attribute table back to the original names. It was something similar to the code below. An admittedly quick and dirty solution :-/

# Get the history, filter out the column names, split them into a list
a = list(gs.parse_command("v.info", map="AAA", flags="h").keys())[-1].split("|")

# Get a list with the column names of the vector layer imported using v.in.ascii
# and remove 'cat' from the list.
b = list(gs.parse_command("db.columns", table="AAA").keys())[1:]

# Replace the column names with the original names.
for id, name in enumerate(b):
    gs.parse_command("v.db.renamecolumn", map="AAA", column=f"{name},{a[id]}")

cmbarton · 2024-10-09T16:56:39Z

Thanks for the workaround .

cmbarton · 2024-10-25T21:15:43Z

Splitting this issue into 2 separate feature requests #4593 and #4594 because it deals with 2 different GRASS commands.

cmbarton added the enhancement New feature or request label Oct 4, 2024

This was referenced Oct 25, 2024

[Feat] db.in.ogr should recognize column data types when importing CSV files #4593

Open

[Feat] Enable v.in.ascii to read column names from ascii data table and use them for column names in attribute table created #4594

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feat] Import CSV files with column headers and correct column data types #4450

[Feat] Import CSV files with column headers and correct column data types #4450

cmbarton commented Oct 4, 2024

neteler commented Oct 4, 2024

cmbarton commented Oct 4, 2024

rouault commented Oct 5, 2024

cmbarton commented Oct 5, 2024

rouault commented Oct 5, 2024

cmbarton commented Oct 6, 2024

ecodiv commented Oct 6, 2024

cmbarton commented Oct 9, 2024

cmbarton commented Oct 25, 2024

[Feat] Import CSV files with column headers and correct column data types #4450

[Feat] Import CSV files with column headers and correct column data types #4450

Comments

cmbarton commented Oct 4, 2024

neteler commented Oct 4, 2024

cmbarton commented Oct 4, 2024

rouault commented Oct 5, 2024

cmbarton commented Oct 5, 2024

rouault commented Oct 5, 2024

cmbarton commented Oct 6, 2024

ecodiv commented Oct 6, 2024

cmbarton commented Oct 9, 2024

cmbarton commented Oct 25, 2024