Replace Pandas with Arrow #22

otsaloma · 2023-05-31T20:14:00Z

We notably use Pandas to implement DataFrame.read_csv. That could probably be replaced with pyarrow.csv.read_csv, which would let us remove Pandas from the required dependencies and keep it as an optional dependency needed only by the from_pandas and to_pandas methods (with Pandas imported inside the method body).

Arrow seems to be a lot faster at reading CSV files, and we need it anyway for reading and writing Parquet files, so the switch would probably let us drop Pandas, a dependency we've never liked and have sought to replace.
