Replace Pandas with Arrow #22

Open
otsaloma opened this issue May 31, 2023 · 0 comments

We're notably using Pandas for DataFrame.read_csv. That could probably be replaced with pyarrow.csv.read_csv, which would allow removing Pandas from the list of dependencies, leaving it as an optional dependency only needed for the from_pandas and to_pandas methods (with Pandas imported within the method body).

Arrow seems to be a lot faster at reading CSV files, and we need it anyway for reading and writing Parquet files, so the switch would probably let us drop Pandas, a dependency we've never liked and have sought to replace.
