Simple, Light-Weight Data Frames for Python

Dataiter's DataFrame is a class for tabular data similar to R's data.frame, implementing all common operations to manipulate data. It is under the hood a dictionary of NumPy arrays and thus capable of fast vectorized operations. You can consider it to be a light-weight alternative to Pandas with a simple and consistent API. Performance-wise Dataiter relies on NumPy and Numba and is likely to be at best comparable to Pandas.

Installation

# Latest stable version
pip install -U dataiter

# Latest development version
pip install -U git+https://github.com/otsaloma/dataiter

# Numba (optional)
pip install -U numba

Dataiter optionally uses Numba to speed up certain operations. If you have Numba installed, Dataiter will use it automatically. It's currently not a hard dependency, so you need to install it separately.

Quick Start

>>> import dataiter as di
>>> data = di.read_csv("data/listings.csv")
>>> data.filter(hood="Manhattan", guests=2).sort(price=1).head()
.
        id      hood zipcode guests    sqft price
     int64    string  string  int64 float64 int64
  ──────── ───────── ─────── ────── ─────── ─────
0 42279170 Manhattan   10013      2     nan     0
1 42384530 Manhattan   10036      2     nan     0
2 18835820 Manhattan   10021      2     nan    10
3 20171179 Manhattan   10027      2     nan    10
4 14858544 Manhattan              2     nan    15
5 31397084 Manhattan   10002      2     nan    19
6 22289683 Manhattan   10031      2     nan    20
7  7760204 Manhattan   10040      2     nan    22
8 43292527 Manhattan   10033      2     nan    22
9 43268040 Manhattan   10033      2     nan    23
.

Documentation

https://dataiter.readthedocs.io/

If you're familiar with either dplyr (R) or Pandas (Python), the comparison table in the documentation will give you a quick overview of the differences and similarities in common operations.

https://dataiter.readthedocs.io/en/stable/comparison.html

Development

To install a virtualenv for development, use

make venv

or, for a specific Python version

make PYTHON=python3.X venv

Name		Name	Last commit message	Last commit date
Latest commit History 626 Commits
.github/workflows		.github/workflows
bin		bin
data		data
dataiter		dataiter
doc		doc
tools		tools
validation		validation
.flake8		.flake8
.gitignore		.gitignore
.readthedocs.yaml		.readthedocs.yaml
AUTHORS.md		AUTHORS.md
COPYING		COPYING
Makefile		Makefile
NEWS.md		NEWS.md
README.md		README.md
benchmark-versions.sh		benchmark-versions.sh
benchmark.py		benchmark.py
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Simple, Light-Weight Data Frames for Python

Installation

Quick Start

Documentation

Development

About

Releases 59

Packages

Contributors 3

Languages

License

otsaloma/dataiter

Folders and files

Latest commit

History

Repository files navigation

Simple, Light-Weight Data Frames for Python

Installation

Quick Start

Documentation

Development

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 59

Packages 0

Contributors 3

Languages

Packages