Rewrite intros to focus on data frames
otsaloma committed Dec 14, 2024
1 parent a9426de commit 5089a64
Showing 3 changed files with 26 additions and 60 deletions.
44 changes: 14 additions & 30 deletions README.md
@@ -1,31 +1,16 @@
Python Classes for Data Manipulation
====================================
Simple, Light-Weight Data Frames for Python
===========================================

[![Test](https://github.com/otsaloma/dataiter/workflows/Test/badge.svg)](https://github.com/otsaloma/dataiter/actions)
[![Documentation Status](https://readthedocs.org/projects/dataiter/badge/?version=latest)](https://dataiter.readthedocs.io/en/latest/?badge=latest)
[![PyPI](https://img.shields.io/pypi/v/dataiter.svg)](https://pypi.org/project/dataiter/)
[![PyPI](https://img.shields.io/pypi/v/dataiter.svg)](https://pypi.org/project/dataiter)
[![Downloads](https://pepy.tech/badge/dataiter/month)](https://pepy.tech/project/dataiter)

Dataiter currently includes the following classes.

**`DataFrame`** is a class for tabular data similar to R's `data.frame`
or `pandas.DataFrame`. It is under the hood a dictionary of NumPy arrays
and thus capable of fast vectorized operations. You can consider this to
be a light-weight alternative to Pandas with a simple and consistent
API. Performance-wise Dataiter relies on NumPy and Numba and is likely
to be at best comparable to Pandas.

**`ListOfDicts`** is a class useful for manipulating data from JSON
APIs. It provides functionality similar to libraries such as
Underscore.js, with manipulation functions that iterate over the data
and return a shallow modified copy of the original. `attd.AttributeDict`
is used to provide convenient access to dictionary keys.

**`GeoJSON`** is a simple wrapper class that allows reading a GeoJSON
file into a `DataFrame` and writing a data frame to a GeoJSON file. Any
operations on the data are thus done with methods provided by the data
frame class. Geometry is read as-is into the "geometry" column, but no
special geometric operations are currently supported.
Dataiter's **`DataFrame`** is a class for tabular data similar to R's
`data.frame`, implementing all common operations to manipulate data. It
is under the hood a dictionary of NumPy arrays and thus capable of fast
vectorized operations. You can consider it to be a light-weight
alternative to Pandas with a simple and consistent API. Performance-wise
Dataiter relies on NumPy and Numba and is likely to be at best
comparable to Pandas.

## Installation

@@ -41,19 +26,18 @@ pip install -U numba
```

Dataiter optionally uses **Numba** to speed up certain operations. If
you have Numba installed and importing it succeeds, Dataiter will use it
automatically. It's currently not a hard dependency, so you need to
install it separately.
you have Numba installed, Dataiter will use it automatically. It's
currently not a hard dependency, so you need to install it separately.

## Documentation

https://dataiter.readthedocs.io/

If you're familiar with either dplyr (R) or Pandas (Python), the
comparison table in the documentation will give you a quick overview of
the differences and similarities.
the differences and similarities in common operations.

https://dataiter.readthedocs.io/en/latest/comparison.html
https://dataiter.readthedocs.io/en/stable/comparison.html

## Development

40 changes: 11 additions & 29 deletions doc/index.rst
@@ -1,35 +1,17 @@
Dataiter Documentation
======================

Dataiter is a Python package of classes for data manipulation. Dataiter
is intended for practical data science and data engineering work with a
focus on providing a simple and consistent API for common operations.
Currently included are the following classes.

:class:`.DataFrame`
A class for tabular data similar to R's ``data.frame`` or
``pandas.DataFrame``. It is under the hood a dictionary of NumPy
arrays and thus capable of fast vectorized operations. You can
consider this to be a light-weight alternative to Pandas with a
simple and consistent API. Performance-wise Dataiter relies on NumPy
and Numba and is likely to be at best comparable to Pandas.

:class:`.ListOfDicts`
A class useful for manipulating data from JSON APIs. It provides
functionality similar to libraries such as Underscore.js, with
manipulation functions that iterate over the data and return a
shallow modified copy of the original. ``attd.AttributeDict`` is used
to provide convenient access to dictionary keys.

:class:`.GeoJSON`
A simple wrapper class that allows reading a GeoJSON file into a
:class:`.DataFrame` and writing a data frame to a GeoJSON file. Any
operations on the data are thus done with methods provided by the
data frame class. Geometry is read as-is into the "geometry" column,
but no special geometric operations are currently supported.

.. warning:: Dataiter is still evolving and the API is subject to
breaking changes.
Dataiter's :class:`.DataFrame` is a class for tabular data similar to R's
``data.frame``, implementing all common operations to manipulate data. It is
under the hood a dictionary of NumPy arrays and thus capable of fast vectorized
operations. You can consider it to be a light-weight alternative to Pandas with
a simple and consistent API. Performance-wise Dataiter relies on NumPy and Numba
and is likely to be at best comparable to Pandas.

Additionally Dataiter includes :class:`.ListOfDicts`, a class for manipulating
hierarchical data, such as from JSON APIs or document databases, and
:class:`.GeoJSON`, a class for manipulating data from GeoJSON files in a data
frame.

.. toctree::
:maxdepth: 1
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -5,7 +5,7 @@ build-backend = "hatchling.build"
[project]
name = "dataiter"
dynamic = ["version"]
description = "Classes for data manipulation"
description = "Simple, light-weight data frames for Python"
readme = "README.md"
license = "MIT"
requires-python = ">=3.9.0"