skrub

skrub (formerly dirty_cat) is a Python library that makes it easier to prepare tables for machine learning.

If you like the package, spread the word and ⭐ this repository! You can also join the Discord server.

What can skrub do?

The goal of skrub is to bridge the gap between tabular data sources and machine-learning models.

skrub provides high-level tools for joining dataframes (Joiner, AggJoiner, ...), encoding columns (MinHashEncoder, ToCategorical, ...), building a pipeline (TableVectorizer, tabular_learner, ...), and interactively exploring your data (TableReport).

>>> from skrub.datasets import fetch_employee_salaries
>>> dataset = fetch_employee_salaries()
>>> df = dataset.X
>>> y = dataset.y
>>> df.iloc[0]
gender                                                                     F
department                                                               POL
department_name                                         Department of Police
division                   MSB Information Mgmt and Tech Division Records...
assignment_category                                         Fulltime-Regular
employee_position_title                          Office Services Coordinator
date_first_hired                                                  09/22/1986
year_first_hired                                                        1986

>>> from sklearn.model_selection import cross_val_score
>>> from skrub import tabular_learner
>>> cross_val_score(tabular_learner('regressor'), df, y)
array([0.89370447, 0.89279068, 0.92282557, 0.92319094, 0.92162666])

See our examples.

Installation

skrub can easily be installed via pip or conda. For more installation information, see the installation instructions.

Contributing

The best way to support the development of skrub is to spread the word!

Also, if you are already a skrub user, we would love to hear about your use cases and challenges in the Discussions section.

To report a bug or suggest enhancements, please open an issue.

If you want to contribute directly to the library, see the how to contribute page on the website for more information.
