Categorize

Small Python script to categorize data using RegEx.

Install

Simply clone this repo:

git clone [email protected]:databulle/categorize.git  

Use

You'll need:

  • A file containing the rules you want to apply (see syntax below),
  • A file containing the items you want to categorize (one item per line).

Then run the script with both files:

python categorize.py -r example-rules.txt -i example-data.csv  

By default, the script writes its output to a semicolon-separated CSV file named categorized.csv.
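As an illustration of what a semicolon-separated output looks like, here is a minimal sketch using Python's standard csv module. The two-column layout (item, then category) and the helper name write_rows are assumptions made for this example; the README does not document the exact columns written by categorize.py.

```python
import csv
import io


def write_rows(rows, handle, sep=";"):
    """Write (item, category) pairs to an open text handle.

    Hypothetical helper: the column layout is an assumption,
    not the documented output format of categorize.py.
    """
    writer = csv.writer(handle, delimiter=sep, lineterminator="\n")
    writer.writerows(rows)


# Write to an in-memory buffer so the example has no side effects.
buffer = io.StringIO()
write_rows([("red apple", "Red"), ("sky blue", "Blue")], buffer)
print(buffer.getvalue())
```

Passing a different sep value, such as a comma or a tab, changes only the delimiter, which is the behavior the -s option describes.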

Command line arguments:

  • -h, --help shows help message and exits
  • -r RULES, --rules RULES indicates rules file (required)
  • -i INPUT, --input INPUT indicates input file (required)
  • -o OUTPUT, --output OUTPUT indicates output file (optional, default: categorized.csv)
  • -s SEP, --sep SEP indicates output file CSV separator (optional, default: ; )
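
The options listed above could be declared with argparse roughly as follows. This is a sketch based only on the flags and defaults documented here, not the script's actual source; the function name build_parser and the help strings are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Small Python script to categorize data using RegEx."
    )
    parser.add_argument("-r", "--rules", required=True, help="rules file")
    parser.add_argument("-i", "--input", required=True, help="input file")
    parser.add_argument(
        "-o", "--output", default="categorized.csv", help="output file"
    )
    parser.add_argument(
        "-s", "--sep", default=";", help="output file CSV separator"
    )
    return parser


# Parse the same arguments as the usage example above.
args = build_parser().parse_args(
    ["-r", "example-rules.txt", "-i", "example-data.csv"]
)
print(args.rules, args.input, args.output, args.sep)
```

Because -o and -s have defaults, only -r and -i must be supplied on the command line.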

Rules syntax

This script uses standard Python RegEx syntax and associates each pattern with a category name.
Each line must start with the regular expression pattern, followed by a semicolon, and end with the category name:

REGEX;CATEGORY_NAME  

You'll find some examples in the example-rules.txt file.
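
To make the REGEX;CATEGORY_NAME format concrete, here is a minimal sketch of how such rules could be parsed and applied with Python's re module. Several choices are assumptions rather than documented behavior of categorize.py: splitting each line on its last semicolon (so a pattern may itself contain semicolons), using re.search rather than re.match, letting the first matching rule win, and returning an empty string when nothing matches. The function names and sample rules are hypothetical.

```python
import re


def parse_rules(lines):
    """Turn REGEX;CATEGORY_NAME lines into (compiled pattern, category) pairs."""
    rules = []
    for line in lines:
        line = line.strip()
        if not line or ";" not in line:
            continue  # skip blank or malformed lines
        # Assumption: the category follows the LAST semicolon on the line.
        pattern, _, category = line.rpartition(";")
        rules.append((re.compile(pattern), category))
    return rules


def categorize(item, rules, default=""):
    """Return the category of the first rule whose pattern is found in item."""
    for pattern, category in rules:
        if pattern.search(item):
            return category
    return default


# Hypothetical rules, not taken from example-rules.txt.
rules = parse_rules(["^red|crimson;Red", "blue$;Blue"])
for item in ["red apple", "dark crimson", "sky blue", "green leaf"]:
    print(item, "->", categorize(item, rules))
```

With first-match-wins semantics, the order of lines in the rules file matters, so more specific patterns would need to come before general ones.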

Contributing

If you wish to contribute to this repository or to report an issue, please use GitLab.
