⊧ dupi

Dupi is an engine for identifying and exploring duplicative text in sets of documents.

Status

Dupi is in an alpha/early beta stage of development. Please feel free to give it a try and file issues. We have run it successfully on several document sets, but it still needs more testing.

Input

Throw hundreds of thousands of text documents at it. For other document formats, extract the text first and send that to dupi.

Output

Find and query for repeated chunks of text.

Tutorial

Tutorial

Design

Design Document

Library Reference

Go Reference