    Repositories list

    • Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
      HTML
      Apache License 2.0
      7879.4k21051 Updated Dec 14, 2024
    • docs

      Public
      Documentation for all Unstructured products and libraries
      MDX
      175012 Updated Dec 14, 2024
    • A TypeScript client for the Unstructured hosted API
      TypeScript
      MIT License
      124253 Updated Dec 14, 2024
    • Python
      Apache License 2.0
      126591267 Updated Dec 14, 2024
    • A Python client for the Unstructured hosted API
      Python
      MIT License
      178495 Updated Dec 14, 2024
    • HTML
      Apache License 2.0
      18235121 Updated Dec 13, 2024
    • Store Dockerfiles and Packer configs for images to use as a base to build upon
      Shell
      Apache License 2.0
      2312 Updated Dec 6, 2024
    • Python
      Apache License 2.0
      1301 Updated Dec 2, 2024
    • Reference architecture that provides a set of guidelines and best practices for implementing a central AI API gateway to empower various line-of-business units in an organization to leverage Azure AI services
      Bicep
      MIT License
      42000 Updated Nov 22, 2024
    • Python
      Apache License 2.0
      531651912 Updated Oct 25, 2024
    • Script to accompany the AWS blog post on unstructured data ETL with Unstructured Ingest library
      Python
      Apache License 2.0
      0000 Updated Oct 16, 2024
    • Pairing Technical Challenge
      TypeScript
      0000 Updated Sep 4, 2024
    • FedRAMP formatted model cards
      0100 Updated Aug 29, 2024
    • Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
      Python
      Apache License 2.0
      7.9k3100 Updated Aug 23, 2024
    • danswer

      Public
      Gen-AI Chat for Teams - Think ChatGPT if it had access to your team's unique knowledge.
      Python
      Other
      1.4k901 Updated Aug 23, 2024
    • A Python wrapper for Google Tesseract
      Python
      Apache License 2.0
      727300 Updated Aug 15, 2024
    • JS Client Batch Processing
      JavaScript
      0000 Updated Jul 31, 2024
    • Main package repository for production Wolfi images
      C
      Other
      270000 Updated Jul 10, 2024
    • .github

      Public
      2021 Updated May 30, 2024
    • pipeline-sec-filings

      Public archive
      Preprocessing pipeline notebooks and API supporting text extraction from SEC documents
      Jupyter Notebook
      Apache License 2.0
      2914057 Updated Jan 1, 2024
    • Python
      Apache License 2.0
      8804 Updated Oct 2, 2023
    • Pipeline for extracting information from Army OERs
      Jupyter Notebook
      Apache License 2.0
      5816 Updated Oct 1, 2023
    • Pipeline for converting PDFs to raw text with PaddleOCR
      Jupyter Notebook
      Apache License 2.0
      62115 Updated Aug 21, 2023
    • langchain

      Public
      ⚡ Building applications with LLMs through composability ⚡
      Python
      MIT License
      16k900 Updated Aug 18, 2023
    • Python
      Apache License 2.0
      102821 Updated Aug 4, 2023
    • Terraform module that implements a web app on ECS and supports autoscaling, CI/CD, monitoring, ALB integration, and much more.
      HCL
      Apache License 2.0
      154200 Updated Jul 6, 2023
    • Terraform module which implements an ECS service which exposes a web service via ALB.
      HCL
      Apache License 2.0
      194000 Updated Jul 6, 2023
    • Pipeline for layout extraction
      Python
      Apache License 2.0
      1111 Updated Jul 3, 2023
    • Jupyter Notebook
      Apache License 2.0
      2405 Updated Jul 1, 2023
    • Preprocessing pipeline notebooks and API supporting text extraction from receipt images
      Jupyter Notebook
      Apache License 2.0
      1206 Updated Jun 20, 2023