Llama OCR

An npm library to run OCR for free with Llama 3.2 Vision.

Installation

npm i llama-ocr

Usage

import { ocr } from "llama-ocr";

const markdown = await ocr({
  filePath: "./trader-joes-receipt.jpg", // path to your image (soon PDF!)
  apiKey: process.env.TOGETHER_API_KEY, // Together AI API key
});
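
Before calling ocr, it can help to catch two easy mistakes early: passing a PDF (the comment above notes PDF support is still upcoming) and forgetting to set the API key. The sketch below is one possible pre-flight check. The prepareOcrInput helper and the OcrInput type are hypothetical names used for illustration and are not part of llama-ocr.

```typescript
// Hypothetical pre-flight helper; not part of the llama-ocr API.
interface OcrInput {
  filePath: string;
  apiKey: string;
}

function prepareOcrInput(filePath: string, apiKey: string | undefined): OcrInput {
  // The usage example says PDF support is "soon", so reject PDFs up front.
  if (/\.pdf$/i.test(filePath)) {
    throw new Error(`PDF input is not supported yet: ${filePath}`);
  }
  // Assumption: a missing key should fail before any request is sent.
  if (apiKey === undefined || apiKey.trim() === "") {
    throw new Error("Missing Together AI API key (TOGETHER_API_KEY).");
  }
  return { filePath, apiKey };
}

// Possible usage with the library call shown above:
// const markdown = await ocr(
//   prepareOcrInput("./trader-joes-receipt.jpg", process.env.TOGETHER_API_KEY),
// );
```

The returned object has the same filePath and apiKey fields as the usage example, so it can be passed to ocr directly.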

Hosted Demo

We have a hosted demo at LlamaOCR.com where you can try it out!

How it works

This library sends your image to a Llama 3.2 Vision endpoint hosted by Together AI and returns the extracted text as markdown. A free endpoint is available; paid endpoints for Llama 3.2 11B and Llama 3.2 90B offer faster performance and higher rate limits.

You can choose the endpoint with the model option, which defaults to Llama-3.2-90B-Vision and also accepts free or Llama-3.2-11B-Vision.
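
As a rough illustration of the model option, the sketch below builds an options object with one of the three model names listed above and falls back to the stated default when none is given. The three model strings come from this README. The OcrModel type, OcrOptions interface, and buildOcrOptions helper are assumptions made for the example and may not match the library's own types.

```typescript
// Model names as listed in this README; the type is an illustrative assumption.
const SUPPORTED_MODELS = ["Llama-3.2-90B-Vision", "Llama-3.2-11B-Vision", "free"] as const;
type OcrModel = (typeof SUPPORTED_MODELS)[number];

// The README states this is the default value of the model option.
const DEFAULT_MODEL: OcrModel = "Llama-3.2-90B-Vision";

// Hypothetical options shape mirroring the fields used in the usage example.
interface OcrOptions {
  filePath: string;
  apiKey?: string;
  model: OcrModel;
}

// Type guard: narrows an arbitrary string to one of the supported model names.
function isOcrModel(value: string): value is OcrModel {
  return (SUPPORTED_MODELS as readonly string[]).indexOf(value) !== -1;
}

// Hypothetical helper: validates the model name and applies the default.
function buildOcrOptions(filePath: string, apiKey?: string, model?: string): OcrOptions {
  const chosen = model === undefined ? DEFAULT_MODEL : model;
  if (!isOcrModel(chosen)) {
    throw new Error(`Unsupported model "${chosen}". Expected one of: ${SUPPORTED_MODELS.join(", ")}`);
  }
  return { filePath, apiKey, model: chosen };
}

// Possible usage, selecting the free endpoint:
// const markdown = await ocr(
//   buildOcrOptions("./trader-joes-receipt.jpg", process.env.TOGETHER_API_KEY, "free"),
// );
```

Validating the name locally means a typo such as a lowercase model string fails immediately instead of after a network round trip.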

Roadmap

Add support for local images OCR
Add support for remote images OCR
Add support for single page PDFs
Add support for multi-page PDFs OCR (take screenshots of PDF & feed to vision model)
Add support for JSON output in addition to markdown
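
The multi-page PDF item describes rendering each page to an image and feeding those images to the vision model. A minimal sketch of the second half of that idea follows, assuming the page screenshots already exist on disk. The ocrPages function, the PageOcr type, and the "---" page separator are hypothetical choices for illustration; this is not an existing llama-ocr feature, and the PDF-to-image step is left out.

```typescript
// An OCR call that takes one image path and resolves to markdown.
// In practice this could wrap the library's ocr function (assumption).
type PageOcr = (imagePath: string) => Promise<string>;

// Hypothetical helper: OCR each page image in order and join the results.
async function ocrPages(pageImagePaths: string[], ocrPage: PageOcr): Promise<string> {
  const pages: string[] = [];
  // Pages are processed one at a time, which is gentler on rate limits
  // than firing all requests in parallel (a design assumption).
  for (const imagePath of pageImagePaths) {
    pages.push(await ocrPage(imagePath));
  }
  // Separate pages with a markdown horizontal rule (arbitrary choice).
  return pages.join("\n\n---\n\n");
}

// Possible usage, wrapping the library call from the Usage section:
// const markdown = await ocrPages(["./page-1.png", "./page-2.png"], (filePath) =>
//   ocr({ filePath, apiKey: process.env.TOGETHER_API_KEY }),
// );
```

Passing the OCR call in as a parameter keeps the sketch independent of any particular model choice and makes it easy to try with a stub.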

Credit

This project was inspired by Zerox. Go check them out!
