Releases · ucbepic/docetl
0.2.0
What's Changed
- Sample by @redhog in #92
- Outliers by @redhog in #91
- Sample (+ Outlier Functionality) Operation by @shreyashankar in #100
- #91 document > item renaming by @garuna-m6 in #103
- chore: update dependency versions by @shreyashankar in #105
- docs: add 'output' argument to ResolveOp code example by @goutham794 in #106
- fix: edit agent for synth resolve task by @shreyashankar in #109
- fix: update python api with cluster & sample ops by @shreyashankar in #113
- docs: add sample and cluster to docs by @shreyashankar in #114
- New api by @redhog in #115
- fix: make docs work by @shreyashankar in #119
- feat: adding human in the loop for split-map-gather decomp by @shreyashankar in #120
- fix: cache partial pipeline runs by @shreyashankar in #122
- mark as flaky test by @shreyashankar in #123
- only compare distinct pairs in resolve by @shreyashankar in #124
- LinkResolveOperation by @redhog in #117
- Fix Resolve and Map progress bars by @michielree in #126
- Better auto batching for resolve LLM calls by @sushruth2003 in #128
- Merge auto batching PR by @shreyashankar in #129
- v1 of the UI! by @shreyashankar in #118
- Upgrade litellm version to v1.51.0-stable by @Tendo33 in #131
- feat: adding batching for map and filter calls by @shreyashankar in #133 (a sketch of the idea follows this list)
- docs: link filter to map by @shreyashankar in #135
- fix: optimizer bug where the reduce operation can't be optimized with azure by @shreyashankar in #136
- fix: only call os.makedirs on non-empty paths by @shreyashankar in #137
- UI: add basic chat-based assistant by @shreyashankar in #139
- fix: clear and run button should also bypass cache by @shreyashankar in #140
- feat: UDFs support added by @staru09 in #138
- Merge staging into MAIN by @shreyashankar in #141
- chore: load envs from current directory by @plpycoin in #142
- feat: add optimizer in the UI by @shreyashankar in #143
- chore: Use environment variable configuration files to set host, port by @plpycoin in #146
- fix: render cells in markdown and fix resizable panels by @shreyashankar in #148
- hotfix: an error occurred when running make run-ui by @plpycoin in #152
- fix: switch crypto uuid to regular uuid by @shreyashankar in #155
- feat: provide defaults in the UI chat by @shreyashankar in #156
- fix: allow reduce_key types to be lists by @shreyashankar in #162
- fix: allow user to pass in litellm completion kwargs by @shreyashankar in #163
- Sagemaker doesn't yet support tools by @njbrake in #165
- fix: save validation and gleaning by @shreyashankar in #167
- Remove unnecessary console log statements by @shreyashankar in #168
- chore: update docs to link to paper by @shreyashankar in #174
- feature: add automatic optimization check to the UI (opt in) by @shreyashankar in #175
- fix: ts errors by @shreyashankar in #176
- fix: ts errors by @shreyashankar in #177
- feat: tie histograms to output types by @shreyashankar in #178
- Change for issue: #180 by @yogitha2023 in #181
- make test less flaky by @shreyashankar in #182
- feat: add code operations to the ui (#169) by @shreyashankar in #183
- hotfix: Import declaration conflicts with local declaration of 'Operation'. by @plpycoin in #185
- chore: make output visualizations better by @shreyashankar in #186
- chore: edit copy for the UI by @shreyashankar in #189
- feat: add pdf upload for the UI by @shreyashankar in #190
- fix sampling in second op onwards by @shreyashankar in #192
- Accept a custom parameter specifying the number of concurrent threads for the code_* operations by @plpycoin in #195
- chore: fix optimizer and API for user study by @shreyashankar in #196
- add python multipart to requirements by @samelamin in #197
- fix: add docs to describe how to set up .env.local by @shreyashankar in #199
- Intermediates fix by @samelamin in #200
- fix: change default settings by @shreyashankar in #202
- feat: Add azure doc intelligence to PDF upload in the UI by @shreyashankar in #203
- feat: change histograms to be bar charts for categorical columns by @shreyashankar in #204
- ensure the value is converted to a string when it's not an object by @plpycoin in #206
- feat: add docker file by @shreyashankar in #205
- feat: add basic llm call observability to the UI by @shreyashankar in #209
- feat: have global system prompt and description by @shreyashankar in #210
- fix: small errors in user study by @shreyashankar in #211
- fix: support markdown files by @shreyashankar in #213
- Add variable descriptions that limit the number of concurrent threads by @plpycoin in #215
- feat: add column view dialog by @shreyashankar in #214
- Update reduce folding instruction to be clearer by @shreyashankar in #217
- fix: make histogram calculation and rendering less blocking by @shreyashankar in #218
- edit system prompt in prompt improvement by @shreyashankar in #223
- fix: edit system prompt for prompt rewriter by @shreyashankar in #224
- Fix cache naming by @sushruth2003 in #220
- refactor recursive optimization for map operations by @shreyashankar in #225
- feat: adding namespaces by @shreyashankar in #226
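Several of the items above (#128, #133) revolve around batching LLM calls for map and filter operations instead of issuing one request per document. The snippet below is a minimal, self-contained sketch of that idea only; `run_map_in_batches`, `call_llm_batch`, and `batch_size` are hypothetical names, not DocETL's actual API.

```python
# Conceptual sketch only: batching map-style LLM calls (cf. #128, #133).
# All names here are hypothetical, not DocETL's real interface.
from typing import Callable, List


def run_map_in_batches(
    items: List[dict],
    call_llm_batch: Callable[[List[dict]], List[dict]],
    batch_size: int = 10,
) -> List[dict]:
    """Process items in fixed-size batches instead of one call per item."""
    results: List[dict] = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        results.extend(call_llm_batch(batch))  # one LLM request per batch
    return results
```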
New Contributors
- @garuna-m6 made their first contribution in #103
- @goutham794 made their first contribution in #106
- @michielree made their first contribution in #126
- @sushruth2003 made their first contribution in #128
- @Tendo33 made their first contribution in #131
- @plpycoin made their first contribution in #142
- @njbrake made their first contribution in #165
- @yogitha2023 made their first contribution in #181
- @samelamin made their first contribution in #197
Full Changelog: 0.1.7...0.2.0
0.1.7
What's Changed
- Add Operation Hash and Caching Functionality by @shreyashankar in #61
- docs: improving documentation for pipeline api by @shreyashankar in #62
- refactor: adding website code by @shreyashankar in #65
- (partial) fix: add exponential backoff for rate limit errors by @shreyashankar in #66 (a sketch of the pattern follows this list)
- fix: enable gleaning llm calls to work by @shreyashankar in #70
- Added llama-index based parsers by @redhog in #71
- Merging staging to main by @shreyashankar in #74
- feat: add pdfgpt to parse PDFs by @staru09 in #67
- Merging staging to main (from add gpt_pdf) by @shreyashankar in #76
- fix: disable additional properties for gemini by @shreyashankar in #73
- Throttle by @redhog in #64
- feat: support rate limits by @shreyashankar in #79
- feat: add verbose parameter for gleaning by @shreyashankar in #80
- Parsers can now return any number of fields, and can access the whole item by @redhog in #81
- Merge staging to main (after parsers refactor) by @shreyashankar in #82
- docs: add sample parameter by @shreyashankar in #87
- Clustering by @redhog in #84
- Merge staging to main (after adding cluster operator) by @shreyashankar in #88
- feat: output to csv if user specifies a csv file by @shreyashankar in #89
- Rename internal methods by @redhog in #90
- Nits for cleaning up API. by @shreyashankar in #93
- refactor: move validation and gleaning into call llm by @shreyashankar in #98
- Staging to main by @shreyashankar in #99
- feat: add reduce operation lineage by @shreyashankar in #101
- fix: change gleaning prompt to validation_prompt by @shreyashankar in #102
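PR #66 above adds exponential backoff for rate-limit errors. Below is a minimal sketch of that general pattern, assuming a generic `RateLimitError` and retry budget rather than DocETL's actual exception handling.

```python
# Conceptual sketch only: exponential backoff on rate-limit errors (cf. #66).
# The exception type and retry limits are assumptions, not DocETL's code.
import random
import time


class RateLimitError(Exception):
    """Stand-in for the provider-specific rate-limit exception."""


def call_with_backoff(make_request, max_retries: int = 5):
    for attempt in range(max_retries):
        try:
            return make_request()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            # Sleep 1s, 2s, 4s, ... plus jitter before retrying.
            time.sleep(2 ** attempt + random.random())
```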
Full Changelog: 0.1.6...0.1.7
0.1.6
What's Changed
- docs: fix resolve docs by @shreyashankar in #27
- docs: link to ollama chat by @shreyashankar in #28
- feat: show better progress bars for operations by @shreyashankar in #30
- Add batching support to map operations with configurable parameters by @orban in #16
- feat: implement batch limit in map operations by @shreyashankar in #31
- Add Dataset Class and Parsing Tools by @shreyashankar in #32
- docs: improve clarity for custom parsing by @shreyashankar in #34
- Add Azure Document Intelligence Read Tool by @shreyashankar in #36
- fix: read .env from the user's cwd and change tool schema so ollama llama models work better by @shreyashankar in #38
- Bugfix for sqlite3 operation error in cache by @redhog in #40
- fix: make diskcache reads thread-safe by @shreyashankar in #42 (a sketch of the pattern follows this list)
- fix: template in tutorial.md by @shreyashankar in #43
- RateLimit error by @redhog in #39
- feat: add paddleocr by @shreyashankar in #44
- Entrypoints by @redhog in #45
- fix: don't cache results of bad llm calls by @shreyashankar in #52
- fix: default to gpt 4o tokenizer by @shreyashankar in #57
- feat: print out LLM message history and tools when there's an InvalidOutputError by @shreyashankar in #53
- feat: don't use tool calling for ollama/OSS models if the output schema is just one param by @shreyashankar in #59
- docs: add documentation for using split gather pipeline by @shreyashankar in #60
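PR #42 above makes diskcache reads thread-safe. The sketch below illustrates the general pattern of serializing cache access with a lock; the `ThreadSafeCache` class is hypothetical and the real fix may differ.

```python
# Conceptual sketch only: guarding cache access with a lock so concurrent
# threads don't trip over a shared cache handle (cf. #42). This class is
# hypothetical, not DocETL's code.
import threading


class ThreadSafeCache:
    def __init__(self, backing: dict):
        self._backing = backing          # stands in for an on-disk cache
        self._lock = threading.Lock()

    def get(self, key, default=None):
        with self._lock:                 # serialize access across threads
            return self._backing.get(key, default)

    def set(self, key, value):
        with self._lock:
            self._backing[key] = value
```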
Full Changelog: 0.1.5...0.1.6
0.1.5
What's Changed
- fix: add error messages if model doesn't support tool calling by @shreyashankar in #26
Full Changelog: 0.1.4...0.1.5
v0.1.4
What's Changed
- fix: manually try to parse ollama outputs, even if it is not valid json by @shreyashankar in #25
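The fix in #25 falls back to salvaging JSON from model output that is not strictly valid. A minimal sketch of one such recovery strategy follows; the actual logic in DocETL may differ.

```python
# Conceptual sketch only: recovering a JSON object from loosely formatted
# model output (cf. #25). The fallback strategy here is an assumption.
import json


def parse_llm_json(text: str) -> dict:
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Fall back to the first {...} span in the text, if any.
        start, end = text.find("{"), text.rfind("}")
        if start != -1 and end > start:
            return json.loads(text[start:end + 1])
        raise
```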
Full Changelog: 0.1.3...0.1.4
v0.1.3
What's Changed
- quality of life: show error when trying to execute resolve without blocking by @shreyashankar in #12
- Optionally persist intermediates for reduce by @shreyashankar in #14
- Add a save config method to the Python API by @shreyashankar in #15
- Remove unnecessary name parameter from parallel map operation. by @shreyashankar in #17
- Add podcast to readme by @shreyashankar in #18
- fix: remove openai client call in utils.py by @shreyashankar in #22
- Add Configurable Timeouts for Operations and Ollama Integration Documentation by @shreyashankar in #24
Full Changelog: 0.1.2...0.1.3
v0.1.2
This release fixes a bug where the typer dependency was missing. It also adds a Python API.
0.1.1
Full Changelog: 0.1.0...0.1.1