Releases: bespokelabsai/curator
0.1.13
What's Changed
- Fix issues around litellm, to support Gemini Flash Thinking model.
- Add support for o1.
Details
- Ryan marten patch 1 by @RyanMarten in #273
- Clean ups in llm.py by @madiator in #274
- Put the examples in respective folders and add requirements.txt everywhere by @madiator in #275
- Catch catch-all Exception since litellm doesn't throw specific error. by @madiator in #281
- feat: add o1 model structured output support by @devin-ai-integration in #284
- Bump to 0.1.13 by @madiator in #285
- Merge dev into main for 0.1.13 release. by @madiator in #286
Full Changelog: v0.1.12...0.1.13
v0.1.12
What's Changed
- [curator-viewer] enabled toast instead of alert for copy paste, and fixed streaming toast by @CharlieJCJ in #165
- Use huggingface modified pickler to fix path-dependent caching by @vutrung96 in #230
- Change rpm and tpm to have lower default and allow for manual setting by @RyanMarten in #234
- Various fixes to increase the reliability of batch processing by @vutrung96 in #231
- Graceful error handling for missing requests by @vutrung96 in #244
- OpenAIOnline - if api_key missing, directly error out by @CharlieJCJ in #237
- Increase default values for tpm/rpm, otherwise there is no progress. by @madiator in #245
- refactor: rename Prompter class to LLM by @devin-ai-integration in #242
- Rename prompter. Simplify prompt_formatter and add test. by @madiator in #246
- Raise error on failed responses by @RyanMarten in #251
- Add a SimpleLLM interface, and update documentation. by @madiator in #255
- Cool down when hitting rate limit with online processors by @RyanMarten in #256
- Gemini lower safety constraints by @CharlieJCJ in #259
- Raise on None response message by @RyanMarten in #262
- Add metadata dict + cache verification by @GeorgiosSmyrnis in #257
- Default for all online requests to 10 minutes timeout by @RyanMarten in #265
- Retry only on "max_length" and "content_filter" finish reason by @RyanMarten in #267
- Retry on response format failure by @RyanMarten in #266
- Add prism.js types to dev dependencies by @RyanMarten in #270
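The rate-limit cool down from #256 can be pictured with a minimal sketch. This is not Curator's implementation: `RateLimitError`, `call_with_cooldown`, and the default values are hypothetical names and numbers chosen for illustration, assuming the provider signals a rate limit by raising an exception.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's rate-limit (HTTP 429) error."""


def call_with_cooldown(
    send: Callable[[], T],
    max_attempts: int = 5,
    cooldown_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `send`, pausing for a fixed cool down after each rate-limit error.

    The `sleep` parameter is injectable so the pause can be observed or
    skipped in tests. All defaults are assumptions, not Curator's values.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except RateLimitError:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            sleep(cooldown_seconds)  # back off before retrying
    raise ValueError("max_attempts must be at least 1")
```

Only rate-limit errors trigger the pause here; any other exception propagates immediately, which keeps the retry condition narrow.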
New Contributors
- @devin-ai-integration made their first contribution in #242
- @GeorgiosSmyrnis made their first contribution in #257
Full Changelog: v0.1.11...v0.1.12
v0.1.11
What's Changed
- Allow special tokens when encoding text for token accounting by @RyanMarten in #181
- [Package Dependency] Downgrade tiktoken and aiofiles, bump poetry package in pyproject.toml by @CharlieJCJ in #183
- Re-merge main into dev by @vutrung96 in #185
- Cleanups and fix minor issues. by @madiator in #184
- Scale batch processor to managing thousands of batches by @RyanMarten in #186
- Easy way to cancel batches by @RyanMarten in #187
- litellm refactoring base online request processor by @CharlieJCJ in #188
- More retries for batch by @RyanMarten in #194
- Delete input and output files for successful batches by @RyanMarten in #195
- Add LiteLLM+instructor (for structured output) backend for curator by @CharlieJCJ in #141
- small bugfix by @RyanMarten in #199
- Increase max retry to 50 by @vutrung96 in #200
- LiteLLM missing cost handling for models by @CharlieJCJ in #210
- OnlineRequestProcessor - Fix retry only once issue by @CharlieJCJ in #202
- Implement persona-hub using Curator by @madiator in #211
- Allow user to switch keys during batch and resume by @RyanMarten in #198
- Small fix for datetime in openai request processor by @CharlieJCJ in #219
- Bump 0.1.11 pypi version by @CharlieJCJ in #221
- 0.1.11 by @CharlieJCJ in #220
Full Changelog: v0.1.10...v0.1.11
v0.1.10
What's Changed
- [curator-viewer] add time logging and curator viewer show distribution by @CharlieJCJ in #149
- enhanced installation UI for curator package by @lavishsaluja in #134
- add cost and token logging in openai online and batching with litellm completion_cost by @CharlieJCJ in #159
- Add lint checks to the repository by @vutrung96 in #168
- Use dill pickle to capture the execution context by @vutrung96 in #167
- 0.1.10 by @CharlieJCJ in #174
Full Changelog: v0.1.9.post1...v0.1.10
v0.1.9.post1
Highlights
README documentation updates, minor curator-viewer changes, and configurable generation parameters in curator (related to #62).
What's Changed
- Add frequency and presence penalty generation options by @RyanMarten in #136
- add system prompt in detailed view by @CharlieJCJ in #137
- poem.to_pandas() table formatting by @CharlieJCJ in #138
- add favicon and adjusted page name by @CharlieJCJ in #139
- Update readme and add another logo with trimmed edges. by @madiator in #131
- update pyproject version by @CharlieJCJ in #144
- 0.1.9post1 by @CharlieJCJ in #140
Full Changelog: v0.1.9...v0.1.9.post1
v0.1.9
Highlights
The v0.1.9 release includes improvements to batch processing, UI enhancements, documentation updates, and bug fixes, along with support for using bespokelabs-curator from the Python interpreter.
What's Changed
- Fix batch mode on datasets > 50k by @RyanMarten in #109
- logo size fix in README for mobile by @lavishsaluja in #104
- Add a starter example for running UI for the first time by @CharlieJCJ in #44
- Prevent batches larger than 200MB from being sent by @RyanMarten in #113
- Terminal interactive usage of bespokelabs curator package by @CharlieJCJ in #114
- Update README and the poem example. by @madiator in #117
- fix batch pbar overcounting by @RyanMarten in #119
- 0.1.9 by @CharlieJCJ in #110
New Contributors
- @lavishsaluja made their first contribution in #104
Full Changelog: v0.1.8...v0.1.9
v0.1.8
🎯 Highlights of v0.1.8
Version 0.1.8 marks the initial release of Curator, introducing core functionalities for managing and processing LLM completions for synthetic data generation. This release establishes a foundation with two main components: a completions module for efficient batch processing with OpenAI models, and a dataset viewer for visualizing and managing completion results. Key features include batch processing support, configurable model parameters, streaming capabilities, and metadata management through SQLite integration. The release also prioritizes developer experience with Colab compatibility and robust documentation.
⚡ Completions Module
- Reorganized prompting logic (#2, #3) and improved OpenAI integration (#4, #28)
- Added configurable temperature and top-p parameters (#77)
- Implemented batch size configuration (#70)
- Added fallback token counting with tiktoken (#59)
- Improved dataset management with List objects (#9)
- Added configurable working directory support (#53)
- Fixed Colab compatibility issues (#69, #72)
- Enhanced request/response handling (#65)
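The fallback token counting from #59 can be sketched as follows. This is an illustration under stated assumptions rather than Curator's code: `count_tokens` and `heuristic_token_count` are hypothetical names, the tokenizer is passed in as a plain callable, and the four-characters-per-token ratio is a rough rule of thumb.

```python
from typing import Any, Callable, Optional, Sequence


def heuristic_token_count(text: Any) -> int:
    """Estimate tokens at roughly four characters per token (an assumption)."""
    as_text = str(text)
    return max(1, len(as_text) // 4) if as_text else 0


def count_tokens(
    text: Any,
    encode: Optional[Callable[[Any], Sequence[int]]] = None,
) -> int:
    """Count tokens with `encode` if it works, else fall back to the heuristic.

    `encode` is any tokenizer function that returns a sequence of token ids.
    A TypeError (for example, on non-string input) triggers the fallback.
    """
    if encode is not None:
        try:
            return len(encode(text))
        except TypeError:
            pass  # tokenizer rejected the input; use the estimate instead
    return heuristic_token_count(text)
```

With tiktoken installed, `tiktoken.get_encoding("cl100k_base").encode` could be passed as `encode`; without a tokenizer the function still returns a usable estimate.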
🎨 Curator Viewer
- Reorganized dataset viewer architecture (#8)
- Added streaming dataset UI functionality (#14)
- Implemented file streaming for batch mode (#79)
- Added metadata SQLite database integration (#10)
- Fixed compilation errors (#23)
📚 Documentation & Packaging
- Added Apache 2.0 license (#12)
- Improved documentation and README (#1, #26)
- Properly packaged as bespokelabs-curator (#11)
- Added repository logo (#63)
- Updated API key documentation (#32)
What's Changed
- Add and update documentation. by @madiator in #1
- Refactor prompting logic to a class and add a test. by @madiator in #2
- Rename prompt_caller to prompter. by @madiator in #3
- Online processing with OpenAI by @RyanMarten in #4
- rename cache with name first, fingerprint second by @RyanMarten in #5
- Minor refactoring and cleanups. by @madiator in #6
- init commit on bespoke-dataset-viewer by @CharlieJCJ in #8
- Use List of objects instead of HF dataset and remove flatten by @vutrung96 in #9
- add metadata sqlite db by @CharlieJCJ in #10
- Various small fixes by @vutrung96 in #13
- Add apache 2.0 license by @vutrung96 in #12
- Properly package the repo into bespokelabs-curator by @vutrung96 in #11
- Fix file dependency poetry lock by @CharlieJCJ in #15
- Metadata DB existing run_hash by @CharlieJCJ in #17
- Update some references to bella and update readme. by @madiator in #16
- Merge in Ryan's abstraction by @vutrung96 in #18
- Remove OH v3 by @vutrung96 in #19
- Add request payload to GenericResponse by @vutrung96 in #20
- Add an option to keep the dataset in memory for to_huggingface() by @vutrung96 in #21
- Streaming dataset UI by @CharlieJCJ in #14
- dataset viewer compile error fix by @CharlieJCJ in #23
- Fix broken pytest by @vutrung96 in #24
- Explicitly print out data points in camel.py by @vutrung96 in #25
- [add] build for dataset viewer before releasing by @CharlieJCJ in #22
- update README documentation by @CharlieJCJ in #26
- Update README.md by @CharlieJCJ in #27
- Add OpenAIBatch backend and refactor RequestProcessor to be compatible by @RyanMarten in #28
- Add configurable logging (bespokelabs.curator) by @RyanMarten in #29
- improved readme for supplying api_key by @CharlieJCJ in #32
- Fix issues with no dataset passed to batch and logging by @RyanMarten in #36
- Remove unused dependencies: litellm and ipython. by @madiator in #37
- Rename from poetry.py to poem.py to reduce confusion with the poetry tool by @madiator in #38
- Catch TypeError when using tiktoken and fall back to heuristic token counting by @RyanMarten in #59
- Better example for generating poems. by @madiator in #43
- Allow specifying working directory in case users want to use a different working directory for their cache by @vutrung96 in #53
- Add batch_mode as a field in metadata db by @CharlieJCJ in #56
- Merge main to dev by @CharlieJCJ in #66
- Set max line length to 80 for black by @vutrung96 in #64
- Remove the use of asyncio.run to make asyncio work in colab by @vutrung96 in #69
- Package versioning downgrade for colab by @CharlieJCJ in #67
- Add Prompter arg for batch size by @RyanMarten in #70
- GenericRequest and GenericResponse refactor by @RyanMarten in #65
- Prevent .arrow file getting in an invalid state and generate different .arrow based on parse_func by @RyanMarten in #61
- Add logo by @madiator in #63
- Fix types in jsonl files by @RyanMarten in #75
- Add temperature and top-p by @RyanMarten in #77
- Fix asyncio with nest_asyncio by @vutrung96 in #72
- [curator-viewer] file streaming when batch=True, new response format adaptation by @CharlieJCJ in #79
- Bump version to 0.1.7 by @vutrung96 in #83
- 0.1.7 by @vutrung96 in #81
- Pre-emptively remove invalid dataset file when prompt_func detected as invalid by @RyanMarten in #84
- Fix JSON parsing from model by @vutrung96 in #85
- Set RLIMIT_NOFILE to avoid "too many files open" errors by @RyanMarten in #73
- Followup fixing pydantic validation by @RyanMarten in #89
- Fixed and refactored sort and filter by @CharlieJCJ in #91
- better no data view and error handling in dataset viewer by @CharlieJCJ in #95
- Cleanup build script and .gitignore for build artifacts by @CharlieJCJ in #71
- Response format for batch bug fix by @RyanMarten in #97
- UI minor type fix when npm run build by @CharlieJCJ in #98
- Fix a confusing error due to asyncio.run in except block. by @vutrung96 in #99
- Explicitly close AsyncClient to avoid getting asyncio event loop is closed issues by @vutrung96 in #101
- 0.1.8 by @CharlieJCJ in #100
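For readers unfamiliar with the RLIMIT_NOFILE change in #73, here is a minimal sketch of raising the soft open-file limit with Python's standard `resource` module (Unix only). The function name `raise_open_file_limit` and the default of 10,000 are assumptions for illustration; the value Curator actually sets is not stated in these notes.

```python
import resource


def raise_open_file_limit(desired: int = 10_000) -> int:
    """Raise the soft RLIMIT_NOFILE toward `desired`, capped by the hard limit.

    Returns the soft limit in effect afterwards. The default is an
    illustrative assumption, not a value taken from Curator.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # An unprivileged process may not raise the soft limit above the hard one.
    target = desired if hard == resource.RLIM_INFINITY else min(desired, hard)
    if soft != resource.RLIM_INFINITY and soft < target:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            pass  # some platforms reject the request; keep the current limit
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

Raising only the soft limit needs no special privileges, which is why the sketch never touches the hard limit.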
New Contributors
- @madiator made their first contribution in #1
- @vutrung96 made their first contribution in #9
Full Changelog: https://github.com/bespokelabsai/curator/co...