Port over PlanSearch from independent repo #1
Commits on Jul 10, 2024
(Monkey 🙈) Setup basic scaffolding for complex search (#10247)
* Add query util for search
* Rename query.py to queriers.py and small refactors
* Build out framework for basic prompting
* Fix small prompting bug with starter code; allow eval with codrm pipeline
* Add chain-of-thought
* Refactor into SearchModel for more inheritance
* Add backtranslating task
* Add timeout exception catching
* Add AnthropicQuerier and refactor querier.py
* Add option to toggle few shot and sys prompt
* Add experiment directory and refactor
* Refactor eval code into `scale_lcb_eval`
* Add logs to gitignore
* Add functionality for sample and private tests in `base_classes`
* Initial scaffolding for parsel
* Refactor `queriers.py` to take all functionality in LLMQuerier
* Refactor adding args
* Add caching to `queriers.py`
* Add parsel parameters
* Implement Parsel
* Delete unnecessary lines
* Add more unused code
* Change SYSTEM_PROMPT in generation in backtranslate task
* Add cache-file argument
* Add utility `create_dataset_with_sols` to create dataset for backtranslation
* Refactor utils for parsing into `parsing_utils.py`
  Delete and refactor more unused utils
  More deletions
  Refactor functions
  Further function refactoring
  Refactor (mostly fn prompts)
  Refactor `fn.py`
  Refactor `Test` into `base_classes.py`
  Refactor fn queries into `parsel_queries.py`
  Small bugfixes
* Refactor `exec_utils.py` to outside `parsel`
* Add option to requery in `queriers.py` even with caching
* Add stdio test to `exec_utils.py`
* Add simple filtering model
* Replace returned Nones with an empty string
* Add logging to simple-filter
* Fix small bug in Test class
* Catch strange httpcore error
* Add simple idea model
* Allow for len of public tests to be 0 in exec_utils
* Refactor `self.queriers` and add `simple_idea_model` to eval
* fix bug from copy-pasting wrong
* Add bulk querying for fn impls
* Account for indent in `filter_for_fn`
* Change hyperparameters
* Add integration tests in `scripts`
* Catch APIConnectionError in queriers.py
* Introduce `idea_filter_model`
* Catch internal server error
* Minor refactor
* Refactor and fix bugs in simple filtering
  Made new class to take in `SearchModel`s and run filtering on top
* Add `JSONDecodeError` and `UnicodeDecodeError` to `queriers.py`
* Add separate temperatures for idea and code
* Add utils to merge cache files
* Revert gitignore
* Add .gitignore files
Commit 49e0a2d
Commits on Jul 23, 2024
evanzwang/monkey search 2 🙈 (#10537)
* Add observation model
* Increase backoff and add jitter
* Fix typo
* Add zero-shot to `simple_idea` and fix small bug in `filter_models`
* Adjust parameters for timeout
* Add CF submit util
* Add plotting utils
* Add partial prompts
* Fix simple filter with simple idea
* Refactor model selection dictionaries in `queriers.py`
* Add querier_utils.py that was forgotten in previous commit
* Add price tracking and exec warmup
* Add num_words to backtranslate as arg
* Refactor small names of querier
* Change warmup to 200
* Add Path makedir to cache.json in case dir doesn't exist
* Add vLLM support
* Add base deepseek lite model
* Fix small bugs in querier
* Fix small bug from calling torch.cuda.is_bf16_supported()
* Add new DeepSeek models and add base model functionality
* Fix small bug where final price was not output
* Generalize codeforces parsing
* Change parameters for vLLM
* Add pseudocode model
* Add GPT-4o-mini and fix bug in vLLM inference
* Add small section to backtranslate prompts
* Fix small bug where None was not supported for tests
* Add Python script to create taco dataset with nl solutions
* Fix bugs in `create_taco_backtranslate.py`
* Make requery automatically True
* Fix bugs in create_taco_backtranslate
* Add notebooks
* Add functionality for custom local models
* Rename observation to simple observation
* Add no-intuition prompt to idea
* Add internal querier
* Fix bug in completion (no chat) code for basic
* Fix base prompting
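The "Increase backoff and add jitter" item in this commit names a common retry pattern for rate-limited API calls. A minimal sketch of capped exponential backoff with full jitter is given here for reference. The function names, default values, and retriable exception types are illustrative assumptions and are not taken from `queriers.py`.

```python
import random
import time


def backoff_delays(retries=5, base=1.0, factor=2.0, max_delay=60.0, rng=None):
    """Yield one sleep duration per retry: uniform in [0, min(max_delay, base * factor**attempt)]."""
    rng = rng or random.Random()
    for attempt in range(retries):
        cap = min(max_delay, base * factor ** attempt)
        # Full jitter: pick anywhere below the cap so concurrent clients spread out.
        yield rng.uniform(0.0, cap)


def call_with_retries(fn, retries=5, base=1.0, factor=2.0, max_delay=60.0,
                      retriable=(ConnectionError, TimeoutError),
                      sleep=time.sleep, rng=None):
    """Call fn(); on a retriable error, sleep for a jittered delay and try again.

    fn is attempted at most retries + 1 times; the last error is re-raised.
    """
    delays = backoff_delays(retries, base, factor, max_delay, rng)
    while True:
        try:
            return fn()
        except retriable:
            delay = next(delays, None)
            if delay is None:
                raise  # retries exhausted
            sleep(delay)
```

Passing `sleep` and `rng` as parameters keeps the sketch testable without real waiting; a caller would normally leave both at their defaults.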
Commit fd10a22
Commits on Jul 24, 2024
Commit c76e1d7
Commits on Jul 26, 2024
Evanzwang/monkey search 3 (#10567)
* Add llama 3.1 8b and 70b as supported by scale llm engine
* Add scale LLM engine requery time
* Create synthetic TACO solutions from gpt-4o
* Catch more llm engine errors
* Fix exception catching
* Add combo observation first iteration
* Add more error catching
* Log in combo observation and fix small bug
* Fix small logging in `combo_observation_model.py`
* Add second iteration of combo observation
* Add automatic batching if too large query
* Add pbar to querying
* Fix small bug where iteration 2 is not called
* Fix bug where problems is not updated to be expanded
* Tweak exponential backoff
* Partially change `combo_ratio_finder.ipynb` to start getting performances
Commit b2c3191
Commits on Jul 29, 2024
Evanzwang/monkey search 4 🙉 (#10585)
* Add num_workers to code exec
* Add num_shots to simple prompting
* Add num words and sys prompt option to idea search
* Idea prompt fixes
* Add space to prompt
* Change querying params
* Add llama-3.1
* Add file to create datasets from generations
* Add `format` arg for chat vs completion manual change
Commit 37fea76
Commits on Jul 31, 2024
Partial GPU utilization for checkpoint code (#10634)
python research/hugh/open_weights_code/eval_all_checkpoints.py --s3_checkpoint_location=s3://scale-ml/hugh/rlxf/dpo/meta_llama3.1-8b_731-5e6_1/checkpoints/ --s3_output_location=openweightscode/eval_results/dpo_731-5e6_1 --num-gpus=4 --max_checkpoints=20
Co-authored-by: Hugh Zhang <[email protected]>
Commit 8fddf0f
Commits on Aug 6, 2024
Unify Dataset Formats for Monkey (#10682)
* Add `fail_codes` to Problem
* Unify datasets and merge generate/eval pipelines
* Add exec args
* Auto-detect optional features
* Unify create_test_bank.py
* Rename parse_dataset_utils to dataset_utils
* Script to clean tests and add public tests
* Increase slightly the exponential backoff
* Add make dataset utilities
* Add pyproject.toml for better imports
* Add __init__.py
* Commit dataset and misc scripts/notebooks
* Add `load_json.ipynb` for completeness
* Add code for taco_cleaner
* Add format_taco_data.py
* Remove unneeded content
* Remove older create generation dataset scripts
* Add code to create idea solve graphs
* Fix small bugs and add `fn_args_join`
* Add timeout to queriers and add better heuristics in `add_public_tests`
* Remove line from debugging
* Improve add_public_test heuristic filters
* Refactor into CompletionCache class
* Fix minor kwarg typo
* Small changes to `add_public_tests.py`
* Change model to be customizable
* [UNFINISHED] partially implement `queriers.py` into threadpool
* Update `search` imports
* Edit `add_public_tests.py` for better prompts
* Change plot over time
* Add `filter_lcb_data.py`
* Add `filter_zero_public_tests.py`
* Add thread pool to querier
* Add new OOP-based queriers
* Add notebook to parse D to F data
* Add `num-gpu` to eval.py args
* Small
* Small bugfixes in `query_clients.py` and add `model_configs`
* Add `query_clients.py` modularity to `queriers.py` and upstream
* Remove unneeded large dicts
* Remove needed parts of `querier_utils.py`
* Add custom vLLM config demo
* Fix minor merge bug
* Remove default arguments at lower levels
Commit f13bd7e
Commit 61627a8
dpo_basic_config.json contains basic DPO successful checkpoint
Infra to launch RLXF sweeps (in alpha). Also infra to launch basic eval scripts in eval_all_checkpoints.py
Co-authored-by: Hugh Zhang <[email protected]>
Commit f6b03a6
Commits on Aug 7, 2024
Evanzwang/monkey search 6 (#10691)
* Add format and merge taco notebook
* Update code_exec_reqs to requery
Commit 7d66158
Commits on Aug 9, 2024
Add SGLang and Fix Testbank Bug for Monkey -> 🦍 (#10726)
* Add SGLang querying functionality
* Add example model_config
* Improve SGLang and write `simultaneous_eval.sh` script
* Fix combo obs to be analyzable
* Small assert to make sure testbank URL is right
* Switch `make_dpo_dataset.py` model config to fix
* Add greater sglang
* Add parsing of starter_code to fix testbank bug
* Adjust minor details to `dataset_utils.py` and `query_clients.py`
Commit 0922069
Commits on Aug 19, 2024
Monkey Search Minor Features and Fixes (+ some refactors) (#10806)
* Add SGLang querying functionality
* Add example model_config
* Improve SGLang and write `simultaneous_eval.sh` script
* Fix combo obs to be analyzable
* Small assert to make sure testbank URL is right
* Switch `make_dpo_dataset.py` model config to fix
* Add greater sglang
* Add parsing of starter_code to fix testbank bug
* Adjust minor details to `dataset_utils.py` and `query_clients.py`
* Add reward model functionality to `make_dpo_dataset.py`
* Update configs
* Add utility to choose gpus in simultaneous eval
* Add DataParallel to `reward_model_utils.py`
* Add PARAMS to each `query_client` and minor refactor
* Add `model_config_utils.py` to add overwrite args
* Add better model args for all existing search methods
* Fix minor bug in `make_dpo_dataset`
* Add query parameters to query logs
* Relax float constraint on query client price
* Fix Anthropic `query_client.py` and adjust rate limits
* Reduce amounts of print in `query_clients`
* Add `story` prompt method inside `one_prompt_models.py`
  Rename `basic_prompting.py` to `one_prompt_models.py`
* Delete unneeded files
* Adjust imports in `make_dpo_dataset.py`
* Add `stringify` and `unstringify` to prepare for other tests
* Fix small `simple_filter_models.py` bug
* Add new one_prompt methods, add exec_string to Problem
* Add notebooks to parse MBPP and HumanEval
* Add modifications in `exec_utils.py` for exec_string
* Small change can delete lol
* Add `batch_apply_on_nested_list` to `python_utils.py`
* Refactor `combo_observation_model.py`
* Add notebooks to parse *_plus datasets
* Add `completions-from-model` arg
  To not necessarily do completion-limit x as much problems
* Small changes to `queriers.py` to accommodate for `tuple` convos
Commit e1dc7b2
Commits on Aug 27, 2024
Evanzwang/monkey search 9 (#10902)
* Change `F_mbpp_plus` and `create_test_bank.py` to support exec_string
* Add `parse_orig_lcb.ipynb` for creating new LCB
* Update `parse_orig_lcb.ipynb` to have _C dataset
* Add `map_nary_fn_on_nested_list` to `python_utils`. Add `check_similar.py`
* Save test results with underscore between name and test
* Add `deepseek-coder` functionality
* Add LLMEngine 405b
* Change `check_similar.py` to "general" instead of "specific" idea
* Change querying to: idea for code -> Yes/No
* Add script to run most exps
* Fix `simple_filter_models.py` for non-list Test public tests
* Fix Anthropic querying
* Fix bug with testhash and human_eval (add fn_name to exec)
* Add `search.` imports
* Add utility to re-eval a previous results.json.gz file
* Fix small, auxiliary bug on the test hash fn_name problem
* Edit `metrics.py` to support public test filtering
* Fix another auxiliary bug within scripts/re_eval_codes.py
* Add graphing notebook for final results for completeness
* Better infra for graphing
* Fix small human_eval_plus public test issue
* Change `generate_solutions` to return `list[list[str]]`
* Update `check_similar.py` to take in `args.cache-file`
* Update `parse_mbpp_plus` to include 3 more better tests
* Catch Anthropic BadRequest
* Commit notebook changes (?? unsure what changed)
* Add `base_classes` change for `generate_solutions` super method change
* Add SGLang completions functionality
* Remove print("HI") in `query_clients.py`
* Add "This NL solution is wrong" prompt in `combo_observation`
* Add new changes to graph pass@k notebooks
* Update `scripts/run_exps_...`
* Add `baby-deepseek` models with SGLang
* Fix small assertion bug in `query_clients.py`
* (combo obs) Add another step to merge fixes with original nl solution
* Change small type hint (you can ignore this commit)
* Update model_configs and scripts
* Add Fireworks and Together
* Update `check_similar.py` to requery if badrequest
* Comment out line that requires k to be less than min n_comp
* Undo `combo_observation_model.py` enhanced 'fix' prompting
* Update `run_exps` scripts
* Update `graph_notebooks`
* Add `caches` folder to gitignore
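Two items in this commit concern pass@k (the graphing notebooks and the check that k stays below the number of completions). For reference, here is a sketch of the widely used unbiased pass@k estimator, 1 - C(n - c, k) / C(n, k), where n completions were sampled and c of them passed. Whether `metrics.py` computes the metric this way is an assumption, and `pass_at_k` is a hypothetical name.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples passes.

    n: total completions sampled for the problem
    c: completions that passed all tests
    k: sample budget being evaluated (must satisfy 1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # comb(n - c, k) is 0 when fewer than k completions failed,
    # so the estimate is exactly 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

The `k <= n` guard mirrors the constraint mentioned in the commit list: the estimator is undefined when more samples are requested than were generated.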
Commit cce546b
Commits on Sep 21, 2024
Evanzwang/monkey search 10 (Paper!) (#11020)
* Augment testing using more processes
* Adjust Firework parameters
* Change `exec-public` arg to `exec-type`
* Change `re_eval_codes.py` to reflect test execution changes
* Add checkpointing to `re_eval_codes` for cache
* Add right vs wrong in check similar
* Catch rare case where OpenAI logprobs is null
* Add graph plotters for paper push
* Make `re_eval_codes.py` auto-checkpoint
* Mildly change prompts
* Add fireworks llama 70b
* Subtly change the way n_completions work. (Should have been with the other prompt commit)
* Update scripts
* Add o1
* Update graphing notebooks
* Update gitignore
Commit 78fd7dc
Commits on Sep 25, 2024
Add ablations to PlanSearch (#11188)
* Refactor combo obs with observation node, add num layers
* Add arg to fix idea or no
* Refactor prompt generation fns to `prompts` folder
* Add --without-pseudocode flag to combo obs
* Add without-idea arg
* Add unincluded prompts
* Change num-completion args into 2
* Add graphing utilities for ablations
Commit 19caa99
Commits on Sep 26, 2024
Commit 339d18f
Commit 33b405a
Commit 0cc7495
Commits on Oct 4, 2024
Commit 560b98f
Commit bc652f8
Commit c31b6e0
Commit 09ab36f
Commits on Oct 7, 2024
Commit 94a58fd
Commit 84c1ace
Commit becdd1b
Commits on Oct 8, 2024
Commit 55d9435
Commit 87a0eae
Commit f3b745c
Commits on Oct 18, 2024
Commit 782c2ba