Port over PlanSearch from independent repo #1

Merged 29 commits on Oct 18, 2024

Commits on Jul 10, 2024

  1. (Monkey 🙈) Setup basic scaffolding for complex search (#10247)

    * Add query util for search
    
    * Rename query.py to queriers.py and small refactors
    
    * Build out framework for basic prompting
    
    * Fix small prompting bug with starter code; allow eval with codrm
    pipeline
    
    * Add chain-of-thought
    
    * Refactor into SearchModel for more inheritance
    
    * Add backtranslating task
    
    * Add timeout exception catching
    
    * Add AnthropicQuerier and refactor querier.py
    
    * Add option to toggle few shot and sys prompt
    
    * Add experiment directory and refactor
    
    * Refactor eval code into `scale_lcb_eval`
    
    * Add logs to gitignore
    
    * Add functionality for sample and private tests in `base_classes`
    
    * Initial scaffolding for parsel
    
    * Refactor `queriers.py` to take all functionality in LLMQuerier
    
    * Refactor adding args
    
    * Add caching to `queriers.py`
    
    * Add parsel parameters
    
    * Implement Parsel
    
    * Delete unnecessary lines
    
    * Add more unused code
    
    * Change SYSTEM_PROMPT in generation in backtranslate task
    
    * Add cache-file argument
    
    * Add utility `create_dataset_with_sols` to create dataset for
    backtranslation
    
    * Refactor utils for parsing into `parsing_utils.py`
    
    Delete and refactor more unused utils
    
    More deletions
    
    Refactor functions
    
    Further function refactoring
    
    Refactor (mostly fn prompts)
    
    Refactor `fn.py`
    
    Refactor `Test` into `base_classes.py`
    
    Refactor fn queries into `parsel_queries.py`
    
    Small bugfixes
    
    * Refactor `exec_utils.py` to outside `parsel`
    
    * Add option to requery in `queriers.py` even with caching
    
    * Add stdio test to `exec_utils.py`
    
    * Add simple filtering model
    
    * Replace returned Nones with an empty string
    
    * Add logging to simple-filter
    
    * Fix small bug in Test class
    
    * Catch strange httpcore error
    
    * Add simple idea model
    
    * Allow for len of public tests to be 0 in exec_utils
    
    * Refactor `self.queriers` and add `simple_idea_model` to eval
    
    * Fix bug from incorrect copy-paste
    
    * Add bulk querying for fn impls
    
    * Account for indent in `filter_for_fn`
    
    * Change hyperparameters
    
    * Add integration tests in `scripts`
    
    * Catch APIConnectionError in queriers.py
    
    * Introduce `idea_filter_model`
    
    * Catch internal server error
    
    * Minor refactor
    
    * Refactor and fix bugs in simple filtering
    Made new class to take in `SearchModel`s and run filtering on top
    
    * Add `JSONDecodeError` and `UnicodeDecodeError` to `queriers.py`
    
    * Add separate temperatures for idea and code
    
    * Add utils to merge cache files
    
    * Revert gitignore
    
    * Add .gitignore files
    evanzwang authored Jul 10, 2024
    49e0a2d

Commits on Jul 23, 2024

  1. evanzwang/monkey search 2 🙈 (#10537)

    * Add observation model
    
    * Increase backoff and add jitter
    
    * Fix typo
    
    * Add zero-shot to `simple_idea` and fix small bug in `filter_models`
    
    * Adjust parameters for timeout
    
    * Add CF submit util
    
    * Add plotting utils
    
    * Add partial prompts
    
    * Fix simple filter with simple idea
    
    * Refactor model selection dictionaries in `queriers.py`
    
    * Add querier_utils.py that was forgotten in previous commit
    
    * Add price tracking and exec warmup
    
    * Add num_words to backtranslate as arg
    
    * Refactor small names of querier
    
    * Change warmup to 200
    
    * Add Path makedir to cache.json in case dir doesn't exist
    
    * Add vLLM support
    
    * Add base deepseek lite model
    
    * Fix small bugs in querier
    
    * Fix small bug from calling torch.cuda.is_bf16_supported()
    
    * Add new DeepSeek models and add base model functionality
    
    * Fix small bug where final price was not output
    
    * Generalize codeforces parsing
    
    * Change parameters for vLLM
    
    * Add pseudocode model
    
    * Add GPT-4o-mini and fix bug in vLLM inference
    
    * Add small section to backtranslate prompts
    
    * Fix small bug where None was not supported for tests
    
    * Add Python script to create taco dataset with nl solutions
    
    * Fix bugs in `create_taco_backtranslate.py`
    
    * Make requery automatically True
    
    * Fix bugs in create_taco_backtranslate
    
    * Add notebooks
    
    * Add functionality for custom local models
    
    * Rename observation to simple observation
    
    * Add no-intuition prompt to idea
    
    * Add internal querier
    
    * Fix bug in completion (no chat) code for basic
    
    * Fix base prompting
    evanzwang authored Jul 23, 2024
    fd10a22

Commits on Jul 24, 2024

  1. c76e1d7

Commits on Jul 26, 2024

  1. Evanzwang/monkey search 3 (#10567)

    * Add llama 3.1 8b and 70b as supported by scale llm engine
    
    * Add scale LLM engine requery time
    
    * Create synthetic TACO solutions from gpt-4o
    
    * Catch more llm engine errors
    
    * Fix exception catching
    
    * Add combo observation first iteration
    
    * Add more error catching
    
    * Log in combo observation and fix small bug
    
    * Fix small logging in `combo_observation_model.py`
    
    * Add second iteration of combo observation
    
    * Add automatic batching if the query is too large
    
    * Add pbar to querying
    
    * Fix small bug where iteration 2 is not called
    
    * Fix bug where problems is not updated to be expanded
    
    * Tweak exponential backoff
    
    * Partially change `combo_ratio_finder.ipynb` to start getting
    performances
    evanzwang authored Jul 26, 2024
    b2c3191

Commits on Jul 29, 2024

  1. Evanzwang/monkey search 4 🙉 (#10585)

    * Add num_workers to code exec
    
    * Add num_shots to simple prompting
    
    * Add num words and sys prompt option to idea search
    
    * Idea prompt fixes
    
    * Add space to prompt
    
    * Change querying params
    
    * Add llama-3.1
    
    * Add file to create datasets from generations
    
    * Add `format` arg for chat vs completion manual change
    evanzwang authored Jul 29, 2024
    37fea76

Commits on Jul 31, 2024

  1. Partial GPU utilization for checkpoint code (#10634)

    python research/hugh/open_weights_code/eval_all_checkpoints.py --s3_checkpoint_location=s3://scale-ml/hugh/rlxf/dpo/meta_llama3.1-8b_731-5e6_1/checkpoints/ --s3_output_location=openweightscode/eval_results/dpo_731-5e6_1 --num-gpus=4 --max_checkpoints=20
    
    Co-authored-by: Hugh Zhang
    hughbzhang and Hugh Zhang authored Jul 31, 2024
    8fddf0f

Commits on Aug 6, 2024

  1. Unify Dataset Formats for Monkey (#10682)

    * Add `fail_codes` to Problem
    
    * Unify datasets and merge generate/eval pipelines
    
    * Add exec args
    
    * Auto-detect optional features
    
    * Unify create_test_bank.py
    
    * Rename parse_dataset_utils to dataset_utils
    
    * Script to clean tests and add public tests
    
    * Slightly increase the exponential backoff
    
    * Add make dataset utilities
    
    * Add pyproject.toml for better imports
    
    * Add __init__.py
    
    * Commit dataset and misc scripts/notebooks
    
    * Add `load_json.ipynb` for completeness
    
    * Add code for taco_cleaner
    
    * Add format_taco_data.py
    
    * Remove unneeded content
    
    * Remove older create generation dataset scripts
    
    * Add code to create idea solve graphs
    
    * Fix small bugs and add `fn_args_join`
    
    * Add timeout to queriers and add better heuristics in `add_public_tests`
    
    * Remove line from debugging
    
    * Improve add_public_test heuristic filters
    
    * Refactor into CompletionCache class
    
    * Fix minor keyword-argument typo
    
    * Small changes to `add_public_tests.py`
    
    * Change model to be customizable
    
    * [UNFINISHED] partially implement `queriers.py` into threadpool
    
    * Update `search` imports
    
    * Edit `add_public_tests.py` for better prompts
    
    * Change plot over time
    
    * Add `filter_lcb_data.py`
    
    * Add `filter_zero_public_tests.py`
    
    * Add thread pool to querier
    
    * Add new OOP-based queriers
    
    * Add notebook to parse D to F data
    
    * Add `num-gpu` to eval.py args
    
    * Small
    
    * Small bugfixes in `query_clients.py` and add `model_configs`
    
    * Add `query_clients.py` modularity to `queriers.py` and upstream
    
    * Remove unneeded large dicts
    
    * Remove unneeded parts of `querier_utils.py`
    
    * Add custom vLLM config demo
    
    * Fix minor merge bug
    
    * Remove default arguments at lower levels
    evanzwang authored Aug 6, 2024
    f13bd7e
  2. 61627a8
  3. Basic DPO configs. (#10684)

    dpo_basic_config.json contains basic DPO successful checkpoint
    
    Infra to launch RLXF sweeps (in alpha). Also infra to launch basic eval
    scripts in eval_all_checkpoints.py
    
    Co-authored-by: Hugh Zhang
    hughbzhang and Hugh Zhang authored Aug 6, 2024
    f6b03a6

Commits on Aug 7, 2024

  1. Evanzwang/monkey search 6 (#10691)

    * Add format and merge taco notebook
    
    * Update code_exec_reqs to requery
    evanzwang authored Aug 7, 2024
    7d66158

Commits on Aug 9, 2024

  1. Add SGLang and Fix Testbank Bug for Monkey -> 🦍 (#10726)

    * Add SGLang querying functionality
    
    * Add example model_config
    
    * Improve SGLang and write `simultaneous_eval.sh` script
    
    * Fix combo obs to be analyzable
    
    * Small assert to make sure testbank URL is right
    
    * Switch `make_dpo_dataset.py` model config to fix
    
    * Add greater sglang
    
    * Add parsing of starter_code to fix testbank bug
    
    * Adjust minor details to `dataset_utils.py` and `query_clients.py`
    evanzwang authored Aug 9, 2024
    0922069

Commits on Aug 19, 2024

  1. Monkey Search Minor Features and Fixes (+ some refactors) (#10806)

    * Add SGLang querying functionality
    
    * Add example model_config
    
    * Improve SGLang and write `simultaneous_eval.sh` script
    
    * Fix combo obs to be analyzable
    
    * Small assert to make sure testbank URL is right
    
    * Switch `make_dpo_dataset.py` model config to fix
    
    * Add greater sglang
    
    * Add parsing of starter_code to fix testbank bug
    
    * Adjust minor details to `dataset_utils.py` and `query_clients.py`
    
    * Add reward model functionality to `make_dpo_dataset.py`
    
    * Update configs
    
    * Add utility to choose gpus in simultaneous eval
    
    * Add DataParallel to `reward_model_utils.py`
    
    * Add PARAMS to each `query_client` and minor refactor
    
    * Add `model_config_utils.py` to add overwrite args
    
    * Add better model args for all existing search methods
    
    * Fix minor bug in `make_dpo_dataset`
    
    * Add query parameters to query logs
    
    * Relax float constraint on query client price
    
    * Fix Anthropic `query_client.py` and adjust rate limits
    
    * Reduce amounts of print in `query_clients`
    
    * Add `story` prompt method inside `one_prompt_models.py`
    
    Rename `basic_prompting.py` to `one_prompt_models.py`
    
    * Delete unneeded files
    
    * Adjust imports in `make_dpo_dataset.py`
    
    * Add `stringify` and `unstringify` to prepare for other tests
    
    * Fix small `simple_filter_models.py` bug
    
    * Add new one_prompt methods, add exec_string to Problem
    
    * Add notebooks to parse MBPP and HumanEval
    
    * Add modifications in `exec_utils.py` for exec_string
    
    * Small change (can delete)
    
    * Add `batch_apply_on_nested_list` to `python_utils.py`
    
    * Refactor `combo_observation_model.py`
    
    * Add notebooks to parse *_plus datasets
    
    * Add `completions-from-model` arg
    
    So as not to necessarily run completion-limit times as many problems
    
    * Small changes to `queriers.py` to accommodate `tuple` convos
    evanzwang authored Aug 19, 2024
    e1dc7b2

Commits on Aug 27, 2024

  1. Evanzwang/monkey search 9 (#10902)

    * Change `F_mbpp_plus` and `create_test_bank.py` to support exec_string
    
    * Add `parse_orig_lcb.ipynb` for creating new LCB
    
    * Update `parse_orig_lcb.ipynb` to have _C dataset
    
    * Add `map_nary_fn_on_nested_list` to `python_utils`. Add
    `check_similar.py`
    
    * Save test results with underscore between name and test
    
    * Add `deepseek-coder` functionality
    
    * Add LLMEngine 405b
    
    * Change `check_similar.py` to "general" instead of "specific" idea
    
    * Change querying to: idea for code -> Yes/No
    
    * Add script to run most exps
    
    * Fix `simple_filter_models.py` for non-list Test public tests
    
    * Fix Anthropic querying
    
    * Fix bug with testhash and human_eval (add fn_name to exec)
    
    * Add `search.` imports
    
    * Add utility to re-eval a previous results.json.gz file
    
    * Fix small, auxiliary bug on the test hash fn_name problem
    
    * Edit `metrics.py` to support public test filtering
    
    * Fix another auxiliary bug within scripts/re_eval_codes.py
    
    * Add graphing notebook for final results for completeness
    
    * Better infra for graphing
    
    * Fix small human_eval_plus public test issue
    
    * Change `generate_solutions` to return `list[list[str]]`
    
    * Update `check_similar.py` to take in `args.cache-file`
    
    * Update `parse_mbpp_plus` to include 3 additional, better tests
    
    * Catch Anthropic BadRequest
    
    * Commit notebook changes (?? unsure what changed)
    
    * Add `base_classes` change for `generate_solutions` super method change
    
    * Add SGLang completions functionality
    
    * Remove print("HI") in `query_clients.py`
    
    * Add "This NL solution is wrong" prompt in `combo_observation`
    
    * Add new changes to graph pass@k notebooks
    
    * Update `scripts/run_exps_...`
    
    * Add `baby-deepseek` models with SGLang
    
    * Fix small assertion bug in `query_clients.py`
    
    * (combo obs) Add another step to merge fixes with original nl solution
    
    * Change small type hint (you can ignore this commit)
    
    * Update model_configs and scripts
    
    * Add Fireworks and Together
    
    * Update `check_similar.py` to requery if badrequest
    
    * Comment out line that requires k to be less than min n_comp
    
    * Undo `combo_observation_model.py` enhanced 'fix' prompting
    
    * Update `run_exps` scripts
    
    * Update `graph_notebooks`
    
    * Add `caches` folder to gitignore
    evanzwang authored Aug 27, 2024
    cce546b

Commits on Sep 21, 2024

  1. Evanzwang/monkey search 10 (Paper!) (#11020)

    * Augment testing using more processes
    
    * Adjust Fireworks parameters
    
    * Change `exec-public` arg to `exec-type`
    
    * Change `re_eval_codes.py` to reflect test execution changes
    
    * Add checkpointing to `re_eval_codes` for cache
    
    * Add right vs wrong in check similar
    
    * Catch rare case where OpenAI logprobs is null
    
    * Add graph plotters for paper push
    
    * Make `re_eval_codes.py` auto-checkpoint
    
    * Mildly change prompts
    
    * Add fireworks llama 70b
    
    * Subtly change the way n_completions works.
    (Should have been with the other prompt commit)
    
    * Update scripts
    
    * Add o1
    
    * Update graphing notebooks
    
    * Update gitignore
    evanzwang authored Sep 21, 2024
    78fd7dc

Commits on Sep 25, 2024

  1. Add ablations to PlanSearch (#11188)

    * Refactor combo obs with observation node, add num layers
    
    * Add arg to fix idea or not
    
    * Refactor prompt generation fns to `prompts` folder
    
    * Add --without-pseudocode flag to combo obs
    
    * Add without-idea arg
    
    * Add unincluded prompts
    
    * Change num-completion args into 2
    
    * Add graphing utilities for ablations
    evanzwang authored Sep 25, 2024
    19caa99

Commits on Sep 26, 2024

  1. 339d18f
  2. 33b405a
  3. Add gitignore

    evanzwang committed Sep 26, 2024
    0cc7495

Commits on Oct 4, 2024

  1. Add README.md

    evanzwang committed Oct 4, 2024
    560b98f
  2. Add CodeRM submodule

    evanzwang committed Oct 4, 2024
    bc652f8
  3. Update readme

    evanzwang committed Oct 4, 2024
    c31b6e0
  4. Add recurse submodules

    evanzwang committed Oct 4, 2024
    09ab36f

Commits on Oct 7, 2024

  1. 94a58fd
  2. 84c1ace
  3. becdd1b

Commits on Oct 8, 2024

  1. Update requirements

    evanzwang committed Oct 8, 2024
    55d9435
  2. Change requirements

    evanzwang committed Oct 8, 2024
    87a0eae
  3. Update readme

    evanzwang committed Oct 8, 2024
    f3b745c

Commits on Oct 18, 2024

  1. Merge branch 'main'

    evanzwang committed Oct 18, 2024
    782c2ba