# GPT EVAL: Evaluate your model with OpenAI API

## Quick Start

1. Prepare your model and the baseline model.

   - your model: Huggingface and Megatron-LM format models are supported; other model formats will be supported in future releases
   - baseline model: a Huggingface, Megatron-LM, or OpenAI model

   > Evaluating Megatron-LM models requires a customized Megatron-LM, which is provided in [`thirdparty`](../../../thirdparty/).

2. Generate answers using [`answer_generator.py`](answer_generator.py) for both your model and the baseline model.

   1. Prepare the benchmark dataset.

      The toolkit provides Vicuna Bench ([`config/question.jsonl`](./config/question.jsonl)), and you can also create a custom dataset to generate answers from. A custom dataset must be a single file in jsonl format, where each json object contains 3 attributes (an example line is shown at the end of this document):

      - question_id: int type
      - text: the specific content of the question, string type
      - category: the type of the question, string type

   2. Build the config file (`config.yaml`). The format of the file is as follows (a filled-in example for the Huggingface case is given at the end of this document):

      ```yaml
      answer_generation:
        model_name:
        question_file:        # path of the benchmark dataset file
        answer_file:          # path of the answer file generated by the model
        batch_size:           # batch size when generating answers
        max_tokens:           # maximum number of tokens for each generated answer
        temperature:
        # Choose one of the following configurations according to your model type
        # Config for huggingface
        huggingface:
          model_path:         # path of your model
          tokenizer_path:     # path of your tokenizer
        # Config for megatron-lm
        megatron:
          megatron_home:      # root dir of the Megatron-LM code
          process_num:        # number of processes to run megatron
          checkpoint_path:    # megatron checkpoint dir path
          tokenizer_type:     # only 'gpt2' and 'sentencepiece' are supported for now
          vocab_path:         # path to the vocab file for the gpt2 tokenizer
          merge_path:         # path to the merge file for the gpt2 tokenizer
          tokenizer_path:     # path to the tokenizer model for the sentencepiece tokenizer
          iteration:          # iteration of the checkpoint to load
        # Config for openai
        openai:
          openai_organization:
          openai_api_key:
          model:              # the type of model, e.g., gpt-3.5-turbo
          max_retry:          # the maximum number of retries when an api access fails
      ```

   3. Run the script.

      ```shell
      python answer_generator.py --config <path_to_config.yaml>
      ```

3. Get the OpenAI API evaluation results via [`gpt_evaluator.py`](gpt_evaluator.py).

   1. Prepare the dependencies. Make sure the following files are ready:

      - question_file: the benchmark dataset file from the previous step
      - answer_file: the answer file of your model from the previous step
      - baseline_file: the answer file of the baseline model from the previous step
      - prompt_file: a file containing multiple prompt templates; the toolkit provides a sample file ([`config/prompt.jsonl`](config/prompt.jsonl))
      - reviewer_file: a file containing multiple reviewer templates (including the model type and other parameters used in the OpenAI API requests); the toolkit provides a sample file ([`config/reviewer.jsonl`](config/reviewer.jsonl))

   2. Build the config file (`config.yaml`). The format of the file is as follows (a filled-in example is given at the end of this document):

      ```yaml
      gpt_evaluation:
        openai_organization:
        openai_api_key:
        question_file:
        answer_file:
        baseline_file:
        prompt_file:
        reviewer_file:
        result_file:    # path of the evaluation result
      ```

   3. Run the script.

      ```shell
      python gpt_evaluator.py --config <path_to_config.yaml>
      ```
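
For reference, each line of a benchmark dataset file is a standalone json object with the three attributes described in step 2.1. The two lines below are only an illustration of the expected shape; the ids, question texts, and categories are made up, not taken from the provided Vicuna Bench:

```jsonl
{"question_id": 1, "text": "How can I improve my time management skills?", "category": "generic"}
{"question_id": 2, "text": "Write a short story about a robot learning to paint.", "category": "writing"}
```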
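
As a concrete sketch of the answer-generation config in step 2.2, the example below fills in the Huggingface branch. The model name, paths, and numeric values are placeholders chosen for illustration and should be replaced with your own:

```yaml
answer_generation:
  model_name: my-model                       # placeholder model name
  question_file: config/question.jsonl       # the provided Vicuna Bench
  answer_file: answers/my-model.jsonl        # placeholder output path
  batch_size: 8
  max_tokens: 512
  temperature: 0.7
  huggingface:
    model_path: /path/to/your/model          # placeholder
    tokenizer_path: /path/to/your/tokenizer  # placeholder
```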
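
Similarly, a filled-in evaluation config for step 3.2 might look like the following; the API credentials and all paths outside `config/` are placeholders:

```yaml
gpt_evaluation:
  openai_organization: org-xxxxxxxx            # placeholder
  openai_api_key: sk-xxxxxxxx                  # placeholder
  question_file: config/question.jsonl
  answer_file: answers/my-model.jsonl          # answers of your model from step 2
  baseline_file: answers/baseline-model.jsonl  # answers of the baseline model from step 2
  prompt_file: config/prompt.jsonl             # sample prompt templates shipped with the toolkit
  reviewer_file: config/reviewer.jsonl         # sample reviewer templates shipped with the toolkit
  result_file: results/my-model-review.jsonl   # placeholder output path
```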