answer_relevance_classifier_lora/README.md ADDED
@@ -0,0 +1,363 @@
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ pipeline_tag: text-generation
6
+ library_name: peft
8
+ ---
9
+
10
+ # Intrinsics for Answer Relevance Classifier
11
+
12
+ ## Model Summary
13
+ This is a RAG-specific intrinsic for the answer relevance classification task.
+ The model takes as input a multi-turn conversation ending with an assistant response,
+ and classifies whether the assistant's response is relevant to the
+ user's final inquiry, along with a relevance category and the reasoning behind the conclusion.
17
+
18
+
19
+ We provide this intrinsic implemented as two adapter variants (LoRA and aLoRA), trained over
+ Granite-3.3-2b-instruct and Granite-3.3-8b-instruct.
21
+
22
+ - **Developer:** IBM Research
23
+ - **Model type:** LoRA and aLoRA adapter for
24
+ [ibm-granite/granite-3.3-2b-instruct](https://huggingface.co/ibm-granite/granite-3.3-2b-instruct),
25
+ [ibm-granite/granite-3.3-8b-instruct](https://huggingface.co/ibm-granite/granite-3.3-8b-instruct)
26
+ - **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
27
+
28
+ ## Intended use
29
+ This RAG-specific intrinsic is intended to post-process the generated assistant response.
30
+
31
+ - The binary classification of relevance can be used to determine whether the assistant response is suitable
+ to be given to the user, or whether a rewrite into a more relevant response is necessary (a minimal
+ sketch of such a check is shown below).
+ - The category and the analysis explaining the conclusion can be incorporated into the prompt
+ for the answer relevance rewriter, indicating specific directions
+ the rewrite must take to overcome the identified deficiency in relevance.
36
+
37
+ **Model input**: The input to the answer relevance classifier intrinsic is an
38
+ OpenAI-compatible chat completion request, containing a list of conversation
39
+ turns that alternate between the `user` and `assistant` roles and end with
+ an `assistant` turn.
41
+
42
+ **Model output**: The output of the answer relevance classifier intrinsic is the result of the
+ original chat completion request, formatted as a JSON object with the following schema:
+
+ {
+   "answer_relevance_analysis": <free-text analysis of whether, and in which ways, the assistant response is or is not relevant>,
+   "answer_relevance_category": <one of the labels listed below>,
+   "answer_relevance_likelihood": <float between 0.0 and 1.0>
+ }
50
+
51
+ The possible labels for `answer_relevance_category` are:
+ - "Pertinent"
+ - "Pertinent with relevant extra"
+ - "Excessive unnecessary information"
+ - "Unduly restrictive"
+ - "Too vague or generic"
+ - "Contextual misalignment"
+ - "Misinterpreted inquiry"
+ - "No attempt"
60
+
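+ For illustration, a plausible (hypothetical) classifier output for a vague assistant response might be:
+
+ {
+   "answer_relevance_analysis": "The user asks who attended the meeting, but the response only states that many people attended without naming anyone.",
+   "answer_relevance_category": "Too vague or generic",
+   "answer_relevance_likelihood": 0.2
+ }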
61
+ Please see the code snippets in the Quickstart Example section below for
62
+ examples that illustrate the intrinsic's input/output.
63
+
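+ As a minimal sketch of the relevance check mentioned under Intended use, assuming the hypothetical
+ output above and an application-chosen threshold of 0.5 (both illustrative, not part of the model):
+
+ import json
+
+ # Hypothetical classifier output, abbreviated from the example above
+ classifier_json = '{"answer_relevance_analysis": "...", "answer_relevance_category": "Too vague or generic", "answer_relevance_likelihood": 0.2}'
+ classifier_output = json.loads(classifier_json)
+
+ RELEVANCE_THRESHOLD = 0.5  # illustrative value; tune per application
+
+ # Below the threshold, pass the response plus the category and analysis to the
+ # answer relevance rewriter intrinsic; otherwise return the response as-is.
+ needs_rewrite = classifier_output["answer_relevance_likelihood"] < RELEVANCE_THRESHOLD
+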
64
+ ## Quickstart Example
65
+
66
+ To run the answer relevance classifier intrinsic through granite-common, you can either (a)
+ use an OpenAI-compatible inference backend, such as vLLM, or (b) use the Hugging
+ Face transformers library. We provide instructions below for each of the two
69
+ approaches. Note that running inference using vLLM or another scalable
70
+ OpenAI-compatible inference backend should be significantly faster than using
71
+ the Hugging Face transformers library directly.
72
+
73
+ ### Using an OpenAI-Compatible Inference Backend
74
+
75
+ To run the intrinsic using an OpenAI-compatible inference backend, such as vLLM,
76
+ follow the steps below.
77
+
78
+ 1. Install the granite-common library:
79
+
80
+ pip install git+https://github.com/ibm-granite/granite-common.git
81
+ pip install "granite_common[nltk]"
82
+
83
+ 2. Install the Hugging Face CLI:
84
+
85
+ pip install -U "huggingface_hub[cli]"
86
+
87
+ 3. Install vLLM:
88
+
89
+ pip install vllm
90
+
91
+ 4. Download the intrinsics library:
92
+
93
+ hf download ibm-granite/rag-intrinsics-lib --local-dir ./rag-intrinsics-lib
94
+
95
+ 5. Edit the vLLM startup script found in `./rag-intrinsics-lib/run_vllm.sh`
96
+ using your favorite editor:
97
+
98
+ Edit the constants `BASE_MODEL_NAME` and `BASE_MODEL_ORG` depending on the
99
+ base model on which the desired LoRA adapter has been trained. Optionally,
100
+ edit the constant `PORT` to change the port on which vLLM will run. Save the
101
+ modified file and exit the editor.
102
+
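+ For example, to serve the adapters for the 8B base model, the edited constants might look as
+ follows (illustrative values; the port must match the one used in `openai_base_url` below):
+
+ BASE_MODEL_ORG=ibm-granite
+ BASE_MODEL_NAME=granite-3.3-8b-instruct
+ PORT=55555
+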
103
+ 6. Start vLLM through the startup script. The first time you run the script,
104
+ you may have to change the permissions to allow execution:
105
+
106
+ cd rag-intrinsics-lib
107
+ chmod u+x ./run_vllm.sh
108
+ ./run_vllm.sh &
109
+
110
+ 7. Run the following code snippet:
111
+
112
+ import json
113
+ import openai
114
+ import granite_common
115
+
116
+ intrinsic_name = "answer_relevance_classifier"
117
+
118
+ # Change the following constant to select a different base model
119
+ base_model_name = "granite-3.3-8b-instruct"
120
+
121
+ # Change the following constants as needed to reflect the location of the vLLM server
122
+ # The selected port should be identical to the one you specified in the vLLM startup script
123
+ openai_base_url = "http://localhost:55555/v1"
124
+ openai_api_key = "rag_intrinsics_1234"
125
+
126
+ # Fetch IO configuration file from Hugging Face Hub
127
+ io_yaml_file = granite_common.intrinsics.util.obtain_io_yaml(
128
+ intrinsic_name, base_model_name
129
+ )
130
+
131
+ # Instantiate input/output processors
132
+ rewriter = granite_common.IntrinsicsRewriter(config_file=io_yaml_file)
133
+ result_processor = granite_common.IntrinsicsResultProcessor(config_file=io_yaml_file)
134
+
135
+ # Sample request
136
+ request_json = {
137
+ "messages": [
138
+ {
139
+ "role": "user",
140
+ "content": "Who attended the meeting?"
141
+ },
142
+ {
143
+ "role": "assistant",
144
+ "content": "Many people attended the meeting."
145
+ }
146
+ ],
147
+ "extra_body": {
148
+ "documents": [
149
+ {
150
+ "doc_id": "1",
151
+ "text": "Meeting attendees: Alice, Bob, Carol."
152
+ },
153
+ {
154
+ "doc_id": "2",
155
+ "text": "Meeting time: 9:00 am to 11:00 am."
156
+ }
157
+ ]
158
+ }
159
+ }
160
+
161
+ # Add other parameters
162
+ request_json["model"] = intrinsic_name
163
+ request_json["temperature"] = 0.0
164
+
165
+ # Apply input processor
166
+ intrinsic_kwargs = {}
167
+ rewritten_request = rewriter.transform(request_json, **intrinsic_kwargs)
168
+
169
+ # Run inference
170
+ client = openai.OpenAI(base_url=openai_base_url, api_key=openai_api_key)
171
+ chat_completion = client.chat.completions.create(**rewritten_request.model_dump())
172
+
173
+ # Apply output processor
174
+ processed_chat_completion = result_processor.transform(
175
+ chat_completion, rewritten_request
176
+ )
177
+
178
+ # Verify that the content of the completion is valid JSON and pretty-print it.
179
+ parsed_contents = json.loads(processed_chat_completion.choices[0].message.content)
180
+ print("JSON output:")
181
+ print(json.dumps(parsed_contents, indent=2))
182
+
183
+ ### Using the Hugging Face Transformers Library
184
+
185
+ To run the intrinsic using the Hugging Face transformers library directly,
186
+ follow the steps below.
187
+
188
+ 1. Install the granite-common library:
189
+
190
+ pip install git+https://github.com/ibm-granite/granite-common.git
191
+ pip install "granite_common[nltk]"
192
+
193
+ 2. Install the Hugging Face CLI:
194
+
195
+ pip install -U "huggingface_hub[cli]"
196
+
197
+ 3. Install PEFT:
198
+
199
+ pip install peft
200
+
201
+ 4. Install xgrammar:
202
+
203
+ pip install xgrammar
204
+
205
+ 5. Run the following code snippet:
206
+
207
+ import json
208
+ import granite_common.util
209
+ import peft
210
+
211
+ intrinsic_name = "answer_relevance_classifier"
212
+
213
+ # Change the following constant to select a different base model
214
+ base_model_name = "granite-3.3-8b-instruct"
215
+
216
+ use_cuda = True # Set to False to use default PyTorch device for this machine + model
217
+
218
+ # Fetch IO configuration file from Hugging Face Hub
219
+ io_yaml_file = granite_common.intrinsics.util.obtain_io_yaml(
220
+ intrinsic_name, base_model_name
221
+ )
222
+
223
+ # Fetch LoRA directory from Hugging Face Hub
224
+ lora_dir = granite_common.intrinsics.util.obtain_lora(
225
+ intrinsic_name, base_model_name
226
+ )
227
+
228
+ # Instantiate input/output processors
229
+ rewriter = granite_common.IntrinsicsRewriter(config_file=io_yaml_file)
230
+ result_processor = granite_common.IntrinsicsResultProcessor(config_file=io_yaml_file)
231
+
232
+ # Sample request
233
+ request_json = {
234
+ "messages": [
235
+ {
236
+ "role": "user",
237
+ "content": "Who attended the meeting?"
238
+ },
239
+ {
240
+ "role": "assistant",
241
+ "content": "Many people attended the meeting."
242
+ }
243
+ ],
244
+ "extra_body": {
245
+ "documents": [
246
+ {
247
+ "doc_id": "1",
248
+ "text": "Meeting attendees: Alice, Bob, Carol."
249
+ },
250
+ {
251
+ "doc_id": "2",
252
+ "text": "Meeting time: 9:00 am to 11:00 am."
253
+ }
254
+ ]
255
+ }
256
+ }
257
+
258
+ # Add additional parameters
259
+ request_json["model"] = intrinsic_name
260
+ request_json["temperature"] = 0.0
261
+
262
+ # Apply input processor
263
+ intrinsic_kwargs = {}
264
+ rewritten_request = rewriter.transform(request_json, **intrinsic_kwargs)
265
+
266
+ # Load the base model and merge LoRA weights
267
+ model, tokenizer = granite_common.util.load_transformers_lora(lora_dir)
268
+ if use_cuda:
269
+     model = model.cuda()
270
+
271
+ # Convert the chat completion request into the Transformers library's native
+ # input format.
273
+ generate_input, other_input = (
274
+ granite_common.util.chat_completion_request_to_transformers_inputs(
275
+ rewritten_request,
276
+ tokenizer,
277
+ model,
278
+ )
279
+ )
280
+
281
+ # Use the Transformers library's APIs to generate one or more completions,
282
+ # then convert those completions into an OpenAI-compatible chat completion response.
283
+ responses = granite_common.util.generate_with_transformers(
284
+ tokenizer, model, generate_input, other_input
285
+ )
286
+
287
+ # Apply output processor
288
+ transformed_responses = result_processor.transform(responses, rewritten_request)
289
+
290
+ # Verify that the content of the completion is valid JSON and pretty-print it.
291
+ parsed_contents = json.loads(transformed_responses.choices[0].message.content)
292
+ print("JSON output:")
293
+ print(json.dumps(parsed_contents, indent=2))
294
+
295
+ ## Training Details
296
+
297
+ ### Training Data
298
+
299
+ The training data was created by the following process:
+ 1. Take the synthetic rag-data-granite dataset, consisting of conversations between a user and an assistant.
+ 2. Replace the assistant responses by running granite-3.2-intrinsics at temperature 1.0.
+ 3. Produce the answer_relevance_classifier target output using mixtral-large with prompts containing in-context examples.
+
+ The conversations created in steps 1 and 2 are used as the training input. The JSON string from step 3
+ is used as the training target output.
305
+
306
+ #### Training Hyperparameters
307
+
308
+ The LoRA adapter was fine-tuned using PEFT under the following regime: rank =
309
+ 32, learning rate = 3.0e-06, number of epochs = 50.
310
+
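+ For reference, a PEFT `LoraConfig` consistent with this regime and with the `adapter_config.json`
+ shipped alongside this card would look roughly as follows (a sketch only; data preparation and the
+ training loop are omitted):
+
+ from peft import LoraConfig
+
+ # Values mirror adapter_config.json in this repository
+ lora_config = LoraConfig(
+     r=32,
+     lora_alpha=32,
+     lora_dropout=0.05,
+     target_modules=["q_proj", "k_proj", "v_proj"],
+     task_type="CAUSAL_LM",
+ )
+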
311
+ ## Evaluation
312
+
313
+ ### Answer Relevance Classifier
314
+
315
+ We evaluated the model on a test data set generated by the same procedure as the training data,
+ using GPT-4o as the judge.
317
+
318
+
319
+ The following table presents results comparing baselines and frontier models
320
+ on the answer relevance classification task. The LoRAs perform on par with frontier models
321
+ of much larger size and outperform frontier models of comparable size.
322
+
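+ For example, in the table below, the "Not relevant" recall of 0.900 for granite-3.3-8b/lora means that 90% of the truly irrelevant responses in the test set are correctly classified as not relevant.
+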
323
+ | Model | Not relevant<br>precision | Not relevant<br>recall | Not relevant<br>f1 | Relevant<br>precision | Relevant<br>recall | Relevant<br>f1 |
+ |:------------------------|:----------|:-------|:------|:----------|:-------|:------|
326
+ | mixtral-8x22b-v0.1 | 0.934 | 0.592 | 0.725 | 0.886 | 0.880 | 0.883 |
327
+ | llama-3.3-70b | 0.895 | 0.829 | 0.861 | 0.898 | 0.939 | 0.918 |
328
+ | gpt-oss-20b | 0.747 | 0.745 | 0.746 | 0.969 | 0.782 | 0.865 |
329
+ | gpt-4o | 0.775 | 0.945 | 0.852 | 0.974 | 0.690 | 0.808 |
330
+ | gpt-4o-mini | 0.818 | 0.921 | 0.866 | 0.948 | 0.872 | 0.908 |
331
+ | | | | | | | |
332
+ | granite-3.3-2b/lora | 0.743 | 0.861 | 0.798 | 0.909 | 0.806 | 0.855 |
333
+ | granite-3.3-2b/alora | 0.761 | 0.821 | 0.790 | 0.894 | 0.833 | 0.862 |
334
+ | granite-3.3-8b/lora | 0.783 | 0.900 | 0.837 | 0.931 | 0.842 | 0.884 |
335
+ | granite-3.3-8b/alora | 0.793 | 0.879 | 0.834 | 0.919 | 0.856 | 0.886 |
336
+
337
+
338
+ ### Comparing the Answer Relevance Classifier Intrinsics vs. Vanilla Granite Models
339
+
340
+ We compare the performance of the vanilla Granite 3.3 2b and 8b Instruct models
+ against the answer relevance classifier intrinsics implemented as LoRA adapters.
+ The LoRAs significantly outperform the base models.
343
+
344
+ | Model | Not relevant<br>precision | Not relevant<br>recall | Not relevant<br>f1 | Relevant<br>precision | Relevant<br>recall | Relevant<br>f1 |
+ |:------------------------|:----------|:-------|:------|:----------|:-------|:------|
347
+ | granite-3.3-2b | | | | | | |
348
+ | granite-3.3-2b/lora | 0.743 | 0.861 | 0.798 | 0.909 | 0.806 | 0.855 |
349
+ | granite-3.3-2b/alora | 0.761 | 0.821 | 0.790 | 0.894 | 0.833 | 0.862 |
350
+ | | | | | | | |
351
+ | granite-3.3-8b | 0.798 | 0.542 | 0.646 | 0.813 | 0.770 | 0.791 |
352
+ | granite-3.3-8b/lora | 0.783 | 0.900 | 0.837 | 0.931 | 0.842 | 0.884 |
353
+ | granite-3.3-8b/alora | 0.793 | 0.879 | 0.834 | 0.919 | 0.856 | 0.886 |
354
+
355
+
356
+
357
+ ## Model Card Authors
358
+
359
+ [Huaiyu Zhu](mailto:[email protected])
360
+
361
+ ### Framework versions
362
+
363
+ - PEFT 0.14.0
answer_relevance_classifier_lora/adapter_config.json ADDED
@@ -0,0 +1,33 @@
1
+ {
2
+ "alpha_pattern": {},
3
+ "auto_mapping": null,
4
+ "base_model_name_or_path": "ibm-granite/granite-3.3-8b-instruct",
5
+ "bias": "none",
6
+ "eva_config": null,
7
+ "exclude_modules": null,
8
+ "fan_in_fan_out": false,
9
+ "inference_mode": true,
10
+ "init_lora_weights": true,
11
+ "layer_replication": null,
12
+ "layers_pattern": null,
13
+ "layers_to_transform": null,
14
+ "loftq_config": {},
15
+ "lora_alpha": 32,
16
+ "lora_bias": false,
17
+ "lora_dropout": 0.05,
18
+ "megatron_config": null,
19
+ "megatron_core": "megatron.core",
20
+ "modules_to_save": null,
21
+ "peft_type": "LORA",
22
+ "r": 32,
23
+ "rank_pattern": {},
24
+ "revision": null,
25
+ "target_modules": [
26
+ "k_proj",
27
+ "q_proj",
28
+ "v_proj"
29
+ ],
30
+ "task_type": "CAUSAL_LM",
31
+ "use_dora": false,
32
+ "use_rslora": false
33
+ }
answer_relevance_classifier_lora/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:27c8cf16a5ea98e110df54e433c43432ee8f18373f673293989e1deacee29cd7
3
+ size 94404160
answer_relevance_classifier_lora/added_tokens.json ADDED
@@ -0,0 +1,9 @@
1
+ {
2
+ "<|end_of_cite|>": 49156,
3
+ "<|end_of_plugin|>": 49158,
4
+ "<|end_of_role|>": 49153,
5
+ "<|start_of_cite|>": 49155,
6
+ "<|start_of_plugin|>": 49157,
7
+ "<|start_of_role|>": 49152,
8
+ "<|tool_call|>": 49154
9
+ }
answer_relevance_classifier_lora/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
answer_relevance_classifier_lora/special_tokens_map.json ADDED
@@ -0,0 +1,39 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ "<|start_of_role|>",
4
+ "<|end_of_role|>",
5
+ "<|tool_call|>",
6
+ "<|start_of_cite|>",
7
+ "<|end_of_cite|>",
8
+ "<|start_of_plugin|>",
9
+ "<|end_of_plugin|>"
10
+ ],
11
+ "bos_token": {
12
+ "content": "<|end_of_text|>",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false
17
+ },
18
+ "eos_token": {
19
+ "content": "<|end_of_text|>",
20
+ "lstrip": false,
21
+ "normalized": false,
22
+ "rstrip": false,
23
+ "single_word": false
24
+ },
25
+ "pad_token": {
26
+ "content": "<|end_of_text|>",
27
+ "lstrip": false,
28
+ "normalized": false,
29
+ "rstrip": false,
30
+ "single_word": false
31
+ },
32
+ "unk_token": {
33
+ "content": "<|end_of_text|>",
34
+ "lstrip": false,
35
+ "normalized": false,
36
+ "rstrip": false,
37
+ "single_word": false
38
+ }
39
+ }
answer_relevance_classifier_lora/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
answer_relevance_classifier_lora/tokenizer_config.json ADDED
@@ -0,0 +1,235 @@
1
+ {
2
+ "add_bos_token": false,
3
+ "add_prefix_space": false,
4
+ "added_tokens_decoder": {
5
+ "0": {
6
+ "content": "<|end_of_text|>",
7
+ "lstrip": false,
8
+ "normalized": false,
9
+ "rstrip": false,
10
+ "single_word": false,
11
+ "special": true
12
+ },
13
+ "1": {
14
+ "content": "<fim_prefix>",
15
+ "lstrip": false,
16
+ "normalized": false,
17
+ "rstrip": false,
18
+ "single_word": false,
19
+ "special": true
20
+ },
21
+ "2": {
22
+ "content": "<fim_middle>",
23
+ "lstrip": false,
24
+ "normalized": false,
25
+ "rstrip": false,
26
+ "single_word": false,
27
+ "special": true
28
+ },
29
+ "3": {
30
+ "content": "<fim_suffix>",
31
+ "lstrip": false,
32
+ "normalized": false,
33
+ "rstrip": false,
34
+ "single_word": false,
35
+ "special": true
36
+ },
37
+ "4": {
38
+ "content": "<fim_pad>",
39
+ "lstrip": false,
40
+ "normalized": false,
41
+ "rstrip": false,
42
+ "single_word": false,
43
+ "special": true
44
+ },
45
+ "5": {
46
+ "content": "<filename>",
47
+ "lstrip": false,
48
+ "normalized": false,
49
+ "rstrip": false,
50
+ "single_word": false,
51
+ "special": true
52
+ },
53
+ "6": {
54
+ "content": "<gh_stars>",
55
+ "lstrip": false,
56
+ "normalized": false,
57
+ "rstrip": false,
58
+ "single_word": false,
59
+ "special": true
60
+ },
61
+ "7": {
62
+ "content": "<issue_start>",
63
+ "lstrip": false,
64
+ "normalized": false,
65
+ "rstrip": false,
66
+ "single_word": false,
67
+ "special": true
68
+ },
69
+ "8": {
70
+ "content": "<issue_comment>",
71
+ "lstrip": false,
72
+ "normalized": false,
73
+ "rstrip": false,
74
+ "single_word": false,
75
+ "special": true
76
+ },
77
+ "9": {
78
+ "content": "<issue_closed>",
79
+ "lstrip": false,
80
+ "normalized": false,
81
+ "rstrip": false,
82
+ "single_word": false,
83
+ "special": true
84
+ },
85
+ "10": {
86
+ "content": "<jupyter_start>",
87
+ "lstrip": false,
88
+ "normalized": false,
89
+ "rstrip": false,
90
+ "single_word": false,
91
+ "special": true
92
+ },
93
+ "11": {
94
+ "content": "<jupyter_text>",
95
+ "lstrip": false,
96
+ "normalized": false,
97
+ "rstrip": false,
98
+ "single_word": false,
99
+ "special": true
100
+ },
101
+ "12": {
102
+ "content": "<jupyter_code>",
103
+ "lstrip": false,
104
+ "normalized": false,
105
+ "rstrip": false,
106
+ "single_word": false,
107
+ "special": true
108
+ },
109
+ "13": {
110
+ "content": "<jupyter_output>",
111
+ "lstrip": false,
112
+ "normalized": false,
113
+ "rstrip": false,
114
+ "single_word": false,
115
+ "special": true
116
+ },
117
+ "14": {
118
+ "content": "<empty_output>",
119
+ "lstrip": false,
120
+ "normalized": false,
121
+ "rstrip": false,
122
+ "single_word": false,
123
+ "special": true
124
+ },
125
+ "15": {
126
+ "content": "<commit_before>",
127
+ "lstrip": false,
128
+ "normalized": false,
129
+ "rstrip": false,
130
+ "single_word": false,
131
+ "special": true
132
+ },
133
+ "16": {
134
+ "content": "<commit_msg>",
135
+ "lstrip": false,
136
+ "normalized": false,
137
+ "rstrip": false,
138
+ "single_word": false,
139
+ "special": true
140
+ },
141
+ "17": {
142
+ "content": "<commit_after>",
143
+ "lstrip": false,
144
+ "normalized": false,
145
+ "rstrip": false,
146
+ "single_word": false,
147
+ "special": true
148
+ },
149
+ "18": {
150
+ "content": "<reponame>",
151
+ "lstrip": false,
152
+ "normalized": false,
153
+ "rstrip": false,
154
+ "single_word": false,
155
+ "special": true
156
+ },
157
+ "49152": {
158
+ "content": "<|start_of_role|>",
159
+ "lstrip": false,
160
+ "normalized": false,
161
+ "rstrip": false,
162
+ "single_word": false,
163
+ "special": true
164
+ },
165
+ "49153": {
166
+ "content": "<|end_of_role|>",
167
+ "lstrip": false,
168
+ "normalized": false,
169
+ "rstrip": false,
170
+ "single_word": false,
171
+ "special": true
172
+ },
173
+ "49154": {
174
+ "content": "<|tool_call|>",
175
+ "lstrip": false,
176
+ "normalized": false,
177
+ "rstrip": false,
178
+ "single_word": false,
179
+ "special": true
180
+ },
181
+ "49155": {
182
+ "content": "<|start_of_cite|>",
183
+ "lstrip": false,
184
+ "normalized": false,
185
+ "rstrip": false,
186
+ "single_word": false,
187
+ "special": true
188
+ },
189
+ "49156": {
190
+ "content": "<|end_of_cite|>",
191
+ "lstrip": false,
192
+ "normalized": false,
193
+ "rstrip": false,
194
+ "single_word": false,
195
+ "special": true
196
+ },
197
+ "49157": {
198
+ "content": "<|start_of_plugin|>",
199
+ "lstrip": false,
200
+ "normalized": false,
201
+ "rstrip": false,
202
+ "single_word": false,
203
+ "special": true
204
+ },
205
+ "49158": {
206
+ "content": "<|end_of_plugin|>",
207
+ "lstrip": false,
208
+ "normalized": false,
209
+ "rstrip": false,
210
+ "single_word": false,
211
+ "special": true
212
+ }
213
+ },
214
+ "additional_special_tokens": [
215
+ "<|start_of_role|>",
216
+ "<|end_of_role|>",
217
+ "<|tool_call|>",
218
+ "<|start_of_cite|>",
219
+ "<|end_of_cite|>",
220
+ "<|start_of_plugin|>",
221
+ "<|end_of_plugin|>"
222
+ ],
223
+ "bos_token": "<|end_of_text|>",
224
+ "chat_template": "{# Alias tools -> available_tools #}\n{%- if tools and not available_tools -%}\n {%- set available_tools = tools -%}\n{%- endif -%}\n{%- if messages[0]['role'] == 'system' %}\n {%- set system_message = messages[0]['content'] %}\n {%- set loop_messages = messages[1:] %}\n {%- else %}\n {%- set system_message = \"Knowledge Cutoff Date: April 2024.\nToday's Date: \" + strftime_now('%B %d, %Y') + \".\nYou are Granite, developed by IBM.\" %}\n {%- if available_tools and documents %}\n {%- set system_message = system_message + \" You are a helpful assistant with access to the following tools. When a tool is required to answer the user's query, respond only with <|tool_call|> followed by a JSON list of tools used. If a tool does not exist in the provided list of tools, notify the user that you do not have the ability to fulfill the request.\nWrite the response to the user's input by strictly aligning with the facts in the provided documents. If the information needed to answer the question is not available in the documents, inform the user that the question cannot be answered based on the available data.\" %}\n {%- elif available_tools %}\n {%- set system_message = system_message + \" You are a helpful assistant with access to the following tools. When a tool is required to answer the user's query, respond only with <|tool_call|> followed by a JSON list of tools used. If a tool does not exist in the provided list of tools, notify the user that you do not have the ability to fulfill the request.\" %}\n {%- elif documents %}\n {%- set system_message = system_message + \" Write the response to the user's input by strictly aligning with the facts in the provided documents. If the information needed to answer the question is not available in the documents, inform the user that the question cannot be answered based on the available data.\" %}\n {%- elif thinking %}\n {%- set system_message = system_message + \" You are a helpful AI assistant.\nRespond to every user query in a comprehensive and detailed way. You can write down your thoughts and reasoning process before responding. In the thought process, engage in a comprehensive cycle of analysis, summarization, exploration, reassessment, reflection, backtracing, and iteration to develop well-considered thinking process. In the response section, based on various attempts, explorations, and reflections from the thoughts section, systematically present the final solution that you deem correct. The response should summarize the thought process. Write your thoughts between <think></think> and write your response between <response></response> for each user query.\" %}\n {%- else %}\n {%- set system_message = system_message + \" You are a helpful AI assistant.\" %}\n {%- endif %}\n {%- if 'citations' in controls and documents %}\n {%- set system_message = system_message + '\nUse the symbols <|start_of_cite|> and <|end_of_cite|> to indicate when a fact comes from a document in the search result, e.g <|start_of_cite|> {document_id: 1}my fact <|end_of_cite|> for a fact from document 1. Afterwards, list all the citations with their corresponding documents in an ordered list.' %}\n {%- endif %}\n {%- if 'hallucinations' in controls and documents %}\n {%- set system_message = system_message + '\nFinally, after the response is written, include a numbered list of sentences from the response with a corresponding risk value that are hallucinated and not based in the documents.' 
%}\n {%- endif %}\n {%- set loop_messages = messages %}\n {%- endif %}\n {{- '<|start_of_role|>system<|end_of_role|>' + system_message + '<|end_of_text|>\n' }}\n {%- if available_tools %}\n {{- '<|start_of_role|>available_tools<|end_of_role|>' }}\n {{- available_tools | tojson(indent=4) }}\n {{- '<|end_of_text|>\n' }}\n {%- endif %}\n {%- if documents %}\n {%- for document in documents %}\n {{- '<|start_of_role|>document {\"document_id\": \"' + document['doc_id'] | string + '\"}<|end_of_role|>\n' }}\n {{- document['text'] }}\n {{- '<|end_of_text|>\n' }}\n {%- endfor %}\n {%- endif %}\n {%- for message in loop_messages %}\n {{- '<|start_of_role|>' + message['role'] + '<|end_of_role|>' + message['content'] + '<|end_of_text|>\n' }}\n {%- if loop.last and add_generation_prompt %}\n {{- '<|start_of_role|>assistant' }}\n {%- if controls %}\n {{- ' ' + controls | tojson()}}\n {%- endif %}\n {{- '<|end_of_role|>' }}\n {%- endif %}\n {%- endfor %}",
225
+ "clean_up_tokenization_spaces": true,
226
+ "eos_token": "<|end_of_text|>",
227
+ "errors": "replace",
228
+ "extra_special_tokens": {},
229
+ "model_max_length": 9223372036854775807,
230
+ "pad_token": "<|end_of_text|>",
231
+ "padding_side": "left",
232
+ "tokenizer_class": "GPT2Tokenizer",
233
+ "unk_token": "<|end_of_text|>",
234
+ "vocab_size": 49152
235
+ }
answer_relevance_classifier_lora/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
answer_relevance_rewriter_lora/README.md ADDED
@@ -0,0 +1,372 @@
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ pipeline_tag: text-generation
6
+ library_name: peft
8
+ ---
9
+
10
+ # Intrinsics for Answer Relevance Rewriter
11
+
12
+ ## Model Summary
13
+ This is a RAG-specific intrinsic for the answer relevance rewrite task.
+ The model takes as input the chat completion output of the answer relevance classifier,
+ consisting of the conversation as well as the answer relevance classification, together with the grounding documents,
+ and produces a rewritten assistant response that is more relevant to the user's final inquiry.
17
+
18
+
19
+ We provide two intrinsics implemented as LoRA adapters (LoRA/aLoRA) trained over
20
+ Granite-3.3-2b-instruct, Granite-3.3-8b-instruct.
21
+
22
+ - **Developer:** IBM Research
23
+ - **Model type:** LoRA and aLoRA adapter for
24
+ [ibm-granite/granite-3.3-2b-instruct](https://huggingface.co/ibm-granite/granite-3.3-2b-instruct),
25
+ [ibm-granite/granite-3.3-8b-instruct](https://huggingface.co/ibm-granite/granite-3.3-8b-instruct)
26
+ - **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
27
+
28
+ ## Intended use
29
+ This RAG-specific intrinsic is intended to post-process the generated assistant response.
+ It should be used after the answer relevance classifier intrinsic, and applied in
+ cases where the `answer_relevance_likelihood` is below a threshold chosen by the application
+ (a sketch of this chaining is shown below).
32
+
33
+ For cases where the assistant answer is deemed not relevant (where `answer_relevance_likelihood` is below a
34
+ given threshold), the answer relevance rewriter intrinsic can be used to rewrite the assistant response
35
+ into a more relevant response. It takes as input the chat completion
+ from the answer relevance classifier output and the grounding documents. Its output has the form:
37
+
38
+ {
+   "answer_relevance_rewrite": <rewritten response>
+ }
41
+
42
+ The rewriter is instructed to correct only the deficiencies in relevance identified by the classifier,
+ and to ensure that the rewritten response is grounded in the conversation and the given documents.
44
+
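+ A minimal sketch of this chaining, assuming the classifier output has already been parsed into a
+ Python dictionary (the threshold and the `correction_method` string are illustrative application
+ choices; the Quickstart below shows how `intrinsic_kwargs` is passed to `rewriter.transform`):
+
+ # Hypothetical output of the answer relevance classifier intrinsic
+ classifier_output = {
+     "answer_relevance_analysis": "The response provides a vague answer that does not address the inquiry.",
+     "answer_relevance_category": "No attempt",
+     "answer_relevance_likelihood": 0.1,
+ }
+
+ RELEVANCE_THRESHOLD = 0.5  # illustrative value; tune per application
+
+ if classifier_output["answer_relevance_likelihood"] < RELEVANCE_THRESHOLD:
+     # Forward the classifier's findings to the rewriter as intrinsic kwargs
+     intrinsic_kwargs = {
+         "answer_relevance_category": classifier_output["answer_relevance_category"],
+         "answer_relevance_analysis": classifier_output["answer_relevance_analysis"],
+         "correction_method": "providing a relevant response if an inquiry should be answered, "
+         "or providing a short response if the last user utterance contains no inquiry",
+     }
+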
45
+ **Model input**: The input to the answer relevance rewriter intrinsic is an
+ OpenAI-compatible chat completion request containing a list of conversation
+ turns that alternate between the `user` and `assistant` roles, structured as follows:
+ - A conversation between user and assistant, ending with an assistant response
+ - An additional user turn with content "answer_relevance"
51
+
52
+ **Model output**: The output of the answer relevance rewriter intrinsic is the result of the
+ original chat completion request, formatted as a JSON object with the following schema:
+
+ {
+   "answer_relevance_rewrite": <rewritten response>
+ }
58
+
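+ For illustration, given the meeting example used in the Quickstart below, a plausible (hypothetical) output would be:
+
+ {
+   "answer_relevance_rewrite": "According to the meeting notes, Alice, Bob, and Carol attended the meeting."
+ }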
59
+ Please see the code snippets in the Quickstart Example section below for
60
+ examples that illustrate the intrinsic's input/output.
61
+
62
+ ## Quickstart Example
63
+
64
+ To run the answer relevance rewriter intrinsic through granite-common, you can either (a)
+ use an OpenAI-compatible inference backend, such as vLLM, or (b) use the Hugging
66
+ Face transformers library. We provide instructions for each of the two
67
+ approaches below. Note that running inference using vLLM or another scalable
68
+ OpenAI-compatible inference backend should be significantly faster than using
69
+ the Hugging Face transformers library directly.
70
+
71
+ ### Using an OpenAI-Compatible Inference Backend
72
+
73
+ To run the intrinsic using an OpenAI-compatible inference backend, such as vLLM,
74
+ follow the steps below.
75
+
76
+ 1. Install the granite-common library:
77
+
78
+ pip install git+https://github.com/ibm-granite/granite-common.git
79
+ pip install "granite_common[nltk]"
80
+
81
+ 2. Install the Hugging Face CLI:
82
+
83
+ pip install -U "huggingface_hub[cli]"
84
+
85
+ 3. Install vLLM:
86
+
87
+ pip install vllm
88
+
89
+ 4. Download the intrinsics library:
90
+
91
+ hf download ibm-granite/rag-intrinsics-lib --local-dir ./rag-intrinsics-lib
92
+
93
+ 5. Edit the vLLM startup script found in `./rag-intrinsics-lib/run_vllm.sh`
94
+ using your favorite editor:
95
+
96
+ Edit the constants `BASE_MODEL_NAME` and `BASE_MODEL_ORG` depending on the
97
+ base model on which the desired LoRA adapter has been trained. Optionally,
98
+ edit the constant `PORT` to change the port on which vLLM will run. Save the
99
+ modified file and exit the editor.
100
+
101
+ 6. Start vLLM through the startup script. The first time you run the script,
102
+ you may have to change the permissions to allow execution:
103
+
104
+ cd rag-intrinsics-lib
105
+ chmod u+x ./run_vllm.sh
106
+ ./run_vllm.sh &
107
+
108
+ 7. Run the following code snippet:
109
+
110
+ import json
111
+ import openai
112
+ import granite_common
113
+
114
+ intrinsic_name = "answer_relevance_classifier"
115
+
116
+ # Change the following constant to select a different base model
117
+ base_model_name = "granite-3.3-8b-instruct"
118
+
119
+ # Change the following constants as needed to reflect the location of the vLLM server
120
+ # The selected port should be identical to the one you specified in the vLLM startup script
121
+ openai_base_url = "http://localhost:55555/v1"
122
+ openai_api_key = "rag_intrinsics_1234"
123
+
124
+ # Fetch IO configuration file from Hugging Face Hub
125
+ io_yaml_file = granite_common.intrinsics.util.obtain_io_yaml(
126
+ intrinsic_name, base_model_name
127
+ )
128
+
129
+ # Instantiate input/output processors
130
+ rewriter = granite_common.IntrinsicsRewriter(config_file=io_yaml_file)
131
+ result_processor = granite_common.IntrinsicsResultProcessor(config_file=io_yaml_file)
132
+
133
+ # Sample request
134
+ request_json = {
135
+ "messages": [
136
+ {
137
+ "role": "user",
138
+ "content": "Who attended the meeting?"
139
+ },
140
+ {
141
+ "role": "assistant",
142
+ "content": "Many people attended the meeting."
143
+ }
144
+ ],
145
+ "extra_body": {
146
+ "documents": [
147
+ {
148
+ "doc_id": "1",
149
+ "text": "Meeting attendees: Alice, Bob, Carol."
150
+ },
151
+ {
152
+ "doc_id": "2",
153
+ "text": "Meeting time: 9:00 am to 11:00 am."
154
+ }
155
+ ]
156
+ }
157
+ }
158
+
159
+ # Add other parameters
160
+ request_json["model"] = intrinsic_name
161
+ request_json["temperature"] = 0.0
162
+
163
+ # Apply input processor
164
+ intrinsic_kwargs = {
165
+ "answer_relevance_category": "No attempt",
166
+ "answer_relevance_analysis": "The inquiry asks for the attendees of the meeting. The response provides a vague and non-specific answer that does not address the inquiry.",
167
+ "correction_method": "providing a relevant response if an inquiry should be answered, or providing a short response if the last user utterance contains no inquiry"
168
+ }
169
+ rewritten_request = rewriter.transform(request_json, **intrinsic_kwargs)
170
+
171
+ # Run inference
172
+ client = openai.OpenAI(base_url=openai_base_url, api_key=openai_api_key)
173
+ chat_completion = client.chat.completions.create(**rewritten_request.model_dump())
174
+
175
+ # Apply output processor
176
+ processed_chat_completion = result_processor.transform(
177
+ chat_completion, rewritten_request
178
+ )
179
+
180
+ # Verify that the content of the completion is valid JSON and pretty-print it.
181
+ parsed_contents = json.loads(processed_chat_completion.choices[0].message.content)
182
+ print("JSON output:")
183
+ print(json.dumps(parsed_contents, indent=2))
184
+
185
+ ### Using the Hugging Face Transformers Library
186
+
187
+ To run the intrinsic using the Hugging Face transformers library directly,
188
+ follow the steps below.
189
+
190
+ 1. Install the granite-common library:
191
+
192
+ pip install git+https://github.com/ibm-granite/granite-common.git
193
+ pip install "granite_common[nltk]"
194
+
195
+ 2. Install the Hugging Face CLI:
196
+
197
+ pip install -U "huggingface_hub[cli]"
198
+
199
+ 3. Install PEFT:
200
+
201
+ pip install peft
202
+
203
+ 4. Install xgrammar:
204
+
205
+ pip install xgrammar
206
+
207
+ 5. Run the following code snippet:
208
+
209
+ import json
210
+ import granite_common.util
211
+ import peft
212
+
213
+ intrinsic_name = "answer_relevance_rewriter"
214
+
215
+ # Change the following constant to select a different base model
216
+ base_model_name = "granite-3.3-8b-instruct"
217
+
218
+ use_cuda = True # Set to False to use default PyTorch device for this machine + model
219
+
220
+ # Fetch IO configuration file from Hugging Face Hub
221
+ io_yaml_file = granite_common.intrinsics.util.obtain_io_yaml(
222
+ intrinsic_name, base_model_name
223
+ )
224
+
225
+ # Fetch LoRA directory from Hugging Face Hub
226
+ lora_dir = granite_common.intrinsics.util.obtain_lora(
227
+ intrinsic_name, base_model_name
228
+ )
229
+
230
+ # Instantiate input/output processors
231
+ rewriter = granite_common.IntrinsicsRewriter(config_file=io_yaml_file)
232
+ result_processor = granite_common.IntrinsicsResultProcessor(config_file=io_yaml_file)
233
+
234
+ # Sample request
235
+ request_json = {
236
+ "messages": [
237
+ {
238
+ "role": "user",
239
+ "content": "Who attended the meeting?"
240
+ },
241
+ {
242
+ "role": "assistant",
243
+ "content": "Many people attended the meeting."
244
+ }
245
+ ],
246
+ "extra_body": {
247
+ "documents": [
248
+ {
249
+ "doc_id": "1",
250
+ "text": "Meeting attendees: Alice, Bob, Carol."
251
+ },
252
+ {
253
+ "doc_id": "2",
254
+ "text": "Meeting time: 9:00 am to 11:00 am."
255
+ }
256
+ ]
257
+ }
258
+ }
259
+
260
+ # Add additional parameters
261
+ request_json["model"] = intrinsic_name
262
+ request_json["temperature"] = 0.0
263
+
264
+ # Apply input processor
265
+ intrinsic_kwargs = {
266
+ "answer_relevance_category": "No attempt",
267
+ "answer_relevance_analysis": "The inquiry asks for the attendees of the meeting. The response provides a vague and non-specific answer that does not address the inquiry.",
268
+ "correction_method": "providing a relevant response if an inquiry should be answered, or providing a short response if the last user utterance contains no inquiry"
269
+ }
270
+ rewritten_request = rewriter.transform(request_json, **intrinsic_kwargs)
271
+
272
+ # Load the base model and merge LoRA weights
273
+ model, tokenizer = granite_common.util.load_transformers_lora(lora_dir)
274
+ if use_cuda:
275
+ model = model.cuda()
276
+
277
+ # Convert the chat completion request into the Transformers library's native
+ # input format.
279
+ generate_input, other_input = (
280
+ granite_common.util.chat_completion_request_to_transformers_inputs(
281
+ rewritten_request,
282
+ tokenizer,
283
+ model,
284
+ )
285
+ )
286
+
287
+ # Use the Transformers library's APIs to generate one or more completions,
288
+ # then convert those completions into an OpenAI-compatible chat completion response.
289
+ responses = granite_common.util.generate_with_transformers(
290
+ tokenizer, model, generate_input, other_input
291
+ )
292
+
293
+ # Apply output processor
294
+ transformed_responses = result_processor.transform(responses, rewritten_request)
295
+
296
+ # Verify that the content of the completion is valid JSON and pretty-print it.
297
+ parsed_contents = json.loads(transformed_responses.choices[0].message.content)
298
+ print("JSON output:")
299
+ print(json.dumps(parsed_contents, indent=2))
300
+
301
+ ## Training Details
302
+
303
+ ### Training Data
304
+
305
+ The training data was created by the following process:
+ 1. Take the synthetic rag-data-granite dataset, consisting of conversations between a user and an assistant.
+ 2. Replace the assistant responses by running granite-3.2-intrinsics at temperature 1.0.
+ 3. Produce the answer_relevance_rewriter target output using mixtral-large with prompts containing in-context examples.
+
+ The conversations created in steps 1 and 2 are used as the training input. The JSON string from step 3
+ is used as the training target output.
311
+
312
+ #### Training Hyperparameters
313
+
314
+ The LoRA adapter was fine-tuned using PEFT under the following regime: rank =
315
+ 32, learning rate = 1.0e-04, number of epochs = 5.
316
+
317
+ ## Evaluation
318
+
319
+ ### Answer Relevance Rewriter
320
+
321
+ We evaluated the model on a test data set generated by the same procedure as the training data,
+ using GPT-4o as the judge.
323
+
324
+
325
+ The following table presents results comparing baselines and frontier models
326
+ on the answer relevance rewrite task. The test set consists of responses classified as irrelevant by
+ mixtral-large. The evaluation is first divided into two parts: responses that are truly irrelevant,
+ for which we measure the rate at which the rewrite becomes relevant, and responses that are falsely
+ classified as irrelevant, for which we measure the rate at which the rewrite becomes irrelevant.
+ Then the overall rates of flipping irrelevant to relevant and flipping relevant to irrelevant are
+ calculated, as well as the net gain in relevance and the resulting final relevance.
332
+
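+ For example, reading the llama-3.3-70b row below, the net gain is the overall rate of flipping irrelevant to relevant minus the rate of flipping relevant to irrelevant: 0.554 - 0.013 = 0.541.
+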
333
+ The LoRAs outperform the best of the frontier models.
334
+
335
+ | | True irrelevant <br> flip to relevant | False irrelevant <br> flip to irrelevant| Overall <br> flip irrelevant <br> to relevant | Overall <br> flip relevant <br> to irrelevant| net gain | Result <br>relevance |
336
+ |:---------------------|:--------------|:---------|:------------------------------|:---------|:---------|:--------------|
337
+ | mixtral-8x22b-v0.1 | 0.416 | 0.101 | 0.286 | 0.032 | 0.254 | 0.566 |
338
+ | llama-3.3-70b | 0.804 | 0.041 | 0.554 | 0.013 | 0.541 | 0.853 |
339
+ | gpt-oss-20b | 0.902 | 0.034 | 0.621 | 0.011 | 0.610 | 0.922 |
340
+ | gpt-4o | 0.960 | 0.014 | 0.661 | 0.004 | 0.657 | 0.968 |
341
+ | gpt-4o-mini | 0.758 | 0.027 | 0.522 | 0.008 | 0.514 | 0.825 |
342
+ | | | | | | | |
343
+ | granite-3.3-2b/lora | 0.972 | 0.027 | 0.669 | 0.008 | 0.661 | 0.973 |
344
+ | granite-3.3-2b/alora | 0.972 | 0.007 | 0.669 | 0.002 | 0.667 | 0.979 |
345
+ | granite-3.3-8b/lora | 0.969 | 0.014 | 0.667 | 0.004 | 0.663 | 0.975 |
346
+ | granite-3.3-8b/alora | 0.966 | 0.027 | 0.665 | 0.008 | 0.657 | 0.968 |
347
348
+
349
+ ### Comparing the Answer Relevance Rewriter Intrinsics vs. Vanilla Granite Models
350
+
351
+ We compare the performance of the vanilla Granite 3.3 2b and 8b Instruct models
+ against the answer relevance rewriter intrinsics implemented as LoRA adapters.
+ The LoRAs significantly outperform the base models.
+
354
+ | | True irrelevant <br> flip to relevant | False irrelevant <br> flip to irrelevant| Overall <br> flip irrelevant <br> to relevant | Overall <br> flip relevant <br> to irrelevant| net gain | Result relevance |
355
+ |:---------------------|:--------------|:---------|:------------------------------|:---------|:---------|:--------------|
356
+ | granite-3.3-2b | 0.346 | 0.169 | 0.238 | 0.053 | 0.185 | 0.497 |
357
+ | granite-3.3-2b/lora | 0.972 | 0.027 | 0.669 | 0.008 | 0.661 | 0.973 |
358
+ | granite-3.3-2b/alora | 0.972 | 0.007 | 0.669 | 0.002 | 0.667 | 0.979 |
359
+ | | | | | | | |
360
+ | granite-3.3-8b | 0.266 | 0.277 | 0.183 | 0.086 | 0.097 | 0.408 |
361
+ | granite-3.3-8b/lora | 0.969 | 0.014 | 0.667 | 0.004 | 0.663 | 0.975 |
362
+ | granite-3.3-8b/alora | 0.966 | 0.027 | 0.665 | 0.008 | 0.657 | 0.968 |
363
364
+
365
+
366
+ ## Model Card Authors
367
+
368
+ [Huaiyu Zhu](mailto:[email protected])
369
+
370
+ ### Framework versions
371
+
372
+ - PEFT 0.14.0
answer_relevance_rewriter_lora/adapter_config.json ADDED
@@ -0,0 +1,33 @@
1
+ {
2
+ "alpha_pattern": {},
3
+ "auto_mapping": null,
4
+ "base_model_name_or_path": "ibm-granite/granite-3.3-8b-instruct",
5
+ "bias": "none",
6
+ "eva_config": null,
7
+ "exclude_modules": null,
8
+ "fan_in_fan_out": false,
9
+ "inference_mode": true,
10
+ "init_lora_weights": true,
11
+ "layer_replication": null,
12
+ "layers_pattern": null,
13
+ "layers_to_transform": null,
14
+ "loftq_config": {},
15
+ "lora_alpha": 32,
16
+ "lora_bias": false,
17
+ "lora_dropout": 0.05,
18
+ "megatron_config": null,
19
+ "megatron_core": "megatron.core",
20
+ "modules_to_save": null,
21
+ "peft_type": "LORA",
22
+ "r": 32,
23
+ "rank_pattern": {},
24
+ "revision": null,
25
+ "target_modules": [
26
+ "q_proj",
27
+ "k_proj",
28
+ "v_proj"
29
+ ],
30
+ "task_type": "CAUSAL_LM",
31
+ "use_dora": false,
32
+ "use_rslora": false
33
+ }
answer_relevance_rewriter_lora/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:74db6518798528a2cb2ecd620825a95db228c54a87393e88a5832218e77044ce
3
+ size 94404160
answer_relevance_rewriter_lora/added_tokens.json ADDED
@@ -0,0 +1,9 @@
1
+ {
2
+ "<|end_of_cite|>": 49156,
3
+ "<|end_of_plugin|>": 49158,
4
+ "<|end_of_role|>": 49153,
5
+ "<|start_of_cite|>": 49155,
6
+ "<|start_of_plugin|>": 49157,
7
+ "<|start_of_role|>": 49152,
8
+ "<|tool_call|>": 49154
9
+ }
answer_relevance_rewriter_lora/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
answer_relevance_rewriter_lora/special_tokens_map.json ADDED
@@ -0,0 +1,39 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ "<|start_of_role|>",
4
+ "<|end_of_role|>",
5
+ "<|tool_call|>",
6
+ "<|start_of_cite|>",
7
+ "<|end_of_cite|>",
8
+ "<|start_of_plugin|>",
9
+ "<|end_of_plugin|>"
10
+ ],
11
+ "bos_token": {
12
+ "content": "<|end_of_text|>",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false
17
+ },
18
+ "eos_token": {
19
+ "content": "<|end_of_text|>",
20
+ "lstrip": false,
21
+ "normalized": false,
22
+ "rstrip": false,
23
+ "single_word": false
24
+ },
25
+ "pad_token": {
26
+ "content": "<|end_of_text|>",
27
+ "lstrip": false,
28
+ "normalized": false,
29
+ "rstrip": false,
30
+ "single_word": false
31
+ },
32
+ "unk_token": {
33
+ "content": "<|end_of_text|>",
34
+ "lstrip": false,
35
+ "normalized": false,
36
+ "rstrip": false,
37
+ "single_word": false
38
+ }
39
+ }
answer_relevance_rewriter_lora/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
answer_relevance_rewriter_lora/tokenizer_config.json ADDED
@@ -0,0 +1,235 @@
1
+ {
2
+ "add_bos_token": false,
3
+ "add_prefix_space": false,
4
+ "added_tokens_decoder": {
5
+ "0": {
6
+ "content": "<|end_of_text|>",
7
+ "lstrip": false,
8
+ "normalized": false,
9
+ "rstrip": false,
10
+ "single_word": false,
11
+ "special": true
12
+ },
13
+ "1": {
14
+ "content": "<fim_prefix>",
15
+ "lstrip": false,
16
+ "normalized": false,
17
+ "rstrip": false,
18
+ "single_word": false,
19
+ "special": true
20
+ },
21
+ "2": {
22
+ "content": "<fim_middle>",
23
+ "lstrip": false,
24
+ "normalized": false,
25
+ "rstrip": false,
26
+ "single_word": false,
27
+ "special": true
28
+ },
29
+ "3": {
30
+ "content": "<fim_suffix>",
31
+ "lstrip": false,
32
+ "normalized": false,
33
+ "rstrip": false,
34
+ "single_word": false,
35
+ "special": true
36
+ },
37
+ "4": {
38
+ "content": "<fim_pad>",
39
+ "lstrip": false,
40
+ "normalized": false,
41
+ "rstrip": false,
42
+ "single_word": false,
43
+ "special": true
44
+ },
45
+ "5": {
46
+ "content": "<filename>",
47
+ "lstrip": false,
48
+ "normalized": false,
49
+ "rstrip": false,
50
+ "single_word": false,
51
+ "special": true
52
+ },
53
+ "6": {
54
+ "content": "<gh_stars>",
55
+ "lstrip": false,
56
+ "normalized": false,
57
+ "rstrip": false,
58
+ "single_word": false,
59
+ "special": true
60
+ },
61
+ "7": {
62
+ "content": "<issue_start>",
63
+ "lstrip": false,
64
+ "normalized": false,
65
+ "rstrip": false,
66
+ "single_word": false,
67
+ "special": true
68
+ },
69
+ "8": {
70
+ "content": "<issue_comment>",
71
+ "lstrip": false,
72
+ "normalized": false,
73
+ "rstrip": false,
74
+ "single_word": false,
75
+ "special": true
76
+ },
77
+ "9": {
78
+ "content": "<issue_closed>",
79
+ "lstrip": false,
80
+ "normalized": false,
81
+ "rstrip": false,
82
+ "single_word": false,
83
+ "special": true
84
+ },
85
+ "10": {
86
+ "content": "<jupyter_start>",
87
+ "lstrip": false,
88
+ "normalized": false,
89
+ "rstrip": false,
90
+ "single_word": false,
91
+ "special": true
92
+ },
93
+ "11": {
94
+ "content": "<jupyter_text>",
95
+ "lstrip": false,
96
+ "normalized": false,
97
+ "rstrip": false,
98
+ "single_word": false,
99
+ "special": true
100
+ },
101
+ "12": {
102
+ "content": "<jupyter_code>",
103
+ "lstrip": false,
104
+ "normalized": false,
105
+ "rstrip": false,
106
+ "single_word": false,
107
+ "special": true
108
+ },
109
+ "13": {
110
+ "content": "<jupyter_output>",
111
+ "lstrip": false,
112
+ "normalized": false,
113
+ "rstrip": false,
114
+ "single_word": false,
115
+ "special": true
116
+ },
117
+ "14": {
118
+ "content": "<empty_output>",
119
+ "lstrip": false,
120
+ "normalized": false,
121
+ "rstrip": false,
122
+ "single_word": false,
123
+ "special": true
124
+ },
125
+ "15": {
126
+ "content": "<commit_before>",
127
+ "lstrip": false,
128
+ "normalized": false,
129
+ "rstrip": false,
130
+ "single_word": false,
131
+ "special": true
132
+ },
133
+ "16": {
134
+ "content": "<commit_msg>",
135
+ "lstrip": false,
136
+ "normalized": false,
137
+ "rstrip": false,
138
+ "single_word": false,
139
+ "special": true
140
+ },
141
+ "17": {
142
+ "content": "<commit_after>",
143
+ "lstrip": false,
144
+ "normalized": false,
145
+ "rstrip": false,
146
+ "single_word": false,
147
+ "special": true
148
+ },
149
+ "18": {
150
+ "content": "<reponame>",
151
+ "lstrip": false,
152
+ "normalized": false,
153
+ "rstrip": false,
154
+ "single_word": false,
155
+ "special": true
156
+ },
157
+ "49152": {
158
+ "content": "<|start_of_role|>",
159
+ "lstrip": false,
160
+ "normalized": false,
161
+ "rstrip": false,
162
+ "single_word": false,
163
+ "special": true
164
+ },
165
+ "49153": {
166
+ "content": "<|end_of_role|>",
167
+ "lstrip": false,
168
+ "normalized": false,
169
+ "rstrip": false,
170
+ "single_word": false,
171
+ "special": true
172
+ },
173
+ "49154": {
174
+ "content": "<|tool_call|>",
175
+ "lstrip": false,
176
+ "normalized": false,
177
+ "rstrip": false,
178
+ "single_word": false,
179
+ "special": true
180
+ },
181
+ "49155": {
182
+ "content": "<|start_of_cite|>",
183
+ "lstrip": false,
184
+ "normalized": false,
185
+ "rstrip": false,
186
+ "single_word": false,
187
+ "special": true
188
+ },
189
+ "49156": {
190
+ "content": "<|end_of_cite|>",
191
+ "lstrip": false,
192
+ "normalized": false,
193
+ "rstrip": false,
194
+ "single_word": false,
195
+ "special": true
196
+ },
197
+ "49157": {
198
+ "content": "<|start_of_plugin|>",
199
+ "lstrip": false,
200
+ "normalized": false,
201
+ "rstrip": false,
202
+ "single_word": false,
203
+ "special": true
204
+ },
205
+ "49158": {
206
+ "content": "<|end_of_plugin|>",
207
+ "lstrip": false,
208
+ "normalized": false,
209
+ "rstrip": false,
210
+ "single_word": false,
211
+ "special": true
212
+ }
213
+ },
214
+ "additional_special_tokens": [
215
+ "<|start_of_role|>",
216
+ "<|end_of_role|>",
217
+ "<|tool_call|>",
218
+ "<|start_of_cite|>",
219
+ "<|end_of_cite|>",
220
+ "<|start_of_plugin|>",
221
+ "<|end_of_plugin|>"
222
+ ],
223
+ "bos_token": "<|end_of_text|>",
224
+ "chat_template": "{# Alias tools -> available_tools #}\n{%- if tools and not available_tools -%}\n {%- set available_tools = tools -%}\n{%- endif -%}\n{%- if messages[0]['role'] == 'system' %}\n {%- set system_message = messages[0]['content'] %}\n {%- set loop_messages = messages[1:] %}\n {%- else %}\n {%- set system_message = \"Knowledge Cutoff Date: April 2024.\nToday's Date: \" + strftime_now('%B %d, %Y') + \".\nYou are Granite, developed by IBM.\" %}\n {%- if available_tools and documents %}\n {%- set system_message = system_message + \" You are a helpful assistant with access to the following tools. When a tool is required to answer the user's query, respond only with <|tool_call|> followed by a JSON list of tools used. If a tool does not exist in the provided list of tools, notify the user that you do not have the ability to fulfill the request.\nWrite the response to the user's input by strictly aligning with the facts in the provided documents. If the information needed to answer the question is not available in the documents, inform the user that the question cannot be answered based on the available data.\" %}\n {%- elif available_tools %}\n {%- set system_message = system_message + \" You are a helpful assistant with access to the following tools. When a tool is required to answer the user's query, respond only with <|tool_call|> followed by a JSON list of tools used. If a tool does not exist in the provided list of tools, notify the user that you do not have the ability to fulfill the request.\" %}\n {%- elif documents %}\n {%- set system_message = system_message + \" Write the response to the user's input by strictly aligning with the facts in the provided documents. If the information needed to answer the question is not available in the documents, inform the user that the question cannot be answered based on the available data.\" %}\n {%- elif thinking %}\n {%- set system_message = system_message + \" You are a helpful AI assistant.\nRespond to every user query in a comprehensive and detailed way. You can write down your thoughts and reasoning process before responding. In the thought process, engage in a comprehensive cycle of analysis, summarization, exploration, reassessment, reflection, backtracing, and iteration to develop well-considered thinking process. In the response section, based on various attempts, explorations, and reflections from the thoughts section, systematically present the final solution that you deem correct. The response should summarize the thought process. Write your thoughts between <think></think> and write your response between <response></response> for each user query.\" %}\n {%- else %}\n {%- set system_message = system_message + \" You are a helpful AI assistant.\" %}\n {%- endif %}\n {%- if 'citations' in controls and documents %}\n {%- set system_message = system_message + '\nUse the symbols <|start_of_cite|> and <|end_of_cite|> to indicate when a fact comes from a document in the search result, e.g <|start_of_cite|> {document_id: 1}my fact <|end_of_cite|> for a fact from document 1. Afterwards, list all the citations with their corresponding documents in an ordered list.' %}\n {%- endif %}\n {%- if 'hallucinations' in controls and documents %}\n {%- set system_message = system_message + '\nFinally, after the response is written, include a numbered list of sentences from the response with a corresponding risk value that are hallucinated and not based in the documents.' 
%}\n {%- endif %}\n {%- set loop_messages = messages %}\n {%- endif %}\n {{- '<|start_of_role|>system<|end_of_role|>' + system_message + '<|end_of_text|>\n' }}\n {%- if available_tools %}\n {{- '<|start_of_role|>available_tools<|end_of_role|>' }}\n {{- available_tools | tojson(indent=4) }}\n {{- '<|end_of_text|>\n' }}\n {%- endif %}\n {%- if documents %}\n {%- for document in documents %}\n {{- '<|start_of_role|>document {\"document_id\": \"' + document['doc_id'] | string + '\"}<|end_of_role|>\n' }}\n {{- document['text'] }}\n {{- '<|end_of_text|>\n' }}\n {%- endfor %}\n {%- endif %}\n {%- for message in loop_messages %}\n {{- '<|start_of_role|>' + message['role'] + '<|end_of_role|>' + message['content'] + '<|end_of_text|>\n' }}\n {%- if loop.last and add_generation_prompt %}\n {{- '<|start_of_role|>assistant' }}\n {%- if controls %}\n {{- ' ' + controls | tojson()}}\n {%- endif %}\n {{- '<|end_of_role|>' }}\n {%- endif %}\n {%- endfor %}",
225
+ "clean_up_tokenization_spaces": true,
226
+ "eos_token": "<|end_of_text|>",
227
+ "errors": "replace",
228
+ "extra_special_tokens": {},
229
+ "model_max_length": 9223372036854775807,
230
+ "pad_token": "<|end_of_text|>",
231
+ "padding_side": "left",
232
+ "tokenizer_class": "GPT2Tokenizer",
233
+ "unk_token": "<|end_of_text|>",
234
+ "vocab_size": 49152
235
+ }
answer_relevance_rewriter_lora/vocab.json ADDED
The diff for this file is too large to render. See raw diff