answer_relevance_classifier_lora/README.md ADDED
@@ -0,0 +1,363 @@
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ pipeline_tag: text-generation
6
+ library_name: peft
8
+ ---
9
+
10
+ # Intrinsics for Answer Relevance Classifier
11
+
12
+ ## Model Summary
13
+ This is a RAG-specific intrinsic for the answer relevance classification task.
+ The model takes as input a multi-turn conversation ending with an assistant response,
+ and classifies whether the assistant's response is relevant to the
+ user's final inquiry, along with a relevance category and the reasoning behind the conclusion.
17
+
18
+
19
+ We provide this intrinsic implemented as two adapter variants (LoRA and aLoRA), trained over
+ Granite-3.3-2b-instruct and Granite-3.3-8b-instruct.
21
+
22
+ - **Developer:** IBM Research
23
+ - **Model type:** LoRA and aLoRA adapter for
24
+ [ibm-granite/granite-3.3-2b-instruct](https://huggingface.co/ibm-granite/granite-3.3-2b-instruct),
25
+ [ibm-granite/granite-3.3-8b-instruct](https://huggingface.co/ibm-granite/granite-3.3-8b-instruct)
26
+ - **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
27
+
28
+ ## Intended use
29
+ This RAG-specific intrinsic is intended to post-process the generated assistant response.
30
+
31
+ - The binary classification of relevance can be used to determine whether the assistant response is suitable
+ to be given to the user, or whether a rewrite into a more relevant response is necessary (a minimal
+ sketch of such a check is shown below).
+ - The category and the analysis explaining the conclusion can be incorporated into the prompt
+ for the answer relevance rewriter, indicating specific directions
+ the rewrite must take to overcome the identified deficiency in relevance.
36
+
37
+ **Model input**: The input to the answer relevance classifier intrinsic is an
38
+ OpenAI-compatible chat completion request, containing a list of conversation
39
+ turns that alternate between the `user` and `assistant` roles and end with
+ an `assistant` turn.
41
+
42
+ **Model output**: The output of the answer relevance classifier intrinsic is the result of the
+ original chat completion request, formatted as a JSON object with the following schema:
+
+ {
+   "answer_relevance_analysis": <free-text analysis of whether, and in which ways, the assistant response is or is not relevant>,
+   "answer_relevance_category": <one of the labels listed below>,
+   "answer_relevance_likelihood": <float between 0.0 and 1.0>
+ }
50
+
51
+ The possible labels for `answer_relevance_category` are:
+ - "Pertinent"
+ - "Pertinent with relevant extra"
+ - "Excessive unnecessary information"
+ - "Unduly restrictive"
+ - "Too vague or generic"
+ - "Contextual misalignment"
+ - "Misinterpreted inquiry"
+ - "No attempt"
60
+
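+ For illustration, a plausible (hypothetical) classifier output for a vague assistant response might be:
+
+ {
+   "answer_relevance_analysis": "The user asks who attended the meeting, but the response only states that many people attended without naming anyone.",
+   "answer_relevance_category": "Too vague or generic",
+   "answer_relevance_likelihood": 0.2
+ }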
61
+ Please see the code snippets in the Quickstart Example section below for
62
+ examples that illustrate the intrinsic's input/output.
63
+
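+ As a minimal sketch of the relevance check mentioned under Intended use, assuming the hypothetical
+ output above and an application-chosen threshold of 0.5 (both illustrative, not part of the model):
+
+ import json
+
+ # Hypothetical classifier output, abbreviated from the example above
+ classifier_json = '{"answer_relevance_analysis": "...", "answer_relevance_category": "Too vague or generic", "answer_relevance_likelihood": 0.2}'
+ classifier_output = json.loads(classifier_json)
+
+ RELEVANCE_THRESHOLD = 0.5  # illustrative value; tune per application
+
+ # Below the threshold, pass the response plus the category and analysis to the
+ # answer relevance rewriter intrinsic; otherwise return the response as-is.
+ needs_rewrite = classifier_output["answer_relevance_likelihood"] < RELEVANCE_THRESHOLD
+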
64
+ ## Quickstart Example
65
+
66
+ To run the answer relevance classifier intrinsic through granite-common, you can either (a)
+ use an OpenAI-compatible inference backend, such as vLLM, or (b) use the Hugging
+ Face transformers library. We provide instructions below for each of the two
69
+ approaches. Note that running inference using vLLM or another scalable
70
+ OpenAI-compatible inference backend should be significantly faster than using
71
+ the Hugging Face transformers library directly.
72
+
73
+ ### Using an OpenAI-Compatible Inference Backend
74
+
75
+ To run the intrinsic using an OpenAI-compatible inference backend, such as vLLM,
76
+ follow the steps below.
77
+
78
+ 1. Install the granite-common library:
79
+
80
+ pip install git+https://github.com/ibm-granite/granite-common.git
81
+ pip install "granite_common[nltk]"
82
+
83
+ 2. Install the Hugging Face CLI:
84
+
85
+ pip install -U "huggingface_hub[cli]"
86
+
87
+ 3. Install vLLM:
88
+
89
+ pip install vllm
90
+
91
+ 4. Download the intrinsics library:
92
+
93
+ hf download ibm-granite/rag-intrinsics-lib --local-dir ./rag-intrinsics-lib
94
+
95
+ 5. Edit the vLLM startup script found in `./rag-intrinsics-lib/run_vllm.sh`
96
+ using your favorite editor:
97
+
98
+ Edit the constants `BASE_MODEL_NAME` and `BASE_MODEL_ORG` depending on the
99
+ base model on which the desired LoRA adapter has been trained. Optionally,
100
+ edit the constant `PORT` to change the port on which vLLM will run. Save the
101
+ modified file and exit the editor.
102
+
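+ For example, to serve the adapters for the 8B base model, the edited constants might look as
+ follows (illustrative values; the port must match the one used in `openai_base_url` below):
+
+ BASE_MODEL_ORG=ibm-granite
+ BASE_MODEL_NAME=granite-3.3-8b-instruct
+ PORT=55555
+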
103
+ 6. Start vLLM through the startup script. The first time you run the script,
104
+ you may have to change the permissions to allow execution:
105
+
106
+ cd rag-intrinsics-lib
107
+ chmod u+x ./run_vllm.sh
108
+ ./run_vllm.sh &
109
+
110
+ 7. Run the following code snippet:
111
+
112
+ import json
113
+ import openai
114
+ import granite_common
115
+
116
+ intrinsic_name = "answer_relevance_classifier"
117
+
118
+ # Change the following constant to select a different base model
119
+ base_model_name = "granite-3.3-8b-instruct"
120
+
121
+ # Change the following constants as needed to reflect the location of the vLLM server
122
+ # The selected port should be identical to the one you specified in the vLLM startup script
123
+ openai_base_url = "http://localhost:55555/v1"
124
+ openai_api_key = "rag_intrinsics_1234"
125
+
126
+ # Fetch IO configuration file from Hugging Face Hub
127
+ io_yaml_file = granite_common.intrinsics.util.obtain_io_yaml(
128
+ intrinsic_name, base_model_name
129
+ )
130
+
131
+ # Instantiate input/output processors
132
+ rewriter = granite_common.IntrinsicsRewriter(config_file=io_yaml_file)
133
+ result_processor = granite_common.IntrinsicsResultProcessor(config_file=io_yaml_file)
134
+
135
+ # Sample request
136
+ request_json = {
137
+ "messages": [
138
+ {
139
+ "role": "user",
140
+ "content": "Who attended the meeting?"
141
+ },
142
+ {
143
+ "role": "assistant",
144
+ "content": "Many people attended the meeting."
145
+ }
146
+ ],
147
+ "extra_body": {
148
+ "documents": [
149
+ {
150
+ "doc_id": "1",
151
+ "text": "Meeting attendees: Alice, Bob, Carol."
152
+ },
153
+ {
154
+ "doc_id": "2",
155
+ "text": "Meeting time: 9:00 am to 11:00 am."
156
+ }
157
+ ]
158
+ }
159
+ }
160
+
161
+ # Add other parameters
162
+ request_json["model"] = intrinsic_name
163
+ request_json["temperature"] = 0.0
164
+
165
+ # Apply input processor
166
+ intrinsic_kwargs = {}
167
+ rewritten_request = rewriter.transform(request_json, **intrinsic_kwargs)
168
+
169
+ # Run inference
170
+ client = openai.OpenAI(base_url=openai_base_url, api_key=openai_api_key)
171
+ chat_completion = client.chat.completions.create(**rewritten_request.model_dump())
172
+
173
+ # Apply output processor
174
+ processed_chat_completion = result_processor.transform(
175
+ chat_completion, rewritten_request
176
+ )
177
+
178
+ # Verify that the content of the completion is valid JSON and pretty-print it.
179
+ parsed_contents = json.loads(processed_chat_completion.choices[0].message.content)
180
+ print("JSON output:")
181
+ print(json.dumps(parsed_contents, indent=2))
182
+
183
+ ### Using the Hugging Face Transformers Library
184
+
185
+ To run the intrinsic using the Hugging Face transformers library directly,
186
+ follow the steps below.
187
+
188
+ 1. Install the granite-common library:
189
+
190
+ pip install git+https://github.com/ibm-granite/granite-common.git
191
+ pip install "granite_common[nltk]"
192
+
193
+ 2. Install the Hugging Face CLI:
194
+
195
+ pip install -U "huggingface_hub[cli]"
196
+
197
+ 3. Install PEFT:
198
+
199
+ pip install peft
200
+
201
+ 4. Install xgrammar:
202
+
203
+ pip install xgrammar
204
+
205
+ 5. Run the following code snippet:
206
+
207
+ import json
208
+ import granite_common.util
209
+ import peft
210
+
211
+ intrinsic_name = "answer_relevance_classifier"
212
+
213
+ # Change the following constant to select a different base model
214
+ base_model_name = "granite-3.3-8b-instruct"
215
+
216
+ use_cuda = True # Set to False to use default PyTorch device for this machine + model
217
+
218
+ # Fetch IO configuration file from Hugging Face Hub
219
+ io_yaml_file = granite_common.intrinsics.util.obtain_io_yaml(
220
+ intrinsic_name, base_model_name
221
+ )
222
+
223
+ # Fetch LoRA directory from Hugging Face Hub
224
+ lora_dir = granite_common.intrinsics.util.obtain_lora(
225
+ intrinsic_name, base_model_name
226
+ )
227
+
228
+ # Instantiate input/output processors
229
+ rewriter = granite_common.IntrinsicsRewriter(config_file=io_yaml_file)
230
+ result_processor = granite_common.IntrinsicsResultProcessor(config_file=io_yaml_file)
231
+
232
+ # Sample request
233
+ request_json = {
234
+ "messages": [
235
+ {
236
+ "role": "user",
237
+ "content": "Who attended the meeting?"
238
+ },
239
+ {
240
+ "role": "assistant",
241
+ "content": "Many people attended the meeting."
242
+ }
243
+ ],
244
+ "extra_body": {
245
+ "documents": [
246
+ {
247
+ "doc_id": "1",
248
+ "text": "Meeting attendees: Alice, Bob, Carol."
249
+ },
250
+ {
251
+ "doc_id": "2",
252
+ "text": "Meeting time: 9:00 am to 11:00 am."
253
+ }
254
+ ]
255
+ }
256
+ }
257
+
258
+ # Add additional parameters
259
+ request_json["model"] = intrinsic_name
260
+ request_json["temperature"] = 0.0
261
+
262
+ # Apply input processor
263
+ intrinsic_kwargs = {}
264
+ rewritten_request = rewriter.transform(request_json, **intrinsic_kwargs)
265
+
266
+ # Load the base model and merge LoRA weights
267
+ model, tokenizer = granite_common.util.load_transformers_lora(lora_dir)
268
+ if use_cuda:
269
+     model = model.cuda()
270
+
271
+ # Convert the chat completion request into the Transformers library's native
+ # input format.
273
+ generate_input, other_input = (
274
+ granite_common.util.chat_completion_request_to_transformers_inputs(
275
+ rewritten_request,
276
+ tokenizer,
277
+ model,
278
+ )
279
+ )
280
+
281
+ # Use the Transformers library's APIs to generate one or more completions,
282
+ # then convert those completions into an OpenAI-compatible chat completion response.
283
+ responses = granite_common.util.generate_with_transformers(
284
+ tokenizer, model, generate_input, other_input
285
+ )
286
+
287
+ # Apply output processor
288
+ transformed_responses = result_processor.transform(responses, rewritten_request)
289
+
290
+ # Verify that the content of the completion is valid JSON and pretty-print it.
291
+ parsed_contents = json.loads(transformed_responses.choices[0].message.content)
292
+ print("JSON output:")
293
+ print(json.dumps(parsed_contents, indent=2))
294
+
295
+ ## Training Details
296
+
297
+ ### Training Data
298
+
299
+ The training data was created by the following process:
+ 1. Take the synthetic rag-data-granite dataset, consisting of conversations between a user and an assistant.
+ 2. Replace the assistant responses by running granite-3.2-intrinsics at temperature 1.0.
+ 3. Produce the answer_relevance_classifier target output using mixtral-large with prompts containing in-context examples.
+
+ The conversations created in steps 1 and 2 are used as the training input. The JSON string from step 3
+ is used as the training target output.
305
+
306
+ #### Training Hyperparameters
307
+
308
+ The LoRA adapter was fine-tuned using PEFT under the following regime: rank =
309
+ 32, learning rate = 3.0e-06, number of epochs = 50.
310
+
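+ For reference, a PEFT `LoraConfig` consistent with this regime and with the `adapter_config.json`
+ shipped alongside this card would look roughly as follows (a sketch only; data preparation and the
+ training loop are omitted):
+
+ from peft import LoraConfig
+
+ # Values mirror adapter_config.json in this repository
+ lora_config = LoraConfig(
+     r=32,
+     lora_alpha=32,
+     lora_dropout=0.05,
+     target_modules=["q_proj", "k_proj", "v_proj"],
+     task_type="CAUSAL_LM",
+ )
+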
311
+ ## Evaluation
312
+
313
+ ### Answer Relevance Classifier
314
+
315
+ We evaluated the model on a test data set generated by the same procedure as the training data,
+ using GPT-4o as the judge.
317
+
318
+
319
+ The following table presents results comparing baselines and frontier models
320
+ on the answer relevance classification task. The LoRAs perform on par with frontier models
321
+ of much larger size and outperform frontier models of comparable size.
322
+
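+ For example, in the table below, the "Not relevant" recall of 0.900 for granite-3.3-8b/lora means that 90% of the truly irrelevant responses in the test set are correctly classified as not relevant.
+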
323
+ | Model | Not relevant<br>precision | Not relevant<br>recall | Not relevant<br>f1 | Relevant<br>precision | Relevant<br>recall | Relevant<br>f1 |
+ |:------------------------|:----------|:-------|:------|:----------|:-------|:------|
326
+ | mixtral-8x22b-v0.1 | 0.934 | 0.592 | 0.725 | 0.886 | 0.880 | 0.883 |
327
+ | llama-3.3-70b | 0.895 | 0.829 | 0.861 | 0.898 | 0.939 | 0.918 |
328
+ | gpt-oss-20b | 0.747 | 0.745 | 0.746 | 0.969 | 0.782 | 0.865 |
329
+ | gpt-4o | 0.775 | 0.945 | 0.852 | 0.974 | 0.690 | 0.808 |
330
+ | gpt-4o-mini | 0.818 | 0.921 | 0.866 | 0.948 | 0.872 | 0.908 |
331
+ | | | | | | | |
332
+ | granite-3.3-2b/lora | 0.743 | 0.861 | 0.798 | 0.909 | 0.806 | 0.855 |
333
+ | granite-3.3-2b/alora | 0.761 | 0.821 | 0.790 | 0.894 | 0.833 | 0.862 |
334
+ | granite-3.3-8b/lora | 0.783 | 0.900 | 0.837 | 0.931 | 0.842 | 0.884 |
335
+ | granite-3.3-8b/alora | 0.793 | 0.879 | 0.834 | 0.919 | 0.856 | 0.886 |
336
+
337
+
338
+ ### Comparing the Answer Relevance Classifier Intrinsics vs. Vanilla Granite Models
339
+
340
+ We compare the performance of the vanilla Granite 3.3 2b and 8b Instruct models
+ against the answer relevance classifier intrinsics implemented as LoRA adapters.
+ The LoRAs significantly outperform the base models.
343
+
344
+ | Model | Not relevant<br>precision | Not relevant<br>recall | Not relevant<br>f1 | Relevant<br>precision | Relevant<br>recall | Relevant<br>f1 |
+ |:------------------------|:----------|:-------|:------|:----------|:-------|:------|
347
+ | granite-3.3-2b | | | | | | |
348
+ | granite-3.3-2b/lora | 0.743 | 0.861 | 0.798 | 0.909 | 0.806 | 0.855 |
349
+ | granite-3.3-2b/alora | 0.761 | 0.821 | 0.790 | 0.894 | 0.833 | 0.862 |
350
+ | | | | | | | |
351
+ | granite-3.3-8b | 0.798 | 0.542 | 0.646 | 0.813 | 0.770 | 0.791 |
352
+ | granite-3.3-8b/lora | 0.783 | 0.900 | 0.837 | 0.931 | 0.842 | 0.884 |
353
+ | granite-3.3-8b/alora | 0.793 | 0.879 | 0.834 | 0.919 | 0.856 | 0.886 |
354
+
355
+
356
+
357
+ ## Model Card Authors
358
+
359
+ [Huaiyu Zhu](mailto:[email protected])
360
+
361
+ ### Framework versions
362
+
363
+ - PEFT 0.14.0
answer_relevance_classifier_lora/adapter_config.json ADDED
@@ -0,0 +1,33 @@
1
+ {
2
+ "alpha_pattern": {},
3
+ "auto_mapping": null,
4
+ "base_model_name_or_path": "ibm-granite/granite-3.3-8b-instruct",
5
+ "bias": "none",
6
+ "eva_config": null,
7
+ "exclude_modules": null,
8
+ "fan_in_fan_out": false,
9
+ "inference_mode": true,
10
+ "init_lora_weights": true,
11
+ "layer_replication": null,
12
+ "layers_pattern": null,
13
+ "layers_to_transform": null,
14
+ "loftq_config": {},
15
+ "lora_alpha": 32,
16
+ "lora_bias": false,
17
+ "lora_dropout": 0.05,
18
+ "megatron_config": null,
19
+ "megatron_core": "megatron.core",
20
+ "modules_to_save": null,
21
+ "peft_type": "LORA",
22
+ "r": 32,
23
+ "rank_pattern": {},
24
+ "revision": null,
25
+ "target_modules": [
26
+ "k_proj",
27
+ "q_proj",
28
+ "v_proj"
29
+ ],
30
+ "task_type": "CAUSAL_LM",
31
+ "use_dora": false,
32
+ "use_rslora": false
33
+ }
answer_relevance_classifier_lora/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:27c8cf16a5ea98e110df54e433c43432ee8f18373f673293989e1deacee29cd7
3
+ size 94404160
answer_relevance_classifier_lora/added_tokens.json ADDED
@@ -0,0 +1,9 @@
1
+ {
2
+ "<|end_of_cite|>": 49156,
3
+ "<|end_of_plugin|>": 49158,
4
+ "<|end_of_role|>": 49153,
5
+ "<|start_of_cite|>": 49155,
6
+ "<|start_of_plugin|>": 49157,
7
+ "<|start_of_role|>": 49152,
8
+ "<|tool_call|>": 49154
9
+ }
answer_relevance_classifier_lora/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
answer_relevance_classifier_lora/special_tokens_map.json ADDED
@@ -0,0 +1,39 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ "<|start_of_role|>",
4
+ "<|end_of_role|>",
5
+ "<|tool_call|>",
6
+ "<|start_of_cite|>",
7
+ "<|end_of_cite|>",
8
+ "<|start_of_plugin|>",
9
+ "<|end_of_plugin|>"
10
+ ],
11
+ "bos_token": {
12
+ "content": "<|end_of_text|>",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false
17
+ },
18
+ "eos_token": {
19
+ "content": "<|end_of_text|>",
20
+ "lstrip": false,
21
+ "normalized": false,
22
+ "rstrip": false,
23
+ "single_word": false
24
+ },
25
+ "pad_token": {
26
+ "content": "<|end_of_text|>",
27
+ "lstrip": false,
28
+ "normalized": false,
29
+ "rstrip": false,
30
+ "single_word": false
31
+ },
32
+ "unk_token": {
33
+ "content": "<|end_of_text|>",
34
+ "lstrip": false,
35
+ "normalized": false,
36
+ "rstrip": false,
37
+ "single_word": false
38
+ }
39
+ }
answer_relevance_classifier_lora/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
answer_relevance_classifier_lora/tokenizer_config.json ADDED
@@ -0,0 +1,235 @@
1
+ {
2
+ "add_bos_token": false,
3
+ "add_prefix_space": false,
4
+ "added_tokens_decoder": {
5
+ "0": {
6
+ "content": "<|end_of_text|>",
7
+ "lstrip": false,
8
+ "normalized": false,
9
+ "rstrip": false,
10
+ "single_word": false,
11
+ "special": true
12
+ },
13
+ "1": {
14
+ "content": "<fim_prefix>",
15
+ "lstrip": false,
16
+ "normalized": false,
17
+ "rstrip": false,
18
+ "single_word": false,
19
+ "special": true
20
+ },
21
+ "2": {
22
+ "content": "<fim_middle>",
23
+ "lstrip": false,
24
+ "normalized": false,
25
+ "rstrip": false,
26
+ "single_word": false,
27
+ "special": true
28
+ },
29
+ "3": {
30
+ "content": "<fim_suffix>",
31
+ "lstrip": false,
32
+ "normalized": false,
33
+ "rstrip": false,
34
+ "single_word": false,
35
+ "special": true
36
+ },
37
+ "4": {
38
+ "content": "<fim_pad>",
39
+ "lstrip": false,
40
+ "normalized": false,
41
+ "rstrip": false,
42
+ "single_word": false,
43
+ "special": true
44
+ },
45
+ "5": {
46
+ "content": "<filename>",
47
+ "lstrip": false,
48
+ "normalized": false,
49
+ "rstrip": false,
50
+ "single_word": false,
51
+ "special": true
52
+ },
53
+ "6": {
54
+ "content": "<gh_stars>",
55
+ "lstrip": false,
56
+ "normalized": false,
57
+ "rstrip": false,
58
+ "single_word": false,
59
+ "special": true
60
+ },
61
+ "7": {
62
+ "content": "<issue_start>",
63
+ "lstrip": false,
64
+ "normalized": false,
65
+ "rstrip": false,
66
+ "single_word": false,
67
+ "special": true
68
+ },
69
+ "8": {
70
+ "content": "<issue_comment>",
71
+ "lstrip": false,
72
+ "normalized": false,
73
+ "rstrip": false,
74
+ "single_word": false,
75
+ "special": true
76
+ },
77
+ "9": {
78
+ "content": "<issue_closed>",
79
+ "lstrip": false,
80
+ "normalized": false,
81
+ "rstrip": false,
82
+ "single_word": false,
83
+ "special": true
84
+ },
85
+ "10": {
86
+ "content": "<jupyter_start>",
87
+ "lstrip": false,
88
+ "normalized": false,
89
+ "rstrip": false,
90
+ "single_word": false,
91
+ "special": true
92
+ },
93
+ "11": {
94
+ "content": "<jupyter_text>",
95
+ "lstrip": false,
96
+ "normalized": false,
97
+ "rstrip": false,
98
+ "single_word": false,
99
+ "special": true
100
+ },
101
+ "12": {
102
+ "content": "<jupyter_code>",
103
+ "lstrip": false,
104
+ "normalized": false,
105
+ "rstrip": false,
106
+ "single_word": false,
107
+ "special": true
108
+ },
109
+ "13": {
110
+ "content": "<jupyter_output>",
111
+ "lstrip": false,
112
+ "normalized": false,
113
+ "rstrip": false,
114
+ "single_word": false,
115
+ "special": true
116
+ },
117
+ "14": {
118
+ "content": "<empty_output>",
119
+ "lstrip": false,
120
+ "normalized": false,
121
+ "rstrip": false,
122
+ "single_word": false,
123
+ "special": true
124
+ },
125
+ "15": {
126
+ "content": "<commit_before>",
127
+ "lstrip": false,
128
+ "normalized": false,
129
+ "rstrip": false,
130
+ "single_word": false,
131
+ "special": true
132
+ },
133
+ "16": {
134
+ "content": "<commit_msg>",
135
+ "lstrip": false,
136
+ "normalized": false,
137
+ "rstrip": false,
138
+ "single_word": false,
139
+ "special": true
140
+ },
141
+ "17": {
142
+ "content": "<commit_after>",
143
+ "lstrip": false,
144
+ "normalized": false,
145
+ "rstrip": false,
146
+ "single_word": false,
147
+ "special": true
148
+ },
149
+ "18": {
150
+ "content": "<reponame>",
151
+ "lstrip": false,
152
+ "normalized": false,
153
+ "rstrip": false,
154
+ "single_word": false,
155
+ "special": true
156
+ },
157
+ "49152": {
158
+ "content": "<|start_of_role|>",
159
+ "lstrip": false,
160
+ "normalized": false,
161
+ "rstrip": false,
162
+ "single_word": false,
163
+ "special": true
164
+ },
165
+ "49153": {
166
+ "content": "<|end_of_role|>",
167
+ "lstrip": false,
168
+ "normalized": false,
169
+ "rstrip": false,
170
+ "single_word": false,
171
+ "special": true
172
+ },
173
+ "49154": {
174
+ "content": "<|tool_call|>",
175
+ "lstrip": false,
176
+ "normalized": false,
177
+ "rstrip": false,
178
+ "single_word": false,
179
+ "special": true
180
+ },
181
+ "49155": {
182
+ "content": "<|start_of_cite|>",
183
+ "lstrip": false,
184
+ "normalized": false,
185
+ "rstrip": false,
186
+ "single_word": false,
187
+ "special": true
188
+ },
189
+ "49156": {
190
+ "content": "<|end_of_cite|>",
191
+ "lstrip": false,
192
+ "normalized": false,
193
+ "rstrip": false,
194
+ "single_word": false,
195
+ "special": true
196
+ },
197
+ "49157": {
198
+ "content": "<|start_of_plugin|>",
199
+ "lstrip": false,
200
+ "normalized": false,
201
+ "rstrip": false,
202
+ "single_word": false,
203
+ "special": true
204
+ },
205
+ "49158": {
206
+ "content": "<|end_of_plugin|>",
207
+ "lstrip": false,
208
+ "normalized": false,
209
+ "rstrip": false,
210
+ "single_word": false,
211
+ "special": true
212
+ }
213
+ },
214
+ "additional_special_tokens": [
215
+ "<|start_of_role|>",
216
+ "<|end_of_role|>",
217
+ "<|tool_call|>",
218
+ "<|start_of_cite|>",
219
+ "<|end_of_cite|>",
220
+ "<|start_of_plugin|>",
221
+ "<|end_of_plugin|>"
222
+ ],
223
+ "bos_token": "<|end_of_text|>",
224
+ "chat_template": "{# Alias tools -> available_tools #}\n{%- if tools and not available_tools -%}\n {%- set available_tools = tools -%}\n{%- endif -%}\n{%- if messages[0]['role'] == 'system' %}\n {%- set system_message = messages[0]['content'] %}\n {%- set loop_messages = messages[1:] %}\n {%- else %}\n {%- set system_message = \"Knowledge Cutoff Date: April 2024.\nToday's Date: \" + strftime_now('%B %d, %Y') + \".\nYou are Granite, developed by IBM.\" %}\n {%- if available_tools and documents %}\n {%- set system_message = system_message + \" You are a helpful assistant with access to the following tools. When a tool is required to answer the user's query, respond only with <|tool_call|> followed by a JSON list of tools used. If a tool does not exist in the provided list of tools, notify the user that you do not have the ability to fulfill the request.\nWrite the response to the user's input by strictly aligning with the facts in the provided documents. If the information needed to answer the question is not available in the documents, inform the user that the question cannot be answered based on the available data.\" %}\n {%- elif available_tools %}\n {%- set system_message = system_message + \" You are a helpful assistant with access to the following tools. When a tool is required to answer the user's query, respond only with <|tool_call|> followed by a JSON list of tools used. If a tool does not exist in the provided list of tools, notify the user that you do not have the ability to fulfill the request.\" %}\n {%- elif documents %}\n {%- set system_message = system_message + \" Write the response to the user's input by strictly aligning with the facts in the provided documents. If the information needed to answer the question is not available in the documents, inform the user that the question cannot be answered based on the available data.\" %}\n {%- elif thinking %}\n {%- set system_message = system_message + \" You are a helpful AI assistant.\nRespond to every user query in a comprehensive and detailed way. You can write down your thoughts and reasoning process before responding. In the thought process, engage in a comprehensive cycle of analysis, summarization, exploration, reassessment, reflection, backtracing, and iteration to develop well-considered thinking process. In the response section, based on various attempts, explorations, and reflections from the thoughts section, systematically present the final solution that you deem correct. The response should summarize the thought process. Write your thoughts between <think></think> and write your response between <response></response> for each user query.\" %}\n {%- else %}\n {%- set system_message = system_message + \" You are a helpful AI assistant.\" %}\n {%- endif %}\n {%- if 'citations' in controls and documents %}\n {%- set system_message = system_message + '\nUse the symbols <|start_of_cite|> and <|end_of_cite|> to indicate when a fact comes from a document in the search result, e.g <|start_of_cite|> {document_id: 1}my fact <|end_of_cite|> for a fact from document 1. Afterwards, list all the citations with their corresponding documents in an ordered list.' %}\n {%- endif %}\n {%- if 'hallucinations' in controls and documents %}\n {%- set system_message = system_message + '\nFinally, after the response is written, include a numbered list of sentences from the response with a corresponding risk value that are hallucinated and not based in the documents.' 
%}\n {%- endif %}\n {%- set loop_messages = messages %}\n {%- endif %}\n {{- '<|start_of_role|>system<|end_of_role|>' + system_message + '<|end_of_text|>\n' }}\n {%- if available_tools %}\n {{- '<|start_of_role|>available_tools<|end_of_role|>' }}\n {{- available_tools | tojson(indent=4) }}\n {{- '<|end_of_text|>\n' }}\n {%- endif %}\n {%- if documents %}\n {%- for document in documents %}\n {{- '<|start_of_role|>document {\"document_id\": \"' + document['doc_id'] | string + '\"}<|end_of_role|>\n' }}\n {{- document['text'] }}\n {{- '<|end_of_text|>\n' }}\n {%- endfor %}\n {%- endif %}\n {%- for message in loop_messages %}\n {{- '<|start_of_role|>' + message['role'] + '<|end_of_role|>' + message['content'] + '<|end_of_text|>\n' }}\n {%- if loop.last and add_generation_prompt %}\n {{- '<|start_of_role|>assistant' }}\n {%- if controls %}\n {{- ' ' + controls | tojson()}}\n {%- endif %}\n {{- '<|end_of_role|>' }}\n {%- endif %}\n {%- endfor %}",
225
+ "clean_up_tokenization_spaces": true,
226
+ "eos_token": "<|end_of_text|>",
227
+ "errors": "replace",
228
+ "extra_special_tokens": {},
229
+ "model_max_length": 9223372036854775807,
230
+ "pad_token": "<|end_of_text|>",
231
+ "padding_side": "left",
232
+ "tokenizer_class": "GPT2Tokenizer",
233
+ "unk_token": "<|end_of_text|>",
234
+ "vocab_size": 49152
235
+ }
answer_relevance_classifier_lora/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
answer_relevance_rewriter_lora/README.md ADDED
@@ -0,0 +1,372 @@
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ pipeline_tag: text-generation
6
+ library_name: peft
8
+ ---
9
+
10
+ # Intrinsics for Answer Relevance Rewriter
11
+
12
+ ## Model Summary
13
+ This is a RAG-specific intrinsic for the answer relevance rewrite task.
+ The model takes as input the chat completion output of the answer relevance classifier,
+ consisting of the conversation as well as the answer relevance classification, together with the grounding documents,
+ and produces a rewritten assistant response that is more relevant to the user's final inquiry.
17
+
18
+
19
+ We provide two intrinsics implemented as LoRA adapters (LoRA/aLoRA) trained over
20
+ Granite-3.3-2b-instruct, Granite-3.3-8b-instruct.
21
+
22
+ - **Developer:** IBM Research
23
+ - **Model type:** LoRA and aLoRA adapter for
24
+ [ibm-granite/granite-3.3-2b-instruct](https://huggingface.co/ibm-granite/granite-3.3-2b-instruct),
25
+ [ibm-granite/granite-3.3-8b-instruct](https://huggingface.co/ibm-granite/granite-3.3-8b-instruct)
26
+ - **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
27
+
28
+ ## Intended use
29
+ This RAG-specific intrinsic is intended to post-process the generated assistant response.
+ It should be used after the answer relevance classifier intrinsic, and applied in
+ cases where the `answer_relevance_likelihood` is below a threshold chosen by the application
+ (a sketch of this chaining is shown below).
32
+
33
+ For cases where the assistant answer is deemed not relevant (where `answer_relevance_likelihood` is below a
34
+ given threshold), the answer relevance rewriter intrinsic can be used to rewrite the assistant response
35
+ into a more relevant response. It takes as input the chat completion
+ from the answer relevance classifier output and the grounding documents. Its output has the form:
37
+
38
+ {
+   "answer_relevance_rewrite": <rewritten response>
+ }
41
+
42
+ The rewriter is instructed to correct only the deficiencies in relevance identified by the classifier,
+ and to ensure that the rewritten response is grounded in the conversation and the given documents.
44
+
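+ A minimal sketch of this chaining, assuming the classifier output has already been parsed into a
+ Python dictionary (the threshold and the `correction_method` string are illustrative application
+ choices; the Quickstart below shows how `intrinsic_kwargs` is passed to `rewriter.transform`):
+
+ # Hypothetical output of the answer relevance classifier intrinsic
+ classifier_output = {
+     "answer_relevance_analysis": "The response provides a vague answer that does not address the inquiry.",
+     "answer_relevance_category": "No attempt",
+     "answer_relevance_likelihood": 0.1,
+ }
+
+ RELEVANCE_THRESHOLD = 0.5  # illustrative value; tune per application
+
+ if classifier_output["answer_relevance_likelihood"] < RELEVANCE_THRESHOLD:
+     # Forward the classifier's findings to the rewriter as intrinsic kwargs
+     intrinsic_kwargs = {
+         "answer_relevance_category": classifier_output["answer_relevance_category"],
+         "answer_relevance_analysis": classifier_output["answer_relevance_analysis"],
+         "correction_method": "providing a relevant response if an inquiry should be answered, "
+         "or providing a short response if the last user utterance contains no inquiry",
+     }
+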
45
+ **Model input**: The input to the answer relevance rewriter intrinsic is an
+ OpenAI-compatible chat completion request containing a list of conversation
+ turns that alternate between the `user` and `assistant` roles, structured as follows:
+ - A conversation between user and assistant, ending with an assistant response
+ - An additional user turn with content "answer_relevance"
51
+
52
+ **Model output**: The output of the answer relevance rewriter intrinsic is the result of the
+ original chat completion request, formatted as a JSON object with the following schema:
+
+ {
+   "answer_relevance_rewrite": <rewritten response>
+ }
58
+
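+ For illustration, given the meeting example used in the Quickstart below, a plausible (hypothetical) output would be:
+
+ {
+   "answer_relevance_rewrite": "According to the meeting notes, Alice, Bob, and Carol attended the meeting."
+ }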
59
+ Please see the code snippets in the Quickstart Example section below for
60
+ examples that illustrate the intrinsic's input/output.
61
+
62
+ ## Quickstart Example
63
+
64
+ To run the answer relevance rewriter intrinsic through granite-common, you can either (a)
+ use an OpenAI-compatible inference backend, such as vLLM, or (b) use the Hugging
66
+ Face transformers library. We provide instructions for each of the two
67
+ approaches below. Note that running inference using vLLM or another scalable
68
+ OpenAI-compatible inference backend should be significantly faster than using
69
+ the Hugging Face transformers library directly.
70
+
71
+ ### Using an OpenAI-Compatible Inference Backend
72
+
73
+ To run the intrinsic using an OpenAI-compatible inference backend, such as vLLM,
74
+ follow the steps below.
75
+
76
+ 1. Install the granite-common library:
77
+
78
+ pip install git+https://github.com/ibm-granite/granite-common.git
79
+ pip install "granite_common[nltk]"
80
+
81
+ 2. Install the Hugging Face CLI:
82
+
83
+ pip install -U "huggingface_hub[cli]"
84
+
85
+ 3. Install vLLM:
86
+
87
+ pip install vllm
88
+
89
+ 4. Download the intrinsics library:
90
+
91
+ hf download ibm-granite/rag-intrinsics-lib --local-dir ./rag-intrinsics-lib
92
+
93
+ 5. Edit the vLLM startup script found in `./rag-intrinsics-lib/run_vllm.sh`
94
+ using your favorite editor:
95
+
96
+ Edit the constants `BASE_MODEL_NAME` and `BASE_MODEL_ORG` depending on the
97
+ base model on which the desired LoRA adapter has been trained. Optionally,
98
+ edit the constant `PORT` to change the port on which vLLM will run. Save the
99
+ modified file and exit the editor.
100
+
101
+ 6. Start vLLM through the startup script. The first time you run the script,
102
+ you may have to change the permissions to allow execution:
103
+
104
+ cd rag-intrinsics-lib
105
+ chmod u+x ./run_vllm.sh
106
+ ./run_vllm.sh &
107
+
108
+ 7. Run the following code snippet:
109
+
110
+ import json
111
+ import openai
112
+ import granite_common
113
+
114
+ intrinsic_name = "answer_relevance_classifier"
115
+
116
+ # Change the following constant to select a different base model
117
+ base_model_name = "granite-3.3-8b-instruct"
118
+
119
+ # Change the following constants as needed to reflect the location of the vLLM server
120
+ # The selected port should be identical to the one you specified in the vLLM startup script
121
+ openai_base_url = "http://localhost:55555/v1"
122
+ openai_api_key = "rag_intrinsics_1234"
123
+
124
+ # Fetch IO configuration file from Hugging Face Hub
125
+ io_yaml_file = granite_common.intrinsics.util.obtain_io_yaml(
126
+ intrinsic_name, base_model_name
127
+ )
128
+
129
+ # Instantiate input/output processors
130
+ rewriter = granite_common.IntrinsicsRewriter(config_file=io_yaml_file)
131
+ result_processor = granite_common.IntrinsicsResultProcessor(config_file=io_yaml_file)
132
+
133
+ # Sample request
134
+ request_json = {
135
+ "messages": [
136
+ {
137
+ "role": "user",
138
+ "content": "Who attended the meeting?"
139
+ },
140
+ {
141
+ "role": "assistant",
142
+ "content": "Many people attended the meeting."
143
+ }
144
+ ],
145
+ "extra_body": {
146
+ "documents": [
147
+ {
148
+ "doc_id": "1",
149
+ "text": "Meeting attendees: Alice, Bob, Carol."
150
+ },
151
+ {
152
+ "doc_id": "2",
153
+ "text": "Meeting time: 9:00 am to 11:00 am."
154
+ }
155
+ ]
156
+ }
157
+ }
158
+
159
+ # Add other parameters
160
+ request_json["model"] = intrinsic_name
161
+ request_json["temperature"] = 0.0
162
+
163
+ # Apply input processor
164
+ intrinsic_kwargs = {
165
+ "answer_relevance_category": "No attempt",
166
+ "answer_relevance_analysis": "The inquiry asks for the attendees of the meeting. The response provides a vague and non-specific answer that does not address the inquiry.",
167
+ "correction_method": "providing a relevant response if an inquiry should be answered, or providing a short response if the last user utterance contains no inquiry"
168
+ }
169
+ rewritten_request = rewriter.transform(request_json, **intrinsic_kwargs)
170
+
171
+ # Run inference
172
+ client = openai.OpenAI(base_url=openai_base_url, api_key=openai_api_key)
173
+ chat_completion = client.chat.completions.create(**rewritten_request.model_dump())
174
+
175
+ # Apply output processor
176
+ processed_chat_completion = result_processor.transform(
177
+ chat_completion, rewritten_request
178
+ )
179
+
180
+ # Verify that the content of the completion is valid JSON and pretty-print it.
181
+ parsed_contents = json.loads(processed_chat_completion.choices[0].message.content)
182
+ print("JSON output:")
183
+ print(json.dumps(parsed_contents, indent=2))
184
+
185
+ ### Using the Hugging Face Transformers Library
186
+
187
+ To run the intrinsic using the Hugging Face transformers library directly,
188
+ follow the steps below.
189
+
190
+ 1. Install the granite-common library:
191
+
192
+ pip install git+https://github.com/ibm-granite/granite-common.git
193
+ pip install "granite_common[nltk]"
194
+
195
+ 2. Install the Hugging Face CLI:
196
+
197
+ pip install -U "huggingface_hub[cli]"
198
+
199
+ 3. Install PEFT:
200
+
201
+ pip install peft
202
+
203
+ 4. Install xgrammar:
204
+
205
+ pip install xgrammar
206
+
207
+ 5. Run the following code snippet:
208
+
209
+ import json
210
+ import granite_common.util
211
+ import peft
212
+
213
+ intrinsic_name = "answer_relevance_rewriter"
214
+
215
+ # Change the following constant to select a different base model
216
+ base_model_name = "granite-3.3-8b-instruct"
217
+
218
+ use_cuda = True # Set to False to use default PyTorch device for this machine + model
219
+
220
+ # Fetch IO configuration file from Hugging Face Hub
221
+ io_yaml_file = granite_common.intrinsics.util.obtain_io_yaml(
222
+ intrinsic_name, base_model_name
223
+ )
224
+
225
+ # Fetch LoRA directory from Hugging Face Hub
226
+ lora_dir = granite_common.intrinsics.util.obtain_lora(
227
+ intrinsic_name, base_model_name
228
+ )
229
+
230
+ # Instantiate input/output processors
231
+ rewriter = granite_common.IntrinsicsRewriter(config_file=io_yaml_file)
232
+ result_processor = granite_common.IntrinsicsResultProcessor(config_file=io_yaml_file)
233
+
234
+ # Sample request
235
+ request_json = {
236
+ "messages": [
237
+ {
238
+ "role": "user",
239
+ "content": "Who attended the meeting?"
240
+ },
241
+ {
242
+ "role": "assistant",
243
+ "content": "Many people attended the meeting."
244
+ }
245
+ ],
246
+ "extra_body": {
247
+ "documents": [
248
+ {
249
+ "doc_id": "1",
250
+ "text": "Meeting attendees: Alice, Bob, Carol."
251
+ },
252
+ {
253
+ "doc_id": "2",
254
+ "text": "Meeting time: 9:00 am to 11:00 am."
255
+ }
256
+ ]
257
+ }
258
+ }
259
+
260
+ # Add additional parameters
261
+ request_json["model"] = intrinsic_name
262
+ request_json["temperature"] = 0.0
263
+
264
+ # Apply input processor
265
+ intrinsic_kwargs = {
266
+ "answer_relevance_category": "No attempt",
267
+ "answer_relevance_analysis": "The inquiry asks for the attendees of the meeting. The response provides a vague and non-specific answer that does not address the inquiry.",
268
+ "correction_method": "providing a relevant response if an inquiry should be answered, or providing a short response if the last user utterance contains no inquiry"
269
+ }
270
+ rewritten_request = rewriter.transform(request_json, **intrinsic_kwargs)
271
+
272
+ # Load the base model and merge LoRA weights
273
+ model, tokenizer = granite_common.util.load_transformers_lora(lora_dir)
274
+ if use_cuda:
275
+ model = model.cuda()
276
+
277
+ # Convert the chat completion request into the Transformers library's native
+ # input format.
279
+ generate_input, other_input = (
280
+ granite_common.util.chat_completion_request_to_transformers_inputs(
281
+ rewritten_request,
282
+ tokenizer,
283
+ model,
284
+ )
285
+ )
286
+
287
+ # Use the Transformers library's APIs to generate one or more completions,
288
+ # then convert those completions into an OpenAI-compatible chat completion response.
289
+ responses = granite_common.util.generate_with_transformers(
290
+ tokenizer, model, generate_input, other_input
291
+ )
292
+
293
+ # Apply output processor
294
+ transformed_responses = result_processor.transform(responses, rewritten_request)
295
+
296
+ # Verify that the content of the completion is valid JSON and pretty-print it.
297
+ parsed_contents = json.loads(transformed_responses.choices[0].message.content)
298
+ print("JSON output:")
299
+ print(json.dumps(parsed_contents, indent=2))
300
+
301
+ ## Training Details
302
+
303
+ ### Training Data
304
+
305
+ The training data was created by the following process:
+ 1. Take the synthetic rag-data-granite dataset, consisting of conversations between a user and an assistant.
+ 2. Replace the assistant responses by running granite-3.2-intrinsics at temperature 1.0.
+ 3. Produce the answer_relevance_rewriter target output using mixtral-large with prompts containing in-context examples.
+
+ The conversations created in steps 1 and 2 are used as the training input. The JSON string from step 3
+ is used as the training target output.
311
+
312
+ #### Training Hyperparameters
313
+
314
+ The LoRA adapter was fine-tuned using PEFT under the following regime: rank =
315
+ 32, learning rate = 1.0e-04, number of epochs = 5.
316
+
317
+ ## Evaluation
318
+
319
+ ### Answer Relevance Rewriter
320
+
321
+ We evaluated the model on a test data set generated by the same procedure as the training data,
+ using GPT-4o as the judge.
323
+
324
+
325
+ The following table presents results comparing baselines and frontier models
326
+ on the answer relevance rewrite task. The test set consists of responses classified as irrelevant by
+ mixtral-large. The evaluation is first divided into two parts: responses that are truly irrelevant,
+ for which we measure the rate at which the rewrite becomes relevant, and responses that are falsely
+ classified as irrelevant, for which we measure the rate at which the rewrite becomes irrelevant.
+ Then the overall rates of flipping irrelevant to relevant and flipping relevant to irrelevant are
+ calculated, as well as the net gain in relevance and the resulting final relevance.
332
+
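+ For example, reading the llama-3.3-70b row below, the net gain is the overall rate of flipping irrelevant to relevant minus the rate of flipping relevant to irrelevant: 0.554 - 0.013 = 0.541.
+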
333
+ The LoRAs outperform the best of the frontier models.
334
+
335
+ | | True irrelevant <br> flip to relevant | False irrelevant <br> flip to irrelevant| Overall <br> flip irrelevant <br> to relevant | Overall <br> flip relevant <br> to irrelevant| net gain | Result <br>relevance |
336
+ |:---------------------|:--------------|:---------|:------------------------------|:---------|:---------|:--------------|
337
+ | mixtral-8x22b-v0.1 | 0.416 | 0.101 | 0.286 | 0.032 | 0.254 | 0.566 |
338
+ | llama-3.3-70b | 0.804 | 0.041 | 0.554 | 0.013 | 0.541 | 0.853 |
339
+ | gpt-oss-20b | 0.902 | 0.034 | 0.621 | 0.011 | 0.610 | 0.922 |
340
+ | gpt-4o | 0.960 | 0.014 | 0.661 | 0.004 | 0.657 | 0.968 |
341
+ | gpt-4o-mini | 0.758 | 0.027 | 0.522 | 0.008 | 0.514 | 0.825 |
342
+ | | | | | | | |
343
+ | granite-3.3-2b/lora | 0.972 | 0.027 | 0.669 | 0.008 | 0.661 | 0.973 |
344
+ | granite-3.3-2b/alora | 0.972 | 0.007 | 0.669 | 0.002 | 0.667 | 0.979 |
345
+ | granite-3.3-8b/lora | 0.969 | 0.014 | 0.667 | 0.004 | 0.663 | 0.975 |
346
+ | granite-3.3-8b/alora | 0.966 | 0.027 | 0.665 | 0.008 | 0.657 | 0.968 |
347
348
+
349
+ ### Comparing the Answer Relevance Rewriter Intrinsics vs. Vanilla Granite Models
350
+
351
+ We compare the performance of the vanilla Granite 3.3 2b and 8b Instruct models
+ against the answer relevance rewriter intrinsics implemented as LoRA adapters.
+ The LoRAs significantly outperform the base models.
+
354
+ | | True irrelevant <br> flip to relevant | False irrelevant <br> flip to irrelevant| Overall <br> flip irrelevant <br> to relevant | Overall <br> flip relevant <br> to irrelevant| net gain | Result relevance |
355
+ |:---------------------|:--------------|:---------|:------------------------------|:---------|:---------|:--------------|
356
+ | granite-3.3-2b | 0.346 | 0.169 | 0.238 | 0.053 | 0.185 | 0.497 |
357
+ | granite-3.3-2b/lora | 0.972 | 0.027 | 0.669 | 0.008 | 0.661 | 0.973 |
358
+ | granite-3.3-2b/alora | 0.972 | 0.007 | 0.669 | 0.002 | 0.667 | 0.979 |
359
+ | | | | | | | |
360
+ | granite-3.3-8b | 0.266 | 0.277 | 0.183 | 0.086 | 0.097 | 0.408 |
361
+ | granite-3.3-8b/lora | 0.969 | 0.014 | 0.667 | 0.004 | 0.663 | 0.975 |
362
+ | granite-3.3-8b/alora | 0.966 | 0.027 | 0.665 | 0.008 | 0.657 | 0.968 |
363
364
+
365
+
366
+ ## Model Card Authors
367
+
368
+ [Huaiyu Zhu](mailto:[email protected])
369
+
370
+ ### Framework versions
371
+
372
+ - PEFT 0.14.0
answer_relevance_rewriter_lora/adapter_config.json ADDED
@@ -0,0 +1,33 @@
1
+ {
2
+ "alpha_pattern": {},
3
+ "auto_mapping": null,
4
+ "base_model_name_or_path": "ibm-granite/granite-3.3-8b-instruct",
5
+ "bias": "none",
6
+ "eva_config": null,
7
+ "exclude_modules": null,
8
+ "fan_in_fan_out": false,
9
+ "inference_mode": true,
10
+ "init_lora_weights": true,
11
+ "layer_replication": null,
12
+ "layers_pattern": null,
13
+ "layers_to_transform": null,
14
+ "loftq_config": {},
15
+ "lora_alpha": 32,
16
+ "lora_bias": false,
17
+ "lora_dropout": 0.05,
18
+ "megatron_config": null,
19
+ "megatron_core": "megatron.core",
20
+ "modules_to_save": null,
21
+ "peft_type": "LORA",
22
+ "r": 32,
23
+ "rank_pattern": {},
24
+ "revision": null,
25
+ "target_modules": [
26
+ "q_proj",
27
+ "k_proj",
28
+ "v_proj"
29
+ ],
30
+ "task_type": "CAUSAL_LM",
31
+ "use_dora": false,
32
+ "use_rslora": false
33
+ }
answer_relevance_rewriter_lora/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:74db6518798528a2cb2ecd620825a95db228c54a87393e88a5832218e77044ce
3
+ size 94404160
answer_relevance_rewriter_lora/added_tokens.json ADDED
@@ -0,0 +1,9 @@
1
+ {
2
+ "<|end_of_cite|>": 49156,
3
+ "<|end_of_plugin|>": 49158,
4
+ "<|end_of_role|>": 49153,
5
+ "<|start_of_cite|>": 49155,
6
+ "<|start_of_plugin|>": 49157,
7
+ "<|start_of_role|>": 49152,
8
+ "<|tool_call|>": 49154
9
+ }
answer_relevance_rewriter_lora/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
answer_relevance_rewriter_lora/special_tokens_map.json ADDED
@@ -0,0 +1,39 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ "<|start_of_role|>",
4
+ "<|end_of_role|>",
5
+ "<|tool_call|>",
6
+ "<|start_of_cite|>",
7
+ "<|end_of_cite|>",
8
+ "<|start_of_plugin|>",
9
+ "<|end_of_plugin|>"
10
+ ],
11
+ "bos_token": {
12
+ "content": "<|end_of_text|>",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false
17
+ },
18
+ "eos_token": {
19
+ "content": "<|end_of_text|>",
20
+ "lstrip": false,
21
+ "normalized": false,
22
+ "rstrip": false,
23
+ "single_word": false
24
+ },
25
+ "pad_token": {
26
+ "content": "<|end_of_text|>",
27
+ "lstrip": false,
28
+ "normalized": false,
29
+ "rstrip": false,
30
+ "single_word": false
31
+ },
32
+ "unk_token": {
33
+ "content": "<|end_of_text|>",
34
+ "lstrip": false,
35
+ "normalized": false,
36
+ "rstrip": false,
37
+ "single_word": false
38
+ }
39
+ }
answer_relevance_rewriter_lora/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
answer_relevance_rewriter_lora/tokenizer_config.json ADDED
@@ -0,0 +1,235 @@
1
+ {
2
+ "add_bos_token": false,
3
+ "add_prefix_space": false,
4
+ "added_tokens_decoder": {
5
+ "0": {
6
+ "content": "<|end_of_text|>",
7
+ "lstrip": false,
8
+ "normalized": false,
9
+ "rstrip": false,
10
+ "single_word": false,
11
+ "special": true
12
+ },
13
+ "1": {
14
+ "content": "<fim_prefix>",
15
+ "lstrip": false,
16
+ "normalized": false,
17
+ "rstrip": false,
18
+ "single_word": false,
19
+ "special": true
20
+ },
21
+ "2": {
22
+ "content": "<fim_middle>",
23
+ "lstrip": false,
24
+ "normalized": false,
25
+ "rstrip": false,
26
+ "single_word": false,
27
+ "special": true
28
+ },
29
+ "3": {
30
+ "content": "<fim_suffix>",
31
+ "lstrip": false,
32
+ "normalized": false,
33
+ "rstrip": false,
34
+ "single_word": false,
35
+ "special": true
36
+ },
37
+ "4": {
38
+ "content": "<fim_pad>",
39
+ "lstrip": false,
40
+ "normalized": false,
41
+ "rstrip": false,
42
+ "single_word": false,
43
+ "special": true
44
+ },
45
+ "5": {
46
+ "content": "<filename>",
47
+ "lstrip": false,
48
+ "normalized": false,
49
+ "rstrip": false,
50
+ "single_word": false,
51
+ "special": true
52
+ },
53
+ "6": {
54
+ "content": "<gh_stars>",
55
+ "lstrip": false,
56
+ "normalized": false,
57
+ "rstrip": false,
58
+ "single_word": false,
59
+ "special": true
60
+ },
61
+ "7": {
62
+ "content": "<issue_start>",
63
+ "lstrip": false,
64
+ "normalized": false,
65
+ "rstrip": false,
66
+ "single_word": false,
67
+ "special": true
68
+ },
69
+ "8": {
70
+ "content": "<issue_comment>",
71
+ "lstrip": false,
72
+ "normalized": false,
73
+ "rstrip": false,
74
+ "single_word": false,
75
+ "special": true
76
+ },
77
+ "9": {
78
+ "content": "<issue_closed>",
79
+ "lstrip": false,
80
+ "normalized": false,
81
+ "rstrip": false,
82
+ "single_word": false,
83
+ "special": true
84
+ },
85
+ "10": {
86
+ "content": "<jupyter_start>",
87
+ "lstrip": false,
88
+ "normalized": false,
89
+ "rstrip": false,
90
+ "single_word": false,
91
+ "special": true
92
+ },
93
+ "11": {
94
+ "content": "<jupyter_text>",
95
+ "lstrip": false,
96
+ "normalized": false,
97
+ "rstrip": false,
98
+ "single_word": false,
99
+ "special": true
100
+ },
101
+ "12": {
102
+ "content": "<jupyter_code>",
103
+ "lstrip": false,
104
+ "normalized": false,
105
+ "rstrip": false,
106
+ "single_word": false,
107
+ "special": true
108
+ },
109
+ "13": {
110
+ "content": "<jupyter_output>",
111
+ "lstrip": false,
112
+ "normalized": false,
113
+ "rstrip": false,
114
+ "single_word": false,
115
+ "special": true
116
+ },
117
+ "14": {
118
+ "content": "<empty_output>",
119
+ "lstrip": false,
120
+ "normalized": false,
121
+ "rstrip": false,
122
+ "single_word": false,
123
+ "special": true
124
+ },
125
+ "15": {
126
+ "content": "<commit_before>",
127
+ "lstrip": false,
128
+ "normalized": false,
129
+ "rstrip": false,
130
+ "single_word": false,
131
+ "special": true
132
+ },
133
+ "16": {
134
+ "content": "<commit_msg>",
135
+ "lstrip": false,
136
+ "normalized": false,
137
+ "rstrip": false,
138
+ "single_word": false,
139
+ "special": true
140
+ },
141
+ "17": {
142
+ "content": "<commit_after>",
143
+ "lstrip": false,
144
+ "normalized": false,
145
+ "rstrip": false,
146
+ "single_word": false,
147
+ "special": true
148
+ },
149
+ "18": {
150
+ "content": "<reponame>",
151
+ "lstrip": false,
152
+ "normalized": false,
153
+ "rstrip": false,
154
+ "single_word": false,
155
+ "special": true
156
+ },
157
+ "49152": {
158
+ "content": "<|start_of_role|>",
159
+ "lstrip": false,
160
+ "normalized": false,
161
+ "rstrip": false,
162
+ "single_word": false,
163
+ "special": true
164
+ },
165
+ "49153": {
166
+ "content": "<|end_of_role|>",
167
+ "lstrip": false,
168
+ "normalized": false,
169
+ "rstrip": false,
170
+ "single_word": false,
171
+ "special": true
172
+ },
173
+ "49154": {
174
+ "content": "<|tool_call|>",
175
+ "lstrip": false,
176
+ "normalized": false,
177
+ "rstrip": false,
178
+ "single_word": false,
179
+ "special": true
180
+ },
181
+ "49155": {
182
+ "content": "<|start_of_cite|>",
183
+ "lstrip": false,
184
+ "normalized": false,
185
+ "rstrip": false,
186
+ "single_word": false,
187
+ "special": true
188
+ },
189
+ "49156": {
190
+ "content": "<|end_of_cite|>",
191
+ "lstrip": false,
192
+ "normalized": false,
193
+ "rstrip": false,
194
+ "single_word": false,
195
+ "special": true
196
+ },
197
+ "49157": {
198
+ "content": "<|start_of_plugin|>",
199
+ "lstrip": false,
200
+ "normalized": false,
201
+ "rstrip": false,
202
+ "single_word": false,
203
+ "special": true
204
+ },
205
+ "49158": {
206
+ "content": "<|end_of_plugin|>",
207
+ "lstrip": false,
208
+ "normalized": false,
209
+ "rstrip": false,
210
+ "single_word": false,
211
+ "special": true
212
+ }
213
+ },
214
+ "additional_special_tokens": [
215
+ "<|start_of_role|>",
216
+ "<|end_of_role|>",
217
+ "<|tool_call|>",
218
+ "<|start_of_cite|>",
219
+ "<|end_of_cite|>",
220
+ "<|start_of_plugin|>",
221
+ "<|end_of_plugin|>"
222
+ ],
223
+ "bos_token": "<|end_of_text|>",
224
+ "chat_template": "{# Alias tools -> available_tools #}\n{%- if tools and not available_tools -%}\n {%- set available_tools = tools -%}\n{%- endif -%}\n{%- if messages[0]['role'] == 'system' %}\n {%- set system_message = messages[0]['content'] %}\n {%- set loop_messages = messages[1:] %}\n {%- else %}\n {%- set system_message = \"Knowledge Cutoff Date: April 2024.\nToday's Date: \" + strftime_now('%B %d, %Y') + \".\nYou are Granite, developed by IBM.\" %}\n {%- if available_tools and documents %}\n {%- set system_message = system_message + \" You are a helpful assistant with access to the following tools. When a tool is required to answer the user's query, respond only with <|tool_call|> followed by a JSON list of tools used. If a tool does not exist in the provided list of tools, notify the user that you do not have the ability to fulfill the request.\nWrite the response to the user's input by strictly aligning with the facts in the provided documents. If the information needed to answer the question is not available in the documents, inform the user that the question cannot be answered based on the available data.\" %}\n {%- elif available_tools %}\n {%- set system_message = system_message + \" You are a helpful assistant with access to the following tools. When a tool is required to answer the user's query, respond only with <|tool_call|> followed by a JSON list of tools used. If a tool does not exist in the provided list of tools, notify the user that you do not have the ability to fulfill the request.\" %}\n {%- elif documents %}\n {%- set system_message = system_message + \" Write the response to the user's input by strictly aligning with the facts in the provided documents. If the information needed to answer the question is not available in the documents, inform the user that the question cannot be answered based on the available data.\" %}\n {%- elif thinking %}\n {%- set system_message = system_message + \" You are a helpful AI assistant.\nRespond to every user query in a comprehensive and detailed way. You can write down your thoughts and reasoning process before responding. In the thought process, engage in a comprehensive cycle of analysis, summarization, exploration, reassessment, reflection, backtracing, and iteration to develop well-considered thinking process. In the response section, based on various attempts, explorations, and reflections from the thoughts section, systematically present the final solution that you deem correct. The response should summarize the thought process. Write your thoughts between <think></think> and write your response between <response></response> for each user query.\" %}\n {%- else %}\n {%- set system_message = system_message + \" You are a helpful AI assistant.\" %}\n {%- endif %}\n {%- if 'citations' in controls and documents %}\n {%- set system_message = system_message + '\nUse the symbols <|start_of_cite|> and <|end_of_cite|> to indicate when a fact comes from a document in the search result, e.g <|start_of_cite|> {document_id: 1}my fact <|end_of_cite|> for a fact from document 1. Afterwards, list all the citations with their corresponding documents in an ordered list.' %}\n {%- endif %}\n {%- if 'hallucinations' in controls and documents %}\n {%- set system_message = system_message + '\nFinally, after the response is written, include a numbered list of sentences from the response with a corresponding risk value that are hallucinated and not based in the documents.' 
%}\n {%- endif %}\n {%- set loop_messages = messages %}\n {%- endif %}\n {{- '<|start_of_role|>system<|end_of_role|>' + system_message + '<|end_of_text|>\n' }}\n {%- if available_tools %}\n {{- '<|start_of_role|>available_tools<|end_of_role|>' }}\n {{- available_tools | tojson(indent=4) }}\n {{- '<|end_of_text|>\n' }}\n {%- endif %}\n {%- if documents %}\n {%- for document in documents %}\n {{- '<|start_of_role|>document {\"document_id\": \"' + document['doc_id'] | string + '\"}<|end_of_role|>\n' }}\n {{- document['text'] }}\n {{- '<|end_of_text|>\n' }}\n {%- endfor %}\n {%- endif %}\n {%- for message in loop_messages %}\n {{- '<|start_of_role|>' + message['role'] + '<|end_of_role|>' + message['content'] + '<|end_of_text|>\n' }}\n {%- if loop.last and add_generation_prompt %}\n {{- '<|start_of_role|>assistant' }}\n {%- if controls %}\n {{- ' ' + controls | tojson()}}\n {%- endif %}\n {{- '<|end_of_role|>' }}\n {%- endif %}\n {%- endfor %}",
225
+ "clean_up_tokenization_spaces": true,
226
+ "eos_token": "<|end_of_text|>",
227
+ "errors": "replace",
228
+ "extra_special_tokens": {},
229
+ "model_max_length": 9223372036854775807,
230
+ "pad_token": "<|end_of_text|>",
231
+ "padding_side": "left",
232
+ "tokenizer_class": "GPT2Tokenizer",
233
+ "unk_token": "<|end_of_text|>",
234
+ "vocab_size": 49152
235
+ }
answer_relevance_rewriter_lora/vocab.json ADDED
The diff for this file is too large to render. See raw diff