Evaluating Google Gemini models in lm-eval harness for Masterarbeit
Context: 240129-1833 Writing evaluation code for my Masterarbeit
Problem: Gemini models (240202-1911 Using Google Bard to generate CBT stories for Masterarbeit) are not directly supported.
Options:
- Implement it
- Write a local proxy thing for it
- Find an existing local proxy thing
LiteLLM
Basics
from litellm import completion
import os  # API keys are read from the environment; assumed already exported

b = breakpoint

messages = [{"content": "Hello, how are you?", "role": "user"}]

# openai call (needs OPENAI_API_KEY)
response = completion(model="gpt-3.5-turbo", messages=messages)
print(response)
b()

# gemini call
response = completion(model="gemini-pro", messages=messages)
print(response)
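litellm normalizes every provider to the OpenAI response schema, so (if I read the docs right) the same accessor works for both calls:
print(response.choices[0].message.content)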
As local proxy
litellm --model gpt-3.5-turbo
Runs on localhost:8000
As mentioned in the README, this works:
def run_proxy():
    import openai  # openai v1.0.0+
    client = openai.OpenAI(api_key="anything", base_url="http://0.0.0.0:8000")  # set proxy as base_url
    # request sent to model set on litellm proxy, `litellm --model`
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "user",
                "content": "this is a test request, write a short poem",
            }
        ],
    )
    print(response)
For gemini-pro, I get:
openai.RateLimitError: Error code: 429 - {
BUT I'm also generating stories in the background, so hitting a rate limit would be plausible.
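Since 429s are expected while the story generation runs in the background, a dumb retry with backoff should be enough. A minimal sketch (chat_with_retries and the backoff values are mine, not from any docs):

import time
import openai

def chat_with_retries(client, messages, model="gpt-3.5-turbo", max_tries=5):
    # retry on rate limits with exponential backoff: 1s, 2s, 4s, ...
    for attempt in range(max_tries):
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except openai.RateLimitError:
            if attempt == max_tries - 1:
                raise
            time.sleep(2 ** attempt)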
Benchmark LLMs - LM Harness, FastEval, Flask | liteLLM
export OPENAI_API_BASE=http://0.0.0.0:8000
python3 -m lm_eval \
--model openai-completions \
--model_args engine=davinci \
--task crows_pairs_english_age
I think it ignores the env variable:
openai.NotFoundError: Error code: 404 - {'error': {'message': 'This is a chat model and not supported in the v1/completions endpoint. Did you mean to use v1/chat/completions?', 'type': 'invalid_request_error', 'param': 'model', 'code': None}}
Feels relevant: Add Logits to OpenAI ChatCompletions model · Issue #1196 · EleutherAI/lm-evaluation-harness
This is the model implementation in lm-eval: lm-evaluation-harness/lm_eval/models/openai_completions.py at main · EleutherAI/lm-evaluation-harness
This runs but again ignores my proxy:
python3 -m lm_eval --tasks low_test --model openai-chat-completions --model_args base_url=http://0.0.0.0:8000 --include ./resources --model_args model=gpt-3.5-turbo
Another ignored proxy, but: oh damn, a nice value on the letters-in-words task from gpt-3.5-turbo!
| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
|--------|------:|------|-----:|-----------|-----:|---|-----:|
|low_test| 1|none | 3|exact_match|0.7222|± |0.1086|
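In hindsight the ignored proxy is probably the doubled --model_args: argparse keeps only the last occurrence, so base_url gets dropped. Merging everything into one argument should fix it (untested):
python3 -m lm_eval --tasks low_test --model openai-chat-completions --model_args model=gpt-3.5-turbo,base_url=http://0.0.0.0:8000 --include ./resources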
Anyway, generation is done, so a new Gemini attempt; still:
litellm.llms.vertex_ai.VertexAIError: Your default credentials were not found. To set up Application Default Credentials, see https://cloud.google.com/docs/authentication/external/set-up-adc for more information.
Gemini - Google AI Studio | liteLLM: My bad, I needed the gemini/ part. This works for basic proxying!
> litellm --model "gemini/gemini-pro"
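As far as I understand litellm's routing, a bare gemini-pro goes to Vertex AI (hence the ADC error above), while the gemini/ prefix goes through Google AI Studio and only needs an API key:

import os
from litellm import completion

os.environ["GEMINI_API_KEY"] = "..."  # AI Studio key, no Google Cloud ADC needed

# "gemini/gemini-pro" -> Google AI Studio; bare "gemini-pro" -> Vertex AI
response = completion(
    model="gemini/gemini-pro",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
)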
Now back to lm-eval.
THIS WORKED! Again some bits got skipped because of the safety filters, but still:
> python3 -m lm_eval --tasks low_test --model local-chat-completions --model_args base_url=http://0.0.0.0:8000 --include ./resources
OK! So next steps:
- find a way to configure it through a config file, including the safety bits (rough sketch below)
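For the safety bits: litellm's Gemini docs mention a safety_settings parameter, so something like this per-request version should be the starting point (category/threshold names are from Google's docs; BLOCK_NONE everywhere is just for illustration):

from litellm import completion

# sketch: relax Gemini's safety filters so fewer eval prompts get blocked
response = completion(
    model="gemini/gemini-pro",
    messages=[{"role": "user", "content": "test"}],
    safety_settings=[
        {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE"},
        {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"},
        {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_NONE"},
        {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE"},
    ],
)

For the proxy, the same keys presumably go under litellm_params in a config.yaml passed via litellm --config.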