In the middle of the desert you can say anything you want

06 Feb 2024

Evaluating Google Gemini models in lm-eval harness for Masterarbeit

Context: 240129-1833 Writing evaluation code for my Masterarbeit

Problem: Gemini models (240202-1911 Using Google Bard to generate CBT stories for Masterarbeit) are not directly supported.


Options:

  • Implement it
  • Write a local proxy for it
  • Find an existing local proxy

Oh nice: BerriAI/litellm: Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate (100+ LLMs)



from litellm import completion
import os

# API keys are picked up from the environment,
# e.g. os.environ["OPENAI_API_KEY"] / os.environ["GEMINI_API_KEY"]

messages = [{"content": "Hello, how are you?", "role": "user"}]

# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)

# gemini call
response = completion(model="gemini-pro", messages=messages)

As local proxy

litellm --model gpt-3.5-turbo

Runs on localhost:8000

As mentioned in the README, this works:

def run_proxy():
    import openai  # openai v1.0.0+
    # point the client at the litellm proxy via base_url
    client = openai.OpenAI(api_key="anything", base_url="http://localhost:8000")
    # request gets sent to the model set on the litellm proxy, `litellm --model`
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "user",
                "content": "this is a test request, write a short poem",
            }
        ],
    )
    print(response)

For gemini-pro, I get

openai.RateLimitError: Error code: 429 - {

BUT I’m generating stories in the background as well, so hitting a rate limit would be reasonable.
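While the other job is eating the quota, a generic retry-with-exponential-backoff wrapper is one way to ride out the 429s. This is a hedged, library-agnostic sketch (the `with_backoff` helper is mine, not part of litellm or openai), demoed with a flaky stand-in function instead of a real API call:

```python
import time
import random


def with_backoff(fn, retries=5, base_delay=1.0, retriable=(Exception,)):
    """Call fn(), retrying with exponential backoff + jitter on retriable errors."""
    for attempt in range(retries):
        try:
            return fn()
        except retriable:
            if attempt == retries - 1:
                raise
            # sleep base_delay, 2*base_delay, 4*base_delay, ... plus a little jitter
            time.sleep(base_delay * (2 ** attempt) + random.random() * 0.1)


# demo: a function that fails twice (like a 429), then succeeds
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")
    return "ok"

print(with_backoff(flaky, base_delay=0.01))  # → ok
```

In real use, `retriable` would be narrowed to `(openai.RateLimitError,)` so genuine errors still surface immediately.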

Benchmark LLMs - LM Harness, FastEval, Flask | liteLLM


python3 -m lm_eval \
  --model openai-completions \
  --model_args engine=davinci \
  --task crows_pairs_english_age

I think it ignores the env variable pointing it at the proxy

openai.NotFoundError: Error code: 404 - {'error': {'message': 'This is a chat model and not supported in the v1/completions endpoint. Did you mean to use v1/chat/completions?', 'type': 'invalid_request_error', 'param': 'model', 'code': None}}
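The 404 makes sense: gpt-3.5-turbo only lives behind v1/chat/completions, and the two endpoints take differently shaped payloads. A minimal illustration of the difference (plain dicts, no API call):

```python
# v1/completions: a single prompt string (legacy models like davinci)
completions_payload = {
    "model": "davinci",
    "prompt": "this is a test request, write a short poem",
}

# v1/chat/completions: a list of role-tagged messages (chat models like gpt-3.5-turbo)
chat_payload = {
    "model": "gpt-3.5-turbo",
    "messages": [
        {"role": "user", "content": "this is a test request, write a short poem"}
    ],
}
```

So lm-eval's `openai-completions` model type sends the first shape, which the chat-only model rejects; `openai-chat-completions` sends the second.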

Feels relevant: Add Logits to OpenAI ChatCompletions model · Issue #1196 · EleutherAI/lm-evaluation-harness

This is the model implementation in lm-eval: lm-evaluation-harness/lm_eval/models/ at main · EleutherAI/lm-evaluation-harness

This runs but again ignores my proxy

python3 -m lm_eval --tasks low_test --model openai-chat-completions --model_args base_url= --include ./resources --model_args model=gpt-3.5-turbo

Another ignored proxy, but — oh damn! a nice value for letters in words by gpt3!

| Tasks  |Version|Filter|n-shot|  Metric   |Value |   |Stderr|
|--------|-------|------|------|-----------|------|---|------|
|low_test|      1|none  |     3|exact_match|0.7222|±  |0.1086|
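Sanity-checking that Value/Stderr pair: 0.7222 looks like 13/18, and the reported stderr matches the standard error of the sample mean with Bessel-corrected variance (n=18 is my assumption here, not something lm-eval printed):

```python
import math

n, correct = 18, 13  # assumed sample size; 13/18 ≈ 0.7222
p = correct / n

# standard error of the mean, using the Bessel-corrected sample variance
stderr = math.sqrt(p * (1 - p) / (n - 1))

print(round(p, 4), round(stderr, 4))  # → 0.7222 0.1086
```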

Anyway, generation done; new gemini attempt, still:

litellm.llms.vertex_ai.VertexAIError: Your default credentials were not found. To set up Application Default Credentials, see for more information.

Gemini - Google AI Studio | liteLLM: My bad, I needed the gemini/ prefix. This works for basic proxying!

> litellm --model "gemini/gemini-pro"

Now back to lm-eval again.

THIS WORKED! Again, some bits were skipped because of safety filters, but still.

> python3 -m lm_eval --tasks low_test --model local-chat-completions --model_args base_url= --include ./resources

OK! So next steps:

  • find a way to configure it through config, include safety bits
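For the config bit: litellm's proxy can also start from a config file (`litellm --config config.yaml`) instead of `--model`. A sketch of what that might look like, hedged — the `safety_settings` parameter names here are from memory of litellm's Gemini docs and need double-checking:

```yaml
model_list:
  - model_name: gemini-pro
    litellm_params:
      model: gemini/gemini-pro
      # loosen Gemini's safety filters so fewer generations get blocked
      safety_settings:
        - category: HARM_CATEGORY_DANGEROUS_CONTENT
          threshold: BLOCK_NONE
```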
Nel mezzo del deserto posso dire tutto quello che voglio.