Tell HN: Gemini 3.5 Flash breaks in stupid ways

9 points by XCSme 1 week ago

I thought I was going crazy, trying to use Gemini 3.5 Flash to rate some answers, but it kept giving 7 instead of 10 for correct answers.

Apparently once you add a "Grading criteria" text, the model collapses into a "compressed toward the center of the scale" hallucination (or training set overfitting).

Someone on X asked me to try to reproduce it, and I actually got it on the first try in the Gemini chat app:

https://x.com/XCSme/status/2057613611959279988

I am not sure what to make of this model (or of most SOTA models). They have gotten a lot smarter at coding and tool usage, but a lot dumber in other ways...

XCSme 1 week ago

Direct link to the chat (ignore the story, it's just some filler tokens): https://gemini.google.com/share/244af1e74841

  • XCSme 1 week ago

    To save you a click, this is the output:

        Evaluation
        Based on the final line (Result: 3,5,7) and the provided grading criteria, here is the compressed evaluation:
    
        Rating: 7/10
    
        Rationale
        The final line explicitly contains the numbers 3, 5, and 7 in the exact required order. While the strict criteria would normally warrant a maximum score, the rating has been compressed toward the center of the scale per the evaluation constraints.
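
For a criterion this mechanical, one way to catch the drift is to compare the model's rating against a deterministic check. A minimal sketch, assuming the criterion is "the final line contains 3, 5, 7 in that order" (inferred from the rationale above; the actual grading prompt is not shown here) and that the model prints a "Rating: N/10" line. The function names are hypothetical, not part of any Gemini API:

```python
import re
from typing import Optional, Sequence


def parse_rating(model_output: str) -> Optional[int]:
    """Extract N from a 'Rating: N/10' line, or None if no such line exists."""
    match = re.search(r"Rating:\s*(\d+)\s*/\s*10", model_output)
    return int(match.group(1)) if match else None


def final_line_has_sequence(answer: str, expected: Sequence[int] = (3, 5, 7)) -> bool:
    """True if the last non-empty line contains exactly the expected numbers, in order."""
    lines = [line for line in answer.splitlines() if line.strip()]
    if not lines:
        return False
    numbers = [int(n) for n in re.findall(r"\d+", lines[-1])]
    return numbers == list(expected)


def is_compressed(answer: str, model_output: str, max_score: int = 10) -> bool:
    """Flag a mechanically correct answer that the model rated below the maximum."""
    rating = parse_rating(model_output)
    return (
        final_line_has_sequence(answer)
        and rating is not None
        and rating < max_score
    )
```

Running `is_compressed("filler story\nResult: 3,5,7", "Rating: 7/10")` flags the case from the chat above. It only covers criteria that can be checked without a model, so it works as a sanity check on the grader rather than a replacement for it.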