Leader's Opinion: The Problems with LLM Benchmarks
By 재은

In the ever-evolving world of large language models (LLMs), questions are being raised about the reliability of their benchmarks. Critics argue that LLM benchmarks can be unreliable due to factors such as training-data contamination and models overperforming on carefully crafted inputs. Avijit Chatterjee, Head of AI/ML and NextGen Analytics at Memorial Sloan Kettering Cancer Center, offers an interesting perspective on this debate: he emphasizes that widespread technology adoption often speaks louder than benchmarks.
Chatterjee draws a parallel between the LLM debate and historical database benchmarks such as TPC-C for OLTP and TPC-DS for analytics. He notes that despite the fierce benchmark competition among database vendors in the past, today's leader in the cloud-native data warehouse market, Snowflake, no longer relies on such benchmarks to prove itself; its widespread adoption speaks for itself.
The issues with LLM benchmarks extend beyond reliability
