Resources for Greek Financial LLMs
Plutus
Plutus is a benchmark for evaluating LLMs in Greek financial applications, covering tasks like named entity recognition, question answering, and summarization. It addresses low-resource challenges with domain-specific datasets for systematic assessment.
Plutus is the first comprehensive benchmark designed to evaluate large language models (LLMs) in Greek financial applications. Developed by The Fin AI, it addresses the challenges of low-resource Greek financial NLP by defining five core tasks: numeric and textual named entity recognition, question answering, abstractive summarization, and topic classification.
To ensure fairness and prevent overfitting, Plutus incorporates diverse, high-quality Greek financial datasets, rigorously annotated by expert native speakers. It enables systematic and reproducible model evaluations, helping developers refine their models while providing financial professionals with clear insights into AI performance.
Plutus also introduces Plutus-8B, the first Greek financial LLM, fine-tuned on domain-specific Greek data. Evaluations of 22 LLMs highlight the limitations of cross-lingual transfer and the need for financial expertise in Greek-trained models. By assessing strengths and weaknesses, Plutus offers actionable insights to build more accurate and trustworthy Greek financial AI systems, fostering multilingual inclusivity in financial NLP.
Benchmark Scope
Plutus provides a comprehensive evaluation framework for Greek financial AI, addressing the challenges of low-resource financial NLP. It includes:
Five core financial NLP tasks, covering numeric and textual named entity recognition, question answering, abstractive summarization, and topic classification.
Three novel, high-quality Greek financial datasets, rigorously annotated by expert native Greek speakers, alongside two existing financial resources.
The first Greek financial LLM, Plutus-8B, fine-tuned on Greek domain-specific data to enhance financial reasoning and linguistic adaptation.
Comprehensive evaluation of 22 LLMs, revealing limitations in cross-lingual transfer and the necessity of Greek financial expertise in model training.
Public release of Plutus-ben and Plutus-8B, promoting reproducible research and fostering multilingual inclusivity in financial NLP.
Open Greek Financial LLM Leaderboard
The Plutus Living Leaderboard continuously tracks and updates performance results across key Greek financial NLP tasks, providing real-time, transparent evaluation of LLMs. It enables systematic benchmarking of models on five core tasks using rigorously curated Greek financial datasets, ensuring fair assessment and progress tracking for Greek financial AI development.
Supported Tasks
Plutus assesses large language models across key financial NLP tasks tailored to the Greek financial domain. Each task is designed to evaluate specific capabilities, ensuring a comprehensive assessment of model performance in real-world financial applications.
Numeric NER
Identify and classify numerical financial entities, such as monetary values, percentages, dates, and quantities.
Greek financial documents often contain complex numerical expressions, including currency values (€, $), ratios, financial percentages, and temporal markers.
Numeric entities can appear in different formats, requiring models to distinguish between monetary amounts, percentages, and other numerical references.
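To make the format problem concrete, here is a minimal rule-based sketch that separates percentages, monetary amounts, and plain numbers in Greek-formatted text. It is an illustration only, not the Plutus annotation scheme or a model used in the benchmark: the names NUM, PATTERN, and tag_numeric and the labels PERCENT, MONEY, and NUMBER are hypothetical, and the patterns assume the usual Greek convention of "." as the thousands separator and "," as the decimal mark.

```python
import re

# Assumption: Greek-style numbers such as 1.234,56 or 12,5
# ("." groups thousands, "," marks decimals).
NUM = r"\d+(?:\.\d{3})*(?:,\d+)?"

# Hypothetical label set for illustration; order matters, since the
# more specific patterns must be tried before the bare-number fallback.
PATTERN = re.compile(
    rf"(?P<PERCENT>{NUM}\s?%)"
    rf"|(?P<MONEY>[€$]\s?{NUM}|{NUM}\s?(?:€|\$|ευρώ))"
    rf"|(?P<NUMBER>{NUM})"
)


def tag_numeric(text):
    """Return (label, surface form) pairs for numeric expressions in text."""
    return [(m.lastgroup, m.group()) for m in PATTERN.finditer(text)]


if __name__ == "__main__":
    # "Revenues rose by 12,5% to 1.234,56 € in 2023."
    sample = "Τα έσοδα αυξήθηκαν κατά 12,5% σε 1.234,56 € το 2023."
    for label, surface in tag_numeric(sample):
        print(label, surface)
```

A rule set like this cannot tell a year from a quantity or handle spelled-out amounts, which is part of why the task is posed to language models rather than solved with patterns.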
Textual NER
Detect and categorize textual financial entities, including companies, organizations, people, and locations in Greek financial reports and regulatory filings.
Greek financial texts contain long-form company names, regulatory bodies, and financial institutions, requiring robust entity recognition.
Named entities often include abbreviations, acronyms, and regional variations, making accurate entity linking crucial.
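The text above does not say how NER predictions are scored. A common choice for this kind of task is exact-match, entity-level F1, sketched below under that assumption. The function name entity_f1 and the (start, end, label) span format are hypothetical and are not taken from the Plutus evaluation code.

```python
def entity_f1(gold, pred):
    """Exact-match entity-level precision, recall, and F1.

    Each entity is a (start, end, label) tuple; a prediction counts as
    correct only if both its boundaries and its label match a gold entity.
    """
    gold_set, pred_set = set(gold), set(pred)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if true_positives == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical character offsets for illustration.
    gold = [(2, 21, "ORG"), (35, 40, "LOC")]
    pred = [(2, 21, "ORG"), (35, 40, "PER")]  # right span, wrong label
    print(entity_f1(gold, pred))
```

Under exact matching, a prediction with the right boundaries but the wrong label earns no credit, so abbreviations and long-form names that a model only partially recognizes lower both precision and recall.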
Question Answering
Test a model’s comprehension and reasoning over Greek financial texts through multiple-choice QA tasks.
Requires deep financial knowledge, including regulatory compliance, investment principles, and risk assessment.
Greek financial texts use technical jargon and domain-specific terminology, demanding models with strong contextual understanding.
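Multiple-choice QA is usually scored by accuracy once an option letter has been pulled from the model's free-text reply. The sketch below shows one simple way to do that. It assumes Latin option letters A to D, and the helper names extract_choice and accuracy are hypothetical; the actual Plutus answer format and parsing rules may differ.

```python
import re


def extract_choice(output, letters="ABCD"):
    """Return the first standalone uppercase option letter in a reply, or None."""
    match = re.search(rf"\b([{letters}])\b", output)
    return match.group(1) if match else None


def accuracy(outputs, gold):
    """Fraction of replies whose extracted letter equals the gold letter."""
    correct = sum(extract_choice(o) == g for o, g in zip(outputs, gold))
    return correct / len(gold)


if __name__ == "__main__":
    outputs = [
        "Η σωστή απάντηση είναι η B.",  # "The correct answer is B."
        "C",
        "Δεν γνωρίζω.",                 # "I do not know." -> no letter found
    ]
    gold = ["B", "C", "A"]
    print(accuracy(outputs, gold))
```

Replies without a recognizable letter count as wrong here, which is a design choice: it penalizes models that answer in prose instead of selecting an option.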
Abstractive Summarization
Generate concise, coherent, and informative summaries of Greek financial documents, such as earnings reports, market analyses, and regulatory filings.
Greek financial texts contain long, complex sentences with numerical data and industry-specific details.
Requires models to maintain factual accuracy and coherence while condensing information.
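Summaries are often compared with a reference through n-gram overlap. The following is a simplified ROUGE-1 F1 sketch, assuming lowercased whitespace tokenization with no stemming. It is not the official ROUGE implementation and not necessarily the metric Plutus reports, and the function name rouge1_f1 is hypothetical.

```python
from collections import Counter


def rouge1_f1(reference, candidate):
    """Unigram-overlap F1 between a reference and a candidate summary."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "τα κέρδη αυξήθηκαν κατά 10%"  # "profits rose by 10%"
    candidate = "τα κέρδη αυξήθηκαν"           # "profits rose"
    print(rouge1_f1(reference, candidate))
```

In this example the candidate drops the figure "10%", and the overlap score falls only modestly. Lexical overlap alone does not guarantee that the numbers in a summary are right, so factual accuracy needs separate checks.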
Topic Classification
Categorize Greek financial news articles and reports into predefined topics, such as banking, investments, taxation, and regulatory affairs.
Greek financial news often blends multiple topics in a single article, requiring strong contextual disambiguation.
Models must differentiate between similar-sounding financial categories, such as "corporate finance" vs. "public finance".
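When topics are unevenly represented, macro-averaged F1 is one way to keep rare categories from being drowned out by frequent ones. The sketch below assumes single-label classification and uses a hypothetical function name, macro_f1; the source does not state which metric Plutus uses for this task.

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        denominator = 2 * tp + fp + fn
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical topic labels for illustration.
    gold = ["banking", "taxation", "banking", "investments"]
    pred = ["banking", "banking", "banking", "investments"]
    print(macro_f1(gold, pred))
```

Here three of the four articles are classified correctly, yet the score is pulled down because the only "taxation" article was missed, which is the kind of confusion between neighboring categories the task is meant to expose.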
Current Best Models and Surprising Results
Plutus-8B, the first Greek financial LLM, stands out as the best-performing model, surpassing both open-source and proprietary models in all five core financial NLP tasks. Its domain-specific fine-tuning enables it to excel in numeric and textual named entity recognition, question answering, summarization, and topic classification. Notably, Plutus-8B outperforms GPT-4 by 15.38% in numeric NER and demonstrates stronger financial reasoning than larger multilingual models. These results highlight the importance of domain adaptation in improving LLM performance for Greek financial applications.
Among general-purpose models, GPT-4 and Qwen2.5-32B deliver strong results in question answering and summarization, but struggle with Greek-specific financial terminology and numerical comprehension. Surprisingly, even Greek-trained models like Meltemi-7B and Llama-Krikri-8B fall short in financial tasks, showing that linguistic adaptation alone is insufficient without financial expertise. Additionally, larger models such as Qwen2.5-72B fail to consistently outperform smaller ones, indicating that scaling alone does not guarantee better performance in Greek financial NLP.
One of the most striking findings is the difficulty models face in numerical reasoning, a crucial aspect of financial AI. Even top-tier models like GPT-4o fail to accurately extract and interpret financial figures, reinforcing the need for better numeric representation in LLMs. Plutus-8B, in contrast, demonstrates superior performance in numerical entity recognition, indicating that domain-specific training is key to overcoming these challenges. As Plutus continues to benchmark and refine financial LLMs, it paves the way for more reliable and specialized AI models in Greek financial applications.
Partnerships
We would like to thank our partners, including NaCTeM and Archimedes RC, for their generous support in making Plutus possible.
Greek Working Group Contributors
The University of Manchester
Yale University
The National Centre for Text Mining, UK
Archimedes RC, GR