# Section under main title
LLMLAGBENCH_INTRO = """
Large Language Models (LLMs) are pretrained on textual data up to a specific temporal cutoff, creating a **strict knowledge boundary** beyond which models cannot provide accurate information without querying external sources. More subtly, when this limitation is unknown or ignored, LLMs may inadvertently blend outdated time-sensitive information with general knowledge during reasoning or classification tasks, **compromising response accuracy**.

LLMLagBench provides a systematic approach for **identifying the earliest probable temporal boundaries** of an LLM's training data by evaluating its knowledge of recent events. The benchmark comprises **1,700+ curated questions** about events sampled from news reports published between January 2021 and October 2025; we plan to update the question set regularly. Each question could not be accurately answered before the event was reported in news media.

We evaluate model responses with a **0-2 scale faithfulness metric** (the accuracy of a model's responses to queries about time-sensitive knowledge, judged against gold answers) and apply the **PELT (Pruned Exact Linear Time)** changepoint detection algorithm to identify where model performance exhibits statistically significant drops, revealing models' actual knowledge cutoffs.

Our analysis of major LLMs reveals that knowledge infusion **operates differently across training phases**, often resulting in **multiple partial cutoff points** rather than a single sharp boundary. **Provider-declared cutoffs** and model self-reports **frequently diverge** from empirically detected boundaries by months or even years, underscoring the necessity of independent empirical validation.

We describe our methodology in https://arxiv.org/pdf/2511.12116.
"""

# Section above the leaderboard table
LEADERBOARD_INTRO = """
The leaderboard below ranks models by their **Overall Average** faithfulness score (0-2 scale) across all 1,700+ questions spanning 2021-2025. The table also displays **Provider Cutoff** dates as declared by model developers, the **1st and 2nd Detected Cutoffs** identified by LLMLagBench's PELT algorithm, and additional metadata including release dates and model parameters.

**Notable discrepancies** between provider-declared cutoffs and empirically detected cutoffs reveal cases **where models' actual knowledge boundaries differ significantly from official declarations**, sometimes by months or even years.
"""
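
# A minimal, illustrative sketch of the changepoint step described above, assuming the
# `ruptures` package and hypothetical inputs (`scores`: 0-2 faithfulness values ordered
# by event date; `event_dates`: the matching dates). This is not the benchmark's actual
# evaluation code; the cost model and penalty below are placeholder choices.
def detect_cutoffs(scores, event_dates, penalty=5.0):
    """Return approximate cutoff dates and per-segment mean faithfulness for one model."""
    # Imported lazily so this strings-only module keeps no hard runtime dependencies.
    import numpy as np
    import ruptures as rpt

    signal = np.asarray(scores, dtype=float).reshape(-1, 1)
    # PELT finds an optimal segmentation in near-linear time; `pen` controls sensitivity.
    breakpoints = rpt.Pelt(model="rbf").fit(signal).predict(pen=penalty)

    segment_means, start = [], 0
    for end in breakpoints:  # breakpoints are exclusive segment end indices
        segment_means.append(float(signal[start:end].mean()))
        start = end

    # The final breakpoint is just the series length; each earlier one marks the first
    # event of a new performance segment, i.e., a candidate knowledge cutoff.
    cutoff_dates = [event_dates[b] for b in breakpoints[:-1]]
    return cutoff_dates, segment_means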
# Section for Model Comparison
MODEL_COMPARISON_INTRO = """
The visualizations below present detailed per-model analysis using the PELT (Pruned Exact Linear Time) changepoint detection algorithm to **identify significant shifts in faithfulness** scores over time.

- **Blue scatter points** represent individual faithfulness scores (0-2 scale, left y-axis) for questions ordered by event date.
- **Red horizontal lines** indicate mean faithfulness within segments identified by PELT, with red dashed vertical lines marking detected changepoints: possible training boundaries where performance characteristics shift.
- **The green curve** shows cumulative average faithfulness over time.
- **The orange line** (right y-axis) tracks cumulative refusals, revealing how often models decline to answer questions beyond their knowledge boundaries.
- **Yellow percentage labels** indicate refusal rates within each segment. Models instruction-tuned to acknowledge their limitations exhibit sharp increases in refusals after their cutoff dates, while others continue attempting answers despite lacking relevant training data, potentially leading to hallucination.

Select models to compare how different architectures and training approaches handle temporal knowledge boundaries. Some models exhibit a single sharp cutoff, while others show multiple partial boundaries, possibly corresponding to distinct pretraining, continued pretraining, and post-training phases.
"""

AUTHORS = """