import reflex as rx
from datasets import load_dataset

# Load the tokenized train split and convert it to a pandas DataFrame so we
# can inspect individual prompt/response pairs below.
dataset = load_dataset("derek-thomas/labeled-multiple-choice-explained-falcon-tokenized", split="train")
df = dataset.to_pandas()
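# Each conversation_* column holds a chat-format message list; based on the
# indexing used below (assumed shape, not verified against the dataset card):
#   df['conversation_RFA_gpt3_5'].iloc[0][0] -> user message
#   df['conversation_RFA_gpt3_5'].iloc[0][1] -> assistant message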
p1 = '''
# Prompt Order Experiment
This experiment explores several scenarios for **prompt fine-tuning** using structured generation. We test how the order of elements in a prompt affects model performance. The elements we consider are listed below, with a small assembly sketch after the list:
- **(Q)**: Question
- **(AC)**: Answer Choices
- **(R)**: Reasoning
- **(FA)**: Final Answer
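For intuition, here is a hypothetical helper (not the dataset's actual construction code) that renders these elements in a chosen order; the templates and field values are illustrative only:
```python
# Hypothetical sketch: render prompt elements in a chosen order.
# Templates and field names are illustrative, not taken from the dataset.
ELEMENTS = {
    "Q": "Question: {question}",
    "AC": "Answer Choices: {choices}",
    "R": "Reasoning: {reasoning}",
    "FA": "Final Answer: {answer}",
}

fields = dict(question="...", choices="...", reasoning="...", answer="...")
for key in ["Q", "AC", "R", "FA"]:  # Scenario 1; reorder for Scenario 2, etc.
    print(ELEMENTS[key].format(**fields))
```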
## Scenarios
We will evaluate the following prompt orders:
### **Scenario 1: Q - AC - R - FA** (Falcon and GPT3.5)
This is the most natural order. The model generates reasoning before the final answer, providing the most information prior to making a selection. This order leverages decoding mechanics effectively: because decoding is autoregressive, the final-answer tokens can condition on all of the reasoning tokens generated before them.
This is our user message; we can see the question and answer choices.
<details>
<summary>Click to show prompt!</summary>
'''
p2 = f'''
```json
{df['conversation_RFA_gpt3_5'].iloc[0][0]}
```
This is our assistant message; you can see that we are forcing JSON (note that I added spacing for readability) and putting the reasoning first. Using JSON during fine-tuning improves structured-generation results, since the model gets used to responding in that "space".
```json
{df['conversation_RFA_gpt3_5'].iloc[0][1]}
```
</details>
'''
p3 = f'''
### **Scenario 2: Q - AC - FA - R** (Falcon and GPT3.5)
An awkward order that places the reasoning after the final answer. It is faster and saves tokens, but it assumes the model already "knows" the reasoning internally before generating it. We are skeptical of this case, but it is worth testing.
<details>
<summary>Click to show prompt!</summary>
```json
{df['conversation_FAR_gpt3_5'].iloc[0][0]}
```
```json
{df['conversation_FAR_gpt3_5'].iloc[0][1]}
```
</details>
'''
p4 = '''
### **Scenario 3: Q - AC - FA**
This serves as a fine-tuning control: no reasoning is provided in the output.
### **Scenario 4: Base**
An un-fine-tuned control for comparison purposes.
### Structured Generation
Structured generation ensures consistent response formats, which is crucial for reliable fine-tuning. Initial experiments faced difficulties with response consistency; structured generation solves this.
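As an illustrative sketch (not this experiment's actual pipeline), one way to enforce a consistent response format is to validate model output against a schema. Here we assume a hypothetical two-field shape matching the reasoning-first ordering, using Pydantic:
```python
from pydantic import BaseModel, ValidationError

# Hypothetical response schema mirroring the R -> FA ordering of Scenario 1.
class ReasonedAnswer(BaseModel):
    reasoning: str
    final_answer: str

raw = '{"reasoning": "B is the only prime listed.", "final_answer": "B"}'
try:
    parsed = ReasonedAnswer.model_validate_json(raw)
    print(parsed.final_answer)
except ValidationError as err:
    # A malformed response is caught instead of silently skewing results.
    print(err)
```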
'''
def page():
    return rx.vstack(
        rx.markdown(p1 + p2 + p3 + p4),
    )
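
# Register the page so the module runs as a Reflex app; serving it at the
# root route is an assumption, adjust as needed.
app = rx.App()
app.add_page(page, route="/")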