import reflex as rx
from datasets import load_dataset

# Load the tokenized train split and convert it to a pandas DataFrame so we
# can inspect individual prompt/response pairs below.
dataset = load_dataset("derek-thomas/labeled-multiple-choice-explained-falcon-tokenized", split="train")
df = dataset.to_pandas()
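# Each conversation_* column holds a chat-format message list; based on the
# indexing used below (assumed shape, not verified against the dataset card):
#   df['conversation_RFA_gpt3_5'].iloc[0][0] -> user message
#   df['conversation_RFA_gpt3_5'].iloc[0][1] -> assistant message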
p1 = '''
# Prompt Order Experiment
This experiment explores several scenarios for **prompt fine-tuning** using structured generation. We test how the order of elements in a prompt affects model performance. The elements we consider are listed below, with a small assembly sketch after the list:
- **(Q)**: Question
- **(AC)**: Answer Choices
- **(R)**: Reasoning
- **(FA)**: Final Answer
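For intuition, here is a hypothetical helper (not the dataset's actual construction code) that renders these elements in a chosen order; the templates and field values are illustrative only:
```python
# Hypothetical sketch: render prompt elements in a chosen order.
# Templates and field names are illustrative, not taken from the dataset.
ELEMENTS = {
    "Q": "Question: {question}",
    "AC": "Answer Choices: {choices}",
    "R": "Reasoning: {reasoning}",
    "FA": "Final Answer: {answer}",
}

fields = dict(question="...", choices="...", reasoning="...", answer="...")
for key in ["Q", "AC", "R", "FA"]:  # Scenario 1; reorder for Scenario 2, etc.
    print(ELEMENTS[key].format(**fields))
```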
## Scenarios
We will evaluate the following prompt orders:
### **Scenario 1: Q - AC - R - FA** (Falcon and GPT3.5)
This is the most natural order. The model generates reasoning before the final answer, providing the most information prior to making a selection. This order leverages decoding mechanics effectively: because decoding is autoregressive, the final-answer tokens can condition on all of the reasoning tokens generated before them.
This is our user message; we can see the question and answer choices.
<details>
<summary>Click to show prompt!</summary>
'''
p2 = f'''
```json
{df['conversation_RFA_gpt3_5'].iloc[0][0]}
```
This is our assistant message; you can see that we are forcing JSON (note that I added spacing for readability) and putting the reasoning first. Using JSON during fine-tuning improves structured-generation results, since the model gets used to responding in that "space".
```json
{df['conversation_RFA_gpt3_5'].iloc[0][1]}
```
</details>
'''
p3 = f'''
### **Scenario 2: Q - AC - FA - R** (Falcon and GPT3.5)
An awkward order that places the reasoning after the final answer. It is faster and saves tokens, but it assumes the model already "knows" the reasoning internally before generating it. We are skeptical of this case, but it is worth testing.
<details>
<summary>Click to show prompt!</summary>
```json
{df['conversation_FAR_gpt3_5'].iloc[0][0]}
```
```json
{df['conversation_FAR_gpt3_5'].iloc[0][1]}
```
</details>
'''
p4 = '''
### **Scenario 3: Q - AC - FA**
This serves as a fine-tuning control: no reasoning is provided in the output.
### **Scenario 4: Base**
An un-fine-tuned control for comparison purposes.
### Structured Generation
Structured generation ensures consistent response formats, which is crucial for reliable fine-tuning. Initial experiments faced difficulties with response consistency; structured generation solves this.
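As an illustrative sketch (not this experiment's actual pipeline), one way to enforce a consistent response format is to validate model output against a schema. Here we assume a hypothetical two-field shape matching the reasoning-first ordering, using Pydantic:
```python
from pydantic import BaseModel, ValidationError

# Hypothetical response schema mirroring the R -> FA ordering of Scenario 1.
class ReasonedAnswer(BaseModel):
    reasoning: str
    final_answer: str

raw = '{"reasoning": "B is the only prime listed.", "final_answer": "B"}'
try:
    parsed = ReasonedAnswer.model_validate_json(raw)
    print(parsed.final_answer)
except ValidationError as err:
    # A malformed response is caught instead of silently skewing results.
    print(err)
```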
'''
def page():
    return rx.vstack(
        rx.markdown(p1 + p2 + p3 + p4),
    )
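
# Register the page so the module runs as a Reflex app; serving it at the
# root route is an assumption, adjust as needed.
app = rx.App()
app.add_page(page, route="/")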