I’m new to AI models and exploring how to use LLaMA for a specific task. I have a dataset with two columns, A and B, each containing 8-bit binary numbers (around 5,000 rows). My goal is to train a model that can:
Predict B when given A as input.
Perform bulk predictions by processing a file with multiple A values and outputting the corresponding B values.
(Bonus) Generate synthetic A-B pairs to expand my dataset.
I understand that LLaMA (or LLMs in general) may not be the best fit for this structured binary task, but I want to approach this as an LLM learning exercise. Given that, how should I fine-tune LLaMA for it? Are there specific preprocessing steps or training strategies that would make it work better for this use case? Any advice or resources would be greatly appreciated! I have been stuck on this for quite some time.
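
To make the preprocessing part of the question concrete, here is a minimal sketch of one way the A/B pairs could be turned into prompt/completion records for fine-tuning. Everything in it is an assumption for illustration, not a validated recipe: the `prompt`/`completion` field names, the "A: ... B:" template, the space-separated bits (intended to make it more likely that each bit becomes its own token), and the helper names `format_bits`, `to_record`, `split_pairs`, and `write_jsonl` are all hypothetical.

```python
import json
import random


def format_bits(bits: str) -> str:
    """Validate an 8-bit binary string and separate its bits with spaces."""
    if len(bits) != 8 or set(bits) - {"0", "1"}:
        raise ValueError(f"expected an 8-bit binary string, got {bits!r}")
    return " ".join(bits)


def to_record(a: str, b: str) -> dict:
    """Build one training example. The template is an assumption, not a standard."""
    return {
        "prompt": f"A: {format_bits(a)}\nB:",
        "completion": " " + format_bits(b),
    }


def split_pairs(pairs, eval_fraction=0.1, seed=0):
    """Shuffle and hold out a fraction of pairs to check accuracy on unseen A values."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    n_eval = max(1, int(len(pairs) * eval_fraction))
    return pairs[n_eval:], pairs[:n_eval]


def write_jsonl(pairs, path):
    """Write one JSON record per line, a layout many fine-tuning scripts can read."""
    with open(path, "w", encoding="utf-8") as f:
        for a, b in pairs:
            f.write(json.dumps(to_record(a, b)) + "\n")
```

With the pairs loaded as a list of `(A, B)` string tuples, `train, held_out = split_pairs(pairs)` followed by `write_jsonl(train, "train.jsonl")` would produce the training file; the held-out pairs could then be used to measure exact-match accuracy on the bulk-prediction task. Whether a given fine-tuning tool accepts these exact field names would need to be checked against its documentation.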