
I have the following compass images:

Image 1: [compass image with "N" at the top]
Image 2: [the same compass, rotated]

For a human, it is straightforward to infer the directions from these images. For example:

In Image 1, the directions are:

- Top → North
- Bottom → South
- Left → West
- Right → East

In Image 2, the compass is rotated, so the directions change accordingly:

- Top → South
- (the rest of the directions follow from the rotation)

## The Problem

I am using Google's Gemini multimodal model, but it struggles to correctly interpret the compass direction from the images. The model's responses are inconsistent and often incorrect.
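For reference, here is a minimal sketch of the kind of call I mean, using the `google-generativeai` Python SDK (the model name and file name are placeholders):

```python
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")  # placeholder model name

# Ask the model directly, passing the raw compass image.
compass = Image.open("compass_image_2.png")  # placeholder file name
response = model.generate_content(
    ["Which cardinal direction does the top of this compass image face?", compass]
)
print(response.text)  # responses like this are inconsistent and often incorrect
```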

## My Goal

I want to achieve accurate direction recognition solely through prompt engineering (without additional training or external processing). I believe this should be possible with well-structured prompts.

## Questions

1. How can I craft an effective prompt to guide the model in correctly reading the compass orientation?
2. Are there prompt engineering techniques that could improve accuracy (e.g., step-by-step reasoning, few-shot examples, role-based prompting)? See the first sketch below for the kind of thing I mean.
3. If prompt engineering alone is insufficient, what alternative strategies can I try (e.g., pre-processing the image, extracting features programmatically before passing them to the model)? See the second sketch below.

I would appreciate any insights or suggestions!
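For questions 1 and 2, this is the kind of structured, step-by-step prompt I have in mind (the wording is illustrative only, not something I have verified to work; it reuses `model` and `compass` from the sketch above):

```python
# Illustrative chain-of-thought prompt: make the model locate the "N"
# label first, then derive the remaining directions from its position.
PROMPT = """You are reading an analog compass image.
Work step by step:
1. Find the letter "N" on the dial and state its position as a clock
   hour (e.g. "N is at 6 o'clock").
2. From that position, determine which cardinal direction the top of
   the image corresponds to.
3. Derive Bottom, Left, and Right from the top direction.
Answer on one line: Top=<dir>, Bottom=<dir>, Left=<dir>, Right=<dir>."""

response = model.generate_content([PROMPT, compass])
print(response.text)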
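And for question 3, a pre-processing fallback could look something like the rough OpenCV sketch below. It only estimates the needle's axis with a Hough transform; locating the "N" label would still need OCR or template matching, so this is a starting point, not a full solution:

```python
import cv2
import numpy as np

# Rough sketch: estimate the needle angle classically, then pass the
# result to the model as text instead of (or alongside) the raw image.
img = cv2.imread("compass_image_2.png", cv2.IMREAD_GRAYSCALE)  # placeholder file name
edges = cv2.Canny(img, 50, 150)
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=80,
                        minLineLength=img.shape[0] // 4, maxLineGap=10)
if lines is not None:
    # Take the longest detected segment as the needle candidate.
    x1, y1, x2, y2 = max(lines[:, 0],
                         key=lambda l: np.hypot(l[2] - l[0], l[3] - l[1]))
    # Image y grows downward, so flip it for a counterclockwise angle
    # measured from the image's rightward direction.
    angle = np.degrees(np.arctan2(y1 - y2, x2 - x1)) % 360
    print(f"Needle candidate at roughly {angle:.0f} degrees")
```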
