
Ankit Mishra
5 minute read
Death to Prompting! Long Live Programming!
Working with Large Language Models (LLMs) usually starts with prompt engineering, which involves designing inputs that guide a model to produce meaningful results. This is typically the first and most crucial step in shaping the behavior of an LLM for a given task. However, it often turns into a frustrating trial-and-error process, where you keep tweaking and refining prompts to extract the desired output. But what if there were a better way? What if you could program LLMs rather than just prompting them?
DSPy introduces a cleaner, more scalable approach to building LLM-powered applications by turning prompt engineering into repeatable, programmable workflows. This is why programmers and AI engineers should pay attention: DSPy makes LLMs more predictable and easier to maintain.
Stanford NLP brings you DSPy, a framework that is transforming how we build AI applications. DSPy, short for Declarative Self-Improving Python, allows users to program with foundation models, shifting away from brittle, hand-crafted prompts toward a more systematic and robust approach.
The Limitations of Prompting
Prompt engineering is often inconsistent and time-consuming. Even when you manage to get the correct output, the prompts are usually fragile and can easily fail if the model or the data changes.
A major limitation lies in the lack of modularity and reusability. Prompts are difficult to maintain and even harder to debug. As a result, building complex AI applications purely through prompts becomes both challenging and error-prone.
A short example of the pain point: you may spend hours tuning a prompt for a specific model and dataset, only to discover that the same prompt breaks when you change the model or slightly alter the input format.
DSPy to the Rescue
This is where DSPy creates a major shift, moving from prompt tweaking toward structured, modular programming.
By shifting focus from prompting to programming, DSPy solves the problem of fragile, prompt-dependent applications. When a component in the pipeline changes, you can re-optimize the program for the task by recompiling the entire pipeline rather than re-engineering prompts manually.
DSPy allows you to focus on the “what” of your AI application while the framework determines the “how”. This is made possible by several core abstractions:
- Signatures: Define input-output types for modules, clearly specifying expected behavior.
- Modules: Reusable components that interact with LLMs, such as chain-of-thought or ReAct modules, used to build complex pipelines.
- Optimizers: Algorithms that automatically adjust prompts and module weights to improve accuracy based on defined metrics.
- Evaluator: Automatically assesses program performance using your selected evaluation metrics.
Each of these pieces makes the overall program easier to reason about, test, and maintain. The short sketch below shows how signatures and modules fit together in code.
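As an illustration (the SummarizeTicket signature below is a hypothetical example of ours, not from the article), a minimal class-based signature and the module built from it might look like this:

```python
import dspy

# A hypothetical class-based signature: it declares *what* the module
# should do (its input and output fields), not how to prompt for it.
class SummarizeTicket(dspy.Signature):
    """Summarize a support ticket in one sentence."""

    ticket_text: str = dspy.InputField()
    summary: str = dspy.OutputField()

# A module built from the signature; DSPy generates the prompt itself.
summarize = dspy.Predict(SummarizeTicket)
```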
How DSPy Functions
DSPy works by taking Python code from the user and breaking it down into smaller tasks that can be easily understood by a language model. This process is similar to what a traditional compiler does when translating high-level code into machine instructions. DSPy uses the program, the data, and the validation metric to refine the program for the assigned task.
One of its most notable capabilities is its ability to generate and refine prompts automatically. It uses a method called bootstrapping, where the framework relies on a small number of labeled examples to create a much larger set of synthetic examples.
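As a rough sketch of what this compilation step can look like (the metric and the tiny training set below are illustrative assumptions, and an LM is assumed to be configured already):

```python
import dspy
from dspy.teleprompt import BootstrapFewShot

# Illustrative metric: exact string match between gold and predicted answers.
def exact_match(example, prediction, trace=None):
    return example.answer == prediction.answer

# A tiny, hypothetical labeled set; bootstrapping uses these seeds to
# generate a larger pool of demonstrations automatically.
trainset = [
    dspy.Example(question="What is 2 + 2?", answer="4").with_inputs("question"),
    dspy.Example(question="What is the capital of France?", answer="Paris").with_inputs("question"),
]

program = dspy.Predict("question -> answer")
optimizer = BootstrapFewShot(metric=exact_match)

# "Recompiling" the pipeline: the optimizer refines the program for the task.
compiled_program = optimizer.compile(program, trainset=trainset)
```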
But before going too abstract, let’s look at a simple, practical example of how DSPy actually works in action.
First, we import the DSPy library and connect to an LLM (in this case, Groq’s Llama 3.1) using dspy.LM with a model ID and API key. The temperature is set to 0 to ensure predictable, consistent responses, which is ideal for factual accuracy. The main task is defined via dspy.Predict using a human-readable signature, “question -> answer: float”, specifying that the model takes a question and returns a floating-point answer.
This creates a simple module, simple_math, where any input question reliably produces a structured numeric answer, demonstrating how DSPy simplifies prompting with clear input-output definitions.
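Reconstructed from the description above (the original article shows this as a screenshot; the API key placeholder is ours), the setup looks roughly like this:

```python
import dspy

# Connect to Groq's Llama 3.1; temperature=0 for consistent, repeatable output.
# Replace the placeholder with your own Groq API key.
lm = dspy.LM("groq/llama-3.1-8b-instant", api_key="YOUR_GROQ_API_KEY", temperature=0)
dspy.configure(lm=lm)

# A string signature: the module takes a question and returns a float answer.
simple_math = dspy.Predict("question -> answer: float")
```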
Now that the simple_math module is defined, we call it like a function. We pass our probability problem as the question argument, which matches the input field in our signature. DSPy handles all the backend work required to format this into a proper prompt for the llama-3.1-8b-instant model. It sends the query to the model and, since our signature specified “answer: float”, returns the result in a structured numeric format, which we print with print().
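The call itself, with the exact question wording being our reconstruction of the article’s probability problem:

```python
# Call the module like a function; the keyword matches the signature's input field.
result = simple_math(question="What is the probability of rolling two sixes with two dice?")
print(result)
```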
When we execute the code, the LLM returns a response that DSPy wraps in a class called Prediction. Look closely at the output, and you can see it isn’t just a raw string but a structured key-value pair: answer=0.0. That result proves the language model obeyed the signature we provided, which stated that the output field was called answer and had a data type of float.
Of course, the format is correct, but the answer itself is wrong for this probability question; it should be 1/36, or roughly 0.0278. This is a very common outcome for smaller models given simple, zero-shot prompts: it effectively provides a baseline where the basic instruction was followed but the reasoning was flawed. DSPy’s advanced features, including optimization and multi-stage prompting, are designed to solve exactly this type of issue.
The model’s wrong answer can be corrected by replacing dspy.Predict with dspy.ChainOfThought. This simple change automatically prompts the language model to explain its step-by-step reasoning before producing the final answer. As the output shows, this forces a more logical process that arrives at the correct calculation (1/36). This demonstrates how easily DSPy can improve a model’s performance on such tasks.
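In code, the swap is a one-line change (the question string is again our reconstruction):

```python
# ChainOfThought adds a 'reasoning' output field before the final answer.
math_with_reasoning = dspy.ChainOfThought("question -> answer: float")
result = math_with_reasoning(question="What is the probability of rolling two sixes with two dice?")

print(result.reasoning)  # the model's step-by-step explanation
print(result.answer)     # expected: 1/36, roughly 0.0278
```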
We can even inspect the history logs. The history log shows why DSPy is both powerful and model-agnostic: it reveals the detailed, structured prompt that DSPy constructs automatically from your simple signature. Rather than crafting this elaborate prompt yourself, DSPy builds it for you, telling the LLM explicitly what the input is (the question), what the output fields are (reasoning, answer), and exactly how the response should be formatted. This automation is key; it abstracts away low-level prompting so that you can get reliable, structured data out of any supported LLM without changing your code.
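DSPy exposes these logs through dspy.inspect_history, which prints the prompts and completions from recent calls:

```python
# Print the full prompt and completion from the most recent LLM call.
dspy.inspect_history(n=1)
```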
This walkthrough illustrates DSPy’s philosophy: focus on program structure, not manual prompts, to get better LLM results. A simple dspy.Predict gave a wrong answer, but switching to dspy.ChainOfThought enabled proper reasoning. DSPy automatically handles complex prompt generation, and its modular, declarative approach allows building reliable, sophisticated applications across different LLMs with minimal effort.
The Benefits of DSPy
The advantages that DSPy offers over traditional prompt engineering techniques are numerous, including:
- Reliability: DSPy’s declarative style leads to more dependable and predictable LLM behavior.
- Simplified development: DSPy removes the difficulties of prompt engineering, letting you concentrate on the higher-level logic of your application.
- Flexibility: DSPy code works with any supported LLM, so you won’t have to rewrite your code every time you switch models.
- Scalability: DSPy’s modular architecture makes large-scale AI applications easier to build and maintain.
Use Cases of DSPy
DSPy is a versatile framework that can be applied across a wide range of AI applications, such as:
- Question answering: With DSPy, you can build sophisticated question-answering systems that retrieve information from multiple sources.
- Text summarization: DSPy can extract brief yet comprehensive summaries from long documents.
- Code generation: DSPy makes it possible to generate code in a variety of programming languages.
- Language translation: DSPy can be used to build reliable and efficient machine translation systems.
- Chatbots and conversational AI: DSPy can power chatbots and conversational agents that are more capable and consistent.
Conclusion
DSPy represents a new way of building with LLMs. It gives developers the structure and reliability of programming while preserving the creative power of large models. For anyone frustrated with fragile prompts or with scaling AI applications, DSPy offers a cleaner and more powerful alternative.
Explore the DSPy framework and experiment with building your first optimized module.