Key takeaways:
- Less is more when it comes to prompts: GPT performs better when not over-specified.
- The OpenAI chat API is sufficient for most use cases, and premature abstraction should be avoided.
- Streaming API and variable-speed output enhance user experience.
- GPT struggles with producing the null hypothesis, causing hallucinations and lack of confidence.
- Context windows for input and output are different, and RAG/embeddings are mostly useless for B2B applications.
Summary:
In this blog post, Ken Kantzer shares lessons learned from using half a billion GPT tokens in a B2B context, focusing on summarize/analyze-extract features. The key insights include:
- Less is more with prompts: Over-specifying prompts can confuse GPT. Instead, it's better to rely on GPT's existing knowledge and use vaguer language. This leads to better generalization and higher-order delegation.
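To make the contrast concrete, here is a hedged illustration (neither prompt is from the original post; the category list and company placeholder are invented for the example) of an over-specified prompt versus a minimal one that leans on GPT's existing knowledge:

```python
# Hypothetical prompts illustrating "less is more"; not taken from the post.

# Over-specified: enumerates edge cases GPT already understands on its own.
OVER_SPECIFIED = (
    "Classify the industry of the company below. Use only these categories: "
    "SaaS, Fintech, Healthcare, Retail, Other. If the company sells software "
    "by subscription, choose SaaS. If it handles payments or lending, choose "
    "Fintech. If it is a hospital, clinic, or pharma firm, choose Healthcare. "
    "If it sells physical goods to consumers, choose Retail. Otherwise Other.\n\n"
    "Company: {company}"
)

# Minimal: vague on purpose, delegating the judgment to the model.
MINIMAL = "What industry is {company} in? Answer with a single word."

def build_prompt(template: str, company: str) -> str:
    """Fill the company name into a prompt template."""
    return template.format(company=company)
```

The minimal prompt generalizes better precisely because it does not try to anticipate every case the model already knows how to handle.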
- The OpenAI chat API is sufficient: Using the chat API, along with error handling and token-length estimation, has proven flexible and efficient for the author's B2B use case. Premature abstraction, like adopting LangChain, is not necessary.
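A minimal sketch of this "no abstraction" approach, using only the Python standard library (the endpoint URL and request shape follow the public OpenAI chat completions API; the retry policy and the 4-characters-per-token heuristic are illustrative assumptions, not the author's exact code):

```python
import json
import os
import time
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"

def estimate_tokens(text: str) -> int:
    """Rough token-length estimate: ~4 characters per token is a common
    heuristic for English text. Used to guard against oversized prompts."""
    return max(1, len(text) // 4)

def chat(prompt: str, model: str = "gpt-4", retries: int = 3) -> str:
    """Plain chat-completions call with simple retry-based error handling:
    a POST request and exponential backoff, no framework in between."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    req = urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', '')}",
        },
    )
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(req) as resp:
                data = json.loads(resp.read())
            return data["choices"][0]["message"]["content"]
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(2 ** attempt)  # back off before retrying
```

A couple of helper functions like these cover most of what heavier abstractions provide for a summarize/extract workload.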
- Streaming API and variable-speed output enhance user experience: Users react positively to variable-speed output, making it feel like a significant UX innovation.
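The streaming chat API delivers tokens as server-sent events, each carrying an incremental `delta`. A sketch of the client-side parsing (the SSE `data:` framing and `[DONE]` sentinel match the OpenAI streaming format; the function name is ours):

```python
import json

def parse_sse_line(line: str):
    """Parse one server-sent-events line from the streaming chat API.
    Returns the new text delta, or None for keep-alives / the [DONE] marker."""
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):]
    if payload.strip() == "[DONE]":
        return None
    chunk = json.loads(payload)
    # Each streamed chunk carries an incremental "delta" that may add content.
    return chunk["choices"][0]["delta"].get("content")
```

Printing each delta as it arrives (optionally with a small per-character delay) is what produces the variable-speed, typewriter-like output users respond to.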
- GPT struggles with producing the null hypothesis: Prompting GPT to return an empty output when it doesn't find anything often results in hallucinations and a lack of confidence. It's better to fix bugs and avoid sending prompts when there's no relevant text.
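In practice this means guarding the call site rather than the prompt. A minimal sketch (the address-extraction task and function names are hypothetical examples of the pattern):

```python
def extract_addresses(text: str, call_gpt) -> list[str]:
    """Avoid asking GPT for the null hypothesis: if there is no relevant
    text, return an empty result without calling GPT at all, since prompting
    it to 'return nothing if nothing is found' invites hallucination."""
    if not text or not text.strip():
        return []  # nothing to analyze: skip the API call entirely
    response = call_gpt(f"List the street addresses in this text:\n{text}")
    return [line.strip() for line in response.splitlines() if line.strip()]
```

The empty-input check is trivial, but it removes the one case where the model is most tempted to invent an answer.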
- Context windows for input and output are different: GPT-4 has a larger context window for input (128k tokens) than for output (4k tokens). This can be problematic when asking GPT to return a list of JSON objects, as it often can't provide more than 10 items.
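The practical workaround is to batch the work so each call's JSON output stays small, even though the input window could hold far more. A sketch (the batch size of 10 reflects the post's observation; the function name is ours):

```python
def batch_requests(item_ids: list, batch_size: int = 10) -> list[list]:
    """Split work into batches of at most `batch_size` items so each GPT
    call's JSON output fits comfortably inside the ~4k-token output window,
    even though the 128k input window could accept everything at once."""
    return [item_ids[i:i + batch_size] for i in range(0, len(item_ids), batch_size)]
```

Each batch becomes its own prompt, and the resulting JSON lists are concatenated client-side.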
- RAG/embeddings are mostly useless for B2B applications: Vector databases and RAG are primarily useful for search, not for B2B use cases like summarize/analyze-extract. The author suggests using a normal completion prompt to convert a user's search into a faceted-search or a more complex query instead.
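A sketch of that suggested alternative: a plain completion prompt that translates a free-text search into a faceted-search JSON query (the facet schema and function name are hypothetical; the post only names the general technique):

```python
import json

# Hypothetical facet schema for illustration; not from the original post.
FACETS = ["department", "price_max", "color"]

def search_to_facets_prompt(user_query: str) -> str:
    """Build a normal completion prompt asking GPT to translate a free-text
    search into a faceted-search JSON query, instead of using RAG/embeddings."""
    return (
        "Convert this product search into a JSON object using only the keys "
        f"{json.dumps(FACETS)}. Omit keys that don't apply.\n"
        f"Search: {user_query}\nJSON:"
    )
```

The structured query then runs against the existing search backend, which handles retrieval better than a vector database would for this kind of workload.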
source: Lessons after a half-billion GPT tokens