Key takeaways:
- Less is more when it comes to prompts: GPT performs better when not over-specified.
- The OpenAI chat API is sufficient for most use cases, and premature abstraction should be avoided.
- Streaming API and variable-speed output enhance user experience.
- GPT struggles with producing the null hypothesis, causing hallucinations and lack of confidence.
- Context windows for input and output are different, and RAG/embeddings are mostly useless for B2B applications.
Summary:
In this blog post, Ken Kantzer shares lessons learned from using half a billion GPT tokens in a B2B context, focusing on summarize/analyze-extract features. The key insights include:
- Less is more with prompts: Over-specifying prompts can confuse GPT. Instead, it's better to rely on GPT's existing knowledge and use vaguer language. This leads to better generalization and higher-order delegation.
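To make the contrast concrete, here is a hedged illustration (neither prompt is from the original post; the category list and company placeholder are invented for the example) of an over-specified prompt versus a minimal one that leans on GPT's existing knowledge:

```python
# Hypothetical prompts illustrating "less is more"; not taken from the post.

# Over-specified: enumerates edge cases GPT already understands on its own.
OVER_SPECIFIED = (
    "Classify the industry of the company below. Use only these categories: "
    "SaaS, Fintech, Healthcare, Retail, Other. If the company sells software "
    "by subscription, choose SaaS. If it handles payments or lending, choose "
    "Fintech. If it is a hospital, clinic, or pharma firm, choose Healthcare. "
    "If it sells physical goods to consumers, choose Retail. Otherwise Other.\n\n"
    "Company: {company}"
)

# Minimal: vague on purpose, delegating the judgment to the model.
MINIMAL = "What industry is {company} in? Answer with a single word."

def build_prompt(template: str, company: str) -> str:
    """Fill the company name into a prompt template."""
    return template.format(company=company)
```

The minimal prompt generalizes better precisely because it does not try to anticipate every case the model already knows how to handle.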
- The OpenAI chat API is sufficient: Using the chat API, along with error handling and token-length estimation, has proven flexible and efficient for the author's B2B use case. Premature abstraction, like adopting LangChain, is not necessary.
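A minimal sketch of this "no abstraction" approach, using only the Python standard library (the endpoint URL and request shape follow the public OpenAI chat completions API; the retry policy and the 4-characters-per-token heuristic are illustrative assumptions, not the author's exact code):

```python
import json
import os
import time
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"

def estimate_tokens(text: str) -> int:
    """Rough token-length estimate: ~4 characters per token is a common
    heuristic for English text. Used to guard against oversized prompts."""
    return max(1, len(text) // 4)

def chat(prompt: str, model: str = "gpt-4", retries: int = 3) -> str:
    """Plain chat-completions call with simple retry-based error handling:
    a POST request and exponential backoff, no framework in between."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    req = urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', '')}",
        },
    )
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(req) as resp:
                data = json.loads(resp.read())
            return data["choices"][0]["message"]["content"]
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(2 ** attempt)  # back off before retrying
```

A couple of helper functions like these cover most of what heavier abstractions provide for a summarize/extract workload.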
- Streaming API and variable-speed output enhance user experience: Users react positively to variable-speed output, making it feel like a significant UX innovation.
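The streaming chat API delivers tokens as server-sent events, each carrying an incremental `delta`. A sketch of the client-side parsing (the SSE `data:` framing and `[DONE]` sentinel match the OpenAI streaming format; the function name is ours):

```python
import json

def parse_sse_line(line: str):
    """Parse one server-sent-events line from the streaming chat API.
    Returns the new text delta, or None for keep-alives / the [DONE] marker."""
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):]
    if payload.strip() == "[DONE]":
        return None
    chunk = json.loads(payload)
    # Each streamed chunk carries an incremental "delta" that may add content.
    return chunk["choices"][0]["delta"].get("content")
```

Printing each delta as it arrives (optionally with a small per-character delay) is what produces the variable-speed, typewriter-like output users respond to.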
- GPT struggles with producing the null hypothesis: Prompting GPT to return an empty output when it doesn't find anything often results in hallucinations and a lack of confidence. It's better to fix bugs and avoid sending prompts when there's no relevant text.
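In practice this means guarding the call site rather than the prompt. A minimal sketch (the address-extraction task and function names are hypothetical examples of the pattern):

```python
def extract_addresses(text: str, call_gpt) -> list[str]:
    """Avoid asking GPT for the null hypothesis: if there is no relevant
    text, return an empty result without calling GPT at all, since prompting
    it to 'return nothing if nothing is found' invites hallucination."""
    if not text or not text.strip():
        return []  # nothing to analyze: skip the API call entirely
    response = call_gpt(f"List the street addresses in this text:\n{text}")
    return [line.strip() for line in response.splitlines() if line.strip()]
```

The empty-input check is trivial, but it removes the one case where the model is most tempted to invent an answer.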
- Context windows for input and output are different: GPT-4 has a larger context window for input (128k tokens) than for output (4k tokens). This can be problematic when asking GPT to return a list of JSON objects, as it often can't provide more than 10 items.
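The practical workaround is to batch the work so each call's JSON output stays small, even though the input window could hold far more. A sketch (the batch size of 10 reflects the post's observation; the function name is ours):

```python
def batch_requests(item_ids: list, batch_size: int = 10) -> list[list]:
    """Split work into batches of at most `batch_size` items so each GPT
    call's JSON output fits comfortably inside the ~4k-token output window,
    even though the 128k input window could accept everything at once."""
    return [item_ids[i:i + batch_size] for i in range(0, len(item_ids), batch_size)]
```

Each batch becomes its own prompt, and the resulting JSON lists are concatenated client-side.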
- RAG/embeddings are mostly useless for B2B applications: Vector databases and RAG are primarily useful for search, not for B2B use cases like summarize/analyze-extract. The author suggests using a normal completion prompt to convert a user's search into a faceted-search or a more complex query instead.
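A sketch of that suggested alternative: a plain completion prompt that translates a free-text search into a faceted-search JSON query (the facet schema and function name are hypothetical; the post only names the general technique):

```python
import json

# Hypothetical facet schema for illustration; not from the original post.
FACETS = ["department", "price_max", "color"]

def search_to_facets_prompt(user_query: str) -> str:
    """Build a normal completion prompt asking GPT to translate a free-text
    search into a faceted-search JSON query, instead of using RAG/embeddings."""
    return (
        "Convert this product search into a JSON object using only the keys "
        f"{json.dumps(FACETS)}. Omit keys that don't apply.\n"
        f"Search: {user_query}\nJSON:"
    )
```

The structured query then runs against the existing search backend, which handles retrieval better than a vector database would for this kind of workload.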
source: Lessons after a half-billion GPT tokens