Vision-based Web Scraping with the New GPT-4o model in Make.com

Key Takeaways #

GPT-4 Omni allows for vision-based web scraping, extracting data from images.
Vision-based scraping is more robust than traditional HTML/CSS scraping, as it's not affected by design changes.
The cost of vision-based scraping is decreasing with models like GPT-4 Omni and Anthropic Claude 3 Haiku.

Vision-Based Web Scraping with GPT-4 Omni #

Problem: Traditional web scraping methods using HTML and CSS are fragile, break when websites update, and are not suitable for image-based data extraction.
Solution: Vision-based scraping utilizes models like GPT-4 Omni that can interpret images and extract data, overcoming the limitations of traditional methods.
Benefits:
- Robust: Not susceptible to website design changes.
- Flexible: Can extract data from images and screenshots.
- Affordable: Pricing for vision-based models is decreasing.

Implementation Steps #

Get a screenshot of the target webpage: Use a third-party service like Dumpling AI for this step.
Prepare the prompt in OpenAI playground:
- Define what data you want to extract and the desired format (JSON).
- Specify the image URL as input.
Copy the prompt into Make.com:
- Use the "Make an API Call" option in the ChatGPT module.
- Replace the hardcoded screenshot URL with a variable to dynamically fetch it from Dumpling AI.
- Configure the URL to and the method to .
Process the JSON response: Convert the data into a usable format for your application.

Example: Scraping Crypto Data from CoinMarketCap #

Data to scrape: Bitcoin and Ethereum prices, market cap, and the Fear & Greed Index.
Process:
1. Take a screenshot of CoinMarketCap using Dumpling AI.
2. Use the screenshot URL in an OpenAI playground prompt to extract the desired data.
3. Copy the prompt into Make.com, replacing the hardcoded URL with a variable.
4. Trigger the API request and retrieve the data in JSON format.
5. Parse the JSON response to extract the desired data for your application.

Applications and Use Cases #

Scraping websites with frequently changing designs.
Extracting data from images or documents that are not readily available in text format.
Automating tasks involving data extraction from visual sources, such as invoices or receipts.

"This is more of a building block rather than a project that you would sell to your customers or a project you would build yourself."

Conclusion #

Vision-based web scraping using GPT-4 Omni opens new possibilities for extracting data from visual sources. This technology can significantly enhance the robustness and flexibility of your web scraping workflows and create opportunities for innovative automation solutions.

Summary for: Youtube