January 24, 2016 (deals development, php development)

How to build a deal finder website with a crawler

Have you ever dreamed about building a discount finder website yourself? If so, I’ve got some good news for you. With the help and advice of Everyday is Black Friday, we’ll put together a series of articles on how to build a deal finder website that can scan the internet autonomously and find the best deals from all major retailers. You can use that data to build a website or an app, or simply as valuable data for market research. In this first post, I’ll explain the basics of a deal finder website, outline the structure of the software you’ll need to build, and highlight some of the challenges you might encounter along the way.

So let’s start at the beginning. You want to build a deal finder website, but you don’t know where to start. First of all, we need to figure out the basic structure and write down exactly what we want to do.

In the case of Everyday is Black Friday, the goal was to check retailers’ product prices every day and save them in an ever-growing database, so that today’s price can be compared with yesterday’s. But before that, we need a clever way of identifying product pages on any retailer website. This is quite a big challenge, as every website has a different structure and there is no single pattern or rule that works for all websites.
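To make the "compare today’s price with yesterday’s" idea concrete, here is a minimal sketch in Python using SQLite. It is not how Everyday is Black Friday stores its data. The `prices` table, its columns and the function names are all assumptions made up for illustration, and prices are kept as integer cents to avoid rounding problems.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the database and create the (hypothetical) price-history table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS prices (
               product_url TEXT NOT NULL,
               day         TEXT NOT NULL,
               price_cents INTEGER NOT NULL,
               PRIMARY KEY (product_url, day)
           )"""
    )
    return conn


def record_price(conn, product_url, day, price_cents):
    """Store one price per product per day; a re-check on the same day overwrites it."""
    conn.execute(
        "INSERT OR REPLACE INTO prices (product_url, day, price_cents) VALUES (?, ?, ?)",
        (product_url, day, price_cents),
    )
    conn.commit()


def is_deal(conn, product_url, today, yesterday):
    """Return True only when both days are stored and today's price is lower."""
    rows = dict(
        conn.execute(
            "SELECT day, price_cents FROM prices WHERE product_url = ? AND day IN (?, ?)",
            (product_url, today, yesterday),
        ).fetchall()
    )
    if today not in rows or yesterday not in rows:
        return False  # nothing to compare against yet
    return rows[today] < rows[yesterday]
```

Days are passed in as ISO date strings such as `"2016-01-24"`, so the caller decides what "today" and "yesterday" mean. A product seen for the first time is never reported as a deal, because there is no earlier price to compare it with.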

So here is the main structure of the website:

  • Spider 1 – Build a web spider that extracts all links from a website and saves them into a database. Make sure no duplicate links are saved. Everyday is Black Friday currently has more than 9 million links in the database, of which about 2 million were identified as products; the rest are duplicate links, trash links or 404 pages.
  • Spider 2 – Spider 2 checks whether a page is a product page or not. If it is, it saves it in another table in the database as a product link, along with the product title and the product image; otherwise it marks the page as a non-product page. Non-product pages are usually the homepage, category pages, blog posts, etc.
  • Spider 3 – Spider 3 performs the routine 24-hour price check. It takes a product from the database and checks its price, then does so again 24 hours later, and so on. This is one of the most resource-consuming tasks of the software, as it usually checks hundreds of prices every minute, so make sure your server can support this. Whenever Spider 3 notices that today’s price is lower than yesterday’s, it marks that product as a deal or discount and displays it on the front end of the website along with a link to the retailer website, where users can then browse the deals.
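As a rough illustration of what Spider 1 has to do, here is a minimal sketch in Python that pulls links out of a page’s HTML and stores them without duplicates. It uses only the standard library and leaves out fetching the page. The `links` table, the function names and the decision to keep only links on the same host are assumptions made for this example, not details of the real crawler.

```python
import sqlite3
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect the raw href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(page_url, html):
    """Return absolute http(s) links on the same host, with #fragments removed."""
    parser = LinkParser()
    parser.feed(html)
    host = urlparse(page_url).netloc
    links = set()
    for href in parser.hrefs:
        # Resolve relative links against the page URL, then drop the fragment
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlparse(absolute)
        if parts.scheme in ("http", "https") and parts.netloc == host:
            links.add(absolute)
    return links


def save_links(conn, links):
    """Insert links; the primary key plus INSERT OR IGNORE skips duplicates."""
    conn.execute("CREATE TABLE IF NOT EXISTS links (url TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT OR IGNORE INTO links (url) VALUES (?)", [(url,) for url in links]
    )
    conn.commit()
```

Letting the database enforce uniqueness means the spider can save the same link from many pages without checking first. Stripping the fragment treats `/p/1` and `/p/1#reviews` as one link, which removes one common source of duplicates.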

I have just described the main structure of deal finder software that you can build yourself. This might seem very simple at first, but actually making it work is a really complex operation that can take up to a year of development. There are many issues along the way, and you’ll have to drink loads of coffee while developing the software 🙂

From PHP bugs to unknown issues that seem never-ending, a deal finder website like this can be one of the biggest challenges a programmer can face.

In the next articles I’ll cover the first steps you need to take to make this a reality, including server settings and configuration, hosting recommendations, and the first lines of code that you’ll write.

I hope you enjoyed the intro of my “how to build a deal finder website” tutorial 🙂