What is Screenscraping?

Screenscraping means gathering data from a website programmatically. A software developer writes a bit of code to:

  1. Start up a browser
  2. Click around the website as if it's a normal user
  3. Download the HTML or the data, depending on the particular website
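
As a rough illustration of step 3, here is a minimal Python sketch that downloads a page and pulls the rows out of an HTML table using only the standard library. It does not start a browser or click anything; sites that need real clicks or logins would need a browser-automation tool instead. The names `fetch_html`, `extract_rows` and `TableCellParser`, and the `example-scraper/0.1` user agent, are made up for this example, and it assumes the data sits in plain `<tr>`/`<td>` markup.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TableCellParser(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows, each a list of cell strings
        self._row = None       # cells of the row currently being read
        self._in_cell = False  # True while inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self._in_cell = False
            if self._row:  # header rows made only of <th> are skipped
                self.rows.append([cell.strip() for cell in self._row])
            self._row = None

    def handle_data(self, data):
        if self._in_cell and self._row:
            self._row[-1] += data


def fetch_html(url, timeout=30):
    """Download a page as text. The user agent string is a placeholder."""
    request = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_rows(html):
    """Return the table data found in the HTML as a list of rows."""
    parser = TableCellParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

In use, this would look something like `rows = extract_rows(fetch_html(url))`, with `url` pointing at the page you are allowed to scrape.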

 

Why would you consider it?

  • When you need the data from a particular website fed automatically into your reporting or workflow
  • When API data isn’t available for that particular website

What are the pros?

  • It is usually free
  • Handy to get data when API data isn’t available

 

What are the cons and why would I bother using an API then?

An API is a contract, so it is usually a lot more reliable and robust. Systems that you build on an API will run far more reliably.

A website can be changed by its owner at any time. Websites are (usually) not as reliable as APIs, so your downstream systems need to manage this carefully.
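
One way to manage this, sketched here as an assumption rather than a prescribed method, is to check the shape of the scraped data before passing it downstream, so a changed page fails loudly instead of quietly feeding bad numbers into a report. `LayoutChangedError` and `validate_rows` are hypothetical names for this example.

```python
class LayoutChangedError(Exception):
    """Raised when scraped data no longer matches the expected shape."""


def validate_rows(rows, expected_columns, min_rows=1):
    """Return rows unchanged, or raise if they look like the page changed."""
    if len(rows) < min_rows:
        raise LayoutChangedError(
            f"expected at least {min_rows} rows, got {len(rows)}"
        )
    for index, row in enumerate(rows):
        if len(row) != expected_columns:
            raise LayoutChangedError(
                f"row {index} has {len(row)} columns, expected {expected_columns}"
            )
    return rows
```

A check like this only catches structural changes, such as a missing column. It will not notice if the owner swaps the meaning of two columns.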

So should I use screenscraping as a data-gathering technique?

Absolutely, if you manage expectations and you get a proper bit of code written to do it.

Any tips on getting it done?

  • Don’t use a hand-cranked script running from someone’s desktop; it won’t work reliably
  • Do make sure the screenscraping code is running from a server
  • Do make sure there are notifications on both success and error paths
  • Do make sure there are retries built in
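
The last two tips can be sketched together in Python. This is a minimal example under stated assumptions, not a finished implementation: `run_with_retries` is a hypothetical helper, `job` stands for whatever function does the scraping, and `notify` stands for whatever sends your alerts (email, chat message, and so on).

```python
import time


def run_with_retries(job, notify, attempts=3, delay_seconds=5.0, sleep=time.sleep):
    """Run job(), retrying on failure, and notify on both success and error."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            result = job()
        except Exception as error:
            last_error = error
            if attempt < attempts:
                # Wait a little longer after each failed attempt.
                sleep(delay_seconds * attempt)
        else:
            notify(f"scrape succeeded on attempt {attempt}")
            return result
    # Every attempt failed: report it, then re-raise so the scheduler sees it.
    notify(f"scrape failed after {attempts} attempts: {last_error!r}")
    raise last_error
```

Sending a notification on success as well as failure means that silence becomes a warning sign in itself: if no message arrives, the job probably never ran.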

 

If you have questions about your data lake and how to make it more reliable and valuable, get in touch. We do this for a living!
