What is Screenscraping?
It’s programmatically gathering data from a website. A software developer writes a bit of code to:
- Start up a browser
- Click around the website as a normal user would
- Download the HTML, or whatever data the particular website exposes
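The extraction step above can be sketched with nothing but Python's standard library. This is a minimal, illustrative example: the page markup and the "price" class are made up, and in practice you'd fetch live HTML first (e.g. with an HTTP client or a browser-automation tool) rather than use a hardcoded string.

```python
from html.parser import HTMLParser

# Hypothetical page snippet, standing in for HTML downloaded from a site.
SAMPLE_HTML = """
<html><body>
  <span class="price">19.99</span>
  <span class="price">4.50</span>
</body></html>
"""

class PriceScraper(HTMLParser):
    """Collects the text of every <span class="price"> element."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples
        if tag == "span" and ("class", "price") in attrs:
            self._in_price = True

    def handle_data(self, data):
        if self._in_price:
            self.prices.append(data.strip())
            self._in_price = False

scraper = PriceScraper()
scraper.feed(SAMPLE_HTML)
print(scraper.prices)  # → ['19.99', '4.50']
```

The point is not the parser itself but the shape of the job: download markup, pick out the elements you care about, hand the values on to your reporting or workflow.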
Why would you consider it?
- When you need data from a particular website fed automatically into your reporting or workflow
- When API data isn’t available for that particular website
What are the pros?
- The data itself is usually free; you only pay for the development effort
- Handy to get data when API data isn’t available
What are the cons and why would I bother using an API then?
An API is a contract: it’s usually far more stable and robust, so the systems you build on top of it run far more reliably.
A website can be changed by its owner at any time, so websites are (usually) not as reliable as APIs, and your downstream systems need to manage this carefully.
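One way downstream systems can manage that risk is to validate scraped data before it goes anywhere. Here is a hedged sketch: the field names and price bounds are hypothetical, but the idea is to fail loudly the moment the site's layout changes, rather than silently feed bad data into reports.

```python
def validate_row(row: dict) -> dict:
    """Raise ValueError if a scraped row doesn't look like what we expect.

    The expected fields and the plausible price range are assumptions
    for this example, not rules from any real site.
    """
    required = {"product", "price"}
    missing = required - row.keys()
    if missing:
        raise ValueError(f"missing fields (site layout changed?): {missing}")
    price = float(row["price"])  # raises ValueError if no longer numeric
    if not 0 < price < 10_000:
        raise ValueError(f"price {price} outside plausible range")
    return {"product": row["product"], "price": price}

# A row that matches expectations passes through, cleaned up.
good = validate_row({"product": "widget", "price": "19.99"})
print(good)  # → {'product': 'widget', 'price': 19.99}

# A row missing a field (e.g. the price column vanished) fails fast.
try:
    validate_row({"product": "widget"})
except ValueError as e:
    print("caught:", e)
```

A check like this turns a silent data-quality problem into an immediate, visible error you can alert on.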
So should I use screenscraping as a data-gathering technique?
Absolutely, as long as you manage expectations and have a proper piece of code written to do it.
Any tips on getting it done?
- Don’t use a hand-cranked script running from someone’s desktop; it won’t run reliably
- Do make sure the screenscraping code is running from a server
- Do make sure there are notifications on both success and error paths
- Do make sure there are retries built in
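The retry and notification tips above can be sketched in a few lines of standard-library Python. The function names and retry policy here are illustrative; in a real deployment the log calls would be wired to email or chat alerts rather than the console.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("scraper")

def run_with_retries(job, attempts=3, delay_seconds=1.0):
    """Run a scraping job, retrying on failure, logging both outcomes.

    `job` is any zero-argument callable that returns the scraped data.
    """
    for attempt in range(1, attempts + 1):
        try:
            result = job()
            # Success path notification
            log.info("scrape succeeded on attempt %d", attempt)
            return result
        except Exception:
            # Error path notification, with traceback
            log.exception("scrape attempt %d/%d failed", attempt, attempts)
            if attempt < attempts:
                time.sleep(delay_seconds)  # simple fixed back-off between tries
    raise RuntimeError(f"scrape failed after {attempts} attempts")

# Usage: a hypothetical flaky job that fails once, then succeeds.
calls = {"n": 0}
def flaky_job():
    calls["n"] += 1
    if calls["n"] < 2:
        raise ConnectionError("transient network error")
    return "page html"

print(run_with_retries(flaky_job, delay_seconds=0.01))  # → page html
```

Running this from a scheduled job on a server, with the log output routed to a notification channel, covers all three "Do" items at once.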
If you have questions about your data lake and how to make it more reliable and valuable, get in touch — we do this for a living!