
TL;DR:

Is generating automated requests to a government site illegal in some way?


There is a government site (EU/Ireland) that offers a WebChat service as a contact option. When the chat is active, the button is green; otherwise it is grey. In practice it is almost never green (although occasionally it is), so I decided to test the real availability of this service.

My idea was to create a script that accesses the website, checks the status of the button, and logs it. My goal is not to DDoS the server or anything similar, so I thought a request every minute would suffice to gather some statistics.

Could this be considered illegal? I know it might depend on each country's laws, but if someone in the field could provide some insight or direction, it would be really appreciated.


More details:

The website is always up even when the chat is unavailable, so simply checking that an HTTP request succeeds is not enough. What I found is that the HTML is loaded without the "button" (it's not a `button` tag, it's actually a clickable image); a JavaScript script then connects to the chat service and, depending on the status, loads the "button" to access the chat or an image saying it is unavailable. This is an `<img>` tag, so what I do is read its "alt" attribute, where the status can be identified.

I run this check from a cron job, logging the time and the status. At the end of the day, I summarise the data into a chart showing the percentage of time the chat was actually active. The purpose of this experiment is to back up, with actual data, the claim that the customer service is awful.
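The parsing-and-logging step described above could be sketched roughly as follows. The alt-text strings and status labels here are hypothetical; adapt them to the real page. Also note that if the `<img>` is injected by JavaScript, a plain HTTP fetch will not contain it, so you would need the rendered DOM (e.g. from a headless browser) as the input:

```python
# Sketch: extract the chat status from the "alt" text of the status image
# and produce a timestamped log line suitable for appending from a cron job.
from datetime import datetime, timezone
from html.parser import HTMLParser

class ChatStatusParser(HTMLParser):
    """Collect the alt text of every <img> tag on the page."""
    def __init__(self):
        super().__init__()
        self.alts = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            alt = dict(attrs).get("alt", "")
            if alt:
                self.alts.append(alt)

def chat_status(html: str) -> str:
    """Return 'active', 'inactive', or 'unknown' from the img alt text.
    The keywords below are assumptions about the page's wording."""
    parser = ChatStatusParser()
    parser.feed(html)
    for alt in parser.alts:
        text = alt.lower()
        if "unavailable" in text:   # check this first: it may also contain "chat"
            return "inactive"
        if "chat" in text:
            return "active"
    return "unknown"

def log_line(html: str) -> str:
    """One CSV-style line: UTC timestamp, status."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return f"{stamp},{chat_status(html)}"
```

A cron job would then fetch (or render) the page, call `log_line`, and append the result to a file that the end-of-day chart is built from.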

1 Answer


What you are doing is commonly referred to as "web scraping", and it is legal in the EU.

What you cannot do is extract personal data. Since the data you are aggregating (whether or not a chat button is available) is non-personal, it should be fine.

EDIT

As some of the commenters said, it's legal, but many websites detect scraping. To (try to) avoid being blocked by the server, make your script act human: checking once every 15 minutes, with a random jitter of ±3 minutes, is probably enough. That is also roughly how often a human would retry the website, so it should strengthen your argument that the chat is unavailable.
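The "act human" interval suggested above might be sketched like this; the 15-minute base and ±3-minute jitter are the values from the edit, not anything prescribed:

```python
# Sketch: randomized polling delay -- a 15-minute base interval with
# +/- 3 minutes of uniform random jitter, so requests don't arrive on an
# exact, machine-like schedule.
import random

BASE_SECONDS = 15 * 60      # 15-minute base interval
JITTER_SECONDS = 3 * 60     # +/- 3 minutes of jitter

def next_delay(rng: random.Random = random) -> float:
    """Seconds to sleep before the next check (between 12 and 18 minutes)."""
    return BASE_SECONDS + rng.uniform(-JITTER_SECONDS, JITTER_SECONDS)
```

A long-running script would call `time.sleep(next_delay())` between checks; with cron instead, you could sleep a random 0 to 6 minutes at the start of a job scheduled every 15 minutes to get a similar effect.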

sevensevens