
Many server administrators want their server to be used only by humans and not by retrieval programs like wget. One way to block such programs is to use log analysis. Log analysis identifies retrieval programs by looking for statistically significant similarities among the requests, often through timing.
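As a rough illustration of the timing-based analysis described above, here is a minimal shell/awk sketch. It is not how any Ubuntu mirror is known to work. The function name `looks_automated`, the input format (one request timestamp in seconds per line, for a single client), and the 0.1 threshold on the coefficient of variation are all assumptions made for illustration.

```shell
# Hypothetical detector: reads one request timestamp (in seconds) per line
# for a single client and prints "automated" if the gaps between requests
# are suspiciously regular.
looks_automated() {
  awk '
    NR > 1 { gap = $1 - prev; n++; sum += gap; sumsq += gap * gap }
    { prev = $1 }
    END {
      if (n < 2) { print "unknown"; exit }
      mean = sum / n
      var = sumsq / n - mean * mean
      if (var < 0) var = 0          # guard against rounding error
      sd = sqrt(var)
      # Assumed threshold: coefficient of variation below 0.1 means "too regular"
      if (mean > 0 && sd / mean < 0.1) print "automated"; else print "human-like"
    }' "$@"
}
```

For example, `printf '0\n2\n4\n6\n8\n' | looks_automated` prints `automated`, because every gap is exactly 2 seconds.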

Whenever I use wget to download packages through a shell script (one like those Synaptic generates; most of mine were in fact generated by Synaptic), only a few packages are downloaded and most of the others fail because the connection is refused.

So I strongly suspect that the most likely reason the connection is refused is that Ubuntu servers use log analysis to block such programs.

Do Ubuntu servers use log analysis to block (package retrieval) programs?

EDIT:
I ran some scripts that contained only small packages (i.e., packages that download quickly). Those scripts work as expected. The error occurs with large packages, which consequently take longer to download.

jtd

2 Answers


wget has an option, --random-wait, that is designed to defeat blocking based on log analysis. From the docs:

--random-wait

Some web sites may perform log analysis to identify retrieval programs such as Wget by looking for statistically significant similarities in the time between requests. This option causes the time between requests to vary between 0.5 and 1.5 * wait seconds, where wait was specified using the --wait option, in order to mask Wget's presence from such analysis.
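The documented range can be reproduced in a few lines of awk. This is a sketch of the arithmetic only, not wget's actual source; `random_wait` and its optional seed argument are hypothetical names.

```shell
# Hypothetical helper: print one pause length drawn uniformly from
# [0.5 * wait, 1.5 * wait) seconds, the range documented for --random-wait.
# Usage: random_wait WAIT [SEED]
random_wait() {
  awk -v w="$1" -v seed="${2:-}" 'BEGIN {
    # Seed from the optional argument for reproducibility, else from the clock
    if (seed != "") srand(seed); else srand()
    # rand() is in [0, 1), so the multiplier is in [0.5, 1.5)
    printf "%.3f\n", w * (0.5 + rand())
  }'
}

# In practice wget does this itself. For example (urls.txt is a placeholder
# list of package URLs, one per line):
#   wget --wait=2 --random-wait -i urls.txt
```

With `--wait=2`, the pauses therefore fall between 1 and 3 seconds.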

A 2001 article in a publication devoted to development on a popular consumer platform provided code to perform this analysis on the fly. Its author suggested blocking at the class C address level to ensure automated retrieval programs were blocked despite changing DHCP-supplied addresses.

The --random-wait option was inspired by this ill-advised recommendation to block many unrelated users from a web site due to the actions of one.

So chances are, if the server accepts you with the --random-wait option turned on but not without it, it is using log analysis.

Richard

Most of the mirrors aren't controlled by Ubuntu, and their configuration is entirely up to their sysadmins, so some mirrors may well do some blocking. I personally don't see why they would, but with its default settings wget is easy to fingerprint through its user-agent string, even before you start considering behavioural tracking.
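To illustrate how little effort that fingerprinting takes, here is a minimal sketch an administrator might run against an access log. It assumes the common Apache/nginx "combined" log format, where the user agent is the last double-quoted field, and a client that kept wget's default user agent, which begins with `Wget/`. The function name is hypothetical.

```shell
# Hypothetical helper: count requests whose User-Agent (the last quoted field
# of a combined-format log line) starts with wget's default "Wget/".
# Reads the named log files, or stdin if none are given.
count_wget_requests() {
  # Splitting on '"' leaves an empty field after the final quote,
  # so the user agent is field NF-1.
  awk -F'"' '$(NF-1) ~ /^Wget\// { n++ } END { print n + 0 }' "$@"
}
```

A client that overrides its user agent with `-U` would not be counted by this check.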

You can make wget look like the current apt quite simply:

wget -U "Ubuntu APT-HTTP/1.3 (0.9.9.1~ubuntu3)" ...
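To combine this with the `--random-wait` option from the other answer in a download script like the one in the question, a sed one-liner can add the options to every wget line. This is a sketch under an unverified assumption: that each download in the script is a separate line beginning with `wget `. The function name is hypothetical, and the 2-second wait is an arbitrary choice.

```shell
# Hypothetical helper: prepend --wait/--random-wait and an APT-like user agent
# to every line that starts with "wget ". Other lines pass through unchanged.
# Reads the named script files, or stdin if none are given.
add_wget_options() {
  sed 's|^wget |wget --wait=2 --random-wait -U "Ubuntu APT-HTTP/1.3 (0.9.9.1~ubuntu3)" |' "$@"
}
```

For example, `add_wget_options download.sh > download-slow.sh` writes a modified copy (both file names are placeholders) and leaves the original script untouched.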

And as another user pointed out, if your current mirror is controlled by somebody who doesn't want you using wget, you could just use another mirror. There are loads of them.

Oli