14

Sometimes a link has Unicode characters in it, such as http://www.example.com/файл.zip

If you point your browser to it, the browser properly prompts you to download the file as файл.zip. But if you try the same with wget, the saved filename comes out as a mix of ? characters and percent encoding (like %D0%BB), followed by the string (invalid encoding).

What parameters can I add to wget, or what other command-line tricks can I use, so that it behaves like Chrome and Firefox and saves the file exactly as specified in the rendered link - in this case, as файл.zip?

The solution should work without having to write the filename explicitly in the command, so an explicit wget -O файл.zip http://www.example.com/файл.zip is not a good solution.

I realize that as soon as you run wget http://www.example.com/файл.zip it tries to retrieve http://www.example.com/%D1%84%D0%B0%D0%B9%D0%BB.zip, that is, it converts the link to percent encoding, which may be the reason why it doesn't save the file under the "properly" rendered filename.
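For illustration, the same conversion can be reproduced in Python 3 with the standard library. This is only a sketch of the encoding step (UTF-8 bytes written as %XX escapes), not of what wget does internally; the variable names are made up for the example.

```python
from urllib.parse import quote, unquote

name = "файл.zip"

# Each non-ASCII character is encoded as UTF-8, and every byte becomes %XX
encoded = quote(name)
print(encoded)                   # %D1%84%D0%B0%D0%B9%D0%BB.zip

# Decoding the escapes gives back the original name
print(unquote(encoded) == name)  # True
```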

I posted a somewhat related question here, whose answer may or may not be of help to this one.

Strapakowsky

4 Answers

23

For wget, you can use:

wget http://www.example.com/файл.zip --restrict-file-names=nocontrol

if your system can handle UTF-8 or other encoding properly.

Finally, if you still have those % symbols left in the downloaded filename, you can use the Python function urllib.unquote(filename) (urllib.parse.unquote(filename) in Python 3), which replaces %xx escapes with their single-character equivalents.
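That renaming step could look roughly like this in Python 3. It is a minimal sketch that assumes the leftover names are UTF-8 percent escapes and that the filesystem accepts UTF-8 filenames; the helper name unquote_filenames and the decision to skip names that already exist are choices made for this example.

```python
import os
from urllib.parse import unquote

def unquote_filenames(directory="."):
    """Rename files whose names contain %XX escapes to their decoded form.

    Hypothetical helper: returns a list of (old_name, new_name) pairs.
    """
    renamed = []
    for name in os.listdir(directory):
        decoded = unquote(name)
        # Skip names without escapes and names that would decode to a path
        if decoded == name or "/" in decoded or os.sep in decoded:
            continue
        src = os.path.join(directory, name)
        dst = os.path.join(directory, decoded)
        # Only rename regular files, and never overwrite an existing file
        if os.path.isfile(src) and not os.path.exists(dst):
            os.rename(src, dst)
            renamed.append((name, decoded))
    return renamed
```

Calling unquote_filenames() in the download directory after wget finishes would rename %D1%84%D0%B0%D0%B9%D0%BB.zip to файл.zip and leave all other files alone.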

3

You can use curl instead, as follows:

curl -O http://www.example.com/файл.zip

It will save it to файл.zip.

John Siu
0

I couldn't find a way to solve this issue with wget but could successfully transfer the files with Midnight Commander.

0

My answer is similar to the one posted by Balaji Purushotham.

I had to add .parse to get this working in Python 3:

import urllib.parse
import wget
# url and destination_file are assumed to be defined earlier
wget.download(urllib.parse.unquote(url), destination_file)
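If the third-party wget module is not available, something similar can be sketched with the standard library alone. The function names filename_from_url, ascii_url and download are hypothetical, and the sketch assumes only the path part of the URL contains non-ASCII characters (not the host name).

```python
import posixpath
import urllib.request
from urllib.parse import quote, unquote, urlsplit, urlunsplit

def filename_from_url(url):
    """Return the decoded last path segment of url, e.g. 'файл.zip'."""
    path = urlsplit(url).path
    return posixpath.basename(unquote(path))

def ascii_url(url):
    """Percent-encode the path so the request line is plain ASCII."""
    parts = urlsplit(url)
    # safe="/%" keeps separators and avoids double-encoding existing escapes
    return urlunsplit(parts._replace(path=quote(parts.path, safe="/%")))

def download(url):
    """Fetch url and save it under its decoded filename (needs network access)."""
    target = filename_from_url(url)
    urllib.request.urlretrieve(ascii_url(url), target)
    return target
```

With these helpers, download("http://www.example.com/файл.zip") would request the percent-encoded URL but write the result to файл.zip in the current directory.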
Nmath