How to download a link with Unicode characters using wget?

Question

Sometimes a link has unicode characters in it, such as http://www.example.com/файл.zip

If you point your browser to it, it will properly prompt you to download the file as файл.zip. But if you try to do it with wget, the saved filename is a mix of ? characters, percent escapes (like %D0%BB), and the string "(invalid encoding)" appended after the filename.

What parameters can I add to wget, or what other command-line tricks can I use, so that it behaves like Chrome and Firefox and saves the file exactly as specified in the rendered link - in this case, as файл.zip?

The solution should work without having to spell out the filename in the command, so an explicit wget -O файл.zip http://www.example.com/файл.zip is not a good solution.

I realize that as soon as you run wget http://www.example.com/файл.zip it tries to retrieve http://www.example.com/%D1%84%D0%B0%D0%B9%D0%BB.zip, that is, it converts the link to percent encoding, which may be the reason why it doesn't save the file under the "proper" name.

I posted a somewhat related question here, whose answer may or may not be of help to this one.

score 23 · Answer 1 · edited Dec 30 '18 at 09:12

For wget, you can use:

wget http://www.example.com/файл.zip --restrict-file-names=nocontrol

if your system can handle UTF-8 or other encoding properly.

Finally, if you still have % escapes left in the name of the downloaded file, you can use Python's urllib.unquote(filename) (urllib.parse.unquote(filename) in Python 3), which replaces %xx escapes with the characters they encode.
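A minimal sketch of that last step, assuming Python 3 and that the escapes encode UTF-8 bytes. The helper names decode_filename and rename_escaped_files are made up for illustration; they are not part of wget or of urllib.

```python
import os
from urllib.parse import unquote


def decode_filename(name: str) -> str:
    """Replace %xx escapes in a file name with the characters they encode."""
    decoded = unquote(name)  # interprets the escaped bytes as UTF-8 by default
    # Never let a decoded "%2F" turn into a path separator.
    return decoded.replace("/", "_")


def rename_escaped_files(directory: str = ".") -> None:
    """Rename every entry in `directory` whose name still contains %xx escapes."""
    for name in os.listdir(directory):
        new_name = decode_filename(name)
        source = os.path.join(directory, name)
        target = os.path.join(directory, new_name)
        # Skip names that need no change, and never overwrite an existing file.
        if new_name != name and not os.path.exists(target):
            os.rename(source, target)
```

For example, decode_filename("%D1%84%D0%B0%D0%B9%D0%BB.zip") returns "файл.zip", and calling rename_escaped_files() in the download directory applies the same decoding to what wget left behind.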

score 3 · Accepted Answer · answered Dec 29 '12 at 05:09

You can use curl instead, as follows:

curl -O http://www.example.com/файл.zip

It will save it to файл.zip.

John Siu

score 0 · Answer 3 · answered Dec 30 '18 at 08:13

I couldn't find a way to solve this issue with wget but could successfully transfer the files with Midnight Commander.

Daniel Böhmer

score 0 · Answer 4 · edited Sep 03 '20 at 17:58

My answer is similar to the one posted by Balaji Purushotham.

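I had to add .parse to get this working in Python 3:

import urllib.parse
import wget  # third-party wget package, not the command-line tool
# url and destination_file are defined by the caller
wget.download(urllib.parse.unquote(url), destination_file)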
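If the third-party wget package is not available, roughly the same idea can be sketched with the standard library alone. This is only a sketch under assumptions, not what the wget module does internally: the helper names local_name, ascii_url and download are made up here, the URL path is assumed to be UTF-8, and non-ASCII host names are not handled.

```python
import posixpath
import shutil
import urllib.request
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def local_name(url: str) -> str:
    """Return the last path segment of `url` with %xx escapes decoded."""
    path = urlsplit(url).path
    name = unquote(posixpath.basename(path))
    # Guard against decoded path separators and against an empty name.
    return name.replace("/", "_") or "index.html"


def ascii_url(url: str) -> str:
    """Percent-encode non-ASCII characters in the path so urllib accepts the URL."""
    parts = urlsplit(url)
    # "%" is treated as safe so an already-encoded URL is not encoded twice.
    return urlunsplit(parts._replace(path=quote(parts.path, safe="/%")))


def download(url: str) -> str:
    """Fetch `url`, save it under its decoded file name, and return that name."""
    target = local_name(url)
    with urllib.request.urlopen(ascii_url(url)) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

With this, download("http://www.example.com/файл.zip") would request the percent-encoded URL but write the result to файл.zip, whether the link is given in its rendered or its encoded form.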

Nmath

answered Sep 03 '20 at 03:03

user3870489
