74

Some parts of Wikipedia appear differently when you're logged in. I would like to wget user pages so that they appear as if I were logged in.

Is there a way I can wget user pages like this one?

http://en.wikipedia.org/wiki/User:A

This is the login page:

http://en.wikipedia.org/w/index.php?title=Special:UserLogin&returnto=Login&campaign=ACP3
Braiam
  • 69,112
user784637
  • 11,465

9 Answers

75

The easy way: log in with your browser, and give the cookies to wget

Easiest method: in general, you need to provide wget or curl with the (logged-in) cookies from a particular website for them to fetch pages as if you were logged in.

If you are using Firefox, it's easy to do via the cookies.txt add-on. Install the add-on, and:

  1. Click on the plugin and save the cookies.txt file (you can change the filename/destination).

  2. Open up a terminal, and use wget with the --load-cookies=FILENAME option, e.g.

     wget --load-cookies=cookies.txt http://en.wikipedia.org/wiki/User:A
    
  • For curl, it's curl --cookie cookies.txt ... (see the complete example below)

(I will try to update this answer for Chrome/Chromium users)
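
For completeness, the full curl equivalent of the wget command above would look something like this (the --output filename is just an illustration; any name works):

  curl --cookie cookies.txt --output User_A.html http://en.wikipedia.org/wiki/User:A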

The hard way: use curl (preferably) or wget to manage the entire session

  • A detailed how-to is beyond the scope of this answer, but you use curl with the --cookie-jar option, or wget with the --save-cookies --keep-session-cookies options, along with the HTTP/S POST method, to log in to a site, save the login cookies, and then use them to simulate a browser (see the sketch after this list).
  • Needless to say, this requires going through the HTML source for the login page (get input field names, etc.), and is often difficult to get to work for sites using anything beyond simple login/password authentication.
  • Tip: if you go this route, it is often much simpler to deal with the mobile version of a website (if available), at least for the authentication step.
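
For this question's site, the general shape is something like the following. It's an untested sketch: the wpName/wpPassword/wpLoginAttempt field names are assumptions based on MediaWiki's classic login form, and newer MediaWiki versions also require a hidden wpLoginToken that you'd have to scrape from the form first:

  # Log in via POST and store the session cookies
  # (field names are assumptions; check the login page's HTML source):
  curl --cookie-jar cookies.txt \
    --data 'wpName=MyUser&wpPassword=MyPass&wpLoginAttempt=Log+in' \
    'http://en.wikipedia.org/w/index.php?title=Special:UserLogin&action=submitlogin&type=login'

  # Then reuse the saved cookies to fetch pages as the logged-in user:
  curl --cookie cookies.txt 'http://en.wikipedia.org/wiki/User:A'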
Pablo Bianchi
  • 17,371
ish
  • 141,990
27

Another easy solution that worked for me without installing anything extra, using the developer tools that ship with both Chrome/Brave and Firefox:

  1. Open "Network" tab of "DevTools": Ctrl + Shift + I
  2. Visit the page you want to save (e.g., a photo behind a login).
  3. Right-click the request and choose 'Copy' > 'Copy as cURL'.

This will give you a command that you can paste directly into your shell and that carries all your cookie credentials, e.g.

curl 'https://mysite.test/my-secure-dir/picture1.jpg' \
  -H 'User-Agent: Mozilla/5.0 ...' \
  -H 'Cookie: SESSIONID=abcdef1234567890'

You can then modify the URL in the command to fetch whatever you want.
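
For instance, to grab a second file under the same login, keep the copied headers and change only the URL (picture2.jpg here is made up for illustration; -o saves the result to a local file):

  curl 'https://mysite.test/my-secure-dir/picture2.jpg' \
    -H 'User-Agent: Mozilla/5.0 ...' \
    -H 'Cookie: SESSIONID=abcdef1234567890' \
    -o picture2.jpg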

Pablo Bianchi
  • 17,371
4

With cURL it's really easy to handle cookies in both directions (saving and sending).

curl www.target-url.com -c cookie.txt will save a file named cookie.txt. But you need to log in first, so you need to use --data with arguments like: curl --data "var1=1&var2=2" www.target-url.com/login.php -c cookie.txt (with --data, curl sends a POST request). Once you have the logged-in cookie, you can send it with: curl www.target-url.com/?user-page.php -b cookie.txt

Just use -c (--cookie-jar) to save and -b (--cookie) to send.

Note 1: using the cURL CLI is a lot easier than PHP, and maybe faster ;)

To save the final content, you can simply add > filename.html to your cURL command to save the full HTML code.

Note 2 about "full": you cannot render JavaScript with cURL; you just get the source code.
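
Putting it together, the whole round trip looks roughly like this (the URLs and form variables are the same placeholders used above; adjust them for the real login form):

  # 1. Log in via POST and save the session cookie:
  curl --data "var1=1&var2=2" -c cookie.txt www.target-url.com/login.php

  # 2. Send the saved cookie back and keep the page's HTML:
  curl -b cookie.txt www.target-url.com/user-page.php > filename.html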

m3nda
  • 160
3

The blog post Wget with Firefox Cookies shows how to access the sqlite data file in which Firefox stores its cookies. That way one doesn't need to manually export the cookies for use with wget. A comment suggests that it doesn't work with session cookies, but it worked fine for the sites I tried it with.
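
In case the blog post goes away, the gist of the approach is roughly this (a sketch, assuming the default Firefox profile path and the usual moz_cookies column names, both of which you should verify on your system; close Firefox first, since it may keep the database locked):

  # Dump Firefox's cookie store into Netscape cookies.txt format for wget:
  { echo '# Netscape HTTP Cookie File'
    sqlite3 -separator $'\t' ~/.mozilla/firefox/*.default/cookies.sqlite \
      "SELECT host, 'TRUE', path,
              CASE isSecure WHEN 0 THEN 'FALSE' ELSE 'TRUE' END,
              expiry, name, value FROM moz_cookies"
  } > cookies.txt

  wget --load-cookies=cookies.txt http://en.wikipedia.org/wiki/User:A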

Falko Menge
  • 807
3

For those still interested in this question, there's a very useful Chrome extension called CurlWget that generates a wget/curl command, complete with cookies and other authentication headers, with one click. To install and use this extension, follow the steps below:

  1. Install the extension from the Chrome Webstore.
  2. Go to the web page that you would like to download.
  3. Start the download.
  4. The extension will generate a wget/curl command for you.

Enjoy!

TheOdd
  • 3,012
jehon
  • 205
3

Take a look at cliget for Firefox.

When you're about to download, on the final download dialog, you get the option to copy the download as a curl command line to the clipboard.

Pablo Bianchi
  • 17,371
weberjn
  • 271
1

Have you tried this?

wget --user=username --password=password http://en.wikipedia.org/wiki/User:A
Eliah Kagan
  • 119,640
1

Try something like:

wget --keep-session-cookies --save-cookies cookies.txt --post-data 'user=goyamy&passwrd=mypassword' http://forum.ubuntu-it.org/
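
That command only performs the login and saves the session cookies; a second wget with --load-cookies then fetches pages as the logged-in user, e.g. (the profile URL here is just an illustration):

  wget --load-cookies cookies.txt 'http://forum.ubuntu-it.org/index.php?action=profile'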

See also this link:

How to download this webpage with wget?

kenorb
  • 10,944
1

For more complicated website-based logins, you should also consider using a Python script with a module that imitates a browser, like http://wwwsearch.sourceforge.net/mechanize/, instead of curl or wget.

This way, session cookies are handled automatically, and you can follow links and fill in login forms, and so "script" yourself through the login process as if you were using your web browser.

kos
  • 41,268
StW
  • 51