Download website for offline use

Linux how-tos, compile information, and whatever else we learned while working with Linux, macOS and - of course - products of the big evil...
^rooker
Site Admin
Posts: 1481
Joined: Fri Aug 29, 2003 8:39 pm

Download website for offline use

Post by ^rooker »

[PROBLEM]
I wanted to publish a PHP-based website I had made on a different server, but only as static HTML pages, since the PHP was only used for conveniently auto-generating the layout of complex image arrangements (long story...).

Anyway:
I couldn't just use my browser's "Save As...", because all the links would point to invalid filenames once the pages were saved as static HTML. They had to be rewritten. *sigh*


[SOLUTION]
Thanks to the GNU/Linux community and the Free Software people - who always seem to have thought of almost everything, including a comfortable implementation - I found a short article on the Linux Journal website about creating an offline copy of a site using "wget", which immediately and perfectly came to the rescue and saved my day.

In order not to lose that valuable information, I thought I'd post it here, too. The command-line arguments I actually used were like this:

Code: Select all

$ wget \
     --recursive \
     --level=3 \
     --no-clobber \
     --page-requisites \
     --html-extension \
     --convert-links \
     --restrict-file-names=windows \
     --domains website.org \
     --no-parent \
     WEBSITE_ADDRESS   (like https://whatever.com/dude/...)

I like it when a tutorial also mentions which arguments they're using, and why (a short-option version of the same command follows below):
  • --recursive: download the entire Web site.
  • --no-parent: don't follow links above the starting directory.
  • --page-requisites: get all the elements that compose the page (images, CSS and so on).
  • --html-extension: save files with the .html extension.
  • --convert-links: convert links so that they work locally, off-line.
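For reference, the same call can also be written with wget's short options - as far as I know, only "--restrict-file-names" has no short equivalent:

Code: Select all

# -r=--recursive, -l=--level, -nc=--no-clobber, -p=--page-requisites,
# -E=--html-extension, -k=--convert-links, -np=--no-parent, -D=--domains
$ wget -r -l 3 -nc -p -E -k -np \
     -D website.org \
     --restrict-file-names=windows \
     WEBSITE_ADDRESS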
EDIT (2021-05-11): Added "--level" to avoid wget going crazy downloading the whole Internet. :shock:
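If you want to sanity-check the offline copy before uploading it anywhere, one quick way (assuming wget put the mirror into a directory named after the host, which is its default behaviour) is to serve that directory locally and click through the converted links:

Code: Select all

$ cd website.org                 # directory created by wget, named after the host
$ python3 -m http.server 8000    # any static file server will do

Then point a browser at http://localhost:8000/ and check that images, CSS and links resolve locally.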
Jumping out of an airplane is not a basic instinct. Neither is breathing underwater. But put the two together and you're traveling through space!
^rooker
Site Admin
Posts: 1481
Joined: Fri Aug 29, 2003 8:39 pm

wget: Self-signed certificate

Post by ^rooker »

In case you're trying to download an HTTPS-secured website that has an untrustworthy certificate for one reason or another (self-signed, expired, untrusted CA, etc.), you might want to add the following option to the above wget command:

Code: Select all

--no-check-certificate
But be warned - as wget itself already says:

    To connect to http://www.somedomain.test insecurely, use `--no-check-certificate'.

An example of the error message you'd see without that option:

    ERROR: no certificate subject alternative name matches requested host name
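For example, added to the mirroring command from the post above (same placeholders as before), it would look roughly like this:

Code: Select all

$ wget \
     --recursive \
     --level=3 \
     --no-clobber \
     --page-requisites \
     --html-extension \
     --convert-links \
     --restrict-file-names=windows \
     --domains website.org \
     --no-parent \
     --no-check-certificate \
     WEBSITE_ADDRESS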
Jumping out of an airplane is not a basic instinct. Neither is breathing underwater. But put the two together and you're traveling through space!