LittleBizzy


Recursive Wget Remote Files To Local Via SSH

If you’ve dealt with a fair share of website or server migrations, you know how frustrating a process it can be — especially when dealing with a poorly performing or extremely locked down web server. Because of the (continued) dominance of the Apache/cPanel duo in the world of web hosting, many webmasters possess limited knowledge when it comes to site migration methods outside of basic tools like FTP, Fantastico, or .cpmove export files.

One of the most powerful weapons in my “arsenal” as a migration cyborg is the recursive version of Wget, which too few developers seem to know about. If SSH and/or tar-balling are unavailable on a source server, recursive Wget is a glorious alternative that lets you rapidly “suck” all the remote files off any given public server, with the directory tree and hierarchy intact. Of course, some might protest that this method abandons the file permissions and owners/groups that would be preserved with tar-balls, but that is easily corrected afterwards with a quick chown/chmod pass or a simple permissions cron job.
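For what it's worth, the permissions cleanup mentioned above can be sketched roughly as follows. The helper name `fix_perms` and the 755/644 modes are my assumptions (common defaults for web content), not something Wget requires, so adjust them to whatever your server expects.

```shell
#!/bin/sh
# Sketch: reset permissions on a freshly downloaded tree.
# Assumptions: fix_perms is a hypothetical helper; 755 for directories
# and 644 for files are common web defaults, not values from this post.
fix_perms() {
    target=$1
    # Directories need the execute bit so they can be traversed.
    find "$target" -type d -exec chmod 755 {} +
    # Regular files: owner read/write, everyone else read-only.
    find "$target" -type f -exec chmod 644 {} +
}

# Example call, using the destination directory from this post:
# fix_perms /home/example/www/wp-content/uploads
```

Ownership is a separate step that needs root and the real user/group of your web server (something like `sudo chown -R user:group` on the same directory, with names that only you know).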

And one of the coolest things about Wget? Poor network connections won’t stop it:

“GNU Wget is a free Linux / UNIX utility for non-interactive download of files from the Web or FTP servers, as well as retrieval through HTTP proxies. GNU/wget has been designed for robustness over slow dialup internet or unstable network connections. If a download fails due to a network problem, it will keep retrying until the whole file has been retrieved. If the server supports regetting, it will instruct the server to continue the download from where it left off.”

nixCraft

So for example, maybe you were able to export a WordPress site’s database, themes, and plugins with a free migration tool (etc). But hot damn, this site has a whopping 5GB of high resolution photos they’ve uploaded over the course of several years and you can’t export it via WordPress, nor do you have SSH access. Ladies and gentlemen, it’s time to Wget.

First, make sure you’re inside the local (new) server’s destination directory:

cd /home/example/www/wp-content/uploads/

Next, pull the entire remote directory into your local directory (-r for recursive, -np for no parent directories, -l inf for infinite depth levels):

sudo wget -r -np -l inf https://www.example.com/wp-content/uploads/
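One caveat with the HTTP variant: Wget finds files by following links, so it depends on the server exposing directory listings (or pages that link to the files), and it tends to save those listing pages locally as index.html files. Here is a small sketch for sweeping them up afterwards. The helper name `clean_listings` is hypothetical, and it assumes nothing named index.html* in the tree is a file you actually want to keep.

```shell
#!/bin/sh
# Sketch: remove the directory-listing pages a recursive HTTP crawl leaves behind.
# Assumptions: clean_listings is a hypothetical helper, and every file
# matching index.html* under the given directory is safe to delete.
clean_listings() {
    # The wildcard also catches sorted-listing variants such as
    # "index.html?C=N;O=D" that Apache-style indexes link to.
    find "$1" -type f -name 'index.html*' -exec rm -f {} +
}

# Example call, using the destination directory from this post:
# clean_listings /home/example/www/wp-content/uploads
```

Wget's own -R (--reject) option with a pattern like "index.html*" should get you much the same result during the download itself, if you would rather not clean up after the fact.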

Or you can do it via the FTP protocol:

sudo wget -r -np -l inf ftp://username:password@example.com:21/home/example/www/wp-content/uploads/

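But maybe you’re a picky lad, and you care about the file timestamps. If you are moving to a new server, all the files getting new timestamps could break caching or just confuse your server management. Wget already stamps each downloaded file with the server’s modification time whenever the server reports one; adding --timestamping (-N) also tells it to skip any file that isn’t newer than your local copy, which makes re-running an interrupted migration much cheaper. Okie doke: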

sudo wget --timestamping -r -np -l inf https://www.example.com/wp-content/uploads/

Keep in mind the following when it comes to file timestamps:

Note that time-stamping will only work for files for which the server gives a timestamp. For HTTP, this depends on getting a Last-Modified header. For FTP, this depends on getting a directory listing with dates in a format that Wget can parse.
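If you want to know ahead of time whether the source server sends that header, a quick check along these lines may help. This is only a sketch: `has_last_modified` is a hypothetical helper name, and it simply scans whatever response headers you pipe into it.

```shell
#!/bin/sh
# Sketch: succeed if the response headers on stdin include Last-Modified.
# Assumption: has_last_modified is a hypothetical helper name.
has_last_modified() {
    # -i because header names are case-insensitive; the pattern allows
    # the leading whitespace that "wget -S" prints before each header.
    grep -qi '^[[:space:]]*last-modified:'
}

# Example (needs network access; photo.jpg is a made-up file name):
# wget --spider --server-response \
#     https://www.example.com/wp-content/uploads/photo.jpg 2>&1 \
#     | has_last_modified && echo "server reports timestamps"
```

The --spider flag asks Wget to check the URL without saving anything, and --server-response (-S) prints the headers to stderr, hence the 2>&1 in the example.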

Oops, sorry, what’s that you say? Now you have a SUPER ugly local path? (i.e.)…

/home/example/www/wp-content/uploads/www.example.com/wp-content/uploads/

…and you’re REALLY not looking forward to cleaning it all up… well then, it’s a good thing we can clean up those directory paths at the same time the files are downloading! The -nH option (short for --no-host-directories) stops Wget from creating the nasty hostname directory (www.example.com), and then the quite slick --cut-dirs=2 directive slices off as many leading sub-directories as you wish (here the two in front of your files: wp-content and uploads).

sudo wget --timestamping -r -np -l inf -nH --cut-dirs=2 https://www.example.com/wp-content/uploads/
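To sanity-check the --cut-dirs number before kicking off a multi-gigabyte download, you can work out on paper where a given URL will land. The sketch below mimics the trimming in plain shell. `local_path` is a hypothetical helper, the 2020/01/photo.jpg path is a made-up example, and it ignores query strings and other URL edge cases.

```shell
#!/bin/sh
# Sketch: predict where a URL lands locally under -nH --cut-dirs=N.
# Assumption: local_path is a hypothetical helper that only mimics the
# path trimming; it does not call wget or handle query strings.
local_path() {
    url=$1
    cut=$2
    path=${url#*://}   # drop the scheme (https://)
    path=${path#*/}    # drop the hostname, which is what -nH does
    while [ "$cut" -gt 0 ]; do
        case $path in
            */*) path=${path#*/} ;;  # strip one leading directory
            *)   break ;;            # only the file name is left
        esac
        cut=$((cut - 1))
    done
    printf '%s\n' "$path"
}

local_path https://www.example.com/wp-content/uploads/2020/01/photo.jpg 2
# prints: 2020/01/photo.jpg
```

With a cut of 2, both wp-content and uploads are stripped, so the file ends up directly under whatever directory you ran the command from.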

And all in a single command… that’s literally easier than tarring and untarring. Verrrnice, high five!

Note: some bloggers have recommended using the --mirror a.k.a. -m option via Wget-FTP, perhaps because they aren’t aware of the --timestamping option above. That method is fine and dandy, but it is really just shorthand for -r -N -l inf --no-remove-listing: it already switches on timestamping and infinite recursion, so just like --timestamping it only fetches files that either don’t exist locally yet or whose Last-Modified date has changed. The real difference is that it retains messy .listing files from the FTP session, which you possibly don’t want or need. (More Info)

About the Author

Jesse
