All the Wget Commands You Should Know

Wget lets you download files from the Internet or even mirror entire websites for offline viewing. Here are 20 practical examples of the wget command.

How do I download an entire website for offline viewing? How do I save all the MP3s from a website to a folder on my computer? How do I download files that are behind a login page? How do I build a mini-version of Google?

Wget is a free utility, available for Mac, Windows and Linux (where it is usually included), that can help you accomplish all this and more. What makes it different from most download managers is that wget can follow the HTML links on a web page and recursively download the files. It is the same tool a soldier used to download thousands of secret documents from the U.S. Army’s intranet that were later published on the WikiLeaks website.

Spider Websites with Wget – 20 Practical Examples

Wget is extremely powerful, but as with most command-line programs, the plethora of options it supports can be intimidating to new users. So here is a collection of wget commands you can use for common tasks, from downloading single files to mirroring entire websites. It helps to read through the wget manual, but for the busy souls, these commands are ready to execute.

1. Download a single file from the Internet
wget http://example.com/file.iso

2. Download a file but save it locally under a different name
wget --output-document=filename.html example.com

3. Download a file and save it in a specific folder
wget --directory-prefix=folder/subfolder example.com

4. Resume an interrupted download previously started by wget itself
wget --continue example.com/big.file.iso

5. Download a file but only if the version on the server is newer than your local copy
wget --continue --timestamping wordpress.org/latest.zip

6. Download multiple URLs with wget. Put the URLs in a text file, one per line, and pass that file to wget.
wget --input-file=list-of-file-urls.txt

7. Download a list of sequentially numbered files from a server (the {1..20} range is expanded by shells such as bash and zsh, not by wget itself)
wget http://example.com/images/{1..20}.jpg

8. Download a web page with all assets – like stylesheets and inline images – that are required to properly display the web page offline.
wget --page-requisites --span-hosts --convert-links --adjust-extension http://example.com/dir/file
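Items 6 and 7 can be combined: instead of relying on brace expansion (plain POSIX shells such as dash do not expand {1..20}), you can write the numbered URLs into the text file that --input-file reads. This is a minimal sketch; the helper name make_url_list, the base URL and the N.jpg naming pattern are hypothetical, so adjust them to the server you are fetching from.

```shell
#!/usr/bin/env bash
# make_url_list BASE FIRST LAST EXT (hypothetical helper):
# print BASE/FIRST.EXT through BASE/LAST.EXT, one URL per line.
make_url_list() {
  local base=$1 first=$2 last=$3 ext=$4
  local i=$first
  while [ "$i" -le "$last" ]; do
    printf '%s/%s.%s\n' "$base" "$i" "$ext"
    i=$((i + 1))
  done
}

# Write http://example.com/images/1.jpg ... 20.jpg to the list file,
make_url_list http://example.com/images 1 20 jpg > list-of-file-urls.txt
# then hand the whole list to wget in a single run:
#   wget --input-file=list-of-file-urls.txt
```

The list file can also be edited or reused later, which is handy when a few numbers in the sequence are missing on the server.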

Mirror websites with Wget

9. Download an entire website including all the linked pages and files
wget --execute robots=off --recursive --no-parent --continue --no-clobber http://example.com/

10. Download all the MP3 files from a subdirectory
wget --level=1 --recursive --no-parent --accept mp3,MP3 http://example.com/mp3/

11. Download all images from a website in a common folder
wget --directory-prefix=files/pictures --no-directories --recursive --no-clobber --accept jpg,gif,png,jpeg http://example.com/images/

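12. Download the PDF documents from a website through recursion but stay within specific domains. The --domains list only takes effect together with --span-hosts, which lets wget leave the starting host.
wget --mirror --span-hosts --domains=abc.com,files.abc.com,docs.abc.com --accept=pdf http://abc.com/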

13. Download all files from a website but exclude a few directories.
wget --recursive --no-clobber --no-parent --exclude-directories /forums,/support http://example.com

Wget for Downloading Restricted Content

Wget can also download content from sites that sit behind a login screen, or from sites that check the HTTP Referer and User-Agent strings of visiting bots to prevent screen scraping.

14. Download files from websites that check the User Agent and the HTTP Referer
wget --referer=http://google.com --user-agent="Mozilla/5.0 Firefox/4.0.1" http://nytimes.com

15. Download files from password-protected sites
wget --http-user=labnol --http-password=hello123 http://example.com/secret/file.zip

16. Fetch pages that are behind a login page. Replace user and password with the names of the site’s actual form fields, and point the URL at the form’s submit (action) page.
wget --cookies=on --save-cookies cookies.txt --keep-session-cookies --post-data 'user=labnol&password=123' http://example.com/login.php
wget --cookies=on --load-cookies cookies.txt --keep-session-cookies http://example.com/paywall
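The two commands in item 16 can be wrapped in a small script so the cookie file and URLs are named only once. This is a sketch under stated assumptions: the helper name fetch_behind_login, the form field names user and password, and the URLs are hypothetical. The values go to --post-data unencoded, so characters such as & or = in a password would need URL-encoding first. Cookies are on by default in wget, so --cookies=on is left out, and --keep-session-cookies is only needed on the request that saves cookies.

```shell
#!/usr/bin/env bash
set -eu  # stop at the first failed command or unset variable

# fetch_behind_login LOGIN_URL PAGE_URL USER PASS (hypothetical helper)
fetch_behind_login() {
  local login_url=$1 page_url=$2 user=$3 pass=$4
  local jar=cookies.txt   # cookie file shared by both requests

  # Step 1: post the login form and save the session cookies it sets.
  # The login response itself is thrown away via /dev/null.
  wget --save-cookies "$jar" --keep-session-cookies \
       --post-data "user=${user}&password=${pass}" \
       --output-document=/dev/null "$login_url"

  # Step 2: send the saved cookies with the request for the protected page.
  wget --load-cookies "$jar" "$page_url"
}

# Usage:
#   fetch_behind_login http://example.com/login.php http://example.com/paywall labnol 123
```

Like the original commands, this puts the password on the command line, where other users of the same machine may see it in the process list.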

Retrieve File Details with wget

17. Find the size of a file without downloading it (look for the Content-Length header in the response; the size is in bytes)
wget --spider --server-response http://example.com/file.iso

18. Download a file and display the content on screen without saving it locally.
wget --output-document=- --quiet google.com/humans.txt

19. Find the last modified date of a web page (check the Last-Modified header in the HTTP response).
wget --server-response --spider http://www.labnol.org/

20. Check the links on your website to ensure that they are working. The spider option will not save the pages locally.
wget --output-file=logfile.txt --recursive --spider http://example.com
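Items 17 and 19 both ask you to scan wget’s header dump by eye. A small shell function can pull out a single header instead. This is a minimal sketch using the standard awk found on most Unix-like systems; the function name header_value is hypothetical and not part of wget. It relies on wget printing the --server-response headers to stderr, hence the 2>&1 in the usage lines.

```shell
#!/usr/bin/env bash
# header_value NAME (hypothetical helper): print the value of HTTP header
# NAME from `wget --server-response` output read on stdin.
# - Header names are matched case-insensitively.
# - If redirects produce several responses, the last match wins.
# - Runs of spaces inside the value are collapsed to a single space.
header_value() {
  awk -v want="$1" '
    tolower($1) == (tolower(want) ":") {
      $1 = ""              # drop the "Name:" field
      sub(/^ +/, "")       # trim the space left where it was
      sub(/\r$/, "")       # tolerate CRLF line endings
      value = $0
    }
    END { if (value != "") print value }'
}

# Usage (wget prints the headers on stderr, hence 2>&1):
#   wget --spider --server-response http://example.com/file.iso 2>&1 | header_value Content-Length
#   wget --spider --server-response http://www.labnol.org/ 2>&1 | header_value Last-Modified
```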

Wget – How to be nice to the server?

Wget is essentially a spider that scrapes (or leeches) web pages, and some web hosts may block such spiders through their robots.txt files. Also, wget will not follow links on web pages that use the rel=nofollow attribute.

You can, however, force wget to ignore robots.txt and the nofollow directives by adding the switch --execute robots=off to your wget commands. If a web host blocks wget requests by looking at the User-Agent string, you can always fake it with the --user-agent=Mozilla switch.

The wget command will put additional strain on the site’s server because it will continuously traverse the links and download files. A good scraper would therefore limit the retrieval rate and also include a wait period between consecutive fetch requests to reduce the server load.

wget --limit-rate=20k --wait=60 --random-wait --mirror example.com
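If you crawl regularly, these politeness settings can live in a wgetrc startup file instead of being retyped on every command. This is a minimal sketch; the file name polite.wgetrc is a hypothetical choice. limit_rate, wait and random_wait are the wgetrc equivalents of --limit-rate, --wait and --random-wait, and wget reads the file named by the WGETRC environment variable in place of ~/.wgetrc.

```shell
#!/usr/bin/env bash
# Write a wgetrc with the same throttling as:
#   wget --limit-rate=20k --wait=60 --random-wait ...
cat > polite.wgetrc <<'EOF'
# Cap bandwidth at 20 KB/s
limit_rate = 20k
# Base pause of 60 seconds between requests
wait = 60
# Vary each pause between 0.5x and 1.5x the base wait
random_wait = on
EOF

# Point wget at this file instead of ~/.wgetrc for the rest of the session.
export WGETRC="$PWD/polite.wgetrc"

# From now on, a plain mirror is throttled without extra flags:
#   wget --mirror example.com
```

Load the script with source (for example `. ./polite.sh`, if you saved it under that hypothetical name) so the exported variable stays set in your current shell. The global wgetrc, if your system has one, is still read first.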

In the above example, we have limited the download bandwidth to 20 KB/s. Because --random-wait varies each pause between 0.5 and 1.5 times the --wait value, wget will wait anywhere between 30 and 90 seconds before retrieving the next resource.

Finally, a little quiz. What do you think this wget command will do?
wget --span-hosts --level=inf --recursive dmoz.org

Wolfram Alpha Answers Queries That Google Can’t

Wolfram Alpha can quickly solve a much wider array of problems and, unlike Google, it doesn’t require you to type queries in any particular syntax.

Want to know the current time in London, or how much is 10 pounds in grams? Google can directly answer some of these common questions without requiring you to sift through pages and pages of links.

However, Google is not the only tool that offers instant answers. Wolfram Alpha, a search engine developed by Stephen Wolfram, can solve a much wider array of problems and, unlike Google, it doesn’t require you to type queries in any particular syntax. Here are some examples:

1. Your Location

A query like “where am i” will reveal your IP address and your current geographic location. Alternative queries that return the same information about your computer include “who am i” and “what is my ip.”

2. Date and Time

You would normally need an Excel spreadsheet, and perhaps a few formulas, to perform basic calculations involving dates and times, but not with Wolfram Alpha. The tool lets you work with dates using natural English (similar to Outlook Calendar).

Time Difference between Dates

To calculate the number of days until your next holiday, use “how many days until <holiday name>”. You can subtract dates like regular numbers or compute new dates with natural phrases like “second saturday of next month” or “now + 10 days.”

3. Food

Wolfram Alpha can instantly answer most of your food and nutrition questions. How many calories are in a bottle of Coke? Which is healthier, the french fries served at Burger King or the ones at McDonald’s?

4. Time Zones

You probably know that a query like “time in <city>” will display the current time in that city. This works in most search engines, but Wolfram Alpha has an additional feature that works in reverse.

You can specify a time in any city and it will convert that time into your local time zone. This is handy when a client suggests a meeting time in their time zone and you have to quickly figure out whether it works for you.

5. Astronomy

If you are an astronomy fan, you’ll love Wolfram Alpha: it can compute the positions of stars and planets for any given day.

The tool can tell you the exact dates of astronomical events, such as the next solar eclipse, while a more specific query like “solar eclipse in new york” will show the date of the next eclipse visible from New York.

6. Finance

Want to know how many people work for a particular company? Wolfram Alpha can get this information and more using simple queries like “market cap of Apple” or “revenue of Google.” You can also use the tool to look up past stock prices and indices.

7. Colors

What do you get when you pour some red paint in a bucket of yellow paint? What’s the HTML and RGB equivalent of Purple?

8. Comparisons

Wolfram|Alpha is an excellent tool for performing comparisons and it presents results in a neat table making it easy for you to interpret the data.

You may compare almost anything and everything from airports, universities, size of popular structures (Statue of Liberty vs Eiffel Tower), quantities (10 lb vs 12 kg), stock quotes, sales tax rates in various cities, sports teams, and even standard paper sizes.

9. Weather

Most search engines offer a weather forecast for the next 7 or 10 days, but with Wolfram Alpha you can also get the historical weather conditions for a city on any given date.

And unlike Google which will only give you sunrise and sunset times for the next day, Wolfram Alpha can compute that information for any past and future date.

10. Understand Relations

This is one of my favorites. Type in any complex family relationship (like your mother’s sister’s son’s wife’s father) and Wolfram will map it into a genealogical tree, making it easier for you to make sense of that relation.

These are just some uses of the very awesome Wolfram|Alpha. Check the Wolfram|Alpha examples page for more ideas, and consider adding Wolfram|Alpha to your browser’s search box.

Free Up Space in Gmail by Backing Up Everything First

When Gmail Overflows

Q. My Gmail account has filled up to 97 percent of my free allotment, and I am too cheap to pay for more space. Is there a way to send all this to another Gmail account as a backup and free up space? Help, I only have a few days to spare!

A. If you’ve maxed out the seven gigabytes of space you currently get with a Gmail account, there is a way to sling your old mail into another account, but it takes a few steps.

First, sign up for a new Gmail account at http://www.gmail.com and make note of the new address and password.

Next, log into your original Gmail account. At the top of the mail page, click the Settings link, and on the next screen, click the Forwarding and POP/IMAP link. In the POP Download area of the page, click the button next to Enable POP download for all mail. In the pop-up menu right below, choose what you want to do with the old messages. If you pick the “keep Gmail’s copy in the Inbox” option, you can later go back and delete the nonessential messages in your original account to reclaim space. (If you don’t need the old mail in this account at all, another option here automatically deletes the messages from the Gmail inbox after they are downloaded elsewhere.) Click the Save Changes button when finished.

Next, log into your new Gmail account and click the Settings link. Click the Accounts and Import link. Skip the “Import mail and contacts” area and go to the “Check mail using POP3” area, then click the button to add a new POP3 e-mail account. In the box that pops up, fill in your original Gmail account name and password and click the Add Account button. (The server address is pop.gmail.com.) Gmail then imports all the messages from the original account. If you have several gigabytes of mail, it could take hours to fetch it all.

In your original account, you can now manually delete the less important messages. Finally, return to the Settings screen and disable the POP download function so new mail stays in your original account.

Or you could pay Google $20 for 80 gigabytes of storage, if that seems easier. (Google has more information about pricing and importing mail from other accounts in its help section.)