Wednesday, November 13, 2024

How to Use curl in a Bash Script to Download News Articles for Offline Viewing


Ever wish you could save your favorite news articles to read offline, whether for an airplane ride or a cozy, no-WiFi spot? Using curl with a Bash script is a great way to download web content in seconds. This guide will show you how to create a script to fetch articles with curl, then add some options for those who want to personalize the process.


Why Use curl?

curl is a command-line tool for transferring data to and from servers. It’s fast, flexible, and widely supported, which makes it perfect for downloading articles, images, and more with a simple script.
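For example, a single command fetches one page, follows any redirects, and saves it locally (the URL here is just a placeholder):

# Follow redirects (-L) and write the page to article1.html
curl -L "https://example.com/news/article1" -o article1.html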


Step 1: Setting Up the Script

First, let’s set up our script to download a list of articles. Create a file called download_articles.sh:

nano download_articles.sh

Copy this starter script into your file:

#!/bin/bash

# Directory to save downloaded articles
SAVE_DIR="$HOME/Articles"
mkdir -p "$SAVE_DIR"  # Create the directory if it doesn't exist

# List of article URLs to download
URLS=(
    "https://example.com/news/article1"
    "https://example.com/news/article2"
    "https://example.com/news/article3"
)

This code saves articles to an Articles folder in your home directory and sets up a list of example URLs to download.


Step 2: Download Articles Using curl

To download each article, we’ll loop through our list of URLs and use curl to fetch each one. We’ll use basename to give each file a meaningful name based on its URL:

for URL in "${URLS[@]}"; do
    # Extract a filename from the URL
    FILENAME=$(basename "$URL").html

    # Download the article with curl
    curl -L "$URL" -o "$SAVE_DIR/$FILENAME"
    echo "Downloaded: $FILENAME"
done

Here:

  • basename extracts the last part of the URL (e.g., article1), and we add .html for readability (a quick check is shown after this list).
  • curl -L follows any redirects (handy for news sites).
  • -o "$SAVE_DIR/$FILENAME" saves the file in our Articles folder with a unique name.
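To see what basename does here, you can try it directly in your shell with one of the placeholder URLs:

# basename keeps only the final path segment of its argument
basename "https://example.com/news/article1"
# prints: article1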

Make the script executable, then run it:

chmod +x download_articles.sh
./download_articles.sh

Step 3: Customizing the Script for Extra Flexibility

Want to give your script a personal touch? Here are some easy modifications to make it even more versatile.

A. Prompting for URLs

You can allow users to enter URLs each time they run the script:

echo "Enter URLs to download (separate by spaces):"
read -a URLS  # Take input as an array

Now, when the script runs, it prompts the user for article URLs and downloads each one.
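If you'd like to guard against an empty line (read -a accepts one without complaint), here's a small sketch that wraps the prompt in a retry loop; the loop itself is my addition, not part of the original script:

# Re-prompt until at least one URL has been entered
while true; do
    echo "Enter URLs to download (separate by spaces):"
    read -a URLS
    [ "${#URLS[@]}" -gt 0 ] && break
    echo "Please enter at least one URL."
done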

B. Setting a Custom Save Directory

To allow users to specify a custom directory:

echo "Enter the directory to save articles (default is $SAVE_DIR):"
read CUSTOM_SAVE_DIR
SAVE_DIR="${CUSTOM_SAVE_DIR:-$SAVE_DIR}"  # Use default if no input
mkdir -p "$SAVE_DIR"  # Ensure the directory exists

Now, the user can specify a directory to save articles, or press Enter to use the default Articles folder.
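One caveat: read does not expand ~, so an answer like ~/News is treated as a literal folder name. A minimal fix, assuming only a leading tilde needs handling, is to expand it yourself before applying the default:

read CUSTOM_SAVE_DIR
CUSTOM_SAVE_DIR="${CUSTOM_SAVE_DIR/#\~/$HOME}"  # Expand a leading ~ to the home directory
SAVE_DIR="${CUSTOM_SAVE_DIR:-$SAVE_DIR}"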

C. Downloading in Different Formats (PDF)

Some websites offer PDF versions of articles. You can list those URLs instead; just note that the loop from Step 2 appends .html to every filename, so adjust or drop that suffix when saving PDFs:

URLS=(
    "https://example.com/news/article1.pdf"
    "https://example.com/news/article2.pdf"
)

Tip: Sites that offer downloadable PDFs usually end their links with .pdf or provide a download button on the page.
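If you mix PDF and HTML links in one list, you can also ask the server for the file type rather than hard-coding an extension. A minimal sketch, reusing the URL variable from the Step 2 loop (the two case branches are illustrative, not exhaustive):

# Choose the file extension from the server's Content-Type header
CONTENT_TYPE=$(curl -sIL "$URL" | grep -i '^content-type:' | tail -n 1)
case "$CONTENT_TYPE" in
    *application/pdf*) EXT="pdf" ;;
    *)                 EXT="html" ;;
esac
NAME=$(basename "$URL")
FILENAME="${NAME%.*}.$EXT"  # Strip any existing extension before adding ours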

D. Adding a Timestamp to Filenames

To avoid overwriting files, you could add a timestamp to each filename:

TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
FILENAME="${TIMESTAMP}_$(basename "$URL").html"

This will give you unique filenames like 20241113_123456_article1.html, which are organized by date and time.

Step 4: Advanced Options with curl

For those who want even more control, here are a few useful curl options you can add to your script:

  • Silent Mode (-s): Hides curl’s progress meter. Add it to curl -sL ... for cleaner output; pair it with -S if you still want error messages shown.
  • Retry on Failure (--retry): Retries the download in case of failure.

Example:

curl -sL --retry 3 "$URL" -o "$SAVE_DIR/$FILENAME"

This will attempt the download up to 3 times in case of network issues.
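Two more flags worth knowing (the values below are illustrative, so tune them to your connection): --max-time caps how long a single download may run, and -A sends a browser-style User-Agent header, which some news sites expect before they will serve a page.

# Retry up to 3 times, give up after 30 seconds, and send a browser-like User-Agent
curl -sL --retry 3 --max-time 30 \
     -A "Mozilla/5.0 (X11; Linux x86_64)" \
     "$URL" -o "$SAVE_DIR/$FILENAME"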


Step 5: Putting It All Together

Here’s the final version of the script with all the modifications:

#!/bin/bash

echo "Enter the directory to save articles (default is ~/Articles):"
read CUSTOM_SAVE_DIR
SAVE_DIR="${CUSTOM_SAVE_DIR:-$HOME/Articles}"
mkdir -p "$SAVE_DIR"

echo "Enter URLs to download (separate by spaces):"
read -a URLS

for URL in "${URLS[@]}"; do
    TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
    FILENAME="${TIMESTAMP}_$(basename "$URL").html"
    curl -sL --retry 3 "$URL" -o "$SAVE_DIR/$FILENAME"
    echo "Downloaded: $FILENAME"
done

Step 6: Running the Script

Run your script with:

./download_articles.sh

The script will prompt you for a save location and URLs, download the articles, and save them with timestamped filenames. You’re all set to read offline!


Wrapping Up

With just a few lines of Bash and curl, you’ve got a personalized article downloader! Try expanding this further by adding features like automatic updates, downloading images or other media, or even sending files to a Kindle.
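For the automatic-updates idea, one route is cron. Note that this assumes a non-interactive variant of the script with the URLs hard-coded, since cron can’t answer prompts, and the path below is a placeholder:

# Fetch fresh copies every morning at 7:00 (add this line via: crontab -e)
0 7 * * * /path/to/download_articles.sh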

Now you’re ready to enjoy your offline reading sessions, all powered by a simple script.