Copy this starter script into your file:
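A minimal sketch of such a starter script, assuming Bash. The array name `URLS` and the example.com addresses are placeholders chosen for illustration; replace them with real article links.

```shell
#!/usr/bin/env bash

# Folder where downloaded articles are stored.
SAVE_DIR="$HOME/Articles"

# Create the folder if it does not exist yet (-p makes this safe to repeat).
mkdir -p "$SAVE_DIR"

# Example URLs to download (placeholders, not real articles).
URLS=(
  "https://example.com/news/article1"
  "https://example.com/news/article2"
  "https://example.com/news/article3"
)
```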
This code saves articles to an Articles folder in your home directory and sets up a list of example URLs to download.
Step 2: Download Articles Using curl
To download each article, we’ll loop through our list of URLs and use curl to fetch each one. We’ll use basename to give each file a meaningful name based on its URL:
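The loop might look like the sketch below. The setup lines are repeated so the snippet runs on its own, and the `URLS` array and example.com addresses are assumed placeholders. The `|| echo ...` part is an addition that reports a failed download instead of stopping silently.

```shell
#!/usr/bin/env bash

SAVE_DIR="$HOME/Articles"
mkdir -p "$SAVE_DIR"

# Placeholder URLs for illustration.
URLS=(
  "https://example.com/news/article1"
  "https://example.com/news/article2"
)

for URL in "${URLS[@]}"; do
  # basename keeps the text after the last slash, e.g. "article1".
  FILENAME="$(basename "$URL").html"

  # -L follows redirects; -o writes the page to the given file.
  curl -L -o "$SAVE_DIR/$FILENAME" "$URL" || echo "Could not download $URL" >&2
done
```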
Here:
- basename extracts the last part of the URL (e.g., article1), and we add .html for readability.
- curl -L follows any redirects (handy for news sites).
- -o "$SAVE_DIR/$FILENAME" saves the file in our Articles folder with a unique name.
Run the script by making it executable and then executing it:
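Assuming the script was saved as `download_articles.sh` (a hypothetical filename; use whatever name you chose), the two commands would be:

```shell
chmod +x download_articles.sh   # one-time step: mark the file as executable
./download_articles.sh          # run it from the folder that contains it
```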
Step 3: Customizing the Script for Extra Flexibility
Want to give your script a personal touch? Here are some easy modifications to make it even more versatile.
A. Prompting for URLs
You can allow users to enter URLs each time they run the script:
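One way to do this, sketched with Bash's built-in `read`. The function name `read_urls`, the prompt text, and the `URLS` array name are illustrative assumptions.

```shell
#!/usr/bin/env bash

# Read one line of space-separated URLs into the URLS array.
# -r keeps backslashes literal, -p shows a prompt, -a fills an array.
read_urls() {
  read -r -p "Enter article URLs (separated by spaces): " -a URLS
}

read_urls
echo "Received ${#URLS[@]} URL(s)."
```

Replace the hard-coded list with this prompt, and the download loop can stay exactly as it is.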
Now, when the script runs, it prompts the user for article URLs and then downloads each one.
B. Setting a Custom Save Directory
To allow users to specify a custom directory:
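A possible version is sketched below. The variable names `DEFAULT_DIR` and `USER_DIR` and the prompt wording are assumptions made for this example.

```shell
#!/usr/bin/env bash

DEFAULT_DIR="$HOME/Articles"

# Ask for a folder; pressing Enter leaves USER_DIR empty.
read -r -p "Directory to save articles [$DEFAULT_DIR]: " USER_DIR

# ${USER_DIR:-$DEFAULT_DIR} falls back to the default when the answer is empty.
SAVE_DIR="${USER_DIR:-$DEFAULT_DIR}"

mkdir -p "$SAVE_DIR"
echo "Saving articles to $SAVE_DIR"
```

Note that `read` does not expand `~`, so a full path such as /home/you/News works more reliably than ~/News.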
Now, the user can specify a directory to save articles, or press Enter to use the default Articles folder.
C. Downloading in Different Formats (PDF)
Some websites offer PDF versions of articles. If so, you can specify these URLs to download PDFs instead of HTML:
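As a rough sketch, the script can keep the .pdf extension when a link already has one and add .html otherwise. The helper name `filename_for_url` and the example.com links are hypothetical.

```shell
#!/usr/bin/env bash

SAVE_DIR="$HOME/Articles"
mkdir -p "$SAVE_DIR"

# Placeholder URLs: one PDF link and one regular page.
URLS=(
  "https://example.com/papers/report1.pdf"
  "https://example.com/news/article2"
)

# Print the filename to use for a URL: keep an existing .pdf extension,
# otherwise add .html as before.
filename_for_url() {
  local name
  name=$(basename "$1")
  case "$name" in
    *.pdf) echo "$name" ;;
    *)     echo "$name.html" ;;
  esac
}

for URL in "${URLS[@]}"; do
  FILENAME=$(filename_for_url "$URL")
  curl -L -o "$SAVE_DIR/$FILENAME" "$URL" || echo "Could not download $URL" >&2
done
```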
Tip: Sites that offer downloadable PDFs often use links ending in .pdf or provide a download button on the page.
D. Adding a Timestamp to Filenames
To avoid overwriting files, you could add a timestamp to each filename:
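A short sketch of how the filename could be built, using `date` with a year-month-day_hour-minute-second format. The `TIMESTAMP` variable name and the URL are placeholders.

```shell
#!/usr/bin/env bash

URL="https://example.com/news/article1"   # placeholder URL

# Current date and time, e.g. 20241113_123456.
TIMESTAMP=$(date +"%Y%m%d_%H%M%S")

# Put the timestamp in front of the name taken from the URL.
FILENAME="${TIMESTAMP}_$(basename "$URL").html"

echo "$FILENAME"
```

The timestamp has one-second resolution, so downloading the same URL twice within the same second would still reuse the name.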
This will give you unique filenames like 20241113_123456_article1.html, which are organized by date and time.
Step 4: Advanced Options with curl
For those who want even more control, here are a few useful curl options you can add to your script:
- Silent Mode (-s): Hides curl’s progress bar. Add it to curl -sL ... if you want a cleaner output.
- Retry on Failure (--retry): Retries the download in case of failure.
Example:
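A sketch combining both options for a single download. The URL is a placeholder, and the error message after `||` is an addition that is useful because -s also hides curl's own error output.

```shell
#!/usr/bin/env bash

SAVE_DIR="$HOME/Articles"
mkdir -p "$SAVE_DIR"

URL="https://example.com/news/article1"   # placeholder URL
FILENAME="$(basename "$URL").html"

# -s hides the progress bar, -L follows redirects,
# --retry 3 tries again up to three times after a transient error.
curl -sL --retry 3 -o "$SAVE_DIR/$FILENAME" "$URL" || echo "Could not download $URL" >&2
```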
This will retry the download up to 3 times if it fails because of a transient problem such as a network timeout.
Step 5: Putting It All Together
Here’s the final version of the script with all the modifications:
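The following is one way the pieces could fit together, offered as a sketch and not a definitive version. It assumes Bash, and the prompt wording and variable names (`DEFAULT_DIR`, `USER_DIR`, `URLS`, `TIMESTAMP`, `NAME`) are illustrative choices.

```shell
#!/usr/bin/env bash
# Download articles with curl: asks for a save directory and a list of URLs,
# then saves each one under a timestamped filename.

DEFAULT_DIR="$HOME/Articles"

# Ask where to save; an empty answer falls back to the default folder.
read -r -p "Directory to save articles [$DEFAULT_DIR]: " USER_DIR
SAVE_DIR="${USER_DIR:-$DEFAULT_DIR}"
mkdir -p "$SAVE_DIR"

# Ask for the URLs on one line, separated by spaces.
read -r -p "Enter article URLs (separated by spaces): " -a URLS

for URL in "${URLS[@]}"; do
  TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
  NAME=$(basename "$URL")

  # Keep .pdf links as PDFs; treat everything else as HTML.
  case "$NAME" in
    *.pdf) FILENAME="${TIMESTAMP}_${NAME}" ;;
    *)     FILENAME="${TIMESTAMP}_${NAME}.html" ;;
  esac

  echo "Downloading $URL"
  # -s: no progress bar, -L: follow redirects, --retry 3: retry transient errors.
  if curl -sL --retry 3 -o "$SAVE_DIR/$FILENAME" "$URL"; then
    echo "Saved $SAVE_DIR/$FILENAME"
  else
    echo "Could not download $URL" >&2
  fi
done
```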
Step 6: Running the Script
Run your script with:
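Assuming the hypothetical filename `download_articles.sh` and that the file has already been made executable with chmod +x:

```shell
./download_articles.sh
```

The script then asks for a save directory and the article URLs, as set up in Step 3.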