Automate Blog Monitoring

Staying on top of all things marketing can feel like trying to drink from a firehose. New blog posts, articles, and updates pop up faster than you can blink, making it a challenge to keep track of what’s new.

But what if you could have a digital assistant that automatically monitors your favorite blogs and alerts you the moment new content is posted?

That’s exactly what we have here—a simple yet powerful Python app that keeps its finger on the pulse of your favorite websites so you never miss a beat.

What Does the App Do?

This Python-based app is a digital watchdog that monitors a list of URLs for new blog posts. Every day, it checks these URLs to see if a fresh post has been published. If it finds something new, it sends an alert directly to your email inbox with the title and a link—so you can read it right away.

  • Get Instant Updates: Know immediately when new content is published, so you can be the first to share it, comment, or take action.
  • Save Time: Automate the repetitive task of checking websites daily. Spend that time doing something more productive instead.
  • Stay Ahead of Competitors: For marketers and content creators, being first to spot and react to new trends can set you apart from the crowd.

Where You Can Get It

I’ve open-sourced this initial version, which is set up to monitor a single URL. All the code is on GitHub if you want to take it and build on it.

The Nerdy Details

1. Choosing the Libraries

I leveraged a few key Python libraries to make this work:

  • Requests: To fetch the HTML content from the URLs we’re monitoring.
  • BeautifulSoup: To parse the HTML and find the exact element that signals a new post.
  • SQLite: A lightweight database to store the last known post from each site.
  • smtplib: To send email alerts whenever a new post is detected.
  • schedule: To automate the daily checks, ensuring the app runs like clockwork.
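For example, a minimal scheduling loop with the schedule library might look like the sketch below. The check_all_sites function name is a placeholder of mine, standing in for one full round of monitoring:

import time
import schedule

def check_all_sites():
    # One full round of monitoring: fetch, parse, compare, alert.
    # (See the sketches in the sections below.)
    ...

# Run the check once a day at 9:00 AM.
schedule.every().day.at("09:00").do(check_all_sites)

while True:
    schedule.run_pending()
    time.sleep(60)  # wake once a minute to see if a job is due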

2. Setting Up the Project

I started by organizing the project into a neat directory: the main Python script (main.py), a text file listing the URLs (urls.txt), and a SQLite database (monitor.db) to store data. I also set up a configuration file (config.py) to handle email settings securely.
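As a rough illustration, config.py might hold the email settings along these lines. The variable names here are placeholders I’ve chosen, not necessarily what the repo uses:

# config.py -- keep email credentials out of the main script.
SMTP_SERVER = "smtp.gmail.com"
SMTP_PORT = 465                             # SSL port for Gmail
SENDER_EMAIL = "you@gmail.com"
SENDER_APP_PASSWORD = "your-app-password"   # a Gmail App Password, not your login password
RECIPIENT_EMAIL = "you@gmail.com"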

3. Writing the Code

The code is straightforward yet powerful. I wrote functions to fetch and parse the latest blog post from each URL, then compared the current post to the last recorded one stored in the SQLite database. If a new post is found, the app sends an email alert right away.
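Here’s a condensed sketch of that flow, assuming the config and helpers described elsewhere in this post. The "h2 a" selector is purely illustrative; the right selector depends on each site’s markup:

import sqlite3
import requests
from bs4 import BeautifulSoup

DB_PATH = "monitor.db"

def get_latest_post(url):
    # Fetch the page and return (title, link) for its newest post.
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    first_link = soup.select_one("h2 a")  # placeholder selector
    return first_link.get_text(strip=True), first_link["href"]

def check_site(url):
    title, link = get_latest_post(url)
    with sqlite3.connect(DB_PATH) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS posts (url TEXT PRIMARY KEY, title TEXT)"
        )
        row = conn.execute("SELECT title FROM posts WHERE url = ?", (url,)).fetchone()
        if row is None or row[0] != title:
            # New post: remember it, then alert (send_alert is sketched below).
            conn.execute(
                "INSERT OR REPLACE INTO posts (url, title) VALUES (?, ?)",
                (url, title),
            )
            send_alert(title, link)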

4. Automating with Cron

To make sure the app checks for new posts daily without manual intervention, I set up a cron job. This task scheduler runs the script automatically at 9:00 AM every day, so you get your updates without lifting a finger.

There are a number of ways to do this, but here’s how I’ve set up the cron job to run locally on my Mac. It runs the script from the app directory and appends success/error output to a cron_log file.

0 9 * * * cd /local_path_to_your_app_directory && /path_to_python /local_path_to_your_app_directory/main.py >> /local_path_to_your_app_directory/cron_log.txt 2>&1

5. Handling Permissions and Security

On macOS, I had to grant Terminal and Python the necessary permissions to run without interruption. This step was crucial to ensure the app operates smoothly in the background.

Note: For email delivery, I went with a simple Gmail approach, but you can build on that. For Gmail, you will have to enable two-factor authentication on the sender account so that you can create a secure App Password. That App Password then goes into the config.py file.
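A minimal version of the alert function, assuming the config.py values sketched earlier, could look like this:

import smtplib
from email.message import EmailMessage

import config  # the settings file sketched above

def send_alert(title, link):
    # Email a simple "new post" notification via Gmail.
    msg = EmailMessage()
    msg["Subject"] = f"New blog post: {title}"
    msg["From"] = config.SENDER_EMAIL
    msg["To"] = config.RECIPIENT_EMAIL
    msg.set_content(f"{title}\n{link}")

    # Gmail's SMTP login needs the App Password (2FA must be enabled).
    with smtplib.SMTP_SSL(config.SMTP_SERVER, config.SMTP_PORT) as server:
        server.login(config.SENDER_EMAIL, config.SENDER_APP_PASSWORD)
        server.send_message(msg)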

Another important note on running the cron job locally: it took a bit of trial and error to get it running properly on macOS.

At first, the urls.txt file couldn’t be found when the job ran. I realized this error came down to cron operating in a different environment and working directory than a manual run from the terminal.

Cron jobs, especially those running as the root user, operate in a default working directory (/private/var/root), which was different from where the script and urls.txt were located.

The key change that helped resolve the issue was switching to absolute paths for urls.txt and the log file. This ensured that regardless of the working directory, the script could always locate the necessary files.
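One way to do this in Python, assuming urls.txt and the database live next to main.py, is to anchor every path to the script’s own location rather than hard-coding strings:

from pathlib import Path

# Resolve files relative to the script itself, so cron's default working
# directory (e.g. /private/var/root) no longer matters.
BASE_DIR = Path(__file__).resolve().parent
URLS_FILE = BASE_DIR / "urls.txt"
DB_PATH = BASE_DIR / "monitor.db"
LOG_FILE = BASE_DIR / "cron_log.txt"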