Scratching the itch (reading things later)

Last night, for no particular reason, I thought about my digital read/review pile. I've been a happy user of Pocket for a while now, but I've noticed some friction in saving and reading articles there. Curious, I looked for alternatives. The most promising was EmailThis, which looked perfect. In brief, you give it a URL and it emails the readable text to you. Since I already have a "read_review" folder in my email, that would mean I would have one less "inbox" to check for articles.

I signed up for it and tried to get it to accept my email address. Unfortunately EmailThis never sent me a confirmation email. I even signed up for one year of Premium at $19 because I was certain this was going to be the solution. Nothing. I was so close to having what I wanted, but I was denied because of some technical issues on their end.

Then I thought "there has to be a library for this under Python". Indeed, I was one of the maintainers for the breadability package, which has sadly gone moribund because of lack of interest and other factors. I searched for alternatives to it and came up with one that met the criteria I was looking for:

Maintained
Semi-current
Didn't require me to learn the totality of scraping a web-page to get readable content from it

That package was readability-lxml, which, while being a port of the Ruby version of the JavaScript version of readability, did enough of what I needed it to do. Namely, take a web page and reformat it.

This, of course, was not the project that I needed to undertake at 11pm, but start it I did. I took a few URLs and ran them through. The results were promising, but I needed to do a little bit more to make the program work 100%.

This morning I tackled the rest of it (getting Python to send an email, putting together a rudimentary interface, etc.). By 11am I managed to get something working and working well. So much so that it not only scratched the itch that I'd attributed to EmailThis but also scratched the itch of Pocket as well.

Here's the code (released into the public domain because a) it works for me, and b) I don't want to fix the internet one site at a time. If this works for you, great! If not, great! I did my time helping with breadability, and most of that was finding corner cases where things didn't work properly.):

readlater.py (released into the public domain)

#!/usr/bin/env python3

import click
import smtplib
import ssl
import requests
from readability import Document
from config import HEADERS, receiver_email, smtp_server, port, sender_email
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


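def send_email(content, site):
    # Certificate verification is skipped here, which suits a localhost
    # server with a self-signed certificate; ssl.create_default_context()
    # is the safer choice for any other mail server.
    context = ssl._create_unverified_context()
    try:
        # The with-block closes the connection even if sending fails, and
        # avoids calling quit() on a server that never connected.
        with smtplib.SMTP(smtp_server, port) as server:
            server.ehlo()  # Can be omitted
            server.starttls(context=context)  # Secure the connection
            server.ehlo()  # Can be omitted
            msg = MIMEMultipart()
            msg['From'] = f"Read Later <{sender_email}>"
            msg['To'] = receiver_email
            msg['Subject'] = site
            # summary() returns the cleaned-up article body as HTML
            msg.attach(MIMEText(content.summary(), 'html'))
            text = msg.as_string()
            server.sendmail(sender_email, receiver_email, text)
    except Exception as e:
        # Print any error messages to stdout
        print(e)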


def fetch(site):
    response = requests.get(site, headers=HEADERS)
    content = Document(response.text)
    return content


@click.command()
@click.argument('site')
def main(site):
    content = fetch(site)
    send_email(content, site)


if __name__ == "__main__":
    main()

config.py

HEADERS = {
        "User-Agent":
        "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0"
        }
receiver_email = "craig@localhost"
smtp_server = "localhost"
sender_email = "craig@localhost"
port = 25
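
The settings above target a local mail server. For a remote server that expects a verified TLS connection and a login, something like the following might work. This is only a sketch under assumptions: the function names build_message and send_remote, and the host, port, username, and password parameters, are hypothetical and not part of readlater.py.

```python
import smtplib
import ssl
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_message(html, subject, sender, receiver):
    """Wrap an HTML body in a multipart message (hypothetical helper)."""
    msg = MIMEMultipart()
    msg["From"] = f"Read Later <{sender}>"
    msg["To"] = receiver
    msg["Subject"] = subject
    msg.attach(MIMEText(html, "html"))
    return msg


def send_remote(msg, host, port, username, password):
    """Send msg through a remote server using STARTTLS and a login.

    Assumes the server accepts STARTTLS on the given port (often 587)
    and presents a certificate that the default context can verify.
    """
    context = ssl.create_default_context()  # verifies the certificate
    with smtplib.SMTP(host, port) as server:
        server.starttls(context=context)
        server.login(username, password)
        server.send_message(msg)
```

Keeping the message construction separate from the sending makes it possible to inspect the message without touching a mail server at all.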

Most of the work for this was figuring out how to get mail sent to my creaky localhost mail server. (You will need to modify the code to make it work with another, better set-up mail server. I followed an excellent article at RealPython, Sending Emails with Python.) Once that was done I was able to save webpages with ease. Even better, I exported my Pocket "Saves" and wrote a quick Bash loop to import them. Once I had checked that things were working, I deleted my Pocket account.
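
The original import loop was written in Bash and isn't shown here. As a rough Python equivalent, the sketch below assumes the export is an HTML file full of links (an assumption about the export format, not something confirmed above). The names LinkCollector, extract_urls, and import_all are hypothetical, and save stands in for whatever function emails a single URL.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag that points at an http(s) URL."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href and href.startswith(("http://", "https://")):
                self.urls.append(href)


def extract_urls(export_html):
    """Return the article URLs found in an exported HTML document."""
    collector = LinkCollector()
    collector.feed(export_html)
    return collector.urls


def import_all(urls, save):
    """Call save(url) for each URL; return the URLs whose save raised."""
    failed = []
    for url in urls:
        try:
            save(url)
        except Exception as exc:
            # Keep going so that one broken site doesn't stop the import
            print(f"{url}: {exc}")
            failed.append(url)
    return failed
```

With readlater.py, save could be something like lambda url: send_email(fetch(url), url), and the returned list gives a quick way to see which pages need a second look.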

I'm not recommending this as a solution for most users (heck, most folks have a relationship with email that can be best summarized as "adversarial"), but it's something that works for how my mind and tools work. Now I have only one location to check for read/review instead of two. Now I can use either mutt or Thunderbird for reading any articles that I've saved. And, best of all, I have immediate feedback if something worked or not, as opposed to checking Pocket to realize that it gave up, or captured just a few headers from a broken site. And that's fine: I'm not looking to fix the web, I'm just looking to read articles that are interesting to me. In 12 hours (including sleeping time) I managed to achieve this. Even better, I can update this code as my needs change, and I don't need to maintain subscriptions for the privilege.

Sometimes the best way to scratch an itch is to think about what needs scratching and test those theories to find out if they hold up.

Craig Maloney More than you cared to know
