Last night, for some unknown reason, I thought about my digital read/review pile. I've been a happy user of Pocket for a while now, but I've noticed some friction in saving and reading articles there. Curious, I looked for alternatives. The most promising was EmailThis, which looked like the perfect fit: in brief, you give it a URL and it emails the readable text to you. Since I already have a "read_review" folder in my email, that would mean one less "inbox" to check for articles.
I signed up and tried to get it to accept my email address. Unfortunately, EmailThis never sent me a confirmation email. I even paid $19 for a year of Premium because I was certain this was going to be the solution. Nothing. I was so close to having what I wanted, but I was denied by some technical issue on their end.
Then I thought, "there has to be a Python library for this." In fact, I was one of the maintainers of the breadability package, which has sadly gone moribund because of lack of interest and other factors. I searched for alternatives to it and found one that met my criteria:
- Maintained
- Semi-current
- Didn't require me to learn the totality of scraping a web page to get readable content from it
That package was readability-lxml. It's a port of the Ruby version of the JavaScript version of readability, but it did enough of what I needed it to do: take a web page and reformat it into readable content.
This, of course, was not a project I needed to undertake at 11pm, but start it I did. I ran a few URLs through it. The results were promising, but I needed to do a little more to make the program work 100%.
This morning I tackled the rest of it (getting Python to send an email, putting together a rudimentary interface, etc.). By 11am I had something working, and working well. So well, in fact, that it scratched not only the itch I'd hoped EmailThis would scratch, but Pocket's as well.
Here's the code. It's released into the public domain because a) it works for me, and b) I don't want to fix the internet one site at a time. If this works for you, great! If not, great! I did my time helping with breadability, and most of that was finding corner cases where things didn't work properly.
readlater.py
(released into the public domain)
#!/usr/bin/env python3
import smtplib
import ssl
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

import click
import requests
from readability import Document

from config import HEADERS, receiver_email, smtp_server, port, sender_email


def send_email(content, site):
    # NOTE: this skips TLS certificate verification (presumably because the
    # localhost mail server uses a self-signed certificate). For any remote
    # server, use ssl.create_default_context() instead.
    context = ssl._create_unverified_context()
    try:
        # The with-block sends QUIT on exit, and never calls quit() on a
        # connection that failed to open (the old finally-block could).
        with smtplib.SMTP(smtp_server, port) as server:
            server.ehlo()  # Can be omitted
            server.starttls(context=context)  # Secure the connection
            server.ehlo()  # Can be omitted
            msg = MIMEMultipart()
            msg['From'] = f"Read Later <{sender_email}>"
            msg['To'] = receiver_email
            msg['Subject'] = site
            # Document.summary() returns the readable article body as HTML.
            msg.attach(MIMEText(content.summary(), 'html'))
            server.sendmail(sender_email, receiver_email, msg.as_string())
    except Exception as e:
        # Print any error messages to stdout
        print(e)


def fetch(site):
    # A timeout keeps an unresponsive site from hanging the script forever.
    response = requests.get(site, headers=HEADERS, timeout=30)
    # Fail loudly on 4xx/5xx instead of emailing an error page.
    response.raise_for_status()
    return Document(response.text)


@click.command()
@click.argument('site')
def main(site):
    content = fetch(site)
    send_email(content, site)


if __name__ == "__main__":
    main()
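When debugging a set-up like this, it can help to build the message separately from sending it, so you can print it without touching a mail server at all. Below is a minimal sketch using the standard library's newer `email.message.EmailMessage` API (a swap from the `MIMEMultipart`/`MIMEText` classes in readlater.py). `build_message` is a hypothetical name, and the plain-text part holding only the source URL is an assumption not present in readlater.py, added so text-only views still show where the article came from.

```python
from email.message import EmailMessage


def build_message(html: str, url: str, sender: str, receiver: str) -> EmailMessage:
    """Build (but do not send) a read-later email for one article (hypothetical helper)."""
    msg = EmailMessage()
    msg["From"] = f"Read Later <{sender}>"
    msg["To"] = receiver
    msg["Subject"] = url  # readlater.py also uses the URL as the subject
    # Plain-text part: only the source URL (an assumption), for text-only views.
    msg.set_content(f"Saved from {url}")
    # HTML alternative: the readable article body, e.g. Document.summary().
    msg.add_alternative(html, subtype="html")
    return msg


if __name__ == "__main__":
    # Dry run: print the raw message instead of sending it.
    print(build_message("<p>Hello</p>", "https://example.com/post",
                        "me@localhost", "you@localhost"))
```

Inside `send_email`, the `MIMEMultipart` block could then become `server.send_message(build_message(content.summary(), site, sender_email, receiver_email))`; `send_message()` reads the sender and recipients from the headers.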
config.py
HEADERS = {
    "User-Agent":
        "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0"
}
receiver_email = "craig@localhost"
smtp_server = "localhost"
sender_email = "craig@localhost"
port = 25
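readlater.py talks to an unauthenticated localhost server and skips certificate checks. A hosted mail server usually wants a verified TLS connection and a login instead. The sketch below assumes one common arrangement, implicit TLS on port 465 plus a username and password; your provider's host, port, and login details may differ. `send_via_ssl` and its `smtp_class` parameter are hypothetical names, and the parameter exists only so the function can be exercised without a network.

```python
import smtplib
import ssl
from email.message import EmailMessage


def send_via_ssl(msg: EmailMessage, host: str, user: str, password: str,
                 port: int = 465, smtp_class=smtplib.SMTP_SSL) -> None:
    """Send msg over implicit TLS after logging in (hypothetical helper).

    Assumes the server accepts TLS from the first byte (commonly port 465).
    """
    # Unlike readlater.py, this context *does* verify the server certificate.
    context = ssl.create_default_context()
    # The with-block sends QUIT and closes the connection even if sending fails.
    with smtp_class(host, port, context=context) as server:
        server.login(user, password)
        # send_message() takes the envelope sender/recipients from the headers.
        server.send_message(msg)
```

Keeping the password out of config.py (for example, reading it from an environment variable) avoids committing it alongside the code.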
Most of the work was figuring out how to get mail sent through my creaky localhost mail server. (You will need to modify the code to make it work with another, better set-up mail server; I followed an excellent Real Python article, "Sending Emails with Python".) Once that was done, I was able to save web pages with ease. Even better, I exported my Pocket "Saves" and wrote a quick Bash loop to import them. Once I had checked that everything was working, I deleted my Pocket account.
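The Bash loop itself isn't shown here. As a hypothetical Python equivalent of its first step, the sketch below pulls the URLs out of a Pocket export so they can be fed to readlater.py one at a time. It assumes the export is an HTML file with one `<a href="...">` link per saved article; if your export comes in another format (for example CSV), the parsing step will differ. `urls_from_export` is a hypothetical name.

```python
from html.parser import HTMLParser


class _LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag names and unescapes attribute values.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.urls.append(href)


def urls_from_export(html: str) -> list[str]:
    """Return unique link targets from an (assumed) HTML export, in order."""
    collector = _LinkCollector()
    collector.feed(html)
    collector.close()
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(collector.urls))
```

Each URL can then be handed to the script, for example with `subprocess.run([sys.executable, "readlater.py", url])` inside a `for` loop, which is roughly what a shell loop over the URLs would do.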
I'm not recommending this as a solution for most users (heck, most folks have a relationship with email that is best summarized as "adversarial"), but it works for how my mind and tools work. Now I have only one location to check for read/review instead of two, and I can use either mutt or Thunderbird to read any articles I've saved. Best of all, I get immediate feedback on whether something worked, as opposed to checking Pocket only to realize that it gave up or captured just a few headers from a broken site. And that's fine: I'm not looking to fix the web, I'm just looking to read articles that interest me. In 12 hours (including sleeping time) I managed to achieve this. Even better, I can update this code as my needs change, and I don't need to maintain subscriptions for the privilege.
Sometimes the best way to scratch an itch is to think about what needs scratching and test those theories to find out if they hold up.