Removing Useless Warnings Inserted into E-Mails

Some time ago I had a bit of an issue with e-mails being modified by replacing links with so-called safelinks, which are effectively trackers, make it impossible to check links, and make sharing links considerably less secure, among other issues. Meanwhile, these people, so concerned about my security, have started to add headers to e-mails warning me whenever an e-mail is not from Cardiff University.

It takes about one day to get used to it and ignore it, so the whole attempt is ineffective, as far as I can tell. More recently, they have also started to put these headers in front of signed and even encrypted messages. The message itself is placed in a MIME attachment as a forwarded message. Luckily, at least for now, it is not modified further, even if the whole practice is, in principle, altering records. It still makes handling encrypted messages and verifying signed messages more complicated for no reason (any e-mail program could just as well display a warning for every incoming e-mail…). Overall, the effect is that e-mails are now even less secure and even more hassle.

Well, we can fix it, easily, with a mail filter (and, as a side effect, add the domains doing this to the list of potential attackers… I leave that to you in whatever system you are using there). It’s an extension of the original filter from the above post, fixing the links and filtering the messages. So far, this has worked reliably and can simply be used in procmail or a similar e-mail filter. Here it is (it needs Python 3 and the BeautifulSoup package; urllib and email come with the standard library, see imports).

#!/usr/bin/env python3
#
# email_cardiff_filter.py - fix Cardiff University e-mail security problems.
# Version 0.6
# Copyright (C) 2019-2021 Frank C Langbein, frank@langbein.org
#
# Dedicated to all Cardiff University system admins who waste my time.
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
# published by the Free Software Foundation, either version 3 of the
# License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU Affero General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with this program.  If not, see <https://www.gnu.org/licenses/>.

import sys
import argparse
import re
import urllib.parse
import email
from os.path import expanduser
from bs4 import BeautifulSoup

def replace_urls(text):
  # Fix malicious safelink links/trackers in text and return modified string.
  # (Parameter is named text so that it does not shadow the builtin str.)
  safelinks_pattern = re.compile(r'https?://[a-zA-Z0-9.-]*\.safelinks\.protection\.outlook\.com/\?url=((?:[^&]|%[0-9a-fA-F]{2})+)&[-a-zA-Z0-9+/&;=%.]*')
  res = ''
  pos = 0
  for match in safelinks_pattern.finditer(text):
    res += text[pos:match.start()] + urllib.parse.unquote(match.expand(r'\1'))
    pos = match.end()
  return res+text[pos:]

def fix_text(msg, ty, cte_default):
  # Fix text/{html,plain} blocks by removing modified text and fixing urls
  #    msg         - email message block
  #    ty          - type of block (expects text/html or text/plain)
  #    cte_default - default content transfer encoding, from e-mail header
  # Result is update to msg.

  # Decode block
  cs = str(msg.get_content_charset("utf-8"))
  text = msg.get_payload(decode=True).decode(cs, 'ignore').strip()

  # Check transfer encoding and make sure it's in header
  cte_flag = False
  for k in reversed(range(len(msg._headers))):
    if msg._headers[k][0].lower() == 'content-transfer-encoding':
      cte = msg._headers[k][1].lower()
      cte_flag = True
  if not cte_flag:
    if len(text) == len(text.encode()):
      msg.add_header("Content-Transfer-Encoding","7bit")
      cte="7bit"
    else:
      msg.add_header("Content-Transfer-Encoding",cte_default)
      cte=cte_default

  # Cleanup text
  if ty == "text/html":
    soup = BeautifulSoup(text, 'html.parser')
    for illegal in soup.find_all(string=re.compile('External *email.*Cardiff *University'), limit=1):
      # Move to top-level of block (block is right after body)
      while illegal is not None and illegal.parent is not None and illegal.parent.name != "body":
        p = illegal.parent
        # Move up, making sure we remain first child (otherwise text is later in message, so not removed)
        k = 0
        while p.contents[k] == '\n':
          k = k + 1
        if p.contents[k] == illegal:
          illegal = p
        else:
          illegal = None
      # Remove illegal text block (if at start of message)
      if illegal is not None:
        for nxt in illegal.find_next_siblings(limit=2):
          if nxt.name == "br":
            nxt.decompose()
        illegal.decompose()
    text = str(soup)
  else:
    text = str(re.sub(r'^External *email.*Cardiff *University.*dolenni\.[\r\n]*', '', text, flags=re.S))

  # Fix URLs
  text = replace_urls(text)

  # Re-encode block
  try:
    msg.set_payload(text,cs)
  except UnicodeEncodeError:
    msg.set_payload(text,"utf-8")
  if cte == "base64":
    # Check if encoding worked and if not, switch to quoted-printable.
    # Needed as sometimes a base64 transfer-encoding header seems to be ignored.
    new_text = msg.get_payload(decode=True).decode(cs, 'ignore').strip()
    if new_text != text:
      msg.replace_header("Content-Transfer-Encoding","quoted-printable")
      try:
        msg.set_payload(text,cs)
      except UnicodeEncodeError:
        msg.set_payload(text,"utf-8")

def extract_cardiff_forward(msg):
  # Check if the multi-part mime message msg contains an actual signed or encrypted
  # forwarded e-mail, and this is just an attempt to modify the message that no one
  # actually cares about but makes decryption and validation of signatures harder,
  # so makes e-mail less secure.

  # Pattern indicating records have been altered
  idi_pattern_start = re.compile(r'^External *email.*Cardiff *University')
  idi_end = ' ddolenni.'

  # Find content modification
  idi_found = False
  for p in msg.walk():
    ty = p.get_content_type()
    if idi_found and ty == "message/rfc822":
      msg = str(re.sub(r'^Content-Type: .*[\r\n][\r\n]', '', str(p)))
      return email.message_from_string(msg)
    elif ty == "text/plain":
      # Decode block
      text = p.get_payload(decode=True).decode(str(p.get_content_charset("utf-8")), 'ignore').strip()
      if idi_pattern_start.match(text) and text[-len(idi_end):] == idi_end:
        # Danger, message modified - fixing
        idi_found = True

  return msg # Nope, other reason

if __name__ == '__main__':
  # Arguments parsing, only for basic housekeeping
  parse = argparse.ArgumentParser(description='Fix security problems with Cardiff University e-mails.')
  args = parse.parse_args()

  # Read message and process parts
  msg = email.message_from_file(sys.stdin)
  cte = msg.get("Content-Transfer-Encoding","7bit")
  # Do not modify if signed
  mod = True
  ct = msg.get_content_type()
  if ct[0:10] == "multipart/":
    msg = extract_cardiff_forward(msg) # Check if they've attempted to modify a signed or encrypted message
  ct = msg.get_content_type()
  if ct == "multipart/signed":
    mod = False # Do not modify signed messages
  else:
    for p in msg.walk():
      ty = p.get_content_type()
      if ty == "application/pgp-signature":
        mod = False
  if mod:
    # Fix e-mail
    for p in msg.walk():
      ty = p.get_content_type()
      if ty == "text/html" or ty == "text/plain":
        # Fix message text and links in text
        fix_text(p, ty, cte)

  # Cleanup headers
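  # Look for headers.lst in the current directory first, then in ~/etc/email
  hdrs = None
  for path in ('./headers.lst', expanduser('~/etc/email/headers.lst')):
    try:
      with open(path) as f:
        hdrs = [l.strip() for l in f]
      break
    except OSError:
      pass
  if hdrs is None:
    # Report on stderr and exit non-zero so the caller can detect the failure
    print("Cannot read headers.lst", file=sys.stderr)
    sys.exit(1)

  # Remove all headers that are not in the list
  for k in reversed(range(len(msg._headers))):
    if msg._headers[k][0].lower() not in hdrs:
      del msg._headers[k]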

  print(msg)
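
To see what replace_urls does in isolation, here is a minimal sketch that applies the same regular expression to a made-up sample string. The helper name unwrap and the sample URL are assumptions for illustration only; re.sub with a callback is used here instead of the manual loop, which should give the same result.

```python
import re
import urllib.parse

# Same pattern as in replace_urls above; the sample URL below is made up.
SAFELINKS = re.compile(
    r'https?://[a-zA-Z0-9.-]*\.safelinks\.protection\.outlook\.com/\?url='
    r'((?:[^&]|%[0-9a-fA-F]{2})+)&[-a-zA-Z0-9+/&;=%.]*')

def unwrap(text):
    # Replace each safelink by the percent-decoded original URL (group 1).
    return SAFELINKS.sub(lambda m: urllib.parse.unquote(m.group(1)), text)

sample = ('See https://eur01.safelinks.protection.outlook.com/'
          '?url=https%3A%2F%2Fexample.org%2Fa%3Fb%3D1&data=04%7C01&reserved=0 now')
print(unwrap(sample))  # prints: See https://example.org/a?b=1 now
```

Text without safelinks passes through unchanged, so the function can be applied to every text block without checking first.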

The script also reads a headers.lst file and removes any headers not in that list – I cannot trust what is in those other headers and do not need them, so I simply remove them. It’s simple to comment this out. The headers.lst file is generated and regularly updated with this script (just run as a cronjob from time to time; make sure you have curl installed).

#!/bin/bash
#
# email_headers_update - get list of headers for mail/mime
# Version 0.2
# Copyright (C) 2019,2021 Frank C Langbein, Cardiff University, frank@langbein.org
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
# published by the Free Software Foundation, either version 3 of the
# License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU Affero General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with this program.  If not, see <https://www.gnu.org/licenses/>.

# Use headers.lst in the current directory if present, otherwise ~/etc/email
test -r ./headers.lst || { mkdir -p ~/etc/email && cd ~/etc/email; }

cat >headers.tmp <<EOD
from
subject
date
to
return-path
envelope-to
delivery-date
received
dkim-signature
domainkey-signature
message-id
mime-version
content-type
thread-topic
thread-index
x-originating-ip
x-autoresponse-suppress
x-originatororg
x-sa-exim-connect-ip
x-sa-exim-mail-from
x-sa-exim-version
x-sa-exim-scanned
x-spam-checker-version
x-spam-level
x-spam-status
EOD

test -f headers.lst && cat headers.lst >>headers.tmp

curl -s https://www.iana.org/assignments/message-headers/perm-headers.csv | tail -n +2 | while read l; do
  h="`echo $l | cut -d, -f1 | tr 'A-Z' 'a-z'`"
  p="`echo $l | cut -d, -f3 | tr 'A-Z' 'a-z'`"
  test "$p" = "mail" -o "$p" = "mime" && echo $h >>headers.tmp
done

sort -u <headers.tmp >headers.lst
rm -f headers.tmp
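
The header clean-up itself can also be tried on its own. The following is a rough sketch, not the code from the filter: it uses the public email.message interface (keys() and del) rather than the private _headers list, and the function name keep_headers, the sample message and the two-entry allow list are made up for the example.

```python
import email

def keep_headers(msg, allowed):
    # Delete every header whose lower-cased name is not in `allowed`.
    # `del msg[name]` removes all occurrences of that header.
    for name in set(msg.keys()):
        if name.lower() not in allowed:
            del msg[name]
    return msg

# Made-up sample message with one header that is not on the list.
raw = ("From: a@example.org\n"
       "X-Tracking-Id: 12345\n"
       "Subject: hello\n"
       "\n"
       "body\n")
msg = keep_headers(email.message_from_string(raw), {"from", "subject"})
print(msg.as_string())
```

In practice the allow list would be the lines read from headers.lst, which the update script already stores in lower case.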

It should be relatively simple to adapt to other setups if you have some basic coding skills.
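
When adapting, the part most likely to need changes is recognising the inserted warning and unwrapping the forwarded message. As a starting point, here is a simplified sketch of that idea, assuming the warning is a text/plain part that comes before a message/rfc822 attachment. The function name unwrap_forward, the banner_start parameter and the test messages are hypothetical; unlike extract_cardiff_forward above, it does not check the end of the warning text and it takes the attached message directly from the parsed part instead of re-parsing a string.

```python
from email.message import EmailMessage

def unwrap_forward(msg, banner_start="External email"):
    # Return the attached message if a warning text part precedes it,
    # otherwise return msg unchanged.
    banner_seen = False
    for part in msg.walk():
        ctype = part.get_content_type()
        if ctype == "text/plain" and not banner_seen:
            charset = part.get_content_charset("utf-8")
            text = part.get_payload(decode=True).decode(charset, "ignore")
            banner_seen = text.lstrip().startswith(banner_start)
        elif ctype == "message/rfc822" and banner_seen:
            # A message/rfc822 part holds the attached message as payload 0.
            return part.get_payload(0)
    return msg

# Build a made-up wrapped message: warning text plus the original attached.
inner = EmailMessage()
inner["Subject"] = "original"
inner.set_content("signed or encrypted payload would be here")

outer = EmailMessage()
outer["Subject"] = "wrapped"
outer.set_content("External email to Cardiff University - take care.")
outer.add_attachment(inner)

print(unwrap_forward(outer)["Subject"])
```

For another organisation, changing banner_start to whatever text is being inserted there may already be enough, but that needs checking against real messages.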

I am not doing anything about the web interface for this. It should be simple to hide the messages in question, or even open the encrypted/signed attachments, with a Greasemonkey/Tampermonkey script. But I’ve not been on that web interface for eternities and do not intend to return to it. Just store all messages locally, remove them from the uncontrollable server and have some minimal peace.

(And sorry, I cannot support Windows or macOS… the above should help, but I can’t do more for these hopeless platforms).

The license for both scripts is AGPL-3.0-or-later.

The angry little girl on the feature image for this article comes from here: https://tenor.com/view/mad-angry-angry-girl-angry-little-girl-gif-11979588. It’s a near-perfect match.

Cite this page as 'Frank C Langbein, "Removing Useless Warnings Inserted into E-Mails," Ex Tenebris Scientia, 26th February 2021, https://langbein.org/removing-useless-warnings-inserted-into-e-mails/ [accessed 27th July 2024]'.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Frank C Langbein
Ex Tenebris Scientia
