Microsoft Outlook / Office365 URL Rewriting in E-Mails

Microsoft Outlook / Office365 rewrites URLs in e-mails as part of its Advanced Threat Protection service. The intention, presumably, is to protect users from malicious links in e-mails by warning them before they go to the link location. It seems to be a premium service over which domain administrators have some control, but users do not appear to be able to change it. Changing e-mail bodies is not just highly annoying; this approach also has serious security and privacy issues, as noted before. Imagine your snail mail not just arriving opened, but edited, with some parts crossed out and replaced by very hard to decipher text, and sometimes someone coming back to do some more editing. While this is not a recent development and I would normally not use Office365 anyway, I am forced to deal with it as Cardiff University’s e-mail runs via this service. Of course, it can be solved by undoing the edits.

The Behaviour

If you send an e-mail containing links, such as

https://langbein.org/

to an e-mail address managed by outlook/office365, the links it contains are rewritten, e.g., to

https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flangbein.org%2F&data=01%7C01%7CLangbeinFC%40c*********.uk%7C0a***********332dbe008d679455a9c%7Cbdb74b3095684856bdbf06759778fcbc%7C1&sdata=B8QMlN4M***********z%2Bq0lVkNypUu%2B0BYsN0mv2k4%3D&reserved=0

(this is for demonstration only, the above link has been modified so it does not actually work). So the links are redirected via some host under *.safelinks.protection.outlook.com to check them and log access. In a text e-mail, the links are changed directly. In an HTML e-mail, the links’ HREF target is changed, but the text remains the original text/link. Signed or encrypted e-mails are not modified, so there is a simple way to protect links you send to people. However, most users do not seem to use this, sadly.
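As a minimal sketch of what the rewrite does to a link, the original target can be read back from the url query parameter with the python3 standard library. The link below is a shortened form of the masked demonstration link; the function name original_url is only chosen for illustration, and the sketch assumes the format shown above does not change.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened form of the masked demonstration link (it does not resolve).
SAFELINK = (
    "https://emea01.safelinks.protection.outlook.com/"
    "?url=https%3A%2F%2Flangbein.org%2F"
    "&data=01%7C01%7CLangbeinFC%40c*********.uk&reserved=0"
)

def original_url(link):
    """Return the wrapped target of a safelink, or the link unchanged."""
    parts = urlsplit(link)
    host = parts.hostname or ""
    if not host.endswith(".safelinks.protection.outlook.com"):
        return link
    # parse_qs percent-decodes the values; "url" holds the original target
    values = parse_qs(parts.query).get("url")
    return values[0] if values else link

print(original_url(SAFELINK))  # prints https://langbein.org/
```

Links that do not point to a safelinks host are returned untouched, so the function can be applied to any link found in a message.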

The Problem

With the greatest respect, I shall assume that this has been implemented due to complete incompetence combined with the desire to protect users and the assumption that such users are quite dumb, rather than malicious intent. However, the result is the same and such behaviour means that those ~~pretending~~ trying to protect users have to be treated as malicious, too.

It actually reduces security, as it makes checking links manually by users more difficult. Malicious links are usually long, hidden in the HREF attribute, etc. to try to hide the fact that they point to a malicious site. Turning links into long strings that are very hard to parse manually means it becomes nearly impossible for users to detect that themselves. It further suggests to users that someone is actually checking those links for them, so links may be more trusted again and users may not check anymore at all. It is, however, very unlikely that Microsoft, or anyone else, is able to check all links. Also, valid links, in particular in HTML e-mails, may all of a sudden look like malicious links to those users who keep checking.
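To illustrate the last point: a user (or a tool) that compares the visible link text with the HREF target will flag every rewritten link in an HTML e-mail, because the text still shows the original URL while the target points to the safelinks host. The following is only a rough sketch of such a check using the python3 standard library; the class and function names are made up for this example and the heuristic (text that starts with http:// or https:// but differs from the target) is an assumption, not how any particular mail client works.

```python
from html.parser import HTMLParser

class LinkAudit(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element."""
    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

def mismatched(html):
    """Return links whose visible text looks like a URL but differs from the href."""
    audit = LinkAudit()
    audit.feed(html)
    return [(h, t) for h, t in audit.links
            if t.startswith(("http://", "https://")) and t != h]

REWRITTEN = ('<a href="https://emea01.safelinks.protection.outlook.com/'
             '?url=https%3A%2F%2Flangbein.org%2F&amp;reserved=0">https://langbein.org/</a>')
for href, text in mismatched(REWRITTEN):
    print("text:", text, "-> target:", href)
```

A perfectly harmless link is reported here, which is exactly the pattern users are usually told to be suspicious of.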

Worse, it is a serious privacy violation, as it enables administrators of the e-mail domains/safelink sites, and anyone managing to gain access to these legally or illegally, to track the links users are following. Of course the administrators, etc. can also check e-mails directly, but this makes it not just simpler, but also enables them to check which links were followed. So there is a very high potential for leaking information the users do not want anyone else to know. And what if that URL is a link to share sensitive information via links with hash sums to gain access? These links are then available in even more databases than just the e-mail (while the e-mail may be deleted, the logs may not).

Even worse, if the e-mail with a modified link is forwarded to someone else or published somewhere, then others using that link are also tracked, via the forwarding user’s reference. It does not just violate the original user’s privacy, but also anyone they forward the e-mail to is tracked (also meaning it is actually not possible to say who accessed the link). The issues on security are similarly transferred to those users. And how long will the links work? E.g. if such a link makes it into a publicly archived mailing list, then the links may easily become unusable eventually (maybe even hopefully, once the Microsoft domain goes down).
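The reference to the original recipient is visible in the rewritten link itself. As a small sketch, decoding the data parameter of the masked demonstration link from "The Behaviour" gives a list of fields separated by |, one of which is clearly the recipient's e-mail address. What the other fields mean is not documented here and I am not guessing; the function name data_fields is only for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# Masked demonstration link from "The Behaviour" (the asterisks are from the text).
LINK = (
    "https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flangbein.org%2F"
    "&data=01%7C01%7CLangbeinFC%40c*********.uk%7C0a***********332dbe008d679455a9c"
    "%7Cbdb74b3095684856bdbf06759778fcbc%7C1"
    "&sdata=B8QMlN4M***********z%2Bq0lVkNypUu%2B0BYsN0mv2k4%3D&reserved=0"
)

def data_fields(link):
    """Return the percent-decoded, |-separated fields of the data parameter."""
    query = parse_qs(urlsplit(link).query)
    return query.get("data", [""])[0].split("|")

for field in data_fields(LINK):
    print(field)
```

Anyone who receives the forwarded link can read that address, and anyone who follows the link does so under it.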

The final question is if it actually enhances security, even if not all sites are classified correctly by that link tracking service. As signed and encrypted e-mails are not checked, there is a simple way around it: attackers can sign their e-mails or encrypt them for users for which keys are available. They could even create URLs that look similar to the rewritten links, making unsuspecting users even more likely to follow them without checking. While signing/encrypting adds some costs for an attacker, the general practice of rewriting URLs may actually mean they get more usable results. Moreover, one cannot actually expect anyone to be able to say whether a link is malicious or not and some links may be marked as malicious because they are inconvenient for whoever does the classification. So the whole approach is flawed, creating more rather than fewer problems.

Fix It!

If the service provider becomes the enemy, then we have to fix it. The basic approach is, of course, to undo the rewriting in one way or another. There are problems with that in case the undo actually damages the link. Below are two solutions: a mail filter rewriting the links for forwarded mail; a greasemonkey / tampermonkey script for modifying the links live on the office365 site. These scripts have not been tested for all situations, so please use them carefully and completely at your own risk (of damaging or losing your e-mails). Alternatively, one could block or reroute the safelink URLs by changing the DNS records in a local DNS server and using a different host to respond to the requests.

Forward and Reverse

If you read your e-mail on another server, e.g. by collecting it with fetchmail or by forwarding it, then it is quite simple to add a filter that reverses the rewrite. Here is a simple python3 script that can be used with a mail filter (maildrop, sieve, procmail, etc.; apply it to mails where the body contains safelinks.protection.outlook.com or the e-mail contains a header like x-ms-exchange-safelinks-url-keyver) that should reverse the URLs to their original form. This comes with a warning that it may damage some links and e-mails, as the pattern used may not work for all scenarios.

#!/usr/bin/python3
#
# ms_safelink_filter.py - substitute microsoft e-mail safelinks with original links
# Version 0.4
# Copyright (C) 2019 Frank C Langbein, Cardiff University, frank@langbein.org
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
# published by the Free Software Foundation, either version 3 of the
# License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU Affero General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with this program.  If not, see <https://www.gnu.org/licenses/>.

import sys
import argparse
import os
import re
import datetime
import urllib.parse
import quopri
import email

parse = argparse.ArgumentParser(description='Substitute Microsoft e-mail safelinks with original links.')
parse.add_argument('--store', metavar='dir', type=str, default='', help='directory to store url files, if given')
args = parse.parse_args()

pattern = re.compile(r'https?://[a-zA-Z0-9.-]*\.safelinks\.protection\.outlook\.com/\?url=((?:[^&]|%[0-9a-fA-F]{2})+)&[-a-zA-Z0-9+/&;=%.]*')
urls = []

def repl_url(text):
  # Replace every safelink in text by its percent-decoded original URL
  res = ''
  pos = 0
  for match in pattern.finditer(text):
    if args.store != '':
      # remember the rewritten link for later checks
      urls.append(text[match.start():match.end()])
    res += text[pos:match.start()] + urllib.parse.unquote(match.expand(r'\1'))
    pos = match.end()
  return res+text[pos:]

def fix_text(msg):
  # Decode the text part, undo the link rewriting and store it unencoded
  cs = str(msg.get_content_charset("utf-8"))
  t = repl_url(msg.get_payload(decode=True).decode(cs, 'ignore').strip())
  # remove all existing Content-Transfer-Encoding headers, if any
  del msg['Content-Transfer-Encoding']
  # check after the rewrite, as decoded URLs may contain non-ASCII characters
  if len(t) == len(t.encode()):
    msg.add_header("Content-Transfer-Encoding","7bit")
  else:
    msg.add_header("Content-Transfer-Encoding","8bit")
  msg.set_payload(t,cs)

msg = email.message_from_file(sys.stdin)
for p in msg.walk():
  ty = p.get_content_type()
  if ty == "text/html" or ty == "text/plain":
    fix_text(p)

if len(urls) > 0:
  with open(os.path.join(args.store,"urls-" + datetime.datetime.now().strftime("%Y-%m-%d")),'a+') as f:
    for url in urls:
      f.write(url+"\n")

print(msg)
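The filter only needs to run on mails that were actually rewritten. As a rough sketch of the condition mentioned above (a header like x-ms-exchange-safelinks-url-keyver, or the safelinks host appearing in a text part), the check could look as follows in python3. The function name has_safelinks and the sample message are made up for illustration; in practice the mail filter (maildrop, sieve, procmail) would usually do this test itself.

```python
import email

HOST = "safelinks.protection.outlook.com"
HEADER = "x-ms-exchange-safelinks-url-keyver"

def has_safelinks(msg):
    """True if the message carries the safelinks header or a rewritten link."""
    # header lookup in email.message.Message is case-insensitive
    if msg.get(HEADER) is not None:
        return True
    for part in msg.walk():
        if part.get_content_type() in ("text/plain", "text/html"):
            # decode=True undoes base64 / quoted-printable transfer encodings
            payload = part.get_payload(decode=True) or b""
            if HOST.encode() in payload:
                return True
    return False

sample = email.message_from_string(
    "Subject: test\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "\n"
    "See https://emea01.safelinks.protection.outlook.com/"
    "?url=https%3A%2F%2Flangbein.org%2F&reserved=0\n"
)
print(has_safelinks(sample))  # prints True
```

Checking the decoded payload rather than the raw mail matters, as a base64 encoded body would otherwise hide the host name from a simple text match.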

Improve the Office365 Web Application

If you are using the web interface for outlook/office365, then there is greasemonkey / tampermonkey, which runs additional javascripts on specific websites. A script to change the URLs on the outlook pages, like the python script above, is relatively simple to write. I do not use this myself, so this is minimally tested, but should work in principle.

// ==UserScript==
// @name         ms_safelink_filter
// @namespace    https://langbein.org/
// @version      0.1.1
// @description  Substitute microsoft e-mail safelinks with original link
// @author       Frank C Langbein, Cardiff University, frank@langbein.org
// @copyright    AGPL 3, https://www.gnu.org/licenses/
// @include      https://outlook.office365.com/*
// @grant        none
// ==/UserScript==

(function() {
    // Simple patterns for link rewrite
    var host = 'safelinks.protection.outlook.com'; // rewrite links that match this
    var before = '?url='; // start of URL in link
    var after = '&'; // text after URL in link

    // Decode percent-encoding (UTF-8 aware, unlike the deprecated unescape);
    // keep the raw text if it is malformed
    function decode(s) {
        try { return decodeURIComponent(s); } catch (e) { return s; }
    }

    // Rewrite link href and text
    function rwLink(link) {
        var ndx = link.href.indexOf(host);
        if (ndx != -1) {
            ndx = link.href.indexOf(before);
            if (ndx != -1) {
                var nLnk = link.href.substring(ndx + before.length);
                ndx = nLnk.indexOf(after);
                if (ndx != -1) nLnk = nLnk.substring(0, ndx);
                link.href = decode(nLnk);
            }
        }
        ndx = link.textContent.indexOf(host);
        if (ndx != -1) {
            ndx = link.textContent.indexOf(before);
            if (ndx != -1) {
                var nTxt = link.textContent.substring(ndx + before.length);
                ndx = nTxt.indexOf(after);
                if (ndx != -1) nTxt = nTxt.substring(0, ndx);
                link.textContent = decode(nTxt);
            }
        }
    }

    // Execute rewrite on new/modified links if page is modified
    // (MutationObserver replaces the deprecated DOMNodeInserted mutation event)
    new MutationObserver(function (mutations) {
        mutations.forEach(function (mutation) {
            mutation.addedNodes.forEach(function (node) {
                if (!(node instanceof HTMLElement)) return;
                if (node instanceof HTMLAnchorElement) rwLink(node);
                var links = node.getElementsByTagName('a');
                for (var i = 0; i < links.length; rwLink(links[i++]));
            });
        });
    }).observe(document.documentElement, { childList: true, subtree: true });

    // Initially rewrite links
    var links = document.getElementsByTagName('a');
    for (var i = 0; i < links.length; rwLink(links[i++]));
})();

Control your DNS...

Approaches like piHole redirect DNS queries to a local server, usually serving empty or small files instead of the originally intended ones. This is useful to block ads and any tracking sites, etc. As *.safelinks.protection.outlook.com is effectively a tracking site, it should be added to the domains blocked. This is easy to do with piHole and, similarly, running your own DNS server enables you to do this directly in the DNS server configuration.

Of course you could also redirect such links to your own server and process them by filtering or forwarding them to the intended site. This is also simple to code and may avoid the URL rewriting issues due to pattern matching problems above and would cover any access from within the network using the DNS server. Yet, if the format of the redirects changes, this may stop working and it does not solve the problem of actually having long, quite useless links in e-mails, forwarding and publishing these.
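A minimal sketch of such a local redirect server, using only the python3 standard library, could look like this. It assumes the link format shown in "The Behaviour" (the original URL in the url parameter); the names redirect_target, SafelinkRedirect and serve, as well as the address and port, are placeholders for illustration. As the rewritten links use https, a real setup would additionally need TLS with a certificate the clients accept for the safelinks host names, which is not shown here.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit, parse_qs

def redirect_target(path):
    """Extract the original URL from a request path such as '/?url=...&data=...'."""
    values = parse_qs(urlsplit(path).query).get("url")
    if not values:
        return None
    target = values[0]
    # only forward to http(s) targets; anything else is refused
    return target if urlsplit(target).scheme in ("http", "https") else None

class SafelinkRedirect(BaseHTTPRequestHandler):
    def do_GET(self):
        target = redirect_target(self.path)
        if target is None:
            self.send_error(400, "No usable url parameter")
            return
        # send the browser straight to the original link
        self.send_response(302)
        self.send_header("Location", target)
        self.end_headers()

def serve(host="127.0.0.1", port=8080):
    """Run the redirect server until interrupted (address and port are placeholders)."""
    HTTPServer((host, port), SafelinkRedirect).serve_forever()
```

Calling serve() starts the server; the local DNS server then has to resolve the safelinks host names to the machine it runs on. Unlike the pattern matching in the mail filter, this leaves the parsing of the query string to the standard library.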

Postscript

A more or less useful script to check the links, for info.

#!/usr/bin/python3
#
# ms_safelink_check.py - check microsoft safelinks
# Version 0.2.4
# Copyright (C) 2019 Frank C Langbein, Cardiff University, frank@langbein.org
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
# published by the Free Software Foundation, either version 3 of the
# License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU Affero General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with this program.  If not, see <https://www.gnu.org/licenses/>.

import sys
import argparse
import os
import requests
import re
import random
import time
import urllib.parse
import datetime

pattern_link = re.compile(r'https?://[a-zA-Z0-9.-]*\.safelinks\.protection\.outlook\.com/\?url=((?:[^&]|%[0-9a-fA-F]{2})+)&[-a-zA-Z0-9+/&;=%.]*')
pattern = re.compile(r'https?://[a-zA-Z0-9.-]*\.safelinks\.protection\.outlook\.com/')

aws_url = "**REPLACE_URL**/check_redirect"
aws_api_key = "**REPLACE_KEY**"

parse = argparse.ArgumentParser(description='Check microsoft safelinks')
parse.add_argument('path', metavar='dir', type=str, default='', help='directory with url files')
parse.add_argument('--remove_older', metavar='N', type=int, default='30', help='remove url files older than N days [default: 30]')
parse.add_argument('--max_repeat', metavar='R', type=int, default='1', help='maximum time to repeat requests [default: 1]')
parse.add_argument('--min_repeat', metavar='r', type=int, default='1', help='minimum time to repeat requests [default: 1]')
parse.add_argument('--max_sleep', metavar='S', type=float, default='1', help='maximum seconds to sleep between repeats [default: 1]')
parse.add_argument('--min_sleep', metavar='s', type=float, default='0.1', help='minimum seconds to sleep between repeats [default: 0.1]')
parse.add_argument('--blocked', metavar='p', type=str, default='', help='call when URL is blocked')
parse.add_argument('--aws', action='store_true', help='check url via aws')
args = parse.parse_args()

today = "urls-" + datetime.datetime.now().strftime("%Y-%m-%d")

rep = random.randint (args.min_repeat,args.max_repeat)
while rep > 0:
  for file in [f for f in os.listdir(args.path) if os.path.isfile(os.path.join(args.path, f)) and f != today ]:
    fn  = os.path.join(args.path,file)
    if os.stat(fn).st_mtime < time.time() - args.remove_older * 86400:
      os.remove(fn)
    else:
      with open(fn) as f:
        urls = f.read().splitlines()
      for url in urls:
        real_url = url  # fall back to the stored link if the pattern does not match
        for match in pattern_link.finditer(url):
          real_url = urllib.parse.unquote(match.expand(r'\1'))
        if args.aws:
          try:
            req = requests.get(aws_url, headers = {"x-api-key":aws_api_key,'x-url':url})
            if req.text == 'BLOCK':
              if args.blocked != '':
                os.system(args.blocked+" '"+url+"'")
              print(req.status_code, "Blocked: " + real_url)
            elif req.text == 'FAIL':
              print(req.status_code, "Failed: " + real_url)
            else:
              print(req.status_code, req.text)
          except:
            pass
        else:
          try:
            req = requests.get(url, allow_redirects=False)
            while req.status_code == 302 and pattern.match(req.headers['Location']) != None:
              req = requests.get(req.headers['Location'], allow_redirects=False)
            try:
              print(req.status_code, req.headers['Location'])
            except:
              if args.blocked != '':
                os.system(args.blocked+" '"+url+"'")
              print(req.status_code, "Blocked: " + real_url)
          except:
            print(sys.exc_info())
            pass
  # one pass over all files is done; count it even if the directory is empty
  rep -= 1
  if rep > 0:
    time.sleep(random.uniform(args.min_sleep,args.max_sleep))

# AWS Lambda - check_redirect
# Useful if blocked/piHoled locally
#
# import re
# from botocore.vendored import requests
#
# pattern = re.compile(r'https?://[a-zA-Z0-9.-]*\.safelinks\.protection\.outlook\.com/')
#
# def lambda_handler(event, context):
#   url = event['headers']['x-url']
#   try:
#     req = requests.get(url, allow_redirects=False)
#     while req.status_code == 302 and pattern.match(req.headers['Location']) != None:
#       req = requests.get(req.headers['Location'], allow_redirects=False)
#     try:
#       res = req.headers['Location']
#     except:
#       res = "BLOCK"
#   except:
#     res = "FAIL"
#     pass
#   return { 'statusCode': 200, 'body': res }

# python2 stub to check blocked URLs
#
# import sys
# import mechanize # not (yet?) supported by python3
#                  # could use selenium instead (more complex to setup)
#
# br = mechanize.Browser()
# br.set_handle_equiv(True)
# br.set_handle_redirect(True)
# br.set_handle_referer(True)
# br.set_handle_robots(False)
# br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
# br.set_cookiejar(mechanize.LWPCookieJar())
# 
# res = br.open(sys.argv[1])
#
# try:
#   br.select_form(nr=0)
#   br.submit ()
#   # Process result...
# except:
#   pass

Cite this page as 'Frank C Langbein, "Microsoft Outlook / Office365 URL Rewriting in E-Mails," Ex Tenebris Scientia, 13th January 2019, https://langbein.org/microsoft-outlook-office365-url-rewriting-in-e-mails/ [accessed 27th July 2024]'.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Frank C Langbein
Ex Tenebris Scientia