Building an Email Archiving System: The Challenges and of Course the Solution - Part 1

Feb 4, 2019

Published by Jeff Goldstein

Category: Email


Over a year ago I wrote a blog post on how to retrieve copies of emails for archival and viewing, but I did not broach the actual storing of the email or related data. More recently I wrote a post on storing all of the event data for an email (when it was sent, opens, clicks, bounces, unsubscribes, etc.) for auditing purposes, but chose not to create any supporting code.

With the increasing use of email in regulatory environments, I decided it was time to start a new project that pulls all of this together, with code samples for storing the email body and all of its associated data. Over the next year, I will continue to build on this project with the aim of creating a working storage and viewing application for archived emails and all of the log information produced by SparkPost. SparkPost does not have a system that archives the email body, but it does make building an archival platform fairly easy.

In this blog series, I will describe the process I went through to store the email body on S3 (Amazon's Simple Storage Service) and all relevant log data in MySQL for easy cross-referencing. Ultimately, this is the starting point for building an application that allows easy searching of archived emails and then displays those emails along with the event (log) data. The code for this project can be found in the following GitHub repository: https://github.com/jeff-goldstein/PHPArchivePlatform

This first entry in the blog series describes the challenge and lays out an architecture for the solution. The remaining posts will detail portions of the solution along with code samples.

The first step was to figure out how I was going to obtain a copy of the email sent to the original recipient. To obtain a copy of the email body, you need to do one of the following:

1. Capture the email body before sending the email
2. Get the email server to store a copy
3. Have the email server create a copy for you to store

If the email server adds items such as link tracking or open tracking, you cannot use option 1, because the captured body will not reflect the open/click tracking changes.

That means either the server has to store the email or somehow offer you a copy of that email to store. Since SparkPost does not have a storage mechanism for email bodies but does have a way to create a copy of the email, we will have SparkPost send us a duplicate of the email for us to store in S3.

This is done with SparkPost's Archive feature, which lets the sender tell SparkPost to send a duplicate of the email to one or more email addresses, using the same tracking and open links as the original. The SparkPost documentation defines the Archive feature in the following manner:

Recipients in the archive list will receive an exact replica of the message that was sent to the RCPT TO address. In particular, any encoded links intended for the RCPT TO recipient will be identical in the archive messages

The only differences from the RCPT TO email are that some of the headers will differ, since the target address for the archive email is different, but the body of the email will be an exact replica!

If you want a deeper explanation, see the SparkPost documentation on creating duplicate (or archive) copies of an email.

On a side note, SparkPost allows you to send emails to cc, bcc, and archive addresses. For this solution, we are focused on the archive addresses.

* Note * Archived emails can ONLY be created when emails are injected into SparkPost via SMTP!

Now that we know how to obtain a copy of the original email, we need to look at the log data that is produced and some of the subtle nuances within that data. SparkPost tracks everything that happens on its servers and offers that information to you in the form of message-events. Those events are stored on SparkPost for 10 days and can be pulled from the server via a RESTful API called message-events, or you can have SparkPost push those events to any number of collecting applications. The push mechanism uses webhooks and works in real time.

Currently, there are 14 different events that may happen to an email. Here is a list of the current events:

Bounce
Click
Delay
Delivery
Generation Failure
Generation Rejection
Initial Open
Injection
Link Unsubscribe
List Unsubscribe
Open
Out of Band
Policy Rejection
Spam Complaint

* See SparkPost's event reference guide for an up-to-date description of each event along with the data that is shared for each event.

Each event has numerous fields that match the event type. Some fields like the transmission_id are found in every event, but other fields may be more event-specific; for example, only open and click events have geotag information.

One very important message event entry to this project is the transmission_id. All of the message event entries for the original email, archived email, and any cc and bcc addresses will share the same transmission_id.

There is also a common entry called the message_id that will have the same id for each entry of the original email and the archived email. Any cc or bcc addresses will have their own id for the message_id entry.
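
To make the relationship between the two ids concrete, here is a minimal Python sketch that groups events first by transmission_id and then by message_id. The events below are hypothetical and heavily trimmed (real SparkPost events carry many more fields), and the addresses and id values are made up for illustration.

```python
from collections import defaultdict

# Hypothetical, trimmed events. Per the description above, the original and
# archive copies share a message_id, while the cc copy gets its own.
events = [
    {"type": "injection", "transmission_id": "T1", "message_id": "M1", "rcpt_to": "customer@example.com"},
    {"type": "delivery", "transmission_id": "T1", "message_id": "M1", "rcpt_to": "customer@example.com"},
    {"type": "delivery", "transmission_id": "T1", "message_id": "M1", "rcpt_to": "archive@inbound.example.com"},
    {"type": "delivery", "transmission_id": "T1", "message_id": "M2", "rcpt_to": "cc@example.com"},
    {"type": "open", "transmission_id": "T1", "message_id": "M1", "rcpt_to": "customer@example.com"},
]


def group_events(events):
    """Nest events as transmission_id -> message_id -> list of events."""
    grouped = defaultdict(lambda: defaultdict(list))
    for event in events:
        grouped[event["transmission_id"]][event["message_id"]].append(event)
    return grouped


grouped = group_events(events)
for transmission_id, by_message in grouped.items():
    for message_id, items in by_message.items():
        print(transmission_id, message_id, [e["type"] for e in items])
```

Everything for one send ends up under a single transmission_id, and the message_id separates the original and archive copies from the cc and bcc copies.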

So far this sounds great and frankly fairly easy, but now comes the challenging part. Remember, to get the archive email, we have SparkPost send a duplicate of the original email to another email address that corresponds to some inbox you have access to. But to automate this solution and store the email body, I'm going to use another SparkPost feature called Inbound Email Relaying. It takes all emails sent to a specific domain and processes them: it rips each email apart and creates a JSON structure, which is then delivered to an application via a webhook. See Appendix A for a sample JSON.

If you look carefully, you will notice that the JSON structure from the inbound relay is missing a very important field: the transmission_id. All of the outbound emails carry the same transmission_id, which binds together the data from the original email, archive, cc, and bcc addresses, but SparkPost has no way of knowing that the email captured by the inbound process is connected to any of the outbound emails. The inbound process simply knows that an email was sent to a specific domain and that it should parse the email. That's it. It will treat any email sent to that domain the same way, be it a reply from a customer or the archive email sent from SparkPost.

So the trick is: how do you glue the outbound data to the inbound process that just grabbed the archived version of the email? What I decided to do is hide a unique id in the body of the email. How this is done is up to you, but I simply created an input field with the hidden attribute turned on.
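
As a rough illustration of the hidden-field idea, the Python sketch below generates a unique id, embeds it in an HTML body as a hidden input, and pulls it back out of the HTML that the inbound process would later hand over. The field name `archive_uid`, the helper names, and the use of `uuid4` are assumptions made for this example, not the project's actual choices.

```python
import re
import uuid


def new_uid():
    """Return a random 32-character hex id (an assumed format for the UID)."""
    return uuid.uuid4().hex


def embed_uid(html_body, uid, field_name="archive_uid"):
    """Insert a hidden input carrying the UID just before </body>, or append it."""
    hidden = f'<input type="hidden" name="{field_name}" value="{uid}">'
    if "</body>" in html_body:
        return html_body.replace("</body>", hidden + "</body>", 1)
    return html_body + hidden


def extract_uid(html_body, field_name="archive_uid"):
    """Find the UID in a body produced by embed_uid; return None if absent.

    The pattern assumes the name attribute appears before the value attribute,
    which matches the tag written by embed_uid above.
    """
    pattern = re.compile(
        r'<input[^>]*name="' + re.escape(field_name) + r'"[^>]*value="([^"]+)"',
        re.IGNORECASE,
    )
    match = pattern.search(html_body)
    return match.group(1) if match else None


uid = new_uid()
body = embed_uid("<html><body><p>Your statement is ready.</p></body></html>", uid)
print(extract_uid(body) == uid)
```

Because the archive copy has the same body as the original, the id survives the round trip through SparkPost and can be recovered on the inbound side.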

I also added that field to the metadata block of the X-MSYS-API header that is passed to SparkPost during injection. This hidden UID ends up being the glue for the entire process. It is a main component of the project and will be discussed in depth in the following blog posts.
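
The sketch below shows one way the injected message might be assembled in Python, with the UID placed in the metadata block of the X-MSYS-API header. It only builds the message and does not send it. The metadata key `uid`, the addresses, and the helper names are hypothetical, and the `archive` key reflects my understanding of SparkPost's SMTP API, so check the current SparkPost documentation before relying on the exact header layout.

```python
import json
from email.message import EmailMessage


def build_msys_api_header(uid, archive_address):
    """Return the JSON string for the X-MSYS-API header (assumed layout)."""
    return json.dumps({
        "metadata": {"uid": uid},                  # hypothetical metadata key
        "archive": [{"email": archive_address}],   # assumed archive recipient list
    })


def build_message(sender, recipient, subject, html_body, uid, archive_address):
    """Assemble an email carrying the UID in the X-MSYS-API metadata."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg["X-MSYS-API"] = build_msys_api_header(uid, archive_address)
    msg.set_content("This message requires an HTML-capable mail client.")
    msg.add_alternative(html_body, subtype="html")
    return msg


header_value = build_msys_api_header("abc123", "archive@inbound.example.com")
msg = build_message(
    "sender@example.com",
    "customer@example.com",
    "Your statement",
    '<html><body><p>Hello</p><input type="hidden" name="archive_uid" value="abc123"></body></html>',
    "abc123",
    "archive@inbound.example.com",
)
print(header_value)
```

With the same UID in both the body and the metadata, the outbound events and the inbound archive copy share a key that can be matched later.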

Now that we have the UID that will glue this project together and understand why it is necessary, I can start laying out the vision for the overall project and the corresponding blog posts:

Capture and store the archive email along with a database entry for searching/indexing
Capture all message event data
Create an application to view the email and all corresponding data
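
To give a feel for how these three pieces might hang together, here is a small Python sketch of an index keyed on the hidden UID. It uses an in-memory SQLite database purely as a stand-in so the example runs anywhere, whereas the project itself stores log data in MySQL. The table names, columns, S3 key format, and sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database
conn.executescript("""
CREATE TABLE archive (
    uid     TEXT PRIMARY KEY,   -- hidden UID found in the email body
    s3_key  TEXT NOT NULL,      -- where the archived email body lives on S3
    subject TEXT
);
CREATE TABLE events (
    id         INTEGER PRIMARY KEY,
    uid        TEXT NOT NULL,   -- UID taken from the event metadata
    type       TEXT NOT NULL,
    event_time TEXT NOT NULL    -- ISO 8601 text, so string order is time order
);
CREATE INDEX events_uid ON events (uid);
""")

conn.execute("INSERT INTO archive VALUES (?, ?, ?)",
             ("abc123", "archive/abc123.eml", "Your statement"))
conn.executemany(
    "INSERT INTO events (uid, type, event_time) VALUES (?, ?, ?)",
    [
        ("abc123", "delivery", "2019-02-04T10:00:05Z"),
        ("abc123", "injection", "2019-02-04T10:00:00Z"),
        ("abc123", "open", "2019-02-04T11:30:00Z"),
    ],
)


def lookup(conn, uid):
    """Return the stored location and ordered events for a UID, or None."""
    row = conn.execute(
        "SELECT s3_key, subject FROM archive WHERE uid = ?", (uid,)
    ).fetchone()
    if row is None:
        return None
    events = conn.execute(
        "SELECT type, event_time FROM events WHERE uid = ? ORDER BY event_time",
        (uid,),
    ).fetchall()
    return {"s3_key": row[0], "subject": row[1], "events": events}


result = lookup(conn, "abc123")
print(result)
```

A viewing application could use a lookup like this to fetch the body from S3 and show it next to the event history.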

Here is a simple diagram of the project:

[Diagram: build an email archiving system]

The first drop of code will cover the archive process and storing the email onto S3, while the second code drop will cover storing all of the log data from message-events into MySQL. You can expect the first two code drops and blog entries sometime in early 2019. If you have any questions or suggestions, please feel free to pass them along.

Happy Sending.

- Jeff