Outlook of no return
Yesterday I got the crazy idea of migrating some mailing-list mail from Outlook to the mail application I use for mailing lists. It turned out to be easier said than done.
First I tried to save a mail in Outlook, but I could only save it as text (which came out in Swedish and included just subject, from, to, date (also in Swedish), and body) or as binary files. Hurray!
Then I tried to export a folder of messages to a comma-separated file. I tinkered a bit with Ruby, got the csv library to parse the file, and was able to create one file per mail. But I later found out that the file exported from Outlook “just” included subject, body, from name, from address, from type (e.g. “SMTP”), to name, to address, to type, cc name, cc address, cc type, bcc name, bcc address, bcc type, billing information, sensitivity, categories, priority, and travel allowance (km). That is a lot of information! But nothing about when the mail was sent, its message id, the headers, or a lot of other interesting stuff. Hence, comma-separated files were a dead end. Sigh!
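For the curious, the "one file per mail" step can be sketched roughly like this. I did it in Ruby; this is a Python rendition, not the script I actually used. The column names (`Subject`, `Body`, `From: (Name)`, `From: (Address)`) and the sample rows are assumptions made up for illustration, so check them against the header row of your own export.

```python
import csv
import io
import tempfile
from pathlib import Path


def split_export(csv_text, out_dir):
    """Write one text file per row of a CSV export and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    # The csv module handles quoted fields, so bodies that contain
    # commas or line breaks survive intact.
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    for number, row in enumerate(reader, start=1):
        path = out_dir / f"{number:04d}.txt"
        # Column names below are assumed, not taken from a real export.
        text = (
            f"From: {row['From: (Name)']} <{row['From: (Address)']}>\n"
            f"Subject: {row['Subject']}\n"
            "\n"
            f"{row['Body']}\n"
        )
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    return paths


# Invented sample data standing in for the exported file.
SAMPLE = (
    '"Subject","Body","From: (Name)","From: (Address)"\n'
    '"Hello","First line\nsecond, with a comma","Alice","alice@example.org"\n'
    '"Re: Hello","Thanks!","Bob","bob@example.org"\n'
)

with tempfile.TemporaryDirectory() as tmp:
    written = split_export(SAMPLE, tmp)
    first_mail = written[0].read_text(encoding="utf-8")
```

Note that there is no date or message id to write out, which is exactly the problem with this route.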
Somehow I thought that I would get what I wanted by using COM and accessing Outlook directly. It must be possible, since there are several synchronization applications, for example for Palm. But I had almost no clue about how to use COM (the only time I’ve messed with it before was a couple of months ago in Java). So I turned to Python. After a lot of tinkering I managed to access the different mails in Outlook and read some of the information (it was not completely trivial). So I started by printing stuff like date, from, to, subject, and body. But, of course, the date was not a valid RFC 822 date. And apparently strptime() is only available on some versions of Unix, but the Python Cookbook came to the rescue with a pure Python version of strptime(). So now I could parse the date and create an RFC 822 compatible date.
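The date conversion boils down to something like the sketch below. It uses `datetime.strptime()` and `email.utils.format_datetime()` from the standard library of a current Python instead of the Cookbook recipe. The input format, the fixed UTC offset, and the sample date are all assumptions for illustration; what Outlook actually hands you over COM may well look different.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def to_rfc822(text, fmt="%Y-%m-%d %H:%M:%S", utc_offset_hours=1):
    """Parse a date string and return it as an RFC 822 style date header value."""
    # Assumed input format; adjust fmt to whatever the source really prints.
    naive = datetime.strptime(text, fmt)
    # Assumed fixed offset (no daylight saving handling in this sketch).
    aware = naive.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    return format_datetime(aware)


# Invented example date.
header_date = to_rfc822("2004-03-15 21:45:00")
print(header_date)  # Mon, 15 Mar 2004 21:45:00 +0100
```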
But then I got greedy. In Outlook I can read the headers, so why should I not be able to do that in Python? So I googled around for some time and eventually found an object model of Outlook's MailItem object. But the headers were nowhere to be found! Then I found OutlookSpy, an Outlook plugin that you can use to browse the objects in Outlook. There it was! My search seemed to be over: the property with the tag PR_TRANSPORT_MESSAGE_HEADERS. But how do you access that? After some time of trial and error I posted a question to comp.lang.python. I got a pointer to msgstore.py in the SpamBayes project. But the code was a little bit hard to understand.
I had given up at this point. But as I was writing the above I decided to take a look at the script again and noticed the test at the bottom. That didn’t look too tricky. And after a while I managed to use it to get both the headers and the body of a mail. Success! Now I have exported all 487 messages to individual files and imported them into the other mail application. Now I can sleep with a smile on my face! :) Good night!
ps. What are they (Microsoft) thinking? Why do they lock in their users? Why aren’t they embracing open standards? When will I be trusted with my own data? Why, oh, why? Perhaps I should get that sleep now :) ds.