How to extract OLE object from MS Access database
Years ago we developed simple application in MS Access for keeping different requests from the users in one place, we chose MS Access as a front end and one existing MySQL server for the back-end (our decision to use MySQL was a deliberate one – anyway, it well might be any other RDBMS, it’s not important fact in the context of this memo).
Basically the design of the application was very simple, when someone from DBA team received an e-mail request from the user he/she entered some meta information in the database plus Word document that was attached to the message by simply copy/pasting word document in MS Access. The inserted document was stored in MySQL blob field as OLE document (doc with some binary meta data envelope). Here is the screenshot:
The small preview image of the “inserted” Word document for example is stored in MySQL BLOB field as part of the OLE envelope. As long as we’re happy with the MS Access as a front end we can always extract the doc by double clicking the preview image or by right clicking the image and selecting Open from the context menu.
We plan to rewrite the application, most likely with Oracle Apex on top of Oracle XE. We faced the problem of how to migrate our data from MySQL to Oracle XE. Word documents stored in blob fields are really not true (binary) .doc files any more, but OLE objects (some binary header plus doc). We thought that finding official documentation about OLE object header would be easy and that all we’ll have to do is to cut off the OLE header from each document file stored in blob field. We’re wrong.
To make long story short, we found the solution on Stephen Lebens site. He wrote MS Access application that can extract (any) OLE content from the mdb file and save the document in native format. You can download ExtractInventoryOLE.zip from here.
First we imported table with OLE objects from MySQL to local mdb file and then extracted the word documents with above application. Several hundreds Word documents where dumped without a hitch in less than a minute.
Here is a screenshot from ExtractInventoryOLE to get a filling:
You can select unique field (in our case it was a primary key for the table; a numeric sequence) that is used to name the documents. The good news is that you can extract the OLE objects even if OLE Server that was used to insert the document is not present/installed on the machine (for example MS Word, Excel etc.).