Attachments as Google Docs

From ESM Wiki
Jump to: navigation, search

Contents

Introduction

This document provides an example integration of Email Stream Management with Google Docs Web Service. The idea is to have some selected email attachments to be automatically uploaded to a Google Docs account for a specific user. In this example integration MPP does the following:

  • Selects email attachment (body part) by file name extension.
  • Retrieves OAuth config for each recipient from MySQL database.
  • Uploads selected attachments (body parts) to Google Docs for each recipient using retreived OAuth config.

MPP configuration options for this example intergration are provided in example config file. MySQL database structure and some useful queries are provided in example database structure.

OAuth database

To create a database structure use CREATE TABLE statements from example database structure file. If you are using existing QReview database users table might be already there so its creation can be skipped.

MPP retrieves OAuth config for each recipient from MySQL database and uses retrieved config for authentication with Google Docs service. But first of all OAuth database should be filled with data. This is not a job of MPP Core. There should be a standalone Web application that will interact with an administrator to register OAuth service and with a user to grant access to his account for the application. Such an application is outside the scope of this example.

However for testing purposes one can fill OAuth MySQL database manualy:

  • Fill an entry for Google Docs service by executing:
INSERT INTO `oauth_service` (`type`, `uri`, `oauth_consumer_key`, `oauth_consumer_key_secret`)
VALUES ('docs.google.com', 'http://docs.google.com/feeds', 'anonymous', 'anonymous');

SET @oauth_service_id=LAST_INSERT_ID();
  • If you would like to use existing database user substitute your@email.com with appropriative value and execute:
SELECT @user_id:=`id` FROM `users` WHERE `address`='your@email.com';
  • Or fill an entry for new database user by executing;
INSERT INTO `users` (`address`, `verified`)
VALUES ('your@email.com', 1);

SET @user_id=LAST_INSERT_ID();
  • Register an account on http://docs.google.com if you don't have one for tesing purposes.
  • Go to http://googlecodesamples.com/oauth_playground/index.php
  • In section 1 Choose your Scope(s) select Documents List.
  • In section 2 Modify the OAuth Parameters select oauth_signature_method=HMAC-SHA1, oauth_consumer_key=anonymous and consumer secret=anonymous. Other fields should be blank.
  • Proceed with sections 3 and 4. You will be redirected to log into an account for which you are going to obtain OAuth authentication information. After authorizing a request you will be redirected back to OAuth Playground site.
  • Proceed with section 5. After that notate from section 2 values of oauth_token and oauth_token_secret. You may need to convert these values from percent encoding to their original values because google sometimes present them percent-encoded.
  • Fill an entry for OAuth user by substituting your_oauth_token and your_oauth_token_secret with values from previous step and executing:
INSERT INTO `oauth_user` (`user_id`, `oauth_service_id`, `oauth_token`, `oauth_token_secret`)
VALUES (@user_id, @oauth_service_id, 'your_oauth_token', 'your_oauth_token_secret');

After all these steps you will have database filled with required OAuth data for a user with address your@email.com. You may repeat this steps for another user if needed.

Pipeline

Strip body pipeline is used because this pipeline is executed at the end of all processing when spam/virus filtering is applied and decision that message is good already made. Pipeline is enabled with <strip_body_enabled> and engines are specified with <strip_body_engines>. The pipeline consists from the following engines:

  • select_body_part
  • mysql.oauth_docs_google_com
  • http.upload_body_part

Select body part

Select body part is a boorex engine that selects body parts (attachments) to be uploaded to Google Docs. It starts with <boorex id="select_body_part"> element.

There is one regular expression with "file_match" ID defined using <i id="file_match"> engine option. The expression will match any string that ends with one of suffixes: ".doc", ".pdf", ".xls", ".txt", ".rtf", ".xml" or ".zip". The expression is in PCRE format.

The engine exports one result with "is_match" ID defined using <result id="is_match"> option. The result has one case that checks if body part file name matches "file_match" expression. If it matches then result is evaluated to "yes" otherwise default evaluation to "no" value is done. The result value will be accessible in consequent engines in the pipeline through $engines.select_body_part.is_match macro. The result will be evaluated for each body part due to implicit iteration principle.

MySQL engine

MySQL engine incorporates database connection settings and OAuth for docs.google.com query method. It starts with <mysql> element.

Connection settings are provided with compound <connection> option. Its sub-options are self-descriptive. You should change them to fit your MySQL installation. In this example database user defined with <user> option doesn't have password. However if a user must have a password you should add <passwords> option that specifies path to passwords file in common MPP policy group, create this file and add appropriative password entry to it and references this entry using <password_id> sub-option for the <connection> as following:

<mppd>
    <common>
        ...
        <passwords>/path/to/passwords/file.txt</passwords>
        ...
    </common>

    <engines>
        <mysql>
            <connection>
               <host>newhendrix</host>
               <port>3306</port>
               <database>yourdatabase</database>
               <unix_socket>/var/run/mysqld/mysqld.sock</unix_socket>
               <defaults_file>/etc/my.cnf</defaults_file>
               <user>username</user>
               <password_id>user_password_id</password_id>
            </connection>
        </mysql>
        ...
    </engines>
</mppd>

File /path/to/passwords/file.txt must contains key-value pairs of password ID's and password itself. Pairs must be one per line. Password ID goes first then goes password itself separated by space:

...
user_password_id user_password
...

Password ID is an arbitrary string that is used to reference corresponding password from XML config. This mechanizm is provided to prevent unwanted passwords leaks because config files are often exchanged between different parties for reviewing, illustration and other purposes at development and troubleshooting stages.

OAuth for docs.google.com

OAuth for docs.google.com is a method of MySQL engine that does retrival of OAuth config from database. It starts with <query id="oauth_docs_google_com"> element.

The method has a precondition that will cause method action to be executed if any of body parts (see rules for iteration through precondition dimensions in main iteration algorithm) is matched by previous select body part engine in the pipeline.

Option <template> specified MySQL query template. The query retrieves OAuth service and user configs for each recipient. The template expression is CDATA-bracketed because it contains special XML character. The template contain only one macro expression ${escape $recipient} which is a MySQL escaped SMTP recipient address.

The method exports a series of results with ID's "oauth_..." and "uri". These are OAuth user and service config. Results have specific to MySQL engine structure. Option <if_empty_table> defines evaluation branch in case MySQL query doesn't return any result otherwise branch defined with <if_filled_table> is used. If query doesn't return any result a value will be empty ($empty macro) otherwise it will be equal to value of corresponding field in first row of query result table ($field macro).

Another result that the method exports has "nonce" ID. It is an unique value that will be used in oauth_nonce authentication parameter for further HTTP requests. The value doesn't related to any data in the database. It is programmatically generated string that is a concatenation of current timestamp ($time macro) and 32-bit counter ($counter32 macro). The result must depend on body part that is why it has a case (otherwise redundant) that depends on body part.

HTTP Engine

HTTP engine encorporates SSL options and upload body part method. It starts with <http> element.

Option <ssl_context> with <load_veriy_file> sub-option should be specified because services may use https:// scheme. Options defines Certificate Authorities (CA) to use for verifying peer. File /usr/share/ssl/cert.pem is a default file that comes with OpenSSL installation. It contains certificates of many widespread CA's. The location may be different for different installations (for example /etc/pki/tls/cert.pem) so this options should be modified to work with a specific installation.

Upload body part

Upload body part is a method of HTTP engine that uploads selected body parts to Google Docs using retrieved OAuth config. It starts with <query id="upload_body_part"> element.

The method has a precondition that will cause method action to be executed if a body part has been matched and OAuth data has been retrieved. For this purpose precondition uses results exported by previous engines and accessed with $engines.select_body_part.is_match and $engines.mysql.oauth_docs_google_com.oauth_token macros.

Option <method> specifies HTTP request method. Option <uri> specifies HTTP request URI that is a concatenation of service URI ($engines.mysql.oauth_docs_google_com.uri macro) and additional component according to Google Documents List Data API specification.

Option <headers> specifies HTTP request headers. Its value is CDATA-bracketed to be safe with special XML characters. The most complex expression is for Authorization header. It is composed to comply with OAuth Core 1.0 specification. The following macros and expressions are used:

  • $engines.mysql.oauth_docs_google_com.nonce refer to a unique identifier for each request.
  • $time is used to put timestamp into each request.
  • $uri_escape is used to do percent (URI) encoding.
  • $encode_base64 is used to do Base64 encoding for signature.
  • $hmac is used to generate HMAC-SHA1 signature.
  • The following expression is a signature base string:
POST&${uri_escape $engines.mysql.oauth_docs_google_com.uri/default/private/full}&${uri_escape
oauth_consumer_key=${uri_escape $engines.mysql.oauth_docs_google_com.oauth_consumer_key
}&oauth_nonce=$engines.mysql.oauth_docs_google_com.nonce&oauth_signature_method=HMAC-SHA1&oauth_timestamp=$time&oauth_token=${uri_escape
$engines.mysql.oauth_docs_google_com.oauth_token}&oauth_version=1.0}
  • Next expression is a HMAC-SHA1 key:
${uri_escape $engines.mysql.oauth_docs_google_com.oauth_consumer_key_secret}&${uri_escape $engines.mysql.oauth_docs_google_com.oauth_token_secret}

Option <entity> defines HTTP request entity which is body part data in this case.

See also

Personal tools