Engine

From ESM Wiki

Introduction

An engine is a building block of processing logic. It is an abstraction of a precompiled component that implements useful lower-level functionality. The main purpose of an engine is to perform a specialized atomic action, such as querying a Web Service or database, saving a file, or generating a MIME-formatted body part. The administrator should consult the specification of each engine type to find out what action it performs. A list of available engine types is here.

This article describes the logic and configuration constructs that are common to all engines, though a specific engine may modify these constructs or introduce its own. An engine consists of the following logical parts:

  • Precondition defines under what condition the engine action should be performed and the result calculated.
  • Action defines what an engine does. It is the main value of an engine.
  • Result defines what data produced by the engine action will be visible to subsequent engines and postprocessors in a pipeline.

Configuration

An engine is configured as a branch of the XML tree in the MPP configuration file (known as mppd.conf.xml). The root element of an engine branch is a child element of /mppd/engines.

Engine type

The root element name defines the engine type. The administrator has to choose, from the available engine types, those that fit their needs. The engine type defines the structure of the engine branch.

Engine ID

Each engine has a unique ID. It is assigned with the id attribute of the engine root element. The ID has the following syntax:

engine_id = 1*(ALPHA  |  DIGIT  |  "_");

If the ID is not specified explicitly, it is considered equal to the element name (i.e. the engine type name). The ID is used to reference an engine in a pipeline and to reference the engine's results in subsequent engines and postprocessors.

As an example, let's consider the following XML:

<mppd>
    <engines>
        <boorex id="check_regex">
            ...
        </boorex>
        <mysql id="db">
            ...
        </mysql>
        <file id="public_ftp">
            ...
        </file>
        <mime>
            ...
        </mime>
    </engines>
</mppd>

This example defines four engines. Three of them have explicit IDs: a boorex engine with ID check_regex, a mysql engine with ID db and a file engine with ID public_ftp. The mime engine has no explicit ID, so its ID is considered equal to its type name: mime.
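The ID rules above can be sketched in a few lines of Python. This is only an illustration of the rules as stated in this article, not MPP's actual loader: the function name engine_ids is hypothetical, and ALPHA is assumed to mean the ASCII letters A-Z and a-z.

```python
import re
import xml.etree.ElementTree as ET

# engine_id = 1*(ALPHA | DIGIT | "_"); ALPHA is assumed to be ASCII letters.
ENGINE_ID = re.compile(r"[A-Za-z0-9_]+")

CONFIG = """
<mppd>
    <engines>
        <boorex id="check_regex">...</boorex>
        <mysql id="db">...</mysql>
        <file id="public_ftp">...</file>
        <mime>...</mime>
    </engines>
</mppd>
"""

def engine_ids(config_xml):
    """Return (engine type, engine ID) pairs for each child of /mppd/engines."""
    root = ET.fromstring(config_xml)  # root is the <mppd> element
    pairs = []
    seen = set()
    for engine in root.find("engines"):
        # A missing id attribute falls back to the element (type) name.
        engine_id = engine.get("id", engine.tag)
        if not ENGINE_ID.fullmatch(engine_id):
            raise ValueError(f"invalid engine ID: {engine_id!r}")
        if engine_id in seen:
            raise ValueError(f"duplicate engine ID: {engine_id!r}")
        seen.add(engine_id)
        pairs.append((engine.tag, engine_id))
    return pairs

for engine_type, engine_id in engine_ids(CONFIG):
    print(engine_type, "->", engine_id)
```

The last pair is ("mime", "mime"), because the mime element carries no id attribute.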

Engine method

Some engine types are not supposed to be used directly in a pipeline; rather, they serve as containers for several related engines. In that case each inner engine is called a method. The method root element is a child of the engine root element. As with an engine, the method type is defined by its root element name. The available method types are defined by the container engine. Each method has an ID that is unique within its container. Several methods may share options defined in their container. A method can be used in any context where an ordinary engine can be used; in that sense a method is an engine. The difference is that a method is referenced through its container, i.e. the ID by which a method is referenced in a pipeline (sometimes called the full or global ID) is the engine ID and the method ID joined with a '.' (dot) character. Thus the full ID may be defined as follows:

engine_full_id =   engine_id                  ; for simple engines
                 | (engine_id "." method_id)  ; for engine with methods

For example, the following XML defines two query methods within a mysql engine:

<mppd>
    <engines>
        <mysql id="db">
            <query id="check_sender">
                ...
            </query>
            <query id="check_recipient">
                ...
            </query>
        </mysql>
    </engines>
</mppd>

The first method has the full ID db.check_sender and the second has db.check_recipient. Thus, for example, in a WBL pipeline these methods are specified as:

<wbl_engines>db.check_sender, db.check_recipient</wbl_engines>
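As a rough illustration of the full ID grammar, the following Python sketch splits a pipeline value into (engine ID, method ID) pairs. It rests on two assumptions not stated in this article: that method_id uses the same character set as engine_id, and that a pipeline value is a comma-separated list with optional whitespace, as the example above suggests. The function names are hypothetical.

```python
import re

# engine_full_id = engine_id | (engine_id "." method_id)
# Assumption: method_id has the same syntax as engine_id.
FULL_ID = re.compile(r"([A-Za-z0-9_]+)(?:\.([A-Za-z0-9_]+))?")

def split_full_id(full_id):
    """Return (engine_id, method_id); method_id is None for a simple engine."""
    match = FULL_ID.fullmatch(full_id)
    if match is None:
        raise ValueError(f"invalid full engine ID: {full_id!r}")
    return match.group(1), match.group(2)

def parse_pipeline(value):
    """Split a comma-separated pipeline value into (engine_id, method_id) pairs."""
    return [split_full_id(item.strip()) for item in value.split(",")]

print(parse_pipeline("db.check_sender, db.check_recipient"))
print(split_full_id("mime"))
```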

Engine option

The internal structure of an engine (method) branch consists of XML elements called options. An option may be compound or leaf. A compound option consists entirely of child elements, which are themselves options. It has no unstructured character data besides indentation whitespace. A leaf option, on the other hand, consists only of character data and has no child elements. Compound options are used for structuring data, while leaf options are the "workhorses" that hold data. An option may have XML attributes. Attributes always hold data (they are always leaves).

A leaf option (or attribute) may be loadtime or runtime. A loadtime option is evaluated only once, during load or reload of MPP, and stays the same for all requests. It may not depend on request (runtime) properties. Its value in XML is the final value used in processing all requests. A runtime option, on the other hand, is evaluated during each request processing transaction and may differ between requests. It may depend on request (runtime) properties. Its value in XML is a template that defines the rules for calculating the final value used in processing the current request.

The syntax and evaluation rules of a runtime option template are based on those of a template or a condition. In most cases they are fully equivalent, although some options may modify them slightly. The specification of each individual option should describe all such details.

An important note about the syntax of option and attribute values: they are XML character data, so characters that are special to XML markup must be escaped according to XML rules. In option values these characters are "<", ">" and "&". In attribute values they are the same plus """ (double quote) and "'" (single quote). For long option values that would need many escapes, the preferable alternative is to enclose the value in "<![CDATA[" and "]]>" brackets, which removes the need to escape anything inside them. The only thing to watch for is an occurrence of "]]>" inside the brackets. It is very unlikely, but if it happens, each occurrence must be replaced with "]]]]><![CDATA[>"; otherwise the XML parser will interpret the value incorrectly.
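The escaping rules can be checked with a short Python sketch that uses only the standard library. The helper names (as_cdata, escape_attr, read_back) and the <option> wrapper element are hypothetical and exist only for this illustration; they are not part of MPP.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def as_cdata(value):
    """Wrap a value in CDATA brackets, splitting any "]]>" it contains."""
    return "<![CDATA[" + value.replace("]]>", "]]]]><![CDATA[>") + "]]>"

def escape_attr(value):
    """Escape a value for use inside an attribute: &, <, > plus both quotes."""
    return escape(value, {'"': "&quot;", "'": "&apos;"})

def read_back(option_body):
    """Parse an option body and return the character data the parser sees."""
    return ET.fromstring("<option>" + option_body + "</option>").text

tricky = "if (a[b[0]]> 1 && c < d) ..."   # contains "]]>", "&" and "<"

# Both encodings round-trip to the original value.
print(read_back(as_cdata(tricky)) == tricky)
print(read_back(escape(tricky)) == tricky)
```

The CDATA form keeps the value readable in the config file, which is why it suits long values such as the SOAP envelope in the sample below.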

Each option is related to one of logic parts of an engine: precondition, action or result.

Let's consider a config snippet from the Salesforce Emails as Tasks sample:

<mppd>
    <engines>
        <http id="salesforce_http">

            <method>POST</method>
            
            <ssl_context>
                <load_verify_file>/usr/share/ssl/cert.pem</load_verify_file>
            </ssl_context>

            <headers>
                Date: $http_date
                Content-Type: text/xml; charset=utf-8
                Accept: text/xml; charset=utf-8
                User-Agent: MPP
                Cache-Control: no-cache
                Pragma: no-cache
                SOAPAction: ""
            </headers>

            <query id="attach_attachment_to_opportunity">

                <precondition>$engines.salesforce_http.find_opportunity.Id $NE $empty</precondition>

                <uri>$engines.salesforce_http.login.serverUrl</uri>
                
                <entity><![CDATA[<?xml version="1.0"?>
                    <soapenv:Envelope
                        xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
                        xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                    >
                        <soapenv:Header>
                            <SessionHeader xmlns="urn:enterprise.soap.sforce.com">
                                <sessionId>$engines.salesforce_http.login.sessionId</sessionId>
                            </SessionHeader>
                        </soapenv:Header>
                        
                        <soapenv:Body>
                            <create xmlns="urn:enterprise.soap.sforce.com">
                                <sObjects xsi:type="Attachment" xmlns="urn:sobject.enterprise.soap.sforce.com">
                                    <Body>${xml_cdata $attachment.base64}</Body>
                                    <ContentType>${xml_cdata $attachment.mime_type}</ContentType>
                                    <Name>${xml_cdata $attachment.file_name}</Name>								
                                    <ParentId>$engines.salesforce_http.find_opportunity.Id</ParentId>
                                </sObjects>    
                            </create>
                        </soapenv:Body>
				
                    </soapenv:Envelope>
                    ]]>
                </entity>
			
                <result id="errors_message">
                    <result>${xml_xpath //*[local-name()='errors']/*[local-name()='message']}</result>
				
                    <validate>
                        <case>
                           <condition>$result $NE $empty</condition>
                           <log level="error">Salesforce attach_attachment_to_opportunity fault: $result</log>
                           <throw>CRM failure</throw>
                        </case>
                    </validate>
                </result>
			
                <result id="faultstring">
                    <result>${xml_xpath //*[local-name()='faultstring']}</result>
				
                    <validate>
                        <case>
                            <condition>$result $NE $empty</condition>
                            <log level="error">Salesforce attach_attachment_to_opportunity fault: $result</log>
                            <throw>CRM failure</throw>
                        </case>
                    </validate>				
                </result>
			
                <result id="Id">
                    <result>${xml_xpath //*[local-name()='id']}</result>
                </result>
			
            </query>
        </http>
    </engines>
</mppd>

In this snippet:

  • There is one http engine with "salesforce_http" ID.
  • <method> is a leaf loadtime option related to action.
  • <ssl_context> is a compound option related to action.
  • <load_verify_file> is a leaf loadtime option related to action.
  • <headers> is a leaf runtime option related to action. Its syntax is based on template syntax.
  • <query> defines a query method with "attach_attachment_to_opportunity" ID within the engine.
  • <precondition> is a leaf runtime option related to precondition. It has condition syntax.
  • <uri> is a leaf runtime option related to action. It has template syntax.
  • <entity> is a leaf runtime option related to action. It has template syntax. CDATA brackets are used to address XML escaping requirements.
  • The <result> elements are compound options related to result.

Precondition

Precondition defines under what condition the engine action should be performed and the result calculated. If the precondition is not met, the action is not performed. What happens to the result is described in the Null result section below.

In most cases precondition is configured with one or more <precondition> options for an engine. These options are leaf runtime options that have condition syntax. They are processed in the following way:

  • During request processing, when control reaches an engine, the implementation checks whether one or more preconditions are defined for the engine.
  • If no precondition is specified, control goes to performing the action unconditionally.
  • If one or more preconditions are defined, the implementation evaluates them in turn, in the order they appear in XML.
  • If a precondition evaluates to false, it is not met: evaluation stops, the action is skipped and the result gets the null value.
  • If a precondition evaluates to true, it is met and evaluation goes on to the next precondition.
  • If all preconditions are met, control goes to performing the action.

In the following example:

<mppd>
    <engines>
        <mysql>
            <query>
                <precondition>$engines.resume.is_match $EQ yes</precondition>
                <precondition>$engines.check_sender.is_good $EQ yes</precondition>
                ...
            </query>
        </mysql>
    </engines>
</mppd>

there are two preconditions that check results from previous engines. One can rewrite this example to have a single precondition:

<precondition>$engines.resume.is_match $EQ yes $AND $engines.check_sender.is_good $EQ yes</precondition>
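The precondition processing steps described in this section can be sketched in Python as follows. This is a simplified model, not MPP's implementation: conditions are represented as plain Python callables over a hypothetical request dictionary, None stands in for the null result, and run_engine is an illustrative name.

```python
def run_engine(preconditions, action, request):
    """Evaluate preconditions in XML order; run the action only if all are met."""
    for precondition in preconditions:
        if not precondition(request):
            # Not met: stop evaluating, skip the action, result is null.
            return None
    # No preconditions at all, or all of them met.
    return action(request)

def is_match(request):
    return request["engines.resume.is_match"] == "yes"

def is_good(request):
    return request["engines.check_sender.is_good"] == "yes"

def both(request):
    # Equivalent of joining the two conditions with $AND.
    return is_match(request) and is_good(request)

def action(request):
    return "query executed"

request = {
    "engines.resume.is_match": "yes",
    "engines.check_sender.is_good": "no",
}

# Two separate preconditions and one combined precondition behave the same.
print(run_engine([is_match, is_good], action, request))
print(run_engine([both], action, request))
```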

Action

Action defines what an engine does. It is the main value of an engine. Action is configured with zero or more engine options. The action and its configuration are engine-specific and are described in the engine specification. For a list of available engines see here.

For example in the following XML:

<mppd>
    <engines>
        <file>
            <save id="save_attachment">
                <relative>
                    <path>/$year-$month-$day/$recipient/${time6}${counter16}</path>
                </relative>
                <data>$body_part</data>
                ...
            </save>
        </file>
    </engines>
</mppd>

the compound option <relative> and the leaf option <data> configure the file saving action for the engine file.save_attachment. Both options are runtime by definition.

Result

Result defines what data produced by the engine action will be visible to subsequent engines and postprocessors in a pipeline. A result value is a string. Subsequent engines access this string using a special macro. How postprocessors access it depends on the specific postprocessor. The process of calculating a result value and making it visible to subsequent engines and postprocessors is sometimes called exporting (or it is said that an engine exports a result). An engine may export multiple different results.

Though a specific engine may provide its own construct for result configuration, there is a basic construct that is used in most engines and is described here. Configuration starts with the compound engine option <result>, which is the root for all further options related to this particular result. Multiple <result> options may be specified, in which case each option configures the export of a different result.

Result ID

The value of the loadtime id attribute of a <result> option defines the result ID. The result ID must be unique among the results within an engine. It has the same syntax as the engine ID:

result_id = 1*(ALPHA  |  DIGIT  |  "_");

Subsequent engines and postprocessors access a result value by its full ID, constructed as follows:

result_full_id = "engines." engine_full_id "." result_id;

The special macro to access a result value from a template or condition is constructed as follows:

result_macro = "$" result_full_id;

In the following example:

<mppd>
    <engines>
        <boorex id="check_for_pattern">
            ...
            <result id="is_match_sender">
                ...
            </result>

            <result id="is_match_recipient">
                ...
            </result>
        </boorex>

        <http id="web_service">
            ...
            <query id="post_message">
                ...
                <result id="message_uri">
                    ...
                </result>
            </query>
        </http>
    </engines>
</mppd>

there are two engines with full IDs check_for_pattern and web_service.post_message. The first engine exports two results with IDs is_match_sender and is_match_recipient. The corresponding full IDs are engines.check_for_pattern.is_match_sender and engines.check_for_pattern.is_match_recipient. The second engine exports one result with ID message_uri. The corresponding full ID is engines.web_service.post_message.message_uri. In templates and conditions of subsequent engines, these results are available through the following macros:

$engines.check_for_pattern.is_match_sender
$engines.check_for_pattern.is_match_recipient
$engines.web_service.post_message.message_uri

One result within an engine may be defined without an explicit ID. In this case the result is exported under the engine's full ID. For example:

<mysql>
    <query id="insert_message">
        ...
        <result>
            <set>$mysql_insert_id</set>
        </result>
    </query>
</mysql>

will be exported as:

$engines.mysql.insert_message
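The macro naming rules of this section, including the case of a result without an explicit ID, fit into one small Python function. The name result_macro is hypothetical and the function only assembles the macro text; it does not evaluate anything.

```python
def result_macro(engine_full_id, result_id=None):
    """Build the macro that exposes a result to later engines."""
    parts = ["engines", engine_full_id]
    if result_id is not None:
        parts.append(result_id)
    # A result without an explicit ID is exported under the engine's full ID.
    return "$" + ".".join(parts)

print(result_macro("check_for_pattern", "is_match_sender"))
print(result_macro("web_service.post_message", "message_uri"))
print(result_macro("mysql.insert_message"))
```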

Result cases

A result option consists of zero or more <case> options and a single <set> option for the default result. Each case option in turn consists of a single <condition> and a single <set>. Cases represent conditional evaluations. The default result represents the default evaluation if no matching case is found. All inner <set> options are templates and all <condition> options are conditions. For example:

<mppd>
    <engines>
        <http>
            <query>
                ...
                <result id="wbl_condition">
                    <case>
                        <condition>${http_header X-Is-Whitelisted} $EQ yes</condition>
                        <set>whitelist</set>
                    </case>
                    <case>
                        <condition>${http_header X-Is-Blacklisted} $EQ yes</condition>
                        <set>blacklist</set>
                    </case>
                    <set>none</set>
                </result>
            </query>
        </http>
    </engines>
</mppd>

In this example result with ID wbl_condition is evaluated to "whitelist", "blacklist" or "none" depending on values of HTTP headers.

Implementation evaluates result in the following way:

  • It iterates through all <case> options in the order they appear in XML.
  • For each case it evaluates the <condition>. If the condition evaluates to true, it evaluates the case's <set>, sets the obtained value as the value of the result and finishes evaluation.
  • If the condition evaluates to false, it goes to the next case.
  • If no case with a matching condition is found, or no cases are defined at all, it evaluates the default <set>, sets the obtained value as the value of the result and finishes evaluation.
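The evaluation order above amounts to a first-match-wins loop with a default. The following Python sketch models it under simplifying assumptions: conditions and <set> templates are plain callables over a hypothetical dictionary of HTTP response headers, and evaluate_result is an illustrative name rather than anything in MPP.

```python
def evaluate_result(cases, default_set, context):
    """Return the <set> value of the first matching case, else the default."""
    for condition, set_template in cases:
        if condition(context):
            # First match wins; later cases are not evaluated.
            return set_template(context)
    return default_set(context)

# Model of the wbl_condition result from the example above.
cases = [
    (lambda headers: headers.get("X-Is-Whitelisted") == "yes", lambda headers: "whitelist"),
    (lambda headers: headers.get("X-Is-Blacklisted") == "yes", lambda headers: "blacklist"),
]

def default_set(headers):
    return "none"

print(evaluate_result(cases, default_set, {"X-Is-Blacklisted": "yes"}))
print(evaluate_result(cases, default_set, {}))
```

Because evaluation stops at the first match, a response carrying both headers with "yes" values yields "whitelist" in this model.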

In some older configs a <result> option may appear instead of <set>. However, it is deprecated in favour of <set>.

Result with validation

Result with validation is an enhanced version of the basic result construct. Its main purpose is to facilitate validation of data during and after result evaluation and to do something useful if the data is invalid. In addition to the basic features it allows one to:

  • Write a log message into the MPP log.
  • Terminate execution of the current pipeline with an error condition.

Result with validation is configured the same way as a basic result, but additional options are available:

  • <log> is a template option that defines a log message to write into the MPP log. The level attribute defines the log level at which to write. Possible values are: "always", "crit", "error", "warning", "info", "debug" and "debug_data". Multiple log options are allowed in the same scope, in which case the messages appear in the log in the order they are specified in XML. Log options may appear as children of a <case> or of the result itself. They are evaluated before the corresponding <set> and <throw> options (see below).
  • <throw> is a template option that defines the error message with which to terminate execution of the current pipeline. Only one throw is allowed in the same scope. Throw options may appear wherever log options may. They are evaluated after all logs but before the corresponding <set>. Be careful when designing a throw message template: it should never be able to produce long or multiline messages, as these can cause failures in SMTP and other protocols.
  • <validate> is a compound option that is evaluated after the result value is obtained. Its structure closely resembles that of a result with validation, with three differences: no inner <set> options are allowed (there is no point in calculating the result value because it is already obtained); all template and condition options inside <validate> can use the macro $result, which evaluates to the newly obtained result value; and a nested <validate> is not allowed.

For example:

<mppd>
    <engines>
        <http>
            <query>
                ...
                <result id="wbl_condition">
                    <case>
                        <condition>$action.response.code.category $NE 2</condition>
                        <log level="error">Not successful response received: $action.response.code $action.response.reason</log>
                        <throw>WBL failure</throw>
                        <set>error</set>
                    </case>
                    <case>
                        <condition>${http_header X-Is-Whitelisted} $EQ yes</condition>
                        <log level="debug">Found X-Is-Whitelisted</log>
                        <set>whitelist</set>
                    </case>
                    <case>
                        <condition>${http_header X-Is-Blacklisted} $EQ yes</condition>
                        <log level="debug">Found X-Is-Blacklisted</log>
                        <set>blacklist</set>
                    </case>
                    <case>
                        <condition>${http_header X-WBL-Condition} $NE $empty</condition>
                        <log level="debug">Found X-WBL-Condition with value "${http_header X-WBL-Condition}"</log>
                        <set>${http_header X-WBL-Condition}</set>
                    </case>
                    <set>none</set>

                    <validate>
                        <case>
                            <condition>$result $EQ whitelist</condition>
                            <log level="info">Message is whitelisted.</log>
                        </case>
                        <case>
                            <condition>$result $EQ blacklist</condition>
                            <log level="info">Message is blacklisted.</log>
                        </case>
                        <case>
                            <condition>$result $EQ none</condition>
                        </case>
                        <log level="error">Unrecognized WBL condition: $result</log>
                        <throw>WBL failure</throw>
                    </validate>
                </result>
            </query>
        </http>
    </engines>
</mppd>

In this example:

  • The first <case> checks whether the HTTP response code is unsuccessful, writes a detailed message into the log at error level and throws a short generic WBL failure message. The result value is never evaluated in this case, though the <set> is required to be present.
  • The second and third <case> options check whether the corresponding HTTP response headers have "yes" values, log this fact at debug level and set the corresponding result values.
  • The fourth <case> checks for another HTTP response header (an alternative way for the remote server to report the result), logs the value of the header at debug level and sets the result value to the value of the header.
  • The default <set> (the one after all cases) sets the result value to "none" if no case condition was matched.
  • The <validate> option is evaluated after the result is obtained. Its <case> options check for allowed values (an invalid value may come from the fourth result case) and log whitelist and blacklist events at info level. Note the use of the $result macro, which refers to the newly obtained result. If no case condition was matched, it logs a message at error level and throws a short generic WBL failure message.
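The ordering rules (logs first, then throw, then set, with <validate> running afterwards) can be modelled with a small Python sketch. Everything here is an assumption made for illustration: the dictionary layout, the names PipelineError, render, run_scope, pick_scope and evaluate_result, and the reduced template handling, which substitutes only $result. It is a shortened model of the example above, not MPP's implementation.

```python
class PipelineError(Exception):
    """Stands in for a <throw> terminating the pipeline (hypothetical name)."""

def render(template, context):
    # Simplified template evaluation: only $result is substituted.
    return template.replace("$result", str(context.get("result", "")))

def run_scope(scope, context, log):
    """Evaluate one scope: all <log>s in order, then <throw>, then <set>."""
    for level, message in scope.get("logs", []):
        log.append((level, render(message, context)))
    if "throw" in scope:
        raise PipelineError(render(scope["throw"], context))
    set_value = scope.get("set")
    return set_value(context) if set_value else None

def pick_scope(block, context):
    """Return the first case whose condition holds, else the default scope."""
    for case in block["cases"]:
        if case["condition"](context):
            return case
    return block["default"]

def evaluate_result(result, context, log):
    value = run_scope(pick_scope(result, context), context, log)
    if "validate" in result:
        # <validate> sees the freshly computed value as $result.
        vctx = dict(context, result=value)
        run_scope(pick_scope(result["validate"], vctx), vctx, log)
    return value

WBL = {
    "cases": [
        {"condition": lambda c: c["code_category"] != 2,
         "logs": [("error", "Not successful response received")],
         "throw": "WBL failure",
         "set": lambda c: "error"},  # present but never reached
        {"condition": lambda c: c["headers"].get("X-Is-Whitelisted") == "yes",
         "logs": [("debug", "Found X-Is-Whitelisted")],
         "set": lambda c: "whitelist"},
        {"condition": lambda c: c["headers"].get("X-WBL-Condition", "") != "",
         "logs": [("debug", "Found X-WBL-Condition")],
         "set": lambda c: c["headers"]["X-WBL-Condition"]},
    ],
    "default": {"set": lambda c: "none"},
    "validate": {
        "cases": [
            {"condition": lambda c: c["result"] == "whitelist",
             "logs": [("info", "Message is whitelisted.")]},
            {"condition": lambda c: c["result"] == "none"},
        ],
        "default": {"logs": [("error", "Unrecognized WBL condition: $result")],
                    "throw": "WBL failure"},
    },
}

log = []
print(evaluate_result(WBL, {"code_category": 2, "headers": {}}, log))
```

In this model an unexpected X-WBL-Condition value passes the result cases and is then rejected by the default scope of <validate>, which mirrors the role of the fourth case in the example.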

Result with tagging

Result with tagging allows one to associate tag(s) in the form name=value with:

  • Entire message. Tags are then available in subsequent pipelines through the $tag.name and $tag.value macros. Tags are also stored in the archive database if MPP archival is used. A message tag is added by specifying tagName="some_name" on a <set> option. In this case the calculated result value, besides being exported, is also added as a tag with the specified name. For example:
<boorex id="classify_with_regex">
    ...
    <result>
        <case>
            <condition>$body ~= some_expression</condition>
            <set tagName="classify_with_regex">fit</set>
        </case>
        <set>$empty</set>
    </result>
</boorex>

A tag classify_with_regex=fit will be added if the email body text matches the expression with ID some_expression.

Null result

Result may get special null value in the following cases:

The null value is distinct from any possible string value of a result (including the empty string and the string "null"). The null value has the following properties:

  • In a precondition or action it causes the engine to skip its action and to evaluate all its results to null.
  • In a result it causes the engine to evaluate the corresponding result to null.
  • Some macros (like $not_empty) may change the previous properties. The macro specification should reflect this fact.

These properties are the basis for the precondition chaining principle.
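How null propagates along a chain of engines can be illustrated with a minimal Python sketch. It assumes a deliberately simplified model: each engine has one result, the values of the macros it uses are passed in as a list, and macros that alter null handling (such as $not_empty) are not modelled. NULL and run_engine are hypothetical names.

```python
# A sentinel that is distinct from every string, including "" and "null".
NULL = object()

def run_engine(inputs, action):
    """Skip the action and return null if any referenced value is null."""
    if any(value is NULL for value in inputs):
        return NULL
    return action(*inputs)

# The first engine's precondition was not met, so its result is null.
first = NULL
# The second engine uses the first result, so it is skipped as well.
second = run_engine([first], lambda value: value.upper())
# The third engine uses the second result and is skipped in turn.
third = run_engine([second], lambda value: value + "!")

print(second is NULL, third is NULL)
```

Skipping one engine therefore skips every later engine that depends on its result, without repeating the original precondition in each of them.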
