Processing Word template documents with tags

Example: Merging a Word Template Document with Tags

A common requirement in many systems is the ability to create documents from a template and merge them with content from a database. While several plugins are available for this purpose, this example demonstrates how to achieve it using the ACF plugin.

Microsoft Word supports various document formats, including DOCX, DOC, RTF, and more. The DOCX format is essentially a ZIP archive, while the DOC and RTF formats represent single-file documents.

The easiest way to perform merge operations would, of course, involve plain text documents, as they do not contain any formatting information. However, this approach would not provide the flexibility to format the document as desired. You could consider using the Markdown format and then converting it to an HTML file, but this method does not support document styles such as headers, footers, and other requirements for paper documents or PDFs. The DOCX format can be challenging to process, as it involves a zipped document structure. The DOC format is a binary format and not easily handled for merging purposes. On the other hand, the RTF format is a text-based format but contains numerous formatting directives. This is the format we have chosen for our approach to merging documents.

In this solution, we use the RTF format for the template and then save the result with a .doc extension. This prompts Word to open it in compatibility mode, which is perfectly acceptable. You can later save it as DOCX if that's the desired format.

Let's take a closer look at the RTF format. If you save a Word document as RTF and open it in a plain text editor, you'll notice numerous directives starting with a backslash. These directives control the formatting and other properties of the document.

Suppose we opt to use double angle brackets on each side of our merging tag, like this:

<<Name>>

Our task is to open the RTF template, locate all the tags, and replace them with actual database content. While this may seem straightforward, issues arise when users apply formatting inside a tag. For instance, if a user applies the "Bold" style to only the word Address in the tag <<Address>>, Word splits the tag with formatting codes, and the RTF source looks something like this:

<<}{\rtlch\fcs1 \ab\af31507 \ltrch\fcs0 \b\insrsid5650507\charrsid15009246 Address}{\rtlch\fcs1 \af31507 \ltrch\fcs0 \insrsid5650507 >>

How can we address this challenge?

Answer: We can use regular expressions. Specifically, we'll employ a carefully crafted regular expression with the ACF function regex_extract. This function allows us to extract tags from the document in a way that facilitates safe replacements, even when the tags contain formatting codes.

In regular expressions, we use "groups" (enclosed in parentheses) to extract specific portions of a match. With regex_extract, we obtain an array with one row for each match in the document. Each row contains a delimited list holding the full match followed by each of its groups. One group contains the bare tag name without the << >> markers, which lets us identify the merge tag itself.

Here is the regular expression used in this solution. It has been written and tested for this purpose, so you can use it as is even if you are not familiar with regex:

string reg = "<<((?:€€[a-z0-9]*|\\}|\\{|\\s)*)([a-z0-9.\-_æøåÆØÅA-Z]*)((?:€€[a-z0-9]+|\\}|\\{|\\s)*)>>";

Each match yields four fields, the full match plus three capture groups:

(1) The full match
(2) Formatting codes before the tag
(3) The tag itself
(4) Formatting codes after the tag

When performing replacements in the document, we replace the full match (1) with the formatting before the tag (2), followed by the replacement text for the tag (3), followed by the formatting after the tag (4).
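The group handling described above can be tried outside ACF. The following Python sketch is an illustration only, not the plugin's implementation: it assumes Python's re module behaves like the regex engine behind regex_extract for this pattern, and the helper name merge_tags and the sample values are hypothetical. The double euro sign (€€) stands in for the RTF backslash.

```python
import re

# The document's pattern, written as a Python raw string.
# "€€" stands in for the RTF backslash.
TAG_RE = re.compile(
    r"<<((?:€€[a-z0-9]*|\}|\{|\s)*)"   # group 1: formatting before the name
    r"([a-z0-9.\-_æøåÆØÅA-Z]*)"        # group 2: the bare tag name
    r"((?:€€[a-z0-9]+|\}|\{|\s)*)>>"   # group 3: formatting after the name
)


def merge_tags(text, values):
    """Replace every tag, keeping the formatting codes found inside it."""
    def replace(match):
        before, name, after = match.groups()
        # Unknown tag names are replaced with an empty string.
        return before + values.get(name, "") + after
    return TAG_RE.sub(replace, text)


# A plain tag and a tag whose name was set in bold (hypothetical samples).
plain = "Dear <<Name>>,"
styled = "<<}{€€b Address}{€€b0 >>"

merged_plain = merge_tags(plain, {"Name": "Ola"})
merged_styled = merge_tags(styled, {"Address": "Main Street 1"})
styled_groups = TAG_RE.search(styled).groups()
```

In the styled sample, the braces and control words end up in the first and third groups, so they are written back around the replacement text and the document's brace nesting stays balanced.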

The backslash character (\) has a special meaning both in regular expressions and in string literals in our code, and the document contains one for every directive, so a pattern that matches them literally would be hard to write and read. To keep the regex simple, we temporarily replace every backslash in the document with a double euro sign (€€). After processing, we replace these placeholders with backslashes again.
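As a minimal sketch of this placeholder round trip, here is the same idea in Python (for illustration; the ACF code further down uses substitute for both steps). The function names and the sample RTF string are hypothetical, and the sketch assumes the template does not already contain a literal double euro sign.

```python
PLACEHOLDER = "€€"


def to_placeholder(rtf):
    """Swap every backslash for the placeholder before running the regex."""
    return rtf.replace("\\", PLACEHOLDER)


def from_placeholder(text):
    """Swap the placeholder back to a backslash after the merge."""
    return text.replace(PLACEHOLDER, "\\")


# A tiny RTF fragment (hypothetical sample) taken through the round trip.
rtf = r"{\rtf1\ansi Hello <<Name>>\par}"
work = to_placeholder(rtf)
restored = from_placeholder(work)
```

The round trip is only lossless when the placeholder never occurs in the original text, which is why an unusual character pair is chosen.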

In the example below, the function GetSubstitute looks up the replacement text for a tag. It accepts a tag name as input and returns the text for that tag. You can customize this function to suit your requirements; here, we've reserved the tag Date, which returns today's date as text. Any other tag is assumed to refer to a local variable in the calling script, set with the Set Variable script step and named after the tag with a dollar sign ($) prefix, so the tag <<Name>> is filled from $Name. This allows the calling script to prepare all the tags referenced in the document before invoking the MergeRTFDocument function to complete the merge.

RTF is not a UTF-8 format: characters outside plain ASCII are read using a single-byte character set, and this example uses ISO-8859-1. To ensure that text from FileMaker, which is in UTF-8 encoding, displays correctly in the document, we need to convert it. We do this with the ACF function from_utf, which takes the text as the first parameter and the encoding name as the second.
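To see why the conversion matters, the following Python sketch compares the two encodings for a short Norwegian word. It only illustrates the kind of conversion from_utf performs and says nothing about how the plugin implements it; the sample word is arbitrary.

```python
text = "blåbær"

# UTF-8 needs two bytes for each of å and æ; ISO-8859-1 needs one.
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("iso-8859-1")

# What a reader expecting ISO-8859-1 shows if it is handed UTF-8 bytes.
garbled = utf8_bytes.decode("iso-8859-1")

# Decoding with the matching encoding gives the original text back.
round_trip = latin1_bytes.decode("iso-8859-1")

print(len(utf8_bytes), len(latin1_bytes))
print(garbled)
```

Without the conversion, each accented letter would show up as two unrelated characters in the merged document.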

The ACF function eval is employed to retrieve the text from the locally prepared script variable in the calling script. If the variable happens to be nonexistent, FileMaker will return an empty string, causing the corresponding tag to be omitted from the document.
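The lookup rules described above can be summarized in a short Python sketch. It is an analogue for illustration, not the ACF function itself: a dictionary named variables (hypothetical) plays the role of the calling script's local variables, the optional today parameter exists only to make the example repeatable, and the encoding conversion is left out.

```python
from datetime import date


def get_substitute(tag, variables, today=None):
    """Return the replacement text for a tag name."""
    if tag == "Date":
        # Reserved tag: today's date as text.
        text = (today or date.today()).strftime("%d.%m.%Y")
    else:
        # A missing variable gives an empty string, so the tag disappears.
        text = variables.get("$" + tag, "")
    # Add an RTF paragraph break after each carriage return, written with
    # the double euro placeholder that is turned into a backslash later.
    return text.replace("\r", "\r€€par ")


variables = {"$Name": "Ola Nordmann", "$Address": "Line 1\rLine 2"}
name_text = get_substitute("Name", variables)
missing_text = get_substitute("Company", variables)
address_text = get_substitute("Address", variables)
date_text = get_substitute("Date", variables, today=date(2024, 1, 31))
```

Returning an empty string for unknown names keeps the merge from failing on a template that mentions a tag the calling script did not prepare.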

Here's the complete example:

Package DocuLib "Library for documentation project"

function DocuLib_Version ()
    return 10;
end

// Function to get the text to replace in the tag. 

function GetSubstitute(string tag)
    string ret;
    if ( tag == "Date") then
        ret = string(now(), "dd.mm.yyyy"); 
    else
    // Ref a FileMaker variable used from the calling script. 
    // As the text should be in ISO-8859-1 format, we do a conversion as well. 
        ret = from_utf( eval ( '$'+tag), "ISO-8859-1"); 
    end if
    // Turn line breaks in the replacement text into RTF paragraph breaks (the \par directive, written with the €€ placeholder).
    ret = substitute ( ret, char(13), "\r€€par ");
    return ret; 
end

// Perform the actual tag replacement.
function MergeRTFDocument ( string template, string outPutDoc )

    string docu; 
    
    // if empty template, then get the user to select it. 
    if ( template == "") then
        template = select_file ("Select an RTF file");
    end if
    
    // If the user cancels the file selection, return an empty string.
    if ( template == "") then
        return ""; 
    end if
    
    // Get the content of the selected document.
    int x = open ( template, "r"); 
    docu = read ( x ) ; 
    close (x);
    
    // Replace backslashes with double euro signs to simplify the regex.
    docu = substitute ( docu, "\\", "€€"); 
    
    // Define the regex pattern.
    string reg = "<<((?:€€[a-z0-9]*|\}|\{|\s)*)([a-z0-9.\-_æøåÆØÅA-Z]*)((?:€€[a-z0-9]+|\}|\{|\s)*)>>"; 
    
    // Extract the tags from the document.
    array string results = regex_extract ( reg, docu,"|*|"); 
    
    array string match; 
    int i, z = sizeof ( results); 
    string s1, s2, s3; 
    
    // Loop through all the matches. 
    for ( i=1, z)
        match = explode ( "|*|", results[i]);
        s1 = match[2];  // Formatting before
        s2 = match[4];  // Formatting after
        // Replace the full match, call our "GetSubstitute" function to obtain the text for the tag.
        docu = substitute ( docu, match[1], s1+GetSubstitute (match[3])+s2 ); 
    end for
    
    // Replace the euro signs back with backslashes.
    docu = substitute ( docu , "€€", "\\");

    // If no output path is given, append .doc to the template path.
    if ( outPutDoc == "") then
        outPutDoc = template + ".doc";
    end if
    
    // Write the modified content to the new document.
    x = open ( outPutDoc, "w"); 
    write ( x, docu); 
    close ( x );
    
    return "OK"; 

end

// From the script applied:
// ACF_run( "MergeRTFDocument"; ""; "")

From FileMaker, we use a script like this:

# Run from the document layout
Set Variable [ $Name; Value:Contacts::First Name & " " & Contacts::Last Name ]
Set Variable [ $Company; Value:Contacts::Company ]
Set Variable [ $Address; Value:Addresses::Address Line 1 & "¶" & Addresses::Address Line 2 & "¶" & Addresses::Postal Code & " " & Addresses::City ]
Set Variable [ $res; Value:ACF_run( "MergeRTFDocument"; "";"") ]

After a test run, we have this result:

[Image: the merged document]

Summary: This is a working example of merging in RTF documents to generate letters, contracts, and other documents from template files. A fully functional solution would also need a value list for selecting the template, configuration settings for the location of the templates, and further settings for archiving the resulting documents. To improve the user experience, you can use the Open URL script step to open the processed document in Microsoft Word automatically, so users can add any details not covered by the merge.

The complete solution could also offer a selection of canned text snippets to include in the document, which speeds up document production and reduces rework on the merged document. A text field in the document table could hold the selected canned paragraphs, which would then be merged through a single merge tag.

We could also use the DocumentService functions in this plugin to handle storage and/or encryption of the resulting documents, so that multiple users can access them securely.

References:

Regex tool at regex101.com
The Regex chapter in this manual.
RTF specifications