Does Generative AI Make Sense When Processing Transactional Documents?

Users are faced with so many choices in the market for IDP that it’s almost impossible to compare apples to apples when it comes to functionality and cost. Sure, any given solution might have great accuracy and functionality but the price makes the solution less viable and accessible. Coupled with pricing constraints of Cloud providers running LLM’s- power/electricity, processing cycles which continually change, so do these pricing models.

Understanding Microsoft licensing seems as though it requires a PhD at times. But in the case of consuming Machine Learning around documents, consumers have a variety of options.

  1. Azure AI Document Intelligence
  2. Azure OpenAI
  3. Microsoft Power Automate AI Builder/AI Hub

Each of these three options above have very different licensing models, so it can be difficult to discern which solution within Microsoft offers the best price versus functionality. In the use cases explored here, we are basing it off of common procurement documents such as invoices, sales orders, BOLs, and similar. Procurement use cases, unlike unstructured recognition, typically have multiple key value pairs (e.g. invoice number, vendor name, etc.). They also have tabular data, which in most cases could require over 30 header/footer fields to be extracted along with possibly 10+ tabular line item fields extracted, which may paginate across pages. In this particular case, being charged by word or token rather than page or document can impact pricing when processing high volumes—which is the case with most procurement documents.

So, let’s explore the pricing model of each of the three Microsoft document ai and ml approaches.

Azure AI Document Intelligence

At the time of this publication, v4.0 of Azure AI Document Intelligence has two major components to pricing, and is generally priced by page in 1,000 quantities.

Prebuilt OR Custom

Note: While there are now 13 different pricing options in the Azure AI Document Intelligence pricing table, we focus on the two major tools that we use the most for data extraction.

Prebuilt Models

Prebuilt models utilize common use cases through swarm sourced models that are popular in most business use cases. For example, invoices, tax forms, contracts, general documents, health ID, and the like. These are essentially Small Language Models, with no uptraining capability and output data as a standard number of text and tabular data in a cost conscious methodology. As implementation consultants, we tend to use these models for your transactional documents, where as many key-value pairs and default prebuilt model fields are returned, and there is no limit to the number of extraction points that can be extracted per page. So for $0.01/page, you can consume pay as you go licensing with no limit to number of words, characters, punctuation, inputs or outputs. Pricing is simple and easy to calculate. There is no model building, and each time you upload a document, the prebuilt models eventually learn.

Custom Models

Custom models, when built, come in two types: template and neural. We are focusing on neural models for the sake of this article. Custom models are well suited for unstructured or semi-structured documents where there are complex or no key-value pairs. In these cases few-shot predictive learning, a machine learning technique where a model can make accurate predictions about new data by training on only a very small number of labeled examples, is used where there are slight variations in the extraction content. However, these models don’t work well with tabular data that paginates and rolls over to following pages. Custom models can be untrained, with your own custom fields like text, table, check mark, or signature. Multiple models can be condensed into a composite model, and each time there is a change in the model it needs to be rebuilt, which can take some time. At $0.03/page, the custom model is 3x more expensive than prebuilt models.

Licensing Model

For both prebuilt and custom models, Azure charges by page in 1,000 page increments. Furthermore, the pay as you go, commitment and disconnect tiers offer a lot of flexibility in pricing and deployment options. Azure is one of the only Cloud engines that offer on-premise, offline cognitive services and even disconnected, non-pinging licensing.

Pay-as-you-go licensing is common among the Big 3 Cloud providers, making procurement of licensing easy and straight forward. You are charged monthly based on consumption at $0.01/page for prebuilt and $0.03/page for custom.

Commitment licensing has a minimum consumption of 20,000 pages/month and will reduce the cost per page from $0.01 to $0.0076 per page which is a 24% discount for prebuilt, and from $0.03 to $0.02295 per page which is a 23.5% discount for custom. At the highest volume of 500,000 pages per month, that cost drops to $0.0064 and $0.01785 per page respectively for prebuilt and custom, a 36% discount. So if you know your volumes and they are in excess of 6M pages per year, a commitment tier would save $1,800 per month for prebuilt and $6,075 per month for custom. There is a significant impact to cost based on how you buy.

Disconnected licensing is ideal for on-premise cognitive services where no Internet access to the Azure AI Document Intelligence container/docker is required. The on-premise docker can use pay-as-you-go and commitment tiers if licensing pinging is available. But in cases where license pinging is not available, Microsoft mandates a minimum of 1.2M pages per year (100,000/month) at $0.0072/page for prebuilt and $0.0204/page for custom. So, comparing disconnected to commitment licensing, there is an upcharge of ~11% and requires a prebuy.

Azure OpenAI

Azure OpenAI, is Microsoft’s wrapped access to OpenAI’s powerful language models. The service allows consumption of the GPT-4o, 4o mini and o1 models from within your Azure tenant via Azure AI Studio.

Azure OpenAI is the generative AI offering for unstructured document extraction which differs from Azure AI Document Intelligence in that models are not pre-built, prompt engineering is used to structure common question prompts that extract the correct data. Here we do not define a field, extract key-value pairs, but rather form a question that’s generic enough to extract a value for example, “What is the vendor’s name?”, “What is the invoice total?”, “What is the transaction date?”

Can Azure OpenAI Support Transactional Documents? What are the Challenges?

So typically Azure OpenAI common use cases would be things that are unstructured like contracts, financial letters, lending, accident reports, claims to name a few. How about procurement documents that are semi-structured, meaning we know they are invoices but the vendor data is not in the same location.

Extracting tabular line items from invoices would likely be more difficult than using Azure AI Document Intelligence, given crafting a query prompt that automatically extracts a table that spans multiple pages would be a challenge. Picking a synonym set for the tabular column name (e.g. item number or part number), may look like “what’s the part or item number?”

Cost and licensing is also much more complex than in Azure AI Document Intelligence. Azure OpenAI licensing is charged based on input and output tokens as well as the GPT model, location and volume.

So what is a token? Input tokens represent the text you provide to the model and output tokens are generated by the model in response to your input. Model consumption is by 1M tokens and inputs and outputs are priced separately.

How does my cost differ by model? Azure OpenAI has different pricing based on the GPT model you consume. For instance the GPT version 4o model either in mini or not is the most commercially viable model whereas GPT 1o is available for complex, reasoning logic but at a significant cost difference. Additionally, whether the model is global, US/EU, or regional does also impact pricing.

What is the cost to process a single page document with Azure Open AI?

In order to estimate single page document extraction and normalized the cost of Azure OpenAI and AI Document Intelligence in page c consumption pricing, we need to understand the total number of characters per page, what model, location and whether its batch API enabled or not.

A large explanation of how a token is equal to characters is as complex as Microsoft licensing models. Ultimately a token is not a direct one to one match against a character, but there is some algorithm based on common words, short phrases, complex, less common words and punctuation and spacing all go into the formula. For our purposes we assume a token is equal to 4 characters of text. We assume 812 tokens per page of text assuming 1 token per 4 characters.

  1. For the GPT-4o model, the cost is $5 per million input tokens and $15 per million output tokens
  2. For the GPT-4o mini model, the cost is $0.15 per million input tokens and $0.60 per million output tokens.
  3. For the GPT-4o model, batch API pricing which is designed for large scale, high volume asynchronous tasks with a 24hr turnaround is offered at 50% off published pricing.

So the total cost per page using Azure Open AI 4o and 4o mini would be $0.01624 and $0.00061 per page respectively. If the volume is high (>1M pages) and turnaround can be up to 24hrs, batch API pricing would be $0.00812 and $0.000305 per page, making generative AI through Azure Open AI, the least costly service to use for processing document extraction.

  • GPT-4o:
    • Input: (812\1,000,000 tokens x $5 = $0.00406 )
    • Output: (812\1,000,000 tokens x $15 = $0.01218)
    • Total: $0.01624 per page
  • GPT-4o mini:
    • Input: (812\1,000,000 tokens x $0.15 = $0.00012)
    • Output: (812\1,000,000 tokens x $0.60 = $0.00049)
    • Total: $0.00061 per page

Consuming GPT 1o preview and 1o mini would be exponentially more at $0.06699 and $0.013398 per page respectively.

  • GPT-1o preview:
    • Input: (812\1,000,000 tokens x $16.50 = $0.013398)
    • Output: (812\1,000,000 tokens x $66 = $0.053592)
    • Total: $0.06699 per page
  • GPT-1o mini:
    • Input: (812\1,000,000 tokens x $3.30 = $0.0026796)
    • Output: (812\1,000,000 tokens x $13.20 = $0.0107184)
    • Total: $0.013398 per page

Microsoft Power Automate AI Builder/AI Hub

When consuming Azure AI Document Intelligence Custom and Prebuilt AI models from the Microsoft Power Platform, referred to as AI Hub or AI Builder, Microsoft charges based on the number of tokens, but differently than Azure Open AI. For tier 1, AI Builder credits at $500 per month, and you get 1M tokens. Each page of a document is equal to 32 tokens so 31,250 pages per month at $500 = $0.016 per page.

The benefit of using the Power Platform over Azure AI Document Intelligence or Azure OpenAI is the ability to create a workflow process and UI using PowerApps. Essentially, by using Copilot Studio in Power Automate you can quickly create a workflow process to monitor and email, extract the PDF attachment and extract data using prebuilt or custom document models and output the document and data into the Dataverse or SharePoint.

Conclusion

Ultimately, when comparing all three ways to consume Microsoft data models, Power Platform, Azure AI Document Intelligence, or Azure OpenAI licensing models, the pricing is relatively similar per page. But, Azure OpenAI 4o mini is well below the cost of AI Document Intelligence. The math to figure out Azure OpenAI is more difficult when guessing inputs and outputs. For all three consumption models, at the low end, minimum volumes pricing is $0.016, $0.01, $0.00061 per page (e.g. $16, $10, $0.61 per 1,000 pages) respectively for Power Platform, Azure AI Document Intelligence and Azure OpenAI.

So, from a cost perspective, yes, Azure OpenAI can be used for transactional documents depending on the business requirements. From a functionality perspective, using Azure OpenAI for transactional procurement documents can be likened to using a jack hammer when you just need a hammer. Yes, Azure OpenAI can read and interpret invoices for instance, but prebuilt models are just more simplistic, enable swarm/crowd sourced knowledge across the globe, return default fields, and manage tabular data better where it rolls to following pages.

Azure OpenAI is more about interacting with and generating text, whereas Azure AI Document Intelligence is about analyzing and extracting data from documents. So, in most cases, Azure AI Document Intelligence is better suited for transactional documents like invoices. With Azure OpenAI, a user would have to craft very specific question prompts to properly extract data and may need to tweak the queries based on a test data set whereas Azure AI Document Intelligence has already pre-programmed the logic specific to its prebuilt data models. But Azure OpenAI 4o mini model at 93% less expensive per page than Azure AI Document Intelligence may make the prompt engineering exercise worth the preverbal squeeze.

About the Author

Brent Wesler, is the VP of Strategic Technology and Digital Automation at PiF Technologies and focuses on strategic business consulting practices such as Machine Learning, Artificial Intelligence, robotic process automation and workflow automation. He has spent the last 25 years within solution consulting, presales, architecting and professional services leadership roles implementing services around Cloud, document and workflow automation software.

His experience includes VP of Professional Services at Westbrook Technologies (acquired by DocuWare), VP of Business Development and Professional Services for Square 9 Softworks, a manufacturer of ECM, web forms, and IDP solutions, and Global Worldwide Presales Solutions Engineer for Kodak Alaris.

Through his two decades within the document automation space, both on the manufacturer and value-added reseller side, Brent has seen the industry go through an extreme technology inflection. As a thought leader, speaker, podcaster and regular contributor to industry publications, Brent focuses on business process re-engineering with customers and the technology that allows for its automation.


📨Get IDP industry news, distilled into 5 minutes or less, once a week. Delivered straight to your inbox ↓

Share This Post
Have your say!
10