A client asked us for a WhatsApp assistant able to read PDFs, analyze photos and reply in natural language. The non-negotiable constraint: all the data had to stay in their own AWS account, without passing through any third-party service. We deployed a serverless architecture based on Amazon Bedrock, and the result now handles several hundred conversations a month.
The architecture builds on the open-source project building-gen-ai-whatsapp-assistant-with-amazon-bedrock-and-python published by AWS. It combines Amazon Bedrock, Lambda, DynamoDB and AWS's native WhatsApp integration to create a fully serverless, multimodal AI assistant.
In this article, I walk through this architecture exactly as we deployed it, explain how each building block fits together, and give you the keys to launch your own private WhatsApp AI assistant in a few CDK commands.
Why a private WhatsApp assistant on AWS?
There's no shortage of SaaS WhatsApp chatbot solutions (we cover the business use cases in detail in our article on automating customer relations through WhatsApp and AI). But they raise a major problem: your data travels through third-party servers. Customer messages, sensitive documents, business information... it's all stored by a provider you don't control.
With an architecture hosted in your own AWS account:
- Data sovereignty: everything stays in your AWS account, in the region of your choice (eu-west-1 for France). Amazon Bedrock does not store your data and does not use it to train its models.
- Controlled cost: serverless architecture = you only pay for what you consume. No server running idle, no fixed monthly subscription.
- Full customization: you control the AI model, the agent's instructions, media processing and business logic. No limitations imposed by a SaaS.
- Natively multimodal: text, images, videos, documents and voice notes. The Amazon Nova model understands all of these formats.
Serverless architecture: how it works
The architecture relies on 7 AWS services that chain together in an event-driven way. No servers to manage, everything scales automatically.
Message processing flow
whatsapp_in
The 7 AWS services at the core of the architecture
AWS Lambda: 3 specialized functions
- whatsapp_in: the entry point. Receives the message via SNS, identifies the type (text, image, audio, video, document), downloads the media into S3, and routes it to the right processing path.
- bedrock_agent: invokes the Bedrock agent with the message and the conversation context. For media, it first uses the Converse API to analyze the content before passing it to the agent.
- transcriber_done: triggered at the end of an audio transcription. Retrieves the transcribed text and forwards it to the Bedrock agent.
Amazon Bedrock: the brain of the assistant
The project uses the Amazon Nova model through Bedrock, AWS's managed generative AI service. Two APIs are used:
- Bedrock Agents: handles text conversations with memory and custom instructions. The agent maintains context across multiple exchanges thanks to a session ID stored in DynamoDB.
- Bedrock Converse API: analyzes multimodal content (images, videos, documents). The result of the analysis is injected into the agent's conversation as additional context.
DynamoDB: conversation memory
Two DynamoDB tables store, respectively, the message history (type, content, timestamp, phone number) and the Bedrock agent's session context (session ID per user). This is what lets the assistant remember previous exchanges and provide contextual responses. All with latency under 10 ms.
Amazon S3: media storage
Multimedia files (images, videos, documents, voice notes) are downloaded from WhatsApp and stored in S3 before processing. S3 also serves as the destination for Amazon Transcribe's transcription results. SSE encryption enabled by default.
Amazon Transcribe: speech-to-text
WhatsApp voice notes are transcribed into text automatically. The service detects the language automatically (no need to specify it) and supports over 100 languages. Once the transcription is complete, an event triggers the transcriber_done Lambda, which forwards the text to the agent.
SNS + AWS End User Messaging: native WhatsApp integration
AWS End User Messaging provides native WhatsApp Business integration, without going through Twilio or another intermediary. Incoming messages are published to an SNS topic, which triggers the processing Lambda. Responses are sent directly via the End User Messaging API. Less latency, lower cost, fewer external dependencies.
Deploy in 5 steps with AWS CDK
The entire infrastructure is defined in Python with AWS CDK (Cloud Development Kit). A single cdk deploy creates all the resources.
Prerequisites
- • AWS account with the CLI configured
- • Python 3.8+
- • AWS CDK v2.172.0+
- • Meta Business Account (for WhatsApp)
cd private-assistant-v2
source .venv/bin/activate
The cdk deploy automatically creates: the 3 Lambda functions, the S3 bucket, the 2 DynamoDB tables, the Bedrock agent, the SNS topic, the IAM roles and the End User Messaging configuration. Expect around 5 minutes for the full deployment.
In production, we found that the step taking the longest isn't the AWS deployment but the Meta Business Account verification (expect 2 to 5 business days). Our advice: kick off this step in parallel with development so it doesn't block your go-live.
Under the hood: how the code processes each message
The whatsapp_in Lambda is the entry point. It receives the SNS events and orchestrates processing based on the message type:
Text message
The text is extracted from the WhatsApp payload and passed directly to the bedrock_agent Lambda via an asynchronous Lambda call (InvocationType: Event). The Bedrock agent generates the response based on its instructions and the conversation history.
Image, video or document
The media is downloaded from WhatsApp and stored in S3. The S3 path is passed to the bedrock_agent Lambda, which uses the Bedrock Converse API to visually analyze the content (image description, text extraction from a PDF, video analysis). The result is then injected into the agent's conversation as context.
Voice note
The audio is stored in S3, then an Amazon Transcribe job is launched with automatic language detection. Once the transcription is complete, an event triggers the transcriber_done Lambda, which retrieves the text and forwards it to the Bedrock agent. The customer can therefore talk to the assistant just as they would talk to a human.
Memory management
Each message is saved in DynamoDB with the phone number as the partition key. The Bedrock agent uses a unique session ID per user, also stored in DynamoDB, to maintain conversation context. The result: the assistant remembers everything the customer said previously, even if the conversation spans several days.
Cost estimate: how much does it really cost?
The advantage of a serverless architecture: no fixed cost. You pay as you go. Here is an estimate for 1,000 messages per month:
| Service | Estimated cost / month | Details |
|---|---|---|
| AWS Lambda | ~ 0.50 USD | 3 functions, 256 MB, ~15s/invocation |
| Amazon Bedrock (Nova Lite) | ~ 3-10 USD | ~0.003 USD/request (input + output tokens) |
| Amazon S3 | < 1 USD | Media storage + transcriptions |
| DynamoDB | < 1 USD | On-demand, low volume |
| Amazon Transcribe | ~ 2-5 USD | 0.024 USD/min (if voice notes) |
| SNS + End User Messaging | < 1 USD | + WhatsApp Business fees per conversation |
| Estimated total | 20 - 50 USD | For 1,000 messages/month |
Comparison: SaaS WhatsApp chatbot solutions cost between 100 and 500 USD/month with message limits. Here, for a comparable volume, you're 3 to 10 times cheaper, with full control over your data and your infrastructure.
Concrete use cases
This architecture opens the door to many use cases thanks to its multimodal capability. If you're looking for a broader view of integrating AI in business, see our practical guide to integrating generative AI in SMEs.
Smart customer support
The customer sends a photo of their faulty product, the AI diagnoses the problem and suggests a solution. Voice notes are supported for customers who prefer to speak.
Medical follow-up
A patient sends a post-operative photo, the AI analyzes the progress and alerts the practitioner if needed. Medical documents (PDF) are analyzed automatically. This is exactly what PostCare.net offers, a patient follow-up solution via WhatsApp and AI.
Training assistant
Learners send their exercises (photos or documents), the AI grades them and provides personalized feedback. Multilingual support thanks to automatic detection.
Real estate assistant
A prospect sends photos of their property, the AI provides an estimate and recommendations. Documents (inspection reports, floor plans) are analyzed automatically.
Frequently asked questions
How much does a WhatsApp AI assistant on AWS cost?
Thanks to the serverless architecture, costs are proportional to usage. For 1,000 messages per month, expect roughly 20 to 50 USD all in. The cost grows linearly with volume, with no tiers and no fixed subscription.
What types of files can the assistant process?
Text, images (JPEG, PNG), videos (MP4), documents (PDF, Word) and voice notes. Media is analyzed by Amazon Nova through the Bedrock Converse API. Voice notes are transcribed by Amazon Transcribe with automatic language detection.
Is the data secure?
Yes. All data stays in your own AWS account. Bedrock does not store your data and does not use it for training. S3 encrypts files with SSE, DynamoDB encrypts at rest. However, messages do travel through WhatsApp, which has its own privacy policies.
Can the agent's responses be customized?
Absolutely. The Bedrock agent accepts custom instructions that define its behavior, tone, scope and limits. You can give it a specific personality, restrict the topics it covers and set response rules. The DynamoDB history enables contextual responses.
Do you need a Meta Business Account?
Yes. WhatsApp integration via AWS End User Messaging requires a verified Meta Business Account. Creating one is free but verification takes a few days. It's the same account as the one used for Facebook Ads or Instagram Business.
How do you delete the infrastructure?
Empty the S3 bucket, then run cdk destroy. All resources are removed within a few minutes. That's the advantage of Infrastructure as Code: deployment and teardown in a single command.
Read also
Want to deploy your own WhatsApp AI assistant?
I'll support you in deploying, customizing and integrating your WhatsApp AI assistant on AWS. A serverless, secure and cost-optimized architecture.
Let's talk about your projectReply within 24h - First conversation free and with no commitment