From Alert to Fix: Building TypeScript Remediation Lambdas for Common Security Hub Findings

Alex Mercer
2026-04-11

Build auditable TypeScript Lambda playbooks that auto-fix Security Hub findings with CI/CD gates and safe remediation patterns.

If you run AWS at scale, Security Hub can feel like a permanent stream of red flags: public S3 buckets, missing CloudTrail, overly permissive ECR settings, and other drift that quietly turns into risk. The trick is not to suppress findings or drown in tickets; it is to build a safe remediation path that is fast enough to matter and controlled enough to satisfy auditors. In this guide, we’ll design a TypeScript Lambda remediation pattern that turns common findings into auditable fixes with CI/CD gates, change logs, and rollback-friendly safeguards. If you’re already mapping your security posture, pair this with our overview of operational security hardening and the broader view of automation patterns for operations teams.

1) What Security Hub remediation should actually do

Detect, decide, remediate, record

Good remediation is not just “fix the thing.” It is a four-step control loop: detect the finding, decide whether it is safe to act automatically, apply the smallest corrective change, and record the action in a durable audit trail. That makes the system both operationally useful and compliance-friendly. Security teams often start with detection only, then add a human ticket, and eventually discover that the ticket queue is slower than the risk curve.

A TypeScript Lambda is a great fit because it gives you typed event handling, reusable policy logic, and a codebase that can be tested like any other application. Instead of hard-coding one-off scripts, you build a remediation service with explicit guardrails. For a broader software discipline around stability, the logic here is similar to the practices in QA checklists for stable release environments: define the checks before you act, then automate only the safe subset.

Why TypeScript is the right language for remediation

Security automation fails in ugly ways when the code is loosely typed and error-prone. TypeScript helps you model Security Hub events, AWS SDK responses, allowlists, and action plans in a way that catches many mistakes before deployment. That matters when a single wrong parameter can touch dozens of accounts or cause an unintended config change. Typed interfaces also make it easier to review code quickly during security approvals.

In practice, TypeScript lets you define a bounded set of remediation actions like PublicS3Block, CloudTrailBootstrap, or EcrEncryptionEnable and route findings to the right handler. This reduces the chance that a malformed finding triggers the wrong playbook. If you’re deciding whether to build internal tooling or buy a platform, the tradeoff mirrors the decision framework in evaluating software tools and the broader build-versus-buy logic in build vs. buy in 2026.
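As a sketch, that routing can be a small lookup table from control ID to a bounded action type. The control IDs below follow the AWS FSBP naming style, but the specific mapping is illustrative, not an official catalog:

```typescript
// A bounded set of remediation actions, taken from the playbook names above.
type RemediationAction = 'PublicS3Block' | 'CloudTrailBootstrap' | 'EcrEncryptionEnable';

// Illustrative control-ID-to-action routing table (IDs are examples, not a
// complete or authoritative FSBP mapping).
const CONTROL_ROUTES: Record<string, RemediationAction> = {
  'S3.8': 'PublicS3Block',
  'CloudTrail.1': 'CloudTrailBootstrap',
  'ECR.1': 'EcrEncryptionEnable',
};

// Route a finding to its action, or undefined when no playbook exists.
// Unknown controls must never fall through to a default remediation.
function routeFinding(controlId: string): RemediationAction | undefined {
  return CONTROL_ROUTES[controlId];
}
```

Because the return type is a union of known action names, a malformed or unmapped finding produces `undefined` rather than an accidental playbook invocation.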

Where automated remediation fits in the control plane

The right place for this automation is between Security Hub and the infrastructure control plane. Findings arrive, an event bus routes them, a Lambda evaluates policy, and an execution role applies the fix. That keeps the remediation logic close to the source of truth while still allowing change management, logging, and approvals. The result is a repeatable response model instead of an incident-by-incident scramble.

Think of this system as a secure workflow, not a magic button. The best remediation stacks are designed like workflow automation systems: explicit state, visible transitions, and a clear approval path when automation should stop and hand off to humans.

2) Security Hub findings worth auto-remediating first

Start with low-blast-radius, high-confidence findings

Not every finding should be remediated automatically. Start with issues that are both common and deterministic, where the fix is almost always the same. Examples include public S3 buckets, missing CloudTrail in a single-account baseline, insecure ECR repository settings, and disabled logging on critical services. These are classic “configuration drift” findings, and they are ideal candidates for deterministic code.

A practical way to prioritize is to rank findings by blast radius, reversibility, and confidence. A change that can be reversed in a few lines of code is much safer than one that alters networking or identity boundaries. This is similar to choosing incremental operational automation rather than all-at-once transformation, much like the mindset in incremental AI tools for database efficiency.

Examples of frequent remediations

For public S3, the remediation is usually to block public access at the account or bucket level, remove public ACLs, and verify the bucket policy. For missing CloudTrail, the remediation is to create or re-enable a trail, ensure it delivers logs to an encrypted bucket, and confirm log file validation is on. For insecure ECR settings, the fix may include enabling scan on push, ensuring encryption at rest, and restricting repository policies.

AWS Security Hub’s AWS Foundational Security Best Practices standard is exactly where many of these findings originate, and AWS’s control catalog is broad enough that you can build a useful remediation library without inventing your own taxonomy. The key is to map each control to one safe action plan. You do not need to remediate every control on day one; you need a stable subset that reduces risk quickly while preserving trust.

Controls that should usually require human approval

Some findings are too context-sensitive for automatic action. Examples include IAM policy overreach, VPC security group changes, or anything that might break production traffic. For those, emit a ticket, Slack alert, or approval request with a recommended fix instead of acting directly. A good remediation platform knows when to stop.

This is where strong operational discipline matters. Security automation should be treated like a change system with guardrails, not as a hidden background process. If you want a useful parallel, the alerting and escalation discipline described in critical patch alerting without panic is a good model: notify clearly, act selectively, and avoid unnecessary noise.

3) Architecture: event-driven, typed, and auditable

The basic pipeline

The simplest robust design is: Security Hub finding event → EventBridge rule → TypeScript Lambda → AWS SDK remediation action → audit record in DynamoDB, S3, or CloudWatch Logs. Add SNS or Slack for notifications and use Step Functions if the remediation needs multiple decision points. This architecture keeps individual functions small and testable while allowing you to compose more advanced flows later.
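The EventBridge filter at the front of that pipeline can be expressed as an event pattern. The sketch below (written as a TypeScript const for readability) matches Security Hub finding imports above a severity floor; the `aws.securityhub` source and `Security Hub Findings - Imported` detail-type are the documented event shape, while the severity filter is an example policy choice:

```typescript
// Hedged sketch of an EventBridge event pattern for Security Hub findings.
// Only HIGH and CRITICAL findings reach the remediation Lambda in this example.
const eventPattern = {
  source: ['aws.securityhub'],
  'detail-type': ['Security Hub Findings - Imported'],
  detail: {
    findings: {
      Severity: { Label: ['HIGH', 'CRITICAL'] },
    },
  },
};
```

Filtering at the bus keeps low-severity noise out of the Lambda entirely, which reduces both cost and the surface area your policy code has to reason about.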

The most important design choice is the decision layer. Your Lambda should not blindly execute everything it receives. It should first classify the finding by product, control ID, severity, account, region, and environment, then check policy rules before applying the fix. That’s the same kind of deliberate routing logic used in resilient operations workflows and can be modeled cleanly in TypeScript.
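A minimal sketch of that decision layer, under an assumed policy where only a short allowlist of controls is ever auto-fixed and production always routes to approval:

```typescript
// Normalized context the Lambda extracts from a finding before deciding.
interface FindingContext {
  controlId: string;
  severity: 'LOW' | 'MEDIUM' | 'HIGH' | 'CRITICAL';
  accountId: string;
  region: string;
  environment: 'dev' | 'staging' | 'prod';
}

type Decision = 'AUTO_REMEDIATE' | 'REQUEST_APPROVAL' | 'IGNORE';

// Example policy data: the controls considered safe for automatic action.
const AUTO_SAFE_CONTROLS = new Set(['S3.8', 'CloudTrail.1']);

// Decide before acting: unknown controls are ignored, production goes to a
// human, and only the safe subset in non-prod is remediated automatically.
function decide(ctx: FindingContext): Decision {
  if (!AUTO_SAFE_CONTROLS.has(ctx.controlId)) return 'IGNORE';
  if (ctx.environment === 'prod') return 'REQUEST_APPROVAL';
  return 'AUTO_REMEDIATE';
}
```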

Suggested AWS building blocks

Use EventBridge for fan-out and filtering, Lambda for execution, IAM for least privilege, DynamoDB for structured audit metadata, and S3 for immutable log exports. If you need approval gates, use Step Functions or a manual approval workflow in CodePipeline. For remediation against multiple accounts, centralize the event intake in a delegated admin account and pass context explicitly so the action is attributable.

This is also where infrastructure-as-code matters. The remediation service itself should be deployed through the same disciplined pipeline that protects the workloads it changes. For teams already investing in operational resilience, the same philosophy appears in guides like preparedness for disruptive future tech operations and resilient automation strategies: build systems that absorb change without losing visibility.

Auditability by design

Auditability is not an afterthought. Every remediation invocation should store the finding ID, control ID, target resource ARN, before/after state summary, action taken, execution role, code version, and approval source if one exists. You should be able to answer “who changed what, when, why, and under which policy?” without digging through ad hoc logs. If auditors can’t reconstruct it, the process is not mature enough.

One practical pattern is to write an immutable event record to S3 and a queryable summary to DynamoDB. Then expose a dashboard for security operations to review remediation frequency and exceptions. That’s the same kind of evidence-first posture recommended in security-conscious vendor governance, where proving control is as important as having the control.

4) A TypeScript Lambda cookbook for common findings

Pattern 1: Public S3 bucket remediation

For a public S3 bucket finding, your Lambda can evaluate whether the bucket belongs to a known public distribution use case, then apply a safe default: block public access, remove public ACLs, and alert the owner. The function should refuse to act if the bucket is tagged for intentional public hosting or if the account is in a controlled exception list. In other words, the code should be opinionated but not reckless.

// Bounded set of finding types this service knows how to remediate.
type FindingType = 'S3_PUBLIC_READ' | 'CLOUDTRAIL_MISSING' | 'ECR_INSECURE_CONFIG';

// Normalized remediation request parsed from a Security Hub finding event.
interface RemediationRequest {
  findingType: FindingType;
  findingId: string;
  controlId: string;
  resourceArn: string;
  accountId: string;
  region: string;
  severity: 'LOW' | 'MEDIUM' | 'HIGH' | 'CRITICAL';
}

The remediation action should be idempotent: if the bucket already has public access blocked, the function should record a no-op rather than failing. That makes retries safe and reduces operational noise. In highly regulated environments, the difference between a failed fix and a safe no-op is substantial from an audit and reliability standpoint.
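A minimal sketch of that idempotent check-then-act behavior, with the AWS calls hidden behind an injected `S3Gateway` interface. The interface is a stand-in for the `@aws-sdk/client-s3` public-access-block calls, injected so the decision logic stays testable without AWS credentials:

```typescript
// Narrow gateway abstraction over the S3 public-access-block API
// (a hypothetical wrapper, not the SDK client itself).
interface S3Gateway {
  getPublicAccessBlock(bucket: string): Promise<boolean>; // true if fully blocked
  putPublicAccessBlock(bucket: string): Promise<void>;
}

type Disposition = 'REMEDIATED' | 'NO_OP';

// Idempotent remediation: if the bucket is already compliant, record a
// safe no-op instead of failing, so retries cannot cause harm or noise.
async function blockPublicAccess(s3: S3Gateway, bucket: string): Promise<Disposition> {
  if (await s3.getPublicAccessBlock(bucket)) {
    return 'NO_OP';
  }
  await s3.putPublicAccessBlock(bucket);
  return 'REMEDIATED';
}
```

In production, the gateway would wrap `GetPublicAccessBlock` and `PutPublicAccessBlock` calls; the no-op path is what makes at-least-once event delivery safe.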

Pattern 2: Missing CloudTrail bootstrap

CloudTrail remediation is a little more sensitive because the function may need to create a trail, a KMS key, and an encrypted log bucket. Start by verifying that the trail does not already exist and that the account is not under a special landing-zone policy that handles logging elsewhere. If the log archive account pattern is already in place, your function should only validate and alert rather than create duplicate trails.

For a truly safe implementation, split the logic into a planner and an executor. The planner calculates the intended changes and stores them; the executor applies only approved plans. That pattern helps you preserve an audit trail and gives security leadership a chance to review bootstrap actions. This is the same reason careful communication matters in release management and why organizations prefer structured, reviewable systems over impulse-driven changes.
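The planner/executor split can be sketched as a declarative plan object that the executor refuses to run without an approval field. The step names and shapes here are illustrative, not a CloudTrail API surface:

```typescript
// A declarative plan: what the remediation intends to change, stored for review.
interface PlanStep { action: string; target: string; }
interface Plan { findingId: string; steps: PlanStep[]; approvedBy?: string; }

// Planner: computes the intended CloudTrail bootstrap without touching AWS.
function planCloudTrailBootstrap(findingId: string, accountId: string): Plan {
  return {
    findingId,
    steps: [
      { action: 'CreateEncryptedLogBucket', target: `ct-logs-${accountId}` },
      { action: 'CreateTrail', target: `baseline-trail-${accountId}` },
      { action: 'EnableLogFileValidation', target: `baseline-trail-${accountId}` },
    ],
  };
}

// Executor: applies only plans that carry an approval; `run` performs one step.
function executePlan(plan: Plan, run: (s: PlanStep) => void): void {
  if (!plan.approvedBy) throw new Error(`plan ${plan.findingId} is not approved`);
  plan.steps.forEach(run);
}
```

Because the plan is plain data, it can be written to the audit store before execution and reviewed exactly as it will run.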

Pattern 3: Insecure ECR settings

For ECR findings, a common safe remediation is enabling image scanning, enforcing encryption at rest, and restricting overly broad repository policies. The function should check whether the repository is production-critical and whether policy updates would break CI workflows before making any change. If the action can disrupt a deployment pipeline, route it to approval first.

Because ECR settings can affect build and deploy flow, treat the remediation as part of CI/CD, not separate from it. The same pipeline that builds the Lambda can also validate that repositories remain compliant after changes. This aligns with the discipline behind policy-aware automation and the broader principles of controlled operational change.

Pattern 4: Logging and encryption defaults

Many findings are simply “logging off” or “encryption off.” These are usually good candidates for direct remediation because they strengthen posture without altering application logic. Turn on the missing control, confirm the resource accepts it, then record the exact API response for the audit log. The function should not assume success just because the API returned 200; it should validate the resulting state with a read-after-write check.

If you want to keep the remediation codebase maintainable, create shared adapters for common tasks like tagging, state verification, and exception handling. That keeps the playbook consistent and reduces the chance that one remediation path drifts from another. Strong software hygiene matters here because security automation ages quickly when it is built as a pile of scripts instead of a typed service.

5) CI/CD gating for safe automated remediation

Separate build, test, and promote stages

Security automation should be promoted like production software. Build the Lambda, run unit tests against mocked findings, run integration tests in a sandbox account, and only then deploy to the remediation account. Each stage should prove that the function is safe, deterministic, and idempotent. If a change cannot survive that pipeline, it should not touch live infrastructure.

One practical trick is to pin the policy pack or control mapping as versioned data. That way, any change in remediation behavior becomes visible in code review and release notes. This is the same philosophy behind resilient release engineering and operational stability guides like release QA checklists and well-managed distributed operations.

Approval gates and exception handling

Not every deployment should be automatic. Use approval gates for new remediation types, high-blast-radius actions, and changes that touch identity, network, or account-wide controls. You can also require a “break-glass” approval path for production accounts, even if lower environments are fully automatic. This prevents automation from becoming an uncontrolled privilege escalator.

A useful practice is to attach a remediation risk score to each action and require human approval above a threshold. That score can include control criticality, resource count, account sensitivity, and reversibility. It sounds formal, but it is exactly what auditors want: a clear policy that explains why some actions are automatic and others are not.
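A tiny sketch of such a score; the inputs, weights, and threshold below are assumptions chosen to illustrate the shape of the policy, not a standard formula:

```typescript
// Inputs to the remediation risk score (illustrative dimensions).
interface RiskInputs {
  controlCriticality: number; // 1 (low) to 5 (critical)
  resourceCount: number;      // resources the action would touch
  prodAccount: boolean;
  reversible: boolean;
}

// Weighted score: every weight here is an assumption to tune per organization.
function riskScore(r: RiskInputs): number {
  let score = r.controlCriticality * 10 + Math.min(r.resourceCount, 20);
  if (r.prodAccount) score += 25;
  if (!r.reversible) score += 30;
  return score;
}

const APPROVAL_THRESHOLD = 60; // example cutoff

// Actions at or above the threshold route to a human instead of auto-running.
function needsApproval(r: RiskInputs): boolean {
  return riskScore(r) >= APPROVAL_THRESHOLD;
}
```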

Testing strategy that actually catches bad fixes

Unit tests should verify policy decisions, finding parsing, and payload generation. Integration tests should run against disposable AWS resources and confirm the post-remediation state. End-to-end tests should simulate the Security Hub event, invoke the Lambda, and confirm the audit log record exists. If you only test the “happy path,” you are validating optimism, not correctness.

For teams that want to reduce manual friction without losing control, the strategy is similar to other forms of operational automation: use typed inputs, deterministic outputs, and explicit rollback logic. This mirrors the value of structured dashboards in other domains, like the way decision dashboards help operators act on data without drowning in it.

6) Audit trail design: prove every fix happened for a reason

What to log for each remediation

At minimum, log the finding ARN, security product, control ID, timestamp, execution role, function version, target resource, pre-state, post-state, and remediation disposition. If a human approved the action, include the approver identity and approval timestamp. If the function declined to act, log the reason code and policy rule that blocked it. This gives you a usable record for internal investigations and external audits alike.

Where possible, include structured JSON logs rather than free-form text. Structured logs can feed SIEM tools, dashboards, and incident workflows without a parsing layer. This is essential for an audit trail because “we think we fixed it” is not acceptable evidence.
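A sketch of such a structured record, serialized one JSON object per line so SIEMs and log queries can consume it directly; the field names are illustrative, not a Security Hub schema:

```typescript
// One audit record per remediation invocation (illustrative field names).
interface AuditRecord {
  findingArn: string;
  controlId: string;
  timestamp: string;       // ISO 8601
  executionRole: string;
  functionVersion: string;
  resourceArn: string;
  preState: string;        // summary of state before the action
  postState: string;       // summary after read-after-write verification
  disposition: 'REMEDIATED' | 'NO_OP' | 'DECLINED' | 'FAILED';
  declineReason?: string;  // policy rule that blocked the action, if declined
  approvedBy?: string;     // approver identity, if a human authorized it
}

// Emit one JSON object per line -- structured, queryable, parser-free.
function toLogLine(r: AuditRecord): string {
  return JSON.stringify(r);
}
```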

Immutability and retention

Send a copy of the log stream to an S3 bucket with object lock or equivalent immutability controls if your compliance regime requires it. Retain records according to policy, and make sure the retention period is long enough to cover audit cycles and incident reviews. The goal is not just to have logs, but to have logs you can trust after the fact.

In regulated settings, this is often where teams discover they need stronger governance. The same reason a company needs clear external communication around changes is why you need detailed remediation logs: when something is questioned later, the evidence must already exist. If you want a related operational mindset, look at how false-positive analysis emphasizes traceability and context.

Evidence bundles for compliance

For each control family, consider generating an evidence bundle that contains the finding before the fix, the action taken, the post-fix verification, and the policy decision that authorized it. That bundle can be attached to internal control testing and external compliance reviews. In practice, this can save hours of manual reconciliation every month.

This approach also makes it easier to demonstrate continuous compliance rather than point-in-time compliance. That distinction matters because Security Hub is continuously evaluating your environment, and your remediation system should be continuously proving that it can respond safely.

7) Reference implementation blueprint

Lambda handler shape

A production-grade handler should do five things: parse the event, normalize the finding, authorize the action, execute the remediation, and write the audit record. Keep each step isolated so the handler remains readable and testable. If possible, use a command pattern so each finding type maps to a dedicated remediation class. That makes it much easier to extend the system as your coverage grows.

Here is the practical shape: Security Hub event enters the handler; the handler looks up a policy record; if approved, it dispatches to a specific executor; the executor uses AWS SDK clients with scoped permissions; after success, it writes a structured log record and emits a notification. This design makes it easy to review, scale, and debug.
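The command-pattern dispatch can be sketched as a registry mapping each finding type to a dedicated executor behind one interface. The classes and registry here are illustrative scaffolding, not a prescribed layout:

```typescript
// Every remediation executor implements the same narrow contract.
interface Remediator {
  execute(resourceArn: string): Promise<string>; // returns a disposition summary
}

// One class per finding type keeps each playbook isolated and testable.
class PublicS3Block implements Remediator {
  async execute(arn: string): Promise<string> {
    // Real implementation would call the S3 public-access-block API here.
    return `blocked:${arn}`;
  }
}

// Registry: finding type -> executor. Adding coverage means adding an entry.
const REGISTRY = new Map<string, Remediator>([
  ['S3_PUBLIC_READ', new PublicS3Block()],
]);

// Dispatch fails loudly for unknown types rather than guessing a playbook.
async function dispatch(findingType: string, arn: string): Promise<string> {
  const r = REGISTRY.get(findingType);
  if (!r) throw new Error(`no remediator registered for ${findingType}`);
  return r.execute(arn);
}
```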

Policy engine and allowlist model

The policy engine should understand account type, environment, control ID, resource tags, and exception windows. For example, a developer account might allow some actions automatically, while a production account requires approval. Use explicit allowlists for exceptional resources rather than broad exclusions. That keeps exceptions visible and time-bounded.
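A time-bounded allowlist entry can be as simple as a resource ARN, a documented reason, and an expiry; this sketch assumes that shape:

```typescript
// An exception must name the resource, state why, and expire on its own.
interface AllowlistEntry {
  resourceArn: string;
  reason: string;
  expires: string; // ISO 8601; expired entries stop matching automatically
}

// A resource is excepted only while a matching, unexpired entry exists.
function isExcepted(arn: string, allowlist: AllowlistEntry[], now: Date): boolean {
  return allowlist.some(
    (e) => e.resourceArn === arn && new Date(e.expires) > now,
  );
}
```

Passing `now` explicitly keeps the check deterministic in tests and makes expiry behavior easy to verify in review.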

A practical analogy is the way buying decisions should be constrained by clear criteria rather than impulse. The same discipline appears in software evaluation and timing-sensitive decision playbooks: good systems are designed to reduce emotional or accidental decisions.

Rollback and verification

Whenever possible, record the previous state before changing it so you can roll back if needed. Some controls, like public access blocks, are easy to reverse; others, like trail creation, may need compensating actions rather than true rollback. After every remediation, verify the desired end state with a read operation and write the result to the audit record. If verification fails, mark the action as incomplete and alert a human.
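A sketch of that capture-apply-verify sequence, with the state read injected (the `StateReader` interface is a hypothetical stand-in for a service-specific describe call) so the control flow is testable:

```typescript
// Minimal abstraction over "read the current state of this resource".
interface StateReader {
  read(arn: string): Promise<string>;
}

// Capture pre-state, apply the change, then re-read and compare against the
// desired end state. Verification failure is surfaced, never assumed away.
async function applyWithVerification(
  arn: string,
  desired: string,
  reader: StateReader,
  apply: () => Promise<void>,
): Promise<{ preState: string; postState: string; verified: boolean }> {
  const preState = await reader.read(arn);   // recorded for rollback/audit
  await apply();
  const postState = await reader.read(arn);  // read-after-write check
  return { preState, postState, verified: postState === desired };
}
```

When `verified` is false, the caller should mark the action incomplete in the audit record and alert a human rather than reporting success.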

This final verification step is the difference between “we called the API” and “we actually fixed the issue.” In security work, that distinction matters a lot. It is also why a disciplined remediation service is more trustworthy than a shell script in a runbook.

8) Operational rollout: how to introduce this safely

Phase 1: observe-only mode

Start by listening to Security Hub findings and classifying them without changing anything. Use this phase to validate your parser, policy map, and alert routing. Observe-only mode will reveal false positives, missing tags, and account exceptions before you risk a live change. It also gives stakeholders confidence that the system understands the environment.

During this stage, report remediation recommendations and estimated confidence scores. That gives teams a preview of what automation will do later. You can think of it as the operational equivalent of a proof-of-concept before the production rollout.

Phase 2: auto-remediate only the safest controls

Enable automatic action for a very small set of controls with the lowest blast radius, such as public S3 blocking or logging enablement in non-production accounts. Keep the approval path for everything else. Measure how many findings are remediated, how often the function no-ops, and whether any false actions occurred. Metrics tell you whether your trust in the automation is justified.

At this stage, you should be reviewing the audit trail weekly. If the logs are noisy, incomplete, or hard to query, fix the logging system before expanding remediation coverage. The trustworthiness of the whole pattern depends on the quality of its evidence.

Phase 3: expand with policy guardrails

Once the low-risk controls are stable, expand to additional services and accounts. Add more nuanced logic for tags, ownership, environment, and maintenance windows. Use the lessons from your first remediations to improve policy precision, not to relax safeguards. Mature automation becomes safer over time because it learns where to stop, not because it acts more aggressively.

For teams that want to build a broader compliance automation capability, this pattern scales well into other controls and services. The same architecture can be extended to new AWS services as Security Hub findings evolve, especially as the AWS Foundational Security Best Practices standard continues to cover more operational surfaces.

9) Comparison table: remediation approaches and tradeoffs

| Approach | Speed | Auditability | Risk | Best use case |
| --- | --- | --- | --- | --- |
| Manual ticket only | Slow | Medium | Low operational risk, high exposure window | High-blast-radius changes |
| Alert + human approval | Moderate | High | Low to medium | Security-sensitive accounts |
| Automated Lambda remediation | Fast | High if logged well | Low to medium for deterministic fixes | Public S3, logging, encryption, baseline controls |
| Step Functions with approvals | Moderate | Very high | Lower than direct automation | Multi-step or cross-account workflows |
| Fully manual runbook | Slowest | Depends on process discipline | Low change risk, high drift risk | Rare or complex remediations |

The big takeaway is that automation is not the opposite of control. Done well, it is the mechanism that makes control visible, consistent, and fast enough to reduce exposure. The right model depends on the finding’s blast radius, the account’s sensitivity, and your organization’s appetite for auto-action.

10) FAQ: practical questions teams ask before going live

How do I decide which Security Hub findings to automate first?

Start with findings that are frequent, low-blast-radius, and deterministic. Public S3, missing logging, and encryption defaults are usually better first candidates than IAM or network changes. If a fix can be expressed as a small, idempotent function with reliable verification, it belongs near the top of the queue.

How do I keep remediation auditable for compliance?

Log the finding, policy decision, action taken, execution role, code version, and verification result for every run. Store immutable copies of records in a controlled log bucket and keep structured fields so auditors can query them later. If a human approved the change, include the approver identity and timestamp.

Should remediation Lambdas act in all accounts?

No. Production and regulated accounts often need stricter controls, approvals, and exception handling. Use account tiers and environment tags to define where automatic remediation is allowed and where it must stop for review.

What is the safest way to handle CloudTrail findings?

Use a planner/executor model and validate that the account does not already use a centralized logging architecture. If you need to create trails or buckets, do it only after confirming there isn’t an organizational logging control already in place. CloudTrail is important enough that duplicate or conflicting fixes can create new problems.

How do I test remediation without touching production?

Run the Lambda in observe-only mode first, then in a sandbox or disposable account with synthetic findings. Add integration tests that verify actual post-remediation state and use CI/CD gates before promoting any change. This protects you from both parsing errors and unsafe policy decisions.

What if the remediation breaks something?

Every playbook should include a rollback or compensating action whenever possible, plus a fast alert path to humans. Idempotency and verification greatly reduce the chance of a bad state, but no automation should be assumed perfect. The audit trail should make it clear what happened so responders can correct it quickly.

11) Conclusion: automation that security and auditors can both trust

Security Hub remediation works best when it behaves like a product, not a script. That means typed inputs, explicit policy decisions, safe defaults, repeatable deployment, and logs that can survive an audit. TypeScript Lambda functions are a strong foundation because they give you the structure you need without slowing you down. When paired with CI/CD gates and a clear exception model, they can turn common findings into fast, consistent fixes instead of recurring incidents.

Use the cookbook approach: begin with low-risk controls, build a strong audit trail, and expand only when the data says the automation is safe. If you need additional guidance on operational hardening and policy design, revisit our related material on operational security checklists, operations automation patterns, and build-vs-buy decision frameworks. The goal is simple: move from alert to fix without losing control, traceability, or trust.



Alex Mercer

Senior SEO Content Strategist

Senior editor and content strategist. Writing about technology, design, and the future of digital media. Follow along for deep dives into the industry's moving parts.
