Skip to content

Handle async deletion for lambda managed instances#13751

Open
anisaoshafi wants to merge 2 commits intomainfrom
drg-157
Open

Handle async deletion for lambda managed instances#13751
anisaoshafi wants to merge 2 commits intomainfrom
drg-157

Conversation

@anisaoshafi
Copy link
Contributor

@anisaoshafi anisaoshafi commented Feb 12, 2026

Motivation

Support async deletion for lambda managed instances functions.
The response when you delete a lambda managed instances function is 202, which means it's handled async.
The function can't be invoked anymore, but it can be retrieved.

Changes

If function is lambda managed instance type:

  • Add a small delay before actually deleting the lambda function from the store.
  • Update the function state to Deleting and lastupdate status to InProgress while the function is being deleted.

Tests

Related

Part of DRG-157

@anisaoshafi anisaoshafi added aws:lambda AWS Lambda semver: patch Non-breaking changes which can be included in patch releases docs: skip Pull request does not require documentation changes notes: skip Pull request does not have to be mentioned in the release notes labels Feb 12, 2026
@anisaoshafi anisaoshafi added this to the 4.14 milestone Feb 12, 2026
@github-actions
Copy link

github-actions bot commented Feb 12, 2026

Test Results - Preflight, Unit

23 111 tests  ±0   21 249 ✅ ±0   6m 16s ⏱️ -11s
     1 suites ±0    1 862 💤 ±0 
     1 files   ±0        0 ❌ ±0 

Results for commit 6bda2af. ± Comparison against base commit 58d61f3.

♻️ This comment has been updated with latest results.

@anisaoshafi anisaoshafi requested a review from joe4dev February 12, 2026 17:13
@github-actions
Copy link

github-actions bot commented Feb 12, 2026

Test Results (amd64) - Acceptance

7 tests  ±0   5 ✅ ±0   3m 5s ⏱️ +7s
1 suites ±0   2 💤 ±0 
1 files   ±0   0 ❌ ±0 

Results for commit 6bda2af. ± Comparison against base commit 58d61f3.

♻️ This comment has been updated with latest results.

@anisaoshafi anisaoshafi marked this pull request as ready for review February 12, 2026 17:38
@anisaoshafi anisaoshafi removed the request for review from dominikschubert February 12, 2026 17:38
@github-actions
Copy link

github-actions bot commented Feb 12, 2026

Test Results (amd64) - Integration, Bootstrap

    5 files      5 suites   2h 0m 3s ⏱️
3 803 tests 3 565 ✅ 238 💤 0 ❌
3 809 runs  3 565 ✅ 244 💤 0 ❌

Results for commit 6bda2af.

♻️ This comment has been updated with latest results.

@github-actions
Copy link

github-actions bot commented Feb 12, 2026

LocalStack Community integration with Pro

    2 files  ±    0      2 suites  ±0   1h 33m 37s ⏱️ - 31m 4s
3 779 tests  - 1 435  3 537 ✅  - 1 325  242 💤  - 110  0 ❌ ±0 
3 781 runs   - 1 435  3 537 ✅  - 1 325  244 💤  - 110  0 ❌ ±0 

Results for commit 6bda2af. ± Comparison against base commit 58d61f3.

This pull request removes 1436 and adds 1 tests. Note that renamed tests count towards both.
tests.aws.scenario.bookstore.test_bookstore.TestBookstoreApplication ‑ test_lambda_dynamodb
tests.aws.scenario.bookstore.test_bookstore.TestBookstoreApplication ‑ test_opensearch_crud
tests.aws.scenario.bookstore.test_bookstore.TestBookstoreApplication ‑ test_search_books
tests.aws.scenario.bookstore.test_bookstore.TestBookstoreApplication ‑ test_setup
tests.aws.scenario.kinesis_firehose.test_kinesis_firehose.TestKinesisFirehoseScenario ‑ test_kinesis_firehose_s3
tests.aws.scenario.lambda_destination.test_lambda_destination_scenario.TestLambdaDestinationScenario ‑ test_destination_sns
tests.aws.scenario.lambda_destination.test_lambda_destination_scenario.TestLambdaDestinationScenario ‑ test_infra
tests.aws.scenario.loan_broker.test_loan_broker.TestLoanBrokerScenario ‑ test_prefill_dynamodb_table
tests.aws.scenario.loan_broker.test_loan_broker.TestLoanBrokerScenario ‑ test_stepfunctions_input_recipient_list[step_function_input0-SUCCEEDED]
tests.aws.scenario.loan_broker.test_loan_broker.TestLoanBrokerScenario ‑ test_stepfunctions_input_recipient_list[step_function_input1-SUCCEEDED]
…
tests.aws.services.apigateway.test_apigateway_api.TestApigatewayIntegration ‑ test_put_integration_response_templates

♻️ This comment has been updated with latest results.

@anisaoshafi
Copy link
Contributor Author

anisaoshafi commented Feb 12, 2026

Failing tests (in APIGW) are unrelated and being handled here: #13752

Copy link
Member

@joe4dev joe4dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to fix some breaking changes and race conditions before merging this PR. See PR comments for details and suggestions.

⚠️ Unfortunately, we have very limited test coverage for lambda.DeleteFunction (see TODOs in the lambda provider), so we need to be cautious not to break things, and cannot rely on CI to catch

self, function: Function, version: FunctionVersion, qualifier: str
):
def _cleanup(function: Function, qualifier: str):
time.sleep(0.1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs: Can we clearly document the motivation for using a sleep here?
For example: Simulate async function version cleanup

Could we clarify the async nature in the function name? (e.g., delete_function_async suffix)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

self, function: Function, version: FunctionVersion, qualifier: str
):
def _cleanup(function: Function, qualifier: str):
time.sleep(0.1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

race condition: What's an appropriate sleep time to mitigate flakes in CI (if 2xget_function in the test takes longer than this)?

I wonder whether we should bump this slightly (e.g., 0.5) to reduce the likelihood of flakiness, given that 0.01 consistently fails and 10x deviation in CI could happen 🤷

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought 0.1s was okay, but with a longer sleep we're safer, will bump it to 0.5 just in case CI operations as slower.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Increased it to 0.5 ✔️

# TODO: introduce locking for safe deletion: We could create a new version at the API layer before
# the old version gets cleaned up in the internal lambda service.
function = store.functions.pop(function_name)
function = store.functions.get(function_name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blocking: This function removal is not properly mapped to LMI and introduces a couple of problems:

  1. Why do we map this function-level removal to a loop where we remove the function multiple times for each existing function version?
  2. Why do we switch the order of removing the function from the store and stopping the service? This leads to a race condition in which the function is visible at the API level while being destroyed!
  3. (minor) Why do we delete each function version individually if we can remove the top-level function once? The top-level function contains versions.

# published versions are invokable.
if version.config.capacity_provider_config:
function_has_capacity_provider = True
if version.id.qualifier == "$LATEST":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blocking: This useless condition indicates that we're missing something related to the delete lifecycle

The current implementation does NOT stop any Lambda containers spawned by Lambda Managed Instances because we don't invoke stop_version. We can test this by setting a breakpoint at the end of the testcase and validating that all Lambda containers are gone. The following code fixes this:

# Stop all function versions except $LATEST because only published versions are invokable with
# Lambda Managed Instances, and we do NOT want to create a version manager for $LATEST.
if version.id.qualifier != "$LATEST":
    self.lambda_service.stop_version(qualified_arn=version.id.qualified_arn())

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked that containers were removed in my local implementation, but then I went ahead refactoring the code in local a few times and forgot to recheck if any docker containers left and now I see that they're not cleaned up properly.
Thanks for flagging this, will fix it 🤞🏼

else:
function.versions.pop(qualifier, None)
self.lambda_service.stop_version(version.id.qualified_arn())
destroy_code_if_not_used(code=version.config.code, function=function)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Does destroy_code_if_not_used work with simulated async store deletion? The same question applies to version.config.code.destroy() below.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

aws:lambda AWS Lambda docs: skip Pull request does not require documentation changes notes: skip Pull request does not have to be mentioned in the release notes semver: patch Non-breaking changes which can be included in patch releases

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants