Terraform test command use case and questions

francois.munger · November 29, 2023, 3:37pm

Hi,

I started exploring the terraform test command.

I feel there is a big potential use case here, even if it may not be the intended use.

After experimenting and reading, I wrote a test file with a single ‘run’ block and no asserts:

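run "check_apply_errors_before_real_apply" {
  # No assert blocks: the run passes as long as apply (the default
  # command for a run block) completes without errors.
}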

This test runs ‘terraform apply’ against an empty state and then tears everything down, so it doesn’t change the existing infrastructure. So far so good.

Now, is there a way to import an existing state into the test?

With Terraform, most of the errors we hit are not the kind that checks, validations, and asserts will find.
They are mostly integration errors that happen when we run terraform apply against our already-existing infrastructure, for example a conflict with the provider or with an existing resource.

It would be amazing to run a test that is just that: a terraform apply that does not change your existing infrastructure (as terraform test already does), but with our actual state imported into the test.

This way we would catch all those annoying errors that happen during terraform apply. And because we are running terraform test, no actual changes would be applied.

That would be a real game changer. It would then be easy to add a step with this test to a CI/CD pipeline and voilà! You know exactly what will happen in the terraform apply step without deploying anything, and you can stop the pipeline before the real apply if it would fail.

If we can already import an existing state into a test, please give me the recipe!

Cheers,
Francois

apparentlymart · November 29, 2023, 5:03pm

Hi @francois.munger!

The way that terraform test is able to do its work without affecting your existing infrastructure is by using an entirely separate state, as you’ve seen. If it used your “real” state then running tests would make changes to your real infrastructure, which I assume is not what you are hoping for.

There’s work currently underway to allow defining mocks for providers and overriding particular resource instances with hard-coded test data, which might get things closer to what you are hoping for, but I’m not sure… I am imagining that a test author could use the mocking and override features to create the effect of some existing infrastructure without that actually having to exist, and therefore the test can respond to that fictitious situation. Of course, that would only be helpful if the mock/override is realistic, and would not allow you to test that the infrastructure is really working afterwards, because there would be none to test.
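The mocking and override features mentioned here were released in Terraform v1.7. As a minimal sketch of how they might emulate "existing" infrastructure, assuming Terraform v1.7 or later, the AWS provider, and a hypothetical resource `aws_s3_bucket.existing` in the configuration under test, a test file could look like this:

```hcl
# Hypothetical file: tests/mocked.tftest.hcl (requires Terraform v1.7+).

# Replace the real AWS provider with a mock for this test file, so no
# real API calls are made and computed attributes get generated values.
mock_provider "aws" {}

# Pin attribute values for one resource so the test behaves as if a
# realistic object already existed.
# aws_s3_bucket.existing is a hypothetical resource address.
override_resource {
  target = aws_s3_bucket.existing
  values = {
    arn = "arn:aws:s3:::example-existing-bucket"
  }
}

run "apply_against_mocked_infrastructure" {
  # command defaults to apply; with the mocked provider, nothing real
  # is created or destroyed.
}
```

This exercises Terraform's own logic against the values supplied in the test. It cannot catch errors that only the real remote API would report, and it is only as useful as the overridden values are realistic.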

francois.munger · November 29, 2023, 6:45pm

Thanks for the quick reply,

As you said, we do not want to make changes to our real infrastructure. Mocking and overriding might work for our use case, but that would take some effort to set up. We will take a look at this feature when it comes out.

Terraform test uses an entirely separate, empty state, so running a test leaves our existing infrastructure untouched. So, when running the test, would it be possible to copy our state into this separate state? The test would still run in its own separate state, but this time that state would contain the things that have already been deployed. Another way to see it: think of it as a big data.tf file that is loaded before the test runs.

After the test, it tears down as usual and discards its separate state, without changing our real infrastructure.

Our state is unharmed because we told terraform test beforehand to copy our state into its own separate state.

For the use case of testing that terraform apply will not trigger errors during deployment, I think this would make sense.

Do you think such a feature could be implemented ?
I’m just curious…

apparentlymart · November 29, 2023, 8:20pm

Hi @francois.munger,

I think the part I’m not following is that the Terraform state is largely just a cache of some information whose source of record is the remote system, and so copying just the Terraform state, without somehow also cloning the real infrastructure objects it is describing, wouldn’t really be significantly different than working directly with the original state.

The only real difference would be that the “real” state would not be updated to match the changes to the real system that were made by the test. From Terraform’s perspective it would seem like those objects were changed outside of Terraform, although in practice it would of course be a change from a different part of Terraform in this case.

The only way to ensure that the test run can be isolated from real infrastructure is for the test run to create its own infrastructure to work against, which is how it is currently designed.

I suspect I’m just misunderstanding what you are proposing though, so hopefully what I’ve written above will help you understand what I’m imagining when I read your proposal, and that’ll help to clarify my mistake.

francois.munger · November 30, 2023, 12:34pm

Hi @apparentlymart,
I understand your point, and I think you understood mine well too. I just wanted to know if there is a way to see the outcome of the terraform apply command (apply complete, with or without errors) in an environment that replicates our existing infrastructure, before running terraform apply on the real infrastructure.

“Copying just the Terraform state without somehow also cloning the real infrastructure objects it is describing wouldn’t really be significantly different than working directly with the original state.”

I agree with you here. Then would it be possible to somehow convert all the resources in our existing state into a sort of big data.tf file (you know, the data blocks we put in a data.tf file to tell Terraform that something is an existing resource not managed by Terraform), and import that into the test environment? Then, when terraform apply runs in this test environment, it would deploy onto an empty state but know which resources are already deployed: all the resources of our existing infrastructure, thanks to the imported data. I think the result of this apply would be as close as it can get to the real thing.

That’s why I thought of terraform test. As you said, the command creates real infrastructure, but it destroys it afterwards if I’m not mistaken.
So if we could tell Terraform about the existing infrastructure (using data sources, for example) and run the test, it would apply the changes on top of the existing infrastructure and then destroy what it created, leaving our infrastructure untouched. You would have a test that tells you pretty much what will happen when you run terraform apply for real.

Why do I want to do this? Because nothing is more annoying and stressful than a Terraform deployment in production that stops midway through apply because of some conflict or error.

Cheers
Francois
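For reference, the data-source pattern Francois describes looks roughly like the sketch below. It assumes the AWS provider, and the names, VPC ID, and CIDR range are hypothetical. It shows the existing data-source mechanism only, not the proposed state-to-data conversion, which is not a terraform test feature. A data block only reads an existing object, so inside a test run it would look up the real VPC, while the managed `aws_subnet` would still be created by the run and destroyed afterwards.

```hcl
# Hypothetical data.tf: read an existing VPC that this configuration
# does not manage. Data sources never create, update, or delete objects.
data "aws_vpc" "shared" {
  id = "vpc-0123456789abcdef0" # hypothetical VPC ID
}

# A managed resource that depends on the existing object.
resource "aws_subnet" "app" {
  vpc_id     = data.aws_vpc.shared.id
  cidr_block = "10.0.1.0/24" # hypothetical CIDR range
}
```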

apparentlymart · November 30, 2023, 4:13pm

Hi @francois.munger,

Thanks for the additional context!

A problem I can see in what you described is the assumption that it would always be possible to “undo” whatever actions the test scenarios described.

While it’s typically possible to “undo” a create action by deleting what was created, many other actions don’t have a means to revert them. For example:

- If something gets deleted, "undoing" that typically means creating a new object similar to the original one, rather than restoring the original object. Whether that’s an acceptable "undo" is debatable.
- Even some in-place updates are irreversible. For example, on some platforms it’s possible to make a disk larger but not smaller, or to upgrade to a newer version but not back to an older one.

I agree with you that it’s annoying when something fails during apply even though the plan succeeded. Unfortunately, I don’t think there’s any universal way to avoid that, because the underlying APIs that Terraform providers wrap often provide no way to know whether something will succeed without actually trying it. Provider developers try to reimplement the validation rules they know about in the provider to avoid this, but there are too many variations to be comprehensive, and some situations cannot be validated client-side because they rely on information that a client cannot obtain efficiently.

However, I don’t think that the test framework offers any improvement in that area: the same underlying problems still exist. If it were always possible to rehearse an action without affecting underlying infrastructure then the main Terraform plan operation would do that, without any need for the test system.

I think the best we can do with terraform test is to try to emulate the existing system as closely as possible, either as a separate set of real infrastructure or as a set of mocks, and then use that to reduce (but not eliminate) the risk of unexpected problems during apply.

Combining that with continued investment by provider developers in producing the most accurate plans possible (including efforts to detect situations that would fail during apply) is, I think, the best we can reasonably do with the current state of the art in infrastructure API design.

(If APIs were designed to expose all of the information required to determine if an operation can succeed without actually running that action then this problem would be less significant, but even that significant improvement cannot be totally comprehensive: the remote system might start having an outage between plan and apply, or something outside of Terraform might change the remote system in a way that invalidated what was planned. But we can still aim to gradually get better.)
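The "separate set of real infrastructure" approach described above could be sketched as a test file that first checks that planning succeeds and then applies into an isolated environment. This assumes the root module declares a hypothetical `environment` input variable that controls naming, so that test objects don't collide with production ones.

```hcl
# Hypothetical file: tests/rehearsal.tftest.hcl

# File-level variables apply to every run block below.
# "environment" is a hypothetical input variable of the root module.
variables {
  environment = "test-rehearsal"
}

# Step 1: plan only; nothing is created.
run "plan_succeeds" {
  command = plan
}

# Step 2: apply for real into the isolated environment. Objects created
# here are destroyed when terraform test finishes with this file.
run "apply_succeeds" {
  command = apply
}
```

Run blocks execute in order, so a plan-time failure surfaces before anything is created. Even so, the rehearsal is only as representative as the environment it builds.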

francois.munger · November 30, 2023, 4:41pm

Thank you @apparentlymart for your clear explanations, I appreciate it a lot.

Have a nice day,
Francois