ACM.113 Example showing how swallowing errors can come back to bite
This is a continuation of my series of posts on Automating Cybersecurity Metrics.
I wrote about error handling in my series on secure programming. I explained that it’s not a good idea to swallow errors, or in other words, catch them in some code and never report them in any output from the application.
This post shows that CloudFormation is doing that in at least one case and how it causes problems. I wasn’t planning to write this post, but I had to spend time working around the problem when I discovered it, so here you go.
When swallowing errors to ignore one type of error affects all errors
With regard to my delete script, I thought that I had not followed my own rules because I was in a hurry and being a tad lazy. Instead of checking to see if a CloudFormation stack exists before I delete it, I thought I had simply ignored errors where it didn’t exist because it was already deleted. I initially thought that was the problem. As it turns out, this wasn’t my code after all.
I was trying to delete the first stack in my list, an EC2 instance, because I want to test my delete and redeploy scripts. I wrote about how I developed that deletion script and dealt with various dependencies in an earlier post.
Well, now I have a new dependency on the EC2 stack. I knew I had some new resources and dependency issues. That’s why I’m going back to test my scripts. Rather than try to insert all the new dependencies up front, I figured I would test and resolve any errors along the way.
The only thing is, I didn’t get any errors in the CLI output after running the script, but when I checked to see that the first stack in the list got deleted, it still existed. There was also no error message in the CloudFormation stack list, so initially I didn’t realize what was happening.
CloudFormation often warns you when things can’t be deleted and leaves things in an ugly error state, which I don’t like. I wrote about that recently in my post on how to fix CloudFormation.
In this case, CloudFormation didn’t leave the stack in an error state, so I didn’t realize that an error had occurred. I clicked on the stack and realized that although the stack didn’t report an error or end up in an error status, the stack event list included an error message. The stack couldn’t be deleted due to an EIP association I added since the last time I tested deleting this stack.
I used the EIP association to avoid having to change firewall rules when I delete stacks. For right now I will not delete the EIP, only the association, so I don’t have to delete the local firewall rules I created in an earlier post.
I added a comment in the script indicating that this still needs to be done.
When swallowing errors comes back to haunt you
Here’s where my error is causing me grief. Initially I thought I was swallowing all errors associated with the aws cloudformation delete-stack command to ignore errors when a stack doesn’t exist. That would have caused a result similar to what I’m facing.
Rather than swallowing errors, I need to adjust the code to do what a good engineer should do. Shame on me. I need to check to see if the stack exists before I attempt to run the delete command and not be lazy.
However, upon further investigation, I’m not the one swallowing the error. I had already removed my code that swallowed that error and was properly reporting the results to the screen. So I was confused. Why did I not get an error when the stack didn’t delete?
When I run my delete stacks script I print out the commands so I can easily re-run them. Here’s the command I run to delete my EC2 stack:
aws cloudformation delete-stack --stack-name AppDeploy-EC2-Developer --profile delete
When I run that command in a terminal, I get no indication that the deletion failed. Aha. AWS is actually swallowing the error and not reporting it back on the command line. My code has no way to know an error has occurred. It looks like everything was successful. How is my script supposed to know that the EC2 instance never got deleted?
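A minimal sketch of why a script cannot see the failure, assuming that delete-stack only submits the deletion request and returns before CloudFormation has finished working on it. The helper name show_delete_exit_code is hypothetical; the stack name and profile are the ones from the command above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: run delete-stack and print the exit code the CLI returns.
# Assumption: delete-stack returns once the request is accepted, so a failure
# that happens later (such as a blocking EIP association) does not show up here.
show_delete_exit_code() {
  local stack_name="$1"
  local profile="$2"
  local rc=0

  aws cloudformation delete-stack \
    --stack-name "$stack_name" --profile "$profile" || rc=$?

  echo "delete-stack exit code: $rc"
}

# Usage (requires AWS credentials):
# show_delete_exit_code AppDeploy-EC2-Developer delete
```

If the exit code printed is 0 while the stack is still listed afterwards, the script has nothing to react to, which is the situation described above.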
This isn’t good. An AWS customer may be running a list of delete commands for stacks, and things end up in a weird state because their script continues to delete things after an error has occurred when it should have stopped.
DO NOT SWALLOW ERRORS IN PRODUCTION ENVIRONMENTS.
This is what the error looks like in the AWS console. It isn’t even reported as an error; it’s reported as “UPDATE_COMPLETE”, which isn’t accurate.
Perhaps AWS could instead leave a stack in a valid state but have a flag that there was an error on the last run or something like that. Because, yes, I don’t really like stacks hanging around in an error state that don’t really have errors. A change rolled back and I just decided to leave the stack as is.
But not reporting any error when an error occurred causes problems for downstream actions taken by a single script. The script continues when it should stop and report an error. The resources that follow don’t delete properly because a prior dependency didn’t get deleted. Because that dependency still exists, subsequent stacks fail to delete.
How can I resolve this problem?
Luckily, I had safety checks built into my delete script. That means that I can step through and check each stack delete properly before proceeding to the next one. I wrote about that in the above post on deleting resources.
Whenever I’m testing that my script properly deletes new resources, the script gives the option to verify every single call to delete a stack. I used that option and paused the script after the first stack to make sure it worked before proceeding. That’s when I determined the resources in the first stack had not deleted correctly even though I got no error.
That’s helpful but it doesn’t really resolve the problem. I want to know that the error occurred and have the script exit.
We can’t do that directly. Recall that neither the stack status nor the events indicate an error state. The only trace is the reason text in the event list, and it would be pretty hokey to parse that out and try to decipher it. What other options do we have?
We can instead run a query to see if the stack still exists after calling the delete function and throw our own error based on the fact that the stack still exists when it shouldn’t.
It’s unfortunate that we have to do this extra work, all due to a swallowed error message. I hope this illustrates why you shouldn’t swallow errors. This whole blog post could have been avoided, and it took some time to sort this out.
We could use the AWS CLI function that waits for the stack to get into a completed deletion state.
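The function in question is presumably the AWS CLI waiter aws cloudformation wait stack-delete-complete; that is an assumption based on the description, since the original screenshot is not available here. A minimal sketch of how a script could wrap it, with wait_for_delete as a hypothetical name:

```shell
#!/usr/bin/env bash

# Hypothetical wrapper around the CLI waiter. The waiter polls the stack and
# returns non-zero if the stack has not reached a deleted state when it gives up.
wait_for_delete() {
  local stack_name="$1"
  local profile="$2"

  if aws cloudformation wait stack-delete-complete \
       --stack-name "$stack_name" --profile "$profile"; then
    echo "deleted"
  else
    echo "not deleted"
    return 1
  fi
}

# Usage (requires AWS credentials):
# wait_for_delete AppDeploy-EC2-Developer delete
```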
What I don’t like about this option is that it’s going to check up to 120 times! I know in this case the stack delete completes in about 2 seconds and then the stack exists in an UPDATE_COMPLETE status. So it seems like it’s going to waste my life away doing unnecessary checks in this particular case.
The function that waits for a stack to exist has the converse problem.
If I’m deleting a stack and the deletion is successful, and I sit around waiting for the stack to exist, that is also going to waste likely even more minutes or hours of my life.
All we need to do is run one query to see if the stack exists: yes or no. If it still does after running our delete script, we have an error. I can use cloudformation describe-stacks.
What I can do is capture the error message that is returned when the stack doesn’t exist in a variable.
If the stack exists, the command returns the standard describe-stacks response.
Now I can write a function to check if a stack exists and return false if the results contain the string “does not exist.” Otherwise I’ll return true.
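A minimal sketch of that first idea, not the script from this series. The function name stack_exists_simple is hypothetical, and it assumes the CLI error for a missing stack contains the text “does not exist”.

```shell
#!/usr/bin/env bash

# Hypothetical first version: one describe-stacks call, with stderr redirected
# into the variable so the error text can be inspected.
stack_exists_simple() {
  local stack_name="$1"
  local result

  # "|| true" keeps a script running under "set -e" from stopping
  # when the CLI returns an error for a missing stack.
  result="$(aws cloudformation describe-stacks \
    --stack-name "$stack_name" 2>&1)" || true

  if [[ "$result" == *"does not exist"* ]]; then
    echo "false"
  else
    echo "true"
  fi
}

# Usage (requires AWS credentials):
# stack_exists_simple AppDeploy-EC2-Developer
```

Note that in this sketch any other CLI error, such as expired credentials, is also reported as true, because only the “does not exist” text is checked.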
What I actually did after testing further is create two functions.
get_stack_status
Note that in this function, I’m swallowing an error to capture it in a variable, but I’m not ignoring it. I’m using it to provide an appropriate status, “NOEXIST”, that I can use in subsequent logic.
Also, I am adding || true at the end of that statement so my script doesn’t get into an error state and stop when the AWS CLI throws an error because the stack doesn’t exist.
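The original function body is not reproduced here, so the following is only a sketch of what get_stack_status could look like based on the description above. The NOEXIST value and the || true come from the text; the rest is an assumption.

```shell
#!/usr/bin/env bash

# Sketch of get_stack_status: prints the CloudFormation status of a stack,
# or NOEXIST when describe-stacks reports that the stack does not exist.
get_stack_status() {
  local stack_name="$1"
  local status

  # Capture stdout and stderr. "|| true" prevents "set -e" from ending the
  # script when the CLI returns an error for a missing stack.
  status="$(aws cloudformation describe-stacks \
    --stack-name "$stack_name" \
    --query 'Stacks[0].StackStatus' \
    --output text 2>&1)" || true

  # The error is captured, not ignored: it is translated into a status value.
  if [[ "$status" == *"does not exist"* ]]; then
    status="NOEXIST"
  fi

  echo "$status"
}

# Usage (requires AWS credentials):
# get_stack_status AppDeploy-EC2-Developer
```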
stack_exists
In this function I use get_stack_status to get the CloudFormation status, which will be NOEXIST if the stack doesn’t exist. If the stack doesn’t exist I return false.
The other thing this function does is check for the DELETE_IN_PROGRESS state. The deletion action doesn’t wait for the deletion of a stack to complete before returning control to the calling application. As a result, a check made right away would report that the stack still exists even though it is in the process of being deleted, and the calling function would exit the program. We need to make sure the deletion is complete before reporting whether or not the stack exists.
We can query the state of a stack like this:
aws cloudformation describe-stacks --stack-name AppDeploy-EC2-Developer --query 'Stacks[0].StackStatus' --output text
We can run a loop as long as the stack remains in the DELETE_IN_PROGRESS state.
Then, once the stack is no longer in the DELETE_IN_PROGRESS state, we can check to see if the stack still exists and return true if it does.
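Put together, the loop and the final check might look like the sketch below. This is an illustration under assumptions, not the original code: POLL_SECONDS is a hypothetical setting, and get_stack_status is repeated in a compact form so the block stands on its own.

```shell
#!/usr/bin/env bash

# Seconds between status checks (hypothetical setting, not from the original).
POLL_SECONDS="${POLL_SECONDS:-5}"

# Prints the stack status, or NOEXIST if the stack does not exist.
get_stack_status() {
  local stack_name="$1"
  local status
  status="$(aws cloudformation describe-stacks \
    --stack-name "$stack_name" \
    --query 'Stacks[0].StackStatus' \
    --output text 2>&1)" || true
  if [[ "$status" == *"does not exist"* ]]; then
    status="NOEXIST"
  fi
  echo "$status"
}

# Prints "true" if the stack exists once any in-progress deletion has
# finished, otherwise "false".
stack_exists() {
  local stack_name="$1"
  local status

  status="$(get_stack_status "$stack_name")"

  # delete-stack returns before the deletion finishes, so wait it out.
  while [[ "$status" == "DELETE_IN_PROGRESS" ]]; do
    sleep "$POLL_SECONDS"
    status="$(get_stack_status "$stack_name")"
  done

  if [[ "$status" == "NOEXIST" ]]; then
    echo "false"
  else
    echo "true"
  fi
}
```

Any status other than DELETE_IN_PROGRESS or NOEXIST counts as the stack still existing, so the check does not depend on one particular leftover status.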
I’m not sure if this code works in all cases. I wonder about the stack status when the stack contains multiple resources. Most of my stacks don’t. We’ll deal with that problem if and when it happens.
I can use that function in two ways to improve my code.
- First, I’ll skip attempting to delete a stack that doesn’t exist. That should save time waiting on AWS status reports from CloudFormation.
- Second, I can report an error when stack deletion fails even though the stack status indicates success, which resolves the problem that inspired this post.
Testing the script shows it works, and now the script reports whether a stack doesn’t exist or whether it was deleted.
Now there’s one more thing we need to do. If the stack was not deleted we need to exit. If the stack still exists after attempting to delete it, there’s a problem that needs to be resolved.
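A sketch of how the delete step could tie these pieces together: skip stacks that don’t exist, print and run the delete command, then exit if the stack is still there. The delete_stack wrapper, the messages, and POLL_SECONDS are hypothetical; the helper functions are repeated in compact form so the block is self-contained.

```shell
#!/usr/bin/env bash

POLL_SECONDS="${POLL_SECONDS:-5}"  # hypothetical polling interval

# Prints the stack status, or NOEXIST if the stack does not exist.
get_stack_status() {
  local status
  status="$(aws cloudformation describe-stacks --stack-name "$1" \
    --query 'Stacks[0].StackStatus' --output text 2>&1)" || true
  [[ "$status" == *"does not exist"* ]] && status="NOEXIST"
  echo "$status"
}

# Prints "true" or "false" after waiting out any in-progress deletion.
stack_exists() {
  local status
  status="$(get_stack_status "$1")"
  while [[ "$status" == "DELETE_IN_PROGRESS" ]]; do
    sleep "$POLL_SECONDS"
    status="$(get_stack_status "$1")"
  done
  if [[ "$status" == "NOEXIST" ]]; then echo "false"; else echo "true"; fi
}

# Hypothetical wrapper: skip missing stacks, delete, verify, exit on failure.
delete_stack() {
  local stack_name="$1"
  local profile="$2"

  if [[ "$(stack_exists "$stack_name")" == "false" ]]; then
    echo "Stack $stack_name does not exist; skipping."
    return 0
  fi

  # Print the command so it can be re-run by hand, then run it.
  echo "aws cloudformation delete-stack --stack-name $stack_name --profile $profile"
  aws cloudformation delete-stack --stack-name "$stack_name" --profile "$profile"

  # delete-stack reported nothing, so verify the result ourselves.
  if [[ "$(stack_exists "$stack_name")" == "true" ]]; then
    echo "Error: stack $stack_name still exists after delete-stack." >&2
    exit 1
  fi

  echo "Stack $stack_name deleted."
}

# Usage (requires AWS credentials):
# delete_stack AppDeploy-EC2-Developer delete
```

The status queries here run without --profile, mirroring the describe-stacks command shown earlier; add the profile to those calls if the default credentials point at a different account.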
Now I can test my entire delete script and make sure it works.
I got another error where a stack was not deleted and my script properly exited.
In this case the stack status was CREATE_COMPLETE, not UPDATE_COMPLETE, so our method of checking for deletion catches cases that would not have been caught by checking for UPDATE_COMPLETE.
Don’t swallow errors
All that extra code and the time spent writing this post were due to the fact that an error occurred without an error state being reported. It seems like AWS could save us some time and make sure to report the error properly so our code can take proper action as a result.
Don't swallow errors.
Especially when other people are relying on your code.
Now on to what I wanted to be doing … deploying a user-specific EC2 instance on AWS.
Teri Radichel
© 2nd Sight Lab 2022