
Barriers of Validation

Hello once again Dream.In.Coders! Today I want to talk to you about a design decision that involves validating data. Oftentimes newbies build error-prone code. This is because they are just learning and lack the experience of proper validation... and lack the experience of being smacked in the face when the code fails on the input of "a" when an integer is expected. As we become more experienced, we learn about validating incoming data. Then we often take it to the extreme, validating our data in every routine and every class method. Sometimes we do this to the point where our code becomes bloated with endless validation routines or slows to a crawl while we check that "a" is a number, then that it is between -1 and 37, then that it is prime, then that it is not 5, and so on. How do we strike a good balance? I will talk about creating "validation barriers" in your code and how we can segregate code that needs validation from the routines and classes that do not. All this on another great episode of the Programming Underground!

<Taylor Swift theme music... only she is doing it through tar. Not so swift now is she?>

Many experts may argue that validation should be done on a function-by-function basis. The reasoning behind this is portability (and extreme paranoia). If you move the function to another project, it can protect itself. Fantastic idea, but do you want to always have to worry about validating input, or would you rather worry about getting the function right? Do I really want to worry about whether num1 and num2 are valid in a simple add(num1, num2) function? I think I would rather worry about making sure add returns the correct result when I give it two integers. Wouldn't you?

Setting up barriers within a project can protect classes or functions from this validation chaos. Think of it as setting up a clean room: a section of the program where functions don't have to worry about the incoming data being incorrect, because it has already passed through validation and they can assume the input is going to be correct. On one side of this layer are the functions that take in user input or input from external "dirty" sources like a file or a database connection. On the other side are the functions that work with data they know is valid and can freely operate on.

One of these barriers can be a set of validation functions or classes that sit between the input functions/classes and the "pure" functions, validating the data before passing it on. If the data is bad, the barrier can reject it, or it can simply scrub it and turn the bad data into good clean data. For instance, assume we ask the user for a letter. They enter -1. Obviously wrong. The validation routine would take in this data, see that it is not a letter, and perhaps set up a default value like 'a', which it then passes on to the functions that deal with this character. Or perhaps it sees the -1 and sends back a message to the user saying "Hey dummy, I asked for a letter not an integer! Try again!".
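To make the letter example concrete, here is a minimal sketch in Python. The names require_letter, letter_or_default, next_letter and ValidationError are all made up for illustration; the point is only to show one barrier that rejects, one that scrubs, and a "pure" function behind them that does no checking at all.

```python
import string


class ValidationError(ValueError):
    """Raised by the barrier when input cannot be accepted (hypothetical name)."""


def require_letter(raw: str) -> str:
    """Rejecting barrier: return a single lowercase letter or raise."""
    text = raw.strip()
    if len(text) == 1 and text in string.ascii_letters:
        return text.lower()
    raise ValidationError(f"I asked for a letter, got {raw!r}. Try again!")


def letter_or_default(raw: str, default: str = "a") -> str:
    """Scrubbing barrier: replace bad input with a default instead of rejecting."""
    try:
        return require_letter(raw)
    except ValidationError:
        return default


def next_letter(letter: str) -> str:
    """'Pure' function in the safe zone: assumes a valid lowercase letter."""
    index = string.ascii_lowercase.index(letter)
    return string.ascii_lowercase[(index + 1) % 26]


# Dirty input goes through the barrier once; everything after it is trusted.
clean = letter_or_default("-1")
print(next_letter(clean))
```

Which of the two barriers you call is a design choice: scrubbing keeps the program moving, while rejecting pushes the problem back to whoever supplied the bad data.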

Below is a graphic to illustrate two kinds of barriers. The first one is the barrier of classes/functions which separate the input classes from the "safe zone" of pure functions/classes. You will also notice that one of the classes has a second barrier on it. This is the interface for the class which will validate data coming into the class. In other words, the public properties and methods which ensure that the class is always set into a valid state. This is a class barrier I will speak about next.

[Attached image: the two kinds of validation barriers]

Class Barriers (The Interface)

We often write classes to be reusable, because why should we reinvent the wheel when we don't have to? Classes should always ensure that the data they receive keeps the class valid. At no time should your Color class say it is of color "February 17, 2011". Makes no sense. We as programmers should set up barriers in the interface methods and properties of our class to ensure that the class is created with proper data. The same goes for any time we want to alter the state of that class: these methods/properties will keep the class valid. So when we call Color's "setColor()" method, we should be able to validate that the incoming data is indeed a valid color name before we change the class to represent said color. Any internal helper methods for Color, on the other hand, already know that the internal data is good, no validation necessary.

Helper methods within these classes, private to the class and hidden from the outside world, can then live in their own "safe zone" within the class. They can always assume that the class's private data members are valid and safe to use. In other words, no validation is necessary... unless of course the class is reaching out for possibly dirty input again.
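As a rough sketch of a class barrier, here is what a Color class might look like in Python. The small table of known color names, the set_color spelling, and the _rgb and to_hex helpers are assumptions for the example, not part of any real library. Only the public constructor and setter validate; the private helper trusts the state.

```python
class Color:
    # Hypothetical table of accepted names; a real class would carry more.
    _KNOWN = {"red": (255, 0, 0), "green": (0, 128, 0), "blue": (0, 0, 255)}

    def __init__(self, name: str) -> None:
        # Construction goes through the same barrier as later changes,
        # so an instance can never start out invalid.
        self.set_color(name)

    def set_color(self, name: str) -> None:
        """Interface barrier: validate before touching private state."""
        candidate = str(name).strip().lower()
        if candidate not in self._KNOWN:
            raise ValueError(f"unknown color name: {name!r}")
        self._name = candidate

    @property
    def name(self) -> str:
        return self._name

    def _rgb(self) -> tuple:
        # Safe zone: _name was validated on the way in, so no checks here.
        return self._KNOWN[self._name]

    def to_hex(self) -> str:
        r, g, b = self._rgb()
        return f"#{r:02x}{g:02x}{b:02x}"


color = Color("Red")
try:
    color.set_color("February 17, 2011")
except ValueError as error:
    print(error)          # the bad value is rejected at the interface
print(color.name)         # the object still holds its last valid state
```

Because the setter raises before assigning, a failed call leaves the object exactly as it was, which is what lets the helpers skip validation.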

So these barriers we set up can help us in the following ways...

1) Provide a central location for creating validating routines to scrub and validate data from all kinds of sources. Easy for maintenance.
2) The barrier classes and routines can be portable. Yay for reusability!
3) We cut down on the need for validation code for all of our classes. We can get to work on solving problems without worrying about the incoming data being tainted and needing to be cleaned for each and every function. (Simplification and reduced complexity)
4) We keep classes in a proper and valid state which will cut down on errors later when we use them in other areas of code.
5) Less time validating every step of the way cuts down the bloat and increases the performance.

Other places you might want to consider these validation barriers...

1) Between subsystems in a complex system (Between items in a block diagram for those engineers out there)
2) Anywhere your system is expected to output valid data (Barriers can help our systems strive to be good citizens and output valid info)
3) Perhaps to act as wrappers around possibly unsafe or dangerous code. (Wall off bad code with validation)
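The last two ideas in the list above (checking what a system outputs, and walling off code you don't trust) can be sketched together in Python. Here legacy_lookup_age is a made-up stand-in for some unsafe routine, and the 0 to 150 range is an assumed rule for the example. The wrapper validates what goes in and what comes out, so callers never see the garbage.

```python
from typing import Callable, Optional


def legacy_lookup_age(user_id: int) -> Optional[int]:
    """Hypothetical stand-in for untrusted code: may return None or nonsense."""
    table = {1: 34, 2: -5, 3: None}
    return table.get(user_id)


def safe_lookup_age(
    user_id: int,
    lookup: Callable[[int], Optional[int]] = legacy_lookup_age,
) -> int:
    """Barrier around the legacy call: check the input, then check the output."""
    if not isinstance(user_id, int) or user_id <= 0:
        raise ValueError(f"invalid user id: {user_id!r}")
    age = lookup(user_id)
    if not isinstance(age, int) or not 0 <= age <= 150:
        raise ValueError(f"legacy code returned an unusable age: {age!r}")
    return age


for user_id in (1, 2, 3):
    try:
        print(user_id, safe_lookup_age(user_id))
    except ValueError as error:
        print(user_id, "rejected:", error)
```

Everything downstream of safe_lookup_age can then treat the age as a plain, sane integer.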

With these barriers put in the correct places, we can keep our code secure and quickly isolate bugs as they appear. These barriers create choke points where bad data is forced through some kind of cleaning process before being sent through. Then once cleared, it can be treated with a certain level of expectations. I hope you enjoyed the entry and I look forward to writing another article to help you all become better programmers! Thanks for reading! :)

If you want more blog entries like this, check out the official blog over on The Coders Lexicon. There you will find more code, more guides and more resources for programmers of all skill levels!

4 Comments On This Entry


Sergio Tapia

17 February 2011 - 04:43 PM
I enjoyed reading this. :) Very well written!

I've been trying to find a groove for the style of programming I have, but I can't seem to feel comfortable enough with a single approach. For example, I've tried validating input via public setter methods, but I don't really know where/how to let the calling code know their input is incorrect.

These days I'm much more reliant on an MVC pattern. I try to use that in most of my applications. What do you think is best, to show a GUI message that the input is incorrect directly from the model, or to set the class state as invalid? If I do the latter I'd have to check if the state is valid from the GUI so in a sense I'm checking things twice no? Eager to hear how you do it. :)

Martyr2

17 February 2011 - 04:58 PM

Great question for sure. I am one who believes that you should never, ever put your class into an invalid state. If something goes wrong, you can catch it with one of these barriers before it ever reaches your class, or in the class interface before it reaches the private member data. The main problem I see with invalidating the class is that at some later point you have to revalidate it before using it. Sure, this check might be as simple as checking that a property is > 0 or something, but why risk having a rogue dead class instance floating around, where any number of routines that want to use it could forget to check whether it is invalid and attempt to use it? I say kill bad data as soon as it comes into the system, and then again anywhere it comes into a class (or function, if you are dealing with a non-OOP environment like a PHP script).

As for your MVC question, it can be a bit hard to tell. Your model can be set up to be in your "safe zone", where all data that goes into it is safe to begin with, and thus you can expect everything that comes out of it to be safe. I for one would rather kill invalid data at the view for input validation and at the controller for business data. But if you wanted to take the approach I have outlined, I would set up the layer between your View and Controller. By the time the data is entered and ready to be passed on to the appropriate model, it should be valid for input and valid for business rules.
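Here is one way that layer between the View and the Controller might look, sketched in Python. SignupForm, validate_signup and the age limits are all invented for the example. The layer hands back either clean data for the controller or a list of messages for the view to display, so the model never has to hold an invalid state and nothing gets checked twice.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class SignupForm:
    """Clean data for the controller; every field has already been validated."""
    username: str
    age: int


def validate_signup(raw: Dict[str, str]) -> Tuple[Optional[SignupForm], List[str]]:
    """Barrier between view and controller: returns (clean_form, errors)."""
    errors: List[str] = []

    username = str(raw.get("username", "")).strip()
    if not username:
        errors.append("Username is required.")

    # Input rule (must be a whole number) and an assumed business rule (13 to 120).
    age_text = str(raw.get("age", "")).strip()
    if age_text.isdecimal() and 13 <= int(age_text) <= 120:
        age = int(age_text)
    else:
        errors.append("Age must be a whole number between 13 and 120.")

    if errors:
        return None, errors      # the view shows these messages to the user
    return SignupForm(username, age), []


form, problems = validate_signup({"username": " ann ", "age": "thirty"})
print(form, problems)
```

The view only ever looks at the error list, and the controller only ever receives a SignupForm, so each side has one thing to check.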

That is just my take. Thanks for the comment! :)

Cyclopses

23 May 2011 - 07:06 AM
Amazing idea! Am sure to use this in my following project.
Agree with Sergio, it was very pleasant to read.

Would you mind if I refer to this post in my internship report for school? ;)

Martyr2

23 May 2011 - 06:21 PM
Cyclopses, sure thing. Thanks! :)