Introduction
This is my validator. There are many like it, but this one is mine.
My validator is my best friend. It is my life. I must master it as I must master my life.
marinized side of Oklahomer
Sometimes, when taking over a project, you find insufficient documentation, an ambiguous table schema, and loose validation. Suppose there is a table column named "properties_json" and the validation is so loose that you can't tell what values go into it. You probably look it up in the project wiki, but the entry is outdated or, even worse, there is no entry at all. Now things are pretty tough on you.
Sometimes I've done this to someone and sometimes someone has done this to me. From these experiences I acquired some habits:
- Tighter validation
  - Makes clear what values come in and out. If it is tight enough, it works as a living document that declares detailed specs even when the wiki or documentation is outdated.
- Detailed document
  - POD, wiki, README.mkdn, or any other kind. If appropriate, parse *.mkdn and POD and display them on admin pages so everybody sees them and keeps them updated.
- More full-line/inline comments
  - Including links to the wiki, quotes from documents, and regular comments.
This article covers how I use Data::FormValidator to achieve the first habit above.
E pluribus unum / Out of Many
First things first. Why Data::FormValidator? For form and/or data validation, there are many modules, including FormValidator::Simple, FormValidator::Lite, Smart::Args, and Data::Validator, to name a few. I wasn't sure which validator is the best, so I benchmarked some form validator modules. The script and its result are as follows:
#!/usr/bin/env perl
use strict;
use warnings;
use Modern::Perl;
use FormValidator::Lite qw/Email/;
use Data::FormValidator;
use Data::FormValidator::Constraints qw/:closures/;
use FormValidator::Simple;
use Benchmark qw/:all/;
use CGI;
say "Perl: $]";
say "FormValidator::Simple: $FormValidator::Simple::VERSION";
say "FormValidator::Lite: $FormValidator::Lite::VERSION";
say "Data::FormValidator: $Data::FormValidator::VERSION";
#Perl: 5.018001
#FormValidator::Simple: 0.29
#FormValidator::Lite: 0.37
#Data::FormValidator: 4.81
# Rate FormValidator::Simple Data::FormValidator FormValidator::Lite
#FormValidator::Simple 1034/s -- -20% -66%
#Data::FormValidator 1294/s 25% -- -57%
#FormValidator::Lite 3030/s 193% 134% --
my $q = CGI->new;
$q->param(name => 'oklahomer');
$q->param(mail1 => 'sample@sample.com');
$q->param(mail2 => 'sample@sample.com');
cmpthese(
10000,
+{
"FormValidator::Simple" => sub {
my $res = FormValidator::Simple->check($q => [
name => ['NOT_BLANK', [ 'LENGTH', 5, 10 ]],
mail1 => [qw/NOT_BLANK EMAIL_LOOSE/],
mail2 => [qw/NOT_BLANK EMAIL_LOOSE/],
+{mails => [qw/mail1 mail2/]} => [qw/DUPLICATION/],
]);
},
"FormValidator::Lite" => sub {
my $res = FormValidator::Lite->new($q)->check(
name => ['NOT_NULL', [ 'LENGTH', 5, 10 ]],
mail1 => [qw/NOT_NULL EMAIL_LOOSE/],
mail2 => [qw/NOT_NULL EMAIL_LOOSE/],
+{mails => [qw/mail1 mail2/]} => [qw/DUPLICATION/],
);
},
"Data::FormValidator" => sub {
my $res = Data::FormValidator->check($q, +{
required => [qw/name mail1 mail2/],
constraint_methods => +{
name => FV_length_between(5, 10),
mail1 => email(),
mail2 => [email(), FV_eq_with('mail1')], # mail2 must equal mail1
},
});
},
}
);
__END__
I was a bit surprised to find that FormValidator::Simple is the slowest. Among these three, Data::FormValidator is the oldest and FormValidator::Lite is the newest, so I assumed Data::FormValidator would be the slowest, but apparently it isn't. FormValidator::Lite is clearly the fastest, but its version is still below 1.00 and it has some experimental methods; Data::FormValidator, on the other hand, has a longer history and is stable. Also, D::FV makes easy things easy and difficult things possible for me. Both its user documentation and its technical documentation are thorough, too. And my product uses Data::FormValidator after all, so here I am.
Basics
Flow
When check() is called, it checks the validation profile and creates a Data::FormValidator::Results object via Data::FormValidator::Results->new. During this initialization, _process() is called, and that is where validation is done. In _process(), things happen in this order:
- filters are applied
  - filters
  - field_filters
  - field_filter_regexp_map
- prepare required params
  - required
  - required_regexp
  - require_some
- remove empty fields
- check dependencies
  - dependencies
  - dependency_groups
- add default values to unset fields
  - defaults_regexp_map
  - defaults
- check required
  - required
  - require_some
- check constraints
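One consequence of this order is easy to see in a small script. The following is a minimal sketch, not taken from the D::FV documentation; the field names and values are made up for illustration. Because filters run before empty fields are removed and before required fields are checked, a whitespace-only required field should end up reported as missing, and a default should fill in an optional field that was submitted empty.

```perl
use strict;
use warnings;
use Data::FormValidator;

# Hypothetical input: "name" is whitespace only, "page" is an empty string.
my $result = Data::FormValidator->check(
    { name => '   ', page => '' },
    {
        required => [qw/name/],
        optional => [qw/page/],
        filters  => ['trim'],        # runs first, so "   " becomes ""
        defaults => { page => 1 },   # applied after empty fields are removed
    },
);

my @missing = $result->missing;                 # list context: missing field names
my $page    = scalar $result->valid('page');    # value of a single valid field

print "missing: @missing\n";
print "page: ", (defined $page ? $page : 'undef'), "\n";
```

If the order is as listed above, "name" appears in the missing list rather than passing as a non-empty string, and "page" comes back with its default.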
Validation Profile
Declaration Order
To me, one of the most important things is that the profile tells us which values are required and which optional values are allowed; in other words, it tells us what cannot get through. So I write the validation profile in this order:
- required, require_some, dependency_groups, dependencies ... these make clear what values are required.
- optional ... needless to say, it declares what fields are optional.
- filters, field_filters ... state how each value should be altered.
- defaults ... declares default values for optional fields.
- constraint_methods ... Last Line of Defense
This way the important things are at the top. Required fields go first and optional fields come second, so you can see which field values come in. The filter part follows and tells how the input values are altered. Default values come after the filters because in _process() filters are applied before default values are set, so the profile is easier to understand if filters and defaults are written in that order. Finally, constraint_methods comes and checks everything.
It looks something like the profile below.
use constant {
FALSE => 0,
TRUE => 1,
};
Data::FormValidator->check($input, {
required => [qw/user_id user_id_confirm/], # MUST be set
require_some => {
# At least 2 of these must be set
city_or_state_or_zipcode => [ 2, qw/city state zipcode/ ],
},
dependency_groups => {
# if one is given, then all are required
basic_auth => [qw/realm username password/],
},
dependencies => {
# if "oh please send me junk mails" is checked, email must be given
send_dm => {
TRUE() => [qw/email email_confirm/],
}
},
optional => [qw/website hobbies page rows/],
filters => ['trim'],
field_filters => {
hobbies => [
# comma separated hobbies to arrayref
sub {
[split q{,}, shift]
},
# trim each hobby value after split
# BEWARE: "filters" is applied before field_filters is applied
'trim'
]
},
defaults => {
page => 1, # page number
rows => 30, # max rows per page
},
constraint_methods => {
},
});
Avoid Using *_regexp_map
When declaring a profile, I prefer not to use the *_regexp_map keys. As I described earlier, I expect the profile to make clear what values go in, so I avoid *_regexp_map and state each field explicitly.
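The same preference applies to the other pattern-based keys. As a rough illustration, the sketch below declares the same three fields once by pattern (using required_regexp) and once by name. The address_* field names are hypothetical.

```perl
use strict;
use warnings;
use Data::FormValidator;

# Hypothetical field names, used only for illustration.
my $input = {
    address_city    => 'Tulsa',
    address_state   => 'OK',
    address_zipcode => '74103',
};

# Pattern-based: the reader has to work out which fields the pattern covers.
my $by_pattern = Data::FormValidator->check($input, {
    required_regexp => qr/\Aaddress_/,
});

# Explicit: the profile itself lists every accepted field.
my $by_name = Data::FormValidator->check($input, {
    required => [qw/address_city address_state address_zipcode/],
});

printf "pattern ok: %d, explicit ok: %d\n",
    ($by_pattern->success ? 1 : 0), ($by_name->success ? 1 : 0);
```

Both profiles accept this input, but only the second one tells a newcomer exactly which fields the code expects.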
Define Constraint Methods in One Place
Put constraint methods in MyApp::Validation::Rules or some other single place that every Model can always refer to.
Use Constants in Regular Expressions to Make Rules Readable
As I described in 'How and when I use constants in regular expression,' I use constants in regular expressions to increase readability when I use a regular expression as a constraint method. I also explained there that using constants can decrease performance, but validation rules are vital, so I think we shouldn't hesitate to make use of them.
A sample follows.
# lib/MyApp/Constants.pm
package MyApp::Constants;
use strict;
use warnings;
use utf8;
use parent 'Exporter';
our @EXPORT;
use Exporter::Constants (
\@EXPORT => {
CAMPAIGN_TYPE_HALLOWEEN => 1,
CAMPAIGN_TYPE_THANKSGIVING => 2,
CAMPAIGN_TYPE_CHRISTMAS => 3,
}
);
1;
# lib/MyApp/Validator/Constraints.pm
package MyApp::Validator::Constraints;
use strict;
use warnings;
use utf8;
use parent 'Exporter';
use Module::Functions;
use MyApp::Constants;
our @EXPORT = Module::Functions::get_public_functions();
sub VALID_CAMPAIGN_TYPE () {
qr/\A
(?:
${\( CAMPAIGN_TYPE_HALLOWEEN )}
| ${\( CAMPAIGN_TYPE_THANKSGIVING )}
| ${\( CAMPAIGN_TYPE_CHRISTMAS )}
)
\z/xo
}
1;
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;
use lib 'lib';
use Data::FormValidator;
use MyApp::Validator::Constraints;
my $result = Data::FormValidator->check(+{
campaign_type => 2,
}, +{
required => [qw/campaign_type/],
optional => [qw//],
constraint_methods => +{
campaign_type => VALID_CAMPAIGN_TYPE,
},
});
use Data::Dumper;
warn Dumper scalar($result->valid);
#$VAR1 = {
# 'campaign_type' => 2
# };
__END__
Advanced
The profile above is self-explanatory, so I am going to describe some minor features I like to use.
Dealing with Empty Fields
By default, when a user provides an empty string for an optional field, that field and its value are not accessible via $result->valid(). That is O.K. as long as we are creating data, but when it comes to updating data, this default behaviour is troublesome. If a user has already set "my_previous_nickname" as his nickname and decides to remove it, an empty string is given on form submission and the nickname field is not accessible via $result->valid(). The code below therefore wouldn't update the nickname field.
my $result = Data::FormValidator->check(+{
name => 'Oklahomer',
nickname => '',
email => 'nickname.is.empty@sample.com',
}, +{
required => [qw/name email/],
optional => [qw/nickname/],
});
my $valid = $result->valid;
#$VAR1 = {
#  'email' => 'nickname.is.empty@sample.com',
#  'name' => 'Oklahomer'
# };
$teng->update(user => $valid); # nickname is not set so this field stays as is.
The solution is pretty easy. By setting missing_optional_valid => 1 in the profile declaration, empty optional fields become accessible, as in the example below.
#!/usr/bin/env perl
use strict;
use warnings;
use Data::FormValidator;
my $result = Data::FormValidator->check(+{
name => 'Oklahomer',
nickname => '',
email => 'nickname.is.empty@sample.com',
}, +{
required => [qw/name email/],
optional => [qw/nickname/],
missing_optional_valid => 1,
});
use Data::Dumper;
warn Dumper scalar $result->valid;
#$VAR1 = {
# 'nickname' => undef, # undef is returned now
# 'email' => 'nickname.is.empty@sample.com',
# 'name' => 'Oklahomer'
# };
1;
Dealing with Array Elements
Checking Each Element of Array
When an array reference is set as a field value, D::FV applies the constraint method to each element.
So when a user can input comma-separated hobbies as part of his profile, I use field_filters and this feature together. In this case I use the plural form as the key name so it seems appropriate when $result->valid->{hobbies} becomes an array reference.
my $result = Data::FormValidator->check($input, {
required => [qw/hobbies/],
field_filters => {
hobbies => [
# comma separated hobbies to arrayref
sub {
[split q{,}, shift]
},
# trim each hobby value after split
# BEWARE: "filters" is applied before field_filters is applied
'trim'
]
},
constraint_methods => {
hobbies => ALLOWED_HOBBY_RE, # checks each hobby
},
});
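For readers who want to run this, here is a self-contained sketch of the same idea. ALLOWED_HOBBY_RE is not defined in the snippet above, so the definition below (lowercase ASCII letters only) is an assumption made for illustration, as is the sample input.

```perl
use strict;
use warnings;
use Data::FormValidator;

# Hypothetical rule: a hobby is one or more lowercase ASCII letters.
use constant ALLOWED_HOBBY_RE => qr/\A[a-z]+\z/;

my $result = Data::FormValidator->check(
    { hobbies => 'fishing, hiking ,chess' },
    {
        required      => [qw/hobbies/],
        field_filters => {
            hobbies => [
                sub { [ split q{,}, shift ] },   # comma separated string to arrayref
                'trim',                          # then trim each element
            ],
        },
        constraint_methods => {
            hobbies => ALLOWED_HOBBY_RE,         # checked against each element
        },
    },
);

my $hobbies = $result->valid->{hobbies} || [];
print join('|', @$hobbies), "\n";
```

The trailing 'trim' matters here: without it, elements such as " hiking " would keep their surrounding spaces and fail the assumed pattern.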
Checking the Number of Elements
If you are using D::FV version 4.80 or later, the FV_num_values and FV_num_values_between constraints are officially supported; they count the number of list elements.
constraint_methods => {
hobbies => [
FV_num_values_between(1, 5), # user can set 1-5 hobbies
ALLOWED_HOBBY_RE, # f-words are denied here
],
},
Note that these constraints are called once per element, because constraint methods are applied to each element of the array. It may not be elegant, but that is how constraint methods are applied in D::FV.
Validating Based on Multiple Fields
My favorite feature of D::FV is that we can validate a field based on multiple field values: if the country code says XX then the phone # should be N digits, and if the country code says YY then the phone # should be M digits.
constraint_methods => +{
country_code => VALID_COUNTRY_CODE,
phone => +{
constraint_method => sub {
my ($dfv, $country_code, $phone_number) = @_;
# check phone_number and country_code combo.
},
params => [qw/country_code phone/], # field names whose values are passed to the sub, in this order
}
}
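To make the placeholder body concrete, here is a small runnable sketch of the same pattern. The digit table, the country codes, and the check_phone() helper are hypothetical and exist only for this illustration; real phone rules are more involved.

```perl
use strict;
use warnings;
use Data::FormValidator;

# Hypothetical rule table: required digit count per country code.
my %PHONE_DIGITS = (US => 10, JP => 11);

# Hypothetical helper that builds a fresh profile for every check.
sub check_phone {
    my ($input) = @_;
    return Data::FormValidator->check($input, {
        required           => [qw/country_code phone/],
        constraint_methods => {
            country_code => qr/\A(?:US|JP)\z/,
            phone        => {
                constraint_method => sub {
                    my ($dfv, $country_code, $phone) = @_;
                    my $key    = defined $country_code ? $country_code : '';
                    my $digits = $PHONE_DIGITS{$key} or return 0;
                    return $phone =~ /\A[0-9]{$digits}\z/ ? 1 : 0;
                },
                # values of these fields are passed to the sub, in this order
                params => [qw/country_code phone/],
            },
        },
    });
}

my $ok  = check_phone({ country_code => 'US', phone => '2025550123' });
my $bad = check_phone({ country_code => 'JP', phone => '2025550123' });

printf "US: %s, JP: %s\n",
    ($ok->success  ? 'valid' : 'invalid'),
    ($bad->success ? 'valid' : 'invalid');
```

The same ten-digit number passes for one country code and fails for the other, which is exactly the kind of rule a single-field constraint cannot express.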
Conclusion
That's how I use D::FV. Declaring the profile carefully makes things easy, and with appropriate use of filters/field_filters the result is not only valid input but also data in a form that is easy to handle in later code.
My validator and I are the defenders of my data. We are the masters of our data. We are the saviors of my data.
So be it, until victory is ours and there is no enemy.
marinized side of Oklahomer