Dec 14, 2013

Getting the most out of Data::FormValidator



Introduction

This is my validator. There are many like it, but this one is mine. My validator is my best friend. It is my life. I must master it as I must master my life.
-- marinized side of Oklahomer
Sometimes, when taking over a project, you find insufficient documentation, ambiguous table schemas and loose validation. Suppose there is a table column named "properties_json" and the validation is so loose that you can't tell what values go into it. You probably look it up in the project wiki, but the wiki is outdated or, even worse, has no entry about it. Now things are pretty tough on you.
Sometimes I've done this to someone and sometimes someone has done this to me. With these experiences I acquired some habits:
  1. Tighter validation ... makes clear what values come in and out. If it is tight enough, it works as a living document that declares detailed specs even when the wiki or documentation is outdated.
  2. Detailed documentation ... POD, wiki, README.mkdn or any other kind. If appropriate, parse *.mkdn and POD and display them on admin pages so everybody sees them and keeps them updated.
  3. More full-line/inline comments ... including links to the wiki, quotes from the documentation and regular comments.

This article covers how I use Data::FormValidator to achieve the first habit above.

E pluribus unum / Out of Many

First things first. Why Data::FormValidator? For form and/or data validation there are many modules, including FormValidator::Simple, FormValidator::Lite, Smart::Args and Data::Validator, to name a few. I wasn't quite sure which validator was the best, so I benchmarked some form validator modules. The script and its result are as follows:

#!/usr/bin/env perl
use strict;
use warnings;
use Modern::Perl;
use FormValidator::Lite qw/Email/;
use Data::FormValidator;
use Data::FormValidator::Constraints qw/:closures/;
use FormValidator::Simple;
use Benchmark qw/:all/;
use CGI;

say "Perl: $]";
say "FormValidator::Simple: $FormValidator::Simple::VERSION";
say "FormValidator::Lite: $FormValidator::Lite::VERSION";
say "Data::FormValidator: $Data::FormValidator::VERSION";

#Perl: 5.018001
#FormValidator::Simple: 0.29
#FormValidator::Lite: 0.37
#Data::FormValidator: 4.81

#                        Rate FormValidator::Simple Data::FormValidator FormValidator::Lite
#FormValidator::Simple 1034/s                    --                -20%                -66%
#Data::FormValidator   1294/s                   25%                  --                -57%
#FormValidator::Lite   3030/s                  193%                134%                  --

my $q = CGI->new;
$q->param(name => 'oklahomer');
$q->param(mail1 => 'sample@sample.com');
$q->param(mail2 => 'sample@sample.com');

cmpthese(
    10000,
    +{
        "FormValidator::Simple" => sub {
            my $res = FormValidator::Simple->check($q => [
                name  => ['NOT_BLANK', [ 'LENGTH', 5, 10 ]],
                mail1 => [qw/NOT_BLANK EMAIL_LOOSE/],
                mail2 => [qw/NOT_BLANK EMAIL_LOOSE/],
                +{mails => [qw/mail1 mail2/]} => [qw/DUPLICATION/],
            ]);
        },
        "FormValidator::Lite" => sub {
            my $res = FormValidator::Lite->new($q)->check(
                name  => ['NOT_NULL', [ 'LENGTH', 5, 10 ]],
                mail1 => [qw/NOT_NULL EMAIL_LOOSE/],
                mail2 => [qw/NOT_NULL EMAIL_LOOSE/],
                +{mails => [qw/mail1 mail2/]} => [qw/DUPLICATION/],
            );
        },
        "Data::FormValidator" => sub {
            my $res = Data::FormValidator->check($q, +{
                required => [qw/name mail1 mail2/],
                constraint_methods => +{
                    name  => FV_length_between(5, 10),
                    mail1 => email(),
                    mail2 => [email(), FV_eq_with('mail1')], # mail2 must equal mail1
                },
            });
        },
    }
);

__END__

I was a bit surprised to find that FormValidator::Simple is the slowest. Among these three, Data::FormValidator is the oldest and FormValidator::Lite is the newest, so I assumed Data::FormValidator would be the slowest, but apparently it isn't. FormValidator::Lite seems pretty fast, but its version is still below 1.00 and it has some experimental methods; Data::FormValidator, on the other hand, has a longer history and is stable. Also, D::FV makes easy things easy and difficult things possible for me. Both its user documentation and its technical documentation are well written, too. And my product uses Data::FormValidator after all, so here I am.

Basics

Flow

When check() is called, it checks the validation profile and creates a Data::FormValidator::Results object via D::FV::Results#new. During this initialization _process() is called, and that's where validation is done. In _process() things are done in this order:

  1. filters are applied
    1. filters
    2. field_filters
    3. field_filter_regexp_map
  2. prepare required params
    1. required
    2. required_regexp
    3. require_some
  3. remove empty fields
  4. check dependencies
    1. dependencies
    2. dependency_groups
  5. add default values to unset fields
    1. defaults_regexp_map
    2. defaults
  6. check required
    1. required
    2. require_some
  7. check constraints

Validation Profile

Declaration Order

To me one of the most important things is that the profile tells us which values are required and which optional values may come in; in other words, it tells us what cannot get through. So I write the validation profile in this order:
  1. required, require_some, dependency_groups, dependencies ... these make clear what values are required.
  2. optional ... needless to say, it declares what fields are optional
  3. filters, field_filters ... state how each value should be altered
  4. defaults ... declares default values for optional fields
  5. constraint_methods ... Last Line of Defense
In this way you can see the important things on top. Required fields go first and optional fields come second, so you can see what field values come in. Then the filter part follows and tells how the input values are altered. Default values come after the filters because _process() applies filters before it sets default values, so the profile is easier to understand if filters and defaults are written in this order. Finally constraint_methods comes and checks everything.
It looks something like the profile below.

use constant {
    FALSE => 0,
    TRUE  => 1,
};

Data::FormValidator->check($input, {
    required => [qw/user_id user_id_confirm/], # MUST be set
    require_some => {
        # At least 2 of these must be set
        city_or_state_or_zipcode => [ 2, qw/city state zipcode/ ],
    }, 
    dependency_groups => {
        # if one is given, then all are required
        basic_auth => [qw/realm username password/],
    },
    dependencies => {
        # if "oh please send me junk mails" is checked, email must be given
        send_dm => {
            TRUE() => [qw/email email_confirm/],
        }
    },
    optional => [qw/website hobbies page rows/],
    filters => ['trim'],
    field_filters => {
        hobbies => [
            # comma separated hobbies to arrayref
            sub {
                [split q{,}, shift]
            },
            # trim each hobby value after split
            # BEWARE: "filters" is applied before field_filters is applied
            'trim'
        ]
    },
    defaults => {
        page => 1,  # page number
        rows => 30, # max rows per page
    },
    constraint_methods => {
    },
});

Avoid Using *_regexp_map

In profile declarations, I don't really like to use *_regexp_map. As I described earlier, I expect the profile to make clear what values go in. So I avoid *_regexp_map and prefer to state each field explicitly.

Define Constraint Methods in One Place

Define constraint methods in MyApp::Validation::Rules or some other single place that every Model can refer to.
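
As a rough sketch of what such a module could look like, the example below keeps one constraint method closure in a rules package and reuses it from a profile. The rule itself (valid_user_id, 1 to 10 digits without a leading zero) is hypothetical and only for illustration; in a real project the package would live in its own file.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Data::FormValidator;

# Hypothetical rules module: every Model refers to this one place.
package MyApp::Validation::Rules {
    # Returns a constraint method closure, in the same style as the
    # closures exported by Data::FormValidator::Constraints.
    sub valid_user_id {
        return sub {
            my ($dfv, $value) = @_;
            $dfv->name_this('valid_user_id');
            return $value =~ /\A[1-9][0-9]{0,9}\z/ ? 1 : 0;
        };
    }
}

my %profile = (
    required           => [qw/user_id/],
    constraint_methods => +{
        user_id => MyApp::Validation::Rules::valid_user_id(),
    },
);

my $ok = Data::FormValidator->check(+{ user_id => '42'   }, \%profile);
my $ng = Data::FormValidator->check(+{ user_id => '0x2A' }, \%profile);

print "42 accepted\n"   if $ok->success;
print "0x2A rejected\n" if $ng->invalid('user_id');
```

Because the rule has a name and a single home, a change to the user_id format is made once and every profile picks it up.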

Use Constants in Regular Expressions to Make Rules Readable

As I described in 'How and when I use constants in regular expression,' I use constants in regular expressions to increase readability when I use a regular expression as a constraint method. I also explained there that using constants can decrease performance, but validation rules are vital, so I think we shouldn't hesitate to make use of them.
A sample is as follows.

package MyApp::Constants;
use strict;
use warnings;
use utf8;
use parent 'Exporter';

our @EXPORT;

use Exporter::Constants (
    \@EXPORT => {
        CAMPAIGN_TYPE_HALLOWEEN    => 1,
        CAMPAIGN_TYPE_THANKSGIVING => 2,
        CAMPAIGN_TYPE_CHRISTMAS    => 3,
    }
);

1;

package MyApp::Validator::Constraints;
use strict;
use warnings;
use utf8;
use parent 'Exporter';
use Module::Functions;
use MyApp::Constants;

our @EXPORT = Module::Functions::get_public_functions();
    
sub VALID_CAMPAIGN_TYPE () {
    qr/\A
        (?:
            ${\( CAMPAIGN_TYPE_HALLOWEEN    )}
          | ${\( CAMPAIGN_TYPE_THANKSGIVING )}
          | ${\( CAMPAIGN_TYPE_CHRISTMAS    )}
        )
    \z/xo 
}

1;

#!/usr/bin/env perl
use strict;
use warnings;
use utf8;
use lib 'lib';
use Data::FormValidator;
use MyApp::Validator::Constraints;

my $result = Data::FormValidator->check(+{
    campaign_type => 2,
}, +{
    required => [qw/campaign_type/],
    optional => [qw//],
    constraint_methods => +{
        campaign_type => VALID_CAMPAIGN_TYPE,
    },
});

use Data::Dumper;
warn Dumper scalar($result->valid);
#$VAR1 = {
#          'campaign_type' => 2
#        };

__END__

Advanced

The profile above is self-explanatory, so I am going to describe some minor features I like to use.

Dealing with Empty Fields

By default, when a user provides an empty string for an optional field, that field and its value are not accessible via $result->valid(). That is O.K. as long as we are creating data, but when it comes to updating data, this default behaviour is troublesome. Suppose a user has already set "my_previous_nickname" as his nickname and decides to remove it. An empty string is given on form submission, so the nickname field is not accessible via $result->valid(), and the code below does not update the nickname field.

my $result = Data::FormValidator->check(+{
    name     => 'Oklahomer',
    nickname => '',
    email    => 'nickname.is.empty@sample.com',
}, +{
    required               => [qw/name email/],
    optional               => [qw/nickname/],
});
my $valid = $result->valid;
#$VAR1 = {
#          'email' => 'nickname.is.empty@sample.com',
#          'name' => 'Oklahomer'
#        };
$teng->update(user => $valid); # nickname is not set so this field stays as is.

The solution is pretty easy. Setting missing_optional_valid => 1 in the profile declaration makes empty optional fields accessible, as in the example below.

#!/usr/bin/env perl
use strict;
use warnings;
use Data::FormValidator;
 
my $result = Data::FormValidator->check(+{
    name     => 'Oklahomer',
    nickname => '',
    email    => 'nickname.is.empty@sample.com',
}, +{
    required               => [qw/name email/],
    optional               => [qw/nickname/],
    missing_optional_valid => 1,
});
use Data::Dumper;
warn Dumper scalar $result->valid;
#$VAR1 = {
#          'nickname' => undef, # undef is returned now
#          'email' => 'nickname.is.empty@sample.com',
#          'name' => 'Oklahomer'
#        };
 
1;

Dealing with Array Elements

Checking Each Element of Array

When an array reference is set as a field value, D::FV applies the constraint method to each element.
So when a user can input comma-separated hobbies as part of his profile, I use field_filters and this feature together. In this case I use the plural form as the key name, so it seems natural that $result->valid->{hobbies} becomes an array reference.

my $result = Data::FormValidator->check($input, {
    required => [qw/hobbies/],
    field_filters => {
        hobbies => [
            # comma separated hobbies to arrayref
            sub {
                [split q{,}, shift]
            },
            # trim each hobby value after split
            # BEWARE: "filters" is applied before field_filters is applied
            'trim'
        ]
    },
    constraint_methods => {
        hobbies => ALLOWED_HOBBY_RE, # checks each hobby
    },
}); 
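
To make the behaviour concrete, here is a small self-contained sketch of the same profile. The regular expression and the input strings are assumptions for illustration only; the original ALLOWED_HOBBY_RE is not shown in this article.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Data::FormValidator;

# Hypothetical rule: lowercase letters and spaces only, 1 to 20 characters.
my $ALLOWED_HOBBY_RE = qr/\A[a-z ]{1,20}\z/;

my %profile = (
    required      => [qw/hobbies/],
    field_filters => +{
        hobbies => [
            sub { [ split q{,}, shift ] }, # comma separated string to arrayref
            'trim',                        # then trim each element
        ],
    },
    constraint_methods => +{
        hobbies => $ALLOWED_HOBBY_RE,      # applied to each element
    },
);

my $ok = Data::FormValidator->check(+{ hobbies => 'reading, chess ,hiking' }, \%profile);
my $ng = Data::FormValidator->check(+{ hobbies => 'reading,sk8ing' }, \%profile);

my $hobbies = join '|', @{ $ok->valid->{hobbies} };
print "valid hobbies: $hobbies\n";
print "second input rejected\n" if $ng->invalid('hobbies');
```

In the second call a single bad element ("sk8ing") is enough to mark the whole hobbies field as invalid.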

Checking the Number of Elements

If you are using D::FV version 4.80 or later, the FV_num_values and FV_num_values_between constraints, which count the number of list elements, are officially supported.

    constraint_methods => {
        hobbies => [
            FV_num_values_between(1, 5), # user can set 1-5 hobbies
            ALLOWED_HOBBY_RE, # f-words are denied here
        ],
    },
 
Of course these methods are called as many times as there are elements, because constraint methods are applied to each element. It may not be cool, but that is how constraint methods are applied in D::FV.

Validating Based on Multiple Fields

My favorite feature of D::FV is that we can validate a field based on multiple field values -- if the country code says XX then the phone # should be N digits, and if the country code says YY then the phone # should be M digits.

    constraint_methods => +{
        country_code => VALID_COUNTRY_CODE,
        phone => +{
            constraint_method => sub {
                 my ($dfv, $country_code, $phone_number) = @_;
                 # check phone_number and country_code combo.
            },
            params => [qw/country_code phone/],
        }
    }
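
A runnable sketch of this idea follows. The country codes and digit counts in %PHONE_DIGITS are made up for illustration and are not real numbering rules, and the field is named phone both as the profile key and in params.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Data::FormValidator;

# Hypothetical rules: number of digits expected for each country code.
my %PHONE_DIGITS = (US => 10, JP => 11);

my %profile = (
    required           => [qw/country_code phone/],
    constraint_methods => +{
        country_code => qr/\A(?:US|JP)\z/,
        phone        => +{
            constraint_method => sub {
                # Values arrive in the order listed in "params".
                my ($dfv, $country_code, $phone) = @_;
                return 0 unless defined $country_code && defined $phone;
                my $digits = $PHONE_DIGITS{$country_code} or return 0;
                return ($phone =~ /\A[0-9]+\z/ && length($phone) == $digits) ? 1 : 0;
            },
            params => [qw/country_code phone/],
        },
    },
);

my $ok = Data::FormValidator->check(+{ country_code => 'US', phone => '4055550123' }, \%profile);
my $ng = Data::FormValidator->check(+{ country_code => 'JP', phone => '4055550123' }, \%profile);

print "US number accepted\n" if $ok->success;
print "JP number rejected\n" if $ng->invalid('phone');
```

The same ten digits pass for one country code and fail for the other, which is the combination check described above.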

Conclusion

That's how I use D::FV. Declaring the profile correctly makes things easy, and with appropriate use of filters and field_filters the result is not only valid input but also data in a form that is easy to handle in later code.
My validator and I are the defenders of my data. We are the masters of our data. We are the saviors of my data. So be it, until victory is ours and there is no enemy.
-- marinized side of Oklahomer