How to Validate Data in Python with Cerberus

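Data validation is an essential part of development. It ensures that incoming data has the expected format and contains valid, correct values, so that the program behaves as intended. Python has many validation libraries, and Cerberus is a lightweight, easy-to-use one. This article shows how to validate data with Cerberus.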


Introduction to Cerberus

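Cerberus is a small, easy-to-use library that turns data validation from a tedious chore into a simple task. It is named after Cerberus, the hound that guards the gates of the underworld in Greek mythology, which fits its mission: guarding your application against malformed data.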

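Cerberus validates documents, which are Python dictionaries (mappings). The values of their fields can be strings, numbers, lists and tuples, nested dictionaries, and more. Compared with other validation libraries, Cerberus has the following advantages:

Simple, readable semantics
A clean, concise API
Extensible through custom rules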

Next, let's look at how to validate data with Cerberus.

Installing Cerberus

Cerberus is a third-party Python package. Install it with pip:

pip install cerberus

Basic Usage

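Cerberus's main API is the Validator class. You can pass a schema to its constructor to initialize the validator. A schema is a dictionary that describes the format of the input data and the rules it must satisfy. Here is an example schema for validating a person's information: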

person_schema = {
            'name': {'type': 'string', 'required': True, 'empty': False},
            'age': {'type': 'integer', 'required': True, 'min': 18, 'max': 65},
            'email': {'type': 'string', 'required': True, 'empty': False, 'regex': '^.+@.+\\..+$'}
          }

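This schema describes three fields: name, age, and email. name is a required, non-empty string; age is a required integer between 18 and 65 inclusive; email is a required, non-empty string that must match the regular expression ^.+@.+\..+$ (a very simple email pattern).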

Now we can instantiate a Validator with this schema and check whether input data meets the requirements:

from cerberus import Validator

v = Validator(person_schema)

person = {
        'name': 'John Smith',
        'age': 25,
        'email': 'john@example.com'
        }

if v.validate(person):
    print("Person data is valid")
else:
    print("Person data is invalid")

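This code builds a person dictionary and passes it to the Validator instance. validate returns True if the dictionary satisfies the schema and False otherwise. Here person meets every requirement, so the code prints "Person data is valid".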

A More Complex Schema

The previous example validated data against a simple schema. Now let's look at a more complex schema that describes a person's attributes:

person_attributes_schema = {
    'name': {'type': 'string', 'required': True, 'empty': False},
    'age': {'type': 'integer', 'required': True, 'min': 18, 'max': 65},
    'email': {'type': 'string', 'required': True, 'empty': False, 'regex': '^.+@.+\\..+$'},
    # Nested dict: every sub-field is required
    'address': {
        'type': 'dict',
        'required': True,
        'schema': {
            'street': {'type': 'string', 'required': True},
            'city': {'type': 'string', 'required': True},
            'state': {'type': 'string', 'required': True},
            # Five-digit ZIP code, optionally followed by "-1234" or " 1234"
            'zipcode': {'type': 'string', 'required': True, 'regex': '^\\d{5}(?:[-\\s]\\d{4})?$'},
        },
    },
    'job': {
        'type': 'dict',
        'required': True,
        'schema': {
            'title': {'type': 'string', 'required': True},
            'salary': {'type': 'float', 'required': True, 'min': 5000},
        },
    },
    # Optional list whose items must all be non-empty strings
    'hobbies': {'type': 'list', 'schema': {'type': 'string', 'empty': False}},
    # Optional list of dicts, each with a required name and relationship
    'family_members': {
        'type': 'list',
        'schema': {
            'type': 'dict',
            'schema': {
                'name': {'type': 'string', 'required': True, 'empty': False},
                'relationship': {'type': 'string', 'required': True, 'empty': False},
            },
        },
    },
}

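This schema describes data with more fields and a deeper structure. It contains:

name, age, and email: the same as in the previous schema.
address: a nested dictionary containing street, city, state, and zipcode. Validation fails if any of these sub-fields is missing or has the wrong type, or if zipcode does not match the ZIP-code pattern.
job: a nested dictionary containing title (a string) and salary (a float of at least 5000).
hobbies: an optional list of strings, in which every element must be a non-empty string.
family_members: an optional list of dictionaries, each of which must contain name and relationship fields (both non-empty strings).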

Now we can build a Validator from this schema and check whether input data meets the requirements:

from cerberus import Validator

v = Validator(person_attributes_schema)

person_attributes = {
        'name': 'John Smith',
        'age': 25,
        'email': 'john@example.com',
        'address': {
                    'street': '123 Main St.',
                    'city': 'Anytown',
                    'state': 'CA',
                    'zipcode': '12345'
                    },
        'job': {
                'title': 'Developer',
                'salary': 8000.00
                },
        'hobbies': ['reading', 'swimming'],
        'family_members': [{'name': 'Mary', 'relationship': 'sister'}, {'name': 'Tom', 'relationship': 'brother-in-law'}]
        }

if v.validate(person_attributes):
    print("Person attributes data is valid")
else:
    print("Person attributes data is invalid")

This code builds a person_attributes dictionary and passes it to the Validator instance. validate returns True if person_attributes satisfies the schema and False otherwise. Here person_attributes meets every requirement, so the code prints "Person attributes data is valid".

Custom Rules

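Cerberus lets you add custom validation logic for checks that the built-in rules do not cover, so you can easily extend the validator to fit special requirements. Here is an example of a custom validation function that checks the format of a phone number: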

import re

def phone_number(field, value, error):
    # "+" and a 1-2 digit country code, an optional separator, then 10-12 digits
    regex = r'^(\+\d{1,2}[\s.-]?)[0-9]{10,12}$'
    if not re.match(regex, value):
        error(field, "Invalid phone number format")

person_schema = {
    'name': {'type': 'string', 'required': True, 'empty': False},
    'age': {'type': 'integer', 'required': True, 'min': 18, 'max': 65},
    'email': {'type': 'string', 'required': True, 'empty': False, 'regex': '^.+@.+\\..+$'},
    # check_with calls phone_number(field, value, error) for this field
    'phone': {'type': 'string', 'required': True, 'empty': False, 'check_with': phone_number}
}

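Here we define a new validation function, phone_number, that uses a regular expression to check whether the input string is a phone number: a "+" with a one- or two-digit country code, an optional space, dot, or hyphen, and then 10 to 12 digits. We attach it to the phone field of person_schema through the check_with rule, so Cerberus calls phone_number automatically on the phone field during validation and records any message passed to error as a validation error. check_with was introduced in Cerberus 1.3; earlier versions call this rule validator, a name that 1.3 still accepts but deprecates.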

Error Handling

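When validating data with Cerberus you may run into several kinds of problems, such as documents that fail validation or schemas that are malformed. Cerberus reports them in different ways: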

Validator.errors

When validation fails, read the Validator.errors property to get the failing fields and their error messages. For example:

if not v.validate(person):
    errors = v.errors
    print(errors)

This code prints the fields that contain errors along with their error messages.

Validator.document_error_tree

The Validator.document_error_tree property provides the errors as a detailed tree that mirrors the structure of the document. For example:

if not v.validate(person_attributes):
    print(v.document_error_tree)

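This code prints the error tree, showing where in the document the validator found problems. In practice you usually navigate the tree by field name to inspect the errors of a specific, possibly nested, field.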

Exceptions

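Rule violations do not raise exceptions: validate simply returns False and records the problems in errors. Cerberus does raise exceptions when it is used incorrectly, however: SchemaError when a schema is invalid, and DocumentError when the document is missing or is not a dictionary. You can catch and handle these: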

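from cerberus import Validator, SchemaError, DocumentError

# 'requried' is a typo for 'required', so this schema is invalid
invalid_schema = {'name': {'type': 'string', 'requried': True}}

try:
    v = Validator(invalid_schema)  # the schema is checked here and raises SchemaError
except SchemaError as e:
    print(e)

v = Validator(person_schema)
try:
    v.validate(['John Smith', 25])  # a list, not a dict, so DocumentError is raised
except DocumentError as e:
    print(e)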

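This example shows two exception-handling cases. In the first try/except block we deliberately create an invalid schema, so initializing the Validator raises SchemaError. In the second, we pass a list instead of a dictionary to validate, which raises DocumentError. A document that is a dictionary but breaks the rules, by contrast, raises nothing; check the return value of validate and inspect errors instead.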