Need help solving a code design problem: API validation
I am working on this issue that I can't seem to properly be able to solve. No matter what I do I can't shake the feeling of code smell from it. It involves a chain of validations and business operations that I want to separate. The simple scenario is this: I have a web API that receives a data transfer object and needs to perform some action, but then it gets murkier and murkier.
The first implementation is pretty obvious. I get the DTO, perform checks on it, access the database to get more info, perform more checks, do more work until everything is done. It's easy to do, clear imperative linear programming, it does the job. However I get a bad smell from the code because I have mashed together validation and business logic. I would also like to reuse the validation in other areas of the project.
So the natural improvement is to create two flows instead of one, each with their own concern: a validator will perform checks so I know an operation is valid, then the manager performs the operation. But this soon hits a snag. Let say I want to validate that a record exists from the id that I send as a property in the DTO. If I do this check in the validator, then it accesses the database. It stinks! I should probably get the record with the manager, but then I have to validate that the record was returned and that the data in it is valid in the context of my DTO. What do I do?
OK, so I can split the validation and operational flows in methods that will live in their respective classes. In this scenario a validator checks the record id is not malformed, a manager attempts to get the record, then the validator executes another method to check the record exists and conforms to the expectations, and so on. Problem solved, right? But no. That means I have to pass the record to the validating method, or any number of records that I need to perform the validation. For example I want to check a large chain of records that need to be validated based on conditions in the DTO. I could get the list of all records, pass them to the validator and then check if they exist, but what if the database is very large?
Wait, I can be smart about it. I know that essentially I want to get a list of records then validate something. Can't I just create a specific method in the manager that gets only the records involved in validation? Just do manager.GetRelatedRecords(dto.Id) to get a list of records from the database, then pass them as parameters to the validator methods. Fixed it, right? Nope. What if there is a breaking condition that is related to validation? The original linear method would just go through the steps and exit when finished. It wouldn't get a bunch of unnecessary records then fail at the first later on. And there is even a worse problem: what if while I was busy getting the records and preparing for validation the records have changed? I am basically introducing a short lived cache, with all the problems arising from it, like concurrency and invalidation.
Hey, I can just new TransactionScope the shit out of it, right? Do steps of Validate, GetMoreData and repeat all inside a transaction. Anything fails, just rollback, what is the big deal? Well, concurrent access being slowed or deadlocks would be problematic, but more than that I am terrified of the result. I started with a linear flow in a single method and now I have two classes, each with a lot of methods which get more and more parameters as I move on: first I validate the DTO values, then a record in relation to the DTO, then two records in relation to each other and the DTO and so on. All these methods are being executed inside a transaction using a local ad-hoc memory cache to hold all the data I presume I will need. The manager methods are either using a big list of parameters themselves or repeatedly instantiating the same providers for the data they get. It's a lot more complex than the original, working design. It's also duplicated flow, as it needs to at least partially go through the data to populate what I need, then it goes through the same flow during the validation.
Ha! I know! Instead of passing an increasing number of parameters to validation methods, create a single context object in which I just populate properties as I go. No more mega methods. Anything can be solved by adding another layer of indirection, correct? Well, now I have three classes, not two, with the context class practically constituting a replacement of the local scope in the original method. The validation methods are now completely obscure unless you look inside them: they all receive the DTO and the context class. Who knows which properties in the context are being used and how?
Ok, maybe I rushed into things. Sometimes several layers of indirection are required to solve a single problem. In this case I will create an interface with the properties and methods used by every validation method. So the name validation will look like ValidateName(MyDTO dto, INameContext nameContext), the graph validation will look like ValidateGraph(MyDTO dto, IGraphStructure graph) and my context class will implement INameContext and IGraphStructure, even when the two interfaces have common members. I will do the same with the manager class, which will implement the interfaces required to populate the specific context interfaces.
This isn't one of those "how the junior and the senior write the same code" jokes. I actually don't know what to do. The most "elegant" solution turned a working method from an API class into three classes, transactions, a dozen interfaces and, best of all, still keeping the entire logical flow in the original method. You know how when a code block becomes too complex it needs to be split into several short methods, right? Well I feel like in this case I somehow managed to split each line in the original block into several methods. I can see the Medium article title now: "If your method has one line of code, it's too long!".
Any software developer worth his salt will write unit tests, especially for a code such as this. That means any large validation method will need to be split just in order to not increase unit test size exponentially. Can you imagine writing unit tests for the methods that use the huge blob-like context class?
So help me out here, is there an general, elegant but simple solution to API separation of concerns in relation to validation?
Simply code
The first implementation is pretty obvious. I get the DTO, perform checks on it, access the database to get more info, perform more checks, do more work until everything is done. It's easy to do, clear imperative linear programming, it does the job. However I get a bad smell from the code because I have mashed together validation and business logic. I would also like to reuse the validation in other areas of the project.
Manager and Validator
So the natural improvement is to create two flows instead of one, each with their own concern: a validator will perform checks so I know an operation is valid, then the manager performs the operation. But this soon hits a snag. Let say I want to validate that a record exists from the id that I send as a property in the DTO. If I do this check in the validator, then it accesses the database. It stinks! I should probably get the record with the manager, but then I have to validate that the record was returned and that the data in it is valid in the context of my DTO. What do I do?
Manager and Validator classes
OK, so I can split the validation and operational flows in methods that will live in their respective classes. In this scenario a validator checks the record id is not malformed, a manager attempts to get the record, then the validator executes another method to check the record exists and conforms to the expectations, and so on. Problem solved, right? But no. That means I have to pass the record to the validating method, or any number of records that I need to perform the validation. For example I want to check a large chain of records that need to be validated based on conditions in the DTO. I could get the list of all records, pass them to the validator and then check if they exist, but what if the database is very large?
Manager methods that help validation and Validator
Wait, I can be smart about it. I know that essentially I want to get a list of records then validate something. Can't I just create a specific method in the manager that gets only the records involved in validation? Just do manager.GetRelatedRecords(dto.Id) to get a list of records from the database, then pass them as parameters to the validator methods. Fixed it, right? Nope. What if there is a breaking condition that is related to validation? The original linear method would just go through the steps and exit when finished. It wouldn't get a bunch of unnecessary records then fail at the first later on. And there is even a worse problem: what if while I was busy getting the records and preparing for validation the records have changed? I am basically introducing a short lived cache, with all the problems arising from it, like concurrency and invalidation.
Transactions
Hey, I can just new TransactionScope the shit out of it, right? Do steps of Validate, GetMoreData and repeat all inside a transaction. Anything fails, just rollback, what is the big deal? Well, concurrent access being slowed or deadlocks would be problematic, but more than that I am terrified of the result. I started with a linear flow in a single method and now I have two classes, each with a lot of methods which get more and more parameters as I move on: first I validate the DTO values, then a record in relation to the DTO, then two records in relation to each other and the DTO and so on. All these methods are being executed inside a transaction using a local ad-hoc memory cache to hold all the data I presume I will need. The manager methods are either using a big list of parameters themselves or repeatedly instantiating the same providers for the data they get. It's a lot more complex than the original, working design. It's also duplicated flow, as it needs to at least partially go through the data to populate what I need, then it goes through the same flow during the validation.
Context
Ha! I know! Instead of passing an increasing number of parameters to validation methods, create a single context object in which I just populate properties as I go. No more mega methods. Anything can be solved by adding another layer of indirection, correct? Well, now I have three classes, not two, with the context class practically constituting a replacement of the local scope in the original method. The validation methods are now completely obscure unless you look inside them: they all receive the DTO and the context class. Who knows which properties in the context are being used and how?
Interfaces and Flow
Ok, maybe I rushed into things. Sometimes several layers of indirection are required to solve a single problem. In this case I will create an interface with the properties and methods used by every validation method. So the name validation will look like ValidateName(MyDTO dto, INameContext nameContext), the graph validation will look like ValidateGraph(MyDTO dto, IGraphStructure graph) and my context class will implement INameContext and IGraphStructure, even when the two interfaces have common members. I will do the same with the manager class, which will implement the interfaces required to populate the specific context interfaces.
I am not kidding!
This isn't one of those "how the junior and the senior write the same code" jokes. I actually don't know what to do. The most "elegant" solution turned a working method from an API class into three classes, transactions, a dozen interfaces and, best of all, still keeping the entire logical flow in the original method. You know how when a code block becomes too complex it needs to be split into several short methods, right? Well I feel like in this case I somehow managed to split each line in the original block into several methods. I can see the Medium article title now: "If your method has one line of code, it's too long!".
Unit Tests
Any software developer worth his salt will write unit tests, especially for a code such as this. That means any large validation method will need to be split just in order to not increase unit test size exponentially. Can you imagine writing unit tests for the methods that use the huge blob-like context class?
Cry for help
So help me out here, is there an general, elegant but simple solution to API separation of concerns in relation to validation?
Comments
I was trying to completely separate data retrieval from validation, therefore I didn't like passing repositories to the validator class. In the end, though, I would probably have to pass some sort of interface that is limited to retrieving the necessary data for validation, so it's sort of the same. Thanks for the comment.
SideriteWhat about using a repository pattern? Or is that what you mean by 'Interfaces and Flow'? You could create a repository (with interface) to retreive data from the DB. Then you could pass this repository (interface) into your validator class. Mocking the repository would allow you to unittest the validator class gl, Bart
Bart Bories