David Benoit: An Object Oriented Approach to Data Driven Software Development

Object-oriented programming methodologies have existed for decades, yet there is still one area of software development where encapsulation has yet to become a solid paradigm. Much of the software that interacts with databases relies on "raw" SQL inline with the rest of the code: sprintf statements into C buffers, buffer overflows, and SQL injection attacks are all too common, even after being exposed as flawed design. What is the alternative? I will present a system for interacting with databases that we have been successfully using for years, a design which brings data encapsulation and data hiding to fruition while maintaining high performance. We use stored procedures at Starscale. Here are the benefits:

Encapsulation
We create interfaces and APIs in the rest of our code, so why wouldn't we do the same for our databases? Nobody should know or care how the data is organized or manipulated. This allows the database experts to modify the underlying data organization without affecting the applications. In fact, when done correctly, the database can be completely refactored without the apps having to be changed or recompiled. Good encapsulation also allows for simplified unit testing. (Sketched below.)

Performance
With most dataservers (including PostgreSQL, MS SQL and DB2), stored procedures give the dataserver a chance to precompile, optimize, and cache all of the SQL statements that will be executed. This saves an enormous amount of time over the classic use of raw SQL directly from clients. The dataserver and/or DBA performs optimization periodically (through the use of index and table statistics) rather than on every execution. Network traffic is also reduced, since the client typically makes one stored procedure call where it would have made multiple SQL calls before. The direct inclusion of frequently-changing data values in SQL is also completely avoided through the use of procedure arguments. Similar benefits can be achieved with prepared statements, but those typically live only as long as the dataserver connection, whereas precompiled stored procedures persist on the dataserver. (Sketched below.)

Parallel Development
During design, the interface between clients and the database can be negotiated and agreed upon. Then everyone can implement their responsibilities in parallel. Even if a small number of developers are doing both application and database work, this division of responsibility allows them to focus on the layer they are working on and not worry about other parts of the application. This makes it far easier to write clean, well-understood, and easily maintainable code, both in the stored procedures and in the application.

Good Use of Expertise
Not all coders are good database designers, and vice versa. Using stored procedures allows people to work in their area of expertise.

Security
You can drop all permissions on your tables and grant execute only on stored procedures to certain users. This makes it impossible for application logins to do anything other than call the particular stored procedures that were designed for them. Also, stored procedures (unless specifically written to do so) do not allow SQL-injection attacks: all arguments are passed in as variables and are not interpreted as SQL commands. (Sketched below.)

Compile Checks
The database (Sybase and SQL Server for sure, likely DB2) will actually compile the SQL in the stored procedures. This moves the bulk of the SQL errors (syntax, bad column names, etc.) from application runtime to database compile time, vastly reducing the application debugging required. This is extremely valuable in an "agile" iterative development environment.

Versioning
A set of stored procedures with a defined interface can be easily versioned in SVN. As long as application developers know which version of the interface they want to code against, they can check out the set of stored procs that implement the interface and do their work.

Testing
We can create extensive test routines for stored procedures, ensuring that all of the necessary conditions are met for all interesting input. Regression testing is also simplified because the interface to a procedure doesn't normally change: update the procedure to fix a bug or a performance issue and re-run the tests. (Sketched below.)

In this presentation I discuss these features of how we interact with databases and how we bring it all together in real life.
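To illustrate the encapsulation point, here is a minimal PostgreSQL sketch using a hypothetical accounts/ledger schema and a hypothetical get_account_balance() function (none of these names come from the talk): the application only ever calls the function, so the underlying tables can be reorganized without touching the clients.

    -- Hypothetical schema; the tables are an internal detail of the database layer.
    CREATE TABLE accounts (
        account_id integer PRIMARY KEY,
        owner_name text    NOT NULL
    );

    CREATE TABLE ledger (
        entry_id   serial  PRIMARY KEY,
        account_id integer NOT NULL REFERENCES accounts,
        amount     numeric NOT NULL
    );

    -- The stored procedure is the interface; how the balance is derived stays hidden.
    CREATE OR REPLACE FUNCTION get_account_balance(p_account_id integer)
    RETURNS numeric
    LANGUAGE sql STABLE
    AS $$
        SELECT COALESCE(SUM(amount), 0)
        FROM ledger
        WHERE account_id = p_account_id;
    $$;

    -- The application sees only this call:
    --   SELECT get_account_balance(42);

If the ledger were later replaced by, say, a materialized balance column, only the function body would change; the applications would not need to be changed or recompiled.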
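For the performance point, a minimal sketch (again with made-up names: place_order(), orders, inventory) of how one stored procedure call can replace several raw SQL round trips per request:

    CREATE TABLE orders (
        order_id   serial  PRIMARY KEY,
        account_id integer NOT NULL,
        item_id    integer NOT NULL,
        quantity   integer NOT NULL
    );

    CREATE TABLE inventory (
        item_id integer PRIMARY KEY,
        on_hand integer NOT NULL
    );

    -- The SQL inside runs on the dataserver, where its plans can be cached.
    CREATE OR REPLACE FUNCTION place_order(p_account_id integer,
                                           p_item_id    integer,
                                           p_quantity   integer)
    RETURNS integer
    LANGUAGE plpgsql
    AS $$
    DECLARE
        v_order_id integer;
    BEGIN
        UPDATE inventory
           SET on_hand = on_hand - p_quantity
         WHERE item_id = p_item_id;

        INSERT INTO orders (account_id, item_id, quantity)
        VALUES (p_account_id, p_item_id, p_quantity)
        RETURNING order_id INTO v_order_id;

        RETURN v_order_id;
    END;
    $$;

    -- One network round trip instead of a separate UPDATE and INSERT from the client,
    -- and the changing data values travel as arguments, never spliced into SQL text:
    --   SELECT place_order(42, 7, 3);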
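The security point can be made concrete with ordinary GRANT/REVOKE statements. This continues the hypothetical get_account_balance() sketch above; the role name app_user is also made up.

    -- The application login gets no direct table access at all.
    CREATE ROLE app_user LOGIN;
    REVOKE ALL ON accounts, ledger FROM PUBLIC, app_user;

    -- PostgreSQL grants EXECUTE on functions to PUBLIC by default,
    -- so revoke that and grant it back only to the intended role.
    REVOKE EXECUTE ON FUNCTION get_account_balance(integer) FROM PUBLIC;
    GRANT  EXECUTE ON FUNCTION get_account_balance(integer) TO app_user;

    -- SECURITY DEFINER lets the function read tables its caller cannot.
    ALTER FUNCTION get_account_balance(integer) SECURITY DEFINER;

With this in place, app_user can do nothing against the database except call the procedures written for it, and its input arrives only as typed arguments rather than SQL text.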
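Finally, a minimal regression-test sketch for the hypothetical get_account_balance() above (the fixture values are invented): because the function's interface does not change, the same script can be re-run after every bug fix or performance rewrite.

    BEGIN;

    -- Test fixture.
    INSERT INTO accounts (account_id, owner_name) VALUES (42, 'test');
    INSERT INTO ledger (account_id, amount) VALUES (42, 10.00), (42, -2.50);

    DO $$
    BEGIN
        IF get_account_balance(42) <> 7.50 THEN
            RAISE EXCEPTION 'get_account_balance(42): expected 7.50, got %',
                get_account_balance(42);
        END IF;
        RAISE NOTICE 'get_account_balance tests passed';
    END;
    $$;

    -- Leave the database untouched after the test run.
    ROLLBACK;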
Length: 40:46
Views: 454 | Likes: 2
Recorded on 2012-09-17 at Postgres Open