By Oleg Sadykhov, X by 2
Toronto, ON (Jan. 25, 2017) – Large information technology programs are often multi-year initiatives with large teams – more than 20 to 30 people, often much larger – and budgets in eight figures. Examples include an implementation of a core system such as policy administration, claims or billing, or a large data-analytics project. The problem with large initiatives is that delivering them in a reasonable amount of time requires large teams, and large teams present large challenges such as:
- Communication: Smaller groups can communicate more efficiently and directly, but large teams require more communication and documentation that increase project overhead.
- Inefficient use of resources: Team members either wait for one another to accomplish tasks they depend on or inadvertently break each other’s functionality, which introduces rework.
- Uneven skills: More-capable people can take on larger, more complex tasks, but it’s very difficult to assemble a team of all-stars.
- Ramp-up and knowledge transfer: Building the initial team, addressing turnover and other team configuration changes create the need for new people to quickly get up to speed.
On large projects all of this becomes even more challenging due to the size and complexity of the solutions. Because issues with large teams are well known, many modern software development methodologies (especially agile) stipulate that teams should be small. For example, in Scrum, an agile methodology, teams are typically cross-functional groups of seven or so. In an ideal world, large initiatives would be broken down to create a set of small teams that work almost independently and efficiently, and ultimately assemble their work products into the overall solution. Of course, almost everybody tries to break large teams into sub-teams (also called work streams, tracks and the like), but it is very rarely done well, in a way that truly addresses challenges such as communication and inefficient use of resources.
This is where architecture comes in. That does not mean the selection of Java versus .NET, or what integration technology to employ, or what types of servers and networks to use, though these topics are all important. Instead, it’s about architecture as decomposition: how big systems must be broken down into smaller pieces that will lend themselves well to independent work by teams. These pieces must be autonomous and loosely coupled. The task is not easy, but it’s important, since without a good breakdown of systems into parts (variously called subsystems, modules, services or bounded contexts; subsystems for purposes of this article) there will be no good breakdown of large teams into efficient and independent sub-teams. It’s natural for sub-teams to form around subsystems. The stakes are even higher than just the team breakdown. If the system is not well-architected, the result will be a complex and inflexible solution. Cost overruns and major failures are inevitable.
Every project team that has dealt with big system implementations has hit the “wall.” Suddenly, development tasks that used to take a week start to take a month and it’s not clear why. This is a direct result of increasing complexity and insufficient architecture. If the parts of the overall solution overlap, then the teams responsible for these components will end up performing similar tasks. This leads to duplication of effort, inconsistent solutions, and similar defects that show up in various parts of the system and prove elusive and hard to eliminate. Project teams will underperform if the architecture is poor. Inefficient communication paths between teams will start dragging the initiative down. Teams will be mired in coordination and dependency nightmares and spend their valuable time adjusting to the work performed by others.
Creating Good Architecture
Let’s take a simplified example to illustrate the challenges. Assume that the project is the implementation of a policy administration system where insurance policies are stored and maintained. Further assume that customer information is also stored and updated by the same system. In order to break our system down into subsystems, a logical place to start might be to separate the functionality related to policies from the functionality related to customers. In this way we’ve created a policy subsystem that will deal with policy information, and a customer subsystem that will deal with customers. This is an example of a good architecture that has subsystems that are autonomous and loosely coupled. This creates the ability to make and release changes to these subsystems independently of one another. In this example, policy and customer subsystems should now be able to grow and evolve independently of each other.
A typical system consists of user interface components, business logic components and data. When breaking the system down into subsystems, each of these layers must be considered. To achieve the independence of the subsystems, all of the layers must be dealt with: UI, business logic and data. Separation of the business components is probably the easiest task (though still not easy) and as such it’s the only thing that is typically attempted. Take for example the many implementations of Service Oriented Architecture over the past several years, which often rest on the incorrect belief that good architecture results simply from exposing business components as Web services, and that doing so implies loose coupling and autonomy.
But if both the customer and the policy subsystems continue to share the same underlying database, then any work done on the customer subsystem can inadvertently break the policy subsystem. In this configuration, when one subsystem is released, the others must be tested as well. And when the database is down, both subsystems are down. So where is the autonomy? The same can be said of the UI. It’s natural for the same screen to present information about both policies and customers, but the way such screens are often built further entangles the two subsystems. The challenge of architecture is to keep the criteria of autonomy and loose coupling of subsystems very clearly in mind, and to tackle the difficult issues such as data separation and the disentanglement of user interfaces in order to meet them.
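The ownership boundary described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a prescribed design: the class names, method names and the CustomerSummary shape are all hypothetical, and in-memory dictionaries stand in for each subsystem's private database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerSummary:
    """Hypothetical public shape: the only customer data other subsystems see."""
    customer_id: str
    display_name: str


class CustomerSubsystem:
    """Owns customer data; its storage layout is private to this class."""

    def __init__(self) -> None:
        self._rows: dict = {}  # stands in for a private customer database

    def register(self, customer_id: str, first: str, last: str) -> None:
        self._rows[customer_id] = {"first": first, "last": last}

    def summary(self, customer_id: str) -> CustomerSummary:
        row = self._rows[customer_id]
        return CustomerSummary(customer_id, f"{row['first']} {row['last']}")


class PolicySubsystem:
    """Owns policy data; refers to customers only by ID and public interface."""

    def __init__(self, customers: CustomerSubsystem) -> None:
        self._customers = customers
        self._policies: dict = {}  # policy_id -> customer_id, private store

    def issue(self, policy_id: str, customer_id: str) -> None:
        self._policies[policy_id] = customer_id

    def describe(self, policy_id: str) -> str:
        owner = self._customers.summary(self._policies[policy_id])
        return f"{policy_id} held by {owner.display_name}"
```

Because the policy side never touches the customer rows directly, the customer team could change how names are stored without breaking policy code, as long as summary() keeps its shape. Note that describe() still depends on the customer subsystem being available at read time, which is the trade-off the data replication discussion addresses.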
Data Replication
So what’s the problem with breaking the data down and letting the subsystems own their data in order to avoid one large database that supports it all? The problem is that the policy subsystem legitimately needs customer information, and the customer subsystem may also need to know information about the policies the customer owns. Clearly, the data needs to be shared, and there are a couple of choices for accomplishing that:
- Store the data in one place and provide access to it. In this scenario the customer data is truly owned and only accessible by the customer subsystem. To share data, the customer subsystem can either expose the data via Web services, or provide UI screens or widgets for others to use in order to access the data.
- Allow the partial replication of the data. In this scenario, the replication of some of the customer data elements gets created so the policy subsystem can store it in its own database.
The idea of storing data in one place initially looks advantageous, but after a deeper look, it presents several challenges. In many data intensive situations, centralized data access presents a performance challenge. Many off-the-shelf packages expect the data they need to be stored locally in the application database (this is called a replication scenario). Some of the popular enterprise subsystems will be difficult to scale, since everyone will need the same data. They also will be hard to change without affecting everyone. Additionally, downtime of one subsystem will lead to the downtime of others that depend on it. The bottom line is that data replication is an important tool to achieve the goals of autonomy and loose coupling of subsystems.
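To make the partial replication option concrete, here is a minimal Python sketch. It assumes a simple in-process callback in place of real integration technology, and every name in it (REPLICATED_FIELDS, on_customer_changed, the available flag) is hypothetical and chosen only for illustration.

```python
# Hypothetical subset of customer fields that the policy subsystem needs.
REPLICATED_FIELDS = ("customer_id", "name")


class CustomerSubsystem:
    """Owner of the full customer record."""

    def __init__(self) -> None:
        self._customers: dict = {}
        self._subscribers: list = []  # callbacks that receive a partial copy
        self.available = True         # flag used here to simulate downtime

    def subscribe(self, callback) -> None:
        self._subscribers.append(callback)

    def save(self, record: dict) -> None:
        self._customers[record["customer_id"]] = dict(record)
        # Send only the agreed subset, never the full record.
        partial = {field: record[field] for field in REPLICATED_FIELDS}
        for notify in self._subscribers:
            notify(partial)

    def get(self, customer_id: str) -> dict:
        if not self.available:
            raise RuntimeError("customer subsystem is down")
        return dict(self._customers[customer_id])


class PolicySubsystem:
    """Keeps its own replica of the customer fields it depends on."""

    def __init__(self) -> None:
        self._customer_replica: dict = {}  # lives in the policy database

    def on_customer_changed(self, partial: dict) -> None:
        self._customer_replica[partial["customer_id"]] = partial

    def policy_holder_name(self, customer_id: str) -> str:
        # Served from the local replica; no call to the customer subsystem.
        return self._customer_replica[customer_id]["name"]
```

In this sketch the policy subsystem can still answer policy_holder_name() while the customer subsystem is unavailable, at the cost of holding a copy that must be kept current through change notifications.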
There is no hard-and-fast rule to say when data should be centralized and when replication makes sense. To decide, think about the amount of data in play, how much data processing is done by the subsystem that owns the data, how impactful the downtime of one subsystem will be on another, and so on. Of course, the moment one starts down the path of data replication is the moment one needs a good framework in order to implement data ownership and the corresponding rules required for the subsystems. One such framework is master data management, which is a tried-and-true way to track owners of data and the rules that should be followed.
CQRS Aids Architecture
Another relatively recent framework that helps one think about separation between subsystems, data ownership, and much more is called Command Query Responsibility Segregation (CQRS). The idea is fairly simple: It advocates separation of data that supports modifications of the system from data that supports inquiries. According to CQRS, systems should have two parts – one for data modifications and the other for data inquiries – each with its own database. Even without fully following CQRS, the concepts it introduces are helpful when working on and discussing architecture. They include:
- Commands: Requests to modify data (for example, create a new customer). Their names should be in the present tense and sound like commands. Their processing involves data validation, execution of business rules and changing the data.
- Events: Notifications about data changes (for example, new customer created). Names of events should use the past tense to indicate that they’re just a notification of a change that’s already occurred. Events notify interested parties that a data change has been approved and recorded. They’re typically used to trigger other processing or to update decentralized data replicas.
- Queries: Read-only requests to access specific data record details.
Here’s how the CQRS approach can be applied to our example of policy and customer subsystems. Policy and customer subsystems will “own” their data and be responsible for processing commands to change their data. Upon successful changes of data, they will publish events notifying the other subsystems. To serve inquiries, a new subsystem is created – the operational data store (ODS) – that will combine data from the policy and customer subsystems in a way that is optimized for inquiries. The ODS will change its data in response to events published by the policy and customer subsystems. Therefore, in a solution that involves interactions with the policy and customer subsystems, the ODS will be used to fulfill inquiries, and when data changes are needed, commands will be sent to the respective subsystems (policy and customer) to process them appropriately.
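The flow just described can be sketched in Python as follows. This is a simplified illustration, not a reference CQRS implementation: it assumes a synchronous in-process event bus where a real system would likely use messaging infrastructure, and all class, command and event names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CreateCustomer:   # command: named like an instruction
    customer_id: str
    name: str


@dataclass(frozen=True)
class CustomerCreated:  # event: named in the past tense
    customer_id: str
    name: str


@dataclass(frozen=True)
class IssuePolicy:      # command
    policy_id: str
    customer_id: str


@dataclass(frozen=True)
class PolicyIssued:     # event
    policy_id: str
    customer_id: str


class EventBus:
    """Assumed stand-in for real messaging: delivers events synchronously."""

    def __init__(self) -> None:
        self._handlers: list = []

    def subscribe(self, handler) -> None:
        self._handlers.append(handler)

    def publish(self, event) -> None:
        for handler in self._handlers:
            handler(event)


class CustomerSubsystem:
    """Owns customer data and processes customer commands."""

    def __init__(self, bus: EventBus) -> None:
        self._bus = bus
        self._customers: dict = {}

    def handle(self, command: CreateCustomer) -> None:
        if not command.name.strip():  # validation before any change
            raise ValueError("customer name is required")
        self._customers[command.customer_id] = command.name
        self._bus.publish(CustomerCreated(command.customer_id, command.name))


class PolicySubsystem:
    """Owns policy data and processes policy commands."""

    def __init__(self, bus: EventBus) -> None:
        self._bus = bus
        self._policies: dict = {}

    def handle(self, command: IssuePolicy) -> None:
        if command.policy_id in self._policies:
            raise ValueError("policy already exists")
        self._policies[command.policy_id] = command.customer_id
        self._bus.publish(PolicyIssued(command.policy_id, command.customer_id))


class OperationalDataStore:
    """Read side: combines customer and policy data for inquiries."""

    def __init__(self, bus: EventBus) -> None:
        self._names: dict = {}
        self._policies_by_customer: dict = {}
        bus.subscribe(self.apply)

    def apply(self, event) -> None:
        if isinstance(event, CustomerCreated):
            self._names[event.customer_id] = event.name
        elif isinstance(event, PolicyIssued):
            self._policies_by_customer.setdefault(
                event.customer_id, []).append(event.policy_id)

    def customer_overview(self, customer_id: str) -> dict:  # query
        return {
            "name": self._names[customer_id],
            "policies": list(self._policies_by_customer.get(customer_id, [])),
        }
```

A caller would send CreateCustomer and IssuePolicy commands to the owning subsystems and read the combined view from customer_overview() on the ODS. The ODS never validates or modifies source data itself; it only reacts to events that have already been recorded by their owners.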
In summary, software architects should consider newer approaches such as CQRS when working through the challenges of the decomposition of their systems. The goal is to achieve autonomy and loose coupling of any subsystems. If successful, it can pave the way for small, independent, and efficient teams that have a much better chance to deliver large and complex enterprise initiatives successfully.
About the Author
Oleg Sadykhov is a principal at X by 2, a technology consultancy focused on the practice of architecture in the insurance and healthcare industries. For over 25 years, Sadykhov has built a proven reputation for delivering practical, scalable, and extensible architectures and solutions, and for successful architecture refactoring on large-scale enterprise initiatives. He holds a Master of Science in Computer Science from the Moscow Steel and Alloys University.
About X by 2
Founded in 1998 and with offices in the US and Canada, X by 2 is a technology consultancy focused on the practice of application and data architecture in the insurance industry. Whether Property and Casualty, Life, or Health, X by 2’s Architects and Program Leaders understand the insurance business and have proven experience planning and delivering core insurance systems, strategic business applications, and enterprise integrations. For more information, please visit xby2.com and follow us on LinkedIn or Twitter.
Source: X by 2