Financial Data Modernization: To Be or Not To Be on Public Cloud
Financial data are among the most sensitive and heavily regulated data in most countries. Companies that own, manage, process, and control such data are often reluctant to put them on a public cloud. The concern is legitimate, since the attack surface of a public cloud is much larger than that of any private cloud or data center. But companies that keep financial data inside their private data centers are missing out on monetizing that data, because they are not exploiting the technological advances of public cloud. It is like keeping your money under the mattress because you are afraid to put it in a bank or invest it where it accrues interest or growth; you are afraid of losing it. The risk-benefit history of modern banks and other financial services has shown that your money is safer in a bank, and over a longer period it gains value (unlike the depreciating value of money under the mattress). Data is the same. Data sitting inside a private data center loses value over time, while data that is "invested" on a public cloud accrues value. Born-in-the-cloud fintech (financial technology) companies can connect their services to data on a public cloud much faster and more easily, and can then create new value out of that financial data. We will highlight some key factors that financial companies should consider when building their public cloud strategy, and the imperatives for migrating their financial data to public cloud. The strategy and imperative to move data to public cloud is a subset of a broader data modernization strategy.
Data Models and Data Zones
How do you decide which data should be moved to public cloud? It does not make sense to move outdated data or data that has no value to anyone. Most large financial institutions have petabytes of data stored in many different places and in many different forms. Creating data models and classifying data into different classes is one of the first steps financial institutions should undertake. This is not an easy task. Also, the data cannot be moved in one shot. We should create a spectrum of data zones, where each zone contains a version of the data as it moves toward public cloud. For instance, data stored on legacy storage can be one zone. We cannot move that data immediately to public cloud storage (the target data zone) in one shot; we will have to create intermediate data zones through which the data moves. As data moves from one zone to another, the applications consuming it should not be affected. Keep in mind that new data is constantly being created, and the infrastructure and solution should be designed to handle both the old and the new data. Data that has moved to the cloud should remain linked to data that has not yet moved, and tools (such as DevOps tools) must work across the old and new data zones.
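One way to keep consuming applications insulated while data moves between zones is to resolve every access through a catalog of stable logical identifiers. The sketch below is a minimal illustration of that idea; the zone names, class names, and URIs are hypothetical, not part of any particular product.

```python
from dataclasses import dataclass
from enum import Enum

class Zone(Enum):
    LEGACY = 1   # on-premises legacy storage
    STAGING = 2  # intermediate zone used during migration
    CLOUD = 3    # target public cloud storage

@dataclass
class DataAsset:
    logical_id: str       # stable id that consumers use; never changes
    zone: Zone = Zone.LEGACY
    physical_uri: str = ""

class DataCatalog:
    """Maps stable logical ids to an asset's current zone and location,
    so applications are unaffected as data moves between zones."""

    def __init__(self):
        self._assets = {}

    def register(self, asset: DataAsset):
        self._assets[asset.logical_id] = asset

    def promote(self, logical_id: str, new_zone: Zone, new_uri: str):
        asset = self._assets[logical_id]
        # Only allow forward movement one zone at a time (no one-shot jumps).
        assert new_zone.value == asset.zone.value + 1, "move one zone at a time"
        asset.zone, asset.physical_uri = new_zone, new_uri

    def resolve(self, logical_id: str) -> str:
        # Consumers always resolve through the catalog, never a hard-coded path.
        return self._assets[logical_id].physical_uri
```

An application that resolves `"acct_ledger"` through the catalog keeps working whether the bytes currently live on a legacy NAS, in a staging bucket, or in the target cloud store.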
Data Privacy and Regulations
This is probably the most important factor financial companies worry about when moving sensitive data to public cloud. It is important for companies to fully understand the privacy and regulatory requirements before venturing on the journey. The General Data Protection Regulation (GDPR), the India Data Protection Bill (IDPB), and other regulations will affect the data migration strategy and its imperatives. Banks and financial institutions worry about legal and class actions arising from breaches of sensitive data such as personally identifiable information (PII), social security numbers, and credit card numbers. The post-COVID world has created new ways of working, and countries are creating new regulations to protect national security. The IDPB, for instance, expects financial institutions to process sensitive data within India. Data location is becoming important, and this data localization is antithetical to the philosophy of public cloud. Financial institutions are under pressure to create data zones that are confined to geographic boundaries.
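A data localization requirement like the one above often ends up encoded as a placement policy that is checked before any data lands in a region. The sketch below is illustrative only; the data-class names and region identifiers are hypothetical, and a real policy engine would be far richer.

```python
# Hypothetical data classes mapped to the regions where regulation permits
# them to be stored and processed (e.g., Indian PII confined to India,
# EU transaction data confined to the EU). None means "no restriction".
RESIDENCY_POLICY = {
    "pii_india": {"asia-south1", "asia-south2"},
    "txn_eu": {"europe-west1", "europe-west3"},
    "public_marketing": None,
}

def placement_allowed(data_class: str, region: str) -> bool:
    """Return True if this class of data may be placed in this region."""
    if data_class not in RESIDENCY_POLICY:
        return False  # unknown data classes are denied by default
    allowed = RESIDENCY_POLICY[data_class]
    return True if allowed is None else region in allowed
```

Denying unknown data classes by default matters: a migration pipeline should refuse to move anything that has not yet been classified, rather than assume it is unrestricted.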
Data security is another big concern for financial institutions. The attack surface of a public cloud is much bigger than that of a private data center. Much of data security (and privacy) relies on cryptographic technology and tamper-resistant hardware modules to protect data. When data is inside a private data center, the financial institution has full control over its security, but the cost of managing that security is high. These days public clouds provide much stronger security at a much lower cost. Public clouds offer technologies that allow financial institutions to fully control cryptographic keys within their own private data centers or on isolated machines inside public cloud data centers. The data on the public cloud is then encrypted using private keys (or even key chains). Secure enclaves, hardware security modules, quantum-safe cryptography, and trusted execution environments (TEEs) are available on public cloud and provide advanced security controls. The notions of an institution-controlled network perimeter and demilitarized zones (DMZs) are becoming extinct, and zero trust models are becoming popular even in public cloud environments. Many public clouds offer network-based and identity-based micro-perimeters and fine-grained access control to sensitive data and objects. Implementing rich security controls in a private data center is not cost effective; public cloud offers them at a reduced cost. Many public cloud providers are also expanding their clouds by creating public cloud islands. A public cloud island has all the features and services of a public cloud, but it behaves like a private cloud; it can sit inside an existing public cloud data center or in a separate building. Such private-inside, public-outside (PIPO) clouds are gaining a lot of traction, especially in regulated financial industries. Financial institutions have full control over public cloud islands and can store data there with much better security controls and services.
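The pattern of "institution controls the keys, cloud holds only ciphertext" is commonly realized as envelope encryption: each object is encrypted with a fresh data key, and only a wrapped (encrypted) copy of that data key is stored alongside the ciphertext; the master key never leaves the institution's HSM or isolated machine. The sketch below demonstrates the key flow only. The `_stream_xor` cipher is a deliberately simplified stand-in so the example is self-contained; a real system would use AES-GCM via an HSM or a cloud KMS, never a hand-rolled construction.

```python
import hashlib
import os

def _stream_xor(key: bytes, data: bytes) -> bytes:
    # Placeholder keystream cipher, for illustration only. It is symmetric
    # (applying it twice with the same key recovers the input), which is all
    # this sketch needs to show the envelope pattern.
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def envelope_encrypt(master_key: bytes, plaintext: bytes):
    """Encrypt plaintext with a fresh data key; wrap the data key with the
    institution-held master key. Only (wrapped_key, ciphertext) go to the cloud."""
    data_key = os.urandom(32)
    ciphertext = _stream_xor(data_key, plaintext)
    wrapped_key = _stream_xor(master_key, data_key)  # master key stays on-prem
    return wrapped_key, ciphertext

def envelope_decrypt(master_key: bytes, wrapped_key: bytes, ciphertext: bytes):
    data_key = _stream_xor(master_key, wrapped_key)  # unwrap needs the master key
    return _stream_xor(data_key, ciphertext)
```

The point of the pattern is that a leak of cloud-side storage yields only ciphertext plus wrapped keys; without the master key held in the institution's own HSM, neither can be opened.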
Data governance is about managing the lifecycle of data through well-defined policies and procedures. The lifecycle of a piece of data typically consists of creation, storage, usage, sharing, archiving, and destruction. Financial institutions have documented and managed this lifecycle within their data centers for many years. When data moves to public cloud, they must implement the same or equivalent policies and procedures for managing its lifecycle there. For instance, consider PII data stored in a secure data store inside a data center with no external connections. When that PII data moves to public cloud, it becomes potentially reachable from anywhere in the world. Even when the PII data is encrypted, the risk of a breach is higher than when it sits in a physically isolated and secure data store inside a private data center. Imagine that the encrypted data is leaked from a public cloud (say, due to some vulnerability); given the advancement of quantum computing, an attacker may eventually be able to decrypt it. When financial institutions move data to public cloud, they should fully understand this risk and ensure that the public cloud can indeed conform to the data governance the institution has defined. Consider another example: destroying data. For reasons of availability and performance, a public cloud will replicate and cache data across different servers and across geographic boundaries. If for a regulatory reason (e.g., GDPR) the data must be destroyed, that is a bigger challenge than when the data is in a private data center.
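The lifecycle described above (creation, storage, usage, sharing, archiving, destruction) can be enforced as a small state machine that rejects transitions the policy does not allow and keeps an audit trail. This is a minimal sketch under assumed policy rules; the state names and allowed transitions are illustrative, not a standard.

```python
# Hypothetical governance policy: which lifecycle transitions are permitted.
ALLOWED = {
    "created":   {"stored"},
    "stored":    {"in_use", "archived"},
    "in_use":    {"shared", "archived"},
    "shared":    {"in_use", "archived"},
    "archived":  {"destroyed"},   # destruction only from the archived state
    "destroyed": set(),           # terminal state
}

class GovernedData:
    """Tracks one data item's lifecycle state and records every transition
    so the history can be produced for an audit or regulator."""

    def __init__(self, data_id: str):
        self.data_id = data_id
        self.state = "created"
        self.audit_log = [("created", None)]  # (new_state, previous_state)

    def transition(self, new_state: str):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"{self.data_id}: {self.state} -> {new_state} violates policy")
        self.audit_log.append((new_state, self.state))
        self.state = new_state
```

Forcing destruction to pass through `archived` first is one (assumed) way to model the replica problem: the archiving step is where an operator would confirm that replicas and caches have been accounted for before erasure is declared complete.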
Data observability is another area where public cloud has advanced much further than what can be built inside a private data center. Data observability is a set of tools and methods that provide insight into the health of curated data. But not all data is (immediately) moved to public cloud, so when designing for data observability it is important to have a hybrid model where observability works across public and private cloud. Keep in mind that a piece of data does not exist in isolation; it almost always exists in relation to other data and context. When a change is made to a piece of data X, that change will impact other data related to X, and the relationship need not be direct. For instance, when the price of Intel stock changes, it can impact housing prices in Egypt. Such deep relations are typically captured through advanced data analysis such as deep machine learning, independent component analysis, and copula analysis. When designing data observability, it is important to consider such deep relations and how change propagates across observable data. In essence, as in classical observability, data observability is about observing changes to a piece of data X when changes are made to some other data Y that has a deeper relation to X. Data controllability, similarly, is about controlling changes to a piece of data X by making changes to another piece of data Y that has a deeper relation to X. Most financial institutions have developed deep relationships among various pieces of data to gain more value from them. Public cloud is well suited for developing artificial intelligence models that capture such relations: financial institutions can spin up thousands of compute engines within a few hours to run deep data simulations that extract relationships among data. Provisioning such compute power in a private data center can take months and is a lot more expensive.
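Once relations between data items have been discovered (by whatever analysis), propagating a change is a graph traversal: everything transitively downstream of X is in the impact set. The sketch below shows that traversal; the node names are hypothetical stand-ins for the Intel-stock-to-Egypt-housing example above.

```python
from collections import defaultdict, deque

class DataRelationGraph:
    """Directed graph of 'changes to source impact target' relations."""

    def __init__(self):
        self.downstream = defaultdict(set)

    def relate(self, source: str, impacted: str):
        self.downstream[source].add(impacted)

    def impact_of(self, changed: str) -> set:
        # Breadth-first traversal: collect everything transitively impacted,
        # including indirect relations several hops away.
        seen, queue = set(), deque([changed])
        while queue:
            node = queue.popleft()
            for nxt in self.downstream[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen
```

For example, relating `"INTC_price" -> "chip_supply_index" -> "egypt_housing_index"` makes the housing index appear in the impact set of an Intel price change even though the relation is indirect, which is exactly the non-direct propagation the paragraph describes.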
There are many aspects of data quality; data integrity, consistency, validity, and availability are some of the key ones. Public cloud offers many tools, services, and methods for curating data to improve its quality. When data is stored on a public cloud, financial institutions can use crowdsourcing to curate and improve it. These days financial institutions also pull social and other public data to improve customer services. For instance, they analyze the sentiments of their customers from Twitter and other social media data. By combining social media data with financial data, financial institutions can serve their customers better and even personalize their financial products. This is possible when they use the tools and services available in public cloud.
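Quality dimensions such as completeness and validity are typically enforced as rule-based checks run over records as they are curated. A minimal sketch follows; the field names and the specific rules are hypothetical examples, not a real institution's schema.

```python
import re

def check_record(record: dict) -> list:
    """Return a list of data-quality violations for one customer record.
    Field names ('customer_id', 'balance', 'email') are illustrative."""
    issues = []

    # Completeness: every record must identify its customer.
    if not record.get("customer_id"):
        issues.append("completeness: missing customer_id")

    # Validity: a savings account cannot carry a negative balance.
    if (record.get("balance") is not None
            and record["balance"] < 0
            and record.get("account_type") == "savings"):
        issues.append("validity: savings balance cannot be negative")

    # Validity: email, if present, must be well-formed.
    if record.get("email") and not re.match(
            r"[^@\s]+@[^@\s]+\.[^@\s]+$", record["email"]):
        issues.append("validity: malformed email")

    return issues
```

Running such checks at the zone boundary, before data is promoted toward the cloud, keeps low-quality records from propagating into the curated data the observability tooling reports on.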
It is also important to understand which hyperscaler landing cloud you want your data to move to. This again takes a deep understanding of the capabilities of the different hyperscaler cloud providers. For financial institutions there is no single solution; sometimes you will have to land your data in more than one cloud and keep a private cloud landing as well. Such hybrid solutions are needed by most large enterprise customers; anyone claiming they can handle all of your data needs will leave you with big surprises on your public cloud data journey.
Final Thoughts: A Step Towards Data Modernization
Moving data to public cloud is a step towards data modernization. We don't move data to a public cloud because it is cool or because other companies are doing it; we carefully evaluate what should or shouldn't move based on many factors. Data in a public cloud is like money in a bank: with proper planning your data "can work" for you and bring real business value.
There are many things that should be accomplished before moving data to public cloud. At a macro level we should focus on three things: (1) identify the data, and their sources, that could be moved to public cloud, (2) perform a cost-risk-benefit analysis of the data, and (3) create a road map and an agile plan for moving the data. The cost-risk-benefit analysis is extremely critical and has many parts. The cost analysis should include people, processes, tools, infrastructure, third-party service providers, etc. The risk analysis likewise has many parts, such as regulatory risk, operational risk, project risk, and security risk. Finally, the benefit should outweigh the cost and the risk.
Cost-risk-benefit analysis is not new; we decompose the problem into manageable pieces and then figure out how they perform together. Here the problem is identifying which data should be moved to public cloud. The decomposition consists of identifying existing data and classifying it into categories along different dimensions. The dimensions should include the sensitivity of the data, the currency of the data, the usefulness of the data, etc. For each class of data, perform the cost-risk-benefit analysis. Keep in mind that there are complex dependencies among the different classes, and a piece of data can fall under more than one dimension.
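At its simplest, the per-class analysis reduces to a score in which benefit must outweigh cost and (weighted) risk, with only positively scoring classes becoming migration candidates. The sketch below shows that shape; the scoring formula, the risk weight, and the class names are illustrative assumptions, and a real analysis would involve many more dimensions than three numbers.

```python
def score_data_class(cost: float, risk: float, benefit: float,
                     risk_weight: float = 2.0) -> float:
    """Net score for one class of data. Risk is weighted more heavily than
    cost here -- an assumed choice reflecting regulatory exposure."""
    return benefit - cost - risk_weight * risk

def migration_candidates(classes: dict) -> list:
    """classes maps a class name to (cost, risk, benefit) on a common scale.
    Returns names whose benefit outweighs cost and risk, best first."""
    return sorted(
        (name for name, (c, r, b) in classes.items()
         if score_data_class(c, r, b) > 0),
        key=lambda name: -score_data_class(*classes[name]),
    )
```

For instance, a stale archive (high cost, low benefit) and raw PII (high regulatory risk) would both score negative and stay put for now, while a well-understood analytics feed with clear benefit would qualify.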
Please reach out to me if you want to learn more about how Kyndryl can help with your legacy data journey to public cloud. We are a flat, fast and focused company!
Many thanks to Dana Isaacs for reviewing the blog and providing valuable feedback!