Is Your Data Foundation Solid, Future-Proof, and Value-Added? | Amazon Web Services – AWS Blog

Organizations need a powerful infrastructure to realize the full value of their data. The purpose of this infrastructure is to organize data, ensure its quality, manage metadata, and create a central catalog where the organization’s data can be queried. This infrastructure, called the data foundation, gives organizations clean, organized, and easily accessible data for better decision-making and business insights.
—Clive Robert Humby OBE, Mathematician
Humby brought awareness to big data by declaring it the “new oil.” This metaphor set the stage for data-driven innovation, AI/ML, and generative AI. Many organizations began storing structured and unstructured data at scale—sometimes obsessively. “We might need this someday” was (and still is) an oft-repeated mantra. Organizations created indiscriminate collections of data stored in file systems, databases, data warehouses, and data lakes.
—Emily Gorcenski, Data Scientist
Unfortunately, data stores often mimic flea markets: you can find many treasures there if you know what you are looking for, but you can also spend a lot of money on worthless things. Data collected without a purpose or specific use case is quickly viewed with skepticism by consumers, who perceive it as a second-rate product. The origin is unclear, the quality is uncertain, and the documentation is missing. This problem often arises when the data is managed not by its original producer but by a separate team that lacks sufficient knowledge of the data’s origin, quality, and meaning.
In these cases, the data foundation is not as strong as it should be from technical and organizational perspectives. That is a problem.
It generates a lot of extra work. In my experience (at least at the companies I have worked at), up to 60% of data scientists’ time is spent organizing, cleaning, and reformatting data instead of solving business problems.
Additionally, your stored data may or may not comply with the data protection regulations of your country. Organizations must know these regulations and be able to prove their compliance. As an IT manager, I once received a seven-figure penalty notice from the data protection authorities. It was triggered by an employee report that we were in breach of data protection, which, thank goodness, was not the case. The authority nonetheless imposed the fine because it found that we hadn’t clearly documented why we were storing certain data and for how long. Fortunately, we were able to refute the allegation, but having to deal with it in the first place was a lot of unnecessary and avoidable work.
Data quality is particularly important with generative AI. Used as-is, foundation models produce generic output and fail to create a competitive advantage, because your competitors are likely using the same models and generating the same results. You have to train or customize the models with your own data, but doing this with low-quality data can generate poor results or reinforce existing biases in the model.
These data foundation issues are often underestimated and overlooked by managers for several reasons:
First, most managers and employees lack data literacy. Gartner defines data literacy “as the ability to read, write and communicate data in context, including an understanding of data sources and constructs, analytical methods and techniques applied, and the ability to describe the use case, application, and resulting value.” Poor data literacy is ranked as the second-biggest internal roadblock to the success of the CDO’s office, according to the Gartner Annual Chief Data Officer Survey.
Second, processes are rarely put in place to regularly assess and monitor the probability and impact of the risks that come with storing and using data.
Third, there are rarely data inventory overviews that managers can understand. If a data inventory exists, it is made for data scientists and relies on very specific technical information.
Do you know the state, risk, and value of the data in your company? And if not, who could provide you with an evaluation at the push of a button?
A strong data foundation consists of four dimensions:
Generative AI can make a valuable contribution to a future-proof data foundation. Large Language Models (LLMs) like the Amazon Titan models can assist in profiling your data, extracting and enriching metadata, maintaining your data catalog, and enhancing search with natural language. However, as with all generative AI applications, you still need to critically review the AI’s results and suggestions (e.g., is the generated metadata correct?).
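To make the profiling and metadata steps a little more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the sample table, the column names, and the helpers `infer_type`, `profile_rows`, and `build_metadata_prompt` are all hypothetical and not part of any AWS service or of the approach described above. The sketch computes simple per-column statistics and turns them into a text prompt that could be handed to an LLM of your choice; it deliberately makes no model call, and any description a model suggests would still need human review.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data; in practice this would come from your own data store.
SAMPLE_CSV = """customer_id,country,order_total
1001,DE,19.99
1002,,5.50
1003,DE,
"""


def infer_type(value: str) -> str:
    """Classify a non-empty string as 'int', 'float', or 'text'."""
    try:
        int(value)
        return "int"
    except ValueError:
        pass
    try:
        float(value)
        return "float"
    except ValueError:
        return "text"


def profile_rows(rows):
    """Return row count, missing count, distinct count, and dominant type per column."""
    raw_stats = {}
    for row in rows:
        for column, raw in row.items():
            stats = raw_stats.setdefault(
                column,
                {"rows": 0, "missing": 0, "values": set(), "types": Counter()},
            )
            stats["rows"] += 1
            value = (raw or "").strip()
            if not value:
                stats["missing"] += 1
                continue
            stats["values"].add(value)
            stats["types"][infer_type(value)] += 1
    return {
        column: {
            "rows": s["rows"],
            "missing": s["missing"],
            "distinct": len(s["values"]),
            "dominant_type": s["types"].most_common(1)[0][0] if s["types"] else "unknown",
        }
        for column, s in raw_stats.items()
    }


def build_metadata_prompt(table_name: str, profile: dict) -> str:
    """Format the profile as a plain-text prompt asking for column descriptions."""
    lines = [
        f"Suggest a one-sentence description for each column of table '{table_name}'.",
        "Column statistics:",
    ]
    for column, stats in profile.items():
        lines.append(
            f"- {column}: type={stats['dominant_type']}, "
            f"missing={stats['missing']}/{stats['rows']}, distinct={stats['distinct']}"
        )
    return "\n".join(lines)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
profile = profile_rows(rows)
prompt = build_metadata_prompt("orders", profile)
print(prompt)
```

Even this small profile surfaces the kind of information a data consumer needs, such as how many values are missing in each column. Sending only aggregated statistics to a model, rather than raw records, is one possible design choice when the data itself is sensitive.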
Data and data infrastructures may seem complicated and confusing, but they can be used clearly and securely. Your organization’s data creates many opportunities; you just need to use them.
If you process, store, and refine data properly, you can achieve amazing results that get even better over time. If you don’t handle it carefully, it quickly loses quality and becomes useless.
What are your experiences with data foundations? I would be interested in hearing about some of them.
How to Build Data Capabilities, Ishit Vachhrajani
How to Create a Data-Driven Culture, Ishit Vachhrajani
Unmasking Your Organization’s Data Problem, Joe Chung
Matthias joined the Enterprise Strategist team in early 2023 after a stint as a Principal Advisor in AWS Solutions Architecture. In this role, Matthias works with executive teams on how the cloud can help to increase the speed of innovation, the efficiency of their IT, and the business value their technology generates from a people, process, and technology perspective. Before joining AWS, Matthias was Vice President IT at AutoScout24 and Managing Director at Home Shopping Europe. At both companies, he introduced lean-agile operational models at scale and led successful cloud transformations resulting in shorter delivery times, increased business value, and higher company valuations.
This article was autogenerated from a news feed from CDO TIMES selected high quality news and research sources. There was no editorial review conducted beyond that by CDO TIMES staff.

