How to Do Data Like a Big Tech Company, Instantly and Without the Cost

Updated: Apr 9

Enable Easy Data Discovery, Ensure Data Quality, and Reduce Data Redundancy


In today’s fast-changing, highly competitive business environment, there’s no longer any room for relying on hunches and gut feelings. Sure, those are great places to start, but every forward-thinking leader knows that intuition must be backed up by data. Why pull the trigger on any business action at all without first confirming its validity with the numbers – especially when you already have the data needed to make the smartest decision?


If you’ve spent any time looking through your data or speaking with your analysts, you already know the answer – it isn’t always easy to access data, understand it in context, manage the increasing data complexity that comes with company growth, ensure accuracy, bring together siloed tools and departments, navigate permissions levels…the list goes on. For all the promise and power of data for businesses of any size, the fact remains that actually leveraging this power isn’t easy.


You’re likely familiar with the 80/20 rule of data science: 80% of an analyst’s time is used searching for, evaluating, cleaning, and preparing the data, while only 20% of their time is spent on actual analysis.


On top of that, think about the amount of data you have today versus even just five or ten years ago. It’s challenging for analysts to find what they need without spending hours digging and talking to subject matter experts.


To truly maximize your data – to use it quickly and effectively to make the best possible decisions, to scale it with continued growth, to keep it unsiloed, and to stay ten steps ahead of the competition – you must adopt the same data strategies used by big tech companies like Uber, Spotify, Twitter, and Airbnb. You may not be quite that big (yet), but you can still apply their streamlined approach to your own data initiatives.


So, what is it that they do so well, and how can you do it too?


Simple – with well-organized, well-managed data.


Fast, Easy Data Discovery


Suppose you assign your data team what sounds like a relatively simple task – say, determining year-over-year change in product sales. The first step is to identify the relevant datasets. Typically, this involves wading through a data lake (which is really just an unorganized dumping ground for data), sorting through vast amounts of data in a warehouse, contacting colleagues in different departments to ask for specific data, waiting days or weeks for it, ensuring data governance, wrangling and preparing the data, and finally performing some hasty analysis to meet a fast-approaching deadline.


Not only is this inefficient and frustrating for the data team, but it also makes mistakes much more likely, keeps you in an eternal state of catch-up instead of leading the pack, and costs you between $10 million and $14 million each year, according to Gartner.


For most big tech companies, data discovery is more like a Google search. Analysts simply type in the keywords of the data they’re looking for, and instantly receive high-quality, relevant data resources across the entire company ecosystem, ready for immediate use in analysis. They don’t have to wonder if the data is outdated, and they don’t have to spend hours or days searching it out and guessing the meaning behind it. They simply get straight to work uncovering the insights that keep driving the company forward.
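As a rough illustration of what "data discovery as a Google search" means in practice, a catalog-style search can be as simple as keyword matching over dataset metadata. (The names and matching logic below are a hypothetical sketch, not any vendor's actual API.)

```python
from dataclasses import dataclass, field

@dataclass
class Dataset:
    """Hypothetical catalog entry: the metadata an analyst searches over."""
    name: str
    description: str
    tags: list = field(default_factory=list)

def search(catalog, query):
    """Return datasets whose metadata contains every keyword in the query."""
    keywords = query.lower().split()
    def matches(ds):
        haystack = " ".join([ds.name, ds.description, *ds.tags]).lower()
        return all(kw in haystack for kw in keywords)
    return [ds for ds in catalog if matches(ds)]

catalog = [
    Dataset("product_sales", "Monthly product sales by region", ["sales", "revenue"]),
    Dataset("vessel_logs", "Daily fishing vessel activity", ["fleet"]),
]

print([ds.name for ds in search(catalog, "product sales")])  # ['product_sales']
```

Real catalog platforms layer ranking, permissions, and lineage on top of this idea, but the core experience is the same: one query across everything, instead of emails across departments.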


Most businesses do not have the time or expertise to build the internal, from-scratch platforms that enable this instant analysis. But with Datalogz, you can easily mimic their strategies – without the time and cost investment.


When the Cape Cod Fishermen’s Alliance approached us, we could immediately see that they struggled with all of the data-sourcing constraints mentioned above. This disorganization led to hours spent on data discovery instead of quickly and accurately providing insights to local wildlife registries and fishermen.


As you’re likely doing too, the Alliance was working with data from a variety of sources. For them, this included vessel data, fish migration patterns, captains’ reports, gear management, and much more. Data was constantly being lost between work groups and research teams, and there were no ways for the team to quickly discover the data needed for ad-hoc projects and in-depth research.


We were able to solve their problem by helping researchers organize the data in one central source so they could quickly find what they were looking for. We also managed the business context of each dataset so that researchers with different backgrounds and goals could immediately see if the data was what they needed, instead of having to ask around. Finally, we connected all documentation directly to the data itself, eliminating the need to dig around in email threads and Slack channels for tribal knowledge.


As a result of these changes, Alliance analysts and researchers save a total of 45 hours each week. Further, they ensure they’re using the most relevant, accurate data from a single source of truth.


Ensure Data Quality, Understanding, and Documentation


Let’s say that, like the big tech companies, you had a resource that allowed your data users to instantly retrieve the data they need. That still leaves analysts with a second, equally time-consuming issue – fully understanding that data.


We like to say that data insights live with the people who use the data. This means that the only people in your organization with a comprehensive understanding of your data are your superusers and your legacy employees. So even the best and brightest new analysts on your team still have to ask colleagues what a certain set of fields in a table is referring to, or whether the data is up-to-date, or how to put it in a workable format, or how to make sense of department-specific acronyms.


In fact, we recall one organization whose analyst used the acronym “NA” for “North America.” For years, the entire company thought that referred to a null value in their sales data!


Your superusers, then (the ones who provide the most business value for your company), spend half their time answering questions and helping out their colleagues. And while teamwork is certainly admirable, you want your people ultimately focused on the work that brings the most profit. The problem is only multiplied when you consider new hires, who must go through an expensive and time-consuming onboarding journey to understand your data processes before they even start working.


And even after all the questions are sorted out, we’re still missing an important element – context. Where does the data come from? When was it last updated? Who created it? How should it be used? How does it relate to other datasets? Which queries will yield the most insightful results?


Again, the big tech companies provide immediate answers to all of these questions the moment a dataset is returned by a user’s search query. Think of it like a book cover: All the important information you need to know about a book’s content, audience, and usefulness is found right there on the front and back covers. You have everything you need to quickly see whether this book is what you’re looking for, and whether it has the information you’re after.


In contrast, most companies’ data “books” have no cover at all, with pages that are torn out and spread across the organization, hiding in the desks of multiple users. Analysts stumbling across a random page have little idea how it relates to the whole, where it came from, whether it’s from the chapter they need, and where to find the next page.


This is the situation Texas A&M University was in when they organized the world’s largest data-driven hackathon. Students and researchers had 48 hours to solve various enterprises’ biggest data challenges. For example, American Airlines asked students to predict flight delays, and Dell shared challenges with computation data. In short, these companies (and many others) provided tons of data, asking students to quickly make sense of it and deliver insights.


But where to start? What could be done in only 48 hours with troves of disorganized, siloed data?


As we did for the Fishermen’s Alliance, we centralized all data into an easily searchable platform, provided clear context, and gave reliable answers to the researchers’ data questions. This allowed users to cut their research time in half and get straight to the work that mattered – delivering insights to some of the world’s largest organizations.


This is why it’s so important to ensure data quality and understanding. It allows your team to immediately “read the book” and apply the insights instead of wandering the halls asking busy coworkers to provide missing pages and define their meaning. Equally important, data documentation means that superuser knowledge is retained, even if employees move on to another company.


Eliminate Data Redundancy and Foster Easy Collaboration


If you have a large data team spread across multiple departments – or expect to in the near future – you’re probably already familiar with another serious problem inherent in big teams: data redundancy.


Let’s say that someone in Sales is working on a data project about customer turnover, and another analyst in Marketing is working on a project about purchase history of your company’s most popular products.


There’s going to be a lot of overlap in those datasets – customer names, items purchased, reviews written, date of last purchase, and returns, for example. What happens in most companies is that each analyst goes through the tedious processes of finding data, asking questions, and creating their own tables to perform the analyses.


How much easier, faster, and more efficient would it be if each of those analysts – if all the analysts and data users across your company – could search for what they need and instantly find it, instead of creating yet more data tables that sit on a single computer, driving up storage costs and never being used again?


Further, what if your team pulled up that dataset and could instantly see which analyses it’s been used for in the past, along with helpful user notes that eliminate the need for asking around?


These are some of the questions Spotify was asking when they significantly ramped up their data efforts a few years ago. With an explosion of data creation and rapidly expanding analyst teams, the company quickly discovered that data was not only difficult to find, it also lacked ownership, documentation, and context. This led to many of the problems you’re likely currently experiencing, including redundant table creation and lots of questions to colleagues. Spotify also found that the insights they did produce would often remain unseen by hundreds of other team members, who could also have used them for informed decision-making.


Spotify solved their problem by creating their own data management platform to enable efficient knowledge exchange and collaboration. Every piece of data retrieved now comes with supplemental information such as the name of the creator, a clear description of the dataset, usage stats, data lifecycle information, an overview of the most-used schema fields, and more.
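To make the idea concrete, here is a sketch of what such a "book cover" record for a dataset might look like. (The field names and values below are illustrative assumptions, not Spotify's actual schema or tooling.)

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class DatasetCard:
    """Illustrative metadata that travels with a dataset in a catalog."""
    name: str
    owner: str           # who created and maintains the dataset
    description: str     # what the data means, in business terms
    last_updated: date   # freshness at a glance
    usage_count: int     # how often it has been queried
    top_fields: list     # most-used schema fields

# Hypothetical example entry an analyst would see in search results:
card = DatasetCard(
    name="customer_orders",
    owner="analytics@example.com",
    description="One row per order; includes returns and review flags.",
    last_updated=date(2022, 4, 1),
    usage_count=312,
    top_fields=["customer_id", "order_date", "total"],
)

# Relevance and freshness can be judged without asking colleagues:
print(f"{card.name} (owner: {card.owner}, last updated {card.last_updated})")
```

The value is less in the structure itself than in the discipline of keeping these fields attached to every dataset, so context never has to be reconstructed from memory or chat threads.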


Data organized this way exposes your team to more data than just their own, allowing everyone to see what other departments are doing. This in turn reduces data discovery time, empowers people to uncover correlations and produce insights without being directly assigned, and naturally un-siloes your most valuable and useful data. Best of all, you don’t need to build anything yourself; Datalogz has got you covered.


Conclusion


For all the fanfare about democratization of data, the reality for most organizations is that even trained analysts and scientists struggle to access and leverage the data needed to achieve revenue and growth goals.


It doesn’t have to be this way for you. While your competitors are stumbling around in the forest in the dark of night, flashlight batteries dying and hope receding, you can be the chopper pilot with the high-powered spotlight and 10,000-foot-view.


If you’re interested in learning how we can make your data an easily discoverable, accurate single source of truth (and lock in early-bird pricing while you’re at it), sign up for our enterprise beta today. We look forward to helping you leave your competition in the dust!