Citizen Data Science: Can It Really Work?

If you’re interested in a deeper exploration of your company data, but can’t expand your current data team, you’re probably curious about the concepts of “citizen data scientists,” or “citizen data analysts.” These are people in your company who create data models and use sophisticated analytics, but whose primary roles in the organization are outside the fields of statistics and business intelligence.


Basically, they’re non-technical team members who nonetheless use data science tools to discover business insights, solve problems, and suggest new directions.


Why are companies so interested in citizen data science? Well, as you’ve likely already discovered, trained data professionals are in high demand, but short supply. In fact, according to a job posting analysis by QuantHub, in 2020 there were 250,000 data science positions that went unfilled due to lack of qualified candidates.


On top of that, data talent is expensive. Indeed.com reports that the average data scientist’s salary is now over $120,000, and the average data analyst’s salary is close to $70,000.


If you’ve been struggling to hire a scientist or analyst, you’re probably frustrated that so much of the data your company is generating is just sitting there unused. Wouldn’t it be great if your non-data employees could take on some of the responsibilities normally fulfilled by these highly trained, expensive experts?


Of course, this leaves us with one big question: Is it really possible for Dave in Accounting or Kendra in HR to perform the complex data analysis required for accurate, profit-boosting insights?


That is, if these roles are so specialized and expensive, is it realistic to expect non-experts to do them, and do them well?


You’ll be glad to know that the answer is a resounding “Yes” – with a few important caveats. Obviously, you’ll need a self-service data tool like Power BI or Tableau, and you’ll need your citizen data scientists and analysts well-trained on it.


They’ll also need training on fundamental data science concepts and principles, as well as an understanding and awareness of company goals and challenges. You can’t simply add “data analysis” to someone’s responsibilities, sit them in front of Power BI, and expect them to deliver insight after insight. There is a good amount of upskilling involved, so you want to be sure to choose people who understand this and are eager to learn.


Once you’ve identified a few people who are interested in data science, willing to skill up via a reputable boot camp or online course, and who can master your chosen BI tool, you’re well on your way to capable citizen-led analysis. But there are a few additional, crucial elements required for your citizen data team to truly shine.


Citizen Data Science Requires Easily Accessible, High-Quality Data


Just like a chef needs quality ingredients to make a 5-star dish, your team needs clean, relevant, reliable data to produce game-changing insights and analyses. Unfortunately, this kind of data isn’t often readily accessible, even at Fortune 500 companies with dozens of scientists and analysts on staff.


Much more common is data that’s full of missing values, typos, extra rows, and redundancies. All too often, you don’t know what the acronyms mean, whether the data is up-to-date, what the context is, and what other datasets are related.

This is another reason why your citizen analysts need training – they have to understand how to clean the data to make it useful for their analysis, and the analysis of anyone who comes after them.
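To make "cleaning" a little more concrete, here is a minimal Python sketch of the kind of tidy-up described above: trimming stray whitespace, dropping empty extra rows, removing exact duplicates, and counting missing values. The function names and the sample rows are invented for illustration; a real project would more likely do this inside the BI tool or with a data library.

```python
def clean_rows(rows):
    """Trim whitespace, drop fully empty rows, and remove exact duplicates."""
    cleaned = []
    seen = set()
    for row in rows:
        # Normalize every value: strip whitespace and treat "" as missing.
        normalized = {}
        for key, value in row.items():
            if isinstance(value, str):
                value = value.strip()
                if value == "":
                    value = None
            normalized[key] = value
        # Skip rows in which every field is missing (the "extra rows").
        if all(v is None for v in normalized.values()):
            continue
        # Skip exact duplicates of a row that was already kept.
        fingerprint = tuple(sorted(normalized.items(), key=lambda kv: kv[0]))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(normalized)
    return cleaned


def missing_report(rows):
    """Count missing values per column so gaps are visible before analysis."""
    counts = {}
    for row in rows:
        for key, value in row.items():
            counts[key] = counts.get(key, 0) + (1 if value is None else 0)
    return counts


# Hypothetical sample data, not taken from any real dataset.
raw = [
    {"region": " East ", "revenue": "1200"},
    {"region": "East", "revenue": "1200"},  # duplicate once whitespace is trimmed
    {"region": "", "revenue": ""},          # empty extra row
    {"region": "West", "revenue": ""},      # missing revenue
]
cleaned = clean_rows(raw)
report = missing_report(cleaned)
```

Here `cleaned` keeps one East row and the West row, and `report` shows that one revenue value is still missing and needs a decision from the analyst.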


For example, let’s say that your citizen data analyst has found, cleaned, and prepped several datasets that will be useful for analyses now and into the future (say, for a year-over-year profit analysis). What should she do with those datasets?


In most companies, they stay stored on her computer, never to be seen by others. In fact, it’s quite likely that someone else will pull the exact same datasets from the company lake and clean them all over again, not knowing that the work has already been done.


A much better option is to connect these clean datasets to a data management platform like Datalogz, where they can be searched and indexed by any other citizen users. In addition, users can provide clear contexts and explanations for the data, link it to related datasets, show its history of use, and identify the creator.


This makes it simple for the user to search for the exact data he needs, locate it within the company, instantly understand the context, and immediately get to work. If he has questions on the dataset, he can easily contact its creator and previous users.


As time goes on, and more and more datasets are cleaned, prepped, and connected to Datalogz for use company-wide, your citizen data science and analysis team will naturally grow – anyone with an understanding of analysis and your BI tool can now locate quality data, interpret it, discover insights, and create reports.


Citizen Data Analysts and Scientists Can’t Operate in Silos


Ok, so you’ve got your core group of well-trained citizen data scientists and analysts. They’re cleaning datasets and sharing important information and context about the data in Datalogz. Your quality data resources are ever-expanding, and more and more of your colleagues are able to access this data and perform some straightforward analysis. You’re off to a great start!


Unfortunately, all this progress comes to a crashing halt if your company is siloed, or if your data isn’t documented and shared.


Let’s say that Bob, a citizen data analyst from the Finance team, is curious about the revenue generated from a recent marketing initiative. In most organizations, this would unfold as follows: Bob asks someone with appropriate permissions to poke around the data lake or warehouse in search of the data he needs, which could take anywhere from a day to a week, depending on how often Bob needs to follow up with his colleague.


If the data is available, it probably needs to be cleaned. Bob doesn’t know how to do this, so he’d have to ask someone for help, which again adds to his timeline.


If the data isn’t found in the company lake or warehouse, Bob now needs to connect with someone in the Marketing department to inquire about the data, find out when it will be ready, and get it sent over to him. As we know, chasing this down can take some time.


Once Bob has the data, what does he do if he has a question about it? The cycle of contacting colleagues begins again.


Finally, 2-4 weeks after his initial inquiry, Bob is ready to begin his analysis. He creates a useful report in your company’s BI tool, but guess what?


Bob didn’t know that there were five other datasets he could have used in conjunction, for a much richer and more nuanced analysis. Because these datasets were stored on individual workers’ computers all across the company, he didn’t even know to ask about them in the first place.


This scenario is very common in companies with siloed departments, where tribal knowledge lives inside people’s heads and people’s computers, instead of being documented and shared across the organization.


And again, this is where a data management system comes in. With Datalogz, relevant data access is as simple as a basic search. And even better, users provide metadata for each dataset – including notes about context, past projects, related data, who to contact for questions, helpful tips, and previous questions asked.
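To illustrate the general idea of a searchable catalog with metadata, here is a small, self-contained Python sketch. It is not the Datalogz API or data model; the `DatasetEntry` and `Catalog` classes, the field names, and the sample entries are all hypothetical, and the search is a simple substring match.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    name: str
    owner: str          # who to contact with questions
    description: str    # context for the data
    tags: list = field(default_factory=list)
    related: list = field(default_factory=list)  # names of related datasets


class Catalog:
    """A toy in-memory catalog; a real platform would persist and index this."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[entry.name] = entry

    def search(self, term):
        """Return entries whose name, description, or tags contain the term."""
        term = term.lower()
        hits = []
        for entry in self._entries.values():
            haystack = " ".join([entry.name, entry.description, *entry.tags]).lower()
            if term in haystack:
                hits.append(entry)
        return hits


# Hypothetical entries for Bob's scenario.
catalog = Catalog()
catalog.register(DatasetEntry(
    name="q3_campaign_revenue",
    owner="marketing-analytics@example.com",
    description="Revenue attributed to the Q3 marketing campaign",
    tags=["marketing", "revenue"],
    related=["q3_ad_spend"],
))
catalog.register(DatasetEntry(
    name="q3_ad_spend",
    owner="marketing-ops@example.com",
    description="Ad spend by channel for Q3",
    tags=["marketing", "cost"],
))

hits = catalog.search("revenue")
```

In this sketch, one search for "revenue" gives Bob the dataset, its owner, and a pointer to the related ad spend data he would not otherwise have known to ask about.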


It’s easy to see how this would have made Bob’s life much easier, and his conclusions much more insightful. Even more important, it’s easy to see how a data management platform is crucial to ensuring the success of your expanding citizen data team.


Citizen Data Teams Need Strong Data Governance and Management


Alright, now you’re rocking and rolling with an empowered citizen data team producing the exact insights and analyses you were hoping for. However, there’s one more piece of the puzzle required for well-organized and compliant data use: data governance.


In simple terms, data governance sets the standards and rules that all data must follow in its lifecycle. It’s the set of policies that determine the data’s availability, usability, integrity, and security.


Governance answers questions like:

  • Who can access the data?

  • How is privacy being protected?

  • Are we adhering to evolving regulations?

  • Which data sources are users approved to draw from?


So far, it might seem like governance has nothing to do with your citizen data team; after all, they aren’t the ones who will be creating and maintaining your data governance policies.


However, directly tied to data governance is data management – and that has everything to do with your citizen data team.


Data management is the implementation of data governance. It’s the process through which you manage all aspects of your data. Governance sets the rules; management enacts them.


Informatica shares this helpful analogy:


“Data governance is the blueprint for a building, and data management is the physical construction of the building. Without data management, there is no physical building. And while you could construct a building without a blueprint, it would be less efficient and less effective, with a greater likelihood of problems down the line.”


Data management involves the following elements:

  • Preparing, cleaning, and transforming data to ready it for analysis

  • Managing metadata to make data easier to find and track

  • Archiving and deleting data to adhere to retention and privacy requirements

  • Identifying relationships among datasets and sources

  • Setting permissions for straightforward data retrieval

  • Updating data across the organization

  • Ensuring data is accurate and usable


It’s easy to see how data initiatives suffer without solid data management. Users struggle to access the data they need, they aren’t sure if the data is recent or reliable, they don’t understand how datasets relate to one another, and they may even use a dataset that should have been deleted long ago.
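Two of those problems, stale data and data that should already have been deleted, lend themselves to a simple automated check. The Python sketch below shows one way such a check might look. The field names, the 90-day staleness threshold, and the sample datasets are assumptions made for illustration, not rules from any particular governance policy or product.

```python
from datetime import date, timedelta


def review_datasets(datasets, today, stale_after_days=90):
    """Label each dataset 'expired', 'stale', or 'ok'.

    'expired' means the retention date has passed; 'stale' means the data
    has not been updated within stale_after_days (an assumed threshold).
    """
    status = {}
    for ds in datasets:
        if ds["delete_after"] is not None and today > ds["delete_after"]:
            status[ds["name"]] = "expired"
        elif today - ds["last_updated"] > timedelta(days=stale_after_days):
            status[ds["name"]] = "stale"
        else:
            status[ds["name"]] = "ok"
    return status


# Hypothetical inventory and a fixed review date so the result is reproducible.
today = date(2022, 6, 1)
datasets = [
    {"name": "customer_emails_2019",
     "last_updated": date(2019, 12, 31),
     "delete_after": date(2021, 12, 31)},
    {"name": "q1_sales",
     "last_updated": date(2022, 1, 15),
     "delete_after": None},
    {"name": "may_payroll",
     "last_updated": date(2022, 5, 20),
     "delete_after": date(2029, 5, 20)},
]
status = review_datasets(datasets, today)
```

With these sample values, the 2019 email list is flagged as expired, the Q1 sales data as stale, and the May payroll data as fine to use.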


However, as we have seen, a robust data management platform solves all these problems and more. Imagine if your citizen data team could simply enter a few search terms, instantly retrieve quality data, understand everything about its context, see the other projects it’s been used in, and get straight to work delivering the insights that skyrocket company profit.


Conclusion: Set Your Citizen Data Teams Up for Success

Just five years ago, it was far harder for anyone but a highly qualified expert to process and analyze data. But with the advent of self-service tools, it really is possible for your non-specialist team members to provide the kinds of insights you’ve been after.


As we have seen, the key to success is high-quality data that’s easily found, well-organized, understandable, and shared across departments – all of which is achieved with a data management platform.


It used to be that only the biggest players, like Spotify, Airbnb, and Twitter, could take advantage of all that data management platforms provide, because they built their own internal, proprietary data management tools.


But with Datalogz, you can mimic the data strategies of these big tech companies, all while keeping costs low and leveraging the knowledge and enthusiasm of your existing employees.


Want to learn more? Book a Datalogz demo.

