Week 3: Data Analysis

Data Analysis - MOOC Summaries - University of Illinois (Urbana-Champaign) - Digital Analytics for Marketing Professionals: Marketing Analytics in Practice

“Data Collection Part II: Structured Data…Data Management in Practice, Part 1…Lesson 5: Data Management in Practice, Part 2…”
(Source)

Summaries

  • Lesson 4: Data Collection Part II: Structured Data
  • Lesson 5: Data Management in Practice, Part 1
  • Lesson 5: Data Management in Practice, Part 2

Lesson 4: Data Collection Part II: Structured Data

  • Second is that structured and unstructured data differ in some key ways that are important to understand.
  • The third thing is that clarity and organization are really the hallmarks of structured data.
  • Fourth, bias is something that needs to be avoided when you’re collecting data.
  • The data management tool that you’re going to use to manage the information that you’re pulling down is going to be something that balances your needs for power and simplicity.
  • Linking back again to those business objectives, key questions and things that we’ve already answered, we are now looking at structured data.
  • So the Google Trends data is important for us, and along with the segmentation study that we have talked about and the company intranet site, those are the structured sources in this example that we’ll be interested in getting.
  • So now how does structured data differ from unstructured data? In some very simple but important ways, structured data is data that does include a data model.
  • It is typically very well defined and organized and has an expected format consistent with that data model.
  • Generally, but not always, it’s much easier to collect than unstructured data, which makes it a valuable source, and it can frequently be imported directly into a data management system without any kind of changes or transformation at all.
  • So what does it look like? Here is a beautiful table of structured data.
  • Everything is very neat and you know the width of the data that you’re pulling in.
  • Let’s take a second and do a quick demo on how you access data like this and how you pull structured data down.
  • So we’re going to take a look at how to use a simple online tool to collect some really clean structured data for our analysis.
  • Now I can very easily export this data for analysis and I simply come here.
  • There’s the data for the search: the indexed search volume for ice cream by week. A little further down, as I scroll, I can see the indexed search volume for different regions around the world as well as for cities.
  • This data is very clean and ready for me to conduct some analysis on by getting it into some data management program.
  • So we’ve seen how a tool like Google Trends can be used to collect structured data.
  • There are so many sources of data out there and available to you.
  • You as an analyst just need to figure out which sources of data work for you and how to access them.
  • You want to reduce bias in your data so that you’re not poisoning that data which would then indeed poison the analysis.
  • Questionnaire bias is a case where the questions you’re asking consumers, if you’re using a survey to collect your data, lead them towards a certain response.
  • Interpretation bias is where you are walking into an analysis with some sort of preconceived notion of where it needs to go and then by doing that you’re going to find data that substantiates your points rather than just letting the data tell you what it wants to say.
  • Let the data tell you the story and you’ll be fine.
  • Now as you have your data and you’re starting to move it into a management tool, there are a number of them out there for you.
  • Now there’s nothing wrong with using Excel to manage your data.
  • If you need something a little beefier because your dataset is bigger or the statistical capabilities you need are larger, here are some things you might want to think about.
  • One option is a relational database tool, which lets you link tables together, do some pretty hefty statistical analysis, and hold a fairly large amount of data as well.
  • If you still need something even bigger than that, Stata is a great product, although there is a command line that you’ll need to use that feels a little bit like you’re programming, which might turn some people off.
  • What you need to do here is just choose the one you need. Don’t go for the overkill.
  • What were the five things that we talked about? We saw that linking back to your plan is critical to making your data collection efficient.
  • We saw that structured and unstructured data differ in some very key and important ways.
  • We found that clarity and organization are the hallmarks of structured data, what sets it apart from unstructured data.
  • Finally, data management tools are going to balance power and simplicity, ease of use, and it’s up to you to decide what you need.
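
The point above, that structured data follows a data model with an expected format, can be sketched in a few lines of Python. This is a minimal illustration, not the tool used in the lesson: the column names (week, search_index), the sample values, and the helper load_structured are all invented for the example, and a real export may be laid out differently.

```python
import csv
import io
from datetime import date

# Hypothetical sample shaped like a weekly search-interest export.
# Column names and values are invented for illustration only.
SAMPLE_EXPORT = """week,search_index
2024-06-02,71
2024-06-09,78
2024-06-16,85
"""

# The "data model": each expected column, in order, and how to parse it.
SCHEMA = {"week": date.fromisoformat, "search_index": int}


def load_structured(text, schema):
    """Parse CSV text into typed rows, failing loudly if it breaks the schema."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != list(schema):
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    rows = []
    # Data starts on line 2 because line 1 is the header.
    for line_no, raw in enumerate(reader, start=2):
        try:
            rows.append({name: parse(raw[name]) for name, parse in schema.items()})
        except (TypeError, ValueError) as exc:
            raise ValueError(f"line {line_no}: {exc}") from exc
    return rows


rows = load_structured(SAMPLE_EXPORT, SCHEMA)
peak = max(rows, key=lambda r: r["search_index"])
print(len(rows), peak["week"], peak["search_index"])
```

Because the format is known in advance, the import needs no transformation step; anything that does not match the schema is rejected at load time rather than discovered during analysis.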

Lesson 5: Data Management in Practice, Part 1

  • First, we will see how tidy data is key to ensuring error-free analysis.
  • We will see that consistency and accuracy are hallmarks of tidy data.
  • We will see that there are five characteristics of messy data. It is important for us to know those so that we know what tidy data looks like.
  • We will see that there are four likely times that error occurs, and so those are the four times that we need to be particularly diligent when collecting data.
  • Then finally we will learn a simple philosophy that will help us get through our data quality issues.
  • We are at the analyze phase, and we are at a point now where we’ve got data, we’ve pulled it down, and now we have to make sure that it is ready for analysis.
  • It is nice and tight and confined to that single row of data.
  • So you’ve got some common keys, some column of data that will act as a join between those two different tables.
  • Now, what does that look like if it is not tidy? If it is messy? Well this is a case of messy data here.
  • What has happened here is that, when we imported the minor emotion value “power/control/responsibility,” the program decided that each slash indicated a new column of data and pushed everything out.
  • If we start to tie those different data together, the results are clearly not going to be what we are expecting.
  • Here are the other ways that the data becomes messy.
  • Column headers are values, not variable names, and anyone who has ever sorted a table and had a row of data pop up to the top knows the feeling of having that occur.
  • Here are some guidelines, just some simple ways that you can turn messy data into tidy data.
  • The first is just to keep a very close eye on your data when you are importing it, find errors, and fix them right away.
  • Second, every time you move from the raw data that you’ve collected to some step in the cleaning process, create a new file. Name the first one “original,” the next one “column clean” when you are cleaning the columns, the next one “row clean” when you are cleaning the rows, or simply version 1, 2, 3, whatever you want to do.
  • Understanding where errors commonly come into play or are introduced will help you to be particularly diligent at those times and work to limit mistakes that would result in messy data.
  • When are those times? When do data errors typically come in?
  • Errors occur when data is being directly input into a data set; that is where a simple slip of the finger can change the correct entry into a bad one, creating messy data for you.
  • Data integration errors: when you join different tables together, you can, at times, wind up with a joined table that doesn’t have the characteristics that you were expecting, resulting in messy data.
  • So making sure that those joins are done accurately is a way to ensure clean data.
  • So when you’ve taken data and you’ve introduced metrics or calculations based on that data, if there are any inaccuracies in the formula, it can change the data from good, tidy data to messy data very quickly.
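
Two of the failure points described above, an import that splits a value on a slash and a join that produces an unexpected table, can be checked for mechanically. The sketch below is only an illustration under assumptions: the table contents, the column names, and the helpers naive_import, ragged_rows, and unmatched_keys are invented here and are not part of the lesson.

```python
import csv
import io
import re

# Hypothetical raw export; column names and values are invented for illustration.
RAW = """id,minor_emotion,score
1,joy,4
2,power/control/responsibility,3
"""


def naive_import(text, delimiters):
    """Mimic an importer that treats every character in `delimiters` as a column break."""
    pattern = "[" + re.escape(delimiters) + "]"
    return [re.split(pattern, line) for line in text.strip().splitlines()]


def ragged_rows(rows):
    """Return positions of data rows whose width differs from the header row."""
    width = len(rows[0])
    return [i for i, row in enumerate(rows[1:], start=1) if len(row) != width]


def unmatched_keys(left, right):
    """Keys that appear in only one of two tables; worth checking before a join."""
    return set(left) ^ set(right)


# Import error: the slash is wrongly treated as a delimiter, so row 2 gains columns.
messy = naive_import(RAW, ",/")
# Reading with the comma as the only delimiter keeps the value in one cell.
tidy = list(csv.reader(io.StringIO(RAW)))

# Integration check: compare join keys against a second hypothetical table.
emotion_ids = [row[0] for row in tidy[1:]]
SEGMENTS = {"1": "loyal", "3": "new"}
unmatched = unmatched_keys(emotion_ids, SEGMENTS)

print(ragged_rows(messy), ragged_rows(tidy), sorted(unmatched))
```

A row-width check run right after import catches the pushed-out columns immediately, and a non-empty set of unmatched keys is a signal to inspect both tables before trusting the joined result.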

Lesson 5: Data Management in Practice, Part 2

  • We need to know that we have enough marketing experience and know-how to take a directional data set, assuming that there are maybe some errors in it, and make some decisions.
  • One is to assume that those people who are managing the data have a level of comfort with the data.
  • If data has been handed to you, the best thing you can do is trust that data as you begin to work with it.
  • Second is start making decisions that you’re comfortable with off of that data.
  • So look for simple decisions based on the data that you do feel very comfortable with.
  • Over time, drill into deeper, micro-specific areas to learn more as you get more and more comfortable with the data and trust it more and more.
  • What were the five things that we talked about? We saw that tidy data is key to ensuring error free analysis.
  • Then we saw that there are four times that data errors are likely introduced and it’s important for us to be diligent at that time.
  • Some supplemental reading to help us and to learn a little bit more about the topics that we discussed in this lesson, a great blog post from Avinash on data quality and getting comfortable with the decisions that you’re making there and how you can be proactive in addressing data error issues.
  • Then a great, very in-depth analysis of how you take steps to tidy data that can lead you from messy data through to clean data, tidy data in a pretty efficient way.
