Jeff Jonas is Chief Scientist of IBM Entity Analytics and was on IT Conversations to discuss a whole bunch of kinda cool stuff relating to data, its relationships and its quality. If you provide data for another group / organization to consume, or are a consumer of data yourself, then it's got a whole bunch of useful tidbits in it. But even if you aren't, there are some valuable bits. Here are the ones that caught my attention.

  • What if it's not a smart question today, but it's a smart question on Thursday?
  • What are the chances you can ask every smart question every day?
  • Data finds the data, and the relevance finds the user
  • There are a lot of systems that don’t have to be real-time
  • Data…
    • Either doesn’t fit with anything
    • Can be grouped with similar pieces
    • Fits with another piece
  • New observations need to change your assumptions
    • New information proves
    • New information disproves
  • Can you extract key features from your incoming observations?
  • How much do you trust your sources of information? And do those levels change?
  • Truth is in the eye of the beholder
  • Bad data is good
    • Misspelled name, address
    • Natural variability lets you understand the central pattern
    • You need to spend less time on the quality of data when you are stitching sources together. You need to care about the level of trust of the source when the consumer gets to it
  • The truth changes
  • If you make decisions based on training data, then your decisions will be wrong as soon as reality happens until you retrain with new data
  • You need speed in information so that companies can quickly learn about their past
  • There is a limit to how much information you can extract from a single transaction
  • Can you evolve your schema without ‘landing the plane’?
  • Three principles
    • If you treat data as a question, you will never know if it matters until someone asks
    • If you treat queries as data, you don’t have to ask every question every day
    • It is computationally more efficient to make sense of information on streams as it is happening than to try to boil the ocean and do big batches later
  • Organizations will not get smarter until they look at information in context
  • Can you collaborate with yourself?
  • If you don’t have source attribution, it is impossible to deploy systems consistent with the UN Declaration of Human Rights
  • Starting point: what are the laws of identity?
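
To make the "treat queries as data" idea a bit more concrete, here is a minimal sketch of how I picture it. This is my own illustration, not anything shown in the interview: the `ContextStore` class, its method names and the feature strings are all made up. Questions are stored in the same index as observations, so a question that has no answer today gets answered automatically when the matching data shows up on Thursday.

```python
from collections import defaultdict


class ContextStore:
    """Hypothetical store that indexes observations and questions together."""

    def __init__(self):
        self.observations = defaultdict(list)  # feature -> records seen so far
        self.queries = defaultdict(list)       # feature -> users who asked about it

    def ask(self, user, feature):
        """Remember the question and answer it with whatever is known right now."""
        self.queries[feature].append(user)
        return list(self.observations[feature])

    def observe(self, record, features):
        """Store a new observation as it arrives and return (user, record)
        alerts for every earlier question it happens to answer."""
        alerts = []
        for feature in features:
            self.observations[feature].append(record)
            for user in self.queries[feature]:
                alerts.append((user, record))
        return alerts


store = ContextStore()
# Monday: the question has no answer yet, but it is kept as data.
print(store.ask("analyst", "phone:555-0100"))
# Thursday: a new record arrives and finds the waiting question.
print(store.observe("record-1", ["phone:555-0100", "name:smith"]))
```

The matching happens per observation as it streams in, which is the point of the third principle: nobody has to re-run every question against the whole pile of data every day.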