January 9, 2020
A few blog posts ago I wrote about the new BI for digital companies, and in that post I alluded to the fact that quite a bit of that BI is based on log data. I wanted to follow up on the topic of logs: why they exist and why they contain so much data that is relevant to BI. As I said in that post, logs are an artifact of software development; they are not premeditated, but are generated by developers almost exclusively for the purpose of debugging pre-production code. So how is it that logs are so valuable for BI? Let's spend a little time examining how developers choose what to put in them and why, and then let's take a look at a few logs of our own.
The nature of what developers are writing has a lot to do with what's in the logs. Developers building customer-facing, revenue-generating applications have to debug them before they are ready for prime time. Many of these applications are built using modern architectural paradigms such as microservices, which are typically owned by a small team, relatively atomic in terms of their purpose, and organized around business capabilities.
As developers write their microservices to achieve that business purpose, they have to test and debug them to ensure the right outcomes. If a developer is, say, writing a credit validation service for a retail application, successful debugging requires that their logs contain the bits of data needed to validate that only creditworthy users are allowed to continue. Hence, the log line is likely to contain user identity, address, transaction amount, time of transaction, transaction ID, credit score, and likely a lot more related to the technical execution of the code.
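To make this concrete, here is a rough sketch in Python of what such a debug statement might look like. Everything here is mine, for illustration only: the field names, the scoring stub, and the approval threshold are assumptions, not any particular company's implementation.

import json
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("credit_validation")

def score_credit(user_id: str) -> int:
    # Stand-in for a real credit bureau lookup; hypothetical for this sketch.
    return 680

def validate_credit(user_id: str, country: str, transaction_id: str, amount: float) -> bool:
    score = score_credit(user_id)
    approved = score >= 620  # threshold chosen for illustration only
    # The debug line carries the business data a developer needs to verify
    # the outcome: who the user is, how much they spent, when, their score,
    # and whether the check passed.
    logger.info("creditCheck %s", json.dumps({
        "userId": user_id,
        "country": country,
        "transactionId": transaction_id,
        "amount": amount,
        "timestamp": int(time.time() * 1000),
        "creditScore": score,
        "approved": approved,
    }))
    return approved

validate_credit("u-123", "US", "txn-9", 42.50)

Notice that every field exists to debug the service, yet each one is also a business fact.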
With this data, a developer can tell how their code is working for their test subjects in pre-production. The same data can also be leveraged in production to reveal quite a bit about the business outcomes of the application. Looked at properly, it can tell an analyst what mix of customers the application receives, where those customers are coming from, how much revenue the app generates per unit of time, at which times of day and on which days, the average size of a transaction, and so on.
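Here is a minimal sketch of that analyst's view, again my own illustration: it assumes the log lines from the previous sketch have been collected into a file called credit_validation.log, and it derives revenue per hour and average transaction size from nothing but debug output.

import json
from collections import defaultdict
from datetime import datetime, timezone

revenue_by_hour = defaultdict(float)
amounts = []

# Assumes one "creditCheck {json}" line per event, as in the sketch above.
with open("credit_validation.log") as f:
    for line in f:
        if "creditCheck " not in line:
            continue
        event = json.loads(line.split("creditCheck ", 1)[1])
        if not event["approved"]:
            continue
        hour = datetime.fromtimestamp(
            event["timestamp"] / 1000, tz=timezone.utc
        ).strftime("%Y-%m-%d %H:00")
        revenue_by_hour[hour] += event["amount"]
        amounts.append(event["amount"])

print("revenue per hour:", dict(revenue_by_hour))
if amounts:
    print("average transaction size:", sum(amounts) / len(amounts))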
But let me get practical and dissect a log line from our own logs, somewhat redacted for privacy and brevity. Below is one that I particularly like and that yields insights for our product team.
2019-12-29 00:57:20,913 -0800 INFO [hostId=▧▧▧▧▧] [module=STREAM] [localUserName=▧▧▧▧▧] [logger=▧▧▧▧▧.internals.▧▧▧▧▧$] [auth=User:░░░░░░░░░░:▒▒▒▒▒▒▒▒▒▒:███████████:false:5:UNKNOWN] [sessionId=▧▧▧▧▧] [callerModule=autoview] [remotemodule=stream] explainJsonPlan.ETT {"version" : 2.0, "customerId" : "▧▧▧▧▧", "sessionId" : "▧▧▧▧▧", "buildEngineDt" : 108, "parseQueryDt" : 18, "executionDt" : 193, "ett" : 122, "isInteractiveQuery" : false, "exitCode" : 0, "statusMessage" : "Finished successfully", "isAggregateQuery" : true, "isStreamScaledAggregate" : true, "isStreamScaled" : true, "callerSessionId" : "▧▧▧▧▧", "savedSearchIdOpt" : "None", "autoviewId" : "▧▧▧▧▧", "numViews" : 14, "paramCt" : 0, "rangeDt" : 59999, "processDt" : 59999, "kattaDt" : 1417, "mixDt" : 5, "indexRetrievalCriticalDt" : 8, "kattaCountDt" : 0, "kattaSearchDt" : 0, "kattaDetailsDt" : 0, "streamSortDt" : 0, "kattaNonAggregateQueryDt" : 0, "kattaNonAggregateSortDt" : 0, "kattaQueryDt" : 1417, "kattaTimingInfo" : {"totalTime" : 1502, "fetchTime" : 0, "countComputationTime" : 0, "hitsComputationTime" : 0, "docsRetrievalTime" : 146, ……., "searcherCreationTime" : 1, "prepQueueTime" : 0, "ioQueueTime" : 3, ……..}}}, "inputMessageCt" : 36676, "messageCt" : 15, "rawCt" : 55, "queryCt" : 15, "hitCt" : 36676, "cacheCt" : -1, "throttledSleepTime" : 0, "activeSignatureCt" : -1, "totalSignaturesCt" : -1, "initializationDt" : 0, "lookUpInitializationDt" : 1, …………..."indexBatchCt" : 1, "kattaClientException" : [], "streamProcessingDt" : 0, "operatorTime" : 0, "operatorRowCount" : [{"name" : "expression", "rowCount" : 25, "time" : 0}, {"name" : "saveView", "rowCount" : 25, "time" : 1}], "pauseDt" : 0, "gcTime" : 0, ………."numberRegexTimeout" : 0, "bloomfilterTimeTaken" : 3, "viewExtractionTimeTaken" : 11, "kattaSearcherTranslateDt" : 0, "executionStartTime" : 1577609838975, "executionEndTime" : 1577609840908, ………."scanAndRetrieveData" : {"scannedMessages" : 33004, "scannedBytes" : 21222615, "retrievedMessages" : 15, "retrievedBytes" : 9645}, "isCompareQuery" : false, "numOfShiftedQueries" : 0, "maxShiftInMilliseconds" : 0, "isBatchlessExecution" : false, "viewCountByType" : {"partitionCt" : 13, …….., "unknownCt" : 0}, "childQueriesSessionIds" : [], "analyticsTiersQueried" : ["Enhanced"]}
This line is generated by the developers of our search engine. It tells them who ran which search, what type of search it was, how long the internals of the engine took to get things done, and so on, all of which is very useful when developers are working to build a reliable and fast search engine. But once this log line made it to production, many other teams latched onto it. Product managers measure adoption by the type of search being run in order to determine where to focus new development efforts. The customer success team tracks customer health scores by watching search performance and the number of unique users running searches. The sales team monitors adoption during the early days of the customer lifecycle and during proofs of concept. I will expand on further internal and customer examples in future blogs on this topic.
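As a hedged sketch of how one of those teams might consume such a line: the field names below (isInteractiveQuery, executionDt) are taken from the example above, but the parsing approach, the file name, and the breakdown chosen are my assumptions, not our internal tooling.

import json
from collections import Counter

search_types = Counter()
execution_times = []

MARKER = "explainJsonPlan.ETT "

# Assumes unredacted lines, so the payload after the marker is valid JSON.
with open("search_engine.log") as f:
    for line in f:
        idx = line.find(MARKER)
        if idx == -1:
            continue
        payload = json.loads(line[idx + len(MARKER):])
        # Count searches by type and track engine latency, roughly as a
        # product manager or customer success team might.
        kind = "interactive" if payload.get("isInteractiveQuery") else "background"
        search_types[kind] += 1
        execution_times.append(payload.get("executionDt", 0))

print(search_types)
if execution_times:
    print("median executionDt:", sorted(execution_times)[len(execution_times) // 2])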
A unique benefit of this type of BI, built on top of debug data, is that in the modern world of agile development the data changes as quickly as new code gets pushed into production. Because this BI does not follow the rigid ETL data-warehousing model, new intelligence can be gleaned from the new bits of data developers add as they extend and debug new capabilities of the application. On the other hand, a unique challenge is that systems used to extract this type of business intelligence from debug data must cope with that rate of change by enabling analysis of highly unstructured and often unknown bits of data. Done right, business analysis of debug data allows business intelligence to evolve into continuous intelligence, leveraging new business signals to guide real-time decisions at the rate of change of digital business.
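One way to picture coping with that rate of change is schema-on-read: parse each event leniently and let fields you have never seen before flow through instead of failing on them. A minimal sketch of the idea, my illustration only; the suffix convention mirrors the Dt/Ct naming visible in the log line above.

import json

def extract_metrics(raw_json, numeric_suffixes=("Dt", "Ct", "Time")):
    # Pull out any numeric field whose name ends in a known suffix, so a
    # field a developer adds tomorrow is picked up with no schema change.
    event = json.loads(raw_json)
    return {
        key: value
        for key, value in event.items()
        if isinstance(value, (int, float)) and key.endswith(numeric_suffixes)
    }

sample = '{"executionDt": 193, "hitCt": 36676, "brandNewFieldDt": 7, "statusMessage": "Finished successfully"}'
print(extract_metrics(sample))
# -> {'executionDt': 193, 'hitCt': 36676, 'brandNewFieldDt': 7}

With that kind of tolerance in place, the analysis keeps pace with the code instead of waiting on a data warehouse schema to catch up.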