SQLogs

OpenMarket – June 3, 2021

by Parker DeWilde

 

I love logs!

  • Easy to add
  • Logger configurable to give you useful information for free: time, server, class, request id, etc.
  • Simple, easy to understand
  • Mainly for the dev team's eyes, so changes are low impact
  • Fantastic tool for debugging, troubleshooting, and general visibility

 

Even better than logs, there are tools around logs that are awesome! My experience is with ELK (Elasticsearch, Logstash, Kibana), but there are other amazing tools such as Splunk, Graylog, and many more. Just NOT CloudWatch.

ELK is pretty much magic in my book. It reads huge volumes of logs, and then allows almost instant searching of them, returning vast amounts of results in a split second. It can break your logs up into multiple fields and filter on them. Its full-text search works pretty darn well (as long as you are searching for entire tokens), and the UI in Kibana is very intuitive. And I don't even use JSON logs, which can make it more powerful still!

Beyond the magic logging tools, there are still many questions that they are not great at answering:

  • Find all duplicate log lines
    • If there are duplicate log lines, find out which request IDs are duplicate
    • Find out how many distinct log lines were duplicated, and how many times each was duplicated, and return the 10 most duplicated along with count
  • Find all request IDs that went through step A but not step B
  • Find all requests that took more than 50ms*
  • Find the first and last log lines related to every user

*This may be possible with JSON logging; I only use plain-text logging, and there I don't think it is possible

Pretty much anything that needs to look at multiple log lines to find an answer is hard, as are things like range queries when all you have is text indexing.

If only there was a tool that was good at taking structured data, and doing things like joining multiple records, range queries, and flexibly grouping data…

 

SQL! I love it almost as much as logs! 47 years later it’s still an amazingly accessible and versatile way to query structured data. There may be more sophisticated data models and query languages which all have their own pros and cons, but none as widely used and universally understood as SQL.

So if I love SQL, and I love logs, why can't I SQL my logs? Silly question, of course you can SQL your logs! You could probably bug ops to find some specific log management solution that supports SQL queries, or export JSON logs to some DB that supports SQL syntax, but if you are like me, the need comes up so rarely that it might be faster and easier to just hack something up on your dev box.

The first thing you will need to do is somehow convert your logs into a relational form. If you are using JSON, it's probably pretty easy: import the JSON, pick the fields you want, and away you go. If you are like me and use plaintext logs, you need to do a little more manual work, but conversion to relational form is generally pretty straightforward.
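
For instance, with a hypothetical JSON-formatted line (the field names here are made up for illustration), the whole "parser" is a couple of lines:

import json

# hypothetical JSON log line -- adjust the field names to match your logger
raw = ('{"timestamp": "2021-05-28T09:11:11,501", "level": "INFO", '
       '"requestId": "req-123", "message": "example log message"}')

record = json.loads(raw)
row = {field: record[field] for field in ("timestamp", "requestId", "message")}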

My favorite way to do this is with Python. When I'm feeling really fancy, I have Python grab the logs directly from Elasticsearch using the Python client. Other times, it may be easier to simply read log files directly.
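
When I do go straight to Elasticsearch, it looks roughly like the sketch below. This is only an outline: it assumes the elasticsearch Python package, a cluster on localhost, and a Logstash-style index pattern with a "message" field, so adjust all of those to your setup.

from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan

es = Elasticsearch("http://localhost:9200")

# grab every matching log message; scan() handles the scroll pagination for us
query = {"query": {"match_phrase": {"message": "some phrase you care about"}}}
lines = [hit["_source"]["message"]
         for hit in scan(es, index="logstash-*", query=query)]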

For the purpose of an example, let's assume there have been complaints from customers about their login sessions being reset, requiring them to log in multiple times. You are tasked with figuring out what is going on, as well as getting a good grasp on the scope of the issue.

The first place to start is to isolate the users with the issue. If their sessions are being reset, we would expect to see the same user logging in multiple times within a small window. Unfortunately there is no single log line that can help us out here, but our friend SQL can come to the rescue!

Let’s say we have a log file that looks something like this:

2021-05-28T09:11:11,501 INFO pool-1-thread-6 IdentityServiceClient 000-X6HAJ-U32KP-6PD6C-4HIUJ-LGN - User banner with account id 94404 successfully logged in
2021-05-28T09:13:03,469 INFO pool-0-thread-9 IdentityServiceClient 000-R992R-4KV1C-BEE3Y-FIGRD-GST - User spock with account id 16059 earned a gold star!
2021-05-28T09:14:15,727 INFO pool-2-thread-15 SandPounder 000-DZSXL-MT6K8-Y574Z-HLYOO-SPD - Pounded 5 pounds of sand.

Each log line has the same elements:

  1. Timestamp
  2. Log Level
  3. Thread
  4. Class/Function
  5. Request Id
  6. Payload/message

 

You could just split the log lines into these parts and call it a day, but that would make it hard to query on things like which user logged in. I usually like to pre-process my logs to keep only the lines I care about. This could be done in Python, but I find it easier to just use grep to pull the log lines I want.

grep "IdentityServiceClient" example.log | grep "successfully logged in" > example_filtered.log

Now that I only have the lines I care about, and I know they will all be the same format, I need to parse them.

from datetime import datetime, timezone

# open our file and iterate through the lines
line_dicts = []
with open("example_filtered.log", 'r') as log_file:
    for line in log_file:
        parts = line.split(' ')  # space delimited for first fields

        time_string = parts[0]
        log_level = parts[1]  # we already filtered so same for every line
        thread = parts[2]  # not relevant to our investigation
        function = parts[3]  # we already filtered so same for every line
        request_id = parts[4]

        # useful to have numeric form of time for doing math in sql
        epoch_millis = int(
            datetime.strptime(time_string, "%Y-%m-%dT%H:%M:%S,%f")
            .replace(tzinfo=timezone.utc)
            .timestamp() * 1000)

        # notice " - " before every payload. Split on first instance since
        # the string will not appear before the payload, but may appear inside
        payload = line.split(' - ', 1)[1]

        # system does not allow spaces in username/id,
        # so let's just tokenize the same way
        payload_parts = payload.split(' ')
        username = payload_parts[1]
        user_id = payload_parts[5]

        # let's keep the parts we care about in a dictionary
        line_dict = {
            "timestamp": time_string, "epochMillis": epoch_millis,
            "requestId": request_id, "username": username, "userId": user_id
        }

        line_dicts.append(line_dict)
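
As a quick sanity check, I like to print the first parsed entry before moving on. Of the sample lines above, only the banner line survives the grep, so (assuming UTC timestamps, as the parser does) it comes out looking like this:

print(line_dicts[0])
# {'timestamp': '2021-05-28T09:11:11,501', 'epochMillis': 1622193071501,
#  'requestId': '000-X6HAJ-U32KP-6PD6C-4HIUJ-LGN', 'username': 'banner',
#  'userId': '94404'}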

Now we need to set up the SQL database. I usually just use SQLite, as it's super easy to set up and powerful enough for most things I do with logs. Plus you can do it all from inside Python!

# continuing on from above file
import sqlite3
import os

DBNAME = 'techblog.db'

# Remove old DB if exists
try:
    os.remove(DBNAME)
except OSError:
    pass

# connect to the DB
with sqlite3.connect(DBNAME) as conn:
    # Get a cursor, which is required to use the connection
    c = conn.cursor()

    # setup the schema
    c.execute('''
    CREATE TABLE logins (
        timestamp TEXT,
        epochMillis INTEGER,
        requestId TEXT,
        username TEXT,
        userId TEXT)
    ''')

    # add the data - takes a string with ?'s and an iterable to fill in the ?'s
    for cur_line in line_dicts:
        c.execute('''
        INSERT INTO logins VALUES(?, ?, ?, ?, ?)
        ''', (cur_line['timestamp'], cur_line['epochMillis'],
              cur_line['requestId'], cur_line['username'], cur_line['userId']))

    # run the optimize command to ensure efficient queries in the future
    # this is sqlite specific -- for some reason it doesn't keep statistics
    # for the optimizer unless you tell it to
    c.execute('PRAGMA analysis_limit=400')
    c.execute('PRAGMA optimize')

    # commit the inserts (the with-block would also commit on success)
    conn.commit()

# the with-block commits/rolls back the transaction but does NOT close the
# connection, so close it explicitly now that we're done
conn.close()
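
Side note: the row-at-a-time loop is perfectly fine for a one-off investigation, but if the log set is big, sqlite3's executemany can do the same inserts in a single call. Something like this would replace the for loop inside the with-block above:

    # bulk version of the insert loop above -- same rows, one call
    c.executemany(
        'INSERT INTO logins VALUES(?, ?, ?, ?, ?)',
        [(d['timestamp'], d['epochMillis'], d['requestId'], d['username'], d['userId'])
         for d in line_dicts])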

Now we have our SQL database and are ready to do our investigation! I usually use a tool called DBeaver to look at my databases, though any sqlite-compatible client will work. You can also query the DB directly from Python (though I would only do that if automating something, as it is less ergonomic than a purpose-made database client).
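
To show what the Python route looks like, here is a rough sketch (not in the gist) that pulls each user's first and last login, a cousin of the "first and last log lines for every user" question from earlier:

# rough sketch of querying the same DB from Python -- handy when automating
import sqlite3

with sqlite3.connect('techblog.db') as conn:
    rows = conn.execute('''
        SELECT username, MIN(timestamp), MAX(timestamp)
        FROM logins
        GROUP BY username
    ''').fetchall()

for username, first_seen, last_seen in rows:
    print(username, first_seen, last_seen)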

Connecting to SQLite in DBeaver is easy. New Database Connection -> SQLite -> Next -> Path set to the file you made (in my case techblog.db) -> Finish

You should be able to see the data in your database. I will show one example of how I might use this data, but as long as you know SQL, the world is your oyster!

Recall we were having issues with users having their sessions expire and needing to log in again. We wanted to see whether this was affecting multiple users, and to get a sense of the scope of the issue. I wrote a little SQL script:

SELECT a.username, COUNT(*) AS "number of repeated logins within a minute"
FROM logins a, logins b -- self join (cross product, filtered by the WHERE clause)
WHERE a.requestId != b.requestId -- don't want to match self
AND a.userId = b.userId -- same user logging in
AND a.epochMillis < b.epochMillis -- ensure a is always before b
AND (b.epochMillis - a.epochMillis) < 60000 -- only look at logins less than a minute apart
GROUP BY a.username -- get stats by user
ORDER BY COUNT(*) DESC -- highest counts first

Running this against the example data gives each affected user along with a count of their repeated logins. Given that we only have 7 users in total, this seems like a pretty big issue, yikes! Better start looking for a solution! Maybe those request IDs in our database could help us out…

(Where does the 7 come from? One more quick query:)

SELECT COUNT(DISTINCT username) AS "total users in system"
FROM logins
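
Those request IDs are sitting in the same table, so one way to start chasing the root cause is to tweak the self-join so it hands back the exact pairs of logins, ready to paste into Kibana:

SELECT a.username, a.requestId AS first_login, b.requestId AS repeated_login,
       (b.epochMillis - a.epochMillis) AS millis_between
FROM logins a, logins b
WHERE a.requestId != b.requestId
AND a.userId = b.userId
AND a.epochMillis < b.epochMillis
AND (b.epochMillis - a.epochMillis) < 60000
ORDER BY a.username, a.epochMillis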

I hope you can keep this as another tool in your toolbox. It does take a little time to set up, but sometimes it can give you something that would not be easy with a tool like Kibana. Coworkers have also shown me some other cool tools, such as q ("text as data"), which lets you run SQL directly against well-formatted text files. This isn't something I use often, maybe once or twice a year, but when I need it, I'm glad that I have this technique at my disposal.
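
From what I remember of q, counting logins per user straight off the filtered file looks something like this; the input is space-delimited and columns are addressed as c1, c2, and so on (c8 is the username in our format), but double-check the flags against q's documentation:

q -d ' ' "SELECT c8, COUNT(*) FROM example_filtered.log GROUP BY c8"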

My example logs and code are available as a gist for your convenience: https://gist.github.com/pdewilde/86c4d3d1cc718cbf44cdeb09f3e66b56

 
