For those who are short on time, here is a minimal snippet for logging to CSV correctly with Python 3.8 (read on below to see how we add a header and do log rotation):
import logging
import csv
import io


class CSVFormatter(logging.Formatter):
    def __init__(self):
        super().__init__()

    def format(self, record):
        # record.msg is expected to be a list (or tuple) of column values
        stringIO = io.StringIO()
        writer = csv.writer(stringIO, quoting=csv.QUOTE_ALL)
        writer.writerow(record.msg)
        # Drop the newline added by writerow; the handler appends its own
        record.msg = stringIO.getvalue().strip()
        return super().format(record)


logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
# loggingStreamHandler = logging.StreamHandler()  # to print to the console instead
loggingStreamHandler = logging.FileHandler("test.csv", mode='a')  # to save to file
loggingStreamHandler.setFormatter(CSVFormatter())
logger.addHandler(loggingStreamHandler)
logger.info(["ABDCD", "EFGHI"])
Let’s go through the code above in more detail, look at its implications, and then see how to incorporate writing a header. The logger, log levels and the formatter are explained in the previous post; feel free to catch up quickly if you would like: https://everythingtech.dev/2021/03/python-logging-with-json-formatter/
CSV Formatter
Just like we did for the JSON formatter, we are extending the logging.Formatter class to override its format method. Using the csv and io libraries here has a few performance implications, because every record is written into a new in-memory buffer before it is formatted. If performance is critical for you and/or logging forms a significant part of your system, then I would recommend that you benchmark your system before and after adding the CSV logging part.
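If you want a rough first number before benchmarking your whole system, the sketch below times the formatter on its own with timeit. It uses a condensed copy of the formatter above; the names make_record and time_formatter are made up for this illustration, and the result only covers formatting, not disk writes.

```python
import csv
import io
import logging
import timeit


class CSVFormatter(logging.Formatter):
    # Condensed copy of the formatter from this post
    def format(self, record):
        buffer = io.StringIO()
        csv.writer(buffer, quoting=csv.QUOTE_ALL).writerow(record.msg)
        record.msg = buffer.getvalue().strip()
        return super().format(record)


def make_record():
    # A fresh record each time, because CSVFormatter overwrites record.msg
    return logging.LogRecord("bench", logging.INFO, "bench.py", 0,
                             ["ABDCD", "EFGHI"], None, None)


def time_formatter(formatter, number=10_000):
    """Return the seconds taken to format `number` records."""
    return timeit.timeit(lambda: formatter.format(make_record()), number=number)


plain_seconds = time_formatter(logging.Formatter())
csv_seconds = time_formatter(CSVFormatter())
print(f"plain: {plain_seconds:.4f}s  csv: {csv_seconds:.4f}s")
```

The absolute numbers depend on your machine, so treat only the ratio between the two as meaningful.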
Anyway, moving on: we are calling csv.writer, which needs a destination to write the CSV lines to, plus the quoting settings. In this case we have provided a StringIO object from the io library, which is an in-memory text buffer rather than a file. We do this because the FileHandler is already saving the log to disk, so the formatter only needs to build the CSV line as a string.
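To see what this step produces for a single message, here is a minimal sketch of the same csv.writer and StringIO combination in isolation. The helper name row_to_csv_line is made up for this illustration and is not part of the code above.

```python
import csv
import io


def row_to_csv_line(row):
    """Render one list of values as a single quoted CSV line (no newline)."""
    buffer = io.StringIO()  # in-memory text buffer, nothing touches the disk
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL)
    writer.writerow(row)
    # writerow appends "\r\n"; strip it because the handler adds its own newline
    return buffer.getvalue().strip()


print(row_to_csv_line(["ABDCD", "EFGHI"]))  # "ABDCD","EFGHI"
print(row_to_csv_line(['say "hi"', 42]))    # "say ""hi""","42"
```

With QUOTE_ALL every field is quoted, including numbers, and embedded double quotes are escaped by doubling them.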
Adding CSV Header & Rotating Logs
Adding a CSV header is trickier than it looks, especially because we also want log rotation. This means that we have to write a header when a log file is first created and again just after it is rolled over. It turns out that TimedRotatingFileHandler closes the stream it is writing to just before rolling over, and the stream is not accessible before initialisation. There are 2 options to work around this that I thought about (maybe there is a better one? If you know it, please comment below):
Option 1: Rewrite the constructor of TimedRotatingFileHandler. But this also means that any breaking change in the library could stop our application from working.
Option 2: Check the file size before writing to the log and, if it is empty, write the header. This is expensive though, because we basically have an additional operation before every write (the emit function). Until I find a better way, this is the method used in this post.
import logging
from logging import handlers
import csv
import io
import time
import os
import random
from datetime import datetime


class CSVFormatter(logging.Formatter):
    def __init__(self):
        super().__init__()

    def format(self, record):
        stringIO = io.StringIO()
        writer = csv.writer(stringIO, quoting=csv.QUOTE_ALL)
        writer.writerow(record.msg)
        record.msg = stringIO.getvalue().strip()
        return super().format(record)


class CouldNotBeReady(Exception):
    pass


class CSVTimedRotatingFileHandler(handlers.TimedRotatingFileHandler):
    # The default header is a one-element tuple: a bare string would be split into characters
    def __init__(self, filename, when='D', interval=1, backupCount=0,
                 encoding=None, delay=False, utc=False, atTime=None,
                 errors=None, retryLimit=5, retryInterval=0.5, header=("NO HEADER SPECIFIED",)):
        self.RETRY_LIMIT = retryLimit
        self._header = header
        self._retryLimit = retryLimit
        self._retryInterval = retryInterval
        self._hasHeader = False
        # errors is accepted but not forwarded: the parent class only takes it from Python 3.9
        super().__init__(filename, when, interval, backupCount, encoding, delay, utc, atTime)
        # Assumes delay=False, so the file already exists and self.stream is open
        if os.path.getsize(self.baseFilename) == 0:
            # lineterminator="\n" matches the newline the handler writes after each record
            writer = csv.writer(self.stream, quoting=csv.QUOTE_ALL, lineterminator="\n")
            writer.writerow(self._header)
        # A non-empty file is assumed to already start with a header
        self._hasHeader = True

    def doRollover(self):
        self._hasHeader = False
        self._retryLimit = self.RETRY_LIMIT
        super().doRollover()
        # The parent has just opened a fresh, empty file: start it with the header
        writer = csv.writer(self.stream, quoting=csv.QUOTE_ALL, lineterminator="\n")
        writer.writerow(self._header)
        self._hasHeader = True

    def emit(self, record):
        # Wait (a bounded number of times) until the header has been written
        while not self._hasHeader:
            if self._retryLimit == 0:
                raise CouldNotBeReady
            time.sleep(self._retryInterval)
            self._retryLimit -= 1
        super().emit(record)


logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
loggingStreamHandler = CSVTimedRotatingFileHandler(filename="log.csv", header=["time", "number"])  # to save to file
loggingStreamHandler.setFormatter(CSVFormatter())
logger.addHandler(loggingStreamHandler)

while True:
    today = datetime.now().strftime("%m/%d/%Y, %H:%M:%S")
    logger.info([today, random.randint(10, 100)])
    time.sleep(1)
The first part of the code is quite straightforward: we are extending the TimedRotatingFileHandler class. You will notice other useful parameters like when and interval. These 2 are important because they determine when your logs will rotate. The following are the possible values for when, taken from the original source code:
# Calculate the real rollover interval, which is just the number of
# seconds between rollovers. Also set the filename suffix used when
# a rollover occurs. Current 'when' events supported:
# S - Seconds
# M - Minutes
# H - Hours
# D - Days
# midnight - roll over at midnight
# W{0-6} - roll over on a certain day; 0 - Monday
# Case of the 'when' specifier is not important; lower or upper case
# will work.
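As a quick illustration of how when and interval combine, here is a small sketch using the standard TimedRotatingFileHandler directly. The temporary directory and the 6-hour schedule are arbitrary choices for the example, and the when and interval attributes it prints are what CPython's implementation happens to store on the handler rather than a documented interface.

```python
import logging.handlers
import os
import tempfile

log_dir = tempfile.mkdtemp()  # throwaway directory for the example
log_path = os.path.join(log_dir, "log.csv")

# Rotate every 6 hours and keep the 4 most recent rotated files.
# delay=True postpones opening the file until the first record is emitted.
handler = logging.handlers.TimedRotatingFileHandler(
    log_path, when="h", interval=6, backupCount=4, delay=True
)

print(handler.when)      # H  (the specifier is upper-cased internally)
print(handler.interval)  # 21600  (6 hours expressed in seconds)
handler.close()
```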
We have added additional constructor parameters: header, retryLimit and retryInterval. The header parameter lets you specify the CSV header. retryLimit and retryInterval have been added to make sure that the header is written first in the log files. Log records can arrive from several threads, and as you can see in emit, we have to make sure that the header is written before any record is written.
Of course, we do not want to be stuck in a loop indefinitely. At every retryInterval a check is performed to see whether the header has been written, and this is retried 5 times by default before the CouldNotBeReady exception that we created is raised. You can then handle it as you would like, but it should normally work fine.
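Because the exception is raised inside emit before the record reaches the parent class, it travels up to whoever called logger.info, so that is where you would catch it. The sketch below shows the idea with a toy handler that always fails; HeaderNotReady, FailingHandler and try_log are hypothetical names standing in for the real classes.

```python
import logging


class HeaderNotReady(Exception):
    """Hypothetical stand-in for the CouldNotBeReady exception above."""


class FailingHandler(logging.Handler):
    """Toy handler whose emit always fails, to show where the error surfaces."""

    def emit(self, record):
        raise HeaderNotReady("header was never written")


def try_log(logger, row):
    """Log one row; return False instead of crashing if the handler gives up."""
    try:
        logger.info(row)
        return True
    except HeaderNotReady:
        return False


demo_logger = logging.getLogger("csv_demo")
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False  # keep the example isolated from root handlers
demo_logger.addHandler(FailingHandler())

print(try_log(demo_logger, ["time", 42]))  # False
```

What to do in the except branch (drop the row, fall back to another handler, stop the program) is up to your application.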
This has to be repeated every time we roll over as well, which is why we first stored the limit in a constant (RETRY_LIMIT): it lets us restore the retry counter to its original value. To write the header we use the stream created by the handler directly and call the csv library to write to that stream. Any other csv options also have to be provided for your header; you can add an additional parameter to the constructor to do that.
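One way to keep the header and the rows consistent is to pass the same csv options to both. The sketch below is only an illustration of that idea: CSV_OPTIONS and to_csv_line are hypothetical names, and the semicolon delimiter is an arbitrary choice.

```python
import csv
import io

# Hypothetical shared settings; any csv.writer formatting parameters work here.
CSV_OPTIONS = {"delimiter": ";", "quoting": csv.QUOTE_MINIMAL}


def to_csv_line(values, **csv_options):
    """Format one row with the given csv.writer options (no trailing newline)."""
    buffer = io.StringIO()
    csv.writer(buffer, **csv_options).writerow(values)
    return buffer.getvalue().strip()


header_line = to_csv_line(["time", "number"], **CSV_OPTIONS)
row_line = to_csv_line(["2021-03-21 10:15:00", 42], **CSV_OPTIONS)
print(header_line)  # time;number
print(row_line)     # 2021-03-21 10:15:00;42
```

In the handler and formatter above, the same dictionary could be accepted as an extra constructor parameter and unpacked into each csv.writer call.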
Disclaimer
This has not been tested in production and should be used for learning purposes only. I recommend that you either use a third-party library or adapt the code to your situation. For example, create a better exception that you can handle when the logs are in the wrong order (if you modify the code). Anyway, I hope this has saved you time, and feel free to comment if you want to achieve anything else and you are stuck. Thanks for reading! 🙂