parsing with python

  • 24-03-2011 5:18pm
    #1
    Registered Users, Registered Users 2 Posts: 3,745 ✭✭✭


    Hi guys,

    I was wondering if anybody has any advice about parsing log files with Python.

    I want to parse a bunch of logs and accumulate counts for various strings / conditions in dicts.

    My first task is to get lines that match the current date and are in the previous hour.

    The entire day is logged to one file, which will be millions of lines long, so running over the files is taking a very long time.

    Is there a faster way than I have shown below to check if the line contains the time?
    #!/usr/bin/python -tt
    
    import os
    import glob
    import sys
    
    
    path = '/the/files/to/parse/'
    for infile in glob.glob( os.path.join(path, 'server*/information.2011-03-24.log') ):
      print  infile
      input_file = open(infile, 'r')
      output_file = open ('/tmp/my_test.tmp', 'a')
      for line in input_file:
        if "24/Mar/2011:09"  in line:
          output_file.write(line) 
      input_file.close()
      output_file.close()
    


Comments

  • Closed Accounts Posts: 4,564 ✭✭✭Naikon


    Regex matching with sed or perl is probably the best approach for this type of task.


  • Registered Users, Registered Users 2 Posts: 1,419 ✭✭✭Cool Mo D


    One obvious rewrite for clarity would be to use with statements. On Python 2.6+ the with statement is available by default. On Python 2.5 you need to put
    from __future__ import with_statement
    
    at the top of your script.

    The with statement will take care of closing the files properly, even if the script hits an error.
    #!/usr/bin/python -tt
    
    import os
    import glob
    import sys
    
    
    path = '/the/files/to/parse/'
    for infile in glob.glob( os.path.join(path, 'server*/information.2011-03-24.log') ):
      print  infile
      # Nested with blocks work on Python 2.5+; the one-line "with a, b:" form needs 2.7+.
      with open(infile, 'r') as input_file:
        with open('/tmp/my_test.tmp', 'a') as output_file:
          for line in input_file:
            if "24/Mar/2011:09" in line:
              output_file.write(line)
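
    The date in the file name and the "24/Mar/2011:09" string are hard-coded above, while the original question asks for the current date and the previous hour. A minimal sketch of deriving both from the clock follows. The function name previous_hour_stamps and the MONTHS tuple are made up for illustration, and the two formats are assumed from the strings already in the script.

```python
from datetime import datetime, timedelta

# English month abbreviations, spelled out so the result does not
# depend on the system locale (assumption: the logs always use these).
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def previous_hour_stamps(now):
    """Return (file_date, line_prefix) for the hour before `now`.

    file_date   matches the date part of 'information.2011-03-24.log'
    line_prefix matches the '24/Mar/2011:09' string searched for in each line
    """
    prev = now - timedelta(hours=1)  # also rolls back over midnight
    file_date = "%04d-%02d-%02d" % (prev.year, prev.month, prev.day)
    line_prefix = "%02d/%s/%04d:%02d" % (
        prev.day, MONTHS[prev.month - 1], prev.year, prev.hour)
    return file_date, line_prefix


# In the script you would pass datetime.now(); a fixed time is used here.
file_date, line_prefix = previous_hour_stamps(datetime(2011, 3, 24, 10, 15))
```

    With that, the glob pattern could be built as 'server*/information.%s.log' % file_date and the test inside the loop becomes line_prefix in line. Note that just after midnight the previous hour lives in the previous day's file, which the timedelta subtraction accounts for.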
    

    If you want to speed it up, look at the threading module, and split your script into one thread for reading the log files and another for writing the matched lines.
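
    A rough sketch of that reader/writer split, using a queue to hand lines from one thread to the other. The function names (reader, writer, filter_logs) and the queue size are illustrative choices, not anything from the original script. Whether this is actually faster is something to measure: in CPython the interpreter lock means the string matching itself does not run in parallel, so any gain would come from overlapping the reads with the writes.

```python
import threading

try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2

SENTINEL = None  # placed on the queue to tell the writer there is no more data


def reader(paths, needle, out_queue):
    """Put every line containing `needle` on the queue, then signal the end."""
    try:
        for path in paths:
            with open(path, 'r') as input_file:
                for line in input_file:
                    if needle in line:
                        out_queue.put(line)
    finally:
        # Always signal completion, even on error, so the writer can exit.
        out_queue.put(SENTINEL)


def writer(out_path, in_queue):
    """Append lines taken from the queue to `out_path` until the sentinel arrives."""
    with open(out_path, 'a') as output_file:
        while True:
            line = in_queue.get()
            if line is SENTINEL:
                break
            output_file.write(line)


def filter_logs(paths, needle, out_path):
    """Run one reader thread and one writer thread and wait for both to finish."""
    lines = queue.Queue(maxsize=10000)  # bounded, so the reader cannot race far ahead
    threads = [
        threading.Thread(target=reader, args=(paths, needle, lines)),
        threading.Thread(target=writer, args=(out_path, lines)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

# Usage, with the paths from the script above:
#   filter_logs(glob.glob(os.path.join(path, 'server*/information.2011-03-24.log')),
#               "24/Mar/2011:09", '/tmp/my_test.tmp')
```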


  • Registered Users, Registered Users 2 Posts: 1,889 ✭✭✭evercloserunion


    Naikon wrote: »
    Regex matching with sed or perl is probably the best approach for this type of task.
    Python also has its own re module for regex.
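
    As a small sketch of the re module applied to this task: a pattern compiled once can both select the lines for the hour and pull out a field to count in a dict, which is what the original post is ultimately after. The sample lines and the count_per_minute name are made up for illustration, and the timestamp layout is assumed from the "24/Mar/2011:09" string in the script. For a plain fixed string, the in test is probably already hard to beat, so a regex mainly pays off when you need to match a pattern or capture part of the line.

```python
import re
from collections import defaultdict

# Compiled once, outside the loop. The group captures the minute that
# follows the date and hour (assumed layout: 24/Mar/2011:HH:MM:SS).
HOUR_PATTERN = re.compile(r"24/Mar/2011:09:(\d{2})")


def count_per_minute(lines):
    """Count the lines that fall in the 09:00 hour, keyed by minute."""
    counts = defaultdict(int)
    for line in lines:
        match = HOUR_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return dict(counts)


# Hypothetical log lines, only to show the shape of the result.
sample = [
    '10.0.0.1 - - [24/Mar/2011:09:15:02 +0000] "GET / HTTP/1.1" 200',
    '10.0.0.2 - - [24/Mar/2011:09:15:40 +0000] "GET /a HTTP/1.1" 404',
    '10.0.0.3 - - [24/Mar/2011:10:00:01 +0000] "GET /b HTTP/1.1" 200',
    '10.0.0.4 - - [24/Mar/2011:09:59:59 +0000] "GET /c HTTP/1.1" 200',
]
per_minute = count_per_minute(sample)
```

    Since a file object can be iterated line by line, count_per_minute(input_file) would work directly inside the loop from the first post.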

