
Java OutOfMemoryError

  • 15-06-2017 01:17PM
    #1
    Closed Accounts Posts: 1,744 ✭✭✭


    Wondering if anyone can help me out here; not sure if my code is causing this or if it's just down to not having a particularly high-spec machine.

    I'm trying to iterate over a month's worth of files in a directory. There are likely 4,300+ files for any given month, and each file could have between 1 and 5,000 records. The idea is to split each record of each file into a string array and add it to a list of string arrays for further processing elsewhere.

    This is the error I get...

    Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.lang.String.split(Unknown Source)
    at java.lang.String.split(Unknown Source)
    at processedCdr.ProcessedCdrFiles.processedFileCdr(ProcessedCdrFiles.java:42)
    at processedCdr.ProcessedCdrFiles.processedFiles(ProcessedCdrFiles.java:27)
    at processedCdr.ProcessedTest.main(ProcessedTest.java:15)

    My PC has 4GB DDR3L 1600.

    My code is as follows...

    [PHP]
    // Get files from the directory and pass each matching one to processedFileCdr
    public List<String[]> processedFiles(String date) throws FileNotFoundException {
        File dir = new File(processedDir);
        File[] directoryListing = dir.listFiles();

        if (directoryListing != null) {
            for (File child : directoryListing) {
                if (child.getName().contains(date)) {
                    processedFileCdr(child);
                    fileCount++;
                }
            }
        }
        return cdrArray;
    }

    // Iterate over a file and add its records to the cdrArray list
    public void processedFileCdr(File cdr) throws FileNotFoundException {
        Scanner file = new Scanner(new FileReader(cdr));

        while (file.hasNextLine()) {
            String[] line = file.nextLine().split(",");
            if (line.length > 2) {
                cdrArray.add(line);
            }
        }
        file.close();
    }
    [/PHP]


Comments

  • Registered Users, Registered Users 2, Paid Member Posts: 2,032 ✭✭✭lynchie


    Increase the heap size using -Xmx. Default is probably not big enough for the data you are keeping in memory.
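For anyone following along, the flag goes on the java launch command. The sizes and classpath below are illustrative; the main class name is taken from the stack trace above:

```shell
# Illustrative: raise the maximum heap to 2 GiB when launching
java -Xmx2g -cp bin processedCdr.ProcessedTest

# -Xms additionally sets the initial heap size, avoiding repeated grow-and-resize
java -Xms512m -Xmx2g -cp bin processedCdr.ProcessedTest
```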


  • Registered Users, Registered Users 2 Posts: 3,781 ✭✭✭heebusjeebus


    Worth logging how many files you've iterated through before you hit the error. It will help you determine the max heap size you'll likely need.
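A minimal sketch of that kind of progress logging (fileCount mirrors the field in the posted code; the loop here is only a stand-in for the real directory loop):

```java
public class ProgressDemo {
    public static void main(String[] args) {
        int fileCount = 0;       // mirrors the fileCount field in the posted code
        long recordCount = 0;
        // Stand-in for the directory loop; each iteration "processes" one file
        for (int i = 0; i < 450; i++) {
            fileCount++;
            recordCount += 2500; // pretend each file held ~2,500 records
            if (fileCount % 100 == 0) {
                System.out.println("Processed " + fileCount + " files, "
                        + recordCount + " records so far");
            }
        }
    }
}
```

The last number printed before the OutOfMemoryError tells you roughly how much data fits in the current heap.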


  • Closed Accounts Posts: 1,744 ✭✭✭Pelvis


    Gave that a try, upped it to 3GB and no change at all, stopping after 1390 files. Hmmm...


  • Registered Users, Registered Users 2 Posts: 895 ✭✭✭Dubba


    Would wrapping the FileReader in a BufferedReader help?

    https://docs.oracle.com/javase/7/docs/api/java/io/BufferedReader.html
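It might speed things up; a BufferedReader mainly reduces disk I/O rather than memory, though. A sketch of the change against the posted method (the class name and temp-file demo are made up; the record filter is copied from the original):

```java
import java.io.*;
import java.util.*;

public class BufferedCdrReader {
    // Split each CSV line of a file into a String[], reading through a
    // BufferedReader so disk I/O happens in large chunks.
    public static List<String[]> readCdr(File cdr) throws IOException {
        List<String[]> records = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(cdr))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.split(",");
                if (fields.length > 2) {   // same filter as the posted code
                    records.add(fields);
                }
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("cdr", ".csv");
        try (PrintWriter out = new PrintWriter(tmp)) {
            out.println("a,b,c");
            out.println("x,y");      // only 2 fields, so it is skipped
            out.println("1,2,3,4");
        }
        System.out.println(readCdr(tmp).size()); // prints 2
        tmp.delete();
    }
}
```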


  • Registered Users, Registered Users 2 Posts: 1,110 ✭✭✭Skrynesaver


    If we estimate line size at 256 chars, your list of String arrays will be around 5 GB.

    While that might be a maximum, it's a worthwhile sizing exercise when deciding -Xmx:

    (4300 * 5000 * 256) / (1024 * 1024 * 1024) = 5.126 GB


  • Registered Users, Registered Users 2 Posts: 3,781 ✭✭✭heebusjeebus


    If it's a 32-bit JVM then be careful: you won't be able to increase the heap much over 1400 MB.


  • Closed Accounts Posts: 1,744 ✭✭✭Pelvis


    It's a 64-bit JVM. No luck so far; using a BufferedReader actually resulted in slightly worse performance, and upping the heap size makes no difference.

    At this stage I'm at a loss. I guess running it in monthly batches is a no go.

    Alternatively I can limit the fields added to the array. The files have 43 fields; I would have preferred to add them all, though.
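A sketch of that field-limiting idea: split the line once, then copy out only the wanted columns before adding to the list. The indexes below are placeholders; the real CDR layout isn't shown in the thread.

```java
import java.util.*;

public class FieldLimiter {
    // Hypothetical: keep only a handful of the 43 columns per record.
    // The chosen indexes (0, 3, 7) are placeholders, not the real layout.
    private static final int[] WANTED = {0, 3, 7};

    public static String[] keepWanted(String line) {
        String[] all = line.split(",");
        String[] kept = new String[WANTED.length];
        for (int i = 0; i < WANTED.length; i++) {
            kept[i] = all[WANTED[i]];
        }
        return kept;
    }

    public static void main(String[] args) {
        String line = "a,b,c,d,e,f,g,h,i";
        System.out.println(Arrays.toString(keepWanted(line))); // [a, d, h]
    }
}
```

Dropping from 43 retained fields to a handful cuts the held data roughly in proportion, since the kept Strings are what actually occupy the heap.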


  • Registered Users, Registered Users 2 Posts: 5 laowai


    I don't know what your data looks like, but unless it has to stay as CSV or something, perhaps you could store/represent your data more efficiently than simply using a String?
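For example, if some of the 43 columns are numeric, parsing them into primitives instead of keeping every value as a String shrinks each record considerably (a long is 8 bytes; the same digits held as a String cost far more once object headers and the backing char array are counted). The class and column meanings below are invented for illustration:

```java
// Hypothetical compact record: three typed fields instead of a String[43].
public class CdrRecord {
    final long callerId;          // invented column meanings
    final long calleeId;
    final int durationSeconds;

    CdrRecord(long callerId, long calleeId, int durationSeconds) {
        this.callerId = callerId;
        this.calleeId = calleeId;
        this.durationSeconds = durationSeconds;
    }

    // Parse from a CSV line; the indexes are placeholders for the real layout.
    static CdrRecord fromCsv(String line) {
        String[] f = line.split(",");
        return new CdrRecord(Long.parseLong(f[0]),
                             Long.parseLong(f[1]),
                             Integer.parseInt(f[2]));
    }

    public static void main(String[] args) {
        CdrRecord r = fromCsv("353851234567,353861111111,120");
        System.out.println(r.durationSeconds); // prints 120
    }
}
```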

