Advertisement
Help Keep Boards Alive. Support us by going ad free today. See here: https://subscriptions.boards.ie/.
If we do not hit our goal we will be forced to close the site.

Current status: https://keepboardsalive.com/

Annual subs are best for most impact. If you are still undecided on going Ad Free - you can also donate using the Paypal Donate option. All contribution helps. Thank you.
https://www.boards.ie/group/1878-subscribers-forum

Private Group for paid up members of Boards.ie. Join the club.

best way to view a csv data file with 4 million rows ?

  • 08-03-2012 09:16AM
    #1
    Registered Users, Registered Users 2 Posts: 27


    Hello nice people!
    Excel 2010 is restricted to about 1 million rows, so i cannot open the file it completely, any suggstions on how to view file with 4 million rows ?
    i ultimately need to do a text to columns on one of the columns to fx it.
    would something like 'ultra edit' be able to do this ?
    thanks


Comments

  • Registered Users, Registered Users 2 Posts: 2,222 ✭✭✭Explosive_Cornflake


    sed and awk would fly through it.


  • Closed Accounts Posts: 5,835 ✭✭✭Torqay


    LTFV

    Viewer only.


  • Closed Accounts Posts: 5,835 ✭✭✭Torqay


    If you need to edit it, Vim should do the trick.


  • Closed Accounts Posts: 4,037 ✭✭✭Nothingbetter2d




  • Registered Users, Registered Users 2 Posts: 1,929 ✭✭✭PrzemoF


    OP, If you don't know how to use sed/swk/vim import the file into a database.

    Or split the file, do the work and merge it back.


    Anyway I'd suggest to get familiar with vim - it's the ultimate edior and it can handle pretty much what your system can handle in terms of file size.


  • Advertisement
  • Registered Users, Registered Users 2 Posts: 27 saj1011


    can anybody please tell me directions on how to use vim ?

    Like how to add in a comma and edit 4,000,000 rows, for example

    yyyy.mm.dd hh.mm.ss

    to

    yyyy.mm.dd, hh.mm

    thanks


  • Closed Accounts Posts: 5,835 ✭✭✭Torqay


    Assuming that this pattern is nowhere else to be found in your CSV file, you can do this via search and replace but you will have to repeat the the process 20 or 30 times (depending on the hour format, 12 or 24), i.e. replace "1 0" with "1, 0", "2 0" with "2, 0", and so on.


  • Registered Users, Registered Users 2 Posts: 1,929 ✭✭✭PrzemoF


    vim includes a tutorial. it's well worth to do it (it's in a form of an excercise)

    To answer you question: if the whole file contains only that kind of data (that's not true?) all you have to do is replace " " <SPACE> with
    ", " <COMMA><SPACE>

    In vim:

    :%s/ /, /

    http://vim.wikia.com/wiki/Search_and_replace


  • Registered Users, Registered Users 2 Posts: 37,485 ✭✭✭✭Khannie


    saj1011 wrote: »
    can anybody please tell me directions on how to use vim ?

    Like how to add in a comma and edit 4,000,000 rows, for example

    yyyy.mm.dd hh.mm.ss

    to

    yyyy.mm.dd, hh.mm

    thanks


    You mention VIM, so I'm assuming you have access to some unix tools / a command line. What you want is sed. Your one command to do that for the 4 million rows would be:

    sed -i -r 's/([0-9]{4}\.[0-9]{2}\.[0-9]{2}) ([0-9]{2}\.[0-9]{2}\.[0-9]{2})/\1\, \2/' <filename.csv>

    Should only take a few seconds to complete. Make a backup of your original. Use at your own risk. I am not responsible for any balls ups.

    edit: If you don't have access to sed, you can email me a zip of the .csv and I'll happily run it for you.


  • Registered Users, Registered Users 2 Posts: 2,781 ✭✭✭amen


    use a regular expression


  • Advertisement
  • Registered Users, Registered Users 2 Posts: 37,485 ✭✭✭✭Khannie


    Khannie wrote: »
    Should only take a few seconds to complete.

    Might actually take a bit over a minute, depending on how fast your processor is. I just tested with 2 million lines, each of which had only the date on it and it took 16 seconds. Old processor on this box though.


  • Registered Users, Registered Users 2 Posts: 37,485 ✭✭✭✭Khannie


    amen wrote: »
    use a regular expression

    Ignore this IMO. It raises more questions for you at this juncture than it answers. The garble after -r in that sed command I gave you is a regular expression. Learning how to use them properly can be somewhat time consuming (there are books on the subject). They are all powerful however.


  • Registered Users, Registered Users 2 Posts: 1,929 ✭✭✭PrzemoF


    OP, can you post a few sample lines (change the data if it's confidential)


  • Registered Users, Registered Users 2 Posts: 27 saj1011


    ah! ok i got more responses!, sorry i missed this!

    so what i need is this raw data line :

    2012.02.14 15:59:00,1.31323,1.31331,1.31297,1.31313,341.53

    turned into this:

    2012.02.14,15:59:00,1.31323,1.31331,1.31297,1.31313,341.53

    need a comma in between the date + time 4,000,000ish times!


  • Closed Accounts Posts: 5,835 ✭✭✭Torqay


    saj1011 wrote: »
    ah! ok i got more responses!, sorry i missed this!

    so what i need is this raw data line :

    2012.02.14 15:59:00,1.31323,1.31331,1.31297,1.31313,341.53

    turned into this:

    2012.02.14,15:59:00,1.31323,1.31331,1.31297,1.31313,341.53

    need a comma in between the date + time 4,000,000ish times!

    Replace all blank spaces with a komma.


  • Registered Users, Registered Users 2 Posts: 27 saj1011


    Hello, thanks, but how to do this in VIM or something similar that can handle 4,000,000 rows? as ms excel cannot.
    thanks


  • Registered Users, Registered Users 2 Posts: 1,144 ✭✭✭Ballyv24


    Did you try opening the file using http://cream.sourceforge.net


  • Registered Users, Registered Users 2 Posts: 1,929 ✭✭✭PrzemoF


    vim way:
    open the file. type

    :%s/ /,/

    press enter and save using

    :wq

    or

    :x


  • Registered Users, Registered Users 2 Posts: 19,531 ✭✭✭✭Krusty_Clown


    You could try EmEditor, which features large file support, is GUI based, and will do search and replace using regular expressions. You can also create and run Macros (Javascript or VBScript). It supports files up to 248GB (or 2.1 Billion lines).


Advertisement