Advertisement
If you have a new account but are having problems posting or verifying your account, please email us on hello@boards.ie for help. Thanks :)
Hello all! Please ensure that you are posting a new thread or question in the appropriate forum. The Feedback forum is overwhelmed with questions that are having to be moved elsewhere. If you need help to verify your account contact hello@boards.ie
Hi there,
There is an issue with role permissions that is being worked on at the moment.
If you are having trouble with access or permissions on regional forums please post here to get access: https://www.boards.ie/discussion/2058365403/you-do-not-have-permission-for-that#latest

Java/Excel for large volume of data

  • 08-02-2011 1:38pm
    #1
    Registered Users, Registered Users 2 Posts: 505 ✭✭✭


    Hi Guys,

    I hope this is the right forum for this. I currently run several reports using the same template in Excel, each has a different data set. The raw data for each is of order 100,000+ rows and I'm adding about another 1000 per week on average. Each row has 8 entries. There are several complex formulae used repeatedly throughout the template using sumifs, vlookups, hlookups, match, index and others I can't remember off the top of my head. This makes it quite complex for my machine to compute.

    To run this report takes in the region of 10-20 minutes for each one, and I've written macros and a .vbs script to pretty much automate the whole updating process, which is sweet - I can double click my vbs script and go to lunch and when I come back, the whole thing's done (hurrah!).

    The thing is, the demand for these reports is growing, and more and more variations of them are needed, but because of the dependency on local hardware, I'm wondering is excel the best way to go? I could get access to servers running linux where I think I could place a java program that accepts the raw data as a .csv or whatever and run the program from there instead of using excel on local machines. This would mean I could also give access to the entire team and they could pull reports themselves. Currently we have a very wide range of hardware varying to mega, mega slow laptops, to decent enough Macs, so using the excel template has been unsuccessful for the most part so far. Scaling the project across the entire team would be simply fantastic!

    So my question really is this: would a java program be capable of calculating this type of data any faster than excel, or would it be even slower?

    Thanks!


Comments

  • Registered Users, Registered Users 2 Posts: 569 ✭✭✭none


    Java is grand but I feel the best solution might be on the backend. We have requirements for tabular data based on complex calculations and it's mainly done in the DB. Doesn't really matter the vendor as most of them offer some advanced set processing features. It's all based on the assumption your original data is in the DB, too? If that's the case, examine your DB programming options (ANSI SQL and its extensions), process your data internally and then either save it on the DB server or send to the application where the only thing left to do would be to save it to the filesystem.
    You can do it in Java (both CSV and XLS) but it's better to process it at source, i.e., in the DB if that's the source.


  • Moderators, Technology & Internet Moderators Posts: 1,336 Mod ✭✭✭✭croo


    There are open source ETL (Extract/Transform/Load) tools that will do what you want... they are written in java and most allow the ability to use transform using javascript :)

    I've used Penatho's kettle (Spoon) with postgresql & oracle DBs with great success.
    http://www.youtube.com/watch?v=EjzgzOanq1o
    Pentaho's approach of having a limited "community edition" but the full functionality only in the "enterprise edition" is annoying. So I have begun to shift to SpagoBI which is as powerful but just 1 version for all and they instead make profit from services.

    I am sure there are many more ETL tools available as well, but these are the one's I have recent experience with.


Advertisement