How long is too long for an XML file

  • 15-09-2011 12:31pm
    #1
    Registered Users, Registered Users 2 Posts: 7,893 ✭✭✭


    I have an XML file that is over 52k lines long in the format:
    <root>
    <subroot>
    <value>
    blah
    </value>
    <value>
    meh
    </value>
    </subroot>
    </root>


    Is that too long? Should I switch to SQLite? I'll be adding small bits of data to it every week, so it's gonna grow (slowly) over the next while.

    Thanks.


Comments

  • Registered Users, Registered Users 2 Posts: 2,040 ✭✭✭Colonel Panic


    Way too long. XML is more suitable as a means of data exchange than it is for storing stuff that is essentially a table of data that will grow and be modified.

    Performance will plummet quickly!


  • Closed Accounts Posts: 2,930 ✭✭✭COYW


    Performance should be the barometer here. If interactions with the XML file make the application slow, then move the data elsewhere, e.g. a DB.


  • Registered Users, Registered Users 2 Posts: 7,893 ✭✭✭The_B_Man


    Well, it'll be written to twice weekly, and then some calculations will be done on it, the result of which will be written to a text file after each pass. This output will be only a few characters, i.e. fewer than 10. That text file will then be the one that gets accessed a heap of times. It's for a mobile app, so any time someone presses "update", that text file will be the one that's read.

    I suppose I could move it to a database, but writing the code might be hard, so I only wanna do it if it's necessary.


  • Registered Users, Registered Users 2 Posts: 1,311 ✭✭✭Procasinator


    Can I ask what format the data is in? 'Cause if it's as simple as your example, a value per line seems to fit the bill.


  • Moderators, Society & Culture Moderators Posts: 9,689 Mod ✭✭✭✭stevenmu


    The_B_Man wrote: »
    I suppose I could move it to a database, but writing the code might be hard, so I only wanna do it if it's necessary.

    Probably easier than the XML code anyway :)

    Using a database will have serious advantages over an XML file in this scenario. Performance should be much, much better. It will also be far more stable and reliable; imagine if your file got corrupted (which could happen surprisingly easily).


  • Moderators, Technology & Internet Moderators Posts: 1,336 Mod ✭✭✭✭croo


    I don't know... people are quick to use DBs when they are not needed. A DB is just files on a disk too!
    The advantages I see that a DB might add would be:
    1. indexing & built-in caching for performance
    2. concurrent access
    It sounds like concurrency of access and update is not required. And if the file is read in its entirety whenever it is processed, then indexing and caching would likely have no impact either.
    I think I'd side with COYW and say if you see a performance issue then look to change. Perhaps you could create a test file with 100k lines to see if performance is an issue then.
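
A rough sketch of the kind of test croo describes, in Python since the OP's server turns out to be Python/Django. The function names and the 100,000-value size are made up for illustration, the document shape copies the example in the first post, and the standard library's ElementTree is only one of several parsers the OP might be using.

```python
import time
import xml.etree.ElementTree as ET


def build_test_xml(n_values):
    """Build an XML string shaped like the example in the first post."""
    root = ET.Element("root")
    subroot = ET.SubElement(root, "subroot")
    for i in range(n_values):
        ET.SubElement(subroot, "value").text = f"item {i}"
    return ET.tostring(root, encoding="unicode")


def time_parse(xml_text):
    """Parse the whole document; return (number of values, seconds taken)."""
    start = time.perf_counter()
    root = ET.fromstring(xml_text)
    count = len(root.findall("./subroot/value"))
    return count, time.perf_counter() - start


if __name__ == "__main__":
    xml_text = build_test_xml(100_000)  # hypothetical size, adjust to taste
    count, seconds = time_parse(xml_text)
    print(f"parsed {count} values in {seconds:.3f}s")
```

Running it at a few sizes (10k, 100k, 1M values) gives a feel for how parse time grows on the machine that will actually host the file.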


  • Registered Users, Registered Users 2 Posts: 2,040 ✭✭✭Colonel Panic


    SQLite isn't the database to use if you want lots of concurrent writes (concurrent reads are fine). I think using a large flat file is perfectly fine, but with something like XML you're at the mercy of the library you use to parse it!

    That said, premature optimization is counterproductive, so yeah, see how you fare with XML and move to SQLite later if required.


  • Registered Users, Registered Users 2 Posts: 898 ✭✭✭OREGATO


    I've had to process XML files that are over 500 MB in size, and it's a nightmare even opening the file. It'll all depend on how big your dataset is going to grow, IMO.
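
For files in that size range, one common way to avoid loading the whole tree is to stream it. A minimal sketch with the standard library's `ET.iterparse`, assuming the `<value>` element name from the first post; `count_values` is a hypothetical helper, not anything from the thread.

```python
import io
import xml.etree.ElementTree as ET


def count_values(source):
    """Stream through the XML and count <value> elements as they close."""
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "value":
            count += 1
            # Frees the element's text and children; the emptied element
            # itself stays attached to its parent.
            elem.clear()
    return count


sample = io.BytesIO(
    b"<root><subroot><value>blah</value><value>meh</value></subroot></root>"
)
print(count_values(sample))  # prints 2
```

`source` can also be a file name, so the same function works on a file on disk.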

    If it is a cumulative file that will grow, I would recommend a database all the way for the reasons above.


  • Closed Accounts Posts: 10,012 ✭✭✭✭thebman


    The_B_Man wrote: »
    Well, it'll be written to twice weekly, and then some calculations will be done on it, the result of which will be written to a text file after each pass. This output will be only a few characters, i.e. fewer than 10. That text file will then be the one that gets accessed a heap of times. It's for a mobile app, so any time someone presses "update", that text file will be the one that's read.

    I suppose I could move it to a database, but writing the code might be hard, so I only wanna do it if it's necessary.

    What mobile platform?


  • Closed Accounts Posts: 18,056 ✭✭✭✭BostonB


    The_B_Man wrote: »
    Well, it'll be written to twice weekly, and then some calculations will be done on it, the result of which will be written to a text file after each pass. This output will be only a few characters, i.e. fewer than 10. That text file will then be the one that gets accessed a heap of times. It's for a mobile app, so any time someone presses "update", that text file will be the one that's read.

    I suppose I could move it to a database, but writing the code might be hard, so I only wanna do it if it's necessary.

    Where will it be in 5 years' time...


  • Registered Users, Registered Users 2 Posts: 2,021 ✭✭✭ChRoMe


    BostonB wrote: »
    Where will it be in 5 years' time...

    Spoken like someone who has had to clean up this sort of **** later on ;)


  • Closed Accounts Posts: 18,056 ✭✭✭✭BostonB


    It's a nightmare.

    Any security issues with this data?


  • Registered Users, Registered Users 2 Posts: 7,893 ✭✭✭The_B_Man


    It's actually this:
    Lotto Helper IE

    So basically my Python/Django server parses the lotto site, gets the numbers, appends them to the XML file, then I do some fancy stuff and get my results.
    I'm the only one who's able to run the code that modifies the XML file, so we're talking a few times a week, after every draw. The users click on the "update" button in the app, which calls a URL on my REST server; that reads from a text file and returns the contents as JSON, which is consumed by the app.

    I'm thinking that concurrency might be important here. I'm not sure what will happen if 2 users click "update" at the same time. If the text file can be opened more than once, then I'm fine; otherwise I think I'll have to switch to MySQL.
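
On the question of two users hitting "update" at once: several readers can normally hold the same file open for reading at the same time. The riskier moment is a reader arriving while the file is half-written. A minimal sketch of one way around that, writing to a temporary file and swapping it into place; `publish_results`, `read_results` and the file name are hypothetical, and the swap is only guaranteed atomic on POSIX systems (on Windows it can fail if a reader has the target open).

```python
import os
import tempfile


def publish_results(path, text):
    """Write to a temp file in the same directory, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(text)
        # Readers see either the old file or the new one, never a partial write.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise


def read_results(path):
    """What each "update" request would do: open, read, close."""
    with open(path, encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "results.txt")
        publish_results(path, "3,17,24")
        # Two handles open on the same file at once, both readable.
        with open(path) as a, open(path) as b:
            print(a.read(), b.read())
```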


  • Registered Users, Registered Users 2 Posts: 11,989 ✭✭✭✭Giblet


    The_B_Man wrote: »
    It's actually this:
    Lotto Helper IE

    So basically my Python/Django server parses the lotto site, gets the numbers, appends them to the XML file, then I do some fancy stuff and get my results.
    I'm the only one who's able to run the code that modifies the XML file, so we're talking a few times a week, after every draw. The users click on the "update" button in the app, which calls a URL on my REST server; that reads from a text file and returns the contents as JSON, which is consumed by the app.

    I'm thinking that concurrency might be important here. I'm not sure what will happen if 2 users click "update" at the same time. If the text file can be opened more than once, then I'm fine; otherwise I think I'll have to switch to MySQL.

    Are you loading the file into memory for each read? I think you need a database, quick.
    Database output can be cached more easily too. You should be having a
    DB -> Cache -> Container Object -> JSON Serialised Object -> HTTP stream application/json
    stack.
    That's so trivial nowadays it's pretty much baked into most frameworks.
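
A bare-bones sketch of that DB -> Cache -> Container Object -> JSON chain using only the Python standard library, leaving out the HTTP layer. The table layout, the `NumberStat` class and the sample counts are all invented for illustration; in a Django app the framework's own ORM and cache would normally take these roles.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass
from functools import lru_cache


@dataclass(frozen=True)
class NumberStat:
    """Container object for one row of aggregated results."""
    number: int
    times_drawn: int


# DB: an in-memory SQLite database with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats (number INTEGER PRIMARY KEY, times_drawn INTEGER)")
conn.executemany("INSERT INTO stats VALUES (?, ?)", [(3, 41), (17, 38), (24, 45)])
conn.commit()


@lru_cache(maxsize=1)
def load_stats():
    """Cache: the query runs once, later calls reuse the same tuple."""
    rows = conn.execute("SELECT number, times_drawn FROM stats ORDER BY number")
    return tuple(NumberStat(*row) for row in rows)


def stats_json():
    """JSON serialised object, ready to send as application/json."""
    return json.dumps([asdict(stat) for stat in load_stats()])


print(stats_json())
```

After each draw is added, `load_stats.cache_clear()` would need to be called so the next request picks up the new rows.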


  • Closed Accounts Posts: 577 ✭✭✭Galtee


    The_B_Man wrote: »
    It's actually this:
    Lotto Helper IE

    So basically my Python/Django server parses the lotto site, gets the numbers, appends them to the XML file, then I do some fancy stuff and get my results.
    I'm the only one who's able to run the code that modifies the XML file, so we're talking a few times a week, after every draw. The users click on the "update" button in the app, which calls a URL on my REST server; that reads from a text file and returns the contents as JSON, which is consumed by the app.

    I'm thinking that concurrency might be important here. I'm not sure what will happen if 2 users click "update" at the same time. If the text file can be opened more than once, then I'm fine; otherwise I think I'll have to switch to MySQL.

    That looks pretty nifty. The only thing I could think of that may detract from it is that you could potentially have any number of users with the same numbers picked, so if they do win big they will most likely have to share it. Or do you have a mechanism built in to ensure this doesn't happen? Also, I think I read somewhere that the lotto quickpick line is supposed to be unique (obviously up to max permutations of numbers), so you're guaranteed that nobody else will have the same numbers unless someone happens to pick the same numbers themselves.


  • Moderators, Technology & Internet Moderators Posts: 1,336 Mod ✭✭✭✭croo


    Galtee wrote: »
    quickpick line is supposed to be unique

    I don't think it is. I think they're just random selections.


  • Closed Accounts Posts: 18,056 ✭✭✭✭BostonB


    To expand that app you're going to need to work with the data, I would prefer to do that in a DB. If you were looking at this as a portfolio piece I'd be thinking that way too.

    I dunno what quickpick uses, but in my experience my picking numbers blindfold gets a better hit ratio than the lotto machines. Irish or Euro. Nice idea for an app. You doing one of the euro lotto? Again I get better results on that than the Irish one.


  • Registered Users, Registered Users 2 Posts: 1,311 ✭✭✭Procasinator


    Do you need the XML at all? Seems you are just keeping cumulative data (i.e. the frequency of each number). A basic key-value infrastructure could do this on the cheap. You would just increment the counts on each update, rather than regenerate the whole results.

    XML seems overkill either way. XML shines as a human readable format and/or where a standard protocol is needed between third parties. You don't seem to have either use case, as the XML never leaves the confines of your web application.

    Something like a CSV line per draw might be better suited. Or even a number per line/space if it is all related to number frequency and not groupings per draw.
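
The incremental approach described above might look something like this in Python. The CSV-line-per-draw layout is the one suggested in the post; the `update_counts` helper and the two sample draws are made up for illustration.

```python
import csv
import io
from collections import Counter


def update_counts(counts, draw_line):
    """Add one draw (a CSV line of numbers) to the running totals."""
    for row in csv.reader(io.StringIO(draw_line)):
        counts.update(int(n) for n in row)


counts = Counter()
for line in ["3,17,24,31,40,42", "3,8,17,22,35,41"]:  # invented draws
    update_counts(counts, line)

print(counts[3], counts[8], counts[45])  # prints: 2 1 0
```

Only the new draw is processed on each update, so the cost stays constant however long the history gets.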


  • Registered Users, Registered Users 2 Posts: 7,893 ✭✭✭The_B_Man


    Giblet wrote: »
    Are you loading the file into memory for each read? I think you need a database, quick.
    Database output can be cached more easily too. You should be having a
    DB -> Cache -> Container Object -> JSON Serialised Object -> HTTP stream application/json
    stack.
    That's so trivial nowadays it's pretty much baked into most frameworks.

    No, not each read.
    I am generating the XML file after each draw. Then it puts the relevant values into a text file.
    When the user requests an update from the app, it reads directly from that file. The XML isn't touched. But I take your point about the DB. That's definitely something I'll be looking into.

    BostonB wrote: »
    To expand that app you're going to need to work with the data, I would prefer to do that in a DB. If you were looking at this as a portfolio piece I'd be thinking that way too.

    I dunno what quickpick uses, but in my experience my picking numbers blindfold gets a better hit ratio than the lotto machines. Irish or Euro. Nice idea for an app. You doing one of the euro lotto? Again I get better results on that than the Irish one.

    Yeah, it's definitely something I'll be putting on my CV. I'm a final year student so that's where my focus is now!
    Regarding the part in bold, the EuroMillions numbers are already in it!

    Procasinator wrote: »
    Do you need the XML at all? Seems you are just keeping cumulative data (i.e. the frequency of each number). A basic key-value infrastructure could do this on the cheap. You would just increment the counts on each update, rather than regenerate the whole results.

    XML seems overkill either way. XML shines as a human readable format and/or where a standard protocol is needed between third parties. You don't seem to have either use case, as the XML never leaves the confines of your web application.

    Something like a CSV line per draw might be better suited. Or even a number per line/space if it is all related to number frequency and not groupings per draw.


    As I said above, it looks like I'll be moving that way, on the advice from the others on here. Just need to find the time.

    Once I get it into the DB, I'll be doing statistical analysis on the numbers, and adding a number generator with weighting on certain criteria.
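
One possible shape for a weighted generator, as a sketch only: the thread never says what the weighting criteria are, so the weights below are placeholders, and the 6-from-45 line format is an assumption. `weighted_pick` is a hypothetical helper that draws distinct numbers, favouring those with larger weights.

```python
import random


def weighted_pick(weights, k, rng):
    """Pick k distinct numbers; larger weights are more likely to be chosen."""
    pool = dict(weights)  # copy, so the caller's weights are left untouched
    picked = []
    for _ in range(k):
        numbers = list(pool)
        choice = rng.choices(numbers, weights=[pool[n] for n in numbers], k=1)[0]
        picked.append(choice)
        del pool[choice]  # remove it so the same number cannot come up twice
    return sorted(picked)


if __name__ == "__main__":
    # Placeholder weights: in practice these might come from past frequencies.
    weights = {n: 1 + n % 5 for n in range(1, 46)}
    print(weighted_pick(weights, 6, random.Random()))
```

Passing in a seeded `random.Random` makes a pick reproducible, which helps when testing the weighting logic.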


  • Registered Users, Registered Users 2 Posts: 131 ✭✭CuAnnan


    The_B_Man wrote: »
    I suppose I could move it to a database, but writing the code might be hard, so I only wanna do it if it's necessary.

    Have you thought about using something like eXist?

    You could put all of your fragments into separate files and use XQuery to select and update them.


  • Registered Users, Registered Users 2 Posts: 4,857 ✭✭✭shootermacg


    I personally would have no hesitation calling database on this one. If your app has access to a DB, then it's a no-brainer.

    Someone earlier said there were indexing/caching advantages to a DB. Don't forget you also have an RDBMS dedicated to returning the data in the optimal amount of time, and you have the magic of transactions for safety's sake.

    No worrying about file handles, network performance hits or any other crap that can and will go wrong. I actually do a lot of XML-related work, and a good rule of thumb for XML is: if you aren't transforming the XML or sending it to a legacy app that expects XML, then don't use it.
    There's a lot of overhead with XML; in a lot of cases the actual elements are larger than the data they are describing.
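
To illustrate the transaction point, a small sketch with Python's built-in `sqlite3` module. The `draws` table, the `add_draw` helper and the dates and numbers are all invented; the idea is simply that a draw is stored completely or not at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE draws (draw_date TEXT, number INTEGER NOT NULL)")


def add_draw(draw_date, numbers):
    """Insert every number of a draw, or none of them if anything fails."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.executemany(
            "INSERT INTO draws VALUES (?, ?)",
            [(draw_date, n) for n in numbers],
        )


add_draw("2011-09-14", [3, 17, 24, 31, 40, 42])
try:
    add_draw("2011-09-17", [5, 9, None])  # None violates NOT NULL
except sqlite3.IntegrityError:
    pass  # the 5 and 9 inserted before the failure are rolled back too

print(conn.execute("SELECT COUNT(*) FROM draws").fetchone()[0])  # prints 6
```

Getting the same all-or-nothing behaviour with a hand-edited XML file means writing that safety net yourself.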


  • Registered Users, Registered Users 2 Posts: 2,751 ✭✭✭MyPeopleDrankTheSoup


    100% I'd use an SQLite DB for this, seeing as it's an Android app.

    It's easier to use SQLite than to mess around with an XML SAXParser in Android, IMO.


  • Registered Users, Registered Users 2 Posts: 26,584 ✭✭✭✭Creamy Goodness


    If you have to ask how big is big, then 99.99% of the time it will be too big.


  • Registered Users, Registered Users 2 Posts: 1,311 ✭✭✭Procasinator


    While the OP might still find new information useful, this is an old thread that was bumped by a spammer. How long is too long for an old thread?


  • Closed Accounts Posts: 18,056 ✭✭✭✭BostonB


    I don't get the idea that old threads go out of date. It's either useful information or it's not, regardless of date.


  • Registered Users, Registered Users 2 Posts: 2,751 ✭✭✭MyPeopleDrankTheSoup


    Procasinator wrote: »
    While the OP might still find new information useful, this is an old thread that was bumped by a spammer. How long is too long for an old thread?

    Jesus, I didn't even notice. I was wondering why the supposed app he was writing was fully developed in his signature!


  • Closed Accounts Posts: 18,056 ✭✭✭✭BostonB


    LOL.


  • Registered Users, Registered Users 2 Posts: 7,893 ✭✭✭The_B_Man


    Haha, well it's still useful info! I've a couple of apps in the pipeline and they'll be collating XML data from numerous sources, so while the thread is old, the info is current!

    When I originally made this thread, I was storing everything in XML and recreating the XML file every time. The more I think about it though, a database was always the most logical method, since I can just parse the lotto site and add this week's numbers!


    However, what I'd be interested to know now is which is generally considered faster, in terms of producing the output that the app will be consuming:
    1: Performing a SELECT statement and displaying the number data, or
    2: Reading from a text file containing the latest aggregated data and outputting.

    What say you?


    EDIT: While everyone is here, any feature suggestions or improvements for the app in question would be appreciated. While the data is updated 4 times a week (2 lotto draws, 2 euro draws), the app hasn't been updated in a while, so I think I need to start enhancing it.
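
On the SELECT-versus-text-file question, the honest answer is usually "measure it on your own server". A sketch of how that could be done with `timeit`, assuming a one-row SQLite table and a small text file holding the same made-up summary string; the table, file names and helper functions are hypothetical.

```python
import os
import sqlite3
import tempfile
import timeit

workdir = tempfile.mkdtemp()  # scratch directory, not cleaned up here
db_path = os.path.join(workdir, "lotto.db")
txt_path = os.path.join(workdir, "latest.txt")

# Option 1: the aggregated result lives in a database row.
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE latest (id INTEGER PRIMARY KEY, summary TEXT)")
conn.execute("INSERT INTO latest VALUES (1, '3,17,24')")
conn.commit()

# Option 2: the same result lives in a small text file.
with open(txt_path, "w", encoding="utf-8") as f:
    f.write("3,17,24")


def read_from_db():
    return conn.execute("SELECT summary FROM latest WHERE id = 1").fetchone()[0]


def read_from_file():
    with open(txt_path, encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    for fn in (read_from_db, read_from_file):
        seconds = timeit.timeit(fn, number=10_000)
        print(f"{fn.__name__}: {seconds:.4f}s for 10,000 reads")
```

The numbers will vary with the OS file cache, the database engine and whether a connection is reused per request, so treat the output as a guide for one setup rather than a general verdict.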


  • Registered Users, Registered Users 2 Posts: 5,246 ✭✭✭conor.hogan.2


    A DB's sole purpose is the storage and retrieval of data. I would always use one when I want to store and retrieve data, apart from configuration files and the like.


  • Registered Users, Registered Users 2 Posts: 4,857 ✭✭✭shootermacg


    Not able to actually download the app at the moment, but I presume you are saving each draw somewhere. Database is definitely the way to go.

    As far as features go, would a weekly number frequency add some interest?
    If you are saving the date along with the numbers and you have a few years' data, then you could have the numbers which were the most frequent for this week last year.
    Ah well you asked ^ ^, could be the summer heat affecting the balls and all that haha!


  • Registered Users, Registered Users 2 Posts: 7,893 ✭✭✭The_B_Man


    shootermacg wrote: »
    Not able to actually download the app at the moment, but I presume you are saving each draw somewhere. Database is definitely the way to go.

    As far as features go, would a weekly number frequency add some interest?
    If you are saving the date along with the numbers and you have a few years' data, then you could have the numbers which were the most frequent for this week last year.
    Ah well you asked ^ ^, could be the summer heat affecting the balls and all that haha!

    Well, there's an overall frequency, and percentages etc. in there already. You can see how many times a number has come up, and what that works out to as a percentage. I don't think people would be interested in the stats from last year, since they'd be very outdated.
    The only feature I have in mind at the moment is to add a button to retrieve this week's numbers. There are whole apps dedicated to doing that, but they don't have my stats stuff in them.

