
So what happened this time?


Comments

  • Registered Users Posts: 17,797 ✭✭✭✭hatrickpatrick


    Jaysus lads, I thought I'd been sitebanned without explanation.
    Given that I do things on a daily basis which would definitely justify such a ban, this incident unnerved me considerably. :D


  • Registered Users Posts: 3,745 ✭✭✭laugh


    Do you just have one big unsharded schema?

    How many read DBs do you guys use?


  • Closed Accounts Posts: 8,840 ✭✭✭Dav


    Tonight's kudos go to Chris, Colm, Conor and Alvis.

    We're flipping all the switches back to where they were this morning and we'll continue to monitor it for the next couple of hours.

    Our servers sit in Digiweb in Blanchardstown, there is no way I could have poured anything on them and I don't drink coffee :p


  • Posts: 0 [Deleted User]


    Seriously bad luck to have both disks in a RAID 1 fail at the same time. :(

    Well done for getting everything up and running again.


  • Moderators, Education Moderators Posts: 21,730 Mod ✭✭✭✭entropi


    GJ guys! Managed to work hard at it again to return us back to relative normality. Kudos :)


  • Moderators, Entertainment Moderators, Social & Fun Moderators Posts: 14,009 Mod ✭✭✭✭wnolan1992


    Well done to the tech team again. Certainly earning their paycheques this week. :pac:


    FYI, the Talk To... fora have reverted to the old style instead of the new swanky style.


  • Registered Users Posts: 17,399 ✭✭✭✭r3nu4l


    Dav wrote: »
    ...and I don't drink coffee :p
    Yeah, I'm gonna have to ask you to hand back your nerd badge. Sorry it had to come to this :(

    :pac:


    Fair play to one and all involved. Thanks for the hard work and effort :)


  • Registered Users Posts: 51,054 ✭✭✭✭Professey Chin


    Good work guys :)
    Horrible luck with the disks but nice to be back!


  • Registered Users Posts: 33,257 ✭✭✭✭Princess Consuela Bananahammock


    Dav wrote: »
    You go all year with no outages...

    What, 14 days?

    Everything I don't like is either woke or fascist - possibly both - pick one.



  • Moderators, Technology & Internet Moderators Posts: 4,621 Mod ✭✭✭✭Mr. G


    In fairness, it's unexpected and very rare for both disks to fail. Fair play for getting it all back up and running.


  • Moderators, Technology & Internet Moderators Posts: 4,621 Mod ✭✭✭✭Mr. G


    entropi wrote: »
    GJ guys! Managed to work hard at it again to return us back to relative normality. Kudos :)

    Here's for another Sheldon pic :D

    [image: sheldon-cooper-7.jpg]


  • Moderators, Motoring & Transport Moderators Posts: 6,521 Mod ✭✭✭✭Irish Steve


    Dav wrote: »
    You go all year with no outages and then 2 come along within 10 days.

    So what happened today?

    First of all, it wasn't me! :D
    Ya reckon?:D:D

    It was all those 40 million messages being put into one bucket last week.....

    polished all the oxide off the surface of the discs, it was only a matter of time.....:P

    Seriously, that was not good news, though it makes the MTBF concept of new discs a little challenging; they're not supposed to fail quite that close together. Any danger that it was power supply related, rather than mechanical? That could take several drives out at the same time.

    Whatever, well done to get it back that quickly.

    Shore, if it was easy, everybody would be doin it.😁



  • Registered Users Posts: 9,414 ✭✭✭irishgeo


    Are boards not using SSD drives?


  • Subscribers Posts: 4,075 ✭✭✭IRLConor


    laugh wrote: »
    Do you just have one big unsharded schema?

    I don't know about now, but as of 2 years ago sharding the boards.ie database would have been hilariously difficult to do. Pretty much every page served joined against the post table which accounts for the majority of the data. It's quite tricky to identify an axis along which the post table could be efficiently sharded without either rewriting large swaths of the code or creating maintenance nightmares.

    Ross and I learned a lot about sharding the data when Ross was building the search system and that was a much simpler schema, with no joins and no legacy code to convert.
    laugh wrote: »
    How many read DBs do you guys use?

    Two years ago it was one master and two slaves.
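    The sharding difficulty IRLConor describes can be sketched in a few lines. This is a hypothetical illustration, not boards.ie code (names and shard count are made up): shard the post table by thread id and a thread's posts stay together, but any query on another axis, such as a user's post history, has to fan out to every shard.

    ```python
    # Hypothetical sketch: routing posts to shards by thread id.
    NUM_SHARDS = 4

    def shard_for_thread(thread_id: int) -> int:
        """Posts in one thread land on one shard, so a thread page hits one DB."""
        return thread_id % NUM_SHARDS

    def shards_for_user_history(user_id: int) -> list[int]:
        """A user's posts are scattered across threads, so this query
        must fan out to every shard and merge the results."""
        return list(range(NUM_SHARDS))
    ```

    Every join against the post table on a non-thread axis pays the same fan-out cost, which is why avoiding it would mean rewriting large swaths of code.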


  • Registered Users Posts: 20,830 ✭✭✭✭Taltos


    Hi guys.

    When someone gets a chance can you please re-open the "Separation & Divorce" forum? Currently marked as closed.

    Cheers.


  • Registered Users Posts: 4,759 ✭✭✭cython


    Karsini wrote: »
    Seriously bad luck to have both disks in a RAID 1 fail at the same time. :(

    Well done for getting everything up and running again.

    Definitely. I presume that the possibility of a controller issue resulting in an earlier failure somehow not being reported has been ruled out? I've seen a lot stranger happen with RAID 1, to be fair, such as one of the disks being weeks out of date and suddenly being switched over to as the read source. It resulted in the (temporary) apparent loss of all data entered in the meantime until it could be identified that the disks had been out of sync, and the up to date one was still working, just not in use.


  • Registered Users Posts: 1,012 ✭✭✭route66


    Dav wrote: »
    One of the database slaves had a major failure with its hard disks. Before anyone asks, yes they were in RAID (1 to be exact), but both disks failed. It's rare that your redundancy fails at the same time as the main device, but not unheard of.

    For both disks to fail at the same time would be - I guess - a "winning the lotto" type chance.

    More common would be a failed shared component - a backplane, a disk controller, a cable, etc. If this is the case, then the failure may come back. :eek:

    Another common scenario with RAID 1 is that one disk (or bank of disks) fails, goes unnoticed/unreported, then the other disk fails - BANG!

    Must go now and check my Lotto numbers ;)


  • Boards.ie Employee Posts: 12,597 ✭✭✭✭✭Boards.ie: Niamh
    Boards.ie Community Manager


    Taltos wrote: »
    Hi guys.

    When someone gets a chance can you please re-open the "Separation & Divorce" forum? Currently marked as closed.

    Cheers.
    Alvis has re-opened that now :)


  • Registered Users Posts: 18,524 ✭✭✭✭kippy


    Mr. G wrote: »
    In fairness, it's unexpected and very rare for both disks to fail. Fair play for getting it all back up and running.

    They usually don't, all right.
    What tends to happen is one disk fails; there is then more pressure on the remaining disk, and it fails too within a short enough period of time. So it's critical to know as soon as possible that one disk has failed, in order to replace it before things get more awkward!
    Been caught like that myself in the past on a RAID 5.

    Well done on sorting it.
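    kippy's point, that losing the array usually means one failure followed quickly by a second rather than a truly simultaneous pair, can be put in rough numbers. The rates below are made up purely for illustration, and the independence assumption is generous: real disks from the same batch, same age and same workload fail in correlated ways, which makes the true odds worse.

    ```python
    # Back-of-the-envelope odds of RAID 1 array loss, with made-up numbers.
    annual_failure_rate = 0.03   # assumed per-disk chance of failing in a year
    rebuild_days = 3             # assumed window to replace a failed disk

    p_first_fails = annual_failure_rate
    # Chance the surviving disk also fails inside the replacement window,
    # assuming (unrealistically) independent failures.
    p_second_in_window = annual_failure_rate * (rebuild_days / 365)
    p_array_loss = p_first_fails * p_second_in_window
    print(f"~{p_array_loss:.2e} per year under independence")
    ```

    The punchline is that the small number above depends entirely on spotting the first failure quickly; an unnoticed dead mirror turns the window from days into months.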


  • Registered Users Posts: 1,012 ✭✭✭route66


    kippy wrote: »
    They usually don't, all right.
    What tends to happen is one disk fails; there is then more pressure on the remaining disk, and it fails too within a short enough period of time. So it's critical to know as soon as possible that one disk has failed, in order to replace it before things get more awkward!
    Been caught like that myself in the past on a RAID 5.

    Well done on sorting it.

    With RAID 1, if a disk or bank of disks fail, the remaining healthy one(s) just continue to do their normal work; the extra copy of data just doesn't get written anywhere.

    The exception is read activity on a RAID 1 setup: many make use of both sides of the setup to reduce read time. If there is a failure, then this extra efficiency is no longer available but I would expect this to just increase read time rather than causing the remaining healthy one(s) to die!

    RAID 5 is completely different with data and parity data being written across all disks in the array.
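    The RAID 1 behaviour route66 describes can be shown with a toy mirror. A minimal sketch, not a real RAID driver: writes go to every healthy disk, reads are served from any healthy one, so losing a disk costs redundancy and read parallelism but not data.

    ```python
    # Toy RAID 1 mirror: each "disk" is a dict of block -> data.
    class Raid1:
        def __init__(self, n_disks: int = 2):
            self.disks = [dict() for _ in range(n_disks)]
            self.healthy = [True] * n_disks

        def write(self, block: int, data: bytes) -> None:
            # The same data is mirrored to every healthy disk.
            for disk, ok in zip(self.disks, self.healthy):
                if ok:
                    disk[block] = data

        def read(self, block: int) -> bytes:
            # Any healthy disk can serve the read.
            for disk, ok in zip(self.disks, self.healthy):
                if ok and block in disk:
                    return disk[block]
            raise OSError("all mirrors failed")  # both disks gone: data loss

        def fail(self, i: int) -> None:
            self.healthy[i] = False
    ```

    Note the nastier failure mode lurking here: if a mirror silently drops out and is later read from without resyncing, it serves stale data, which is exactly the out-of-sync scenario cython describes earlier in the thread.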


  • Registered Users Posts: 22,646 ✭✭✭✭Sauve


    NNNEEERRRRRDDDSSSS :D

    (Sorry :p)


  • Registered Users Posts: 6,771 ✭✭✭knucklehead6


    Sauve wrote: »
    NNNEEERRRRRDDDSSSS :D

    (Sorry :p)


    says the mod of 5 different forums.....

    Pot.. kettle..... :p


  • Moderators, Category Moderators, Arts Moderators, Business & Finance Moderators, Entertainment Moderators, Society & Culture Moderators Posts: 18,279 CMod ✭✭✭✭Nody


    route66 wrote: »
    For both disks to fail at the same time would be - I guess - a "winning the lotto" type chance.
    If you want to talk lotto numbers, try this one on for size (yes it happened, yes I was impacted by it, as were several hospitals etc., and I got the official and unofficial reports from the event).

    A country site is due to go through its yearly emergency power test. The site is wired with 3 pairs of batteries (one pair is enough to run it for 30 min) and two diesel generators (one enough to power the whole site). Batteries are always on but only kick in if power gets broken; generators are set to be running within 1 min of power being cut.

    Power is cut as planned at 1am local and wham, all three pairs of batteries fail AND both generators refuse to start. Every single server and fibre connection goes down, inc. every MUX at customer sites etc. losing sync.

    Oh happy happy days (it took over 8h to get everything up and running once the main power was turned on again)...


  • Registered Users Posts: 1,012 ✭✭✭route66


    Nody wrote: »
    If you want to talk lotto numbers, try this one on for size (yes it happened, yes I was impacted by it, as were several hospitals etc., and I got the official and unofficial reports from the event).

    A country site is due to go through its yearly emergency power test. The site is wired with 3 pairs of batteries (one pair is enough to run it for 30 min) and two diesel generators (one enough to power the whole site). Batteries are always on but only kick in if power gets broken; generators are set to be running within 1 min of power being cut.

    Power is cut as planned at 1am local and wham, all three pairs of batteries fail AND both generators refuse to start. Every single server and fibre connection goes down, inc. every MUX at customer sites etc. losing sync.

    Oh happy happy days (it took over 8h to get everything up and running once the main power was turned on again)...

    Ooops ...


  • Registered Users Posts: 10,758 ✭✭✭✭TeddyTedson


    Why don't you guys just delete all the threads older than 5 years. They're Zombie threads and you'd save space on your hard drive :)


  • Moderators, Social & Fun Moderators, Society & Culture Moderators Posts: 30,873 Mod ✭✭✭✭Insect Overlord


    TeddyTedson wrote: »
    Why don't you guys just delete all the threads older than 5 years. They're Zombie threads and you'd save space on your hard drive :)

    Don't be silly. The social history, sense of community and plain old hilarity of some of the old content are what make this site so great. :)


  • Closed Accounts Posts: 31,967 ✭✭✭✭Sarky


    Dav wrote: »
    Our servers sit in Digiweb in Blanchardstown, there is no way I could have poured anything on them and I don't drink coffee :p

    Well obviously if you drank coffee you'd have none to pour over the servers. J'ACCUSE!


  • Registered Users Posts: 44 damned_junkie


    IRLConor wrote: »
    Ross and I learned a lot about sharding the data when Ross was building the search system and that was a much simpler schema, with no joins and no legacy code to convert.

    The sharded search set-up was ultimately ditched in favour of a single index with replication onto a second machine. Turns out the overhead from sharding is way higher than the gain from smaller indexes. AFAIK the second machine is a cold standby these days. The relatively low query load means you get better cache performance with a single node handling all the queries than with two nodes handling half each.

    But yeah sharding the post table... I put a lot of noodle scratching into that one too, there's no clear way to do it easily. Chucking RAM and faster disks at it will probably keep it working for another few years though!


  • Closed Accounts Posts: 8,840 ✭✭✭Dav


    TeddyTedson wrote: »
    Why don't you guys just delete all the threads older than 5 years. They're Zombie threads and you'd save space on your hard drive :)

    It's nothing to do with space, actually. The totality of the posts table is about 25GB, PMs take about 10GB (I think) and attachments are running around 15GB.

    So it's not huge volumes of data by any stretch of the imagination, but since the posts and PMs tables are plain text, that is a vast amount of information to be processed at any given time.

    For those of you who don't know, 1 character of plain ASCII text = 1 byte.
    1 Kilobyte = 1024 characters.
    1 Gigabyte = approx 1 billion characters

    So the boards post table contains about 25 billion characters of text.

    Imagine now that you have to try and work with all that in some meaningful way and you'll understand why our databases are so difficult to work with.
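    Dav's arithmetic checks out; as a quick sanity check of the figures above (1 byte per plain ASCII character, 25GB posts table):

    ```python
    # Sanity-checking the characters-per-table arithmetic from the post above.
    BYTES_PER_CHAR = 1      # plain ASCII: one character per byte
    KB = 1024               # 1 kilobyte = 1024 bytes
    GB = 1024 ** 3          # 1 gigabyte = 1,073,741,824 bytes

    posts_table_bytes = 25 * GB
    chars = posts_table_bytes // BYTES_PER_CHAR
    print(f"{chars:,} characters")  # ~26.8 billion, i.e. "about 25 billion"
    ```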

    Besides which, deleting the history of the site seems abhorrent to me. The notion of the thoughts, ideas, discussions, nonsense etc. of the many thousands of people who've used the site over the years just being gone fills me with a sense of dread and "wrongness" that I can't put into words.


  • Closed Accounts Posts: 16,396 ✭✭✭✭kaimera


    Still on PHP & vB, Dav?

    Do you have stats on how often old or 'archived' data is accessed? (What's considered old by the team?)

    By users and/or unregs? (Is it Google searches bringing views to threads, or current users searching boards?)

    Can anything over 3/5 years (example) be shunted off to a separate disk as 'archive' and given RO perms? (Save zombie threads being dug up, for eg.)

