
Suggestions for system checks

  • 24-06-2008 8:52am
    #1
    Registered Users, Registered Users 2 Posts: 23,212 ✭✭✭✭


    I am an Oracle DBA (allegedly) and yesterday was presented with a problem on a client's Linux machine that had me confused to say the least.

    I won't bore you with the DBA stuff, but I eventually found that the system had gone a bit funny and a process called "SCSI_eh3" was taking up 100% of the cpu, hence why my database wasn't responding. Anyway, it was fixed with the old sledgehammer approach, init 6.

    Which got me thinking - I have the Database morning checks scripted, I was thinking of scripting some form of OS morning checks. I already have a script checking disk space, but I was wondering what else I should be checking for on a daily basis? I have full root access to the machines (there are five of them).

    All suggestions greatly welcomed, but remember, I am far from a Unix expert.


Comments

  • Registered Users, Registered Users 2 Posts: 1,606 ✭✭✭djmarkus


    uptime will tell you the load average, or you can use tools like vmstat and iostat to tell you what load the machine is under.
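
    A minimal sketch of turning the load average into a scripted check, assuming a Linux box that exposes /proc/loadavg. The threshold of 4.0 and the function names are made up for illustration; a sensible limit depends on how many CPUs the machine has.

```shell
#!/bin/sh
# Sketch only: compare the 1-minute load average against a limit.
# THRESHOLD and the function names are illustrative assumptions.

# Succeed (exit status 0) when load $1 is greater than limit $2.
load_over() {
    awk -v load="$1" -v max="$2" 'BEGIN { exit !(load + 0 > max + 0) }'
}

# Print the 1-minute load average, the first field of /proc/loadavg.
current_load() {
    cut -d ' ' -f 1 "${1:-/proc/loadavg}"
}

THRESHOLD=4.0
if [ -r /proc/loadavg ]; then
    load=$(current_load)
    if load_over "$load" "$THRESHOLD"; then
        echo "WARNING: load average $load is above $THRESHOLD"
    else
        echo "OK: load average is $load"
    fi
fi
```

    awk does the comparison because the shell's own test command only handles integers.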


  • Registered Users, Registered Users 2 Posts: 16,288 ✭✭✭✭ntlbell


    This is going to be one of those "it depends" answers.

    Firstly, forget about "morning scripts": if the DB is important to you, you will want to know straight away if performance is below par.

    Secondly, consider the role of the box. In this case it's a DB server, so what's important to you? Disk speed? Reads/writes? Memory usage? CPU states? Network performance? Work out whatever you rely on most for your DB to be at peak performance for the majority of the time, and then look into the tools.

    You can get some all-in-one wonder tools that claim to do all of the above. They run at regular intervals, every 15 minutes for example, and email you when something isn't meeting your criteria, for example when the CPU has been running over 80% for more than 5 minutes.

    If you work in a college or have Linux admins around you, it might be an idea to have a chat with them and see what they monitor.


    But as has been mentioned, things like top/iostat/vmstat will all be very helpful.
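
    As a rough sketch of the kind of CPU criterion such a tool applies, the script below estimates how busy the CPU was over a short interval by sampling /proc/stat twice. The 80% limit comes from the example above; the 1-second interval and the function names are assumptions, and it measures one interval only rather than a sustained 5 minutes.

```shell
#!/bin/sh
# Sketch only: estimate CPU busy % over a short interval from /proc/stat.
# LIMIT, INTERVAL and the function names are illustrative assumptions.

# Print "total idle" tick counts from the aggregate "cpu" line.
# Fields 2-9 are user..steal; idle here means idle + iowait (fields 5 and 6).
cpu_sample() {
    awk '/^cpu / {
        total = 0
        for (i = 2; i <= NF && i <= 9; i++) total += $i
        printf "%.0f %.0f\n", total, $5 + $6
    }' "${1:-/proc/stat}"
}

# Busy percentage between two samples; arguments: total1 idle1 total2 idle2.
cpu_busy_pct() {
    awk -v t1="$1" -v i1="$2" -v t2="$3" -v i2="$4" 'BEGIN {
        dt = t2 - t1
        di = i2 - i1
        if (dt <= 0) { print 0; exit }
        printf "%.0f\n", 100 * (dt - di) / dt
    }'
}

LIMIT=80
INTERVAL=1
if [ -r /proc/stat ]; then
    first=$(cpu_sample)
    sleep "$INTERVAL"
    second=$(cpu_sample)
    # Unquoted on purpose: each sample expands to two arguments.
    busy=$(cpu_busy_pct $first $second)
    if [ "$busy" -gt "$LIMIT" ]; then
        echo "WARNING: CPU ${busy}% busy over the last ${INTERVAL}s"
    else
        echo "OK: CPU ${busy}% busy"
    fi
fi
```

    Counting iowait as idle is a design choice here; on a DB server you may prefer to report it separately, since high iowait usually points at the disks rather than the CPU.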


  • Registered Users, Registered Users 2 Posts: 564 ✭✭✭fishfoodie


    Tom Dunne wrote: »
    I am an Oracle DBA (allegedly) and yesterday was presented with a problem on a client's Linux machine that had me confused to say the least.

    I won't bore you with the DBA stuff, but I eventually found that the system had gone a bit funny and a process called "SCSI_eh3" was taking up 100% of the cpu, hence why my database wasn't responding. Anyway, it was fixed with the old sledgehammer approach, init 6.

    Which got me thinking - I have the Database morning checks scripted, I was thinking of scripting some form of OS morning checks. I already have a script checking disk space, but I was wondering what else I should be checking for on a daily basis? I have full root access to the machines (there are five of them).

    All suggestions greatly welcomed, but remember, I am far from a Unix expert.


    Get your client to check out this box immediately: when a SCSI-related process starts hogging the CPU, it can mean that there is an underlying H/W issue. Many moons ago I ignored a similar situation on an HP box, only to find that a controller was steadily dying and was dumping endless errors, hence the CPU usage. Result: dead controller and corrupt disks.

    No smoke without fire & all that...!
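
    One way to catch that kind of smoke in a daily check is to count suspicious lines in the kernel log. This is only a sketch: the patterns below are illustrative guesses at what disk or controller trouble looks like, not a complete list, and the exact wording varies by kernel and driver.

```shell
#!/bin/sh
# Sketch only: count kernel log lines that look like disk or SCSI trouble.
# The pattern list is an illustrative assumption, not a complete catalogue.

# Read log text on stdin and print the number of suspicious lines.
count_disk_errors() {
    grep -i -c -E 'I/O error|SCSI error|medium error' || true
}

if command -v dmesg >/dev/null 2>&1; then
    errors=$(dmesg 2>/dev/null | count_disk_errors)
    if [ "${errors:-0}" -gt 0 ]; then
        echo "WARNING: $errors suspicious disk/SCSI lines in the kernel log"
    else
        echo "OK: no disk/SCSI errors found in the kernel log"
    fi
fi
```

    The same function can be fed from a log file instead of dmesg, since it only reads standard input.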


  • Registered Users, Registered Users 2 Posts: 2,694 ✭✭✭Dingatron


    I use Nagios for some real-time alerts. You can set up alerts to email you if certain conditions are met etc. Loads of plugins are available. If you're talking 5 machines it may not be worth the bother, but all the same I find it invaluable for our particular set-up.


  • Registered Users, Registered Users 2 Posts: 23,212 ✭✭✭✭Tom Dunne


    Thanks for the feedback, but these guys aren't really interested in real-time monitoring.

    In fact, I doubt they actually have any kind of monitoring, bar the DB checks I do.

    The DB is pretty basic - read-only, apart from a nightly update, 30-40 users doing basic queries through a front-end, plus they are in pairs, so if one goes pear-shaped, there is an auto-failover of sorts to the other.

    To give you an idea of what they are like, while I was on the phone to one of their support people in the actual computer room, standing beside the machine, he asked me "They run like, some mad Unix operating system, don't they?". :D


  • Registered Users, Registered Users 2 Posts: 2,694 ✭✭✭Dingatron


    Tom Dunne wrote: »
    "They run like, some mad Unix operating system, don't they?". :D


    Ha, I know what you're talking about, sounds very familiar :) I'm sure their DR policy would make some fine reading so.


    ***DR policy? Scratches head :confused:***


  • Registered Users, Registered Users 2 Posts: 16,288 ✭✭✭✭ntlbell


    Tom Dunne wrote: »
    Thanks for the feedback, but these guys aren't really interested in real-time monitoring.

    In fact, I doubt they actually have any kind of monitoring, bar the DB checks I do.

    The DB is pretty basic - read-only, apart from a nightly update, 30-40 users doing basic queries through a front-end, plus they are in pairs, so if one goes pear-shaped, there is an auto-failover of sorts to the other.

    To give you an idea of what they are like, while I was on the phone to one of their support people in the actual computer room, standing beside the machine, he asked me "They run like, some mad Unix operating system, don't they?". :D

    Just set up a cron job to mail someone in the morning stating

    "everything's grand"

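    A slightly more useful take on that cron job, as a minimal sketch: report that everything's grand only when the disk check passes, and list the problems otherwise. The 90% limit, the mail address and the script path in the cron line are all made up for illustration.

```shell
#!/bin/sh
# Sketch only: a morning report that says all is well or lists problems.
# DISK_LIMIT and the script path in the cron line are illustrative.
#
# Example crontab entry (cron mails any output to the MAILTO address):
#   MAILTO=dba@example.com
#   0 7 * * * /usr/local/bin/morning_checks.sh

# Read `df -P` output on stdin; report filesystems fuller than $1 percent.
disk_problems() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 > limit) print "Disk " $6 " is " use "% full"
    }'
}

DISK_LIMIT=90
problems=$(df -P 2>/dev/null | disk_problems "$DISK_LIMIT")
if [ -z "$problems" ]; then
    echo "Everything's grand on $(uname -n)"
else
    echo "Problems found on $(uname -n):"
    echo "$problems"
fi
```

    Other checks can append their own lines to the problems list in the same way, so the mail stays a one-liner on a good day.
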

  • Posts: 5,589 ✭✭✭ [Deleted User]


    To hijack this thread -

    Is there a script that I can get which will show me the CPU usage, RAM usage and what's using it in Linux?


  • Registered Users, Registered Users 2 Posts: 2,694 ✭✭✭Dingatron


    Have a look at top

    http://www.linuxforums.org/misc/using_top_more_efficiently.html

    You can probably create a script grep'ing out what you want. I just use a plugin in nagios to check cpu load.
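
    If you would rather not parse top, here is a rough sketch that gets similar information from /proc/meminfo and ps. The function names are made up, and the --sort option assumes the procps version of ps found on most Linux distributions.

```shell
#!/bin/sh
# Sketch only: show memory use and the busiest processes.
# Function names are illustrative; --sort assumes the procps version of ps.
# Note: ps reports %CPU averaged over each process's lifetime, unlike top.

# Print total and free memory in MB from a /proc/meminfo-style file.
# MemFree does not count memory used for buffers and cache.
mem_summary() {
    awk '/^MemTotal:/ { total = $2 }
         /^MemFree:/  { free = $2 }
         END { printf "Memory: %d MB total, %d MB free\n", total / 1024, free / 1024 }' "${1:-/proc/meminfo}"
}

# Print the header plus the N processes using the most CPU (default 5).
top_by_cpu() {
    ps -eo pid,user,pcpu,pmem,comm --sort=-pcpu | head -n "$(( ${1:-5} + 1 ))"
}

# Print the header plus the N processes using the most memory (default 5).
top_by_mem() {
    ps -eo pid,user,pcpu,pmem,comm --sort=-pmem | head -n "$(( ${1:-5} + 1 ))"
}

if [ -r /proc/meminfo ]; then
    mem_summary
fi
if command -v ps >/dev/null 2>&1; then
    top_by_cpu 5
    top_by_mem 5
fi
```

    For a one-off snapshot from top itself, top -b -n 1 runs it in batch mode so the output can be piped to grep or awk.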


  • Registered Users, Registered Users 2 Posts: 2,694 ✭✭✭Dingatron


    To hijack this thread -

    Is there a script that I can get which will show me the CPU usage, RAM usage and what's using it in Linux?


    Also came across this today which may help you regarding top


    http://www.howtoforge.com/extract-values-from-top-and-plot-them


  • Closed Accounts Posts: 1,506 ✭✭✭Jackz


    Here's something to gather info about a Linux system. It's a bit old (may need some commands updated). I used it in the past when we had a new customer, to get an easy-to-read overview of their systems. I got it from somewhere on the web. It's not performance monitoring, but it has nice HTML output.

    Where I worked we used vmstat, iostat, top etc. and awked the output.

    With the output:

    - We checked whether a value was over the threshold for 2 or 3 consecutive checks, and raised an alert on our monitoring system (PHP with an Oracle backend) if it was.

    - We sent stats on a constant basis to our central stats DBs for web-based and PDF reports for our customers.
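
    The consecutive-check rule in the first point could be sketched roughly like this. It is not the system described above: the state file location, the limit of 90%, the count of 3 and the function names are all assumptions, and the reading used is simply the percentage of the root filesystem in use.

```shell
#!/bin/sh
# Sketch only: alert after several consecutive checks over a threshold.
# LIMIT, NEEDED, the state file path and function names are illustrative.

# Update the breach counter kept in file $3 for reading $1 against limit $2
# and print the new count. A reading at or below the limit resets it to 0.
update_breaches() {
    ub_count=$(cat "$3" 2>/dev/null)
    ub_count=${ub_count:-0}
    if [ "$1" -gt "$2" ]; then
        ub_count=$((ub_count + 1))
    else
        ub_count=0
    fi
    echo "$ub_count" > "$3"
    echo "$ub_count"
}

# Print the percentage used on the filesystem holding path $1.
disk_used_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

LIMIT=90
NEEDED=3
STATE="${TMPDIR:-/tmp}/root_disk_breaches.count"

used=$(disk_used_pct /)
if [ -n "$used" ]; then
    count=$(update_breaches "$used" "$LIMIT" "$STATE")
    if [ "$count" -ge "$NEEDED" ]; then
        # A real check might pipe this line to a mail command instead.
        echo "ALERT: / has been over ${LIMIT}% for $count checks in a row"
    fi
fi
```

    Keeping the counter in a small file means the script can be run from cron with no database or daemon, and a single spike never triggers an alert on its own.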

    I've been testing out GroundWork/Nagios to use for this kind of monitoring and statistics recording in the future, but a simplified version of CPU, memory, I/O and disk space monitoring for Linux, Solaris etc. would be handy, just raising a simple e-mail alert. Not real monitoring, just some info if a threshold is breached. Lightweight, with no DB external to the customer system.

    #!/bin/bash
    # Description: Linux System Information Report in HTML format
    # Usage: script-name > /root/output.html
    
    cat << HEAD
    <HTML>
    <HEAD><TITLE>System Report</TITLE></HEAD>
    <BODY bgcolor="#c0c0c0" text="#000000">
    <HR size=5>
    <P>
    <H1>System Report for "
    HEAD
    echo `hostname`
    
    cat << BODY1
    "</H1>
    <P>
    <HR size=5>
    <P>
    BODY1
    
    cat << HOSTNAME
    <H3>OS System Configuration:</H3>
    <B>hostname: </B> 
    HOSTNAME
    /bin/hostname
    
    cat << RELEASE
    <P>
    <B>OS Release: </B> 
    RELEASE
    /bin/cat /etc/*-release
    
    cat << HOSTID
    <P>
    <B>hostid: </B> 
    HOSTID
    /usr/bin/hostid
    
    cat << UNAMEO
    <P>
    <B>Kernel OS: </B>
    UNAMEO
    uname --operating-system
    
    cat << UNAMER
    <P>
    <B>Kernel release: </B>
    UNAMER
    uname --kernel-release
    
    cat << UNAMEV
    <P>
    <B>Kernel version: </B>
    UNAMEV
    uname --kernel-version
    
    cat << UNAMEK
    <P>
    <B>Hardware Platform: </B>
    UNAMEK
    uname --hardware-platform
    
    cat << UNAMEPM
    <P>
    <B>Processor Architecture: </B>
    UNAMEPM
    uname --processor
    
    cat << CHKCONFIG
    <P>
    <B>System services: (chkconfig)</B> 
    <DL><DD>
    <SMALL>
    <PRE>
    CHKCONFIG
    /sbin/chkconfig --list|grep on
    cat << CHKCONFIGEND
    </PRE>
    </SMALL>
    </DL>
    <P>
    CHKCONFIGEND
    
    cat << CRONTAB
    <P>
    <B>File: <TT>/etc/crontab</TT></B> 
    <DL><DD>
    <SMALL>
    <PRE>
    CRONTAB
    cat /etc/crontab
    cat << CRONTABEND
    </PRE>
    </SMALL>
    </DL>
    <P>
    CRONTABEND
    
    echo "<P><HR><P>"
    echo "<H3>Network Configuration:</H3>"
    
    cat << HOSTS
    <P>
    <B>File: <TT>/etc/hosts</TT>: </B> 
    <DL><DD>
    <SMALL>
    <PRE>
    HOSTS
    cat /etc/hosts
    cat << HOSTSEND
    </PRE>
    </SMALL>
    </DL>
    <P>
    HOSTSEND
    
    cat << SWITCH
    <B>File: <TT>/etc/nsswitch.conf</TT>: </B> 
    <DL><DD>
    <SMALL>
    <PRE>
    SWITCH
    cat /etc/nsswitch.conf
    cat << SWITCHEND
    </PRE>
    </SMALL>
    </DL>
    <P>
    SWITCHEND
    
    cat << RESOLV
    <B>File: <TT>/etc/resolv.conf</TT>: </B> 
    <DL><DD>
    <SMALL>
    <PRE>
    RESOLV
    cat /etc/resolv.conf
    
    cat << RESOLVEND
    </PRE>
    </SMALL>
    </DL>
    <P>
    RESOLVEND
    
    cat << IFCONFIG
    <B>ifconfig: </B> 
    <DL><DD>
    <SMALL>
    <PRE>
    IFCONFIG
    /sbin/ifconfig
    cat << IFCONFIGEND
    </PRE>
    </SMALL>
    </DL>
    <P>
    IFCONFIGEND
    
    cat << ROUTE
    <B>/sbin/route: </B> 
    <DL><DD>
    <SMALL>
    <PRE>
    ROUTE
    /sbin/route
    cat << ROUTEEND
    </PRE>
    </SMALL>
    </DL>
    <P>
    ROUTEEND
    
    if [[ -r /etc/sysconfig/network ]];
    then
    cat << IFCFGN
    <B>Network Configuration File: <TT>/etc/sysconfig/network</TT>: </B> 
    <DL><DD>
    <SMALL>
    <PRE>
    IFCFGN
    cat /etc/sysconfig/network
    cat << IFCFGENDN
    </PRE>
    </SMALL>
    </DL>
    <P>
    IFCFGENDN
    fi
    
    cat << IFCFG
    <B>Files <TT>/etc/sysconfig/network-scripts/ifcfg-eth*</TT>: </B> 
    <DL><DD>
    <SMALL>
    <PRE>
    IFCFG
    cat /etc/sysconfig/network-scripts/ifcfg-eth*
    cat << IFCFGEND
    </PRE>
    </SMALL>
    </DL>
    <P>
    IFCFGEND
    
    echo "<P><HR><P>"
    
    if [[ -r /etc/mail/local-host-names || -r /etc/sendmail.cw || -r /etc/aliases || -r /etc/mail/virtusertable ]];
    then
    echo "<H3>Mail Server Configuration:</H3>"
    
    if [[ -r /etc/mail/local-host-names ]];
    then
    # Redhat 7.1 - Fedora Core X
    cat << SENMAILCFGN2
    <B>Mail Hosts File: <TT>/etc/mail/local-host-names</TT>: </B> 
    <DL><DD>
    <SMALL>
    <PRE>
    SENMAILCFGN2
    cat /etc/mail/local-host-names
    cat << SENMAILCFGN2
    </PRE>
    </SMALL>
    </DL>
    <P>
    SENMAILCFGN2
    
    elif [[ -r /etc/sendmail.cw ]];
    then
    # Redhat 6.x
    cat << SENMAILCFGN
    <B>Mail Hosts File: <TT>/etc/sendmail.cw</TT>: </B> 
    <DL><DD>
    <SMALL>
    <PRE>
    SENMAILCFGN
    cat /etc/sendmail.cw
    cat << SENMAILCFGN
    </PRE>
    </SMALL>
    </DL>
    <P>
    SENMAILCFGN
    fi
    
    if [[ -r /etc/mail/virtusertable ]];
    then
    cat << SENMAILCFGV
    <B>Sendmail Virtual Table File: <TT>/etc/mail/virtusertable</TT>: </B> 
    <DL><DD>
    <SMALL>
    <PRE>
    SENMAILCFGV
    cat /etc/mail/virtusertable
    cat << SENMAILCFGV
    </PRE>
    </SMALL>
    </DL>
    <P>
    SENMAILCFGV
    fi
    
    if [[ -r /etc/aliases ]];
    then
    cat << SENMAILCFGN
    <B>eMail Aliases File: <TT>/etc/aliases</TT>: </B> 
    <DL><DD>
    <SMALL>
    <PRE>
    SENMAILCFGN
    cat /etc/aliases
    cat << SENMAILCFGN
    </PRE>
    </SMALL>
    </DL>
    <P>
    SENMAILCFGN
    fi
    
    fi
    
    echo "<P><HR><P>"
    
    cat << DF
    <H3>Storage:</H3>
    <B>df -k: </B>
    <DL><DD>
    <SMALL>
    <PRE>
    DF
    df -k
    cat << DFEND
    </PRE>
    </SMALL>
    </DL>
    <P>
    DFEND
    
    cat << FDISK
    <B>Disk Partitions: <TT>/sbin/fdisk -l</TT>:</B>
    <DL><DD>
    <SMALL>
    <PRE>
    FDISK
    /sbin/fdisk -l
    cat << FDISKEND
    </PRE>
    </SMALL>
    </DL>
    <P>
    FDISKEND
    
    cat << FSTAB
    <B>File: <TT>/etc/fstab</TT>: </B> 
    <DL><DD>
    <SMALL>
    <PRE>
    FSTAB
    cat /etc/fstab
    cat << FSTABEND
    </PRE>
    </SMALL>
    </DL>
    FSTABEND
    
    echo "<P><HR><P>"
    
    cat << HARDWARE
    <H3>Hardware Configuration:</H3>
    <B>CPU info: </B> 
    <DL><DD>
    <SMALL>
    <PRE>
    HARDWARE
    cat /proc/cpuinfo
    
    cat << SWAP
    </PRE>
    </SMALL>
    </DL>
    <P>
    <B>Total Swap Memory: </B> 
    <DL><DD>
    SWAP
    grep SwapTotal: /proc/meminfo
    
    
    cat << MEM
    </DL>
    <P>
    <B>System Memory: </B> 
    <DL><DD>
    MEM
    grep MemTotal /proc/meminfo
    cat << MEMEND
    </DL>
    <P>
    MEMEND
    
    cat << PCI
    <B>/sbin/lspci: </B> 
    <DL><DD>
    <SMALL>
    <PRE>
    PCI
    /sbin/lspci
    cat << PCIEND
    </PRE>
    </SMALL>
    </DL>
    <P>
    PCIEND
    
    cat << HWCONF
    <B>Devices:</B>
    <DL><DD>
    <B>File: <TT>/etc/sysconfig/hwconf</TT></B> 
    <SMALL>
    <PRE>
    HWCONF
    cat /etc/sysconfig/hwconf
    cat << HWCONFEND
    </PRE>
    </SMALL>
    </DL>
    <P>
    HWCONFEND
    
    
    cat << BODYEND
    <P>
    <HR>
    <P>
    </BODY>
    </HTML>
    BODYEND
    
    

