Main.JeremyCothranNotesServer r1.6 - 29 Feb 2008 - 20:09 - JeremyCothran

Notes intended for internal reference on server issues.

Adding a listserv

As root user:

Copied the files of an existing listserv under /usr2/home/majordomo ('nodes', for example) to the new list name ('hypoxia', for example). Set permissions to match the other listservs. Reduced the email addresses in /usr2/home/majordomo/lists/hypoxia and /usr2/home/majordomo/lists/hypoxia_post to just myself.

[root]# ls -l /usr2/home/majordomo/lists/hypoxia*
-rw-r--r--    1 majordomo daemon         20 Jul  8 16:40 /usr2/home/majordomo/lists/hypoxia
-rw-r-----    1 majordomo daemon      16498 Jul  8 16:35 /usr2/home/majordomo/lists/hypoxia.config
-rw-r--r--    1 majordomo daemon         20 Jul  8 16:41 /usr2/home/majordomo/lists/hypoxia_post
-rw-r-----    1 majordomo daemon      16541 Jul  8 16:35 /usr2/home/majordomo/lists/hypoxia_post.config

/usr2/home/majordomo/lists/hypoxia.archive:
total 4
-rw-rw-r--    1 majordomo daemon       1660 Jul  8 17:07 hypoxia.0507

Used perl search and replace to swap references to 'nodes' to 'hypoxia'

[root]# perl -pi -e 's/nodes/hypoxia/g' hypox*
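
The copy and the search and replace can also be combined into one step. The following is only a sketch: clone_list is a hypothetical helper (not part of majordomo), it assumes the four file names shown in the listing above, and it uses sed on a copy instead of perl -pi so that the source list's files are never edited. The .archive directory is not handled.

```shell
#!/bin/sh
# clone_list SRC DST DIR
# Hypothetical helper: copy the list files of SRC to DST inside DIR and
# replace every occurrence of SRC with DST in the copies.
clone_list() {
    src=$1; dst=$2; dir=$3
    for f in "$dir/$src" "$dir/$src.config" \
             "$dir/${src}_post" "$dir/${src}_post.config"; do
        [ -f "$f" ] || continue          # skip files the source list lacks
        base=${f##*/}                    # file name without the directory
        new="$dir/$dst${base#"$src"}"    # same suffix, new list name
        cp -p "$f" "$new"                # keeps mode and timestamps; owner too when run as root
        sed "s/$src/$dst/g" "$f" > "$new"
    done
}

# Example (as root):
#   clone_list nodes hypoxia /usr2/home/majordomo/lists
```

As with the perl one-liner, the substitution is a plain text match, so it also rewrites 'nodes' where it appears inside longer words.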

Copied 'nodes' section of /etc/aliases file for 'hypoxia'

hypoxia:              "|/usr2/home/majordomo/wrapper resend -h caro-coops.org -l hypoxia hypoxia-outgoing",hypoxia-archive
hypoxia-archive:      "|/usr2/home/majordomo/wrapper archive2.pl -f /usr2/home/majordomo/lists/hypoxia.archive/hypoxia -m -a"
hypoxia-outgoing:     :include: /usr2/home/majordomo/lists/hypoxia
owner-hypoxia:        pseal
hypoxia-request:      "|/usr2/home/majordomo/wrapper request-answer hypoxia"
hypoxia-approval:     pseal
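
Instead of copying and editing the block by hand, the entries can be generated from the list name. This is a sketch only: print_list_aliases is a hypothetical helper, and the wrapper path, host name and entry layout are taken from the 'hypoxia' entries above (one alias per line). It only prints the block; appending it to /etc/aliases is left as a manual step.

```shell
#!/bin/sh
# print_list_aliases LIST OWNER
# Hypothetical helper: print the six alias entries for a majordomo list,
# following the layout of the 'hypoxia' entries in /etc/aliases.
print_list_aliases() {
    list=$1; owner=$2
    wrapper=/usr2/home/majordomo/wrapper
    lists=/usr2/home/majordomo/lists
    cat <<EOF
$list:              "|$wrapper resend -h caro-coops.org -l $list $list-outgoing",$list-archive
$list-archive:      "|$wrapper archive2.pl -f $lists/$list.archive/$list -m -a"
$list-outgoing:     :include: $lists/$list
owner-$list:        $owner
$list-request:      "|$wrapper request-answer $list"
$list-approval:     $owner
EOF
}

# Example: review the output, then add it to /etc/aliases and run newaliases.
#   print_list_aliases hypoxia pseal
```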

Refresh alias references so new aliases are active

[root]# newaliases

Under /var/www/cgi-bin/.wilma, copy the .cf and .rc files of a similar list and replace the 'nodes' references using the same perl search and replace

-rw-r--r--    1 root     root         3954 Jul  8 17:16 hypoxia.cf
-rw-r--r--    1 root     root         2862 Jul  8 17:16 hypoxia.rc

[root]# perl -pi -e 's/nodes/hypoxia/g' hypox*

Add archive directory folder at /var/www/html/misc/pub_listserv_archives/hypoxia

From /var/www/cgi-bin run

[root]# perl wilma_reindex hypoxia

It should now be possible to send a test message to the hypoxia listserv, be notified of the post, and find it in the listserv archives at http://caro-coops.org/cgi-bin/wilma/hypoxia

Making a listserv private

If the listserv policy is to be private:

  • change the subscribe policy to 'closed' in the .config file for the listserv (for example, in /usr2/home/majordomo/lists/hypoxia.config)

subscribe_policy    =   closed

  • change the remaining policy settings in the .config file that are set to 'open' to 'list'

  • remove the references for the archive listings .cf and .rc files under /var/www/cgi-bin/.wilma
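
The first step above can be scripted. This is a minimal sketch: close_list is a hypothetical helper that rewrites only the subscribe_policy line and keeps a .bak copy of the original config. The 'open' to 'list' changes and the .wilma cleanup are left manual, since they need a look at each setting.

```shell
#!/bin/sh
# close_list CONFIG
# Hypothetical helper: set subscribe_policy to 'closed' in a list .config
# file, keeping the original as CONFIG.bak.
close_list() {
    config=$1
    cp -p "$config" "$config.bak"
    # Keep the "subscribe_policy = " part (with its spacing), replace the value.
    sed 's/^\([[:space:]]*subscribe_policy[[:space:]]*=[[:space:]]*\).*$/\1closed/' \
        "$config.bak" > "$config"
}

# Example (as root):
#   close_list /usr2/home/majordomo/lists/hypoxia.config
```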

Cron schedule for removing old files

substitute cron schedule, find expression and number of minutes as needed

# Remove all NDBC data files not accessed in the last 24 hours (1440 minutes)
1 4 * * * find /usr2/prod/buoys/noaa/perl/data -maxdepth 1 -amin +1440 -exec rm -f {} \;
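
Before putting a new find expression into cron, it helps to see what it would match. The sketch below uses the same directory-depth and -amin tests as the entry above but prints the files instead of deleting them. list_stale is a hypothetical helper, and -type f is an addition (not in the cron entry) so that only regular files are listed. Note that -amin tests the last access time; -mmin would test the last modification time.

```shell
#!/bin/sh
# list_stale DIR MINUTES
# Hypothetical helper: print the regular files directly under DIR whose
# last access is more than MINUTES minutes ago. Nothing is deleted.
list_stale() {
    dir=$1; minutes=$2
    find "$dir" -maxdepth 1 -type f -amin +"$minutes" -print
}

# Example: preview what the cron entry would remove.
#   list_stale /usr2/prod/buoys/noaa/perl/data 1440
```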

Server/disk failure notes

Remember to watch nautilus and other systems for new/dead processes and locks that hang when nemo goes down. Check system loads with the 'top' command.

It is OK to hot swap a bay drive if its light is amber.

##begin
#don't need to perform the following steps, run danger of accidentally reconfiguring system
#try just regular reboot first to see what happens
#normal reboot takes around 10-20 minutes, if hard drives are completely rebuilt can take several hours

ctrl-a during bootup to get to scsi management
disk utilities - verify say 5% on each disk
look back at containers to see if any members missing, system should be scrubbing(rebuilding)

Feb 29, 2008 (2 amber lights, bay 1,4 - /usr2 (410 GB) is dead)

ctrl-a during bootup to get to scsi management
go to 'Manage Containers', select the 'dead' container and press Ctrl-R to restore (an ominous message about losing data appears, but once chosen the scrubbing/restoring starts and the missing partitions should reappear). May take several hours to restore.

also note that keyboard response was lost for some reason; unplugging and replugging the KVM switch fixed it

##end

server stats (checking conditions prior to failure) are located on nemo under
/usr2/home/jcothran/public_html/stats

A subdirectory called 'hung' has files/stats relating to earlier hangs.

links of interest

google search terms:
dell poweredge disk fail
dell poweredge amber rebuild

remote directory mounting - possible source of crashes?
http://www.ma.utexas.edu/users/stirling/computergeek/lufs.html

http://mss.net/Links/DellDocs/PowerEdge2650/InstallTroublesh/5g375c50.htm

The link below is very similar to my support experience: once you get to an OS prompt (so not a hardware issue), you're on your own
http://lists.us.dell.com/pipermail/linux-poweredge/2004-April/013950.html

July 14, 2005
Nemo is hung. No ssh and head monitor at server is blank.

Bay #5 was flashing amber indicating possible problem with drive, bay or controller. Hot swapped amber bay with same type spare drive. Waited 20 minutes to see if system would restore console, but monitor still blank. Power off/on and system came back up with a 'rebuilding' status scrolling by when referencing the partitions. System seems to be working fine.

Restoring remote mounts after reboot

In the event that all power to the facilities is lost and restored, or in some other scenario where all the servers are powered off and on, make sure that the remote mounts to the other servers and the samba mount points (external hard drives) are restored.

After all the servers and workstations are restored (and the remote drives are available), as root run

mount -a

to reload the mounts listed under /etc/fstab for each server.

For the samba remote mounts used on nautilus, nemo and trident there is a different step: locate the file rc.local, which should be at /etc/rc.local

As root, copy and run the smbmount commands found in rc.local, such as

/usr/sbin/smbmount //blazer.asg.sc.edu/blazer_backup /blazer_backup -o username=backup,password=b@ckup,workgroup=BLAZER
/usr/sbin/smbmount //accord.asg.sc.edu/accord_backup /accord_backup -o username=backup,password=b@ckup,workgroup=ACCORD

Running these commands restores the samba mounts.

Important Note: While the remote mounts are unavailable, files may be copied by mistake into the local placeholder samba directories such as /blazer_backup or /accord_backup. Be sure to check for and remove such files; if the share stays unmounted for too long, they consume the available server space and bring the server to a halt.
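
One way to tell whether a directory such as /blazer_backup is currently a live mount or only the local placeholder is to look it up in the kernel's mount table. The following is a sketch under assumptions: is_mounted is a hypothetical helper, it assumes a Linux /proc/mounts (second field is the mount point), and its optional second argument exists only so the check can be tried against a sample file.

```shell
#!/bin/sh
# is_mounted DIR [MOUNT_TABLE]
# Hypothetical helper: succeed if DIR appears as a mount point in the
# mount table (default /proc/mounts), fail otherwise.
is_mounted() {
    dir=$1; table=${2:-/proc/mounts}
    awk -v d="$dir" '$2 == d { found = 1 } END { exit !found }' "$table"
}

# Example: list what sits in each placeholder directory while its share
# is not mounted. Anything shown is on the local disk.
#   for d in /blazer_backup /accord_backup; do
#       is_mounted "$d" || ls -la "$d"
#   done
```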

-- JeremyCothran - 08 Jul 2005
Copyright © 1999-2008 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.