mlogger, lazylogger, mhttpd and analyzer
On other pages there might be instances of odbedit, which can be stopped with "quit".
The two frontends down at the area are also stopped with "!", or with Ctrl-C if this doesn't work. If they are frozen completely (and only then!), they need a "hard reset" via the main PC power switch. To restart them, double-click on their frontend icon after rebooting. You have to switch the screen between the two computers at the PC switch box.
Once all programs have been stopped, start a new odbedit and enter "cleanup" to remove hanging clients from the ODB. Then all programs can be restarted.
However, in some cases, when you subsequently go into odbedit and check for surviving clients by typing ``scl'', you find a loose client ``hanging on''.
As stated above, this is normally remedied by typing ``cleanup'' inside odbedit. However, there are times when the ``cleanup'' command does not work and, instead, hangs up indefinitely. According to SR this appears to be caused by a confused system of communication with the various subprocesses.
More often than not, in my experience (DP), this condition did not require a full recreation of the ODB. I recommend ignoring the hanging client and trying to restart all of the processes in an orderly way. Usually, this procedure would take care of the superfluous ``hanging'' client.
If the ODB does have to be recreated, first save its contents from within odbedit:
[local]> save tmp.odb
This creates an ASCII version of the ODB which can later be used to recreate it. If this does not work, you can later load a recent ODB file from one of the last runs at /data/runxxx.odb. Then delete the ODB at the UNIX prompt with:
$ cd ~/online
$ rm .ODB.SHM
This deletes the disk backup of the database. Now check if the shared memory still exists with:
$ ipcs

------ Shared Memory Segments --------
key        shmid     owner     perms     bytes      nattch    status
0x4e4c4e4f 1664      pibeta    666       20000000   1
0x4d013763 1537      pibeta    666       2027708    8
0x4d01376a 1538      pibeta    666       109532     8
0x4d01376b 1539      pibeta    666       1058108    4

------ Semaphore Arrays --------
key        semid     owner     perms     nsems      status
0x4d013684 1280      pibeta    666       1
0x4d013760 1281      pibeta    666       1
0x4d013763 1282      pibeta    666       1
0x4d01376a 1283      pibeta    666       1
0x4d01376b 1284      pibeta    666       1
0x4d013862 1285      pibeta    666       1

------ Message Queues --------
key        msqid     owner     perms     used-bytes   messages

The Semaphore Arrays and Message Queues are not relevant for this discussion. To understand which entry refers to what segment of the shared memory, it is best to go to /home/pibeta/online and list the .*.SHM files:
[pibeta@pc2106 ~/online]$ ls -l .*SHM
-rw-r--r--   1 pibeta   users           0 May  3 21:04 .ALARM.SHM
-rw-r--r--   1 pibeta   users           0 May  3 21:04 .ELOG.SHM
-rw-r--r--   1 pibeta   users      527708 May 10 17:22 .Hl.SHM
-rw-r--r--   1 pibeta   users           0 May  4 21:46 .LAZY.SHM
-rw-r--r--   1 pibeta   users     2027708 Jun 15 11:28 .ODB.SHM
-rw-r--r--   1 pibeta   users      109532 Jun 13 12:45 .SYSMSG.SHM
-rw-r--r--   1 pibeta   users     1058108 Jun 13 12:45 .SYSTEM.SHM
-rw-r--r--   1 pibeta   users       27708 May 22 13:01 .?2.SHM

In the above example it is clear that the segment with size 2027708 corresponds to the ODB, the one with 109532 bytes to the system message buffer, and the one with 1058108 bytes to the system event buffer. Not appearing above is the PAWC shared memory segment, which is the first one listed by "ipcs", with a size of 20 million bytes. Thus, the shared memory region of 2027708 bytes belongs to the ODB and must be deleted with
$ ipcrm shm [id]

where [id] is 1537 in the above case. You might have to log in as root to do that (type "su -" and then the same password as for pibeta). After the memory segment has disappeared (check with ipcs), the ODB can be recreated with:
$ odbedit -s 2000000
[local]/>load tmp.odb        (or /data/runxxx.odb)
[Note: if one does not intend to run the DSC in RAW mode, it is enough to start "odbedit -s 1000000". However, for normal running with the DSC we want to have this option available, so always create the ODB with a size of 2 million bytes.]
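The size matching used above to pick the segment for ipcrm can also be scripted. The sketch below is only an illustration and not part of the MIDAS tools: shmid_for_size is a hypothetical helper name, and it assumes the Linux ipcs column layout shown above (key, shmid, owner, perms, bytes, ...). It prints the shmid of every shared memory segment whose size equals the given number of bytes and ignores the semaphore and message queue tables.

```sh
#!/bin/sh
# Hypothetical helper: print the shmid of every shared memory segment whose
# size in bytes equals $1, reading "ipcs" output on stdin.
# Assumes the Linux column layout: key shmid owner perms bytes nattch status.
shmid_for_size() {
    awk -v size="$1" '
        /^-+ *Shared Memory Segments/ { in_shm = 1; next }  # entering the shm table
        /^-+ /                        { in_shm = 0; next }  # any other table ends it
        in_shm && $1 ~ /^0x/ && $5 == size { print $2 }     # data row with matching size
    '
}
```

For the example listing above, "ipcs -m | shmid_for_size 2027708" would print 1537, the id to hand to ipcrm (on Linux, "ipcs -m" lists only the shared memory table). Running the same command after the removal should print nothing. If more than one id is printed, check the sizes by hand before deleting anything.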
After that, all other programs can be restarted as described in the previous section. Once all programs are running (no red sections in the Web status page), a new run can be started.
If the PC still has to be rebooted (for example because of SCSI problems or a power failure), the reboot rewinds the tape. In order not to overwrite the tape with the next run, it has to be spooled to the end of the data with
$ mt /dev/nst0 seod     (upper drive)
$ mt /dev/nst1 seod     (lower drive)
which can take a while. The next run will then be appended at the end of the previous data.
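When both drives have to be spooled after a reboot, the two mt commands can be wrapped in a small function that stops at the first failure. This is a sketch only: spool_to_eod is a hypothetical name, it reuses the mt syntax exactly as given above (newer Linux versions of mt usually expect the device after "-f", so check "man mt" on the backend PC first), and the MT variable is an assumption added so the commands can be dry-run with MT=echo.

```sh
#!/bin/sh
# Hypothetical helper: spool tape drives to the end of recorded data so that
# the next run is appended instead of overwriting earlier runs.
# With no arguments it uses both drives: /dev/nst0 (upper), /dev/nst1 (lower).
# Set MT=echo to print the commands instead of running them (dry run).
spool_to_eod() {
    if [ "$#" -eq 0 ]; then
        set -- /dev/nst0 /dev/nst1
    fi
    mt_cmd=${MT:-mt}
    for dev in "$@"; do
        echo "spooling $dev to end of data (this can take a while)" >&2
        if ! $mt_cmd "$dev" seod; then
            echo "mt failed on $dev, remaining drives not touched" >&2
            return 1
        fi
    done
}
```

Stopping at the first error is deliberate: if the first drive cannot be positioned, it is safer to look at the problem by hand than to carry on and start a run.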
-----
A logbook entry about a reboot and recovery procedure can be found in the Elog under August 17/18, 2000.
Whenever you have to reboot the backend PC, be careful about the order in which you start up the clients. Always begin by winding the tape to the end of the data; whatever happens afterwards, you then cannot overwrite your data. In the second step, start odbedit and load your ODB file. Once this is done, you can start mhttpd to get all the other clients running.
The frontends should be started from their own machines, to make sure that they start up properly without error messages.
To start WebPAW:
$ cd ~/online
$ setenv DISPLAY :2
$ webpaw -D -p 8080

Notes:
(1) If you don't start webpaw in the `online' directory, it won't find the PAW macros.
(2) After each reboot of `pc2106' you have to start the VNC server before you can start WebPAW:
$ vncserver
rsh pc812 ps                  (find out the PID of the trigger frontend)
rsh pc812 'kill PID'          (PID is obtained from the above command)
rsh pc812 frontend.exe
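Reading the PID off the ps listing can also be left to awk. This sketch assumes that ps on pc812 prints the default Linux layout (PID in the first column, command name in the last) and that the trigger frontend shows up there as frontend.exe; both are assumptions, and pid_of is a hypothetical helper name.

```sh
#!/bin/sh
# Hypothetical helper: print the PID of every process whose command name
# (last column) equals $1, reading "ps" output on stdin.
# Assumes the default Linux layout "PID TTY TIME CMD"; the header line is skipped.
pid_of() {
    awk -v name="$1" 'NR > 1 && $NF == name { print $1 }'
}
```

It would be used as pid=$(rsh pc812 ps | pid_of frontend.exe) followed by rsh pc812 "kill $pid". If pid comes back empty or holds more than one number, fall back to reading the ps output by hand.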
vncviewer pc812:0
vncviewer pc809:0
S. Ritt, August 13, 1999.