Analyze Siebel component crashes

Crashes in Siebel: a summary

What is a crash? When a Siebel component crashes, the component stops working and all of its tasks are stopped or killed.
When this happens to an Object Manager, the connected users are kicked out.

In UNIX, a core file will be generated. In the Siebel enterprise log, we should see a kill message,
e.g. Process exited because of a segment violation (SIGSEGV)

I have seen messages with SIGSEGV, SIGBUS, SIGABRT and SIGUSR in enterprise logs.
By default, there should also be a .fdr file, a callstack file and sometimes a crash.txt file.
I have not seen any crash.txt files being generated on Solaris in Siebel 8.x.

Crash analysis is quite different in Windows and Solaris environments. In this article I will describe
crash analysis in Solaris.

The analysis: the first thing we need to do is gather the generated files and get the thread ID.

Go to the server where the crash happened; in the siebsrvr/bin folder you should find a .fdr file and a callstack file.
Note that if the environment variable SIEBEL_CRASH_HANDLER is set to 0, the Siebel crash handler is disabled and the fdr/callstack files are never generated.
If this variable is not set, or is set to 1, the FDR files should be generated as normal.
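To locate the newest crash artifacts quickly, something like the following can be run on the server (the install path and the callstack/crash file naming are assumptions for a typical 8.1 install, so adjust the path and wildcards to your environment):

cd /siebel/8.1/siebsrvr/bin          # adjust to your install root
ls -lt *.fdr | head                  # newest .fdr files first
ls -lt *callstack* *crash* 2>/dev/null | head    # callstack / crash.txt files, if present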


The last part of the fdr file name identifies the crashing process,
e.g. T2762342080_P014468.fdr
means that process ID 014468 crashed.

Now we should convert the fdr file to a more readable csv file. Go up one folder, to the siebsrvr folder,
and source the siebenv.sh file (i.e. . ./siebenv.sh).
Run this command:
sarmanalyzer -o <output_csv_file> -x -f <fdr_file>

e.g. sarmanalyzer -o T2762342080_P014468.csv -x -f T2762342080_P014468.fdr
and you should get a file called T2762342080_P014468.csv

Open the .csv file in a spreadsheet, go to the Data menu and select Filter > AutoFilter.

We now need to see only the entries related to the crashing thread, so filter the SubAreaDesc column by the value ** CRASHING THREAD **.

Then select the ThreadID column and filter it on the value that appears there for that record (in this example, the value is 4068).


Now unset the filter on the SubAreaDesc column. This should show you all the records for the crashing thread.
After this, sort the records by the FdrID column. You should see the actions in sequence up to the crash.
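If you would rather do this on the server than in a spreadsheet, a rough equivalent with grep and sort is sketched below. It assumes FdrID is the first column of the CSV (check the header line first), and the ",4068," pattern is only a loose match on the ThreadID value, so treat the result as an approximation:

grep "CRASHING THREAD" T2762342080_P014468.csv       # note the ThreadID on this row
grep ",4068," T2762342080_P014468.csv | sort -t, -k1,1n > thread_4068.csv    # that thread's rows, ordered by FdrID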

We have all the files now.

Try to figure out what might have happened by checking the columns SubAreaDesc, UserStr1 and UserStr2.
There is no exact way to find out what happened every time and it takes some guessing. The easiest way would be to
find out the user who caused the crash and then contact the user :). So let's find out the user.

Note down the thread ID from the ThreadID column. We can find out who caused the crash in several ways.
The easiest is by running this SQL in the database:

select * from S_SRM_TASK_HIST where srvr_host_name = 'server'
and srvr_status = 'ERROR'
and srvr_thread_id_val= 'threadid'
and srvr_proc_id_val = 'processid';

The column srvr_user_name usually has the user ID.

It is also possible to get the user who caused the crash from the generated csv file.
First get the application name of the Siebel component that crashed.
If the component is eComm_ENU, start by getting its configuration file:
-- This gets the configuration file parameter for the component
list param ConfigFile for comp eComm_ENU
config file - scomm.cfg
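The list param command above is run from the srvrmgr command line; a typical invocation looks like this (the gateway, enterprise, server and credentials are placeholders):

srvrmgr -g gatewayhost -e SBA_81 -s siebsrvr1 -u SADMIN -p password
srvrmgr> list param ConfigFile for comp eComm_ENU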

The application name is in scomm.cfg
[Siebel]
RepositoryFile = siebel_sia.srf
ApplicationName = Siebel Power Communications
ApplicationTitle = Siebel Communications
ApplicationSplashText = Siebel Communications

In the csv file search for the application name. If the application name is Siebel Power Communications,
then search for Siebel+Power+Communications.

There should be a row ID next to it. This is the row_id (from the S_USER table) of the user
who caused the crash.
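From the shell this is just a grep on the csv file generated earlier; the user's row_id appears in an adjacent column of the matching rows:

grep "Siebel+Power+Communications" T2762342080_P014468.csv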

Occasionally there will be no user ID. This means the crash was caused by a system process rather than a user session.
The most common case is starting or stopping the server; here the thread ID will be 1.

Usually there is a log file associated with the user who caused the crash. This file has been more or less useless for us,
but you can locate it and check for yourself.

Go to the enterprise log folder of the server where the crash happened and search for the file that contains the process and thread ID:

e.g. grep 'processid' *.log | grep 'threadid'
Or check the result of the SQL above; usually the log file name is there as well.


The callstack file is usually full of hexadecimal addresses and mangled C++ symbols, which will not make much sense when you first read them.
However, crashes with the same cause (e.g. multiple tabs, zombie sessions, dbcursor errors) have the same callstack size.
You need to check the exact size in bytes, not the size shown in Windows folders, since those are rounded up to the nearest kilobyte.
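On the server, wc -c gives the exact byte count, which is what you want to compare (the two file names below are placeholders for callstack files you suspect share the same root cause):

wc -c callstack_file_1 callstack_file_2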

The contents of these files can also be compared to guess which functions were called.
An example of callstack file content looks like this:

/opt/ibm/ldap/v6.0/lib/libc.so.1:0xc8d08
/opt/ibm/ldap/v6.0/lib/libc.so.1:0xbd3a0
/opt/ibm/ldap/v6.0/lib/libc.so.1:0xbd588
siebel/8.1/siebsrvr/lib/libsslcshar.so:__1cGCCFMap4nKCCFElemStr_nKCCFElemPtr4Cv___GLookup6kMkpkHrpv_b_+0x4 [ Signal 11 (SEGV)]
siebel/8.1/siebsrvr/lib/libsscfdm.so:__1cNCSSTaskRecordGLookup6kMrknISSstring_r1kb_b_+0x34
siebel/8.1/siebsrvr/lib/libsscfdm.so:__1cOCSSTaskLogUtilHExecute6MpnJCSSSqlObj_knISSstring__I_+0x1044
siebel/8.1/siebsrvr/lib/libsscfdm.so:__1cJCSSSqlObjEHome6MpnJCSSRecord__I_+0x114
siebel/8.1/siebsrvr/lib/libsscfom.so:__1cKCSSBusCompHSqlHome6MpnJCSSRecord__I_+0x38
siebel/8.1/siebsrvr/lib/libsscfom.so:__1cKCSSBusCompEHome6M_I_+0x1ac
siebel/8.1/siebsrvr/lib/libsscfomlg.so:__1cQCSSELBusCompImplMExecuteQdDuery6Mii_I_+0xac

Crash.txt files also contain hex values similar to those in the callstack files.

The enterprise log file is useful for two things: it confirms the crash, and it tells us the process ID.
Sometimes the callstack, crash.txt and fdr files are not generated, but the enterprise log will always contain a crash message.
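To confirm the crash from the enterprise log, grep for the signal names mentioned earlier (the log path and file name below are placeholders, and the exact message text varies between versions):

egrep "SIGSEGV|SIGBUS|SIGABRT" /siebel/8.1/siebsrvr/enterprises/SBA_81/siebsrvr1/log/SBA_81.siebsrvr1.log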


Lastly, we have the core file. This is a huge file generated under root in the /var/core folder on Solaris. If there is a crash, we will always get a core dump.
The core file can be analysed only on the same server. We first need to change its owner to the siebel user and move it to a folder the siebel user can access.
Then we load the siebenv.sh and cfgenv.sh environments and can run four commands against the core file:

pstack corefile
pflags corefile
pmap corefile
pldd corefile
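A rough end-to-end sequence, assuming the Siebel owner is a user called siebel, the core file is /var/core/core.014468 and the install root is /siebel/8.1 (all of these are placeholders for your own environment):

# as root: hand the core file over to the siebel user
chown siebel /var/core/core.014468
mv /var/core/core.014468 /export/home/siebel/cores/

# as the siebel user: load the environment and run the proc tools, saving the output
cd /siebel/8.1/siebsrvr
. ./siebenv.sh
. ./cfgenv.sh        # location of cfgenv.sh depends on the install
cd /export/home/siebel/cores
pstack core.014468 > pstack.out
pflags core.014468 > pflags.out
pmap core.014468 > pmap.out
pldd core.014468 > pldd.out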

Most of the output will look very hard to understand.

For example, a pstack output could look like this:

pstack Corefile

----------------- lwp# 1 --------------------------------
fc7cc4ac ???????? (ffbfe5e0, 1, ffbfe810, 0, 0, 0)
----------------- lwp# 2 --------------------------------
fc7c8c80 ???????? (1, fb00f4a8, fc2fbe48, 0, 6da08, 0)
----------------- lwp# 3 --------------------------------
fc7c8c80 ???????? (0, fcdf4ec8, 0, 0, 6da08, 0)
----------------- lwp# 4 --------------------------------
fc7cc4ac ???????? (2c5e78, 1, 0, 0, 0, 2c5480)
----------------- lwp# 5 --------------------------------
fc7cc4ac ???????? (2d5590, 2, 0, 0, 0, 2d4d88)
----------------- lwp# 6 --------------------------------
fc7cc4ac ???????? (fa6fd620, 1, fa6fd808, 0, 0, 0)
----------------- lwp# 7 --------------------------------
fc7c8c80 ???????? (1, fa5fe2e8, fa5fe198, 0, 6da08, 0)
----------------- lwp# 8 --------------------------------
fc7cbd1c ???????? (7e2fa8, 7e2ff0, 1000, fdae1c8c, fdae1c9c, fe98d4dc)
----------------- lwp# 9 --------------------------------
fc7c8c80 ???????? (1, f86ffb98, f86ffa48, 0, 6da08, 0)
----------------- lwp# 10 --------------------------------
fc7c8c80 ???????? (1, f85ff530, f85ff3e0, 0, 6da08, 0)
----------------- lwp# 11 --------------------------------
fc7c8c80 ???????? (1, f84ffa50, f84ff900, 0, 6da08, 0)
----------------- lwp# 12 --------------------------------
fc7c8c80 ???????? (0, f83ffa60, 0, 0, 6da08, 0)
----------------- lwp# 13 --------------------------------
fc7c8c80 ???????? (1, f437fa58, f437f908, 0, 6da08, 0)
fc7c2c90 ???????? (f437fa48, f437fa58, f437f908, 0, 0, 0)
fc7c30d4 ???????? (f437fa48, f437fa58, f437f908, 0, 0, 0)
fc7c3268 ???????? (f437fa48, f437fa58, f437fa30, 0, 0, 0)
fc7c335c ???????? (f437fa48, f437fa58, f437fa30, 1c00, 0, 6e907b8)
fc7c339c ???????? (f437fa48, f437fa58, f437fa30, 0, 7d0, 10)
feb17178 ???????? (416af30, 3e80, f437fa38, f437fa30, f437fa48, f437fa58)
feb17848 ???????? (416af30, 3e80, 3, 1f0, 17, fdae2370)
0022d8c0 __1cPsmiFacilityThrdDRun6M_i_ (27d944, 326af4, 200, 2, 27d97c, 10) + c0
0022e1c0 SmiCreateFacTask (326af4, 1, 0, ff312b8c, d, 400) + 100
fe7dc4a8 ???????? (0, ff36a420, 22e0c0, 43f0d48, d, 43f0ef8)
fe095730 ???????? (ffbfe9bc, 309588, 0, 1, 0, 17289c)
fccf1730 ???????? (1, 0, 1, 0, fcdecd10, 44dc700)
fc7c8bdc ???????? (0, 0, 0, 0, 0, 0)
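Most of the frames resolve only to addresses (the ???????? entries). If you saved the output to a file as in the sketch above, a quick way to pull out just the lwp headers and the frames that resolved to named Siebel symbols is:

grep -v "????????" pstack.out     # drops the unresolved address-only frames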


The pflags output shows us the thread ID that crashed, so if we do not have the fdr file we can still find the thread ID here.
The pflags output also shows which other threads were stopped because of the crash (they will be in the suspended state),
e.g.

%asi = 0x00000000 %fprs = 0x00000000
/2461: flags = DETACH|STOPPED lwp_park(0x4,0x0,0x0)
why = PR_SUSPENDED
%g0 = 0x00000000 %g1 = 0x0000004D %g2 = 0x00000001 %g3 = 0x00000200
%g4 = 0xFA207A34 %g5 = 0x00000000 %g6 = 0x00000000 %g7 = 0xFC30A200
%o0 = 0x00000004 %o1 = 0x00000000 %o2 = 0x00000000 %o3 = 0x00000000
%o4 = 0xFE2002D0 %o5 = 0x00000001 %sp = 0xDCF7F2D0 %o7 = 0xFC7C2ACC
%l0 = 0xDCF7F4C8 %l1 = 0xFC8303BC %l2 = 0xFC30A200 %l3 = 0x00000001
%l4 = 0x00000001 %l5 = 0x00000000 %l6 = 0x01000000 %l7 = 0x00001CC4
%i0 = 0x00000000 %i1 = 0xDCF7F4D8 %i2 = 0x00000000 %i3 = 0x00000000
%i4 = 0x0006DA08 %i5 = 0x00000000 %fp = 0xDCF7F330 %i7 = 0xFC7C2C90
%ccr = 0xFE501000 %pc = 0xFC7C8C80 %npc = 0xFC7C8C84 %y = 0x00000000
%asi = 0x00000000 %fprs = 0x00000000
/2462: flags = DETACH|STOPPED pollsys(0x4,0x1,0xdcd7d860,0x0)
why = PR_SUSPENDED
%g0 = 0x00000000 %g1 = 0x000000B7 %g2 = 0x00000000 %g3 = 0x00001388
%g4 = 0x00000005 %g5 = 0x0000009B %g6 = 0x00000000 %g7 = 0xE9A58A00
%o0 = 0x00000004 %o1 = 0x00000001 %o2 = 0xDCD7D860 %o3 = 0x00000000
%o4 = 0x12BA3BB0 %o5 = 0x00000000 %sp = 0xDCD7B528 %o7 = 0xFC7BC174
%l0 = 0xE9A58A00 %l1 = 0x00000000 %l2 = 0x00000000 %l3 = 0x00000001
%l4 = 0xFC837504 %l5 = 0x00000000 %l6 = 0x00000001 %l7 = 0xFC8303BC
%i0 = 0xDCD7B5F8 %i1 = 0x00000001 %i2 = 0xDCD7D860 %i3 = 0x00000000
%i4 = 0x00000000 %i5 = 0x00000000 %fp = 0xDCD7B588 %i7 = 0xFC76844C
%ccr = 0xFE501005 %pc = 0xFC7CC4AC %npc = 0xFC7CC4B0 %y = 0x00000000
%asi = 0x00000000 %fprs = 0x00000000
/2463: flags = DETACH
sigmask = 0xfffffeff,0x0000ffff cursig = SIGSEGV
%g0 = 0x00000000 %g1 = 0x0000000B %g2 = 0x00000000 %g3 = 0x00000000
%g4 = 0x1838F714 %g5 = 0xF8471338 %g6 = 0x00000000 %g7 = 0xE9A5BA00
%o0 = 0x16C0A984 %o1 = 0xF9245760 %o2 = 0xF9665C0C %o3 = 0xFFFFFFFF
%o4 = 0x00000000 %o5 = 0x167C42C4 %sp = 0xD7CFFD80 %o7 = 0xF9245784
%l0 = 0xFEBA0448 %l1 = 0xFEB4B9CC %l2 = 0xFE751EA8 %l3 = 0xFEB4DFEC
%l4 = 0xFEB4B9CC %l5 = 0x002A37E0 %l6 = 0x00000000 %l7 = 0x00000000
%i0 = 0x14FC0704 %i1 = 0xF9245760 %i2 = 0xF9665C0C %i3 = 0xFFFFFFFF
%i4 = 0x00000000 %i5 = 0x14FC0704 %fp = 0xD7CFFDE0 %i7 = 0xF9245784
%ccr = 0xFE401007 %pc = 0xF9245760 %npc = 0xF9245764 %y = 0x00000000
%asi = 0x00000000 %fprs = 0x00000000

The SIGSEGV word indicates that this is the thread that crashed; here the crashing thread ID is 2463. The pmap and pldd outputs are used rarely and mainly help when checking memory issues.
We will analyze these in another article.
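To locate the crashing LWP quickly in a large pflags output (pflags.out being the output saved earlier), grep works well:

grep -n "cursig = SIGSEGV" pflags.out     # the /NNNN: line just above the match is the crashing LWP
grep -c "PR_SUSPENDED" pflags.out         # how many threads were suspended by the crash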

There are several columns in the generated FDR/CSV file. Here is a short explanation of the columns:

FdrID - The ID assigned to a particular FDR entry. Each entry has a different ID value.

ThreadID - The operating system thread ID. Each entry is associated with a thread; entries may or may not share a thread ID, depending on whether the process is multi-threaded and whether more than one thread was in use at the time of the crash.

AreaSymbol - Categorization for a particular subsystem, so that all related entries can be grouped together.

AreaDesc - Descriptive text of the product area each entry is associated with.

SubAreaSymbol - Similar to the area symbol; assigns a unique categorization within a particular area for different functionality.

UserInt1, UserInt2 - Integer values assigned by internal instrumentation that may store values such as internal pointer references; these are normally only useful to Oracle Engineering.

UserStr1, UserStr2 - Contextual information that is germane to understanding the significance of each entry; these may store object names, parameter values, row_ids or other messages that help indicate context within the area and sub-area.