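For large enterprise applications, running without a cluster is rarely realistic. For scalability and high availability,
production environments usually rely on middleware clustering.
This article covers cluster configuration for 10gAS and some general administration topics.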
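10gAS clusters fall into two broad categories.
1. Managed clusters
A managed cluster uses a Repository to record the configuration of the whole cluster, for example the applications deployed to it.
In this mode the administrator has little to do by hand. To deploy an application, for example, you deploy it once to the whole cluster.
2. Manually managed clusters
In this mode, apart from basic session replication and EJB clustering, 10gAS offers no further management facilities.
To deploy an application you have to deploy it to every instance in the cluster. Imagine having a dozen or more servers: deploying to them one by one every time is not only a lot of work but also very error-prone.
I therefore recommend the managed approach.
For creating a manually managed cluster, see my earlier post: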
http://www.itpub.net/248375.html
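A managed cluster relies on a Repository to store its configuration.
Because the Repository is so important, it is worth spending a moment on it.
A Repository can be stored in two ways: database-based Repository and file-based Repository.
The file-based Repository is new in 10gAS. It addresses the earlier limitation that only
database-based Repository clusters could be created, and a database-based Repository requires installing the Infrastructure.
The Infrastructure gives many people headaches: it is too heavyweight and very prone to problems. That is why the file-based Repository was introduced.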
The information stored in the Repository falls into the following kinds:
1. Product metadata
2. Management metadata
3. Identity Management metadata (database-based Repository only)
A cluster mainly needs 1. Product metadata and 2. Management metadata.
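Remember the first article in my 10gAS series?
http://www.itpub.net/250581.html (Oracle 10gAS installation guide)
The installation type used there, J2EE and Web Cache, uses a file-based Repository by default. It can also be
migrated to a database-based Repository.
This article therefore focuses on creating a cluster on a file-based Repository. Clusters on a database-based Repository will be covered in depth later.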
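First, the requirements for creating a cluster:
1. All application server instances must be in the same farm, that is, they must use the same Repository to store management information.
2. All application server instances in the cluster must run the same OS.
3. Each application server instance in the cluster can have only one OHS server.
4. Each application server instance can have multiple OC4J instances, and each OC4J instance can have multiple OC4J processes.
5. All application server instances must, of course, be the same version.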
For installation, see
http://www.itpub.net/250581.html (Oracle 10gAS installation guide). I will start directly from creating the cluster.
Two application server instances are involved.
Both use IP 10.1.18.1: the two instances are installed on the same physical server under two different users,
and each instance uses different ports.
The operating system is Red Hat 3.0 Update 2.
User ias10g runs OHS on port 7778; user ias10g2 runs OC4J only.
The ias10g instance acts as the file-based Repository host.
1. Check whether the application server instance already belongs to a farm.
Check instance 1
su - ias10g
[ias10g@finproduction home]$ dcmctl whichFarm
Standalone instance
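The output shows that this instance has not joined a farm.
If it shows that the instance has already joined a farm, run the following command to leave it:
dcmctl leaveFarm
Check instance 2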
su - ias10g2
[ias10g2@finproduction ias10g2]$ dcmctl whichFarm
Standalone instance
[ias10g2@finproduction ias10g2]$
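If you have more than a couple of instances, the check above can be scripted. The following is only a minimal sketch: is_standalone is a helper name I made up, and it assumes that a standalone instance prints the literal line "Standalone instance" exactly as in the transcripts above.

```shell
#!/bin/sh
# Sketch only: is_standalone is a hypothetical helper, not part of dcmctl.
# It reads `dcmctl whichFarm` output on stdin and succeeds when the output
# contains the "Standalone instance" line shown in the article.
is_standalone() {
    grep -q '^Standalone instance'
}

# Example use on a real instance (assumes dcmctl is on the PATH):
#   if dcmctl whichFarm | is_standalone; then
#       echo "instance has not joined a farm"
#   fi
```

The helper only inspects text, so it can be tried out against saved output before pointing it at a live instance.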
2. Initialize the Repository
su - ias10g
Get the current repository ID
[ias10g@finproduction home]$ dcmctl getRepositoryid
finproduction.tplife.com:7101
[ias10g@finproduction home]$
Initialize the farm
dcmctl joinFarm -r finproduction.tplife.com:7101
Here finproduction.tplife.com:7101 is the value obtained earlier from getRepositoryid.
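To avoid retyping the ID, the two commands can be chained. This is a rough sketch under assumptions: valid_repo_id is a hypothetical helper of mine, and it assumes the ID always has the host:port shape seen above (finproduction.tplife.com:7101).

```shell
#!/bin/sh
# Sketch only: valid_repo_id is a hypothetical helper, not part of dcmctl.
# It accepts a string of the form host:port where port is all digits.
valid_repo_id() {
    case "$1" in
        *:*) ;;
        *) return 1 ;;
    esac
    host=${1%:*}     # everything before the last colon
    port=${1##*:}    # everything after the last colon
    [ -n "$host" ] || return 1
    case "$port" in
        ''|*[!0-9]*) return 1 ;;
    esac
    return 0
}

# Example use on the repository host (assumes dcmctl is on the PATH):
#   repo_id=$(dcmctl getRepositoryId)
#   valid_repo_id "$repo_id" && dcmctl joinFarm -r "$repo_id"
```

The format check is just a guard against passing an error message or an empty string to joinFarm.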
Now check again whether the instance has joined the farm
[ias10g@finproduction home]$ dcmctl whichFarm
Farm Name: .tpdata.ias10g.OraHome1.dcm.repository
Host Instance: iastest.finproduction.tplife.com
Host Name: finproduction.tplife.com
Repository Type: Distributed File Based (host)
SSL In Use: false
The instance has joined the file-based farm, and it is the host.
3. Add instance 2
su - ias10g2
dcmctl joinFarm -r finproduction.tplife.com:7101
The join succeeded. Look at the farm information now.
[ias10g2@finproduction ias10g2]$ dcmctl whichFarm
Farm Name: .tpdata.ias10g.OraHome1.dcm.repository
Host Instance: iastest.finproduction.tplife.com
Host Name: finproduction.tplife.com
Repository Type: Distributed File Based
SSL In Use: false
[ias10g2@finproduction ias10g2]$
As shown, the instance has successfully joined the file-based Repository.
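The only difference between the two whichFarm outputs is the "(host)" suffix on Repository Type. A small sketch for pulling fields out of that output follows; farm_field and is_repo_host are names I invented, and the "Label: value" layout is assumed to match the transcripts above.

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical and read `dcmctl whichFarm`
# output on stdin, assuming the "Label: value" layout shown in the article.

# Print the value of one field, e.g. "Repository Type".
farm_field() {
    sed -n "s/^$1:[[:space:]]*//p" | head -n 1
}

# Succeed when the Repository Type carries the "(host)" suffix.
is_repo_host() {
    case "$(farm_field 'Repository Type')" in
        *'(host)'*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use (assumes dcmctl is on the PATH):
#   dcmctl whichFarm | farm_field 'Host Name'
#   dcmctl whichFarm | is_repo_host && echo "this instance hosts the repository"
```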
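4. Create the cluster
There are two ways to create a cluster:
1. Through the EM management console
2. Manually with dcmctl
Creating it through EM is fairly simple, and cluster creation is a relatively advanced topic, so I will focus on the manual approach.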
su - ias10g
Check whether a cluster has already been created
dcmctl listclusters
No output, which means none has been created yet.
Create a cluster
[ias10g@finproduction home]$ dcmctl createcluster -cl mycluster
1 mycluster
-cl specifies the cluster name; any name will do.
[ias10g@finproduction home]$ dcmctl listclusters
1 mycluster
One cluster has now been created.
First add instance 1 to the cluster
[ias10g@finproduction home]$ dcmctl joincluster -cl mycluster
1 iastest.finproduction.tplife.com
Then add instance 2 to the cluster
su - ias10g2
[ias10g2@finproduction ias10g2]$ dcmctl joincluster -cl mycluster
1 iastest2.finproduction.tplife.com
2 iastest.finproduction.tplife.com
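The cluster mycluster now contains two instances.
Open the EM management console: all instances can now be managed within one farm.
Next, configure session replication.
Click into the iastest.finproduction.tplife.com management page, select home to enter the OC4J management page,
then select Administration and then Replication Properties to open the replication configuration page.
A multicast address must lie between 224.0.0.0 and 239.255.255.255, so take care when filling in
the Multicast Host (IP) under Replicate session state
and the Multicast Host (IP) under EJB Applications.
These settings apply to the whole cluster, so there is no need to set them on each instance separately.
For RMI Server Host, just enter the IP of instance 1.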
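Before typing a Multicast Host (IP) into the page, the range rule can be checked with a few lines of shell. This is a sketch, not anything shipped with 10gAS: is_multicast_ipv4 is a hypothetical helper that only tests whether a dotted-quad address falls in 224.0.0.0 through 239.255.255.255.

```shell
#!/bin/sh
# Sketch only: is_multicast_ipv4 is a hypothetical helper.
# It succeeds when $1 is a dotted-quad IPv4 address whose first octet is
# between 224 and 239, which is the same as 224.0.0.0 to 239.255.255.255.
is_multicast_ipv4() {
    # Split the address on dots into the positional parameters.
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    [ $# -eq 4 ] || return 1
    # Every octet must be a number no larger than 255.
    for octet in "$@"; do
        case "$octet" in
            ''|*[!0-9]*) return 1 ;;
        esac
        [ "$octet" -le 255 ] || return 1
    done
    [ "$1" -ge 224 ] && [ "$1" -le 239 ]
}

# Example use:
#   is_multicast_ipv4 230.0.0.1 && echo "usable as Multicast Host (IP)"
```

It says nothing about whether the address is already used by another cluster on the same network, so that still has to be checked by hand.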
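At this point the cluster is configured. The next step is deploying applications to the cluster.
Notes
1. A file-based repository does not perform as well as a database-based repository. In this mode it is best not to create a cluster with more than six nodes.
2. If your application performs a large number of session operations,
session replication will consume a lot of resources. In that case, group the instances to reduce the cost of session replication.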
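Hello, while configuring an IAS cluster I got an error at the joinfarm step.
IAS is installed on two physical servers with the same operating system (RHEL 3), the same user, and the same installation path. Both use the J2EE and Web Cache installation type, to build a file-based cluster.
server1: 192.168.18.51 test1.myias.com server2: 192.168.18.52 test2.myias.com
After installation I configured the cluster, with test1 as the Repository host.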
[cluster@test1 cluster]$dcmctl getrepositoryid
test1.myias.com:7101
[cluster@test1 cluster]$dcmctl joinfarm -r test1.myias.com:7101
// the operation succeeded
Then I logged in to the test2 server
[cluster@test21 cluster]$dcmctl joinfarm -r test1.myias.com:7101
It failed with the following error message:
dcmctl joinfarm -r hutu.bannerline.com:7101
//----------------------------------------------
ADMN-202500
The exception 202037, has occurred in the cache layer of the persistence manager
"The repository was not found.
Resolution: The most likely cause is the DCM daemon at the repository host is not running. If the daemon is running verify the correct cache repository id is being used. Run dcmctl getRepositoryId at the current instance and at the repository instance. Both ids should be the same. Also make sure that the current instance and the repository instance have the same SSL mode in $ORACLE_HOME/dcm/config/dcmCache.xml. All the instances should have consistent SSL mode.".
Please, refer to the base exception for the details.
oracle.ias.sysmgmt.exception.CachePersistenceException: The exception 202037, has occurred in the cache layer of the persistence manager
"The repository was not found.
Resolution: The most likely cause is the DCM daemon at the repository host is not running. If the daemon is running verify the correct cache repository id is being used. Run dcmctl getRepositoryId at the current instance and at the repository instance. Both ids should be the same. Also make sure that the current instance and the repository instance have the same SSL mode in $ORACLE_HOME/dcm/config/dcmCache.xml. All the instances should have consistent SSL mode.".
Resolution: Please, refer to the base exception for the details.
at oracle.ias.sysmgmt.persistence.cache.CacheTopology.getCurrentCacheInventory(Unknown Source)
at oracle.ias.sysmgmt.persistence.cache.CacheTopology.getCurrentCacheInventory(Unknown Source)
at oracle.ias.sysmgmt.persistence.cache.CacheTopology.getCacheInventory(Unknown Source)
at oracle.ias.sysmgmt.persistence.cache.CacheTopology.farms(Unknown Source)
at oracle.ias.sysmgmt.persistence.cache.CacheTopology.instances(Unknown Source)
at oracle.ias.sysmgmt.persistence.PersistenceManager.getPersistenceInstance(Unknown Source)
at oracle.ias.sysmgmt.persistence.PersistenceManager.getPersistenceInstance(Unknown Source)
at oracle.ias.sysmgmt.persistence.PersistenceManager.getPersistenceInstance(Unknown Source)
at oracle.ias.sysmgmt.persistence.PersistenceManager.create(Unknown Source)
at oracle.ias.sysmgmt.persistence.PersistenceManager.<init>(Unknown Source)
at oracle.ias.sysmgmt.task.FarmManager.joinFarm(Unknown Source)
at oracle.ias.sysmgmt.clustermanagement.IASInstanceImpl.joinFarm(Unknown Source)
at oracle.ias.sysmgmt.cmdline.DcmCmdLine.joinFarm(Unknown Source)
at oracle.ias.sysmgmt.cmdline.DcmCmdLine.execute(Unknown Source)
at oracle.ias.sysmgmt.cmdline.DcmCmdLine.main(Unknown Source)
Local Stack:
oracle.ias.sysmgmt.exception.PersistenceException: The repository was not found.
Resolution: The most likely cause is the DCM daemon at the repository host is not running. If the daemon is running verify the correct cache repository id is being used. Run dcmctl getRepositoryId at the current instance and at the repository instance. Both ids should be the same. Also make sure that the current instance and the repository instance have the same SSL mode in $ORACLE_HOME/dcm/config/dcmCache.xml. All the instances should have consistent SSL mode.
at oracle.ias.sysmgmt.persistence.cache.InventoryLoader.load(Unknown Source)
at oracle.ias.cache.CacheHandle.findObject(Unknown Source)
at oracle.ias.cache.CacheHandle.locateObject(Unknown Source)
at oracle.ias.cache.CacheAccess.get(Unknown Source)
at oracle.ias.sysmgmt.persistence.cache.CacheTopology.getCurrentCacheInventory(Unknown Source)
at oracle.ias.sysmgmt.persistence.cache.CacheTopology.getCurrentCacheInventory(Unknown Source)
at oracle.ias.sysmgmt.persistence.cache.CacheTopology.getCacheInventory(Unknown Source)
at oracle.ias.sysmgmt.persistence.cache.CacheTopology.farms(Unknown Source)
at oracle.ias.sysmgmt.persistence.cache.CacheTopology.instances(Unknown Source)
at oracle.ias.sysmgmt.persistence.PersistenceManager.getPersistenceInstance(Unknown Source)
at oracle.ias.sysmgmt.persistence.PersistenceManager.getPersistenceInstance(Unknown Source)
at oracle.ias.sysmgmt.persistence.PersistenceManager.getPersistenceInstance(Unknown Source)
at oracle.ias.sysmgmt.persistence.PersistenceManager.create(Unknown Source)
at oracle.ias.sysmgmt.persistence.PersistenceManager.<init>(Unknown Source)
at oracle.ias.sysmgmt.task.FarmManager.joinFarm(Unknown Source)
at oracle.ias.sysmgmt.clustermanagement.IASInstanceImpl.joinFarm(Unknown Source)
at oracle.ias.sysmgmt.cmdline.DcmCmdLine.joinFarm(Unknown Source)
at oracle.ias.sysmgmt.cmdline.DcmCmdLine.execute(Unknown Source)
at oracle.ias.sysmgmt.cmdline.DcmCmdLine.main(Unknown Source)
//-------------------------------------------------
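However, opmnctl status shows the dcm-daemon as alive on both IAS instances. I have tried several times and get the same problem every time. I don't know what is wrong!
Thanks for your help.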
Check the /etc/hosts file
and add an entry that resolves test1.myias.com.
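A quick way to see whether a name is already in the hosts file is sketched below. hosts_has_name is a helper name I made up; it only looks at the hosts file (default /etc/hosts), so a name that resolves through DNS instead would still be reported as missing.

```shell
#!/bin/sh
# Sketch only: hosts_has_name is a hypothetical helper.
# $1 = host name to look for, $2 = hosts file (defaults to /etc/hosts).
# Succeeds when the name appears as a host name or alias on a
# non-comment line.
hosts_has_name() {
    awk -v name="$1" '
        { sub(/#.*/, "") }    # strip comments before looking at fields
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "${2:-/etc/hosts}"
}

# Example use on test2:
#   hosts_has_name test1.myias.com || echo "add test1.myias.com to /etc/hosts"
```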
I just tested this on two machines and it works without problems.
node2
[ias10g@findev Disk1]$ dcmctl whichFarm
Farm Name: .tpdata.ias10g.OraHome1.dcm.repository
Host Instance: iastest.finproduction.tplife.com
Host Name: finproduction.tplife.com
Repository Type: Distributed File Based
SSL In Use: false
node1
[ias10g@finproduction home]$ dcmctl whichfarm
Farm Name: .tpdata.ias10g.OraHome1.dcm.repository
Host Instance: iastest.finproduction.tplife.com
Host Name: finproduction.tplife.com
Repository Type: Distributed File Based (host)
SSL In Use: false
server 1
cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
10.1.18.1 finproduction.tplife.com finproduction
10.1.18.2 findev.tplife.com findev
server 2
[root@findev root]# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
10.1.3.10 pard_standby.tplife.com pard_standby
10.1.18.2 findev.tplife.com findev
10.1.18.1 finproduction.tplife.com