fanrey

8000 nodes continuously disconnect in a large network environment

The root cause is a shortage of socket resources, which makes connections fail. The exceptions are analyzed below:
1. java.net.BindException: Address already in use.
This is probably because ports are not released immediately after we call socket.close(); they stay in the TIME_WAIT state for some time. The counts below show this.

380g7x09:/opt # netstat -na|grep TIME_WAIT|wc -l
28851
380g7x09:/opt # netstat -na|grep TIME_WAIT|grep 8080|wc -l
22326

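One possible fix is to set the following parameters to 1. Or do we have other solutions? Note that tcp_tw_reuse only applies to outgoing connections, and tcp_tw_recycle is known to break clients behind NAT and was removed in Linux 4.12.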

380g7x09:/opt # sysctl -a|grep net.ipv4.tcp_tw
net.ipv4.tcp_tw_recycle = 0
net.ipv4.tcp_tw_reuse = 0
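
The TIME_WAIT counts above come from grepping netstat output. As a rough sketch, the same count could be taken inside a Java monitoring job. The class and method names here (TimeWaitCounter, countTimeWait) are hypothetical, and the sample lines are made up for illustration; it assumes the usual `netstat -na` column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State).

```java
import java.util.Arrays;
import java.util.List;

class TimeWaitCounter {

    /**
     * Counts netstat lines whose state column is TIME_WAIT and whose local
     * or foreign address ends with ":<port>".
     */
    static long countTimeWait(List<String> netstatLines, int port) {
        String suffix = ":" + port;
        long count = 0;
        for (String line : netstatLines) {
            String[] fields = line.trim().split("\\s+");
            // Expected columns: proto, recv-q, send-q, local, foreign, state
            if (fields.length < 6 || !"TIME_WAIT".equals(fields[5])) {
                continue;
            }
            if (fields[3].endsWith(suffix) || fields[4].endsWith(suffix)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Illustrative sample lines, not taken from the real host.
        List<String> sample = Arrays.asList(
                "tcp        0      0 10.0.0.1:8080      10.0.0.2:51234     TIME_WAIT",
                "tcp        0      0 10.0.0.1:8080      10.0.0.3:40000     ESTABLISHED");
        System.out.println("TIME_WAIT on 8080: " + countTimeWait(sample, 8080));
    }
}
```

Unlike a plain `grep 8080`, this matches the port only at the end of an address column, so a port such as 18080 is not counted by accident.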

2. java.net.SocketException: Too many open files

Could we raise the "open files" ulimit? It is currently 1024, as shown below.

380g7x09:/opt # ulimit -a
core file size          (blocks, -c) 1
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 127424
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) 13873868
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 127424
virtual memory          (kbytes, -v) 26479520
file locks                      (-x) unlimited

The limit is configured in /etc/security/limits.conf.
For example, to set the limit for user nmcman to 8192:
nmcman soft nofile 8192
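
To check from inside the JVM how close the process is to the "open files" limit, something like the following sketch may help. It assumes a HotSpot/OpenJDK JVM on a Unix-like system, where the platform MXBean implements com.sun.management.UnixOperatingSystemMXBean; the class and method names (FdUsageSketch, fdUsage) are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

import com.sun.management.UnixOperatingSystemMXBean;

class FdUsageSketch {

    /**
     * Returns {open, max} file descriptor counts for this JVM process,
     * or null when the platform bean does not expose them.
     */
    static long[] fdUsage() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null;
    }

    public static void main(String[] args) {
        long[] usage = fdUsage();
        if (usage == null) {
            System.out.println("File descriptor counts are not available on this JVM");
        } else {
            System.out.println("open files: " + usage[0] + " of " + usage[1]);
        }
    }
}
```

Logging these two numbers periodically would show whether the descriptor count keeps climbing, which points to a leak rather than a limit that is simply too low.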

From the code side, you can set the SO_REUSEADDR socket option. It must be set before the socket is bound:
socket.setReuseAddress(true);
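
A minimal sketch of where the setReuseAddress call has to go, assuming the affected socket is a listening ServerSocket (the post does not say whether it is the client or the server side; java.net.Socket has the same method). The class and method names (ReuseAddressSketch, openReusable) are hypothetical.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

class ReuseAddressSketch {

    /** Opens a listening socket with SO_REUSEADDR enabled before bind(). */
    static ServerSocket openReusable(int port) throws IOException {
        // The no-arg constructor creates an unbound socket, so the option
        // can still be changed before the address is bound.
        ServerSocket server = new ServerSocket();
        try {
            server.setReuseAddress(true);
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), port));
            return server;
        } catch (IOException e) {
            // Do not leak the descriptor when bind() fails.
            server.close();
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        // Port 0 asks the OS for any free port; a real service would pass its own port.
        try (ServerSocket server = openReusable(0)) {
            System.out.println("bound to port " + server.getLocalPort()
                    + ", SO_REUSEADDR=" + server.getReuseAddress());
        }
    }
}
```

Calling setReuseAddress after the socket is already bound has undefined behavior according to the Javadoc, which is why the sketch avoids the constructors that bind immediately.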

lsof lists all open file descriptors. Many of the sockets show "can't identify protocol", which is usually caused by sockets that were not closed after use. The code change is as follows:
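finally {
    // Always release the socket, even if an exception was thrown earlier.
    if (socket != null) {
        try {
            socket.close();
        } catch (IOException e) {
            // Nothing more can be done if close() itself fails.
        }
    }
}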
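
On Java 7 or later, the same guarantee can be written with try-with-resources, which closes the socket on every exit path. This is only a sketch of the alternative, not the change that was actually made; the class and method names (SocketCloseSketch, connectAndClose) are hypothetical, and the local ServerSocket in main exists only to give the example something to connect to.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

class SocketCloseSketch {

    /**
     * Connects to host:port, then returns the socket after the
     * try-with-resources block has closed it, so callers can inspect it.
     */
    static Socket connectAndClose(InetAddress host, int port) throws IOException {
        Socket observed;
        try (Socket socket = new Socket(host, port)) {
            observed = socket;
            // Real code would read from or write to the socket here.
        }
        // close() has already run at this point, even if the body had thrown.
        return observed;
    }

    public static void main(String[] args) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 lets the OS pick a free port for this throwaway listener.
        try (ServerSocket server = new ServerSocket(0, 50, loopback)) {
            Socket s = connectAndClose(loopback, server.getLocalPort());
            System.out.println("closed after use: " + s.isClosed());
        }
    }
}
```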

Wed Jun 20 17:02:03 CEST 2012
->4205<-
=== begin lsof -p 22160 ===
COMMAND   PID   USER   FD      TYPE             DEVICE SIZE/OFF      NODE NAME
...

java    22160 nmcman  712u     sock                0,7      0t0 882473171 can't identify protocol
java    22160 nmcman  713u     sock                0,7      0t0 882472188 can't identify protocol
java    22160 nmcman  714u     sock                0,7      0t0 882467758 can't identify protocol
java    22160 nmcman  715u     sock                0,7      0t0 882472189 can't identify protocol
java    22160 nmcman  716u     sock                0,7      0t0 882470568 can't identify protocol
java    22160 nmcman  717u     sock                0,7      0t0 882473172 can't identify protocol
java    22160 nmcman  718u     sock                0,7      0t0 882468704 can't identify protocol
java    22160 nmcman  719u     sock                0,7      0t0 882470569 can't identify protocol
java    22160 nmcman  720u     sock                0,7      0t0 882467759 can't identify protocol
java    22160 nmcman  721u     sock                0,7      0t0 882472190 can't identify protocol
java    22160 nmcman  722u     sock                0,7      0t0 882474125 can't identify protocol
java    22160 nmcman  723u     sock                0,7      0t0 882473173 can't identify protocol
java    22160 nmcman  724u     sock                0,7      0t0 882468705 can't identify protocol
java    22160 nmcman  725u     sock                0,7      0t0 882469535 can't identify protocol
java    22160 nmcman  726u     sock                0,7      0t0 882472191 can't identify protocol
java    22160 nmcman  727u     sock                0,7      0t0 882473174 can't identify protocol
java    22160 nmcman  728u     sock                0,7      0t0 882470570 can't identify protocol
java    22160 nmcman  729u     sock                0,7      0t0 882474126 can't identify protocol
java    22160 nmcman  730u     sock                0,7      0t0 882469536 can't identify protocol
java    22160 nmcman  731u     sock                0,7      0t0 882468706 can't identify protocol
java    22160 nmcman  732u     sock                0,7      0t0 882467760 can't identify protocol
java    22160 nmcman  733u     sock                0,7      0t0 882473175 can't identify protocol
java    22160 nmcman  734u     sock                0,7      0t0 882472192 can't identify protocol
java    22160 nmcman  735u     sock                0,7      0t0 882469537 can't identify protocol
java    22160 nmcman  736u     sock                0,7      0t0 882474127 can't identify protocol
java    22160 nmcman  737u     sock                0,7      0t0 882470571 can't identify protocol
java    22160 nmcman  738u     sock                0,7      0t0 882473176 can't identify protocol
java    22160 nmcman  739u     sock                0,7      0t0 882467761 can't identify protocol
java    22160 nmcman  740u     sock                0,7      0t0 882468707 can't identify protocol
java    22160 nmcman  741u     sock                0,7      0t0 882472193 can't identify protocol
java    22160 nmcman  742u     sock                0,7      0t0 882469538 can't identify protocol
java    22160 nmcman  743u     sock                0,7      0t0 882470572 can't identify protocol
java    22160 nmcman  744u     sock                0,7      0t0 882474128 can't identify protocol
java    22160 nmcman  745u     sock                0,7      0t0 882472194 can't identify protocol
java    22160 nmcman  746u     sock                0,7      0t0 882467762 can't identify protocol
java    22160 nmcman  747u     sock                0,7      0t0 882468708 can't identify protocol
java    22160 nmcman  748u     sock                0,7      0t0 882470573 can't identify protocol
java    22160 nmcman  749u     sock                0,7      0t0 882467763 can't identify protocol
java    22160 nmcman  750u     sock                0,7      0t0 882469539 can't identify protocol
java    22160 nmcman  751u     sock                0,7      0t0 882468709 can't identify protocol
java    22160 nmcman  752u     sock                0,7      0t0 882472195 can't identify protocol
java    22160 nmcman  753u     sock                0,7      0t0 882474129 can't identify protocol
java    22160 nmcman  754u     sock                0,7      0t0 882470574 can't identify protocol
java    22160 nmcman  755u     sock                0,7      0t0 882467764 can't identify protocol
java    22160 nmcman  756u     sock                0,7      0t0 882469540 can't identify protocol
java    22160 nmcman  757u     sock                0,7      0t0 882474130 can't identify protocol
java    22160 nmcman  758u     sock                0,7      0t0 882470575 can't identify protocol
java    22160 nmcman  759u     sock                0,7      0t0 882467765 can't identify protocol
java    22160 nmcman  760u     sock                0,7      0t0 882469541 can't identify protocol
java    22160 nmcman  761u     sock                0,7      0t0 882474131 can't identify protocol
java    22160 nmcman  762u     sock                0,7      0t0 882469451 can't identify protocol
java    22160 nmcman  763u     sock                0,7      0t0 882470576 can't identify protocol
java    22160 nmcman  764u     sock                0,7      0t0 882469542 can't identify protocol
java    22160 nmcman  765u     sock                0,7      0t0 882474132 can't identify protocol
java    22160 nmcman  766u     sock                0,7      0t0 882470577 can't identify protocol
java    22160 nmcman  767u     sock                0,7      0t0 882469543 can't identify protocol
java    22160 nmcman  768u     sock                0,7      0t0 882468710 can't identify protocol
java    22160 nmcman  769u     sock                0,7      0t0 882467766 can't identify protocol
java    22160 nmcman  770u     sock                0,7      0t0 882474133 can't identify protocol
java    22160 nmcman  771u     sock                0,7      0t0 882469544 can't identify protocol
java    22160 nmcman  772u     sock                0,7      0t0 882468711 can't identify protocol
java    22160 nmcman  773u     sock                0,7      0t0 882467767 can't identify protocol
java    22160 nmcman  774u     sock                0,7      0t0 882474134 can't identify protocol
java    22160 nmcman  775u     sock                0,7      0t0 882469545 can't identify protocol
java    22160 nmcman  776u     sock                0,7      0t0 882472196 can't identify protocol
java    22160 nmcman  777u     sock                0,7      0t0 882468712 can't identify protocol
java    22160 nmcman  778u     sock                0,7      0t0 882470578 can't identify protocol
java    22160 nmcman  779u     sock                0,7      0t0 882467768 can't identify protocol
java    22160 nmcman  780u     sock                0,7      0t0 882469546 can't identify protocol
java    22160 nmcman  781u     sock                0,7      0t0 882472197 can't identify protocol
java    22160 nmcman  782u     sock                0,7      0t0 882474135 can't identify protocol
java    22160 nmcman  783u     sock                0,7      0t0 882468713 can't identify protocol
java    22160 nmcman  784u     sock                0,7      0t0 882470579 can't identify protocol
java    22160 nmcman  785u     sock                0,7      0t0 882467769 can't identify protocol
java    22160 nmcman  786u     sock                0,7      0t0 882472198 can't identify protocol
java    22160 nmcman  787u     sock                0,7      0t0 882474136 can't identify protocol
java    22160 nmcman  788u     sock                0,7      0t0 882469547 can't identify protocol
java    22160 nmcman  789u     sock                0,7      0t0 882470580 can't identify protocol
java    22160 nmcman  790u     sock                0,7      0t0 882467770 can't identify protocol
java    22160 nmcman  791u     sock                0,7      0t0 882467771 can't identify protocol
java    22160 nmcman  792u     sock                0,7      0t0 882472199 can't identify protocol
java    22160 nmcman  793u     sock                0,7      0t0 882474137 can't identify protocol
java    22160 nmcman  794u     sock                0,7      0t0 882469548 can't identify protocol
java    22160 nmcman  795u     sock                0,7      0t0 882470581 can't identify protocol
java    22160 nmcman  796u     sock                0,7      0t0 882474138 can't identify protocol
java    22160 nmcman  797u     sock                0,7      0t0 882469549 can't identify protocol
java    22160 nmcman  798u     sock                0,7      0t0 882472200 can't identify protocol
java    22160 nmcman  799u     sock                0,7      0t0 882467772 can't identify protocol
java    22160 nmcman  800u     sock                0,7      0t0 882469550 can't identify protocol
java    22160 nmcman  801u     sock                0,7      0t0 882472201 can't identify protocol
java    22160 nmcman  802u     sock                0,7      0t0 882474139 can't identify protocol
java    22160 nmcman  803u     sock                0,7      0t0 882467773 can't identify protocol
java    22160 nmcman  804u     sock                0,7      0t0 882469551 can't identify protocol
java    22160 nmcman  805u     sock                0,7      0t0 882468715 can't identify protocol
java    22160 nmcman  806u     sock                0,7      0t0 882474140 can't identify protocol
java    22160 nmcman  807u     sock                0,7      0t0 882472202 can't identify protocol
java    22160 nmcman  808u     sock                0,7      0t0 882469552 can't identify protocol
java    22160 nmcman  809u     sock                0,7      0t0 882468716 can't identify protocol
java    22160 nmcman  810u     sock                0,7      0t0 882474141 can't identify protocol
java    22160 nmcman  811u     sock                0,7      0t0 882467774 can't identify protocol
java    22160 nmcman  812u     sock                0,7      0t0 882472203 can't identify protocol

strace -p <pid> shows the system calls made by the process. The line below suggests that close() had not yet completed for some sockets:
[pid 22910] close(1022 <unfinished ...>