[ ] When allocating a pool of Conns, check the alignment!
[ ] http://thread.gmane.org/gmane.linux.network/337836 - SO_INCOMING_CPU
[ ] Scenario: we answer a request, see that we have nothing more to send,
the data is still in the buffer, we call close => the data is lost!
We must call shutdown!
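A minimal sketch of the close sequence this scenario asks for, assuming a connected stream socket. The helper name graceful_close is hypothetical and not part of libConn. It half-closes the write side so queued data is followed by a FIN, drains whatever the peer still sends, and only then closes. In the real event loop the drain would be driven by EPOLLIN/EPOLLRDHUP events rather than by a loop like this one.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: shutdown(SHUT_WR), drain, close.
 * Returns 0 on success, -1 on error. The fd is closed in both cases. */
int graceful_close(int fd)
{
    char scratch[4096];
    ssize_t n;

    assert(fd >= 0);

    /* Stop sending; data already queued is still delivered before the FIN. */
    if (shutdown(fd, SHUT_WR) == -1 && errno != ENOTCONN) {
        close(fd);
        return -1;
    }

    /* Discard incoming data until EOF (0), EAGAIN or a hard error. */
    for (;;) {
        n = read(fd, scratch, sizeof(scratch));
        if (n > 0)
            continue;
        if (n == -1 && errno == EINTR)
            continue;
        break;
    }

    return close(fd);
}
```

On a non-blocking socket the drain stops at EAGAIN, so a caller that wants to be sure the peer has seen everything should keep the fd registered until the peer's EOF arrives.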
[ ] Draw a diagram of the states a connection goes through; I suspect
that when obuf is 0, I do not call shutdown in place of close.
[ ] For HTTP/1.1, the default is to not close the connection.
[ ] SO_INCOMING_CPU http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=2c8c56e15df3d4c2af3d656e44feb18789f75837
[ ] Check the new batch mode of epoll
[ ] Use SO_REUSEPORT for accept():
!!! http://lists.dragonflybsd.org/pipermail/users/2013-July/053632.html
!!! https://github.com/monkey/monkey/commit/d1da249a0b5e8f5765ea8031919fb32e93c57cb8
[ ] Use defer accept!
[ ]
== Devel point ==
[ ] I think that I must switch back to processes. Too much overhead for threads.
And I do not know if I gain something by using threads.
[ ] Now I am working on simple web requests.
Static (/) and dynamic (/cgi?a=1).
[ ] We must send "HTTP/x.x code message" respecting incoming request.
Our API must deal with it.
== Some history ==
2014-04-02: It seems I am beating gwan. About 4300 vs 3700. But I do not do
quite everything it does. I am moving on to an API for creating a web server.
Let's see roughly what it should look like.
C = Conn_alloc();
wp = Conn_wpool_create();
Conn_set_wp(C, wp);
while (1) {
ws = Conn_ws_create(C);
Conn_ws_path(C, "/static", "/home/x/public_html");
Conn_ws_script(C, "/cgi-bin/script1", function_script1);
Sounds good.
Another thing: libConn - 40k, wpool2 - 10k!
It seems my syscalls take longer than gwan's.
I really have no explanation. How the hell does this happen?
Is the code between syscalls counted as well?
Probably the waiting is counted too! Then it is correct.
But what explanation do I have for shutdown?!
It seems that after an 'ab' run, wpool2 does not stop eating CPU!
Conclusion: I spend 87% of the time in epoll_wait! gwan only 6!!!
It seems EPOLLIN and EPOLLRDHUP arrive and I do nothing!
I disable EPOLLRDHUP! Wow! 5600 req/s!
But it seems I still have 95% in epoll_wait. 112724 calls versus 10k!
Still a lot! It seems I get stuck somewhere and make no progress from there!
strace -c (-n20000 -c10):
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 19.84    0.031245           1     60000           setsockopt
 19.52    0.030749           2     20000           writev
 18.32    0.028859           1     40000     20000 shutdown
 16.58    0.026120           0     60000     20000 epoll_ctl
  8.68    0.013667           1     20002           close
  6.23    0.009811           0     20183       183 accept4
  5.94    0.009357           1     10440           epoll_wait
  4.88    0.007684           0     20042        20 read
  0.00    0.000000           0         2           open
  0.00    0.000000           0        12           stat
  0.00    0.000000           0         2           fstat
  0.00    0.000000           0         2           mmap
  0.00    0.000000           0         1           mprotect
  0.00    0.000000           0         2           munmap
------ ----------- ----------- --------- --------- ----------------
100.00    0.157492                250688     40203 total
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 87.27    3.011111          46     65802           epoll_wait
  3.17    0.109265           5     20000           sendto
  2.37    0.081884           4     20146       145 accept4
  2.28    0.078617           1     63925           recvfrom
  1.90    0.065479           3     20005           epoll_ctl
  1.75    0.060324           3     20000           shutdown
  1.22    0.041954           2     20008           close
  0.03    0.001000         500         2           socketpair
  0.02    0.000535          20        27        19 open
  0.00    0.000077          19         4           munmap
  0.00    0.000038           5         7           read
  0.00    0.000000           0         1           write
  0.00    0.000000           0         5           fstat
  0.00    0.000000           0        19           mmap
  0.00    0.000000           0        12           mprotect
  0.00    0.000000           0         4           brk
  0.00    0.000000           0         2           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         1           socket
  0.00    0.000000           0         1           bind
  0.00    0.000000           0         1           listen
  0.00    0.000000           0         2           setsockopt
  0.00    0.000000           0         2           clone
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           getcwd
  0.00    0.000000           0         1           getrlimit
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         3         1 futex
  0.00    0.000000           0         2           sched_setaffinity
  0.00    0.000000           0         1           sched_getaffinity
  0.00    0.000000           0         1           epoll_create
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         3           set_robust_list
  0.00    0.000000           0         3           epoll_create1
------ ----------- ----------- --------- --------- ----------------
100.00    3.450284                229996       166 total
I added cancel_disable.
ab -n20000 -c10 http://localhost:60000/100.html: 4763 req/sec under perf
WITHOUT PERF: 4374 req/sec WTF?!
perf report:
14.45% wpool2 [kernel.kallsyms] [k] ep_poll
13.84% wpool2 [kernel.kallsyms] [k] set_normalized_timespec
13.63% wpool2 [vdso] [.] 0x0000000000000cb0
6.24% wpool2 [kernel.kallsyms] [k] read_hpet
5.71% wpool2 [kernel.kallsyms] [k] select_estimate_accuracy
5.41% wpool2 libConn.so.1.0.33 [.] Conn_wpool_worker_func
I am probably calling gettimeofday too many times. Yes, it seems that
0x0000000000000cb0 is gettimeofday.
If I remove sched_yield, with perf record I get 4778 req/s
[pid 1787] SYS_mmap(0, 0x8000000, 0, 0x4022) = 0x7f5bb264f000
[pid 1787] SYS_munmap(0x7f5bb264f000, 26939392) = 0
[pid 1787] SYS_munmap(0x7f5bb8000000, 40169472) = 0
It seems an mmap is done and then immediately a munmap. WTF?!
After that it no longer happens.
Me              gwan
epoll_wait      epoll_wait
accept4         accept4
mmap            -
munmap          -
munmap          -
mprotect        -
-               setsockopt(NODELAY)
epoll_ctl       epoll_ctl
epoll_wait      epoll_wait
recvfrom        read
-               setsockopt(NODELAY)
-               open
-               fstat
-               open,stat,fstat,mmap,read,close,munmap
sendto          writev
shutdown        shutdown
-               setsockopt(NODELAY)
epoll_wait      epoll_wait
-               epoll_ctl(DEL)
-               shutdown!
-               epoll_ctl(DEL)!
close           close
I can clearly do better than this!
Use pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL);
Maybe this way cancellation will not appear in perf reports.
- I spend much more time in epoll_wait. Very strange. That is 22 extra seconds!
- I call accept4 40000 more times! Fuck!
- How the hell do I call shutdown 50,000 times, with no errors, while gwan calls it 100,000 times and takes less?!
- It is incredible how it manages that. Only if my syscalls are interrupted too many times.
- At least I make 3 times fewer setsockopt calls.
Next steps:
Do not call accept4 one more time, because it will come with the notification.
I believe less and less in EPOLLET. What the fuck?!
Rather than making an accept call on every thread, it is better to call epoll_wait.
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 97.87   35.710386         187    191175           epoll_wait
  1.26    0.459349           3    141576     91576 accept4
  0.52    0.189052           4     50000           sendto
  0.22    0.080552           2     50000           shutdown
  0.06    0.023073           0     50000           close
  0.03    0.010337           0     50306           recvfrom
  0.03    0.009216           0     50000           epoll_ctl
  0.02    0.007091           0     50000           setsockopt
------ ----------- ----------- --------- --------- ----------------
100.00   36.489056                633057     91576 total
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 97.60   13.464343          89    150462           epoll_wait
  0.91    0.124850           2     50000           writev
  0.49    0.067336           1    100000     50000 shutdown
  0.44    0.060369           1    101351     51351 accept4
  0.25    0.034678           0    150000           setsockopt
  0.13    0.017909           0    150000     50000 epoll_ctl
  0.11    0.015268           0     50123         4 read
  0.07    0.009198           0     50002           close
------ ----------- ----------- --------- --------- ----------------
100.00   13.795050                802006    151355 total
2013-12-10: after -O3
prof: -c2+5000 8%
-c2+50000 3719
2013-11-24: on r1 (after putting free structures in front of free list)
-c10+50000 5034
-c2+50000 3777 4700
-c1+50000 2973 3900
2013-11-21: on r1
me gwan
-c1+50000 1703 3206
-c2+50000 3619 4900
-c10+50000 3647 5028
2013-11-20: on r1 (after doing allocations per thread):
branch 1+5000 7
-c2+50000 5093!broken?
2013-11-17: on r1 (after doing accept in all workers):
me me+Log
branch 1+5000 6.4!
-c1+50000 2180
-c2+50000 3400
2013-11-16: on r1:
me me+Log gwan
branch 1+5000 9.5%
-c1+50000: 3882 2094!
-c2+50000: 4100 4927
2013-11-13: on r1: ~4270 req/s (-n50000 -c2 + taskset + nice) branch mispredict: 8% K:3.11.4-201 so(Conn)=448 gwan:-c1:~2000 me-c1:3875
2013-11-12: on r1: ~3850 req/s (-n50000 -c2 + taskset + nice) branch mispredict: 9% K:3.11.4-201 so(Conn)=480
[ ] Call Conn_ws_free when freeing a Conn.
[ ] Make sure we compile with -O3
[ ] Should we call again accept or go to poll mode? I think we should go to poll.
[ ] Compile with -s to obtain profiling on assembly code.
[ ] We may get rid of NODELAY because we write and do shutdown. I hope
this is triggering a flush. To test.
[ ] The first time, I should ignore O, because there is no way I have anything in the buffer.
[ ] I need a mechanism, preferably lock-free, to send statistics to the master.
Possibly only on request, to avoid needless traffic.
But the statistics connection will arrive on a worker.
I can probably signal through a pipe: copy the current statistics into a
buffer, then send the pointer through the pipe. That covers the update.
When a statistics request arrives, I have to ask the master for them
and then serve them.
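A rough sketch of the pipe idea, assuming workers are threads sharing one address space with the master (a pointer sent through a pipe means nothing in another process). The struct fields and the names stats_publish and stats_collect are hypothetical. A pointer is much smaller than PIPE_BUF, so each write is atomic and needs no lock.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-worker counters; the fields are illustrative only. */
struct worker_stats {
    uint64_t accepted;
    uint64_t bytes_out;
};

/* Worker side: snapshot the live counters and hand the copy to the master.
 * Returns 0 on success, -1 on error. */
int stats_publish(int pipe_wr, const struct worker_stats *cur)
{
    struct worker_stats *snap;

    assert(cur != NULL);
    snap = malloc(sizeof(*snap));
    if (snap == NULL)
        return -1;
    memcpy(snap, cur, sizeof(*snap));

    /* Only the pointer travels through the pipe. */
    if (write(pipe_wr, &snap, sizeof(snap)) != (ssize_t)sizeof(snap)) {
        free(snap);
        return -1;
    }
    return 0;
}

/* Master side: take ownership of one snapshot; the caller must free() it.
 * Returns NULL if nothing could be read. */
struct worker_stats *stats_collect(int pipe_rd)
{
    struct worker_stats *snap = NULL;

    if (read(pipe_rd, &snap, sizeof(snap)) != (ssize_t)sizeof(snap))
        return NULL;
    return snap;
}
```

The read end of the pipe can sit in the master's epoll set, so updates arrive as ordinary events. If the workers become processes again, the snapshot itself would have to be written to the pipe instead of its address.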
[ ] Stop using callbacks for send/receive to speed up operations.
[ ] We should not call the initial out hook. We can just try to send at the
first kick and react to EAGAIN. Very probably we can send.
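A sketch of the "send first, react to EAGAIN" approach, assuming a Linux stream socket. The name send_first_kick and the want_out flag are hypothetical. The caller would register for EPOLLOUT only when want_out comes back set.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: try to send immediately instead of waiting for the
 * first "writable" event. Returns the number of bytes sent now (may be 0),
 * or -1 on a hard error. *want_out is set to 1 when the rest of the data
 * has to wait for EPOLLOUT. */
ssize_t send_first_kick(int fd, const void *buf, size_t len, int *want_out)
{
    ssize_t n;

    assert(want_out != NULL);
    *want_out = 0;

    do {
        n = send(fd, buf, len, MSG_DONTWAIT | MSG_NOSIGNAL);
    } while (n == -1 && errno == EINTR);

    if (n == -1) {
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            *want_out = 1;      /* socket buffer full: poll for EPOLLOUT */
            return 0;
        }
        return -1;
    }

    if ((size_t)n < len)
        *want_out = 1;          /* partial send: remainder goes out later */
    return n;
}
```

For small responses want_out is expected to stay 0 almost always, which saves one epoll_ctl and one wakeup per request.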
[ ] We will probably have different structures: one for what the client sets
(Conn_alloc/commit) and another for the internal bookkeeping.
[ ] Move main pollfd to all threads. This way they will be "equal" and every
core will be at full speed without migrations.
[ ] Check with gdb why we get a segmentation fault in line 2267.
[ ] Limit the number of accepts to not starve read/write.
[ ] Should we do Conn_now per thread? It is updated from all worker threads!
[ ] Switch to libConn.so.1 at compile time to be able to bump the version sometime.
[ ] Verify likely/unlikely. I suspect they are not working correctly.
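For reference, a common way to define these macros with GCC or clang, offered as a sketch because the definitions libConn actually uses are not shown here. The `!!` matters: __builtin_expect compares the value itself against the expected constant, so a truthy value other than 1 (a pointer, a count) would otherwise never match the hint. The function count_nonzero is a made-up example of usage.

```c
#include <assert.h>

#if defined(__GNUC__) || defined(__clang__)
/* !!(x) normalizes any truthy value to exactly 1 before the comparison. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define likely(x)   (!!(x))
#define unlikely(x) (!!(x))
#endif

/* Made-up usage example: count the non-zero entries of an array,
 * telling the compiler that zero entries are rare. */
int count_nonzero(const int *v, int n)
{
    int i, hits = 0;

    assert(n >= 0);
    for (i = 0; i < n; i++) {
        if (likely(v[i] != 0))
            hits++;
    }
    return hits;
}
```

The hints only influence code layout when optimization is enabled, so one way to verify them is to compare the assembly from gcc -O2 -S with the hint present and with it inverted.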
[ ] http://lwn.net/Articles/257209/
[ ] http://highscalability.com/blog/2013/5/13/the-secret-to-10-million-concurrent-connections-the-kernel-i.html
[ ] http://fasterdata.es.net/host-tuning/linux/
[ ] Investigate MSG_MORE when sending.
[ ] When init Conn, preallocate a 1 worker wp and set it and when user requests
another wp, just put(wp) and set the new one? Or at commit time?
Use cases: want to alloc 1 core for a listen port and for other
many cores.
[ ] Split Conn_poll_cb into MASTER/NON_MASTER and do not make it callback
but inline.
[ ] ->next pointer can be removed from struct Conn.
This way I can save a lot of space.
[ ] Do not make the fd -1, is pointless.
[ ] Replace Conn_X with Conn_get_socket_X!
[ ] Use shutdown(2) before closing connection. Done, but see the link.
[ ] Switch all pointers to callbacks to a single callback with parameters +
a flag that will say for what type of callbacks to call the callback.
What happens when I want to change one callback?
[ ] It does not look like I close the connection: I call shutdown, but that is all.
[ ] Conn_free_intern is not called. Because of callbacks?
[ ] Try to alloc bigger chunks for wpool and maybe other stuff.
[ ] Alloc private area just after Conn structure. Add a function to set private
[ ] Set on master socket the needed in/out buffer sizes and inherit to accepted
ones. This is because we may need different buffers for different masters.
[ ] If I got HUP, I must not allow parsing to continue!
[ ] We should remove from the active list the C that we are freeing.
[ ] Align Conn structures to 8 bytes in allocations blocks.
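A small sketch of the rounding this item implies, assuming the allocation block itself starts on an 8-byte boundary (malloc guarantees at least that on common 64-bit platforms). The names align_up and slot_at are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round n up to the next multiple of align; align must be a power of two. */
size_t align_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Address of slot i in a block of equally sized objects, where every
 * object is padded so that each slot starts on an 8-byte boundary. */
void *slot_at(void *block, size_t obj_size, size_t i)
{
    return (unsigned char *)block + i * align_up(obj_size, 8);
}
```

With a structure size that is already a multiple of 8, such as the 448 and 480 bytes recorded in the history above, the padding is zero and only the block start matters.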
[ ] Investigate the idea to put free buffers in front of the queue because
they are hot.
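One way to keep hot buffers at the front is a plain LIFO free list, sketched below. The type and function names are hypothetical, and the list is not thread-safe, so it assumes one list per worker thread. The most recently freed object, which is the most likely to still be in cache, is the first one handed out again.

```c
#include <assert.h>
#include <stddef.h>

/* The link is stored inside the freed object itself, so the list costs no
 * extra memory. Objects must be at least pointer-sized and aligned. */
struct free_node {
    struct free_node *next;
};

struct free_list {
    struct free_node *head;
};

/* Put a freed object at the front of the list. */
void free_list_push(struct free_list *fl, void *obj)
{
    struct free_node *n = obj;

    assert(fl != NULL && obj != NULL);
    n->next = fl->head;
    fl->head = n;
}

/* Take the most recently freed object, or NULL if the list is empty. */
void *free_list_pop(struct free_list *fl)
{
    struct free_node *n;

    assert(fl != NULL);
    n = fl->head;
    if (n != NULL)
        fl->head = n->next;
    return n;
}
```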
[ ]
[ ] Use enums for enum types.
[ ] Cache getaddrinfo responses
[ ] Investigate moving TCP stack in userspace.
[ ] Conn_join(C1, C2) (Bridge 2 connections together for proxy stuff.)
[ ] See http://highscalability.com/blog/2012/9/10/russ-10-ingredient-recipe-for-making-1-million-tps-on-5k-har.html
[ ] Dump all memory statistics
[ ] SCTP
[ ] .error_state -> error_type
[ ] if (.error_state...) -> if (.state == CONN_STATE_ERROR)
[ ] Add a function to set the maximum number of connections.
[ ] Fix the whole list scanning for expiration, band and closing.
[ ] Put callbacks in a structure to free some space from struct Conn.
[ ] wpool: When we free a Conn structure, we have to Conn_del_wp!
[ ] wpool: What if we add master sockets also to workers and do nothing in main
thread? Check ma.c example. Verified: accept wakes up only one thread.
Still to check if epoll wakes in all threads! Seems it wakes all threads!
Not very good.
[ ] Investigate splice.
[ ] Investigate MSG_MORE as an alternative to CORK or writev.
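A minimal sketch of what the MSG_MORE variant could look like on Linux, for comparison with TCP_CORK or writev. The name send_head_body is hypothetical, and a short send is treated as an error to keep the sketch small. The flag on the first send tells the kernel that more data follows, so on TCP it may hold the header and emit it together with the body.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: send a header and a body with two send() calls,
 * hinting with MSG_MORE that they belong together.
 * Returns 0 on success, -1 on error or short send. */
int send_head_body(int fd, const void *head, size_t hlen,
                   const void *body, size_t blen)
{
    assert(head != NULL && body != NULL);

    /* MSG_MORE: more data is coming, do not push this out alone. */
    if (send(fd, head, hlen, MSG_MORE | MSG_NOSIGNAL) != (ssize_t)hlen)
        return -1;

    /* No MSG_MORE on the last piece, so everything queued is sent. */
    if (send(fd, body, blen, MSG_NOSIGNAL) != (ssize_t)blen)
        return -1;

    return 0;
}
```

Compared with writev this costs one extra syscall per response, so it is mainly interesting when header and body are produced at different times.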
[ ] Check if we are swapping and warn.
[ ] Log faults and io.
[ ] Add access control
Conn_ac_set_default(C, CONN_AC_DENY) - default deny (or CONN_AC_ALLOW)
Conn_ac_add(C, CONN_AC_ALLOW, "2001::1/64"); - for ipv6
Conn_ac_add(C, CONN_AC_ALLOW, ""); - for ipv4
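The matching step behind such an access-control rule could be sketched as below. The Conn_ac_* calls above are only a proposed API, and ac_match is a hypothetical helper that takes the prefix length as a separate argument instead of parsing an "addr/len" string. It relies only on inet_pton and a byte-wise prefix comparison.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: is addr inside net/prefix?
 * family is AF_INET or AF_INET6.
 * Returns 1 on match, 0 on no match, -1 on a parse or range error. */
int ac_match(int family, const char *addr, const char *net, unsigned prefix)
{
    unsigned char a[16], n[16];
    unsigned max_bits = (family == AF_INET6) ? 128 : 32;
    unsigned full = prefix / 8;   /* whole bytes covered by the prefix */
    unsigned rest = prefix % 8;   /* leftover bits in the next byte */

    assert(family == AF_INET || family == AF_INET6);
    if (prefix > max_bits)
        return -1;
    if (inet_pton(family, addr, a) != 1 || inet_pton(family, net, n) != 1)
        return -1;

    if (memcmp(a, n, full) != 0)
        return 0;
    if (rest != 0) {
        unsigned char mask = (unsigned char)(0xff << (8 - rest));
        if ((a[full] & mask) != (n[full] & mask))
            return 0;
    }
    return 1;
}
```

A rule list would then be walked in order, with the default policy (allow or deny) applied when no rule matches.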
[ ] A la redir stuff
[ ] Check PACKET: can we send with "send" without knowing the MAC?
[ ] UDP
[ ] What happens if we get to ~ the end of the buffer and I still cannot
process the data? We should log and close the connection. It is either the
programmer's fault or a DoS.
[ ] Queue for delete/trytoconnect/etc.
[ ] net.core.somaxconn
[ ] Take care for /proc/net/netstat
[ ] /proc/sys/net/ipv4/tcp_mem
Now (512M): 49152 65536 98304
Now (256M): 24576 32768 49152 - 55 conns/sec
Test with: 80000 120000 240000 - 92 conns/sec
Test with 160000 240000 480000 - 96 conns/sec
echo "16000 64000 512000" > tcp_[rw]mem - 96
After echo 1 > /proc/sys/net/ipv4/tcp_low_latency - 156 conns/sec
To reduce the number of connections in TIME-WAIT:
echo 200 > /proc/sys/net/ipv4/tcp_max_tw_buckets
[ ] Add loadbalancing and failover in the base code.
[ ] Automatically put \0 at the end of receive data. What for?!
[ ] Add the possibility to wait for a char/string before calling recv/data callback.
Maybe do this with socket filtering or in kernel?
[ ] Change socket buffer according to user settings to minimize
needed memory.
[ ] Dump how much memory is in use for various parts of the internal data.
[ ] Do not mix slot and id and fd in examples.
[ ] Test suite
[ ] Free memory when the number of connections is going down.
[ ] Bandwidth part should have a separate pointer, to not load too much Conn structure.
[ ] Maybe we should have Bandwidth classes so we can group connections.
[ ] http://www.erlang-solutions.com/thesis/tcp_optimisation/tcp_optimisation.html
[ ]
=== When we switch to Conn version 2 library ===
[ ] Conn_socket will call Conn_socket_proto
[ ] use enums!
[ ] http://urbanairship.com/blog/2010/09/29/linux-kernel-tuning-for-c500k/
[ ]