catalinux / Conn (public) (License: LGPLv2) (since 2016-03-01) (hash sha1)
Net library for easily building IPv4/IPv6 network daemons/clients

/TODO (98bc46993d3191b70bde26e2baec4a96328d131a) (17095 bytes) (mode 100644) (type blob)

[ ] When allocating a pool of Conns, check the alignment!
[ ] http://thread.gmane.org/gmane.linux.network/337836 - SO_INCOMING_CPU
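
For the alignment item above, a minimal sketch of a pool whose every slot starts on a cache-line boundary. `struct conn_slot`, `pool_alloc`, `pool_slot` and the 64-byte line size are assumptions made for illustration; none of them come from libConn.

```c
#define _POSIX_C_SOURCE 200112L	/* for posix_memalign() */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define POOL_ALIGN 64u	/* assumed cache line size */

/* Hypothetical stand-in for struct Conn. */
struct conn_slot {
	int	fd;
	char	state;
};

/* Round n up to the next multiple of POOL_ALIGN (a power of two). */
static size_t pool_stride(size_t n)
{
	return (n + POOL_ALIGN - 1) & ~(size_t)(POOL_ALIGN - 1);
}

/* Allocate 'count' zeroed, aligned slots; returns NULL on failure. */
static void *pool_alloc(size_t count, size_t elem_size)
{
	void *p = NULL;
	size_t stride = pool_stride(elem_size);

	if (count == 0 || posix_memalign(&p, POOL_ALIGN, count * stride) != 0)
		return NULL;
	memset(p, 0, count * stride);
	return p;
}

/* Address of slot i; the padded stride keeps every slot aligned. */
static struct conn_slot *pool_slot(void *pool, size_t i)
{
	return (struct conn_slot *)((char *)pool
		+ i * pool_stride(sizeof(struct conn_slot)));
}
```

Padding each slot to a full line costs memory, so this only pays off if false sharing between worker threads is actually measured.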

[ ] Scenario: we answer a request, see that we have nothing more to send,
	the data is still in the buffer, we call close => the data is lost!
	We must do shutdown!
[ ] Draw a diagram of the states a connection goes through; I suspect
	that when obuf is 0, I call close instead of shutdown.
[ ] For HTTP/1.1, the default is to not close the connection.
[ ] SO_INCOMING_CPU http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=2c8c56e15df3d4c2af3d656e44feb18789f75837
[ ] Check the new batch mode of epoll
[ ] Use SO_REUSEPORT for accept():
	!!! http://lists.dragonflybsd.org/pipermail/users/2013-July/053632.html
	!!! https://github.com/monkey/monkey/commit/d1da249a0b5e8f5765ea8031919fb32e93c57cb8
[ ] Use defer accept!
[ ] 
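
The close-versus-shutdown scenario in the list above can be sketched as follows. Closing a TCP socket while unread input is still queued can make the kernel send a reset, and the peer may then lose the tail of the response, so one common pattern is to half-close with shutdown(2), drain, and only then close. `graceful_close` is a hypothetical helper, not a libConn function, and it blocks for brevity.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>
#include <unistd.h>

/*
 * Half-close the sending side, then read until the peer closes too.
 * Blocking for brevity; an event-driven server would wait for
 * EPOLLIN/EPOLLRDHUP instead of looping here.
 * Returns 0 on a clean close, -1 on error. The fd is closed either way.
 */
static int graceful_close(int fd)
{
	char junk[512];
	ssize_t n;
	int ret = 0;

	/* Tell the peer that no more data will follow. */
	if (shutdown(fd, SHUT_WR) == -1 && errno != ENOTCONN)
		ret = -1;

	/* Drain whatever the peer still sends until it signals EOF. */
	while (ret == 0) {
		n = recv(fd, junk, sizeof(junk), 0);
		if (n == 0)
			break;
		if (n == -1 && errno != EINTR)
			ret = -1;
	}

	if (close(fd) == -1)
		ret = -1;
	return ret;
}
```

A real server would also put a timeout on the drain phase so that a silent peer cannot hold the connection open forever.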

== Devel point ==
[ ] I think that I must switch back to processes. Too much overhead for threads.
	And I do not know if I gain something by using threads.
[ ] Now I am working on simple web requests.
	Static (/) and dynamic (/cgi?a=1).
[ ] We must send "HTTP/x.x code message" respecting incoming request.
	Our API must deal with it.
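
A small sketch of the status-line item above: the response echoes the HTTP version found on the request line. `http_status_line` is a hypothetical helper name, and falling back to HTTP/1.0 for anything that is not HTTP/1.1 is an assumption of this sketch.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build "HTTP/x.y code reason\r\n" using the version from the request line.
 * Falls back to HTTP/1.0 when the request does not say HTTP/1.1.
 * Returns the number of bytes written, or -1 if 'out' is too small.
 */
static int http_status_line(const char *request_line, int code,
	const char *reason, char *out, size_t out_size)
{
	const char *ver = "HTTP/1.0";
	const char *p = strstr(request_line, " HTTP/");
	int n;

	if (p && strncmp(p + 1, "HTTP/1.1", 8) == 0)
		ver = "HTTP/1.1";

	n = snprintf(out, out_size, "%s %d %s\r\n", ver, code, reason);
	if (n < 0 || (size_t)n >= out_size)
		return -1;
	return n;
}
```

For example, passing the request line `GET /cgi?a=1 HTTP/1.0` with code 404 and reason `Not Found` fills the buffer with `HTTP/1.0 404 Not Found\r\n`.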

== Some history ==
2014-04-02: It seems I beat gwan. About 4300 vs 3700. But I do not do quite
	everything it does. Moving on to an API for creating a web server.
	Let's see roughly how it should look.
	C = Conn_alloc();
	wp = Conn_wpool_create();
	Conn_set_wp(C, wp);
	Conn_commit(C);
	while (1) {
		Conn_poll(-1);
	}

	ws = Conn_ws_create(C);
	Conn_ws_path(C, "/static", "/home/x/public_html");
	Conn_ws_script(C, "/cgi-bin/script1", function_script1);

	Sounds good.

	Another thing: libConn - 40k, wpool2 - 10k!

2014-03-25:
	It seems my syscalls take longer than gwan's.
	I really have no explanation. How the hell does this happen?
	Is the code between the syscalls counted too?
	Probably the waiting is counted as well! Then it is correct.
	But what explanation do I have for shutdown?!
	It seems that after an 'ab' run, wpool2 never stops eating CPU!

2014-03-25:
	Conclusion: I spend 87% of the time in epoll_wait! gwan only 6%!!!
	It seems EPOLLIN and EPOLLRDHUP arrive and I do nothing!
	I disable EPOLLRDHUP! Wow! 5600 req/s!
	But it seems I still have 95% in epoll_wait. 112724 calls versus 10k!
	Still too many! It seems I get stuck somewhere and make no progress from there!
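
	With level-triggered epoll, an event that is reported but never
	handled makes epoll_wait return again immediately, which would match
	the call counts above. A minimal sketch that either consumes every
	reported condition or removes the fd; `watch_fd` and `on_event` are
	hypothetical names, not libConn functions:

```c
#include <sys/types.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <errno.h>
#include <unistd.h>

/* Register fd for input and peer half-close notifications (level-triggered). */
static int watch_fd(int epfd, int fd)
{
	struct epoll_event ev;

	ev.events = EPOLLIN | EPOLLRDHUP;
	ev.data.fd = fd;
	return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/*
 * Handle one reported event. Drains all readable data, and if the peer
 * closed (EOF, RDHUP, HUP, error) removes the fd from epoll and closes it.
 * Returns the number of bytes consumed; *closed is set to 1 if fd was closed.
 */
static ssize_t on_event(int epfd, int fd, unsigned int events, int *closed)
{
	char buf[4096];
	ssize_t n = 0, total = 0;

	*closed = 0;
	if (events & (EPOLLIN | EPOLLRDHUP)) {
		while ((n = recv(fd, buf, sizeof(buf), MSG_DONTWAIT)) > 0)
			total += n;
		if (n == 0)
			*closed = 1;	/* EOF: peer finished sending */
		else if (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
			*closed = 1;	/* hard error */
	}
	if (events & (EPOLLRDHUP | EPOLLHUP | EPOLLERR))
		*closed = 1;

	if (*closed) {
		/* Never leave a dead fd registered, or epoll_wait will spin. */
		epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);
		close(fd);
	}
	return total;
}
```

	This sketch discards the data it reads; a real handler would parse it
	and send the response before tearing the connection down.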

	strace -c (-n20000 -c10):
	gwan:
	% time     seconds  usecs/call     calls    errors syscall
	------ ----------- ----------- --------- --------- ----------------
	 19.84    0.031245           1     60000           setsockopt
	 19.52    0.030749           2     20000           writev
	 18.32    0.028859           1     40000     20000 shutdown
	 16.58    0.026120           0     60000     20000 epoll_ctl
	  8.68    0.013667           1     20002           close
	  6.23    0.009811           0     20183       183 accept4
	  5.94    0.009357           1     10440           epoll_wait
	  4.88    0.007684           0     20042        20 read
	  0.00    0.000000           0         2           open
	  0.00    0.000000           0        12           stat
	  0.00    0.000000           0         2           fstat
	  0.00    0.000000           0         2           mmap
	  0.00    0.000000           0         1           mprotect
	  0.00    0.000000           0         2           munmap
	------ ----------- ----------- --------- --------- ----------------
	100.00    0.157492                250688     40203 total

	me (wpool2):
	% time     seconds  usecs/call     calls    errors syscall
	------ ----------- ----------- --------- --------- ----------------
	 87.27    3.011111          46     65802           epoll_wait
	  3.17    0.109265           5     20000           sendto
	  2.37    0.081884           4     20146       145 accept4
	  2.28    0.078617           1     63925           recvfrom
	  1.90    0.065479           3     20005           epoll_ctl
	  1.75    0.060324           3     20000           shutdown
	  1.22    0.041954           2     20008           close
	  0.03    0.001000         500         2           socketpair
	  0.02    0.000535          20        27        19 open
	  0.00    0.000077          19         4           munmap
	  0.00    0.000038           5         7           read
	  0.00    0.000000           0         1           write
	  0.00    0.000000           0         5           fstat
	  0.00    0.000000           0        19           mmap
	  0.00    0.000000           0        12           mprotect
	  0.00    0.000000           0         4           brk
	  0.00    0.000000           0         2           rt_sigaction
	  0.00    0.000000           0         1           rt_sigprocmask
	  0.00    0.000000           0         1         1 access
	  0.00    0.000000           0         1           socket
	  0.00    0.000000           0         1           bind
	  0.00    0.000000           0         1           listen
	  0.00    0.000000           0         2           setsockopt
	  0.00    0.000000           0         2           clone
	  0.00    0.000000           0         1           execve
	  0.00    0.000000           0         1           getcwd
	  0.00    0.000000           0         1           getrlimit
	  0.00    0.000000           0         1           arch_prctl
	  0.00    0.000000           0         3         1 futex
	  0.00    0.000000           0         2           sched_setaffinity
	  0.00    0.000000           0         1           sched_getaffinity
	  0.00    0.000000           0         1           epoll_create
	  0.00    0.000000           0         1           set_tid_address
	  0.00    0.000000           0         3           set_robust_list
	  0.00    0.000000           0         3           epoll_create1
	------ ----------- ----------- --------- --------- ----------------
	100.00    3.450284                229996       166 total


2014-03-24
	Added cancel_disable.
	ab -n20000 -c10 http://localhost:60000/100.html: 4763 req/sec under perf
	WITHOUT PERF: 4374 req/sec WTF?!
	perf report:
	14.45%  wpool2  [kernel.kallsyms]   [k] ep_poll
	13.84%  wpool2  [kernel.kallsyms]   [k] set_normalized_timespec
	13.63%  wpool2  [vdso]              [.] 0x0000000000000cb0
	 6.24%  wpool2  [kernel.kallsyms]   [k] read_hpet
	 5.71%  wpool2  [kernel.kallsyms]   [k] select_estimate_accuracy
	 5.41%  wpool2  libConn.so.1.0.33   [.] Conn_wpool_worker_func
	I probably call gettimeofday too many times. Yes, it seems that
	0x0000000000000cb0 is gettimeofday.
	If I remove sched_yield, with perf record I get 4778 req/s
	[pid 1787] SYS_mmap(0, 0x8000000, 0, 0x4022)                    = 0x7f5bb264f000
	[pid 1787] SYS_munmap(0x7f5bb264f000, 26939392)                 = 0
	[pid 1787] SYS_munmap(0x7f5bb8000000, 40169472)                 = 0
	It seems an mmap is done and then immediately a munmap. WTF?!
	Afterwards it does not happen anymore.

	Me			gwan
	epoll_wait		epoll_wait
	accept4			accept4
	mmap			-
	munmap			-
	munmap			-
	mprotect		-
	-			setsockopt(NODELAY)
	epoll_ctl		epoll_ctl
	epoll_wait		epoll_wait
	recvfrom		read
	-			setsockopt(NODELAY)
	-			open
	-			fstat
	-			open,stat,fstat,mmap,read,close,munmap
	sendto			writev
	shutdown		shutdown
	-			setsockopt(NODELAY)
	epoll_wait		epoll_wait
	-			epoll_ctl(DEL)
	-			shutdown!
	-			epoll_ctl(DEL)!
	close			close
	I can clearly do better than this!

2014-03-11
	Use pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL);
	Maybe this way cancellation will not appear in perf reports.
2013-12-12
Conclusions:
	- I stay much longer in epoll_wait. Very strange. That is 22 extra seconds!
	- I call accept4 40000 more times! Fuck!
	- How the hell do I call shutdown 50,000 times, without errors, while gwan calls it 100,000 times and takes less time?!
	- It is incredible how it manages that. Only if my syscalls are interrupted too many times.
	- At least I make 3 times fewer setsockopt calls.
Next steps:
	Do not call accept4 one more time, because a notification will come for it anyway.
	I believe less and less in EPOLLET. What the fuck?!
	Rather than making an accept call on every thread, better to call epoll_wait.

wpool2:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 97.87   35.710386         187    191175           epoll_wait
  1.26    0.459349           3    141576     91576 accept4
  0.52    0.189052           4     50000           sendto
  0.22    0.080552           2     50000           shutdown
  0.06    0.023073           0     50000           close
  0.03    0.010337           0     50306           recvfrom
  0.03    0.009216           0     50000           epoll_ctl
  0.02    0.007091           0     50000           setsockopt
------ ----------- ----------- --------- --------- ----------------
100.00   36.489056                633057     91576 total

gwan:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 97.60   13.464343          89    150462           epoll_wait
  0.91    0.124850           2     50000           writev
  0.49    0.067336           1    100000     50000 shutdown
  0.44    0.060369           1    101351     51351 accept4
  0.25    0.034678           0    150000           setsockopt
  0.13    0.017909           0    150000     50000 epoll_ctl
  0.11    0.015268           0     50123         4 read
  0.07    0.009198           0     50002           close
------ ----------- ----------- --------- --------- ----------------
100.00   13.795050                802006    151355 total


2013-12-10: after -O3
				me
	prof: -c2+5000		8%
	-c2+50000		3719

2013-11-24: on r1 (after putting free structures in front of free list)
			me
	-c10+50000		5034
	-c2+50000	3777	4700
	-c1+50000	2973	3900

2013-11-21: on r1
			me	gwan
	-c1+50000	1703	3206
	-c2+50000	3619	4900
	-c10+50000	3647	5028

2013-11-20: on r1 (after doing allocations per thread):
			me
	branch 1+5000	7
	-c1+50000	
	-c2+50000	5093!broken?

2013-11-17: on r1 (after doing accept in all workers):
			me	me+Log
	branch 1+5000	6.4!
	-c1+50000	2180
	-c2+50000	3400
2013-11-16: on r1:
			me	me+Log		gwan
	branch 1+5000	9.5%
	-c1+50000:	3882			2094!
	-c2+50000:	4100			4927
2013-11-13: on r1: ~4270 req/s (-n50000 -c2 + taskset + nice) branch mispredict: 8% K:3.11.4-201 so(Conn)=448 gwan:-c1:~2000 me-c1:3875
2013-11-12: on r1: ~3850 req/s (-n50000 -c2 + taskset + nice) branch mispredict: 9% K:3.11.4-201 so(Conn)=480

== SHOWSTOPPERS ==
[ ] Call Conn_ws_free when freeing a Conn.
[ ] Make sure we compile with -O3
[ ] Should we call accept again or go to poll mode? I think we should go to poll.
[ ] Compile with -s to obtain profiling on assembly code.
[ ] We may get rid of NODELAY because we write and do shutdown. I hope
	this is triggering a flush. To test.
[ ] The first time, I should ignore O because there cannot be anything in the buffer yet.
[ ] I need a mechanism, preferably lock-free, to send statistics to the master.
	Possibly only on request, to avoid useless traffic.
	But the connection asking for statistics will arrive on a worker.
	I can probably signal through a pipe. I copy the current statistics
	into a buffer, then send the pointer through the pipe. That is for the update.
	When a statistics request arrives, I have to ask the master for them
	and then serve them.
[ ] Stop using callbacks for send/receive to speed up operations.
[ ] We should not call the initial out hook. We can just try to send at first
	kick and react to EAGAIN. Very probably we can send.
[ ] We will probably have different structures: one for what the client sets
	(Conn_alloc/commit) and another for the internal bookkeeping.
[ ] Move main pollfd to all threads. This way they will be "equal" and every
	core will be at full speed without migrations.
[ ] Check with gdb why we get a segmentation fault in line 2267.
[ ] Limit the number of accepts to not starve read/write.
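
The statistics item in the list above (copy the counters into a buffer, then send the pointer through a pipe) can be sketched like this. `struct worker_stats`, `stats_publish` and `stats_collect` are hypothetical names, and the approach only works between threads of one process, because a pointer is meaningless in another address space.

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical per-worker counters. */
struct worker_stats {
	int		worker_id;
	unsigned long	requests;
	unsigned long	bytes_out;
};

/* Worker side: snapshot the counters and hand ownership to the master. */
static int stats_publish(int pipe_wr, const struct worker_stats *cur)
{
	struct worker_stats *snap = malloc(sizeof(*snap));

	if (!snap)
		return -1;
	memcpy(snap, cur, sizeof(*snap));

	/* A pointer is far smaller than PIPE_BUF, so the write is atomic. */
	if (write(pipe_wr, &snap, sizeof(snap)) != (ssize_t)sizeof(snap)) {
		free(snap);
		return -1;
	}
	return 0;
}

/* Master side: take one snapshot; the caller must free() it. */
static struct worker_stats *stats_collect(int pipe_rd)
{
	struct worker_stats *snap = NULL;

	if (read(pipe_rd, &snap, sizeof(snap)) != (ssize_t)sizeof(snap))
		return NULL;
	return snap;
}
```

The worker keeps updating its own counters without any lock, since the master only ever sees the private copy. If the design moves back to processes, the snapshot itself would have to travel through the pipe instead of a pointer to it.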

== HIGH PRIORITY ==
[ ] Should we do Conn_now per thread? It is updated from all worker threads!
[ ] Switch to libConn.so.1 at compile time to be able to bump the version sometime.
[ ] Verify likely/unlikely. I suspect they are not working correctly.
[ ] http://lwn.net/Articles/257209/
[ ] http://highscalability.com/blog/2013/5/13/the-secret-to-10-million-concurrent-connections-the-kernel-i.html
[ ] http://fasterdata.es.net/host-tuning/linux/
[ ] Investigate MSG_MORE when sending.
[ ] When init Conn, preallocate a 1 worker wp and set it and when user requests
	another wp, just put(wp) and set the new one? Or at commit time?
	Use cases: want to alloc 1 core for a listen port and for other
	many cores.
[ ] Split Conn_poll_cb into MASTER/NON_MASTER and do not make it callback
	but inline.
[ ] ->next pointer can be removed from struct Conn.
	This way I can save a lot of space.
[ ] Do not make the fd -1, is pointless.
[ ] Replace Conn_X with Conn_get_socket_X!
[ ] Use shutdown(2) before closing connection. Done, but see the link.
	http://www.developerweb.net/forum/archive/index.php/t-2940.html.
[ ] Switch all pointers to callbacks to a single callback with parameters +
	a flag that will say for what type of callbacks to call the callback.
	What happens when I want to change one callback?
[ ] It does not seem that I close the connection: I do shutdown, but that is all.
[ ] Conn_free_intern is not called. Because of callbacks?
[ ] Try to alloc bigger chunks for wpool and maybe other stuff.
[ ] Alloc private area just after Conn structure. Add a function to set private
	size.
[ ] Set on master socket the needed in/out buffer sizes and inherit to accepted
	ones. This is because we may need different buffers for different masters.
[ ] If I got HUP, I must not allow parsing to continue!
[ ] We should remove from the active list the C that we are freeing.
[ ] Align Conn structures to 8 bytes in allocations blocks.
[ ] Investigate the idea to put free buffers in front of the queue because
	they are hot.
[ ] 


== LOW PRIORITY ==
[ ] Use enums for enum types.
[ ] Cache getaddrinfo responses
[ ] Investigate moving TCP stack in userspace.
[ ] Conn_join(C1, C2) (Bridge 2 connections together for proxy stuff.)
[ ] See http://highscalability.com/blog/2012/9/10/russ-10-ingredient-recipe-for-making-1-million-tps-on-5k-har.html
[ ] Dump all memory statistics
[ ] SCTP
[ ] .error_state -> error_type
[ ] if (.error_state...) -> if (.state == CONN_STATE_ERROR)
[ ] Audit CONN_STATE_EMPTY vs CONN_STATE_FREE
[ ] Add a function to set the maximum number of connections.
[ ] Fix the whole list scanning for expiration, band and closing.
[ ] Put callbacks in a structure to free some space from struct Conn.
[ ] wpool: When we free a Conn structure, we have to Conn_del_wp!
[ ] wpool: What if we add master sockets also to workers and do nothing in main
	thread? Check ma.c example. Verified: accept wakes up only one thread.
	Still to check if epoll wakes in all threads! Seems it wakes all threads!
	Not very good.
[ ] Investigate splice.
[ ] Investigate MSG_MORE as an alternative to CORK or writev.
[ ] Check if we are swapping and warn.
[ ] Log faults and io.
[ ] Add access control
	Conn_ac_set_default(C, CONN_AC_DENY) - default deny (or CONN_AC_ALLOW)
	Conn_ac_add(C, CONN_AC_ALLOW, "2001::1/64"); - for ipv6
	Conn_ac_add(C, CONN_AC_ALLOW, "192.168.0.0/25"); - for ipv4
[ ] A la redir stuff
[ ] Check PACKET: can we send with "send" without knowing the MAC?
[ ] UDP
[ ] What happens if we reach ~ the end of the buffer and I still cannot
	process the data? We should log and close the connection. It is
	programmer's fault or a DoS.
[ ] Queue for delete/trytoconnect/etc.
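
The access control item in the list above needs a prefix match behind calls like Conn_ac_add(C, CONN_AC_ALLOW, "192.168.0.0/25"). A minimal IPv4-only sketch of that check; `cidr4_match` is a hypothetical helper, and the IPv6 case from the list is not covered here.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Returns 1 if 'addr' falls inside 'cidr' (e.g. "192.168.0.0/25"),
 * 0 if it does not, -1 if either string cannot be parsed.
 * A 'cidr' without a slash is treated as a /32 host entry.
 */
static int cidr4_match(const char *cidr, const char *addr)
{
	char net[INET_ADDRSTRLEN];
	const char *slash = strchr(cidr, '/');
	struct in_addr n, a;
	uint32_t mask;
	long bits = 32;
	char *end;
	size_t len;

	/* Copy the network part so inet_pton() sees a clean string. */
	len = slash ? (size_t)(slash - cidr) : strlen(cidr);
	if (len >= sizeof(net))
		return -1;
	memcpy(net, cidr, len);
	net[len] = '\0';

	if (slash) {
		bits = strtol(slash + 1, &end, 10);
		if (end == slash + 1 || *end != '\0' || bits < 0 || bits > 32)
			return -1;
	}
	if (inet_pton(AF_INET, net, &n) != 1 || inet_pton(AF_INET, addr, &a) != 1)
		return -1;

	/* A shift by 32 is undefined in C, so /0 is handled separately. */
	mask = bits == 0 ? 0 : htonl(0xFFFFFFFFu << (32 - bits));
	return (n.s_addr & mask) == (a.s_addr & mask);
}
```

With the /25 example from the list, 192.168.0.127 matches and 192.168.0.128 does not.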

Performance:
[ ] net.core.somaxconn
[ ]	Take care for /proc/net/netstat
[ ]	/proc/sys/net/ipv4/tcp_mem
	Now (512M): 49152 65536 98304
	Now (256M): 24576 32768 49152 - 55 conns/sec

	Test with: 80000 120000 240000 - 92 conns/sec
	Test with 160000 240000 480000 - 96 conns/sec

	After:
		echo "16000 64000 512000" > tcp_[rw]mem - 96

	After echo 1 > /proc/sys/net/ipv4/tcp_low_latency - 156 conns/sec

To reduce the number of connections in TIME-WAIT:
	echo 200 > /proc/sys/net/ipv4/tcp_max_tw_buckets

[ ] Add loadbalancing and failover in the base code.
[ ] Automatically put \0 at the end of received data. What for?!
[ ] Add the possibility to wait for a char/string before calling recv/data callback.
	Maybe do this with socket filtering or in kernel?
[ ] Change socket buffer accordingly with user settings to minimize
	needed memory.
[ ] Dump how much memory is in use for various parts of the internal data.
[ ] Do not mix slot and id and fd in examples.
[ ] Test suite
[ ] Free memory when the number of connections is going down.
[ ] Bandwidth part should have a separate pointer, to not load too much Conn structure.
[ ] Maybe we should have Bandwidth classes so we can group connections.
[ ] http://www.erlang-solutions.com/thesis/tcp_optimisation/tcp_optimisation.html
[ ] 
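
The "wait for a char/string before calling the recv/data callback" item in the list above can be done in userspace with a plain scan of the receive buffer. A minimal sketch; `frame_ready` is a hypothetical helper, not part of the current API.

```c
#include <stddef.h>
#include <string.h>

/*
 * Look for 'delim' in the first 'len' bytes of 'buf'.
 * Returns the number of bytes up to and including the delimiter,
 * or 0 if it has not arrived yet (the caller should keep buffering).
 */
static size_t frame_ready(const char *buf, size_t len,
	const char *delim, size_t delim_len)
{
	size_t i;

	if (delim_len == 0 || len < delim_len)
		return 0;
	for (i = 0; i + delim_len <= len; i++) {
		if (buf[i] == delim[0] && memcmp(buf + i, delim, delim_len) == 0)
			return i + delim_len;
	}
	return 0;
}
```

The library would call the data callback only when the return value is non-zero, for example with "\r\n\r\n" as the delimiter for HTTP request headers. Rescanning from the start on every read is quadratic in the worst case; remembering the last scanned offset would avoid that.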

=== When we switch to Conn version 2 library ===
[ ] Conn_socket will call Conn_socket_proto
[ ] use enums!
[ ] http://urbanairship.com/blog/2010/09/29/linux-kernel-tuning-for-c500k/
[ ] 

