
catalinux / Conn (public) (License: LGPLv2) (since 2016-03-01) (hash sha1)

Net library for easy building ipv4/ipv6 network daemons/clients

Clone URLs: https://rocketgit.com/user/catalinux/Conn ssh://rocketgit@ssh.rocketgit.com/user/catalinux/Conn git://git.rocketgit.com/user/catalinux/Conn

/TODO (5fbc10d9b77d79cd5f62c5f972cc4f7171a4c4f5) (19376 bytes) (mode 100644) (type blob)

Use 'set follow-fork-mode child' in gdb to follow child processes.

[ ] Not clear what happens with CFLAGS passed by rpmbuild (for example).
[ ] Do not set affinity if we have a single worker?
[ ] Should I have a link to the master, to remove a lot of pointers to
	common stuff? For example C->web. Or the callbacks.
[ ] We share C->web between connections!
	So struct Conn_web must contain only 'urls' (for now), to be able to
	dispatch requests, but the rest of the fields must be per C.
	Check Conn_web_dispatch for the logic.
[ ] Shouldn't Conn_web_script receive a *web parameter instead of C?!
	Because we do not expose the web stuff!
[ ] Parent must register the pipe socket to be able to receive notifications!
	And the log level when entering a function should be the minimum of all Log calls
	inside the function. If it is not OK, log full info in errors!
[ ] We must standardize on [C->id __func__] in all functions.
[ ] Before Conn_commit, call Conn_private_size(C, xxx); and auto alloc priv
	area.
[ ] Bug: if we start only one worker, it seems we do not accept connections!
[ ] I need to pass the bind info using control socket and do the bind in the
	client. Think about a crash. Think about multiple listening sockets.
	Maybe the best thing to do is in master to not deal with any Conn
	structure. Only the control interfaces.
	Not really possible because the API controls a Conn struct.
	But, maybe, do not init anything except callbacks.
[ ] On master sockets, we must not try to get peername
	getpeername(5, 0x7ffc5c935fe0, [16]) = -1 ENOTCONN (Transport endpoint is not connected)
[ ] I must fork workers in Conn_commit, to have the callbacks.
[ ] Check SO_BUSY_POLL (man 7 socket)
[ ] Start as many threads as cpumask is: avoid forbidden CPUs
[ ] 4.4: Add setsockopt() support for SO_INCOMING_CPU and extend SO_REUSEPORT
	selection logic : If a TCP listener or UDP socket has this option set,
	a packet is delivered to this socket only if CPU handling the packet
	matches the specified one. This allows to build very efficient TCP
	servers, using one listener per RX queue, as the associated TCP
	listener should only accept flows handled in softirq by the same cpu.
	This provides optimal NUMA behavior and keep cpu caches hot
[ ] http://thread.gmane.org/gmane.linux.network/337836 - SO_INCOMING_CPU
[ ] SO_INCOMING_CPU http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=2c8c56e15df3d4c2af3d656e44feb18789f75837
[ ] https://lwn.net/Articles/659199/ listener improvements - see about NUMA
[ ] https://lwn.net/Articles/655299/ MSG_ZEROCOPY
[ ] When allocating a pool of Conns, check the alignment!
[ ] Scenario: we answer a request, see that we have nothing left to send,
	the data is still in the buffer, we call close => the data is lost!
	We must do shutdown!
[ ] Make a diagram of the states a connection goes through; I suspect
	that when obuf is 0, I do not do shutdown instead of close.
[ ] In HTTP/1.1, the default is to not close the connection.
[ ] Check the new batch mode of epoll
[ ] Use SO_REUSEPORT for accept():
	!!! http://lists.dragonflybsd.org/pipermail/users/2013-July/053632.html
	!!! https://github.com/monkey/monkey/commit/d1da249a0b5e8f5765ea8031919fb32e93c57cb8
[ ] Use defer accept!
[ ] 
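The SO_REUSEPORT, defer-accept and SO_INCOMING_CPU items above could all end up in one per-worker listener setup. The sketch below assumes Linux and IPv4. worker_listen() is a hypothetical helper, not part of Conn's API, and the 5-second defer timeout is an arbitrary choice.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of Conn: open one listening TCP socket per
 * worker. With SO_REUSEPORT (Linux >= 3.9) every worker binds the same
 * address and the kernel spreads new connections among them.
 * TCP_DEFER_ACCEPT (timeout in seconds) delays accept() until the client
 * has sent data, which suits protocols where the client speaks first (HTTP).
 * If cpu >= 0, SO_INCOMING_CPU is also requested; failure is ignored
 * because it is only an optimization.
 * Returns the listening fd, or -1 with errno set.
 */
int worker_listen(const struct sockaddr_in *addr, int backlog, int cpu)
{
	int one = 1, defer_secs = 5;
	int fd = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK | SOCK_CLOEXEC, 0);

	if (fd == -1)
		return -1;

	/* SO_REUSEPORT must be set on every socket before bind(). */
	if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) == -1
	    || setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)) == -1
	    || setsockopt(fd, IPPROTO_TCP, TCP_DEFER_ACCEPT,
			  &defer_secs, sizeof(defer_secs)) == -1)
		goto fail;

#ifdef SO_INCOMING_CPU
	if (cpu >= 0)
		(void) setsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, sizeof(cpu));
#else
	(void) cpu;
#endif

	if (bind(fd, (const struct sockaddr *) addr, sizeof(*addr)) == -1
	    || listen(fd, backlog) == -1)
		goto fail;
	return fd;

fail:
	{
		int saved = errno;

		close(fd);
		errno = saved;
	}
	return -1;
}
```

Each worker would call this with the same address and its own CPU number, then add the fd to its own epoll set. The master then never needs to accept and pass fds around.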

== Devel point ==
[ ] I think I must switch back to processes. Threads have too much overhead,
	and I do not know if I gain anything by using them.
[ ] Now I am working on simple web requests.
	Static (/) and dynamic (/cgi?a=1).
[ ] We must send "HTTP/x.x code message" respecting incoming request.
	Our API must deal with it.
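A minimal sketch of "respecting the incoming request" when writing the status line, plus the HTTP/1.1 keep-alive default noted earlier. The helper names are hypothetical. The sketch assumes the server speaks at most HTTP/1.1, so it caps the echoed version there. The Connection header is not parsed, so only the version-based default is computed.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of Conn: read the version from the last
 * token of a request line such as "GET /cgi?a=1 HTTP/1.0" (no CRLF).
 * Returns 0 on success, -1 if the token is not HTTP/<digit>.<digit>.
 */
int http_parse_version(const char *line, int *major, int *minor)
{
	const char *v = strrchr(line, ' ');

	if (v == NULL)
		return -1;
	v++;	/* skip the space */
	if (strncmp(v, "HTTP/", 5) != 0 || strlen(v) != 8
	    || v[5] < '0' || v[5] > '9' || v[6] != '.'
	    || v[7] < '0' || v[7] > '9')
		return -1;
	*major = v[5] - '0';
	*minor = v[7] - '0';
	return 0;
}

/* HTTP/1.1 connections are persistent by default; HTTP/1.0 ones are not. */
int http_keepalive_default(int major, int minor)
{
	return major > 1 || (major == 1 && minor >= 1);
}

/*
 * Write "HTTP/x.y code reason\r\n" into out, answering with the request's
 * version but never above 1.1 (assumed highest version we speak).
 * Returns the length written, or -1 if out is too small.
 */
int http_status_line(char *out, size_t size, int major, int minor,
		     int code, const char *reason)
{
	int n;

	if (major > 1 || (major == 1 && minor > 1)) {
		major = 1;
		minor = 1;
	}
	n = snprintf(out, size, "HTTP/%d.%d %d %s\r\n", major, minor, code, reason);
	if (n < 0 || (size_t) n >= size)
		return -1;
	return n;
}
```

For "GET / HTTP/1.0" this yields "HTTP/1.0 200 OK\r\n" and a close-after-response default. An HTTP/1.1 request keeps the connection open unless told otherwise.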

== Some history ==
2014-04-02: It seems I beat gwan. About 4300 vs 3700. But I do not do quite
	everything it does. Moving on to an API for creating a web server.
	Let's see roughly how it should look.
	C = Conn_alloc();
	wp = Conn_wpool_create();
	Conn_set_wp(C, wp);
	Conn_commit(C);
	while (1) {
		Conn_poll(-1);
	}

	ws = Conn_ws_create(C);
	Conn_ws_path(C, "/static", "/home/x/public_html");
	Conn_ws_script(C, "/cgi-bin/script1", function_script1);

	Sounds good.

	Another thing: libConn - 40k, wpool2 - 10k!

2014-03-25:
	It seems my syscalls take longer than his.
	I really have no explanation. How the hell does this happen?
	Is the code between syscalls also being counted?
	Probably the waiting is counted too! Then it is correct.
	But what explanation do I have for shutdown?!
	It seems that after an 'ab' run, wpool2 does not stop eating CPU!

2014-03-25:
	Conclusion: I spend 87% of the time in epoll_wait! gwan only 6%!!!
	It seems EPOLLIN and EPOLLRDHUP arrive and I do nothing!
	I disable EPOLLRDHUP! Wow! 5600 req/s!
	But it seems I still have 95% in epoll_wait. 112724 calls vs. 10k!
	Still a lot! It seems I block somewhere and do not make progress from there!

	strace -c (-n20000 -c10):
	gwan:
	% time     seconds  usecs/call     calls    errors syscall
	------ ----------- ----------- --------- --------- ----------------
	 19.84    0.031245           1     60000           setsockopt
	 19.52    0.030749           2     20000           writev
	 18.32    0.028859           1     40000     20000 shutdown
	 16.58    0.026120           0     60000     20000 epoll_ctl
	  8.68    0.013667           1     20002           close
	  6.23    0.009811           0     20183       183 accept4
	  5.94    0.009357           1     10440           epoll_wait
	  4.88    0.007684           0     20042        20 read
	  0.00    0.000000           0         2           open
	  0.00    0.000000           0        12           stat
	  0.00    0.000000           0         2           fstat
	  0.00    0.000000           0         2           mmap
	  0.00    0.000000           0         1           mprotect
	  0.00    0.000000           0         2           munmap
	------ ----------- ----------- --------- --------- ----------------
	100.00    0.157492                250688     40203 total

	wpool2 (me):
	% time     seconds  usecs/call     calls    errors syscall
	------ ----------- ----------- --------- --------- ----------------
	 87.27    3.011111          46     65802           epoll_wait
	  3.17    0.109265           5     20000           sendto
	  2.37    0.081884           4     20146       145 accept4
	  2.28    0.078617           1     63925           recvfrom
	  1.90    0.065479           3     20005           epoll_ctl
	  1.75    0.060324           3     20000           shutdown
	  1.22    0.041954           2     20008           close
	  0.03    0.001000         500         2           socketpair
	  0.02    0.000535          20        27        19 open
	  0.00    0.000077          19         4           munmap
	  0.00    0.000038           5         7           read
	  0.00    0.000000           0         1           write
	  0.00    0.000000           0         5           fstat
	  0.00    0.000000           0        19           mmap
	  0.00    0.000000           0        12           mprotect
	  0.00    0.000000           0         4           brk
	  0.00    0.000000           0         2           rt_sigaction
	  0.00    0.000000           0         1           rt_sigprocmask
	  0.00    0.000000           0         1         1 access
	  0.00    0.000000           0         1           socket
	  0.00    0.000000           0         1           bind
	  0.00    0.000000           0         1           listen
	  0.00    0.000000           0         2           setsockopt
	  0.00    0.000000           0         2           clone
	  0.00    0.000000           0         1           execve
	  0.00    0.000000           0         1           getcwd
	  0.00    0.000000           0         1           getrlimit
	  0.00    0.000000           0         1           arch_prctl
	  0.00    0.000000           0         3         1 futex
	  0.00    0.000000           0         2           sched_setaffinity
	  0.00    0.000000           0         1           sched_getaffinity
	  0.00    0.000000           0         1           epoll_create
	  0.00    0.000000           0         1           set_tid_address
	  0.00    0.000000           0         3           set_robust_list
	  0.00    0.000000           0         3           epoll_create1
	------ ----------- ----------- --------- --------- ----------------
	100.00    3.450284                229996       166 total


2014-03-24
	Added cancel_disable.
	ab -n20000 -c10 http://localhost:60000/100.html: 4763 req/sec under perf
	WITHOUT PERF: 4374 req/sec WTF?!
	perf report:
	14.45%  wpool2  [kernel.kallsyms]   [k] ep_poll
	13.84%  wpool2  [kernel.kallsyms]   [k] set_normalized_timespec
	13.63%  wpool2  [vdso]              [.] 0x0000000000000cb0
	 6.24%  wpool2  [kernel.kallsyms]   [k] read_hpet
	 5.71%  wpool2  [kernel.kallsyms]   [k] select_estimate_accuracy
	 5.41%  wpool2  libConn.so.1.0.33   [.] Conn_wpool_worker_func
	I probably call gettimeofday too many times. Yes, it seems that
	0x0000000000000cb0 is gettimeofday.
	If I remove sched_yield, with perf record I get 4778 req/s
	[pid 1787] SYS_mmap(0, 0x8000000, 0, 0x4022)                    = 0x7f5bb264f000
	[pid 1787] SYS_munmap(0x7f5bb264f000, 26939392)                 = 0
	[pid 1787] SYS_munmap(0x7f5bb8000000, 40169472)                 = 0
	It seems an mmap is done and then immediately a munmap. WTF?!
	Afterwards it does not happen anymore.

	Me			gwan
	epoll_wait		epoll_wait
	accept4			accept4
	mmap			-
	munmap			-
	munmap			-
	mprotect		-
	-			setsockopt(NODELAY)
	epoll_ctl		epoll_ctl
	epoll_wait		epoll_wait
	recvfrom		read
	-			setsockopt(NODELAY)
	-			open
	-			fstat
	-			open,stat,fstat,mmap,read,close,munmap
	sendto			writev
	shutdown		shutdown
	-			setsockopt(NODELAY)
	epoll_wait		epoll_wait
	-			epoll_ctl(DEL)
	-			shutdown!
	-			epoll_ctl(DEL)!
	close			close
	Clearly I can do better than this!
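The perf profile above points at repeated gettimeofday calls. One common remedy is to read the clock once per event-loop iteration into a thread-local variable and have every handler in that iteration use the cached value. This is an assumption, not what Conn currently does. A per-thread copy also sidesteps sharing one global "now" among workers. The names below are hypothetical. On Linux, CLOCK_MONOTONIC_COARSE is a cheaper, lower-resolution alternative.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Hypothetical per-thread clock cache (C11 _Thread_local), not Conn_now. */
static _Thread_local struct timespec loop_now;

/* Call once per event-loop iteration, right after epoll_wait() returns. */
void loop_clock_update(void)
{
	clock_gettime(CLOCK_MONOTONIC, &loop_now);
}

/* Handlers read the cached value instead of calling the clock again. */
const struct timespec *loop_clock(void)
{
	return &loop_now;
}

/* Milliseconds between an earlier cached timestamp and the cached "now". */
long long loop_elapsed_ms(const struct timespec *since)
{
	long long ns = (long long) (loop_now.tv_sec - since->tv_sec) * 1000000000LL
		       + (loop_now.tv_nsec - since->tv_nsec);

	return ns / 1000000;
}
```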

2014-03-11
	Use pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL);
	Maybe this way cancellation will not appear in perf reports.
2013-12-12
Conclusions:
	- I spend much more time in epoll_wait. Very strange. It is 22 extra seconds!
	- I call accept4 40000 more times! Fuck!
	- How the hell do I call shutdown 50,000 times without errors, while he calls it 100,000 times and it takes less?!
	- It is incredible how he manages it. Unless my syscalls are interrupted too many times.
	- At least I make 3 times fewer setsockopt calls.
Next steps:
	Stop calling accept4 one extra time, because the notification will come anyway.
	I believe in EPOLLET less and less. What the fuck?!
	Rather than making an accept call on every thread, better to call epoll_wait.

wpool2:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 97.87   35.710386         187    191175           epoll_wait
  1.26    0.459349           3    141576     91576 accept4
  0.52    0.189052           4     50000           sendto
  0.22    0.080552           2     50000           shutdown
  0.06    0.023073           0     50000           close
  0.03    0.010337           0     50306           recvfrom
  0.03    0.009216           0     50000           epoll_ctl
  0.02    0.007091           0     50000           setsockopt
------ ----------- ----------- --------- --------- ----------------
100.00   36.489056                633057     91576 total

gwan:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 97.60   13.464343          89    150462           epoll_wait
  0.91    0.124850           2     50000           writev
  0.49    0.067336           1    100000     50000 shutdown
  0.44    0.060369           1    101351     51351 accept4
  0.25    0.034678           0    150000           setsockopt
  0.13    0.017909           0    150000     50000 epoll_ctl
  0.11    0.015268           0     50123         4 read
  0.07    0.009198           0     50002           close
------ ----------- ----------- --------- --------- ----------------
100.00   13.795050                802006    151355 total


2013-12-10: after -O3
				me
	prof: -c2+5000		8%
	-c2+50000		3719

2013-11-24: on r1 (after putting free structures in front of free list)
			me
	-c10+50000		5034
	-c2+50000	3777	4700
	-c1+50000	2973	3900

2013-11-21: on r1
			me	gwan
	-c1+50000	1703	3206
	-c2+50000	3619	4900
	-c10+50000	3647	5028

2013-11-20: on r1 (after doing allocations per thread):
			me
	branch 1+5000	7
	-c1+50000	
	-c2+50000	5093!broken?

2013-11-17: on r1 (after doing accept in all workers):
			me	me+Log
	branch 1+5000	6.4!
	-c1+50000	2180
	-c2+50000	3400
2013-11-16: on r1:
			me	me+Log		gwan
	branch 1+5000	9.5%
	-c1+50000:	3882			2094!
	-c2+50000:	4100			4927
2013-11-13: on r1: ~4270 req/s (-n50000 -c2 + taskset + nice) branch mispredict: 8% K:3.11.4-201 so(Conn)=448 gwan:-c1:~2000 me-c1:3875
2013-11-12: on r1: ~3850 req/s (-n50000 -c2 + taskset + nice) branch mispredict: 9% K:3.11.4-201 so(Conn)=480

== SHOWSTOPPERS ==
[ ] Call Conn_ws_free when freeing a Conn.
[ ] Make sure we compile with -O3
[ ] Should we call again accept or go to poll mode? I think we should go to poll.
[ ] Compile with -s to obtain profiling on assembly code.
[ ] We may get rid of NODELAY because we write and do shutdown. I hope
	this is triggering a flush. To test.
[ ] First, I should ignore O, because there is no way I can have anything in the buffer.
[ ] I need a mechanism, preferably lock-free, to send statistics to the master.
	Possibly only on request, to avoid useless traffic.
	But the statistics connection will arrive on a worker.
	I can probably signal through a pipe: copy the current statistics
	into a buffer, then send the pointer through the pipe. This is for updates.
	When a statistics request arrives, I must ask the master for them
	and then serve them.
[ ] Stop using callbacks for send/receive to speed up operations.
[ ] We should not call the initial out hook. We can just try to send at the first
	kick and react to EAGAIN. Very probably we can send.
[ ] We will probably have one structure for what the client sets
	(Conn_alloc/commit) and another for internal bookkeeping.
[ ] Move main pollfd to all threads. This way they will be "equal" and every
	core will be at full speed without migrations.
[ ] Check with gdb why we get a segmentation fault in line 2267.
[ ] Limit the number of accepts to not starve read/write.

== HIGH PRIORITY ==
[ ] Should we do Conn_now per thread? It is updated from all worker threads!
[ ] Verify likely/unlikely. I suspect they are not working correctly.
[ ] http://lwn.net/Articles/257209/
[ ] http://highscalability.com/blog/2013/5/13/the-secret-to-10-million-concurrent-connections-the-kernel-i.html
[ ] http://fasterdata.es.net/host-tuning/linux/
[ ] Investigate MSG_MORE when sending.
[ ] When init Conn, preallocate a 1 worker wp and set it and when user requests
	another wp, just put(wp) and set the new one? Or at commit time?
	Use cases: want to alloc 1 core for a listen port and for other
	many cores.
[ ] Split Conn_poll_cb into MASTER/NON_MASTER and do not make it callback
	but inline.
[ ] ->next pointer can be removed from struct Conn.
	This way I can save a lot of space.
[ ] Do not set the fd to -1; it is pointless.
[ ] Replace Conn_X with Conn_get_socket_X!
[ ] Use shutdown(2) before closing connection. Done, but see the link.
	http://www.developerweb.net/forum/archive/index.php/t-2940.html.
[ ] Switch all callback pointers to a single callback with parameters +
	a flag that says for which type of event the callback is called.
	What happens when I want to change one callback?
[ ] It does not seem that I close the connection: I do shutdown, but that is all.
[ ] Conn_free_intern is not called. Because of callbacks?
[ ] Try to alloc bigger chunks for wpool and maybe other stuff.
[ ] Alloc private area just after Conn structure. Add a function to set private
	size.
[ ] Set on master socket the needed in/out buffer sizes and inherit to accepted
	ones. This is because we may need different buffers for different masters.
[ ] If I got HUP, I must not allow further parsing!
[ ] We should remove the C we are freeing from the active list.
[ ] Align Conn structures to 8 bytes in allocations blocks.
[ ] Investigate the idea to put free buffers in front of the queue because
	they are hot.
[ ] 
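For the shutdown-before-close items (here and in the close scenario at the top of this file), a minimal sketch of a "lingering close" on a blocking fd. graceful_close() is hypothetical and not how Conn closes today. In a non-blocking event loop, the same steps would be split: shutdown, then wait for EOF via epoll, then close.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper, not Conn's code: a "lingering close".
 * shutdown(SHUT_WR) queues a FIN after the data already written, so the
 * response is still delivered. We then read and discard anything the peer
 * sends until it closes too: on Linux, close() on a socket with unread
 * input sends RST, and an RST can make the peer drop response data it has
 * not read yet. linger_secs (> 0) bounds each read() via SO_RCVTIMEO.
 * Returns close()'s result, or -1 if shutdown() failed.
 */
int graceful_close(int fd, int linger_secs)
{
	struct timeval tv = { .tv_sec = linger_secs, .tv_usec = 0 };
	char buf[4096];
	ssize_t n;

	if (shutdown(fd, SHUT_WR) == -1 && errno != ENOTCONN) {
		close(fd);
		return -1;
	}

	/* Bound the drain so a silent peer cannot keep us here forever. */
	(void) setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

	for (;;) {
		n = read(fd, buf, sizeof(buf));
		if (n > 0)
			continue;	/* discard late input */
		if (n == -1 && errno == EINTR)
			continue;
		break;	/* 0: peer closed; -1: timeout (EAGAIN) or error */
	}
	return close(fd);
}
```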


== LOW PRIORITY ==
[ ] Use enums for enum types.
[ ] Cache getaddrinfo responses
[ ] Investigate moving TCP stack in userspace.
[ ] Conn_join(C1, C2) (Bridge 2 connections together for proxy stuff.)
[ ] See http://highscalability.com/blog/2012/9/10/russ-10-ingredient-recipe-for-making-1-million-tps-on-5k-har.html
[ ] Dump all memory statistics
[ ] SCTP
[ ] .error_state -> error_type
[ ] if (.error_state...) -> if (.state == CONN_STATE_ERROR)
[ ] Audit CONN_STATE_EMPTY vs CONN_STATE_FREE
[ ] Add a function to set the maximum number of connections.
[ ] Fix the whole list scanning for expiration, band and closing.
[ ] Put callbacks in a structure to free some space from struct Conn.
[ ] wpool: When we free a Conn structure, we have to Conn_del_wp!
[ ] wpool: What if we add master sockets also to workers and do nothing in main
	thread? Check ma.c example. Verified: accept wakes up only one thread.
	Still to check if epoll wakes in all threads! Seems it wakes all threads!
	Not very good.
[ ] Investigate splice.
[ ] Investigate MSG_MORE as an alternative to CORK or writev.
[ ] Check if we are swapping and warn.
[ ] Log faults and io.
[ ] Add access control
	Conn_ac_set_default(C, CONN_AC_DENY) - default deny (or CONN_AC_ALLOW)
	Conn_ac_add(C, CONN_AC_ALLOW, "2001::1/64"); - for ipv6
	Conn_ac_add(C, CONN_AC_ALLOW, "192.168.0.0/25"); - for ipv4
[ ] A la redir stuff
[ ] Check PACKET: can we send with "send" without knowing the MAC?
[ ] UDP
[ ] What happens if we get close to the end of the buffer and I still cannot
	process the data? We should log and close the connection. It is the
	programmer's fault or a DoS.
[ ] Queue for delete/trytoconnect/etc.
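The access-control idea (Conn_ac_add with "2001::1/64" or "192.168.0.0/25") needs a prefix match at its core. Below is a minimal sketch of that check on textual addresses. cidr_match() is hypothetical. A real implementation would probably parse rules once and compare against the binary peer address. IPv4-mapped IPv6 addresses are treated as IPv6 here.

```c
#include <arpa/inet.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: does addr fall inside rule ("prefix/len")?
 * Works for IPv4 and IPv6. Returns 1 on match, 0 on no match (including
 * a family mismatch), -1 on a malformed rule or address.
 */
int cidr_match(const char *rule, const char *addr)
{
	unsigned char net[16], ip[16];
	char prefix[INET6_ADDRSTRLEN];
	const char *slash = strchr(rule, '/');
	size_t plen = slash ? (size_t) (slash - rule) : strlen(rule);
	int family, ip_family, maxbits, bits, full, rem;

	if (plen >= sizeof(prefix))
		return -1;
	memcpy(prefix, rule, plen);
	prefix[plen] = '\0';

	family = strchr(prefix, ':') ? AF_INET6 : AF_INET;
	maxbits = family == AF_INET6 ? 128 : 32;
	if (inet_pton(family, prefix, net) != 1)
		return -1;

	bits = maxbits;	/* no "/len" means a single host */
	if (slash) {
		char *end;
		long v = strtol(slash + 1, &end, 10);

		if (end == slash + 1 || *end != '\0' || v < 0 || v > maxbits)
			return -1;
		bits = (int) v;
	}

	ip_family = strchr(addr, ':') ? AF_INET6 : AF_INET;
	if (inet_pton(ip_family, addr, ip) != 1)
		return -1;
	if (ip_family != family)
		return 0;

	/* Compare whole bytes first, then the leftover high bits. */
	full = bits / 8;
	rem = bits % 8;
	if (memcmp(net, ip, (size_t) full) != 0)
		return 0;
	if (rem) {
		unsigned char mask = (unsigned char) (0xFF << (8 - rem));

		if ((net[full] & mask) != (ip[full] & mask))
			return 0;
	}
	return 1;
}
```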

Performance:
[ ] net.core.somaxconn
[ ]	Take care for /proc/net/netstat
[ ]	/proc/sys/net/ipv4/tcp_mem
	Now (512M): 49152 65536 98304
	Now (256M): 24576 32768 49152 - 55 conns/sec

	Test with: 80000 120000 240000 - 92 conns/sec
	Test with 160000 240000 480000 - 96 conns/sec

	After:
		echo "16000 64000 512000" > tcp_[rw]mem - 96

	After echo 1 > /proc/sys/net/ipv4/tcp_low_latency - 156 conns/sec

To reduce the number of connections in TIME-WAIT:
	echo 200 > /proc/sys/net/ipv4/tcp_max_tw_buckets

[ ] Add loadbalancing and failover in the base code.
[ ] Automatically put \0 at the end of received data. What for?!
[ ] Add the possibility to wait for an char/string before calling recv/data callback.
	Maybe do this with socket filtering or in kernel?
[ ] Change socket buffer accordingly with user settings to minimize
	needed memory.
[ ] Dump how much memory is in use for various parts of the internal data.
[ ] Do not mix slot and id and fd in examples.
[ ] Test suite
[ ] Free memory when the number of connections is going down.
[ ] Bandwidth part should have a separate pointer, to not load too much Conn structure.
[ ] Maybe we should have Bandwidth classes so we can group connections.
[ ] http://www.erlang-solutions.com/thesis/tcp_optimisation/tcp_optimisation.html
[ ] 
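For "wait for a char/string before calling the recv/data callback", a minimal user-space sketch rather than a socket filter. It scans the bytes received so far for a delimiter such as "\r\n\r\n" (end of HTTP headers). frame_until() is hypothetical and uses the GNU memmem() extension.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper: look for delim in the first len bytes of buf, whose
 * capacity is cap. Returns the length of the complete message including
 * the delimiter, 0 if more data is needed, or -1 if the buffer is full
 * without a delimiter (the "end of the buffer" case: log and close).
 */
ssize_t frame_until(const char *buf, size_t len, size_t cap, const char *delim)
{
	size_t dlen = strlen(delim);
	const char *p = memmem(buf, len, delim, dlen);

	if (p != NULL)
		return (ssize_t) (p - buf + dlen);
	if (len >= cap)
		return -1;
	return 0;
}
```

When it returns n > 0, the data callback would get the first n bytes. The remainder (for example a pipelined request) stays in the buffer for the next call.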

=== When we switch to Conn version 2 library ===
[ ] Conn_socket will call Conn_socket_proto
[ ] use enums!
[ ] http://urbanairship.com/blog/2010/09/29/linux-kernel-tuning-for-c500k/
[ ]

