Ok, lots of good questions here, I'll try and respond to them...
> > On Sun, Mar 09, 2003 at 07:47:02PM +0100, Arnaud Ebalard wrote:
> > > Hello everybody, here is the ENSEIRB group (Sébastien, Florian,
> > > Jérémy, Denis, Mickaël, Antoine, Guillaume and me).
> > > We are trying to understand in detail how the ICECAST server works.
> > > Reading the sources is quite difficult because of the numerous parts
> > > that come with no comments. We have some questions that arose while
> > > reading the sources. There are probably many stupid things or
> > > misunderstandings in our questions. Excuse us for those. We are waiting
> > > for answers before beginning to work on the sources. Let us know your
> > > ideas about the project, the directions you want to follow and the
> > > parts you are working on.
> > >
> > > ----------------------------------------
> > > THE QUESTIONS :
> > >
> > > In file connection.c, the pair of functions _add_connection and
> > > _get_connection work on the list structure _queue: insertion of a
> > > connection by means of _add_connection is done in a single operation,
> > > while getting a connection by means of _get_connection is done by
> > > scanning the whole list to find the oldest connection in the list. This
> > > seems to be hugely inefficient and quite blocking for other threads
> > > that are trying to access the list simultaneously. A circular
> > > singly-linked list structure could be more efficient: insertion of an
> > > element and access to the last element could both be done in a single
> > > operation.
This is true - the current implementation here is not particularly efficient.
This could be solved very simply by just keeping a second pointer to the end
of the queue, or in any number of other ways.

I'd be happy to apply a patch to implement this, but I'm not going to go to
the effort of doing it myself - this is NOT something where efficiency
matters at all - the queue will most often be of length zero or one. Longer
queues are possible but unlikely to happen frequently.
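
For what it's worth, a minimal sketch of the "second pointer" idea might look
something like this. The names and signatures are simplified for illustration
(they're not what connection.c uses), error handling is omitted, and locking
is left to the caller as it is now:

#include <stdlib.h>

typedef struct con_queue_tag {
    void *con;                      /* the queued connection */
    struct con_queue_tag *next;
} con_queue_t;

static con_queue_t *queue_head = NULL;
static con_queue_t *queue_tail = NULL;   /* the extra tail pointer */

/* O(1) insertion at the tail of the queue */
static void queue_add(void *con)
{
    con_queue_t *node = malloc(sizeof(con_queue_t));
    node->con = con;
    node->next = NULL;
    if (queue_tail)
        queue_tail->next = node;
    else
        queue_head = node;
    queue_tail = node;
}

/* O(1) removal of the oldest connection from the head - no scanning */
static void *queue_get(void)
{
    con_queue_t *node = queue_head;
    void *con;

    if (!node)
        return NULL;
    con = node->con;
    queue_head = node->next;
    if (!queue_head)
        queue_tail = NULL;
    free(node);
    return con;
}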
> > >
> > > ----------------------------------------
> > >
> > > We saw that the stats module works with an avl in order to keep the
> > > global parameters ( _stats.global_tree ) and the "per source"
> > > parameters ( _stats.source_tree ): we don't understand the use of an
> > > avl for the global parameters. Can't we directly use fields in the
> > > _stats_t structure instead of an avl? It would be more efficient than
> > > searching the avl for a key and the associated value. The values are
> > > saved as strings in the avls, so we keep converting strings to
> > > integers, incrementing them, and then reconverting them to strings,
> > > as in the function stats_event_inc. Can this be improved, or is there
> > > anything we didn't understand or see?
The stats_t structure doesn't have any fields (apart from pointers to the avl
trees). The design is like this to allow any sort of stats to be added to the
tree without having to have them registered in a central place (in stats.h).
This is a debatable decision, and the design does have some performance
implications (like the string->integer->string conversions), but I don't
think they're serious.

The stats code generally needs a moderate amount of attention - it's not used
too consistently, among other things. Designing better ways to do this would
be worth considering.
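
To make the cost you're describing concrete, the increment path has to do
roughly the following on every call (this is a paraphrase for illustration,
not the actual stats.c code):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Increment a stat whose value is stored as a heap-allocated string. */
static void stat_value_increment(char **stored_value)
{
    char buf[32];
    long v = atol(*stored_value);           /* string -> integer */

    v += 1;                                 /* the actual work */
    snprintf(buf, sizeof(buf), "%ld", v);   /* integer -> string */
    free(*stored_value);
    *stored_value = strdup(buf);            /* put the string back in the node */
}

As I said above, this is wasteful but cheap compared to everything else the
server does per event.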
> > >
> > > ----------------------------------------
> > >
> > > In _handle_connection(), after stats_event_inc(NULL, "connections"); we
> > > make sock_set_blocking(con->sock, SOCK_BLOCK); and then use
> > > util_read_header(con->sock, header, 4096) in order to read the header
> > > of the request sent by the client. Why put the socket in SOCK_BLOCK
> > > mode?
The header-reading code is not prepared to deal with non-blocking I/O, so it
has to be in blocking mode.

In the current design (this could be reconsidered and perhaps changed), all
connections are handled initially in blocking mode, then when handoff to a
source happens (for client connections, and/or for file-serving connections,
but not for any other sort), they're flipped back to non-blocking mode. This
design decision was made as a compromise between scalability/efficiency and
simplicity of implementation. The vast majority of client connection time is
generally spent in non-blocking mode being handled by a single source thread,
so this is not unreasonable.
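
For reference, flipping a socket between the two modes is just a flag change;
on POSIX systems a sock_set_blocking()-style helper boils down to something
like this (a sketch of the usual fcntl() approach, not the actual sock.c
code):

#include <fcntl.h>

static int set_blocking(int sock, int block)
{
    int flags = fcntl(sock, F_GETFL, 0);

    if (flags < 0)
        return -1;
    if (block)
        flags &= ~O_NONBLOCK;   /* blocking: clear the non-blocking flag */
    else
        flags |= O_NONBLOCK;    /* non-blocking: set it */
    return fcntl(sock, F_SETFL, flags);
}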
> > > Why receive data from the socket byte by byte? We know there can be
> > > something behind the header that we would read from the socket if we
> > > received more than one byte at once: can't we get the complete request
> > > of the client (header + data), and then parse it to find the '\n\n'
> > > (end of the header) and strip the '\r' characters?
This is a good suggestion, but it would require a generic buffering layer in
the socket I/O code. This hasn't been done, but certainly could be. Again,
this is a minor inefficiency where the additional CPU load fades into
insignificance against the bulk of the work done. Optimising things with no
visible CPU impact is lower-priority (at least for the active developers)
than bugfixes or adding necessary features.
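
If someone wants to attempt it, such a buffering layer would look roughly
like the sketch below: read in chunks, keep what we've read, and stop once
the end-of-header marker is in the buffer, leaving any extra bytes buffered
for whoever reads the body. All the names here are invented - nothing like
this exists in the current socket code:

#include <string.h>
#include <unistd.h>

typedef struct {
    char data[4096];
    size_t len;                 /* bytes currently buffered; start at 0 */
} read_buffer_t;

/* Fill buf until it contains a complete header. Returns the offset just
 * past the header terminator, or -1 on error / overly large header. */
static int buffer_read_header(int sock, read_buffer_t *buf)
{
    char *end;
    ssize_t n;

    for (;;) {
        buf->data[buf->len] = '\0';
        /* accept either a full CRLF CRLF or a bare LF LF terminator */
        if ((end = strstr(buf->data, "\r\n\r\n")) != NULL)
            return (int)(end - buf->data) + 4;
        if ((end = strstr(buf->data, "\n\n")) != NULL)
            return (int)(end - buf->data) + 2;

        if (buf->len >= sizeof(buf->data) - 1)
            return -1;          /* header too large for the buffer */
        n = read(sock, buf->data + buf->len,
                 sizeof(buf->data) - 1 - buf->len);
        if (n <= 0)
            return -1;          /* connection closed or read error */
        buf->len += (size_t)n;
    }
}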
> > >
> > > ----------------------------------------
> > >
> > > In the _handle_connection function in file connection.c, the buffer
> > > has a size of 4096 bytes. Is there any particular reason for this? Why
> > > not use a constant macro?
This constant is fairly arbitrary. It could be a named (#defined) constant;
there's no good reason for it not being so.
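
Something along these lines would do (the macro name is just a suggestion, it
doesn't exist in connection.c today):

#define HEADER_READ_BUFSIZE 4096

/* then in _handle_connection(): */
char header[HEADER_READ_BUFSIZE];
util_read_header(con->sock, header, HEADER_READ_BUFSIZE);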
> > >
> > > ----------------------------------------
> > >
> > > In the _slave_thread function, there is:
> > >
> > >     ...
> > >     while (_initialized)
> > >     {
> > >         if (max_interval > ++interval)
> > >         {
> > >             thread_sleep(1000000);
> > >             continue;
> > >         }
> > >         else
> > >     ...
> > >
> > > Can't we do something like:
> > >
> > >     ...
> > >     while (_initialized)
> > >     {
> > >         if (max_interval > interval)
> > >         {
> > >             thread_sleep(1000000 * (max_interval - interval));
> > >         }
> > >         else
> > >     ...
No. The sleep is kept short so that if _initialized goes to zero (as on a
shutdown request), the loop exits reasonably quickly. The ideal
implementation uses signals to break out of the sleep system call rather than
relying on this, but relying on correct behaviour in a program that mixes
threads and signals tends to cause significant portability issues (whilst
it's clearly specified, non-compliant implementations are very common).
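
The other portable way to get both a long interval and a quick shutdown is a
condition variable with a timed wait, so the shutdown path can wake the
thread immediately. This is only a sketch with invented names - the icecast
thread wrapper doesn't currently expose anything like a timed wait:

#include <errno.h>
#include <pthread.h>
#include <time.h>

static pthread_mutex_t slave_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  slave_cond = PTHREAD_COND_INITIALIZER;
static int slave_running = 1;

/* Sleep up to 'seconds', returning early if a shutdown is requested. */
static void slave_wait(int seconds)
{
    struct timespec ts;

    ts.tv_sec = time(NULL) + seconds;
    ts.tv_nsec = 0;

    pthread_mutex_lock(&slave_lock);
    while (slave_running) {
        if (pthread_cond_timedwait(&slave_cond, &slave_lock, &ts) == ETIMEDOUT)
            break;              /* full interval elapsed */
    }
    pthread_mutex_unlock(&slave_lock);
}

/* Called from the shutdown path instead of just clearing the flag. */
static void slave_shutdown(void)
{
    pthread_mutex_lock(&slave_lock);
    slave_running = 0;
    pthread_cond_signal(&slave_cond);
    pthread_mutex_unlock(&slave_lock);
}

That said, the 1-second polling loop is simple and portable, which is exactly
why it's there.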
> > >
> > > ----------------------------------------
> > >
> > > In the fserv_thread_function() function in the fserve.c file, can
> > > avl_tree_unlock(client_tree); be done just after
> > > client_node = avl_get_first(client_tree); instead of being done at the
> > > end or in the if (!client_node) { ... } block? Are we wrong? If yes,
> > > why?
Looks like it could be. I wrote that code at about 2am one night; it's far
from clear. They're more or less equivalent, though - there can be no
contention for the client lock there, so it doesn't matter. (Come to think of
it, I'm not entirely sure why I'm bothering to use locks on the client tree
there at all - they should only be needed on the pending tree. The same is
probably also true of the client_tree for each of the sources - actually, no,
the admin stuff needs to be able to grab read locks here.)

This is a good point, I'll take a good look at it in detail when I have a
little more time, and when I'm more awake.
> > >
> > > ----------------------------------------
> > >
> > > In the fserv_thread_function() function in the fserve.c file, can
> > > something go wrong if we are between avl_tree_unlock(pending_tree);
> > > and avl_tree_unlock(client_tree); in:
> > >
> > >     ...
> > >     avl_tree_unlock(pending_tree);
> > >     avl_tree_unlock(client_tree);
> > >     thread_cond_wait(&fserv_cond);
> > >     continue;
> > >     ...
> > >
> > > if another thread adds a client to the pending tree before this thread
> > > reaches the thread_cond_wait(&fserv_cond); line? Could a client be
> > > trapped and not handled because these 3 lines are not an atomic action?
Yes. Good catch. Obviously a very unlikely case, but this should be fixed. The
client won't be permanently trapped even in this case - only trapped until
the next file-serving client is added.

This requires a fair bit of work in the avl and thread modules, I think.
Unless... it might be ok to just throw away the rwlocks here (there aren't
multiple readers) and switch to using mutexes. Still requires some work in
the thread module.
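
The mutex version would let us use the standard pattern where the waiter
re-checks a "work pending" condition while holding the same lock the producer
takes, which closes exactly this window. A sketch with invented names (this
is not the current icecast thread API):

#include <pthread.h>

static pthread_mutex_t pending_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  pending_cond = PTHREAD_COND_INITIALIZER;
static int pending_clients = 0;     /* stands in for "pending tree non-empty" */

/* fserve thread: only sleeps while there really is nothing pending. */
static void fserve_wait_for_clients(void)
{
    pthread_mutex_lock(&pending_lock);
    while (pending_clients == 0)
        pthread_cond_wait(&pending_cond, &pending_lock);
    pthread_mutex_unlock(&pending_lock);
}

/* connection thread: adds a client and wakes the fserve thread. */
static void fserve_add_pending_client(void)
{
    pthread_mutex_lock(&pending_lock);
    pending_clients++;              /* really: insert into the pending tree */
    pthread_cond_signal(&pending_cond);
    pthread_mutex_unlock(&pending_lock);
}

A client added between the unlock and the wait can no longer be missed,
because the wait only happens while the count is seen to be zero under the
lock.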
> > >
> > > ----------------------------------------
> > >
> > > When using the select() function, we always calculate the biggest
> > > value for a socket descriptor in order to pass it as the first
> > > argument. Why don't we use FD_SETSIZE? Is that more efficient, or is
> > > there another reason to keep calculating the biggest socket descriptor?
This is more efficient, and is the "correct" way to use select(). poll() is
used in preference to select() anyway. Some efficiency work would be nice
with both of these, though, so that the poll array or select bitset is kept
around rather than being recreated every time.
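
The reason it's more efficient: the first argument tells the kernel how many
descriptor slots to examine, so passing the highest descriptor plus one keeps
the scan proportional to the sockets actually being watched, whereas
FD_SETSIZE (commonly 1024) would make every call scan the whole set. A
minimal illustration, not code from icecast:

#include <sys/select.h>

static int wait_readable(const int *socks, int count)
{
    fd_set rfds;
    int i, maxfd = -1;

    FD_ZERO(&rfds);
    for (i = 0; i < count; i++) {
        FD_SET(socks[i], &rfds);
        if (socks[i] > maxfd)
            maxfd = socks[i];       /* track the highest descriptor */
    }
    /* nfds is maxfd + 1, not FD_SETSIZE */
    return select(maxfd + 1, &rfds, NULL, NULL, NULL);
}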
> > >
> > > ----------------------------------------
> > >
> > > This is a big question for us: we saw the use of an avl in order to
> > > store all the clients (in client_tree). The thing we also noticed is
> > > that all the accesses we saw were made using avl_get_first() and
> > > avl_get_next(). We didn't see any other access made by using a key or,
> > > for example, the connection id. Wouldn't it be better to use a simple
> > > circularly linked list? The accesses and insertions would be faster
> > > (constant!). Are we wrong? If yes, why?
As things stand currently, the use of avl trees here is unnecessary. However,
there will be functionality fairly soon (for admin usage) where lookup by key
or connection id will be important.
> > >
> > > ----------------------------------------
> > >
> > > In sock_write_bytes(), there is a "sanity check" which can return
> > > SOCK_ERROR. If this happens, there can be a problem in the calling
> > > function, because that function will then examine the kind of failure
> > > using sock_recoverable(sock_error()). And sock_error() will return
> > > something wrong, because errno and WSAGetLastError() are not affected
> > > by the return of the "sanity check".
Not sure what you mean here - sock_error() correctly returns errno, which is
what sock_recoverable() is designed to use. The return value is either
positive (normal case) or SOCK_ERROR indicating that some error occurred.
This is perfectly self-consistent. Can you explain in more detail why you
think this is wrong?
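
For context, "recoverable" just means "try the operation again later"; the
check amounts to something like this (a paraphrase for illustration, not the
actual sock.c source):

#include <errno.h>

static int error_is_recoverable(int error)
{
    /* transient conditions: the write can simply be retried */
    return (error == EAGAIN || error == EWOULDBLOCK ||
            error == EINTR  || error == EINPROGRESS);
}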
> > >
> > > ----------------------------------------
> > >
> > > In fserv_thread_function, there is a first scan of the client tree
> > > (with avl_get_first(client_tree)) in order to serve the clients. At the
> > > same time, the value of client->con->error is set to 1 if the client
> > > was completely served (end of file). Then, there is another complete
> > > scan of the client_tree in order to delete all the nodes for which the
> > > client has client->con->error set to 1. This is time consuming. If we
> > > used a list (as we said before) to store the clients, we could
> > > efficiently do the serving of the clients and, at the same time, delete
> > > the completely served clients.
Yes. The fileserving code was written late at night; it's not exactly the most
efficient stuff I've ever written. However, again, the tree lookups are more
efficient when we need to do other stuff (like the admin interface saying
"remove this client") - though that's not complete yet, the underlying system
is designed to deal with this case efficiently.
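
For comparison, the single-pass serve-and-remove you're describing would look
roughly like this on a plain list (types and helper functions are invented
for illustration - this is not how fserve.c is structured):

typedef struct fclient_tag {
    struct fclient_tag *next;
    int error;                       /* set once the client is fully served */
} fclient_t;

/* Assumed helpers for the sketch: send the next chunk of the file (setting
 * error when done or on failure), and free a finished client. */
extern void serve_one(fclient_t *client);
extern void client_destroy(fclient_t *client);

static void serve_and_prune(fclient_t **list)
{
    fclient_t **link = list;

    while (*link) {
        fclient_t *client = *link;

        serve_one(client);
        if (client->error) {
            *link = client->next;    /* unlink and destroy in the same pass */
            client_destroy(client);
        } else {
            link = &client->next;
        }
    }
}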
> > >
> > > ----------------------------------------
> > >
> > > Can someone explain to us the strategy of source_find_mount, and
> > > particularly the aim of this line:
> > >
> > >     ...
> > >     node = global.source_tree->root->right;
> > >     ...
> > >
> > > Why not use source_compare_sources?
Not sure, I didn't write that bit. The code looks correct and reasonably
efficient, though.
> > >
> > > ----------------------------------------
> > >
> > > Where does the avl module come from? What are its performance
> > > characteristics? Its possibilities? Its limitations?
Maybe Jack can answer this one? I don't know much detail about the avl
implementation, other than how to use them.
> > >
> > > ----------------------------------------
> > >
> > > What can ICECAST do with *.m3u files?
Serve them (via the fileserving functionality), and autogenerate them for
files and sources.
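
For a source, the autogenerated playlist is essentially just the stream URL
on a line of its own; the hostname, port and mount name below are made up for
the example:

http://icecast.example.com:8000/stream.ogg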
> > >
> > > ----------------------------------------
> > >
> > > This is probably a quite stupid question, but I'd like to know why
> > > ICECAST isn't a multiprocessing application. The main problem is that
> > > global variables can be easily used in multithreaded applications, but
> > > with difficulties in multiprocessing applications if modules are quite
> > > bound together. In ICECAST, something could probably be done to make
> > > the fserve module and the source module run in different processes.
> > > This would be great on multiprocessor systems, separating these two
> > > services onto 2 processors. We could serve more clients at the same
> > > time (am I wrong?). Is the limitation induced by bandwidth on the
> > > server, and not by CPU?
Icecast is heavily multithreaded - the threads can and will run on different
CPUs if you have a multi-CPU machine. So yes, you're wrong.

Additionally, you'd be hard-pressed to find a system where you were cpu-bound
due to icecast, and not network-bound. icecast is quite capable (assuming
your network hardware is) of saturating even gigabit ethernet without the cpu
usage of icecast itself becoming excessively high (however, the cpu usage of
the kernel-level networking implementation will tend to get pretty high in
this case).
Mike