In order to live stream with any peer-to-peer technology, the system needs to capture its own video from its camera and receive at least one other video. To stream a composited video, all of these videos need to be combined into a single video.
Compositing: WebRTC delivers each video as a series of small compressed chunks, each covering less than a second. The incoming videos can have different resolutions and different frame rates, and they are not synchronized with one another. To join them into a single video, the resolutions need to be adjusted and the frames synchronized. If the CPU or bandwidth required is beyond what the system has, frames are lost. If the memory required is beyond what the system has, it falls back on slower external storage, which cannot keep up, and the system grinds to a halt. Compositing can be memory intensive, because a frame from each of the separate videos needs to be in memory while a composited frame is being constructed. The CPU needs to decompress each stream into its frames and then compress each combined frame to be streamed out over the network connection.
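To get a rough feel for the memory cost of holding one frame per video while a composite is built, the arithmetic can be sketched as below. This is only an illustration under stated assumptions: uncompressed frames at 4 bytes per pixel (RGBA), four 1280x720 inputs, and a 1920x1080 output. The names Resolution, raw_frame_bytes, and compositing_frame_bytes are hypothetical and do not come from any library.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Resolution:
    width: int
    height: int


def raw_frame_bytes(res: Resolution, bytes_per_pixel: int = 4) -> int:
    """Size of one uncompressed frame (4 bytes per pixel assumes RGBA)."""
    return res.width * res.height * bytes_per_pixel


def compositing_frame_bytes(inputs: Sequence[Resolution], output: Resolution,
                            bytes_per_pixel: int = 4) -> int:
    """Bytes held at once: one decoded frame per input plus the output frame."""
    held_inputs = sum(raw_frame_bytes(r, bytes_per_pixel) for r in inputs)
    return held_inputs + raw_frame_bytes(output, bytes_per_pixel)


if __name__ == "__main__":
    # Assumed scenario: four 720p inputs composited into one 1080p output.
    inputs = [Resolution(1280, 720)] * 4
    total = compositing_frame_bytes(inputs, Resolution(1920, 1080))
    print(f"{total / 1_000_000:.1f} MB")  # prints "23.0 MB"
```

This counts only the frames themselves. Real decoders and encoders keep additional reference frames and buffers, so actual usage will be higher.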
Most small computers can handle receiving one video while encoding and transmitting another with WebRTC in the browser. Scaling beyond that takes more CPU power, and bandwidth is limited for most peer systems. When bandwidth is not the limiting factor, CPU power and memory become the limiting factors, as noted by Nerd on the Street.
A tablet may be able to play four videos at once, but it does not have the capacity for the extra work of keeping the most current frame of each video in memory and then compositing those frames into a new frame. When it is only playing, the tablet displays a frame for each video and then moves on to the next frame. Host systems tend to run a lean setup: they do not keep a lot of information in memory; instead they send the information out and are ready for the next request. Host systems are normally scaled to deliver lots of data to lots of consumers, doing as little processing as possible to provide the content.
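The extra work described here, keeping the most current frame of each video and combining those frames into a new one, can be sketched in a few lines. This is a toy illustration, not how any particular conferencing system does it: frames are plain lists of integers standing in for pixels, scaling is nearest-neighbour, and the layout is a simple grid. The names LatestFrameCompositor, scale_nearest, push, and compose are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Frame = List[List[int]]  # rows of pixel values; a stand-in for real image data


def scale_nearest(frame: Frame, out_w: int, out_h: int) -> Frame:
    """Resize a frame to out_w x out_h with nearest-neighbour sampling."""
    in_h, in_w = len(frame), len(frame[0])
    return [
        [frame[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


class LatestFrameCompositor:
    """Keeps only the newest frame per stream and tiles them into one frame."""

    def __init__(self, out_w: int, out_h: int) -> None:
        self.out_w, self.out_h = out_w, out_h
        self.latest: Dict[str, Tuple[float, Frame]] = {}

    def push(self, stream_id: str, timestamp: float, frame: Frame) -> None:
        """Store a frame unless a newer one is already held for that stream."""
        held = self.latest.get(stream_id)
        if held is None or timestamp >= held[0]:
            self.latest[stream_id] = (timestamp, frame)

    def compose(self) -> Frame:
        """Build one output frame from whatever each stream delivered last."""
        out = [[0] * self.out_w for _ in range(self.out_h)]
        n = len(self.latest)
        if n == 0:
            return out
        cols = math.ceil(math.sqrt(n))
        rows = math.ceil(n / cols)
        tile_w, tile_h = self.out_w // cols, self.out_h // rows
        for i, stream_id in enumerate(sorted(self.latest)):
            # Adjust each stream's resolution to the tile size, then copy it in.
            tile = scale_nearest(self.latest[stream_id][1], tile_w, tile_h)
            x0, y0 = (i % cols) * tile_w, (i // cols) * tile_h
            for y, row in enumerate(tile):
                out[y0 + y][x0:x0 + tile_w] = row
        return out


if __name__ == "__main__":
    comp = LatestFrameCompositor(out_w=4, out_h=2)
    comp.push("a", 0.00, [[1]])             # 1x1 source
    comp.push("b", 0.02, [[2, 2], [2, 2]])  # 2x2 source, different resolution
    comp.push("a", 0.01, [[3]])             # newer frame replaces the old one
    print(comp.compose())  # [[3, 3, 2, 2], [3, 3, 2, 2]]
```

Even in this toy form, every output frame touches every input frame, which is the per-frame cost that plain playback avoids.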
Zoom and other video conferencing systems are configured to process the video either on the delivery system itself or on a second computer that feeds the delivery system. Having the processing ability on the host does not let the host deliver the content to more people.
Jibri handles the compositing with a virtual Chrome browser and then captures the result as a single composited stream. Note that jibri.conf allows Jitsi/Jibri to send the stream to another system that will deliver it at scale to a large number of viewers.
Both the compositing system and the delivery system can be defended against DDoS attacks with iptables.
Alternatively, a peer system with enough CPU power can use OBS to capture the composite and send it to a host. The takeaway is that it is a good idea to do the compositing on a system other than the one that delivers the video stream to the public, and that system could be a second backend system on a virtual host that is used only by the hour.