In principle it’s not terribly complicated (not to trivialize the work Marco has devoted to making it work well and sound natural):
When the amplitude of the audio stays below a certain threshold for a certain length of time, increase the playback speed until the amplitude rises back above the threshold. The longer the audio stays below the threshold, the greater the increase in playback speed. Shorter pauses are sped up less and longer pauses are sped up more, so the effect is a bit more natural and subtle.
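To make that a bit more concrete, here is a rough sketch of the idea in Python. It is only an illustration of the description above, not Marco's actual code: the function name, the per-frame amplitude list, and every parameter value (threshold, wait time, ramp rate, speed cap) are made up for the example.

```python
def smart_speed_factors(amplitudes, threshold=0.02, min_quiet_frames=3,
                        ramp_per_frame=0.25, max_speed=3.0):
    """Return one playback-speed factor per frame (1.0 = normal speed).

    All parameters are hypothetical values chosen for illustration.
    """
    speeds = []
    quiet_run = 0  # consecutive frames below the threshold so far
    for amp in amplitudes:
        if amp < threshold:
            quiet_run += 1
        else:
            quiet_run = 0  # audio is back above threshold: normal speed again
        # Only start speeding up once the pause has lasted long enough,
        # then ramp up the longer it continues, up to a cap.
        extra = max(0, quiet_run - min_quiet_frames)
        speeds.append(min(max_speed, 1.0 + ramp_per_frame * extra))
    return speeds


# A short pause never gets past the waiting period; a longer one ramps up.
frames = [0.5, 0.01, 0.01, 0.01, 0.01, 0.01, 0.5]
print(smart_speed_factors(frames))
```

Because the speed ramps with the length of the pause instead of jumping straight to a maximum, brief gaps between words are left alone and only the long ones get noticeably compressed.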
It would be a bit easier to just detect “silence” (defined as any audio below a level chosen by the developer) and not play those passages back, but that would usually produce a rather aggressively condensed result. Marco has been careful about making it effective without being really noticeable.
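For comparison, the simpler "just skip the silence" approach could look something like the sketch below. Again, this is only an illustration under assumed names and values (the function name, the per-frame amplitude list, and the threshold are all hypothetical).

```python
def drop_silence(amplitudes, threshold=0.02):
    """Return the indices of the frames to play, skipping every quiet frame.

    The threshold is a hypothetical developer-chosen level.
    """
    return [i for i, amp in enumerate(amplitudes) if amp >= threshold]


# Every below-threshold frame is removed, however short the pause was.
frames = [0.5, 0.01, 0.4, 0.01, 0.01, 0.01, 0.5]
print(drop_silence(frames))
```

Note that even the single quiet frame between two loud ones is cut, which is why this approach tends to sound so condensed: natural pauses between words and sentences disappear entirely.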