Both 'scat singing' and 'mouth music' use words, though mostly meaningless ones, usually called 'vocables' for convenience. Birdsong has been shown to have local dialects, though whether the sounds involved are (from a bird's point of view) discrete words with assignable meanings is quite another matter. Perhaps the confusion here arises from the American tendency to call 'tunes' (music without words) 'songs', while in other parts of the world we tend to think of a song as invariably having words of some sort.
Levitin's thesis isn't new. Music and language are interdependent, and without the former there's a good chance we wouldn't have the latter. It sounds like a very interesting book, though.