Actually yes, that *is* it.
The tracks don't mix. They stay separate in the file, and most players play only Track 1 by default. Even players that can play other tracks play only one at a time. Multiple tracks are meant for alternate languages, audio descriptions for the visually impaired, stuff like that. Some people also use them in intermediate files, so they can mix the tracks later in post-production.
If you want multiple things to be heard at the same time, you need to mix them down onto the same track.
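
If you happen to have ffmpeg available, one way to do that mixdown is its `amix` filter. This is only a rough sketch, not the one true method: it assumes the file has a video stream and that the tracks you want are the first audio tracks in it. The helper name `build_mix_command` and the file names are made up for illustration. It only builds the command, so you can look at it before running anything.

```python
def build_mix_command(src, dst, n_tracks=2):
    """Build an ffmpeg argument list that mixes the first n_tracks
    audio tracks of src into a single audio track in dst.

    Hypothetical helper: assumes src has a video stream and at least
    n_tracks audio tracks.
    """
    # Reference each audio track of input 0: [0:a:0][0:a:1]...
    inputs = "".join(f"[0:a:{i}]" for i in range(n_tracks))
    # amix sums the inputs into one stream, labelled [mixed] here
    filter_graph = f"{inputs}amix=inputs={n_tracks}[mixed]"
    return [
        "ffmpeg", "-i", src,
        "-filter_complex", filter_graph,
        "-map", "0:v", "-c:v", "copy",  # keep the video as-is, no re-encode
        "-map", "[mixed]",              # the single mixed audio track
        dst,
    ]


if __name__ == "__main__":
    # Print the command for inspection instead of running it
    print(" ".join(build_mix_command("input.mkv", "output.mkv")))
```

To actually run it, pass the list to `subprocess.run(...)`. The output file ends up with one audio track containing everything, which is what a normal player will play.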