The reason is scientific.
When we speak or sing, we hear our own voice in two ways: through the air (externally) and through the bones of the skull (internally). When we listen to a recording, on the other hand, the sound reaches our ears only through the air, which changes how we perceive it. Bone conduction emphasizes the lower frequencies, so the voice we hear in real time generally sounds deeper, with a darker timbre that we find more pleasant.
It is like seeing ourselves in a mirror versus in a photograph: we generally prefer the mirror.
Photography is merciless.