Self-supervised monocular depth estimation is a key task for low-cost and efficient 3D scene perception in endoscopy. However, the variety of illumination conditions and scene features remains the primary challenge for depth estimation across diverse endoscopic scenes. In this work, a self-supervised framework is proposed for monocular depth estimation in diverse endoscopic scenes. First, to handle the diverse features of endoscopic scenes containing different tissues, a novel block-wise mixture of dynamic low-rank experts is proposed to efficiently fine-tune a foundation model for endoscopic depth estimation. In the proposed module, low-rank experts, each with a small number of trainable parameters, are adaptively selected for weighted inference based on the input features, and the number of experts is allocated to each block according to that block's generalization capability. Moreover, a novel self-supervised training framework is proposed to jointly cope with brightness inconsistency and reflectance interference. The proposed method outperforms state-of-the-art works on the SCARED and SimCol datasets. Furthermore, the proposed network also achieves the best generalization in zero-shot depth estimation on the C3VD, Hamlyn, and SERV-CT datasets. The strong performance of our model is further demonstrated through sim-to-real tests, 3D reconstruction, and ego-motion estimation. The proposed method could contribute to accurate endoscopy for minimally invasive measurement and surgery. The code will be released upon acceptance.
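To make the block-wise mixture of dynamic low-rank experts concrete, the sketch below shows one plausible PyTorch realization: a frozen linear layer from the foundation model is augmented with LoRA-style low-rank experts, and an input-conditioned router blends the top-k experts. The class name `MoDLoRALinear`, the mean-pooled router, the top-k softmax weighting, and all hyperparameters are illustrative assumptions rather than the paper's exact design; in particular, the per-block allocation of expert counts by generalization is not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoDLoRALinear(nn.Module):
    """Hypothetical sketch of a mixture of dynamic low-rank experts:
    a frozen base linear layer plus LoRA-style expert updates, with an
    input-dependent router selecting and weighting the top-k experts."""

    def __init__(self, base: nn.Linear, num_experts: int = 4,
                 rank: int = 4, top_k: int = 2):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # foundation model weights stay frozen
        d_in, d_out = base.in_features, base.out_features
        # Low-rank expert factors: expert e contributes B_e @ A_e.
        self.A = nn.Parameter(torch.randn(num_experts, rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(num_experts, d_out, rank))
        self.router = nn.Linear(d_in, num_experts)  # input-conditioned gating
        self.top_k = top_k

    def forward(self, x):  # x: (batch, tokens, d_in)
        out = self.base(x)
        logits = self.router(x.mean(dim=1))             # (batch, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)  # pick top-k experts
        weights = F.softmax(weights, dim=-1)
        for k in range(self.top_k):
            A = self.A[idx[:, k]]                       # (batch, rank, d_in)
            B = self.B[idx[:, k]]                       # (batch, d_out, rank)
            delta = torch.einsum('btd,brd->btr', x, A)  # project to rank r
            delta = torch.einsum('btr,bor->bto', delta, B)
            out = out + weights[:, k, None, None] * delta
        return out
```

Under these assumptions, only the expert factors and the router are trained, so the number of trainable parameters stays small relative to the frozen base layer; the actual method would additionally vary the expert count per block, whereas this sketch uses a fixed `num_experts` for brevity.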