Novel View Synthesis (NVS) is an important task for 3D interpretation of remote sensing scenes, and it also benefits vicinagearth security by enhancing situational awareness. Recently, NVS methods based on Neural Radiance Fields (NeRFs) have attracted increasing attention for their self-supervised training and highly photo-realistic synthesis results. However, synthesizing novel views of remote sensing scenes remains challenging because of the complexity of land covers and the sparsity of the input multi-view images. In this paper, we propose a novel NVS method named FReSNeRF, which combines Image-Based Rendering (IBR) and NeRF to achieve high-quality results in remote sensing scenes with sparse input. We address the degradation problem by adopting a sampling space annealing method. Additionally, we introduce a depth smoothness constraint based on the segmentation mask to regularize the scene geometry. Experiments on multiple scenes show that the proposed FReSNeRF outperforms other methods.
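As an illustration only, the two regularizers named above could be sketched as follows. This is a minimal sketch under stated assumptions, not the FReSNeRF implementation: the function names, the linear annealing schedule, the `start_fraction` parameter, and the 4-neighbour squared-difference penalty are all hypothetical choices made for this example.

```python
def annealed_bounds(step, anneal_steps, near, far, start_fraction=0.5):
    """Return the (near, far) sampling interval to use at a training step.

    Assumed schedule: the interval starts as a fraction of the full range,
    centred on the midpoint, and grows linearly to the full [near, far].
    """
    if anneal_steps <= 0:
        raise ValueError("anneal_steps must be positive")
    # Growth factor, clamped to [start_fraction, 1].
    eta = min(max(step / anneal_steps, start_fraction), 1.0)
    mid = 0.5 * (near + far)
    return mid + (near - mid) * eta, mid + (far - mid) * eta


def masked_depth_smoothness(depth, labels):
    """Mean squared depth difference between adjacent pixels in one segment.

    depth and labels are equally sized 2D lists (rows of floats / ints).
    Pairs that straddle a segment boundary are skipped, so depth
    discontinuities between different land covers are not penalized.
    """
    height, width = len(depth), len(depth[0])
    total, count = 0.0, 0
    for r in range(height):
        for c in range(width):
            # Right and bottom neighbours cover every adjacent pair once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < height and cc < width and labels[r][c] == labels[rr][cc]:
                    total += (depth[r][c] - depth[rr][cc]) ** 2
                    count += 1
    return total / count if count else 0.0
```

In this sketch, `annealed_bounds` would be called once per training step to restrict where points are sampled along each ray, and `masked_depth_smoothness` would be evaluated on a rendered depth patch together with its segmentation labels and added to the training loss with a weight chosen by the user.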