Actually this is using the Transfer Learning technique. This is one of the best methods to do object detection because if we train our own model it will not give this much simpler and accurate detection.
Suppose we are trying to do from scratch then We have to feed a large amount of data with lots of variety of training data. You have to define the model from Scratch which may or may not be efficient like the researcher has already done. If you have to do from very scratch you have to spend a very high amount of time with a good quality of resources.
I hope so this is useful. Let me know if you want further clarification.