The Curse of String Distance
When you work for sometime on a larger project, you may realize it doesn't perform as well as it did. That is usually expected but sometimes, it just doesn't feel justified.
This was also our case and because the performance of the project was important, we had to get our hands dirty with profilers and debuggers. Of course, we found a few bottlenecks.
The cursed method
Pobody's nerfect, but I don't want to write today about them. I want to write about a cursed method, which was with us right from the start. Which was becoming more and more demanding as we were adding more and more classes.
The curse started from an innocent need. Let's have short field names in Mongo to have organized layout. Let's have self explanatory bean names in application. Spring very helpfully provides the annotation @Field, which can be used to define different database field name than bean field name. And we used it generously. Just a quick search shows about 200 annotation occurrences.
We found out from profiling, the application is spending unhealthy amount of time in method org.springframework.beans.PropertyMatches.calculateStringDistance.
Every field, which had different name than Mongo field, calculated the distance against every other bean field.
To make it worse, the application is processing entities from the database and we had to load about 100.000.000 of them. As you can imagine, the number of calculateStringDistance calls was pretty high. At the end of the day, we implemented a cache in PropertyMatches, but now we have to maintain our custom build of spring-beans. Not something we want to do for the whole life span of project.
The more we looked at the issue, the more we believed the call is not necessary. The distance was calculated for exception message of PropertyReferenceException from Spring Data MongoDB project. I would argue that a preparation of Exception message should be as light as possible. Still, it can be justified if the message is helpful. This exception is caught in QueryMapper.getPath and method returns null, so we even cannot see the content of message, which drags down the performance.
However, the field is still mapped correctly probably (this is where we stopped debugging) using the name from annotation @Field. The existence of @Field itself suggests the bean name will most likely differ from Mongo field name. The PropertyReferenceException shouldn't be needed in those cases.
Is it something that should be investigated in Spring Data MongoDB? Or is there something we should do on our side to prevent this issue happening? There are not a lot of places in our code where to do things differently though. Our bean fields uses @Field annotation and we don't even have a custom bean converter. This is how we get the entity from database.
It's hard to believe we are the only ones facing this issue. If you have some observation or solution, feel free to use our comment section. We also opened the ticket DATAMONGO-1991 in Spring Jira and it would help if more people participate in the discussion. It could be hopefully resolved sooner with more feedback.
Author: Luděk Novotný