Design issue instantiateEntity bug or feature?

xfrocks

Well-known member
Affected version
2.0.0
I stumbled upon this piece of code in \XF\Mvc\Entity\Manager:

PHP:
		/** @var Entity $entity */
		$entity = new $className($this, $structure, $values, $relations);
		if ($values)
		{
			...

			$primary = $this->getEntityCacheLookupString($keys);
			if (isset($this->entities[$class][$primary]))
			{
				$entity = $this->entities[$class][$primary];
				// TODO: how to handle relationships, at least if the existing entity has pending changes?
			}
			else
			{
				$this->entities[$class][$primary] = $entity;
			}
		}

Not quite sure about the business logic here but this is causing additional database queries for relations. Sample code 1:

PHP:
$singleTest1 = $app->em()->find('XF:Thread', 1);
$singleTest2 = $app->em()->find('XF:Thread', 1, ['Forum']);

One would expect $singleTest2 to include the Forum data but it doesn't. Checking the log shows that $singleTest2 did not trigger a new query so I guess that's fair. We are just doing some caching here and $singleTest2.Forum will be fetched on demand anyway.

Sample code 2:

PHP:
$multipleTest1 = $app->em()->findByIds('XF:Thread', [1, 2]);
$multipleTest2 = $app->em()->findByIds('XF:Thread', [1, 2], ['Forum']);

Again, items in $multipleTest2 don't have the Forum data but the system did execute 2 queries for these two lines of code. I have also checked and can confirm that the _uniqueEntityId of items in $multipleTest2 match those in $multipleTest1. Apparently getEntityCacheLookupString is buggy?
 
The code you've quoted mostly isn't relevant in your first example, as you're using find() which will always pull from the cache if possible (regardless of requested joins), so the code in question doesn't run for the second query. That is, for the most part, expected.

For the second example, the "todo" comment in the quoted code is relevant and explains what's happening here. Though in that case, because it's a primary join, when you request Forum, it will be pulled from the cache.

FindByIds() doesn't use the local cache at all, but only one instance of an Entity will ever exist in the system at a time (so they will refer to the same object).

So while there may be some changes here in the future, there aren't any changes planned here in the short term. It hasn't really come up with any significance in any part of the core. It doesn't cause incorrect data to be available, though it can cause a few unnecessary queries in specific configurations.
 
Thanks Mike for the response. For the second sample: since we have already queried the data and finished creating a new entity object, what is the benefit of the cache lookup? If we have reached this point, returning $entity immediately should be enough?
 
Reduction of memory usage is fairly significant (1000 threads in 1 forum will share 1 forum object), but it's also an important design choice that there is always only one value for an entity in a request (essentially, one source of truth). Otherwise, values will represent an essentially arbitrary snapshot in time rather than what is (or is about to be) in the DB. This is really only significant when doing writes.
 
That explains a lot. Do you have plans to address the Relation issues as noted by the comment in the code? It's quite uncomfortable to specifically query with some relations and find out they are not there because the entity has been initialized at some point in the past. Adding everything into defaultWith seems like a bad idea...
 
There are no immediate plans to make changes. I can't think of a particular situation in the core where this was a significant issue.
 
Top Bottom