Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent return of dangling Vector/QAngle to VScript #321

Open
wants to merge 6 commits into
base: develop
Choose a base branch
from

Conversation

z33ky
Copy link

@z33ky z33ky commented Sep 5, 2024

When a Vector or QAngle rvalue reference is returned from a VScript function, the constructed ScriptVariant_t must not store the pointer to it since it is, per convention, a temporary reference. Only do that for lvalue-references, but do a copy when constructing from or assigning a rvalue reference.


I'm not sure ScriptVairant_t really needs to try and prevent copies of Vector and QAngle. They're pretty small and trivial classes, so are cheap to copy. When a copy must be made it surely beats a dynamic allocation & deallocation.
Free() refers to FIELD_HSCRIPT, although it will never set SV_FREE. Perhaps this is just outdated?
Strings will need to be copied (strdup()ed) due to their dynamic length, but it could try to do a small-string optimization.

The types of new and delete are misaligned, since Free() only ever does delete m_pszString. Since HSCRIPT, Vector and QAngle are all trivial to destruct this shouldn't have any ill effects.
At most I'm worried about strdup() being paired with delete instead of free().

Anyway, I wanted to hear your thoughts on this before I change more stuff. This change alone sufficiently fixes issues I've observed in return-values playing a Mapbase-based mod that uses VScripts (https://www.moddb.com/mods/sourceworld).


Does this PR close any issues?

PR Checklist

  • My PR follows all guidelines in the CONTRIBUTING.md file
  • My PR targets a develop branch OR targets another branch with a specific goal in mind

When a Vector or QAngle rvalue reference is returned from a VScript
function, the constructed ScriptVariant_t must not store the pointer to
it since it is, per convention, a temporary reference. Only do that for
lvalue-references, but do a copy when constructing from or assigning a
rvalue reference.
@z33ky
Copy link
Author

z33ky commented Sep 5, 2024

Also, specifically addressing #104 (comment)

Strings are not leaked, doing this for them crashes, so the current non-functional retval.Free() is actually dangerous here.

I didn't see any crashes calling GetName() and similar methods on different entities. Though perhaps other code changed to allow this now, or Linux/GCC is less prone to crashes here.

@samisalreadytaken
Copy link

samisalreadytaken commented Sep 6, 2024

That comment was poorly worded. It was more about being explicit that the only effect of Free() was on Vectors and that it doesn't cover HSCRIPT. Strings were never used after free because they don't set SV_FREE. Though superficially looking at it now, I don't see HSCRIPT leaks either; perhaps I was confusing it with another issue like (#166). I honestly don't remember, the original issue should've included more detail. Also that issue was opened when there was no return value freeing. If it is confirmed that HSCRIPT aren't leaked, then it can be closed as being fixed 3 years ago.

HSCRIPT variants never setting SV_FREE is the cause of #123 and possibly other differences to Valve's implementations of multiple languages.

strdup/delete mismatch is benign in Source because they all call the same global allocator functions. Is the strdup code path even ever hit at the moment? In a project wide grep, I don't see any ScriptVariant_t constructors with bCopy = true parameter except on ScriptRegisterConstantFromTemp, which only registers Vectors. It could be fixed though.

I'm curious what were the problematic functions that required this. Functions that return vec3_origin/vec3_invalid or members from CScriptNetPropManager::GetPropVectorArray now unnecessarily reallocate them with this change. What could be done to prevent this?

@z33ky
Copy link
Author

z33ky commented Sep 6, 2024

If it is confirmed that HSCRIPT aren't leaked, then it can be closed as being fixed 3 years ago.

I don't know about 'HSCRIPT, I didn't follow how these objects are managed and it looks more complex than VectorandQAngle(andchar*`).

Is the strdup() code path even ever hit at the moment?

I didn't even check that.
Though it doesn't seem to matter, especially if when it would happen, delete would still work fine.
Since you seem to be leaning at least slightly in the direction of "fixing" the pairing, I'll just insert a switch to handle the variants correctly. QAngle will still be deleted as Vector though, since it doesn't have it own field-type, but yeah, trivial destructor and all that.

I'm curious what were the problematic functions that required this.

Specifically I noticed it on CBasePlayer::EyeVectors(), but many, if not all, returned absurd values. Honestly I'm surprised this didn't cause issues on Windows, the stack-space the return-value pointer referred to probably gets reused for(/overwritten with) other stuff pretty quickly.

Functions that return vec3_origin/vec3_invalid or members from CScriptNetPropManager::GetPropVectorArray now unnecessarily reallocate them with this change. What could be done to prevent this?

That still uses the (const Vector &val, bool bCopy = false) constructor, since it's returning a const lvalue reference. So it will not be copied.
But as I pondered above, I'm not sure how much memory and performance this really saves in order to try and avoid copying and storing 8 additional bytes. All functions returning a Vector or QAngle by value will incur the cost of dynamic memory allocation and deallocation, though perhaps these are less common/used? I dunno. I'm fine keeping at as I currently have it in this PR, which fixes memory corruptions.
I suppose you could try and somehow store Vector/QAngle return values in a static (i.e. global) variable or reserving stack space in function_stub to use for it, since it will just be copied into VM memory (PushVariant()) and free'd after.
I could perhaps try and implement something along these lines - in a separate PR, that seems out of scope for this bug fix - though I'm not too happy about trying to muck with manual memory management in code I'm not familiar with. You can maybe see that I tried to keep the fix safe and simple.

@z33ky z33ky force-pushed the dangling-vscript-vector branch 2 times, most recently from f9d4f33 to 0551292 Compare September 6, 2024 21:31
@samisalreadytaken
Copy link

samisalreadytaken commented Sep 6, 2024

I think it makes more sense to change strdup into new char[] and memcpy instead of switching delete. Though new[] wouldn't match delete. Maybe malloc Vector?

Specifically I noticed it on CBasePlayer::EyeVectors()

EyeVectors, presumably from ScriptGetEyeForward, returns static Vector as is done everywhere else. I'm confused how returning static Vector references causes that.

That still uses the (const Vector &val, bool bCopy = false)

ScriptVariant_t constructors aren't called for function return values, assignment operators are. Though this seems to be a different issue.

const Vector& CBaseCombatCharacter::ScriptGetAttackSpread( HSCRIPT, HSCRIPT ) script binding is compiled as Vector const &, thus calls operator=( const Vector &vec ):

CMemberScriptBinding2<CBaseCombatCharacter *,Vector const & (__thiscall CBaseCombatCharacter::*)(HSCRIPT__ *,HSCRIPT__ *),Vector const &,HSCRIPT__ *,HSCRIPT__ *>::Call(void *, void *, ScriptVariant_t *, int, ScriptVariant_t *)

However, const Vector &CScriptNetPropManager::GetPropVectorArray( HSCRIPT, const char *, int ) is compiled as Vector, thus calls operator=( Vector &&vec ):

CMemberScriptBinding2<CScriptNetPropManager *,Vector (__thiscall CScriptNetPropManager::*)(HSCRIPT__ *,char const *),Vector,HSCRIPT__ *,char const *>::Call(void *, void *, ScriptVariant_t *, int, ScriptVariant_t *)

They both return vec3_origin/vec3_invalid as const Vector &, so what causes this difference? Note that I'm only testing on msvc.

Test calls: player.GetAttackSpread(null,null), NetProps.GetPropVector(null,"")

Stack allocating a Vector in function_stub sounds interesting, that could be the way to go.

@z33ky
Copy link
Author

z33ky commented Sep 7, 2024

I think it makes more sense to change strdup into new char[] and memcpy instead of switching delete. Though new[] wouldn't match delete. Maybe malloc Vector?

I could new Vector[1]. Or yeah, malloc(), which also maybe makes it more clear we're expecting to work with trivial types, or invoke constructor and destructor manually.

EyeVectors, presumably from ScriptGetEyeForward, returns static Vector as is done everywhere else. I'm confused how returning static Vector references causes that.

Oh sorry, it it Weapon_ShootPosition(). I replaced that with EyeVectors() in the vscript as my first attempt to fix this, which did indeed work fine, before I started to debug why Weapon_ShootPosition() does not work.

ScriptVariant_t constructors aren't called for function return values, assignment operators are. Though this seems to be a different issue.

I don't know what branch you've linked in #321 (comment), but there is different code in both master and develop, which does return a Vector by value for GetPropVector().

@samisalreadytaken
Copy link

I'm using # 260, but it's my mistake. I incorrectly put Vector as return type in the macro that defines non-array versions of the functions; the debugger stepping from GetPropVectorArray straight into script bindings caused my confusion. Ignore that part of my comment.

@z33ky
Copy link
Author

z33ky commented Sep 8, 2024

I think I have been chasing some ghost bugs due to corrupted object files or something.
Anyway, I'm not sure I have much time this week, but I will work on this. Stack allocation for the function_stub return value looks pretty doable.

z33ky added a commit to z33ky/source-sdk-2013 that referenced this pull request Nov 12, 2024
Instead of temporary allocating dynamic memory, which subsequently is
copied into another dynamically allocated piece of memory, provide a
temporary buffer on the stack.
The script value inserted into the VM still needs to allocate
dynamically though...

A few sites are adapted to not create a ScriptVariant_t from a temporary
Vector to avoid dynamic allocation there too.

cf. mapbase-source#321
@z33ky
Copy link
Author

z33ky commented Nov 12, 2024

I've had a family emergency and some stuff got left by the wayside.
I've picked this one back up. I'm not sure if it really was ghost bugs, as memory management issues can have weird effects. It looks like I've managed to get it working stably, so now here's the stack-based (malloc()-free) temporary return-value storage for function_stub() plus a few minor fixes.

z33ky added a commit to z33ky/source-sdk-2013 that referenced this pull request Nov 12, 2024
Instead of temporary allocating dynamic memory, which subsequently is
copied into another dynamically allocated piece of memory, provide a
temporary buffer on the stack.
The script value inserted into the VM still needs to allocate
dynamically though...

A few sites are adapted to not create a ScriptVariant_t from a temporary
Vector to avoid dynamic allocation there too.

cf. mapbase-source#321
@z33ky
Copy link
Author

z33ky commented Nov 13, 2024

Hm, I attempted to fix some sort of syntax error MSVC has with the placement new, but it still complains. gcc accepts both versions. I don't know what its problem is, nor how else I could attempt to fix it :/

Type FIELD_HSCRIPT is removed, it never sets SV_FREE.

All dynamic memory is now free`d via free(). For strdup(), this would be
the conformant way in standard C++. It matters not in Source, since both
use the same global allocator of the engine, but it is tidier.
Vector and QAngle are now allocated via malloc() instead of new to
provide symmetry.
When assigning a null value in getVariant(), make sure a previous
SV_FREE is not carried over.

In SqurrelVM::ReleaseValue(), make sure SV_FREE is not kept after
Free()ing a value.
z33ky added a commit to z33ky/source-sdk-2013 that referenced this pull request Nov 13, 2024
Instead of temporary allocating dynamic memory, which subsequently is
copied into another dynamically allocated piece of memory, provide a
temporary buffer on the stack.
The script value inserted into the VM still needs to allocate
dynamically though...

A few sites are adapted to not create a ScriptVariant_t from a temporary
Vector to avoid dynamic allocation there too.

cf. mapbase-source#321
Instead of temporary allocating dynamic memory, which subsequently is
copied into another dynamically allocated piece of memory, provide a
temporary buffer on the stack.
The script value inserted into the VM still needs to allocate
dynamically though...

A few sites are adapted to not create a ScriptVariant_t from a temporary
Vector to avoid dynamic allocation there too.

cf. mapbase-source#321
@z33ky
Copy link
Author

z33ky commented Nov 13, 2024

Well, why would I expect new to always be a keyword as defined by the standard. It's a macro of course, silly me.
Builds are fixed.

@samisalreadytaken
Copy link

Vector is always float[3] and has a trivial constructor, I don't think placement new is needed at all. It can simply be a copy *pNewVector = *m_pVector

I built z33ky:dangling-vscript-vector with samisalreadytaken:vscript-debugger, and tested this code below on my Windows machine with and without commit 7efec77 "Optimize squirrel function_stub". It was measurably faster without it

sqdbg_prof_stop();
sqdbg_prof_start();

local player = player;

for ( local i = 1000; i--; )
{
	sqdbg_prof_begin("t1");
	player.ShootPosition();
	player.ShootPosition();
	player.ShootPosition();
	sqdbg_prof_end();
}

sqdbg_prof_print("t1");
// 7efec77 reverted
(sqdbg) prof | t1 : total 580.10 us, avg 580.68 ns, peak   3.50 us(1), hits 1000
(sqdbg) prof | t2 : total   5.52 ms, avg 552.17 ns, peak   2.10 us(5035), hits 10000
(sqdbg) prof | t3 : total 563.48 ms, avg 563.48 ns, peak  18.90 us(274844), hits 1000000
// with 7efec77
(sqdbg) prof | t1 : total 778.80 us, avg 779.58 ns, peak   5.00 us(1), hits 1000
(sqdbg) prof | t2 : total   7.83 ms, avg 783.51 ns, peak  41.20 us(1911), hits 10000
(sqdbg) prof | t3 : total 800.75 ms, avg 800.75 ns, peak 127.00 us(496971), hits 1000000

@z33ky
Copy link
Author

z33ky commented Nov 14, 2024

Vector is always float[3] and has a trivial constructor, I don't think placement new is needed at all. It can simply be a copy *pNewVector = *m_pVector

Conceptually yes (unless _DEBUG and VECTOR_PARANOIA is defined), albeit currently the default constructor is user-provided, so is not trivial.
From a cursory glance it seems gcc and MSVC will emit the same code either way. Assignment looks simpler, placement new is more technically correct.

I built z33ky:dangling-vscript-vector with samisalreadytaken:vscript-debugger, and tested this code below on my Windows machine with and without commit 7efec77

Perhaps another change in the commit is responsible? I don't have time to do more testing today; Instead of reverting the whole commit I manually inserted the intermediary malloc() (which was part of operator= before) to keep the rest of the changes in place.

diff --git a/sp/src/public/vscript/vscript_templates.h b/sp/src/public/vscript/vscript_templates.h
index 9f2054cc3..4edf38002 100644
--- a/sp/src/public/vscript/vscript_templates.h
+++ b/sp/src/public/vscript/vscript_templates.h
@@ -374,8 +374,10 @@ inline FUNCPTR_TYPE ScriptConvertFuncPtrFromVoid( void *p )
 			{ \
 				return false; \
 			} \
-			new ( &temporaryReturnStorage.m_vec ) Vector( ((FUNC_TYPE)pFunction)( SCRIPT_BINDING_ARGS_##N ) ); \
-			*pReturn = temporaryReturnStorage.m_vec; \
+			Vector *vec = (Vector*)malloc( sizeof( Vector ) ); \
+			new ( vec ) Vector( ((FUNC_TYPE)pFunction)( SCRIPT_BINDING_ARGS_##N ) ); \
+			*pReturn = vec; \
+			pReturn->m_flags |= SV_FREE; \
 			return true; \
 		} \
 	}; \
@@ -394,8 +396,10 @@ inline FUNCPTR_TYPE ScriptConvertFuncPtrFromVoid( void *p )
 			{ \
 				return false; \
 			} \
-			new ( &temporaryReturnStorage.m_ang ) QAngle( ((FUNC_TYPE)pFunction)( SCRIPT_BINDING_ARGS_##N ) ); \
-			*pReturn = temporaryReturnStorage.m_ang; \
+			QAngle *ang = (QAngle*)malloc( sizeof( QAngle ) ); \
+			new ( ang ) QAngle( ((FUNC_TYPE)pFunction)( SCRIPT_BINDING_ARGS_##N ) ); \
+			*pReturn = ang; \
+			pReturn->m_flags |= SV_FREE; \
 			return true; \
 		} \
 	}; \
@@ -452,8 +456,10 @@ inline FUNCPTR_TYPE ScriptConvertFuncPtrFromVoid( void *p )
 			{ \
 				return false; \
 			} \
-			new ( &temporaryReturnStorage.m_vec ) Vector( (((OBJECT_TYPE_PTR)(pContext))->*ScriptConvertFuncPtrFromVoid<FUNC_TYPE>(pFunction))( SCRIPT_BINDING_ARGS_##N ) ); \
-			*pReturn = temporaryReturnStorage.m_vec; \
+			Vector *vec = (Vector*)malloc( sizeof( Vector ) ); \
+			new ( vec ) Vector( (((OBJECT_TYPE_PTR)(pContext))->*ScriptConvertFuncPtrFromVoid<FUNC_TYPE>(pFunction))( SCRIPT_BINDING_ARGS_##N ) ); \
+			*pReturn = vec; \
+			pReturn->m_flags |= SV_FREE; \
 			return true; \
 		} \
 	}; \
@@ -472,8 +478,10 @@ inline FUNCPTR_TYPE ScriptConvertFuncPtrFromVoid( void *p )
 			{ \
 				return false; \
 			} \
-			new ( &temporaryReturnStorage.m_ang ) QAngle( (((OBJECT_TYPE_PTR)(pContext))->*ScriptConvertFuncPtrFromVoid<FUNC_TYPE>(pFunction))( SCRIPT_BINDING_ARGS_##N ) ); \
-			*pReturn = temporaryReturnStorage.m_ang; \
+			QAngle *ang = (QAngle*)malloc( sizeof( QAngle ) ); \
+			new ( ang ) QAngle( (((OBJECT_TYPE_PTR)(pContext))->*ScriptConvertFuncPtrFromVoid<FUNC_TYPE>(pFunction))( SCRIPT_BINDING_ARGS_##N ) ); \
+			*pReturn = ang; \
+			pReturn->m_flags |= SV_FREE; \
 			return true; \
 		} \
 	}; \
diff --git a/sp/src/vscript/vscript_squirrel.cpp b/sp/src/vscript/vscript_squirrel.cpp
index 47364e007..153dbe2a4 100644
--- a/sp/src/vscript/vscript_squirrel.cpp
+++ b/sp/src/vscript/vscript_squirrel.cpp
@@ -1428,7 +1428,6 @@ SQInteger function_stub(HSQUIRRELVM vm)

 	PushVariant(vm, retval);

-	Assert(!(retval.m_flags & SV_FREE));
 	retval.Free();

 	return pFunc->m_desc.m_ReturnType != FIELD_VOID;

Could you test this on your system? Also, just making sure, you are profiling a release-mode build?

My results look like

// plain
(sqdbg) prof | t1              : total 853.75 us, avg 854.60 ns, peak  10.50 us(664), hits 1000
(sqdbg) prof | t2              : total   8.76 ms, avg 875.61 ns, peak  46.00 us(7494), hits 10000
(sqdbg) prof | t3              : total  86.66 ms, avg 866.64 ns, peak  63.75 us(2333), hits 100000
// patched (+malloc)
(sqdbg) prof | t1              : total 933.50 us, avg 934.43 ns, peak  10.75 us(729), hits 1000
(sqdbg) prof | t2              : total   9.15 ms, avg 914.74 ns, peak  14.75 us(2284), hits 10000
(sqdbg) prof | t3              : total  90.77 ms, avg 907.76 ns, peak  16.25 us(47563), hits 100000

(I assume t2 and t3 are the same loop, just more iterations.)
For some reason there's much higher peaks in the "plain" version, though maybe it was just some inopportune OS scheduling (I ran the test multiple times, just didn't check the peak values until now). And the average doesn't change much admittedly, around 40ns per iteration... I'll profile some changes, like actually reverting 7efec77 (because why not), but perhaps it's just not really worth investigating and should just be removed then.

I'll test with fully reverting 7efec77 tomorrow. The only sensible explanation I can come up with would be that passing the temporaryReturnStorage for the call leads to worse register allocation and the small malloc() can be served really fast or something along those lines.

@samisalreadytaken
Copy link

That malloc patch is slower in the same way as yours. Even removing the unused temporaryReturnStorage parameter had some effect, albeit by very little.

I am of course testing on release with no (native) debugger attached, and t2, t3 are more iterations of the same loop.

// 7efec77 revert
(sqdbg) prof | t1 : total 576.70 us, avg 577.28 ns, peak   3.80 us(816), hits 1000
(sqdbg) prof | t2 : total   6.18 ms, avg 618.02 ns, peak   2.40 us(425), hits 10000
(sqdbg) prof | t3 : total 592.78 ms, avg 592.78 ns, peak  13.50 us(202631), hits 1000000

// with 7efec77
(sqdbg) prof | t1 : total 793.10 us, avg 793.89 ns, peak   3.70 us(1), hits 1000
(sqdbg) prof | t2 : total   7.95 ms, avg 795.54 ns, peak   2.10 us(4917), hits 10000
(sqdbg) prof | t3 : total 789.92 ms, avg 789.92 ns, peak  15.30 us(151586), hits 1000000

// + malloc patch
(sqdbg) prof | t1 : total 854.40 us, avg 855.26 ns, peak   4.80 us(567), hits 1000
(sqdbg) prof | t2 : total   8.26 ms, avg 826.32 ns, peak   2.40 us(8655), hits 10000
(sqdbg) prof | t3 : total 851.58 ms, avg 851.58 ns, peak  15.00 us(108205), hits 1000000

// + remove temporaryReturnStorage parameter
(sqdbg) prof | t1 : total 841.60 us, avg 842.44 ns, peak   4.50 us(1), hits 1000
(sqdbg) prof | t2 : total   8.04 ms, avg 804.10 ns, peak   1.60 us(2413), hits 10000
(sqdbg) prof | t3 : total 829.87 ms, avg 829.87 ns, peak  16.80 us(257791), hits 1000000

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants