Fixed XF\Http\Response->contentType() check not correct

digitalpoint

Well-known member
Affected version
2.2.9
Ran into an issue where a proxy system would throw an invalid content type error. It turns out it's because the contentType() method is doing a regex on what it considers a valid content type like so:
PHP:
if (!preg_match('#^[a-zA-Z0-9]+/[a-zA-Z0-9-+]+$#', $contentType))
{
   throw new \InvalidArgumentException('Invalid content type');
}

A Content-Type header can in fact also have a semi-colon with additional info (which was causing the problem for me). To work around it, I ended up dropping the semi-colon and anything after, but it's probably not the "right" thing to do.


Syntax
Content-Type: text/html; charset=UTF-8
Content-Type: multipart/form-data; boundary=something


Directives
media-type
The MIME type of the resource or the data.

charset
The character encoding standard.

boundary
For multipart entities the boundary directive is required. The directive consists of 1 to 70 characters from a set of characters (and not ending with white space) known to be very robust through email gateways. It is used to encapsulate the boundaries of the multiple parts of the message. Often, the header boundary is prepended with two dashes and the final boundary has two dashes appended at the end.
 

PaulB

Well-known member
There are also perfectly-valid content types that don't match that format, some of which could conceivably be dealt with by XenForo or its add-ons. For example, search this page for vnd.: https://www.iana.org/assignments/media-types/media-types.xhtml While most of those are obscure, some still see widespread use on the web and could appear on a vanilla XF installation. For example:
  • .m3u8 -> application/vnd.apple.mpegurl
  • .xls -> application/vnd.ms-excel
  • .ppt -> application/vnd.ms-powerpoint
  • .docx -> application/vnd.openxmlformats-officedocument.wordprocessingml.document
  • .xlsx -> application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
  • .pptx -> application/vnd.openxmlformats-officedocument.presentationml.presentation
 

Chris D

XenForo developer
Staff member
A Content-Type header can indeed contain a ; to separate the charset and/or boundary parts but unless I'm mistaken, that's not actually what we're expecting to be received by this method. We're only interested in the content type portion itself, in which case the regex is valid for the majority of cases.

@digitalpoint what exactly was the content type that was invalid here? I'm fairly sure dropping the bit after the ; is probably the right thing to do, but would like more context if possible.

@PaulB I think you're correct. It looks like changing the regex to ^[a-zA-Z0-9]+/[a-zA-Z0-9-+\._]+$ (adding dots and underscores to the second part) makes all of those listed to be correct.
 

digitalpoint

Well-known member
@digitalpoint what exactly was the content type that was invalid here? I'm fairly sure dropping the bit after the ; is probably the right thing to do, but would like more context if possible.
In my particular case, my Google Analytics add-on has an option to serve the Google Tag Manager JavaScript from your own domain. As part of that, it fetches the JavaScript that (for xenforo.com) exists at: https://www.googletagmanager.com/gtag/js?id=G-D7JYH6EQJL

The content type header for that is content-type: application/javascript; charset=UTF-8

So in my setup, rather than pass through the content-type from XF\Http\Response, I just set it myself (getting the content type causes the exception because it has a semi-colon in it).
 

Chris D

XenForo developer
Staff member
Yeah I think that's the correct thing to do.

The contentType method wants essentially the MIME type bit as one argument then, optionally, the charset bit as the second argument. The onus is mostly on you to separate the two parts of a normal content-type header into the respective parts, if needed.

Or, as in cases like yours, if you know the content type and it's not expected to change, just hard code it.

We'll call this fixed, but it's only fixed as per @PaulB's comments about additional valid MIME type characters.
 

digitalpoint

Well-known member
You don’t think it would be better to drop anything after the semi-colon when getting the content type and not throw an exception? Seems weird to throw an exception simply because you use the method to get the content type and there’s a valid content type that includes valid info like charset. 🤷🏻‍♂️
 
Top