Cache poisoning from RFC 6455 (WebSockets) not requiring server messages to be masked?
RFC 6455 section 10.3 explains why clients are required to mask their outgoing frames: so that a malicious server cannot manipulate a client into sending chosen bytes in plaintext, since the message could be an HTTP request used to poison the cache of a proxy server.
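For reference, here is a minimal sketch of the client-to-server masking described in RFC 6455 section 5.3: each payload byte is XORed with one byte of a 4-byte key, cycling through the key. The function name `mask_payload` and the sample request are my own illustration, not taken from any library.

```python
import os

def mask_payload(payload: bytes, key: bytes) -> bytes:
    """XOR every payload byte with key[i % 4], as in RFC 6455 section 5.3."""
    if len(key) != 4:
        raise ValueError("masking key must be exactly 4 bytes")
    return bytes(b ^ key[i % 4] for i, b in enumerate(payload))

# Hypothetical attacker-chosen text that would read as an HTTP request in plaintext.
payload = b"GET /poisoned.js HTTP/1.1\r\nHost: example.com\r\n\r\n"
key = os.urandom(4)  # the client picks a fresh, unpredictable key for each frame

masked = mask_payload(payload, key)
# The operation is its own inverse: the same key restores the original bytes.
assert mask_payload(masked, key) == payload
```

Because the key is unpredictable, whoever supplies the payload cannot choose what the bytes on the wire will look like.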
However, there is no requirement for the server to mask its messages. Not only that, but Chrome rejects masked messages from a server, throwing the error "A server must not mask any frames that it sends to the client.".
It seems to me that the lack of masking on server messages can also be exploited to poison proxy caches:
1-Imagine a websocket server providing some sort of store/retrieve functionality, such that a client can define parts of the server's later clear-text responses. This could be as simple as a form whose fields are later retrieved, or an interface to store to and retrieve from a Redis database.
2-A malicious client asks the server to store data that looks like an HTTP response. That data is sent masked in a websocket frame and is not interpreted by the proxy server.
3-The malicious client then sends a valid masked websocket frame asking for retrieval of the stored data. This is again ignored by the proxy server. However, the client immediately appends to that frame, in clear text, "\r\n\r\n" followed by an HTTP request for the stored HTTP response. The proxy interprets this as an HTTP request.
4-The server receives the masked frame and sends out a websocket frame containing the HTTP response in clear text. The proxy interprets this as the HTTP response to the client's HTTP request.
5-The client has thus controlled both a clear-text HTTP request and a clear-text HTTP response. This allows the client to poison the proxy server's cache.
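To make steps 2 and 4 concrete, here is a rough sketch that builds the same payload as a masked (client) frame and as an unmasked (server) frame, following the frame layout in RFC 6455 section 5.2. The helper `build_frame`, the stored response, and the fixed key are assumptions for illustration only; a real client would use a random key.

```python
import struct
from typing import Optional

def build_frame(payload: bytes, mask_key: Optional[bytes] = None) -> bytes:
    """Build one unfragmented text frame (FIN=1, opcode=0x1)."""
    header = bytearray([0x81])
    mask_bit = 0x80 if mask_key is not None else 0x00
    n = len(payload)
    if n < 126:
        header.append(mask_bit | n)
    elif n < 65536:
        header.append(mask_bit | 126)
        header += struct.pack("!H", n)  # 16-bit extended length
    else:
        header.append(mask_bit | 127)
        header += struct.pack("!Q", n)  # 64-bit extended length
    if mask_key is None:
        # Server-to-client: the payload follows the header byte for byte.
        return bytes(header) + payload
    masked = bytes(b ^ mask_key[i % 4] for i, b in enumerate(payload))
    return bytes(header) + mask_key + masked

# Hypothetical data the attacker stored earlier.
stored = b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"

client_frame = build_frame(stored, mask_key=b"\x12\x34\x56\x78")  # step 2
server_frame = build_frame(stored)                                # step 4

print(stored in client_frame)  # is the response visible in the masked frame?
print(stored in server_frame)  # is it visible in the unmasked frame?
```

Note that in the unmasked frame the response is still preceded by the frame header (two bytes here), so whether a particular proxy would actually parse it as an HTTP response depends on how tolerant that proxy is.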
The above would be impossible if the server also masked its messages. I can see no apparent reason why server masking wasn't required, let alone why it is actively forbidden.
Obviously you can argue that the real flaw is in the proxy server, or even in the idea of letting a client define the content of a future clear-text websocket message. But shouldn't the protocol just mask server responses as well?
To protect against this, you need to apply masking at the payload level and have the data unmasked by JavaScript on the client, rather than by the browser's WebSocket API.
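A minimal sketch of what I mean by payload-level masking, assuming the server prepends a random 4-byte key to an XOR-masked body and sends the result as a binary message. The names `server_encode` and `client_decode` are hypothetical, and in a browser the decoding side would be written in JavaScript; Python is used here only to show the idea.

```python
import os
from typing import Optional

def xor_with_key(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def server_encode(payload: bytes, key: Optional[bytes] = None) -> bytes:
    """Server side: mask the application payload before handing it to the websocket layer."""
    if key is None:
        key = os.urandom(4)  # unpredictable to the client that stored the data
    return key + xor_with_key(payload, key)

def client_decode(message: bytes) -> bytes:
    """Client side: split off the 4-byte key and unmask the rest."""
    key, body = message[:4], message[4:]
    return xor_with_key(body, key)

# Hypothetical attacker-stored data from step 2.
stored = b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"

wire = server_encode(stored)
assert client_decode(wire) == stored
```

The attacker still controls the stored data, but no longer controls the bytes the server puts on the wire, because the key is chosen by the server for each message.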