[breaking] trust setting to indicate whether input text is trusted (#1794)

* trust option to indicate whether input text is trusted

* Revamp into trust contexts beyond just command

* Document new trust function style

* Fix screenshot testing

* Use trust setting in \url and \href

* Check `isTrusted` in `\url` and `\href` (so now disabled by default)
* Automatically compute `protocol` from `url` in `isTrusted`, so it
  doesn't need to be passed into every context.

* Document untrusted features in support list/table

* Existing tests trust by default

* remove allowedProtocols and fix flow errors

* remove 'allowedProtocols' from documentation

* add a comment about a flow error, rename urlToProtocol to protocolFromUrl

* add tests that use function version of trust option

* default trust to false in MathML tests

* fix test title, remove 'trust: false' from test settings since it's the default
Erik Demaine
2019-07-08 21:57:23 -04:00
committed by Kevin Barabash
parent fc79f79c78
commit 3800dc49c1
16 changed files with 352 additions and 62 deletions

View File

@@ -18,8 +18,8 @@ You can provide an object of options as the last argument to [`katex.render` and
- `colorIsTextColor`: `boolean`. If `true`, `\color` will work like LaTeX's `\textcolor`, and take two arguments (e.g., `\color{blue}{hello}`), which restores the old behavior of KaTeX (pre-0.8.0). If `false` (the default), `\color` will work like LaTeX's `\color`, and take one argument (e.g., `\color{blue}hello`). In both cases, `\textcolor` works as in LaTeX (e.g., `\textcolor{blue}{hello}`).
- `maxSize`: `number`. All user-specified sizes, e.g. in `\rule{500em}{500em}`, will be capped to `maxSize` ems. If set to `Infinity` (the default), users can make elements and spaces arbitrarily large.
- `maxExpand`: `number`. Limit the number of macro expansions to the specified number, to prevent e.g. infinite macro loops. If set to `Infinity`, the macro expander will try to fully expand as in LaTeX. (default: 1000)
- `allowedProtocols`: `string[]`. Allowed protocols in `\href`. Use `_relative` to allow relative urls, and `*` to allow all protocols. (default: `["http", "https", "mailto", "_relative"]`)
- `strict`: `boolean` or `string` or `function` (default: `"warn"`). If `false` or `"ignore"`, allow features that make writing LaTeX convenient but are not actually supported by (Xe)LaTeX (similar to MathJax). If `true` or `"error"` (LaTeX faithfulness mode), throw an error for any such transgressions. If `"warn"` (the default), warn about such behavior via `console.warn`. Provide a custom function `handler(errorCode, errorMsg, token)` to customize behavior depending on the type of transgression (summarized by the string code `errorCode` and detailed in `errorMsg`); this function can also return `"ignore"`, `"error"`, or `"warn"` to use a built-in behavior. A list of such features and their `errorCode`s:
- `"unknownSymbol"`: Use of unknown Unicode symbol, which will likely also
lead to warnings about missing character metrics, and layouts may be
incorrect (especially in terms of vertical heights).
@@ -31,10 +31,27 @@ You can provide an object of options as the last argument to [`katex.render` and
A second category of `errorCode`s never throw errors, but their strictness
affects the behavior of KaTeX:
- `"newLineInDisplayMode"`: Use of `\\` or `\newline` in display mode
(outside an array/tabular environment). In strict mode, no line break
results, as in LaTeX.
- `trust`: `boolean` or `function` (default: `false`). If `false` (do not trust input), prevent any commands like `\includegraphics` that could enable adverse behavior, rendering them instead in `errorColor`. If `true` (trust input), allow all such commands. Provide a custom function `handler(context)` to customize behavior depending on the context (command, arguments e.g. a URL, etc.). A list of possible contexts:
- `{command: "\\url", url, protocol}`
- `{command: "\\href", url, protocol}`
- `{command: "\\includegraphics", url, protocol}`
Here are some sample trust settings:
- Forbid specific command: `trust: (context) => context.command !== '\\includegraphics'`
- Allow specific command: `trust: (context) => context.command === '\\url'`
- Allow multiple specific commands: `trust: (context) => ['\\url', '\\href'].includes(context.command)`
- Allow all commands with a specific protocol: `trust: (context) => context.protocol === 'http'`
- Allow all commands with specific protocols: `trust: (context) => ['http', 'https', '_relative'].includes(context.protocol)`
- Allow all commands but forbid specific protocol: `trust: (context) => context.protocol !== 'file'`
- Allow certain commands with specific protocols: `trust: (context) => ['\\url', '\\href'].includes(context.command) && ['http', 'https', '_relative'].includes(context.protocol)`
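The sample settings above are ordinary predicates on the context object, so they can be tried in isolation. The following is a minimal sketch, assuming hand-built context objects in the documented shape (KaTeX itself would normally supply them); the names `linkOnlyTrust` and `sampleContexts` are illustrative and not part of KaTeX.

```js
// Trust handler from the list above: allow only \url and \href,
// and only for http, https, or relative URLs.
const linkOnlyTrust = (context) =>
    ['\\url', '\\href'].includes(context.command) &&
    ['http', 'https', '_relative'].includes(context.protocol);

// Hand-built contexts in the documented shape (illustrative values only).
const sampleContexts = [
    {command: '\\href', url: 'https://katex.org/', protocol: 'https'},
    {command: '\\url', url: 'docs/options.html', protocol: '_relative'},
    {command: '\\href', url: "javascript:alert('x')", protocol: 'javascript'},
    {command: '\\includegraphics', url: 'https://example.org/a.png', protocol: 'https'},
];

// One boolean decision per context, in order.
const decisions = sampleContexts.map((context) => linkOnlyTrust(context));
console.log(decisions); // [ true, true, false, false ]
```

A function like this would be passed as the `trust` option, e.g. `katex.render(tex, element, {trust: linkOnlyTrust})`.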
For example:
```js

View File

@@ -9,10 +9,11 @@ Of course, it is always a good idea to sanitize the HTML, though you will need
a rather generous whitelist (including some of SVG and MathML) to support
all of KaTeX.
Use `maxSize` option for preventing large width/height visual affronts,
use `maxExpand` for preventing infinite macro loop attacks, and
use `allowedProtocols` for preventing certain protocols in `\href`. Please
refer to [Options](options.md) for more details.
A variety of options give finer control over the security of KaTeX
with untrusted inputs; refer to [Options](options.md) for more details.
* `maxSize` can prevent large width/height visual affronts.
* `maxExpand` can prevent infinite macro loop attacks.
* `trust` can allow certain commands that are not always safe (e.g., `\includegraphics`).
The error message thrown by KaTeX may contain unescaped LaTeX source code.
See [Handling Errors](error.md) for more details.

View File

@@ -453,7 +453,7 @@ use `\ce` instead|
|\hookleftarrow|$\hookleftarrow$||
|\hookrightarrow|$\hookrightarrow$||
|\hphantom|$a\hphantom{bc}d$|`a\hphantom{bc}d`|
|\href|$\href{https://katex.org/}{\KaTeX}$|`\href{https://katex.org/}{\KaTeX}`|
|\href|$\href{https://katex.org/}{\KaTeX}$|`\href{https://katex.org/}{\KaTeX}` Requires `trust` [option](options.md)|
|\hskip|$w\hskip1em i\hskip2em d$|`w\hskip1em i\hskip2em d`|
|\hslash|$\hslash$||
|\hspace|$s\hspace7ex k$|`s\hspace7ex k`|
@@ -1105,7 +1105,7 @@ use `\ce` instead|
|\upsilon|$\upsilon$||
|\upuparrows|$\upuparrows$||
|\urcorner|$\urcorner$||
|\url|$\footnotesize\url{https://katex.org/}$|`\url{https://katex.org/}`|
|\url|$\footnotesize\url{https://katex.org/}$|`\url{https://katex.org/}` Requires `trust` [option](options.md)|
|\utilde|$\utilde{AB}$|`\utilde{AB}`|
## V

View File

@@ -101,6 +101,13 @@ The `{array}` environment does not yet support `\cline` or `\multicolumn`.
## HTML
The following "raw HTML" features are potentially dangerous for untrusted
inputs, so they are disabled by default, and attempting to use them produces
the command names in red (which you can configure via the `errorColor`
[option](options.md)). To fully trust your LaTeX input, you need to pass
an option of `trust: true`; you can also enable just some of the commands
or for just some URLs via the `trust` [option](options.md).
|||
|:----------------|:-------------------|
| $\href{https://katex.org/}{\KaTeX}$ | `\href{https://katex.org/}{\KaTeX}` |

View File

@@ -7,14 +7,14 @@ import {validUnit} from "./units";
import {supportedCodepoint} from "./unicodeScripts";
import unicodeAccents from "./unicodeAccents";
import unicodeSymbols from "./unicodeSymbols";
import utils from "./utils";
import {checkNodeType} from "./parseNode";
import ParseError from "./ParseError";
import {combiningDiacriticalMarksEndRegex} from "./Lexer";
import Settings from "./Settings";
import SourceLocation from "./SourceLocation";
import {Token} from "./Token";
import type {ParseNode, AnyParseNode, SymbolParseNode} from "./parseNode";
import type {ParseNode, AnyParseNode, SymbolParseNode, UnsupportedCmdParseNode}
from "./parseNode";
import type {Atom, Group} from "./symbols";
import type {Mode, ArgType, BreakToken} from "./types";
import type {FunctionContext, FunctionSpec} from "./defineFunction";
@@ -266,8 +266,7 @@ export default class Parser {
* Converts the textual input of an unsupported command into a text node
* contained within a color node whose color is determined by errorColor
*/
handleUnsupportedCmd(): AnyParseNode {
const text = this.nextToken.text;
formatUnsupportedCmd(text: string): UnsupportedCmdParseNode {
const textordArray = [];
for (let i = 0; i < text.length; i++) {
@@ -287,7 +286,6 @@ export default class Parser {
body: [textNode],
};
this.consume();
return colorNode;
}
@@ -723,14 +721,6 @@ export default class Parser {
// "undefined" behaviour, and keep them as-is. Some browser will
// replace backslashes with forward slashes.
const url = res.text.replace(/\\([#$%&~_^{}])/g, '$1');
let protocol = /^\s*([^\\/#]*?)(?::|&#0*58|&#x0*3a)/i.exec(url);
protocol = (protocol != null ? protocol[1] : "_relative");
const allowed = this.settings.allowedProtocols;
if (!utils.contains(allowed, "*") &&
!utils.contains(allowed, protocol)) {
throw new ParseError(
`Forbidden protocol '${protocol}'`, res);
}
return {
type: "url",
mode: this.mode,
@@ -803,7 +793,8 @@ export default class Parser {
throw new ParseError(
"Undefined control sequence: " + text, firstToken);
}
result = this.handleUnsupportedCmd();
result = this.formatUnsupportedCmd(text);
this.consume();
}
}

View File

@@ -16,6 +16,26 @@ export type StrictFunction =
(errorCode: string, errorMsg: string, token?: Token | AnyParseNode) =>
?(boolean | string);
export type TrustContextTypes = {
"\\href": {|
command: "\\href",
url: string,
protocol?: string,
|},
"\\includegraphics": {|
command: "\\includegraphics",
url: string,
protocol?: string,
|},
"\\url": {|
command: "\\url",
url: string,
protocol?: string,
|},
};
export type AnyTrustContext = $Values<TrustContextTypes>;
export type TrustFunction = (context: AnyTrustContext) => ?boolean;
export type SettingsOptions = {
displayMode?: boolean;
output?: "html" | "mathml" | "htmlAndMathml";
@@ -27,9 +47,9 @@ export type SettingsOptions = {
minRuleThickness?: number;
colorIsTextColor?: boolean;
strict?: boolean | "ignore" | "warn" | "error" | StrictFunction;
trust?: boolean | TrustFunction;
maxSize?: number;
maxExpand?: number;
allowedProtocols?: string[];
};
/**
@@ -42,7 +62,7 @@ export type SettingsOptions = {
* math (true), meaning that the math starts in \displaystyle
* and is placed in a block with vertical margin.
*/
class Settings {
export default class Settings {
displayMode: boolean;
output: "html" | "mathml" | "htmlAndMathml";
leqno: boolean;
@@ -53,9 +73,9 @@ class Settings {
minRuleThickness: number;
colorIsTextColor: boolean;
strict: boolean | "ignore" | "warn" | "error" | StrictFunction;
trust: boolean | TrustFunction;
maxSize: number;
maxExpand: number;
allowedProtocols: string[];
constructor(options: SettingsOptions) {
// allow null options
@@ -73,10 +93,9 @@ class Settings {
);
this.colorIsTextColor = utils.deflt(options.colorIsTextColor, false);
this.strict = utils.deflt(options.strict, "warn");
this.trust = utils.deflt(options.trust, false);
this.maxSize = Math.max(0, utils.deflt(options.maxSize, Infinity));
this.maxExpand = Math.max(0, utils.deflt(options.maxExpand, 1000));
this.allowedProtocols = utils.deflt(options.allowedProtocols,
["http", "https", "mailto", "_relative"]);
}
/**
@@ -146,6 +165,22 @@ class Settings {
return false;
}
}
}
export default Settings;
/**
* Check whether to trust potentially dangerous input, and return
* `true` (trusted) or `false` (untrusted). The sole argument `context`
* should be an object with `command` field specifying the relevant LaTeX
* command (as a string starting with `\`), and any other arguments, etc.
* If `context` has a `url` field, a `protocol` field will automatically
* get added by this function (changing the specified object).
*/
isTrusted(context: AnyTrustContext) {
if (context.url && !context.protocol) {
context.protocol = utils.protocolFromUrl(context.url);
}
const trust = typeof this.trust === "function"
? this.trust(context)
: this.trust;
return Boolean(trust);
}
}
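The behavior of `isTrusted` can be illustrated outside of KaTeX. The sketch below is a reduced, hypothetical stand-in (`TrustSettings` is not a KaTeX class) that copies only the trust logic from the diff above, without Flow annotations, together with the `protocolFromUrl` regular expression added in this commit. It shows the default of `false`, the function form, and the fact that the `protocol` field is written back onto the caller's context object.

```js
// Same regular expression as `protocolFromUrl` in this commit.
const protocolFromUrl = (url) => {
    const protocol = /^\s*([^\\/#]*?)(?::|&#0*58|&#x0*3a)/i.exec(url);
    return protocol != null ? protocol[1] : "_relative";
};

// Hypothetical stand-in for Settings, reduced to the trust logic.
class TrustSettings {
    constructor(options = {}) {
        // Untrusted unless the caller says otherwise.
        this.trust = options.trust === undefined ? false : options.trust;
    }

    isTrusted(context) {
        // Fill in `protocol` if missing; this mutates the caller's object.
        if (context.url && !context.protocol) {
            context.protocol = protocolFromUrl(context.url);
        }
        const trust = typeof this.trust === "function"
            ? this.trust(context)
            : this.trust;
        return Boolean(trust);
    }
}

const context = {command: "\\href", url: "ftp://x"};
const ftpOnly = new TrustSettings({trust: (ctx) => ctx.protocol === "ftp"});
const results = {
    byDefault: new TrustSettings().isTrusted({command: "\\url", url: "relative"}),
    ftpOnly: ftpOnly.isTrusted(context),
    computedProtocol: context.protocol,
};
console.log(results);
```

Because the result is passed through `Boolean`, a trust function that returns `undefined` or `null` is treated as untrusted.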

View File

@@ -2,7 +2,8 @@
import {checkNodeType} from "./parseNode";
import type Parser from "./Parser";
import type {ParseNode, AnyParseNode, NodeType} from "./parseNode";
import type {ParseNode, AnyParseNode, NodeType, UnsupportedCmdParseNode}
from "./parseNode";
import type Options from "./Options";
import type {ArgType, BreakToken, Mode} from "./types";
import type {HtmlDomNode} from "./domTree";
@@ -21,7 +22,9 @@ export type FunctionHandler<NODETYPE: NodeType> = (
context: FunctionContext,
args: AnyParseNode[],
optArgs: (?AnyParseNode)[],
) => ParseNode<NODETYPE>;
) => UnsupportedCmdParseNode | ParseNode<NODETYPE>;
// Note: reversing the order of the return type union will cause a Flow error.
// See https://github.com/facebook/flow/issues/3663.
export type HtmlBuilder<NODETYPE> = (ParseNode<NODETYPE>, Options) => HtmlDomNode;
export type MathMLBuilder<NODETYPE> = (
@@ -199,10 +202,6 @@ export default function defineFunction<NODETYPE: NodeType>({
handler: handler,
};
for (let i = 0; i < names.length; ++i) {
// TODO: The value type of _functions should be a type union of all
// possible `FunctionSpec<>` possibilities instead of `FunctionSpec<*>`,
// which is an existential type.
// $FlowFixMe
_functions[names[i]] = data;
}
if (type) {

View File

@@ -18,6 +18,14 @@ defineFunction({
handler: ({parser}, args) => {
const body = args[1];
const href = assertNodeType(args[0], "url").url;
if (!parser.settings.isTrusted({
command: "\\href",
url: href,
})) {
return parser.formatUnsupportedCmd("\\href");
}
return {
type: "href",
mode: parser.mode,
@@ -49,6 +57,14 @@ defineFunction({
},
handler: ({parser}, args) => {
const href = assertNodeType(args[0], "url").url;
if (!parser.settings.isTrusted({
command: "\\url",
url: href,
})) {
return parser.formatUnsupportedCmd("\\url");
}
const chars = [];
for (let i = 0; i < href.length; i++) {
let c = href[i];

View File

@@ -85,6 +85,13 @@ defineFunction({
alt = alt.substring(0, alt.lastIndexOf('.'));
}
if (!parser.settings.isTrusted({
command: "\\includegraphics",
url: src,
})) {
return parser.formatUnsupportedCmd("\\includegraphics");
}
return {
type: "includegraphics",
mode: parser.mode,

View File

@@ -19,6 +19,9 @@ export type SymbolParseNode =
ParseNode<"spacing"> |
ParseNode<"textord">;
// ParseNode from `Parser.formatUnsupportedCmd`
export type UnsupportedCmdParseNode = ParseNode<"color">;
// Union of all possible `ParseNode<>` types.
export type AnyParseNode = $Values<ParseNodeTypes>;

View File

@@ -91,6 +91,15 @@ export const assert = function<T>(value: ?T): T {
return value;
};
/**
* Return the protocol of a URL, or "_relative" if the URL does not specify a
* protocol (and thus is relative).
*/
export const protocolFromUrl = function(url: string): string {
const protocol = /^\s*([^\\/#]*?)(?::|&#0*58|&#x0*3a)/i.exec(url);
return (protocol != null ? protocol[1] : "_relative");
};
export default {
contains,
deflt,
@@ -98,4 +107,5 @@ export default {
hyphenate,
getBaseElem,
isCharacterBox,
protocolFromUrl,
};
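To make the regular expression in `protocolFromUrl` easier to follow, here is a standalone copy of the function (Flow annotations and the `export` removed) run against a few inputs. The sample URLs are illustrative assumptions, not taken from the test suite. The captured group stops at the first `:` or at an HTML-entity-encoded colon (`&#58` or `&#x3a`, with optional leading zeros), and it cannot contain `\`, `/`, or `#`, so a colon that appears only after a path or fragment separator does not count as a protocol.

```js
// Standalone copy of protocolFromUrl from this commit, without Flow types.
const protocolFromUrl = function(url) {
    const protocol = /^\s*([^\\/#]*?)(?::|&#0*58|&#x0*3a)/i.exec(url);
    return (protocol != null ? protocol[1] : "_relative");
};

// Illustrative inputs (assumptions for this sketch).
const examples = [
    "https://katex.org/",          // plain protocol
    "mailto:someone@example.org",  // protocol without slashes
    "relative/path",               // no colon at all
    "/absolute/path:with-colon",   // colon only after a slash
    "#fragment:part",              // colon only after a hash
    "javascript&#58;alert('x')",   // entity-encoded colon
];

const protocols = examples.map((url) => protocolFromUrl(url));
for (let i = 0; i < examples.length; i++) {
    console.log(examples[i], "->", protocols[i]);
}
```

The entity-encoded case matters for trust functions that block a protocol by name: it is reported as `javascript` rather than as a relative URL.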

View File

@@ -970,6 +970,151 @@ exports[`Newlines via \\\\ and \\newline \\\\ causes newline, even after mrel an
`;
exports[`href and url commands should allow all protocols when trust option is true 1`] = `
[
{
"type": "href",
"body": [
{
"type": "mathord",
"loc": {
"end": 16,
"lexer": {
"input": "\\\\href{ftp://x}{foo}",
"lastIndex": 19
},
"start": 15
},
"mode": "math",
"text": "f"
},
{
"type": "mathord",
"loc": {
"end": 17,
"lexer": {
"input": "\\\\href{ftp://x}{foo}",
"lastIndex": 19
},
"start": 16
},
"mode": "math",
"text": "o"
},
{
"type": "mathord",
"loc": {
"end": 18,
"lexer": {
"input": "\\\\href{ftp://x}{foo}",
"lastIndex": 19
},
"start": 17
},
"mode": "math",
"text": "o"
}
],
"href": "ftp://x",
"mode": "math"
}
]
`;
exports[`href and url commands should allow explicitly allowed protocols 1`] = `
[
{
"type": "href",
"body": [
{
"type": "mathord",
"loc": {
"end": 16,
"lexer": {
"input": "\\\\href{ftp://x}{foo}",
"lastIndex": 19
},
"start": 15
},
"mode": "math",
"text": "f"
},
{
"type": "mathord",
"loc": {
"end": 17,
"lexer": {
"input": "\\\\href{ftp://x}{foo}",
"lastIndex": 19
},
"start": 16
},
"mode": "math",
"text": "o"
},
{
"type": "mathord",
"loc": {
"end": 18,
"lexer": {
"input": "\\\\href{ftp://x}{foo}",
"lastIndex": 19
},
"start": 17
},
"mode": "math",
"text": "o"
}
],
"href": "ftp://x",
"mode": "math"
}
]
`;
exports[`href and url commands should forbid relative URLs when trust option is false 1`] = `
[
{
"type": "color",
"body": [
{
"type": "text",
"body": [
{
"type": "textord",
"mode": "text",
"text": "\\\\"
},
{
"type": "textord",
"mode": "text",
"text": "h"
},
{
"type": "textord",
"mode": "text",
"text": "r"
},
{
"type": "textord",
"mode": "text",
"text": "e"
},
{
"type": "textord",
"mode": "text",
"text": "f"
}
],
"mode": "math"
}
],
"color": "#cc0000",
"mode": "math"
}
]
`;
exports[`href and url commands should not affect spacing around 1`] = `
[
{
@@ -1062,3 +1207,46 @@ exports[`href and url commands should not affect spacing around 1`] = `
}
]
`;
exports[`href and url commands should not allow explicitly disallow protocols 1`] = `
[
{
"type": "color",
"body": [
{
"type": "text",
"body": [
{
"type": "textord",
"mode": "text",
"text": "\\\\"
},
{
"type": "textord",
"mode": "text",
"text": "h"
},
{
"type": "textord",
"mode": "text",
"text": "r"
},
{
"type": "textord",
"mode": "text",
"text": "e"
},
{
"type": "textord",
"mode": "text",
"text": "f"
}
],
"mode": "math"
}
],
"color": "#cc0000",
"mode": "math"
}
]
`;

View File

@@ -2682,17 +2682,17 @@ describe("href and url commands", function() {
it("should allow letters [#$%&~_^] without escaping", function() {
const url = "http://example.org/~bar/#top?foo=$foo&bar=ba^r_boo%20baz";
const parsed1 = getParsed(`\\href{${url}}{\\alpha}`)[0];
const parsed1 = getParsed(`\\href{${url}}{\\alpha}`, new Settings({trust: true}))[0];
expect(parsed1.href).toBe(url);
const parsed2 = getParsed(`\\url{${url}}`)[0];
const parsed2 = getParsed(`\\url{${url}}`, new Settings({trust: true}))[0];
expect(parsed2.href).toBe(url);
});
it("should allow balanced braces in url", function() {
const url = "http://example.org/{{}t{oo}}";
const parsed1 = getParsed(`\\href{${url}}{\\alpha}`)[0];
const parsed1 = getParsed(`\\href{${url}}{\\alpha}`, new Settings({trust: true}))[0];
expect(parsed1.href).toBe(url);
const parsed2 = getParsed(`\\url{${url}}`)[0];
const parsed2 = getParsed(`\\url{${url}}`, new Settings({trust: true}))[0];
expect(parsed2.href).toBe(url);
});
@@ -2706,9 +2706,9 @@ describe("href and url commands", function() {
it("should allow escape for letters [#$%&~_^{}]", function() {
const url = "http://example.org/~bar/#top?foo=$}foo{&bar=bar^r_boo%20baz";
const input = url.replace(/([#$%&~_^{}])/g, '\\$1');
const parsed1 = getParsed(`\\href{${input}}{\\alpha}`)[0];
const parsed1 = getParsed(`\\href{${input}}{\\alpha}`, new Settings({trust: true}))[0];
expect(parsed1.href).toBe(url);
const parsed2 = getParsed(`\\url{${input}}`)[0];
const parsed2 = getParsed(`\\url{${input}}`, new Settings({trust: true}))[0];
expect(parsed2.href).toBe(url);
});
@@ -2717,31 +2717,43 @@ describe("href and url commands", function() {
});
it("should be marked up correctly", function() {
const markup = katex.renderToString(r`\href{http://example.com/}{example here}`);
const markup = katex.renderToString(r`\href{http://example.com/}{example here}`, {trust: true});
expect(markup).toContain("<a href=\"http://example.com/\">");
});
it("should allow protocols in allowedProtocols", function() {
expect("\\href{relative}{foo}").toParse();
expect("\\href{ftp://x}{foo}").toParse(new Settings({
allowedProtocols: ["ftp"],
}));
expect("\\href{ftp://x}{foo}").toParse(new Settings({
allowedProtocols: ["*"],
}));
});
it("should not allow protocols not in allowedProtocols", function() {
expect("\\href{javascript:alert('x')}{foo}").not.toParse();
expect("\\href{relative}{foo}").not.toParse(new Settings({
allowedProtocols: [],
}));
});
it("should not affect spacing around", function() {
const built = getBuilt`a\href{http://example.com/}{+b}`;
const built = getBuilt("a\\href{http://example.com/}{+b}", new Settings({trust: true}));
expect(built).toMatchSnapshot();
});
it("should forbid relative URLs when trust option is false", () => {
const parsed = getParsed("\\href{relative}{foo}");
expect(parsed).toMatchSnapshot();
});
it("should allow explicitly allowed protocols", () => {
const parsed = getParsed(
"\\href{ftp://x}{foo}",
new Settings({trust: (context) => context.protocol === "ftp"}),
);
expect(parsed).toMatchSnapshot();
});
it("should allow all protocols when trust option is true", () => {
const parsed = getParsed(
"\\href{ftp://x}{foo}",
new Settings({trust: true}),
);
expect(parsed).toMatchSnapshot();
});
it("should not allow explicitly disallow protocols", () => {
const parsed = getParsed(
"\\href{javascript:alert('x')}{foo}",
new Settings({trust: context => context.protocol !== "javascript"}),
);
expect(parsed).toMatchSnapshot();
});
});
describe("A raw text parser", function() {

View File

@@ -73,7 +73,9 @@ describe("A MathML builder", function() {
});
it('should set href attribute for href appropriately', () => {
expect(getMathML("\\href{http://example.org}{\\alpha}")).toMatchSnapshot();
expect(
getMathML("\\href{http://example.org}{\\alpha}", new Settings({trust: true})),
).toMatchSnapshot();
expect(getMathML("p \\Vdash \\beta \\href{http://example.org}{+ \\alpha} \\times \\gamma"));
});

View File

@@ -70,7 +70,8 @@
var settings = {
displayMode: !!query["display"],
throwOnError: !query["noThrow"]
throwOnError: !query["noThrow"],
trust: true // trust test inputs
};
if (query["errorColor"]) {
settings.errorColor = query["errorColor"];

View File

@@ -29,7 +29,8 @@ module.exports = function(md, options) {
const katex = require("../../");
function renderKatex(source, displayMode) {
return katex.renderToString(source, {displayMode, throwOnError: false});
return katex.renderToString(source,
{displayMode, throwOnError: false, trust: true});
}
/**