* Added check for type of expressions passed to parseTree function
* Added tests for bad input raising exception
* Added test for supported types NOT throwing exception
* Added test case for parser taking String objects
Summary:
This diff provides support for Latin-1, Cyrillic, and CJK characters
inside \text{} groups. For Latin-1 and Cyrillic characters we use
glyph metrics from a glyph from Basic Latin that has roughly the same
bounding box. We use the metrics for a capital 'M' to approximate the
full-width CJK characters. Half-width characters are not supported yet.
Test Plan:
- make test
- make screenshots
Reviewers: emily
This adds support for the following input sequences:
-- --- ` ' `` '' \degree \pounds \maltese
resulting in – — ‘ ’ “ ” ° £ ✠ symbols already present in our fonts.
As part of this modification, the recognition of multiple dashes was moved
from the lexer to the parser.
This is neccessary since in math mode a sequence of hyphens is just a
sequence of minus signs. Just like a pair of apostrophes in math mode is a
double prime not a right double quotation mark.
To make this easier, parseGroup and parseOptionalGroup have been merged.
* Introduce MacroExpander
The job of the MacroExpander is turning a stream of possibly expandable
tokens, as obtained from the Lexer, into a stream of non-expandable tokens
(in KaTeX, even though they may well be expandable in TeX) which can be
processed by the Parser. The challenge here is that we don't have
mode-specific lexer implementations any more, so we need to do everything on
the token level, including reassembly of sizes and colors.
* Make macros available in development server
Now one can specify macro definitions like \foo=bar as part of the query
string and use these macros in the formula being typeset.
* Add tests for macro expansions
* Handle end of input in special groups
This avoids an infinite loop if input ends prematurely.
* Simplify parseSpecialGroup
The parseSpecialGroup methos now returns a single token spanning the whole
special group, and leaves matching that string against a suitable regular
expression to whoever is calling the method. Suggested by @cbreeden.
* Incorporate review suggestions
Add improvements suggested by Kevin Barabash during review.
* Input range sanity checks
Ensure that both tokens of a token range come from the same lexer,
and that the range has a non-negative length.
* Improved wording of two comments
Summary: The KaTex renderer used to use old Khan Academy colors when displaying colored text. The configuration is now updated to have the new colors.
Test Plan:
- verified that colored text now uses the new colors in a browser
- running commands such as `\blueA{blueA}` will style the text in the blueA color from the configuration
Reviewers: emily, kevinb
Reviewed By: kevinb
Differential Revision: https://phabricator.khanacademy.org/D27963
Summary:
This only supports em and ex units and doesn't handle vertical layouts.
Negative kerning works.
Test Plan:
- make test
- make screenshots (verify that d is slightly overlapping c in the screenshots)
Reviewers: emily
Summary:
Alpert alerted me to the fact that \centerdot and \cdot are
not the same despite what MathJax thinks.
Test Plan:
- make serve
- load http://localhost:7936/
- see the `a \centerdot b` produces a small, bottom-aligned square
Auditors: alpert emily
Summary:
Update the symbol definition for \centerdot so that it does
the same thing as \cdot.
Fixes https://github.com/Khan/KaTeX/issues/421.
Test Plan:
- make serve
- open http://localhost:7936/
- verify that `a \centerdot b` looks the same as `a \cdot b`
Auditors: emily
Summary
We'd like contributors to use the same linter and lint rules that we use
internally. This diff swaps out eslint for jshint and fixes all lint failures
except for the max-len failures in the test suites.
Test Plan:
- ka-lint src
- make lint
- make test
Reviewers: emily
This is almost like the align* environment, but it starts out in math mode,
so we don't have to worry about the fact that we have no real surrounding
text mode in KaTeX. This is the first step towards align* and align.
Instead of passing around the current position as an argument, we now have a
parser property called pos to keep track of that. Instead of repeatedly
re-lexing at the current position we now have a property called nextToken
which contains the token beginning at the current position. We may need to
re-lex if we switch mode. Since the position is kept in the parser state,
we don't need to return it from parsing methods, which obsoletes the
ParseResult class.
Summary:
The ability to use pre-determined character widths will benefit alternative
layout engines such as gagern's canvas layout engine. I would also like to
experiment would using CSS transforms to absolutely position each glyph. This
diff adds a new make rule, make extended_metrics, which generates metrics that
also containing glyph widths.
Test Plan:
- run `make extended_metrics`
- verify that fontMetricsData.js contains entries with 5 numbers instead of 4
Reviewers: emily alpert
There are two main motivations for this commit. One is unicode input, which
requires unicode characters to get past the lexer. See discussion in #261.
The second is in preparation for #266, where we'd deal with one token of
look-ahead but might be lexing that token in an unknown mode in some cases.
The unit test shipped with this commit addresses the latter concern, since
it checks that a math-mode-only token may immediately follow some text mode
content group.
In this new implementation, all the various things that could get matched
have been collected into a single regular expression. The hope is that
this will be beneficial for performance and keep the code simpler.
The code was written with Unicode input in mind, including non-BMP codepoints.
The role of the lexer as a gate keeper, keeping out invalid TeX syntax, has
been abandoned. That role is still fulfilled by the symbols and functions
tables, though, since any input which is neither a symbol nor a command is
still considered invalid input, even though it lexes successfully.
Fixes issue #255.
Mixing the variable number of arguments a function receives from TeX code
with the fixed arguments which the parser provides can cause some confusion.
After this change, a handler will receive exactly two arguments: one is a
context object from which things provided by the parser can be accessed by
name, which allows for simple extensions in the future. The other is the
list of TeX arguments, passed as an array.
If we ever switch to EcmaScript 2015, we might want to use its destructuring
features to name the elements of the args array in the function head. Until
then, destructuring that array manually immediately at the beginning of the
function seems like a useful convention to easily find the meaning of these
arguments.
Having long object literals containing the code is problematic.
It makes it difficult to add auxiliary functions or data close to the
function inside the map where it is needed.
Building the map in several steps, repeating the map name at each step,
avoids that problem since it makes the definitions independent from one
another, so anything can go between them.
This commit deliberately avoided reindenting existing code to match the new
surroundings. That way it is easier to see where actual changes happen,
even when not performing a whitespace-ignoring diff.
Having one long array literal to contain the code of all environment
implementations is problematic. It makes it difficult to add auxiliary
functions or data close to the function inside the list where it is needed.
Now the functions are no longer defined using such a literal, but instead
using calls to a "defineEnvironment" function which receives all the
relevant data. Since each function call is independent from the others,
anything can go in between.
This commit deliberately avoided reindenting existing code to match the new
surroundings. That way it is easier to see where actual changes happen,
even when not performing a whitespace-ignoring diff.
Using function calls instead of one big object literal for the symbols makes
the notation far more concise and readable. Having the actual symbol name
in the last position helps aligning the preceding columns, making the list
easier to read.
Another benefit is that all symbol definitions now pass through a single
function, where additional processing (e.g. for Unicode input) might take
place in a future commit.
Since the previous commit deliberately avoided reindenting, this one here
does just that: reindenting the existing code. There are no other changes.
Notice how the new indentation leaves more room to function handlers.
Having one long array literal to contain the code of all function
implementations is problematic. It makes it difficult to add auxiliary
functions or data close to the function inside the list where it is needed.
Now the functions are no longer defined using such a literal, but instead
using calls to a "declareFunction" function which receives all the relevant
data. Since each function call is independent from the others, anything can
go in between.
This commit deliberately avoided reindenting existing code to match the new
surroundings. That way it is easier to see where actual changes happen,
even when not performing a whitespace-ignoring diff.
Also, the MathBb-chrome test changed, to what I believe is the correct
result? Not sure why it looked wrong before.
Test plan:
- `make test`
- take screenshots, see nothing changed.
This adds the ability to add `|` to a column description and have
vertical separators be added. I added types to the column descriptions
and added some logic to handle the separators when building the vertical
lists of the array.
Test plan:
- See the Arrays screenshot looks good.
- `make test`