KDevelop can't load project: "too large document"?

Matthew Woehlke

2017-12-05 20:07:52 UTC

I'm trying to load my KDevelop project created for
https://github.com/kitware/kwiver. However, KDevelop says that it is
"loading project", but never finishes. When I dump KDevelop's stack
(pstack), all of the threads seem to be waiting (in poll or
pthread_cond_wait).

When I run latest master, I get:

kdevelop.plugins.cmake: error processing "too large document"

...followed by 100 MiB of JSON that appears to be the output of CMake
parsing the project.

Any thoughts? I'd *really* like to be able to open this project!

--
Matthew

René J.V. Bertin

2017-12-05 20:21:41 UTC

Post by Matthew Woehlke
Any thoughts? I'd *really* like to be able to open this project!

What's your cmake version, probably a new enough version to have server mode? A while back certain projects (including KDevelop itself, IIRC) didn't import correctly with server mode, so I implemented a "noserver" mode in the cmake wrapper script I use most of the time. I still use that for most projects, as server-based import is noticeably slower (or was, last time I compared). The crucial part is this:

while [ $# != 0 ] ;do
    case $1 in
        -E)
            # chdir into the actual directory = get rid of any and all symlinks on the current wdir
            cd `greadlink -f "${PWD}"`
            if [ "${2}" = "server" -a "`basename $0`" = "cmakewrapper_noserver" ] ;then
                # emulate the error message from an older CMake version
                ( echo "CMake Error: cmake version 3.0.1 (faked to avoid server mode)"
                  echo "Usage: cmake -E [command] [arguments ...]"
                ) 1>&2
                exit 1
            fi
            exec cmake "$@"
            ;;

If the master branch still has the compile_commands.json fallback (which I hope) you could try a similar approach to see if that makes any difference.

Then again, do you have a compile_commands.json file in your build directory, and what size is it?

You always have the possibility to import the project with a simpler project manager (= don't import the CMakeLists.txt file).

R.

Matthew Woehlke

2017-12-05 20:33:42 UTC

Post by René J.V. Bertin

Post by Matthew Woehlke
Any thoughts? I'd *really* like to be able to open this project!

What's your cmake version, probably a new enough version to have server mode?

Yes:

$ rpm -q cmake
cmake-3.10.0-1.fc26.x86_64

I've also noticed CMake server-mode processes running associated with
KDevelop.

I did notice that the backtraces are a significant part of the JSON...
does KDevelop need those, and if not, is there a way to omit or strip them?

Post by René J.V. Bertin
Then again, do you have a compile_commands.json file in your build directory, and what size is it?

$ ll -h compile_commands.json
-rw-rw-r--. 1 matthew matthew 817K Dec 5 15:00 compile_commands.json

--
Matthew

René J.V. Bertin

2017-12-05 21:31:49 UTC

Post by Matthew Woehlke
I did notice that the backtraces are a significant part of the JSON...
does KDevelop need those, and if not, is there a way to omit or strip them?

No idea.

I'd suggest to take a good look where that too-large error message really came from. I know next to nothing about JSON processing, but I seem to recall that you have to read the entire file before you can validate and parse it. If so, it seems reasonable that there's an upper limit, and that KDevelop would need to detect when this limit is reached and handle that situation gracefully (like by falling back to using the legacy import based on the much smaller compile_commands.json file).

Post by Matthew Woehlke
$ ll -h compile_commands.json
-rw-rw-r--. 1 matthew matthew 817K Dec 5 15:00 compile_commands.json

Compared to 100Mb ... that's a sizeable difference!

R.

Matthew Woehlke

2017-12-05 21:48:24 UTC

Post by René J.V. Bertin

Post by Matthew Woehlke
I did notice that the backtraces are a significant part of the JSON...
does KDevelop need those, and if not, is there a way to omit or strip them?

No idea.

After looking at the CMake source code: not AFAICT :'(.

I don't know (didn't look) if KDevelop is interested in them; I believe
they are new in 3.10...

Post by René J.V. Bertin
I'd suggest to take a good look where that too-large error message
really came from.

It comes from cmakeserver.cpp:150:

auto doc = QJsonDocument::fromJson(data, &error);
if (error.error) {
    qCWarning(CMAKE) << "error processing" << error.errorString()
                     << data;
}

Basically, QJson took one look at the size of the data and punted. In
particular, see also https://bugreports.qt.io/browse/QTBUG-58652.

Post by René J.V. Bertin
it seems reasonable that there's an upper limit, and that KDevelop
would need to detect when this limit is reached and handle that
situation gracefully

Indeed. At the very least, it shouldn't just quit trying to import the
project, leaving KDevelop in a weird state :-).

Post by René J.V. Bertin

Post by Matthew Woehlke
$ ll -h compile_commands.json
-rw-rw-r--. 1 matthew matthew 817K Dec 5 15:00 compile_commands.json

Compared to 100Mb ... that's a sizeable difference!

Yup.

The culprit is
https://gitlab.kitware.com/cmake/cmake/merge_requests/992. That
increased the size of the JSON by about 20x. The *pretty*-printed JSON,
if I strip out the stuff that cmake!992 added, is only about 5 MiB.
(Which is still lots more than the compile commands, but I suspect the
code model dump, even without the traces, has more and better structured
information.)

--
Matthew

René J.V. Bertin

2017-12-05 22:38:26 UTC

Post by Matthew Woehlke
After looking at the CMake source code: not AFAICT :'(.

Server output shouldn't become that big, even if you didn't have to wait for the entire output to be complete. Parsing something that's 20x larger than you need must come at a noticeable cost.

Post by Matthew Woehlke
I don't know (didn't look) if KDevelop is interested in them

It can clearly do without them...

Post by Matthew Woehlke
Basically, QJson took one look at the size of the data and punted. In
particular, see also https://bugreports.qt.io/browse/QTBUG-58652.

Looking at cmakemanager.cpp that probably means that you'd need to catch errors like this in ChooseCMakeInterfaceJob::successfulConnection() and make sure the failedConnection(int) slot is called on error.

Maybe you can simply do

connect(job, &CMakeServerImportJob::result, this, [this, job](){
    if (job->error() == 0) {
        manager->integrateData(job->projectData(), job->project());
    } else {
        failedConnection(job->error());
    }
});

Post by Matthew Woehlke
Indeed. At the very least, it shouldn't just quit trying to import the
project, leaving KDevelop in a weird state :-).

From the looks of it the cmake importer doesn't do anything in terms of error handling.

Post by Matthew Woehlke
(Which is still lots more than the compile commands, but I suspect the
code model dump, even without the traces, has more and better structured
information.)

Yes, though I don't really miss it.

R.

Matthew Woehlke

2017-12-06 17:26:26 UTC

Post by René J.V. Bertin
Maybe you can simply do
connect(job, &CMakeServerImportJob::result, this, [this, job](){
    if (job->error() == 0) {
        manager->integrateData(job->projectData(), job->project());
    } else {
        failedConnection(job->error());
    }
});

No luck. It appears that failing to parse the JSON is an unrecoverable
error. That is, I see no code to cope with that occurrence, and indeed,
it would normally trip an assert (I'm building RelWithDebInfo, so I
assume and expect asserts are not enabled). The correct fix seems to be
to modify the response handler to handle failing to parse the response.
See attached patch.

The above doesn't *seem* to be necessary with this patch, but I might be
wrong about that.

Post by René J.V. Bertin

Post by Matthew Woehlke
Indeed. At the very least, it shouldn't just quit trying to import the
project, leaving KDevelop in a weird state :-).

From the looks of it the cmake importer doesn't do anything in terms of error handling.

Handling an error from the *server* appears to be handled. The patch
hijacks this by synthesizing a response that looks like a server error,
and passing that back up the stack instead of the "real" response which
could not be parsed.
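
The approach described here can be illustrated outside Qt. The following is only a minimal Python sketch of the idea, not the attached patch: a reply that fails to parse is replaced by a synthetic reply shaped like a server error, so the code path that already handles server errors deals with it. The function names `parse_reply` and `handle_reply` are hypothetical, and the `type` / `errorMessage` field names are an assumption modelled on how CMake server error replies look.

```python
import json


def parse_reply(data: bytes) -> dict:
    """Parse one server reply; on failure return a synthetic error reply."""
    try:
        reply = json.loads(data.decode("utf-8"))
        if not isinstance(reply, dict):
            raise ValueError("reply is not a JSON object")
        return reply
    except (UnicodeDecodeError, ValueError) as exc:
        # json.JSONDecodeError is a ValueError subclass, so this also
        # covers truncated or otherwise malformed documents.
        # The field names below are assumed, not taken from the patch.
        return {"type": "error", "errorMessage": f"could not parse reply: {exc}"}


def handle_reply(reply: dict) -> str:
    """Single dispatch point: real and synthetic errors take the same path."""
    if reply.get("type") == "error":
        return "import failed: " + reply["errorMessage"]
    return "handled " + str(reply.get("type"))


if __name__ == "__main__":
    print(handle_reply(parse_reply(b'{"type": "reply"}')))
    print(handle_reply(parse_reply(b'{"type": ')))  # truncated document
```

The design choice is that the caller never sees a parser failure as a special case, so no new state has to be added to the import job.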

At any rate, see also
https://gitlab.kitware.com/cmake/cmake/issues/17502, especially Brad
King's most recent comment. The plan is to revert cmake!992 for CMake
3.10.1. I still think KDevelop should handle this more gracefully,
though; failing to load the project *and* getting stuck in a state that
the load can't even be canceled is... less than optimal :-).

--
Matthew

René J.V. Bertin

2017-12-06 17:56:17 UTC

Post by Matthew Woehlke

Post by René J.V. Bertin
connect(job, &CMakeServerImportJob::result, this, [this, job](){
    if (job->error() == 0) {
        manager->integrateData(job->projectData(), job->project());
    } else {
        failedConnection(job->error());
    }
});

No luck. It appears that failing to parse the JSON is an unrecoverable
error. That is, I see no code to cope with that occurrence, and indeed,

The suggested patch above assumes that the cmake server job still emits the result signal but with an error set (which apparently is not unheard of, or else there would have been one of those "graceful" ASSERTs on job->error()).
I didn't check whether that assumption was correct.

Post by Matthew Woehlke
(I'm building RelWithDebInfo, so I
assume and expect asserts are not enabled).

(You can also use a custom build type and set the exact compiler options you want via CFLAGS and CXXFLAGS in the environment. That way you're sure you won't get the benefit of all the advanced error handling implemented via asserts ;) )

Post by Matthew Woehlke
The above doesn't *seem* to be necessary with this patch, but I might be
wrong about that.

No, if errors signalled via a JSON document are indeed handled my suggestion shouldn't be required to handle this particular situation.

Post by Matthew Woehlke
Handling an error from the *server* appears to be handled. The patch
hijacks this by synthesizing a response that looks like a server error,
and passing that back up the stack instead of the "real" response which
could not be parsed.

OK, so what's the end result of this? Is the project imported using the older importer, or do you just get a failure message? If the latter, you might be able to implement a combined fix by setting the server import job's error flag to a non-zero value and making sure CMakeServerImportJob::result is emitted as soon as possible. With the patch above you should then trigger the fallback importer.
(Sorry for not checking this any further, I've got my head wrapped around other things ATM.)

Post by Matthew Woehlke
3.10.1. I still think KDevelop should handle this more gracefully,
though; failing to load the project *and* getting stuck in a state that
the load can't even be canceled is... less than optimal :-).

Indeed, because as long as Qt's JSON classes have a fixed size limit this error can occur with any big enough project. (The rebooted QtWebKit project for instance, which now uses cmake).

Cancelling a project import was never a good idea in KDevelop and AFAIK not officially supported.

R

Matthew Woehlke

2017-12-06 19:52:24 UTC

Post by René J.V. Bertin

(I'm building RelWithDebInfo, so I assume and expect asserts are
not enabled).

(You can also use a custom build type and set the exact compiler
options you want via CFLAGS and CXXFLAGS in the environment. That
way you're sure you won't get the benefit of all the advanced error
handling implemented via asserts ;) )

Sure :-). That was more in line with noting why I didn't hit the assert
immediately when I tried the built-from-source version.

Post by René J.V. Bertin

Handling an error from the *server* appears to be handled. The patch
hijacks this by synthesizing a response that looks like a server error,
and passing that back up the stack instead of the "real" response which
could not be parsed.

OK, so what's the end result of this? Is the project imported using
the older importer, or do you just get a failure message?

I'm... not sure? I seem to have identified targets, anyway...

(More critically, the project *loaded*, so at worst I guess I have the
equivalent of it being a generic project, but also with the build
integration, at least at the project level.)

Maybe it would be better for someone familiar with the code to test. It
should be easy enough to make the "codemodel" code path (in
CMakeServerImportJob::processResponse) emit an error a la the "error"
code path a couple lines below. This should have the same effect as when
the problem actually manifests.

Post by René J.V. Bertin
you might be able to implement a combined fix by setting the server
import job's error flag to a non-zero value and making sure
CMakeServerImportJob::result is emitted as soon as possible.

That... *might* be what is happening? Again, I am not familiar with the
code, so I am making many WAGs, but both the "codemodel" and "error"
response handlers call `emitResult()`, the latter having set an error first.

What confuses me is that code in cmakemanager.cpp makes me think that
*any* response from the job triggers a failure and falling back on the
"old" code.

--
Matthew

René J.V. Bertin

2017-12-06 20:12:17 UTC

Post by Matthew Woehlke
What confuses me is that code in cmakemanager.cpp makes me think that
*any* response from the job triggers a failure and falling back on the
"old" code.

Unless I've been missing something the old code is only used when the cmake server job fails to *start*. That would happen if you don't have cmake, or if it returns an error immediately (that's what I'm exploiting to avoid using cmake server mode).
I'll have another look, but didn't see anything suggesting that such a fallback exists after the server job has started successfully.

R.

Matthew Woehlke

2017-12-06 20:25:57 UTC

Post by René J.V. Bertin

Post by Matthew Woehlke
What confuses me is that code in cmakemanager.cpp makes me think that
*any* response from the job triggers a failure and falling back on the
"old" code.

Unless I've been missing something the old code is only used when the
cmake server job fails to *start*. That would happen if you don't
have cmake, or if it returns an error immediately (that's what I'm
exploiting to avoid using cmake server mode). I'll have another look,
but didn't see anything suggesting that such a fallback exists after
the server job has started successfully.

Ah, okay, I see; I was mixing up the server and the job. So... yeah, it
does seem like your patch would also help.

--
Matthew

René J.V. Bertin

2017-12-08 21:35:58 UTC

Post by Matthew Woehlke
Ah, okay, I see; I was mixing up the server and the job. So... yeah, it
does seem like your patch would also help.

Have you been able to look into this some more?

I had a better look myself and realised I probably missed a few details before. I was right that the choice between cmake-server import and "legacy" import is made depending on whether cmake starts in server mode. I'm less certain that it's trivial to come back from that decision: once cmake agrees to run in server mode, the `successfulConnection()` method apparently controls the project for as long as the server runs. We'd have to figure out how this implementation reacts to cmake exiting, for instance.
But the main point here is that your oversized JSON object is encountered in a function that also seems to handle all other server replies. Evidently you wouldn't want to start a legacy import procedure if some trivial query returns an error after the project has already been imported successfully.

R.

Matthew Woehlke

2017-12-08 21:59:51 UTC

Post by René J.V. Bertin

Post by Matthew Woehlke
Ah, okay, I see; I was mixing up the server and the job. So... yeah, it
does seem like your patch would also help.

Have you been able to look into this some more?

Hmm, weeeeell... I hadn't, but I tried just now, and with both patches,
it went boom (in the CMakeServerJob's dtor).

Post by René J.V. Bertin
But the main point here is that your oversized JSON object is
encountered in a function that also seems to handle all other server
replies. Evidently you wouldn't want to start a legacy import procedure
if some trivial query returns an error after the project has already
been imported successfully.

True. Although, with CMake's current design, I'm not sure there *are*
meaningful queries that might be made after asking for the code model.
Anyway, I'm not fiddling in that logic; the response handler *already*
has that behavior. (A "normal" error from the server already causes the
job to quit. All I did was twist a QJson error into looking like an
error from the server in order to hijack that.) So, I guess whoever
wrote that code originally already wasn't too worried about that case.

At any rate, it's probably safe enough to apply at least my patch, which
gets us to an assuredly better position than we're at otherwise. (Can
you do that? Or else remind me how to submit changes for KDevelop? It's
been a long time since I submitted anything, and I believe the process
has changed in the mean time...)

--
Matthew

Sven Brauch

2017-12-08 22:04:21 UTC

Post by Matthew Woehlke
At any rate, it's probably safe enough to apply at least my patch, which
gets us to an assuredly better position than we're at otherwise.

Hmm, again, why is this happening? Isn't this a bug in cmake?

Best,
Sven

René J.V. Bertin

2017-12-08 22:56:35 UTC

Post by Sven Brauch
Hmm, again, why is this happening? Isn't this a bug in cmake?

Yes (or a feature that should have a disable option), but the fixed upper limit on JSON processing is not. That limit is a given and will probably also be hit without the useless junk in the cmake output, for large enough projects.

Matthew's way of handling JSON errors is not just a workaround for a particular CMake bug but simply makes sense IMHO. That JSON input doesn't come from some failsafe helper (it is produced by a 3rd-party product), so the code processing it should be prepared to handle errors gracefully when they occur. If it already does this for "normal server errors" then it seems logical to handle JSON errors the same way (they are caused by the server doing something wrong, after all).

R.

René J.V. Bertin

2017-12-09 14:57:26 UTC

FWIW, importing QtWebKit 5.212 Alpha2 via cmake 3.9.3:

compile_commands.json : 51Mb
largest cmake server JSON reply: 0.52Mb (calculated as QByteArray data.size/1024/1024.0)

With cmake 3.10 the JSON replies do increase but the largest is still only 0.61Mb.

I haven't tried with the project from kitware that Matthew has been having issues with, but I can hardly imagine it'll be larger than QtWebKit. What seems more likely is that QtWebKit uses a very simple cmake build system, maybe partly based on using qmake, even if 51Mb of compile commands isn't exactly a puny amount.

R.

Matthew Woehlke

2017-12-11 16:51:34 UTC

Post by René J.V. Bertin
compile_commands.json : 51Mb
largest cmake server JSON reply: 0.52Mb (calculated as QByteArray data.size/1024/1024.0)
With cmake 3.10 the JSON replies do increase but the largest is still only 0.61Mb.

This is... curious...

In KDevelop, I am getting back a code model (for KWIVER) that is
108572484 bytes (about 103 MiB). However, if I write out the same
commands to a text file and feed it to CMake via an input file, I am
getting back a total session of only 198885 bytes (less than 200 KiB,
about 67 KiB of which is progress reporting from configuring and
generating the project).

Using CMake master, KDevelop gets a code model (again, for KWIVER) of
3677870 bytes (about 3.5 MiB, which is consistent with what I got
stripping the backtraces from the 103 MiB response). Again, however,
feeding the same commands to CMake server via a text input file, I get
198885 bytes.

This is most puzzling. At any rate, what I'm getting for KWIVER remains
almost an order of magnitude larger than the numbers you are reporting
for QtWebKit.
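
For reference, the byte counts quoted in this message convert to binary units as sketched below. This is just the arithmetic in Python, using only the figures given above; the constant and function names are illustrative.

```python
# Sizes reported in this message, in bytes.
CODE_MODEL_3_10 = 108572484    # KWIVER code model as seen by KDevelop, CMake 3.10
CODE_MODEL_MASTER = 3677870    # KWIVER code model as seen by KDevelop, CMake master
TEXT_FILE_SESSION = 198885     # total session when commands come from a text file

KIB = 1024
MIB = 1024 * 1024


def to_kib(n_bytes: int) -> float:
    return n_bytes / KIB


def to_mib(n_bytes: int) -> float:
    return n_bytes / MIB


if __name__ == "__main__":
    print(f"3.10 code model:   {to_mib(CODE_MODEL_3_10):.1f} MiB")
    print(f"master code model: {to_mib(CODE_MODEL_MASTER):.1f} MiB")
    print(f"text-file session: {to_kib(TEXT_FILE_SESSION):.1f} KiB")
    # How much larger the 3.10 code model is than the one from master.
    print(f"3.10 / master:     {CODE_MODEL_3_10 / CODE_MODEL_MASTER:.1f}x")
```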

One possible — I would even venture to say "likely" — explanation is
that QtWebKit has very few targets; `ninja -t targets` reports only 243
items (which is not likely to exactly correspond to what CMake considers
targets, but is probably of similar order of magnitude). In this
respect, it is actually a quite *small* project. By comparison, KWIVER
reports 2750, and KDevelop reports 1592. (Note: I did disable audio and
video because it couldn't find GStreamer dependencies and it was not
immediately obvious what needed to be installed.)

I would therefore be curious what numbers you are getting from KDevelop.

As another interesting data point: my QtWebKit compile_commands.json is
35 MiB (so, on order with yours), while for KWIVER it is only 820 KiB,
and for KDevelop, 4.6 MiB. So the size of compile_commands.json does not
appear to have any particular bearing on the size of the JSON code
model. (This is probably because the size of compile_commands.json
chiefly reflects the number of *source* files in a project, while the
size of the code model chiefly reflects the number of *targets*. KWIVER
has many targets — mainly, the unit tests — that have only a single
source file, and a good chunk of the libraries have only a handful of
source files.)
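
One way to check the "number of source files" explanation on a given build tree is to count the entries in compile_commands.json rather than look at its size alone. A minimal Python sketch follows, assuming the usual compilation database layout (a JSON array of objects with `directory` and `file` keys); the function name `summarize_compile_commands` is hypothetical.

```python
import json


def summarize_compile_commands(text: str) -> dict:
    """Return entry, file and directory counts for a compilation database."""
    entries = json.loads(text)
    return {
        "entries": len(entries),
        "unique_files": len({e["file"] for e in entries}),
        "directories": len({e["directory"] for e in entries}),
        "bytes": len(text.encode("utf-8")),
    }


if __name__ == "__main__":
    # Tiny stand-in for a real compile_commands.json.
    sample = json.dumps([
        {"directory": "/build/a", "command": "c++ -c a.cpp", "file": "/src/a.cpp"},
        {"directory": "/build/a", "command": "c++ -c b.cpp", "file": "/src/b.cpp"},
        {"directory": "/build/t", "command": "c++ -c a.cpp", "file": "/src/a.cpp"},
    ])
    print(summarize_compile_commands(sample))
```

To use it on a real project, read the file with `open("compile_commands.json", encoding="utf-8").read()` and pass the text in.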

--
Matthew

Matthew Woehlke

2017-12-11 17:26:16 UTC

Post by Matthew Woehlke
This is... curious...
In KDevelop, I am getting back a code model (for KWIVER) that is
108572484 bytes (about 103 MiB). However, if I write out the same
commands to a text file and feed it to CMake via an input file, I am
getting back a total session of only 198885 bytes (less than 200 KiB,
about 67 KiB of which is progress reporting from configuring and
generating the project).

Sigh... never mind. For some reason, if I pipe the output straight to
`wc` (which is how I was measuring), it just quits after 198885 bytes.
If I redirect to a file instead, I'm getting the same sizes as KDevelop
(104 MiB and 3.6 MiB with 3.10 and master, respectively).

Repeating the experiment with some other projects, I get:

            3.10      master
KDevelop:   4.6 MiB   2.9 MiB
QtWebKit:   869 KiB   696 KiB
VTK:        13 MiB    4.8 MiB

This does suggest there is something a bit pathological about KWIVER
w.r.t. backtraces...

--
Matthew

René J.V. Bertin

2017-12-11 19:32:36 UTC

Post by Matthew Woehlke

Post by Matthew Woehlke
This is... curious...

            3.10      master
KDevelop:   4.6 MiB   2.9 MiB

KDevelop (the 5.2 branch) only gives me a 0.004Mb reply (plus those <= 1k which I don't track).

Post by Matthew Woehlke
This does suggest there is something a bit pathological about KWIVER
w.r.t. backtraces..

What happens when you configure the KWIVER project with equivalent arguments in the usual manual way from the command line?

Is it possible to invoke the cmake server with -Wno-dev? Failing to add that argument on the command line can increase the output considerably.

R.

Matthew Woehlke

2017-12-11 20:35:16 UTC

Post by René J.V. Bertin

Post by Matthew Woehlke
            3.10      master
KDevelop:   4.6 MiB   2.9 MiB

KDevelop (the 5.2 branch) only gives me a 0.004Mb reply (plus those <= 1k which I don't track).

Are you sure you're looking at the *code model*? That would require an
average of only 3 bytes per Ninja target. Even given that CMake targets
and Ninja targets are not one-to-one, 4 KiB seems... unlikely.

If that's *really* what you're looking at, I'd be interested to know
what's in my code model (attached; xz-compressed with paths replaced by
placeholders) that isn't in yours.

The attached .txt file can be fed as stdin (after replacing the dummy
paths with real paths) to `cmake -E server --debug --experimental` to
generate the code model (plus a bunch of other responses; the actual
code model will be the second-last line of the output). If you try that
experiment with KDevelop, how large is the total output? (In my
experience, the total output size is heavily dominated by the size of
the code model, so just looking at the total size is a good ballpark
estimate for the size of the code model.)
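
Instead of eyeballing the total, the captured server output can be split into individual messages and measured one by one. The Python sketch below assumes the output was redirected to a file and that each message is wrapped in the `[== "CMake Server" ==[` / `]== "CMake Server" ==]` delimiter lines used by the CMake server protocol; the function names are hypothetical, and the `type` / `inReplyTo` keys are read defensively in case a message lacks them.

```python
import json

START = '[== "CMake Server" ==['
END = ']== "CMake Server" ==]'


def iter_messages(text: str):
    """Yield the raw JSON text of each framed message."""
    inside = False
    body = []
    for line in text.splitlines():
        if line == START:
            inside, body = True, []
        elif line == END and inside:
            inside = False
            yield "\n".join(body)
        elif inside:
            body.append(line)


def reply_sizes(text: str) -> list:
    """Return (type, inReplyTo, size_in_bytes) for every message."""
    sizes = []
    for raw in iter_messages(text):
        msg = json.loads(raw)
        sizes.append((msg.get("type"), msg.get("inReplyTo"), len(raw.encode("utf-8"))))
    return sizes


if __name__ == "__main__":
    sample = "\n".join([
        START, '{"type": "hello"}', END,
        START, '{"type": "reply", "inReplyTo": "codemodel", "configurations": []}', END,
    ])
    for kind, in_reply_to, size in reply_sizes(sample):
        print(kind, in_reply_to, size)
```

Taking `max(reply_sizes(text), key=lambda s: s[2])` gives the largest message, which is the figure being compared in this thread.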

Post by René J.V. Bertin
Is it possible to invoke the cmake server with -Wno-dev? Failing to
add that argument on the command line can increase the output
considerably.

Not to my knowledge, but why would you expect that to affect the code model?

I think maybe you are confusing the output of CMake configure / generate
(i.e. the server "configure" and "compute" commands) with the CMake code
model (server "codemodel" command)?

--
Matthew

Matthew Woehlke

2017-12-11 22:11:14 UTC

(Resend without the huge attachment; if a moderator is reading this,
please DO NOT approve the version with the 300 KiB attachment; thanks!
...and sorry 'bout that.)

Post by Matthew Woehlke
This does suggest there is something a bit pathological about KWIVER
w.r.t. backtraces...

This seemed odd, so I did a little more digging, with the help of a
quick tool I whipped together to compute JSON statistics (basically, it
records paths to values and counts how many values have a particular path).
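
The tool itself is not included in the thread. A minimal Python sketch of the technique as described (walk the JSON, record the path to every value, count how many values share each path) could look like the following; the path notation, with `/key` for object members and `[]` for array elements, and the name `count_paths` are choices made for this illustration.

```python
import json
from collections import Counter


def count_paths(value, path="", counts=None) -> Counter:
    """Count how many JSON values occur at each structural path."""
    if counts is None:
        counts = Counter()
    counts[path or "."] += 1  # "." stands for the document root
    if isinstance(value, dict):
        for key, child in value.items():
            count_paths(child, f"{path}/{key}", counts)
    elif isinstance(value, list):
        for child in value:
            # All elements of an array share one path, so they are tallied together.
            count_paths(child, f"{path}[]", counts)
    return counts


if __name__ == "__main__":
    doc = json.loads(
        '{"targets": [{"backtrace": [{"path": "a"}, {"path": "b"}]},'
        ' {"backtrace": [{"path": "c"}]}]}'
    )
    for path, n in sorted(count_paths(doc).items()):
        print(f"{n:6d}  {path}")
```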

From this, I was able to determine:

- KDevelop has 612 CMake targets producing 268 artifacts with an average
of 2.40 lines of cross-reference backtrace and an average of 9.45
related statements per target, which in turn have an average of 2.48
lines of backtrace.

- KWIVER has 7700¹ CMake targets producing 390 artifacts with an average
of 5.02 lines of cross-reference backtrace and an average of 28.84
related statements per target, which in turn have an average of 4.59
lines of backtrace.

The (mostly) raw numbers may help to illustrate this:

- KDevelop:

crossReferences: 608
backtrace: 608
path: 1460
relatedStatements: 608
backtrace: 5746
path: 14246

- KWIVER:

crossReferences: 7700
backtrace: 7700
path: 38668
relatedStatements: 7700
backtrace: 222050
path: 1018377
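
The averages quoted earlier in this message follow from these raw counts by simple division. A short Python sketch of that arithmetic, using only the numbers listed above (the dictionary keys are labels invented for this example):

```python
# Raw counts copied from the two lists above.
COUNTS = {
    "KDevelop": {"entries": 608, "xref_path": 1460,
                 "statements": 5746, "statement_path": 14246},
    "KWIVER": {"entries": 7700, "xref_path": 38668,
               "statements": 222050, "statement_path": 1018377},
}


def averages(c: dict) -> tuple:
    """Return (cross-reference backtrace lines per entry,
    related statements per entry,
    backtrace lines per related statement), rounded to two decimals."""
    return (
        round(c["xref_path"] / c["entries"], 2),
        round(c["statements"] / c["entries"], 2),
        round(c["statement_path"] / c["statements"], 2),
    )


if __name__ == "__main__":
    for project, c in COUNTS.items():
        print(project, averages(c))
```

Multiplying the three factors that differ (entries, statements per entry, lines per statement) is what makes the KWIVER total so much larger than the KDevelop one.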

From casual inspection of the full JSON, the reason for the difference
can be inferred. KWIVER has many external dependencies that are mostly
consumed by global use of `add_definitions` and `include_directories`.
The large number of these account for the much larger average number of
related statements per target, while project organization accounts for
the larger average line count of backtraces. Combined with a larger
number of total targets, this produces a geometrical increase in the
size of the code model.

The target disparity is itself worth some investigation. Besides
appearing that many targets are duplicated four times, registered tests
also turn into ten targets (for various invocations of sanitizer tools);
thus, a single test target can turn into 40 targets *per test case*
contained in that target. Simply eliminating these reduces the target
list to a much more reasonable 500.

The raw statistics and (xz-compressed) raw KWIVER target list are attached.

(¹ KWIVER also has 42 projects to KDevelop's 1. I wouldn't be surprised
if there is some duplication going on here, especially considering the
disparity in targets-to-artifacts ratios. However, another interesting
note is that a large percentage of the targets - it appears to be well
over half - are CDash related. In particular, each test)

--
Matthew

René J.V. Bertin

2017-12-11 17:48:57 UTC

Post by Matthew Woehlke
model. (This is probably because the size of compile_commands.json
chiefly reflects the number of *source* files in a project, while the
size of the code model chiefly reflects the number of *targets*. KWIVER

Yes, compile_commands is chiefly a list of files with their corresponding directories, plus the compiler arguments. Those arguments can represent a considerable amount of text with cmake (and KF5 build systems), and evidently the size scales with the string length of the path to the build directory (quite long in my case).
I've never really looked what kind of information KDevelop gets from the cmake server, but a priori I think it also needs the compiler arguments (to hand off to the clang parser).

R.

Matthew Woehlke

2017-12-11 15:14:57 UTC

Post by Sven Brauch

Post by Matthew Woehlke
At any rate, it's probably safe enough to apply at least my patch, which
gets us to an assuredly better position than we're at otherwise.

Hmm, again, why is this happening? Isn't this a bug in cmake?

It's a confluence of three bugs or (mis)features:

- CMake sends an enormous response
- QJson refuses to parse said response¹
- KDevelop does not handle said failure gracefully

(¹ https://bugreports.qt.io/browse/QTBUG-58652)

In some senses, it's actually the last that is the "most bug-like", and is
what my patch addresses. IMHO this should be fixed regardless, since it
is making KDevelop more robust against unexpected conditions at minimal
cost; it will provide more graceful degradation in case of QJson failure
for *any* reason. (In particular, as René notes, it should improve
things in the case that the server sends a corrupt response for some
reason. We don't *expect* that would ever happen, but right now, if it
did, it would cause the same symptoms.)

--
Matthew

Sven Brauch

2017-12-05 22:30:44 UTC

Post by Matthew Woehlke
...followed by 100 MiB of JSON that appears to be the output of CMake
parsing the project.
Any thoughts? I'd *really* like to be able to open this project!

That sounds like a bug in cmake, no? Why would it produce so much
output? What is in there, what in your project could cause it to be this
big?

Best,
Sven

Aleix Pol

2017-12-12 11:55:36 UTC

On Tue, Dec 5, 2017 at 9:07 PM, Matthew Woehlke

Post by Matthew Woehlke
I'm trying to load my KDevelop project created for
https://github.com/kitware/kwiver. However, KDevelop says that it is
"loading project", but never finishes. When I dump KDevelop's stack
(pstack), all of the threads seem to be waiting (in poll or
pthread_cond_wait).
kdevelop.plugins.cmake: error processing "too large document"
...followed by 100 MiB of JSON that appears to be the output of CMake
parsing the project.
Any thoughts? I'd *really* like to be able to open this project!

Hi,
I've been away for the most part of last week and I haven't been able
to look closely into the issue, but I'd say we should focus here on
making sure the server mode works rather than looking closely at the json.

If you can narrow down the issue a bit I'll be happy to look into it.

Cheers!
Aleix

René J.V. Bertin

2017-12-12 12:55:22 UTC

Post by Aleix Pol
but I'd say we should focus here on
making sure the server mode works rather than looking closely at the json.

Shouldn't both be done? If server mode means communication via json I'd say that handling the protocol in a reliable (if not rock-solid) way and ensuring you use the server correctly are about equally important. And if json is the aspect that will likely evolve the least wouldn't that be the thing to focus on first, iow the modification Matthew is proposing? It's not like that's a complex change with all kinds of possible side-effects under normal operation.

FWIW, even if I cannot reproduce Matthew's observations (an order of magnitude (or more) larger json replies) I do still observe another thing. Server mode import is at least 8x slower on my Linux system (for the same project, evidently; roughly 66s for QtWebKit vs. roughly 8s using the compile_commands.json file).

R

Matthew Woehlke

2017-12-12 15:24:46 UTC

Post by René J.V. Bertin
FWIW, even if I cannot reproduce Matthew's observations (an order of
magnitude (or more) larger json replies)

Apparently you would need to actually configure KWIVER (which needs a
bunch of dependencies that you may need to get from Fletch). Per my last
message, there are several aspects to that project that combine to
create a geometric increase in the code model size.

Alternatively, please try with the attached project, which is very
simple but synthesizes similar pathological structure as KWIVER. I'm
getting a 160 MiB code model with this. (It has fewer targets, only
2100, but has a higher count of related statements per target.)

Post by René J.V. Bertin
I do still observe another thing. Server mode import is at least 8x
slower on my Linux system (for the same project, evidently; roughly
66s for QtWebKit vs. roughly 8s using the compile_commands.json
file).

That is almost certainly because CMake server needs to *reconfigure* the
project. With the compile_commands.json, that's already been done.

--
Matthew

René J.V. Bertin

2017-12-12 16:10:14 UTC

Post by Matthew Woehlke
Apparently you would need to actually configure KWIVER

That'd be the ultimate test, but I'm also not seeing similar sizes with other projects like KDevelop and QtWebKit.

Your test project however gave me a 200.5Mb or so reply (I had just the time to see that before the entire reply was dumped to the terminal) and a reload was triggered (I'm not running your patch).

After forcing the use of the legacy import mode all issues disappear.

Post by Matthew Woehlke
That is almost certainly because CMake server needs to *reconfigure* the
project.

Probably. (That's another thing I don't particularly like when using an IDE to edit projects that need to be configured and built in a tightly controlled environment.)

I checked with that test-project: initial import 54s, subsequent imports under 4s (or under 25s when I delete the compile_commands.json file, forcing a reconfigure).
BTW, I note that much of the time waiting for that reconfigure is in fact waiting for cmake to write the new build files to disk. We shouldn't have to wait for that; at least in server mode, cmake ought to be smart enough to see when it won't write anything different to disk. I don't know if that's feasible, but it ought to be =]

R.

Matthew Woehlke

2017-12-12 16:50:38 UTC

Post by René J.V. Bertin

Post by Matthew Woehlke
Apparently you would need to actually configure KWIVER

That'd be the ultimate test, but I'm also not seeing similar sizes
with other projects like KDevelop and QtWebKit.

...and the analysis I posted yesterday shows why; KWIVER has an
unusually high ratio of related statements, and a LOT of targets; more
than ten times as many as KDevelop (largely due to multiple utility
targets per test case). QtWebKit is actually a very poor test case for
the CMake server code model, as it has very few targets. I don't have
any to look at, but I would guess there are KF5 core libraries
(especially if any have extensive unit tests) that would be better test
cases.

Post by René J.V. Bertin
Your test project however gave me a 200.5Mb or so reply

Yep, that's about the right order of magnitude :-)... probably your
paths are about 40% longer than mine, accounting for it being larger for
you.

The key point is that both the test project and KWIVER have on the order
of a million related statement paths (KWIVER has 1018377, the test
project has 2310000, or a little over twice as many). In both cases,
these account for the majority of the code model. The test project may
be artificial, but the resulting code model structure is grossly
representative of KWIVER. (Far more homogeneous of course, but
comparable to KWIVER's average.)

Again, compare that to KDevelop, which had only 5746 related path
statements; that's more than two orders of magnitude difference!
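The ratios quoted above can be checked with a little arithmetic. A minimal sketch, using only the three counts given in this message; the variable names are made up for illustration:

```shell
# Related-statement path counts quoted in this message.
kwiver=1018377
testproj=2310000
kdevelop=5746

# awk only has the natural log, so log10(x) is computed as log(x)/log(10).
ratio_test=$(awk -v a="$testproj" -v b="$kwiver" 'BEGIN { printf "%.2f", a / b }')
ratio_kdev=$(awk -v a="$kwiver" -v b="$kdevelop" 'BEGIN { printf "%.0f", a / b }')
orders=$(awk -v a="$kwiver" -v b="$kdevelop" 'BEGIN { printf "%.2f", log(a / b) / log(10) }')

echo "test project / KWIVER: ${ratio_test}x"
echo "KWIVER / KDevelop:     ${ratio_kdev}x (${orders} orders of magnitude)"
```

The first ratio is the "little over twice as many" figure; the second is the gap between KWIVER and KDevelop.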

Post by René J.V. Bertin
I checked with that test-project: initial import 54s, subsequent
imports under 4s (or under 25s when I delete the
compile_commands.json file, forcing a reconfigure). BTW, I note that
much of the time waiting for that reconfigure is in fact waiting for
cmake to write the new build files to disk.

Huh...

I'm seeing 14s to dump the code model. This is without using KDevelop,
just feeding the commands to CMake server directly, and redirecting the
output to /dev/null. Most of this is spent building the code model;
using CMake master (which does not produce backtraces), it drops to
1.3s. I'm not seeing a significant difference between from-scratch and
subsequent runs. (I'm also using Ninja; with Makefiles, it jumps to
about 2.4s.)

What enormous files are being written to disk? My build.ninja is only
637 KiB...

--
Matthew

René J.V. Bertin

2017-12-12 18:52:10 UTC

Post by Matthew Woehlke

Post by René J.V. Bertin
That'd be the ultimate test, but I'm also not seeing similar sizes
with other projects like KDevelop and QtWebKit.

...and the analysis I posted yesterday shows why; KWIVER has an
unusually high ratio of related statements, and a LOT of targets; more

That's not the reason why you get larger values for those projects than I do.

Post by Matthew Woehlke
Yep, that's about the right order of magnitude :-)... probably your
paths are about 40% longer than mine, accounting for it being larger for
you.

60 bytes in this case.

Post by Matthew Woehlke
subsequent runs. (I'm also using Ninja; with Makefiles, it jumps to
about 2.4s.)

I'm using Makefiles, on a lowly 1.6GHz N3150 CPU and a ZFS filesystem. Add some overhead from KDevelop and that probably explains the difference.

Post by Matthew Woehlke
What enormous files are being written to disk? My build.ninja is only
637 KiB...

My test-project/build directory is 49Mb and that's with lz4 compression. This is concentrated in the CMakeFiles directories which has 1 .dir folder per target (and I'm a bit at a loss determining whether the compression doesn't actually increase the effective size here).

Anyway, that's all a different issue from the original one.

R.

Matthew Woehlke

2017-12-12 19:39:44 UTC

Post by René J.V. Bertin

Post by Matthew Woehlke

Post by René J.V. Bertin
That'd be the ultimate test, but I'm also not seeing similar sizes
with other projects like KDevelop and QtWebKit.

...and the analysis I posted yesterday shows why; KWIVER has an
unusually high ratio of related statements, and a LOT of targets; more

That's not the reason why you get larger values for those projects than I do.

Sigh. I sent a reply to that that's been stuck in moderation.

You mentioned a KDevelop code model that is "0.004Mb", or about 4 KiB.
This seems... implausible, especially if it has backtraces. Either you
have enormously fewer targets than I do for some reason, or you aren't
looking at the (full) code model.

As an experiment, please feed the attached file as stdin to `cmake -E
server --debug --experimental` (modifying the paths in it to reflect
your KDevelop source/build) and look at the size of the output. (Note:
redirect it *all* to a file; trying to directly process it seems to
always truncate the output for me. You can feed the resulting output
file to `sed 'x;$!d' | wc -c` to get the code model size.)
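For readers unfamiliar with the `sed` idiom: `x` swaps the pattern and hold spaces, and `$!d` deletes every line except the last, so only the next-to-last line of the input is printed. A minimal sketch on a stand-in file follows. The three-line layout (opening marker, one-line JSON reply, closing marker) is an assumption about how the captured output ends, and the JSON is a made-up placeholder:

```shell
# Stand-in for the captured server output (hypothetical content).
out=$(mktemp)
printf '%s\n' \
  '[== "CMake Server" ==[' \
  '{"inReplyTo":"codemodel","type":"reply"}' \
  ']== "CMake Server" ==]' > "$out"

# Print only the next-to-last line, i.e. the reply just before the closing marker.
penultimate=$(sed 'x;$!d' "$out")
echo "$penultimate"

# Its size in bytes; wc -c also counts the trailing newline.
size=$(printf '%s\n' "$penultimate" | wc -c | tr -d ' ')
echo "$size"

rm -f "$out"
```

On a real capture, the same pipeline reports the size of whatever reply came last, which is the code model only if that was the final request.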

If you're *really* getting a code model on the order of a few KiB, I
would be extremely curious to see what's in it...

(In the message that's stuck in moderation, I also included my own code
model, which is why it's waiting for moderator approval; it's 45K
XZ-compressed. I don't recall if that's the version with or without
backtraces. Here I am attaching just the target list; I skimmed it and
nothing seems unusual, and it does not contain any duplicates.)

Post by René J.V. Bertin

Post by Matthew Woehlke
What enormous files are being written to disk? My build.ninja is only
637 KiB...

My test-project/build directory is 49Mb and that's with lz4
compression. This is concentrated in the CMakeFiles directories
which has 1 .dir folder per target (and I'm a bit at a loss
determining whether the compression doesn't actually increase the
effective size here).

Ah. *Wow*, same here; I hadn't thought to look at CMakeFiles. Okay, I
can think of many plausible reasons your *I/O* would be slower, so that
explains things. Thanks.

(I encourage you to try Ninja; besides being a very nice build tool in
general, the CMakeFiles for Ninja for that project is only 720 KiB ;-).
Also FWIW, `du -sh` shows me 46M, so yeah, compression is probably not
helping the size significantly.)

--
Matthew

René J.V. Bertin

2017-12-12 20:25:17 UTC

Post by Matthew Woehlke
You mentioned a KDevelop code model that is "0.004Mb", or about 4 KiB.

Actually, I mentioned a largest reply of that size. I'm not at all looking at what those are replies to, just printing out the size (in Mb) for replies exceeding 1024 bytes. I don't know if the server can fragment its replies, but if that's the case it would be able to fly under my radar entirely, in theory.

Post by Matthew Woehlke
As an experiment, please feed the attached file as stdin to `cmake -E
server --debug --experimental` (modifying the paths in it to reflect

OK, my bad...

{"cookie":"","errorMessage":"No build system was generated yet.","inReplyTo":"codemodel","type":"error"}

This particular project wasn't meant to drive builds, only to serve as a clean/scratch workspace to do patch reviews and commits so it not being configured correctly was never a problem.

After setting that right (by forcing the project to import once with the legacy importer) I now see a largest server reply of 4.53Mb which does seem to correspond to the code model.

Post by Matthew Woehlke
(I encourage you to try Ninja; besides being a very nice build tool in
general, the CMakeFiles for Ninja for that project is only 720 KiB ;-).

I rarely find a significant every-day benefit to using Ninja. A much smaller build system size overhead would be one but I think that KDevelop saves the make/ninja choice at a global level, not per project. And since I really like to be able to chdir into a project sub-build-dir and do only partial builds from there (and instant incremental installs with install/fast) I just stick with good ole make.

R.

Matthew Woehlke

2017-12-12 21:09:36 UTC

Post by René J.V. Bertin

Post by Matthew Woehlke
You mentioned a KDevelop code model that is "0.004Mb", or about 4 KiB.

Actually, I mentioned a largest reply of that size. I'm not at all
looking at what those are replies to, just printing out the size (in Mb)
for replies exceeding 1024 bytes. I don't know if the server can
fragment its replies, but if that's the case it would be able to fly
under my radar entirely, in theory.

Nope, the code model is not fragmented. (Which is part of the problem;
if it was, we wouldn't have the problem of it being too big for QJson to
parse ;-). See also https://gitlab.kitware.com/cmake/cmake/issues/17539,
where we're discussing how to refactor the monolithic API going forward.)
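Because each reply arrives as one unfragmented message, reply sizes can be read straight off a captured server log. A rough sketch, assuming the log wraps every message in `[== "CMake Server" ==[` and `]== "CMake Server" ==]` marker lines. The `reply_sizes` helper and the sample log are hypothetical, and the 1024-byte threshold simply mirrors the one mentioned earlier in the thread:

```shell
# Print the size in bytes of every framed message larger than a threshold.
# Usage: reply_sizes [min_bytes] < captured-output
reply_sizes() {
  # LC_ALL=C makes length() count bytes rather than characters.
  LC_ALL=C awk -v min="${1:-1024}" '
    $0 == "[== \"CMake Server\" ==[" { inside = 1; size = 0; next }
    $0 == "]== \"CMake Server\" ==]" { if (inside && size > min) print size; inside = 0; next }
    inside                            { size += length($0) + 1 }  # +1 for the newline
  '
}

# Tiny synthetic log: one short reply and one 2000-byte payload line.
big=$(printf '%2000s' '' | tr ' ' 'x')
sizes=$(printf '%s\n' \
  '[== "CMake Server" ==[' '{"type":"hello"}' ']== "CMake Server" ==]' \
  '[== "CMake Server" ==[' "$big" ']== "CMake Server" ==]' | reply_sizes 1024)
echo "$sizes"
```

Only the second message clears the threshold, so a single size is printed. Run against a real capture, the largest value should correspond to the code model reply.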

Post by René J.V. Bertin

Post by Matthew Woehlke
As an experiment, please feed the attached file as stdin to `cmake -E
server --debug --experimental` (modifying the paths in it to reflect

OK, my bad...
{"cookie":"","errorMessage":"No build system was generated yet.","inReplyTo":"codemodel","type":"error"}

:-)

Post by René J.V. Bertin
After setting that right (by forcing the project to import once with
the legacy importer) I now see a largest server reply of 4.53Mb which
does seem to correspond to the code model.

Yup, that's consistent with what I'm seeing. Confusion sorted.

--
Matthew

René J.V. Bertin

2017-12-12 22:28:10 UTC

Post by Matthew Woehlke
parse ;-). See also https://gitlab.kitware.com/cmake/cmake/issues/17539,
where we're discussing how to refactor the monolithic API going forward.)

Call me short-sighted, but reading that discussion only strengthens my questioning of the general usefulness of the additional functionality cmake server mode provides. I didn't notice any must-have new features in cmake-based projects after upgrading to 5.2 nor do I miss any when I use the legacy fallback import mode - but then I use KDevelop mostly (but intensively) as an advanced code editor with integrated VCS, API documentation etc. The crucial things for KDevelop are thus probably which files are built and with what options (so the parser knows where to find all includes and what preprocessor symbols are defined). The main consideration for me (after parsing validity) is how long I have to wait for a project to open, and how (un)likely it is that cmake gets run outside of the possibly imperative controlled environment.

That discussion just made me realise I'd be very happy with a per-project setting to indicate which cmake mode is most appropriate.

R.

Matthew Woehlke

2017-12-12 15:06:57 UTC

Post by Aleix Pol
If you can narrow down the issue a bit I'll be happy to look into it.

What's to narrow down? At cmakeserver.cpp:150, QJson refuses to parse an
enormous payload, at which point the server goes out to lunch, because
it is waiting on a reply that it will never receive. I already posted a
patch (although it seems it is only enough for KDevelop to limp along
rather than getting totally stuck; the CMake support isn't really able
to recover from a server failure at this time).

The problem can be (artificially) reproduced by applying the attached patch.

--
Matthew
