Discussion:
KDevelop 5 too slow?
Alexander Shaduri
2017-04-22 14:27:04 UTC
Permalink
Hi,

I migrated from KDevelop 4 to 5 a few days ago (git master).
The first thing I noticed is that even for a very simple Qt5-based
project, KDevelop needs 2-3sec (or more) to auto-complete, which
is very slow compared to KDevelop 4. Actually, after simply modifying
any line it takes quite some time (sometimes over 5 seconds) for
the changes to get picked up and the error squiggles to go away.
All this on a quite fast machine (i7-2600 desktop).

Is this normal behavior, or is my build broken somehow?

KDev* release build from 2017-04-22 git master.
openSUSE 42.2, KDE 5.9.4, Clang 4.0.0, Qt 5.8.0, GCC 4.8.5.
i7-2600, 16 GB RAM, SSD.

Thanks,
Alexander
Sven Brauch
2017-05-02 17:41:48 UTC
Permalink
Hi,
Post by Alexander Shaduri
Is this normal behavior, or is my build broken somehow?
I have seen this behaviour with some projects, but not with others
(orthogonal to size). I still don't know what causes it and it's very
annoying. :/

Greetings,
Sven
Aleix Pol
2017-05-02 21:53:04 UTC
Permalink
Post by Sven Brauch
Hi,
Post by Alexander Shaduri
Is this normal behavior, or is my build broken somehow?
I have seen this behaviour with some projects, but not with others
(orthogonal to size). I still don't know what causes it and it's very
annoying. :/
+1
René J. V. Bertin
2017-05-02 22:19:46 UTC
Permalink
Post by Sven Brauch
Post by Alexander Shaduri
Is this normal behavior, or is my build broken somehow?
I have seen this behaviour with some projects, but not with others
(orthogonal to size). I still don't know what causes it and it's very
annoying. :/
+1
Isn't that because of the unknown declaration fixer thingy which has a knack of
scanning and rescanning a potentially huge number of include files, on the
main/gui thread?

If so, yes, it's standard behaviour (I wouldn't call it "normal" :))

I proposed a workaround for that a while back which got shot down; Kevin
apparently has a cleaner fix.

R.
Sven Brauch
2017-05-02 22:38:10 UTC
Permalink
Post by René J. V. Bertin
Isn't that because of the unknown declaration fixer thingy which has a knack of
scanning and rescanning a potentially huge number of include files, on the
main/gui thread?
No, different issue, this one is in the parse jobs.

Greetings,
Sven
René J.V. Bertin
2017-05-03 07:36:38 UTC
Permalink
Post by Sven Brauch
Post by René J. V. Bertin
Isn't that because of the unknown declaration fixer thingy which has a knack of
scanning and rescanning a potentially huge number of include files, on the
main/gui thread?
No, different issue, this one is in the parse jobs.
I presume that the parser also scans through header files, so the underlying reason could well be the same.

The OP mentions slowness in auto-complete. For me that means the culprit almost has to be the unknown declaration fixer. I had the same issue, also in very simple Qt projects (on Mac it's even worse), and traced it to that part of the code. It seems logical in hindsight: auto-completion is applied to incomplete patterns, and 99% of the time those are unknown declarations. The OP has an SSD though, which should make a considerable difference unless he also has (part of) his header files on a networked volume.
FWIW, the best way to measure times here with as little influence from the unknown declaration fixer as possible is probably to select an erroneous variable or function name, wait until that fixer has done its thing, and then use paste (Ctrl-V) to correct the pattern.

I also notice that the OP uses clang 4.0. I wasn't even aware it was finally released; it wouldn't be a pre-release version built with assertions, would it? In my experience with earlier versions there is a very significant performance penalty to that. And maybe that version has continued the trend of being slower than its predecessor (or uses more memory when used the way KDevelop does)?

R.
René J. V. Bertin
2017-05-03 08:35:13 UTC
Permalink
On a related note: is there a (KDE) application for Linux that lets you see
live per-thread CPU usage?

R.
Volker Wysk
2017-05-03 08:47:48 UTC
Permalink
Post by René J. V. Bertin
On a related note: is there a (KDE) application for Linux that lets you see
live per-thread CPU usage?
KSysGuard, also called (in German) "Systemmonitor" ("system monitor") on my
system.

Bye
René J. V. Bertin
2017-05-03 09:09:33 UTC
Permalink
Post by Volker Wysk
KSysGuard, also called (in German) "Systemmonitor" ("system monitor") on my
system.
I tried that one of course. It only shows the number of threads for me, in a
tooltip. I'm still at v5.9.3 ; maybe that's why?

R.
Volker Wysk
2017-05-03 09:17:48 UTC
Permalink
Post by René J. V. Bertin
Post by Volker Wysk
KSysGuard, also called (in German) "Systemmonitor" ("system monitor") on my
system.
I tried that one of course. It only shows the number of threads for me, in
a tooltip. I'm still at v5.9.3 ; maybe that's why?
Seems like I mistook your question.

Volker
René J. V. Bertin
2017-05-03 11:33:09 UTC
Permalink
Post by Volker Wysk
Seems like I mistook your question.
Seems there are some options:
http://ask.xmodulo.com/view-threads-process-linux.html
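Those all boil down to reading the per-thread counters the kernel exposes
under /proc/<pid>/task/. Purely as an illustration of the mechanism (plain
C++ against the documented proc(5) stat fields, no KDE API involved, names
invented), something like this prints per-thread CPU usage over roughly a
one-second interval:

// Sketch: print each thread's CPU usage of a given process over ~1 second,
// using the utime/stime fields (14 and 15) from /proc/<pid>/task/<tid>/stat.
#include <dirent.h>
#include <unistd.h>

#include <chrono>
#include <fstream>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <thread>

static long cpuTicks(const std::string &statPath)
{
    std::ifstream in(statPath);
    std::string line;
    if (!std::getline(in, line))
        return 0;                       // the thread may have exited already
    std::istringstream rest(line.substr(line.rfind(')') + 1));
    std::string field;
    long utime = 0, stime = 0;
    for (int i = 3; i <= 15 && (rest >> field); ++i) {   // fields 3..15
        if (i == 14) utime = std::stol(field);
        if (i == 15) stime = std::stol(field);
    }
    return utime + stime;
}

static std::map<int, long> sampleThreads(const std::string &taskDir)
{
    std::map<int, long> ticks;
    if (DIR *dir = opendir(taskDir.c_str())) {
        while (dirent *entry = readdir(dir)) {
            const std::string name = entry->d_name;
            if (name != "." && name != "..")
                ticks[std::stoi(name)] = cpuTicks(taskDir + "/" + name + "/stat");
        }
        closedir(dir);
    }
    return ticks;
}

int main(int argc, char **argv)
{
    if (argc != 2) {
        std::cerr << "usage: " << argv[0] << " <pid>\n";
        return 1;
    }
    const std::string taskDir = std::string("/proc/") + argv[1] + "/task";
    const double hz = sysconf(_SC_CLK_TCK);   // clock ticks per second

    const auto before = sampleThreads(taskDir);
    std::this_thread::sleep_for(std::chrono::seconds(1));
    const auto after = sampleThreads(taskDir);

    for (const auto &entry : after) {
        const auto it = before.find(entry.first);
        const long delta = entry.second - (it != before.end() ? it->second : 0);
        std::cout << "tid " << entry.first << ": " << 100.0 * delta / hz << "% CPU\n";
    }
    return 0;
}

It's only a sketch; tools like top -H and htop do the same bookkeeping with
more care.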
René J. V. Bertin
2017-05-03 09:05:08 UTC
Permalink
Post by René J.V. Bertin
I also notice that the OP uses clang-4.0 . I wasn't even aware it was finally
released, it wouldn't be a pre-release version built with assertions, would
FWIW, I just installed the official llvm.org 4.0 packages for Ubuntu 14.04
(4.0~svn297204-1~exp1). Judging by the sheer install size they could well be
built with assertions.

I rebuilt the KDevelop clang plugin against that version and I did a few quick
comparisons running (hopefully) just the parser via the problem toolview's
refresh button.

On a 1.6 GHz Intel N3150 CPU, off a ZFS pool running from a Seagate SSHD, and using
2 CPU threads for the parser (with the default 500ms delay), I observe:

- in both cases it takes roughly 2.5s for the progress bar to appear after I
click the refresh button
- the progress bar jumps from 0 to 100% after roughly 12s with clang 3.9 and 15s
with clang 4.0
- in both cases the progress bar disappears after roughly 17.5s, possibly
slightly later with clang 4.0

This is parsing the kclock.cpp file from github.com/RJVB/kclock.k5s.
KDevelop 4 doesn't seem to like this project; it complains about
KClockWidget class members, and possibly as a result it takes a bit over 30s to
(re)parse the entire file.

Neither is exactly fast, of course, even for a Celeron-class CPU.

Would libclang allow parsing only a part of the active document, say the
function in which an edit was made?
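For reference, the relevant C API usage looks roughly like this (a minimal
sketch against the stock clang-c/Index.h, not KDevelop's actual code); as far
as I can tell it only operates on whole translation units, with the
precompiled preamble being what makes reparses cheaper than the initial parse:

// Sketch of plain libclang (re)parsing; compile arguments are omitted, and
// as far as I can see there is no finer granularity than a whole TU.
#include <clang-c/Index.h>
#include <cstdio>

int main(int argc, char **argv)
{
    if (argc < 2)
        return 1;

    CXIndex index = clang_createIndex(/*excludeDeclsFromPCH=*/0,
                                      /*displayDiagnostics=*/1);

    // The preamble/completion caches are what make *re*parses cheaper.
    const unsigned flags = CXTranslationUnit_PrecompiledPreamble
                         | CXTranslationUnit_CacheCompletionResults;

    CXTranslationUnit tu = clang_parseTranslationUnit(
        index, argv[1], /*command_line_args=*/nullptr, 0,
        /*unsaved_files=*/nullptr, 0, flags);
    if (!tu)
        return 1;

    // After an edit the whole TU is handed back to clang again, optionally
    // together with unsaved in-memory buffers for modified documents.
    if (clang_reparseTranslationUnit(tu, 0, nullptr,
                                     clang_defaultReparseOptions(tu)) != 0)
        std::fprintf(stderr, "reparse failed\n");

    clang_disposeTranslationUnit(tu);
    clang_disposeIndex(index);
    return 0;
}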

R.
Sven Brauch
2017-05-03 20:28:41 UTC
Permalink
Post by René J.V. Bertin
Post by Sven Brauch
Post by René J. V. Bertin
Isn't that because of the unknown declaration fixer thingy which
has a knack of scanning and rescanning a potentially huge number
of include files, on the main/gui thread?
No, different issue, this one is in the parse jobs.
I presume that the parser also scans through headerfiles so the
underlying reason could well be the same.
I don't think so. Kevin and I investigated this problem a bit at the
last sprint. The general issue is roughly, while the parser is running,
no new completion items can be generated; this is why we introduced the
long delay when typing on a single line intentionally (it shouldn't
start running the background parser while you're still typing in a
single line and need completion).

My memory tells me that we also timed what takes so long, and it's
clang generating the completion items, i.e. not in our code. It's just
sometimes very slow and nobody could figure out why yet. I feel like
some code cache (precompiled header cache?) is dropped under some
circumstances which happen on some projects erroneously, but not on
others. Maybe strange paths (non-normalized, ...) have something to do
with it, but that's just guesswork now.

Further investigations are very welcome, but I think one needs to sit
down with a debugger/profiler and a build of llvm with symbols and
figure out what happens. Guessing around on the mailing list won't find
it IMO.

Greetings,
Sven
René J.V. Bertin
2017-05-04 16:26:46 UTC
Permalink
Post by Sven Brauch
My memory tells me that we also timed what takes so long, and it's
clang generating the completion items, i.e. not in our code. It's just
Looks like it when I attach the "Time Profiler" instrument on OS X and hit "reload all" in KDevelop to force it to reparse a few source files. Most of the 25sec or so is spent in ClangHelpers::buildDUChain() which itself
spends over 93% of its time in Builder::visit() which in turn spends about 85% of its time in clang_visitChildren().

I'm not very familiar with reading this kind of call graph, but if you follow the graph down, clang_getFileLocation shows up, which accounts for almost 24% of the processing time on my system. That could correspond to disk access, no?

Working back up I note that clang::cxcursor::CursorVisitor::visit() apparently calls an (anonymous)::visitCursor() method in libKDevClangPrivate, and this call accounts for over 97% of the processing time.
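For context, the pattern involved looks roughly like the following; this is a
minimal, self-contained libclang client, not the actual KDevelop builder, but
it shows why clang_getFileLocation() gets called so often: typically once (or
more) per visited cursor.

// Sketch: visit every cursor in a translation unit and map its location to
// file/line/column, the call that dominated the profile above.
#include <clang-c/Index.h>
#include <cstdio>

static CXChildVisitResult visitCursor(CXCursor cursor, CXCursor /*parent*/,
                                      CXClientData counter)
{
    CXFile file = nullptr;
    unsigned line = 0, column = 0, offset = 0;
    clang_getFileLocation(clang_getCursorLocation(cursor),
                          &file, &line, &column, &offset);
    ++*static_cast<unsigned long *>(counter);
    return CXChildVisit_Recurse;        // descend into the whole AST
}

int main(int argc, char **argv)
{
    if (argc < 2)
        return 1;

    CXIndex index = clang_createIndex(0, 1);
    CXTranslationUnit tu = clang_parseTranslationUnit(
        index, argv[1], nullptr, 0, nullptr, 0, CXTranslationUnit_None);
    if (!tu)
        return 1;

    unsigned long visited = 0;
    clang_visitChildren(clang_getTranslationUnitCursor(tu), visitCursor, &visited);
    std::printf("visited %lu cursors\n", visited);

    clang_disposeTranslationUnit(tu);
    clang_disposeIndex(index);
    return 0;
}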

The nice thing with Instruments is that you mostly don't have to do special profiling builds and can attach to a running application to sample just a specific operation. I didn't even build llvm/clang (4.0) with debug info, which saves time.
The drawback is that you need a Mac ...

One thing I don't understand: I'm seeing 3 background threads that do the actual ClangDUChain work, while my session is configured to use only 2 . I guess it would help if the threads were given names...
Post by Sven Brauch
Further investigations are very welcome, but I think one needs to sit
down with a debugger/profiler and a build of llvm with symbols and
figure out what happens. Guessing around on the mailing list won't find
it IMO.
Hope this helps then, and apologies for attaching an image!

R
Milian Wolff
2017-05-29 08:52:00 UTC
Permalink
Post by René J.V. Bertin
Post by Sven Brauch
My memory tells me that we also timed what takes so long, and it's
clang generating the completion items, i.e. not in our code. It's just
Looks like it when I attach the "Time Profiler" instrument on OS X and hit
"reload all" in KDevelop to force it to reparse a few source files. Most of
the 25sec or so is spent in ClangHelpers::buildDUChain() which itself
spends over 93% of its time in Builder::visit() which in turn spends about
85% of its time in clang_visitChildren().
No wonder, this is a pseudo-recursive function (visiting the AST tree) and
does most of the work. Either look at a bottom-up profile, a caller/callee
aggregation, or at least use a flamegraph visualization.
Post by René J.V. Bertin
I'm not very familiar reading this kind of call graph, but if you follow the
graph down clang_getFileLocation shows up which accounts for almost 24% of
the processing time on my system. That could correspond to disk access, no?
No, CPU profilers do not account for disk access. I'm not familiar with
Instruments, but I'd be very surprised if they are different in that regard.
For I/O profiling, there is usually a separate profiler configuration you'll
have to use which traces context switches or syscalls.

I actually have seen this function pop up in KDevelop profile runs on Linux
myself. Maybe there's a way to speed up this check via some lookup table, or
by improving clang upstream.
Post by René J.V. Bertin
Working back up I note that clang::cxcursor::CursorVisitor::visit()
apparently calls an (anonymous)::visitCursor() method in
libKDevClangPrivate, and this call accounts for over 97% of the processing
time.
Yes, see above - it does the heavy work so this is to be expected.
Post by René J.V. Bertin
The nice thing with Instruments is that you mostly don't have to do special
profiling builds and can attach to a running application to sample just a
specific operation. I didn't even build llvm/clang (4.0) with debug info,
which saves time. The drawback is that you need a Mac ...
Perf or VTune are just as capable. Also, to make sure: Did you compile KDev*
and everything else you are profiling in RelWithDebInfo mode? If not, then
this profile output is completely useless.
Post by René J.V. Bertin
One thing I don't understand: I'm seeing 3 background threads that do the
actual ClangDUChain work, while my session is configured to use only 2 . I
guess it would help if the threads were given names...
There's N + 1. N for background parsing and one for parsing the active
document, if needed.
Post by René J.V. Bertin
Post by Sven Brauch
Further investigations are very welcome, but I think one needs to sit
down with a debugger/profiler and a build of llvm with symbols and
figure out what happens. Guessing around on the mailing list won't find
it IMO.
Hope this helps then, and apologies for attaching an image!
Better than nothing.
--
Milian Wolff
***@milianw.de
http://milianw.de
Milian Wolff
2017-05-29 08:58:47 UTC
Permalink
Post by Milian Wolff
Post by René J.V. Bertin
Post by Sven Brauch
My memory tells me that we also timed what takes so long, and it's
clang generating the completion items, i.e. not in our code. It's just
Looks like it when I attach the "Time Profiler" instrument on OS X and hit
"reload all" in KDevelop to force it to reparse a few source files. Most of
the 25sec or so is spent in ClangHelpers::buildDUChain() which itself
spends over 93% of its time in Builder::visit() which in turn spends about
85% of its time in clang_visitChildren().
No wonder, this is a pseudo-recursive function (visiting the AST tree) and
does most of the work. Either look at a bottom-up profile, a caller/callee
aggregation, or at least use a flamegraph visualization.
Post by René J.V. Bertin
I'm not very familiar reading this kind of call graph, but if you follow
the graph down clang_getFileLocation shows up which accounts for almost
24% of the processing time on my system. That could correspond to disk
access, no?
No, CPU profilers do not account for disk access. I'm not familiar with
Instruments, but I'd be very surprised if they are different in that regard.
For I/O profiling, there is usually a separate profiler configuration
you'll have to use which traces context switches or syscalls.
One more thing: Notice how low the "Running Time" actually is in your profile.
This may point to either I/O overhead or lock contention. Can you use a
different Instruments profiler configuration to find wait time in the code?
That would allow us to get a more accurate image of what's going on. Also,
make sure to profile the same thing. I.e. I suggest you clear the duchain
cache before starting KDevelop with the profiler.
--
Milian Wolff
***@milianw.de
http://milianw.de
René J.V. Bertin
2017-05-29 09:34:54 UTC
Permalink
On Monday May 29 2017 10:52:00 Milian Wolff wrote:

Hi,
Post by Milian Wolff
Post by René J.V. Bertin
I'm not very familiar reading this kind of call graph, but if you follow the
graph down clang_getFileLocation shows up which accounts for almost 24% of
the processing time on my system. That could correspond to disk access, no?
No, CPU profilers do not account for disk access. I'm not familiar with
My bad, I didn't mean to claim that it does any analysis of disk access itself. As far as I understand this sort of thing, that would only exclude the time the CPU spends waiting for I/O operations. Would a CPU profiler exclude the CPU cycles spent on I/O just because it's I/O?
IOW, clang_getFileLocation() is doing something that takes 24% of all spent CPU cycles, and given the name of the function, that something is related to I/O in some way.
Post by Milian Wolff
Perf or VTune are just as capable.
But most Linux systems don't come with the required kexts installed, do they?
Post by Milian Wolff
Also, to make sure: Did you compile KDev*
and everything else you are profiling in RelWithDebInfo mode? If not, then
this profile output is completely useless.
I build with "-O3 -g" if that's what you're asking, for everything KF5 and for Qt5 itself. Specifically, I use Debuntu's approach of building with a custom, non-predefined BUILD_TYPE and then set the desired compiler options in CFLAGS and CXXFLAGS.
Post by Milian Wolff
There's N + 1. N for background parsing and one for parsing the active
document, if needed.
So I ought to do this kind of profiling with only a single document open? That should actually be more representative of what happens when you make changes to the current document.

FWIW, it's still a bit confusing. The current document is also parsed in the background, so what the option in the settings controls is actually the number of threads to use for parsing "background documents"?
Post by Milian Wolff
Post by René J.V. Bertin
if KDevelop is slower running
from one than as/from a regular install the 1st explanation one thinks of
is "something related to the bundling".
Before doing such a claim, back it up with hard numbers. Profiling and
Eh? I didn't claim anything that needs backing up with hard numbers, unless you mean the number of people who actually decide to check if that 1st explanation is the right one? :)
Post by Milian Wolff
Can you use a
different instruments profiler configuration to find wait time in the code?
That would allow us to get a more accurate image of what's going on. Also,
make sure to profile the same thing.
There's a whole bunch of preconfigured "Instruments"; I'll have to see if I can figure out which would be the appropriate one for wait time.
Post by Milian Wolff
I.e. I suggest you clear the duchain
cache before starting KDevelop with the profiler.
I can't clear the cache while KDevelop is running, can I? That means I'd be measuring KDevelop's start-up too, meaning loads more data or less fine-grained sampling. I'm also not convinced that measuring the cost of rebuilding the entire cache is what we're interested in here; that shouldn't be what causes the reported slowness, right?
As to measuring the same thing: I make sure to force the parsing step 2 or 3 times before sampling the operation. I don't expect it makes a difference if I do that 2x or 3x, but it'd probably be better indeed to standardise.

R.
Milian Wolff
2017-05-29 09:43:42 UTC
Permalink
Post by René J.V. Bertin
Hi,
Post by Milian Wolff
Post by René J.V. Bertin
I'm not very familiar reading this kind of call graph, but if you follow
the graph down clang_getFileLocation shows up which accounts for almost
24% of the processing time on my system. That could correspond to disk
access, no?>
No, CPU profilers do not account for disk access. I'm not familiar with
My bad, I didn't mean to claim that it does any analysis of disk access
itself. As far as I understand this sort of thing that would only exclude
the time the CPU spends waiting for I/O operations. Would a CPU profiler
exclude the CPU cycles spent on I/O just because it's I/O?
No, a kernel switches out a process that is waiting on something, i.e.
sleeping or doing I/O. And a switched-out process does not spend any CPU
cycles; these will be attributed to whatever other process gets switched in
(which may also be a pseudo-"idle"-process).
Post by René J.V. Bertin
IOW,
clang_getFileLocation() is doing something that takes 24% of all spent CPU
cycles, and given the name of the function that is related to I/O in some
way.
What, no?! This is a purely CPU-bound mapping function. Given a file offset,
it builds a line/column cursor.
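Conceptually it is just this kind of in-memory table lookup (a generic
illustration of an offset-to-line/column mapping, not clang's actual
implementation):

// Generic sketch: precompute where each line starts, then binary-search the
// queried offset. Purely CPU-bound, no disk access on the hot path.
#include <algorithm>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

struct LineTable {
    std::vector<size_t> lineStarts;     // offset of each line's first character

    explicit LineTable(const std::string &contents)
    {
        lineStarts.push_back(0);
        for (size_t i = 0; i < contents.size(); ++i)
            if (contents[i] == '\n')
                lineStarts.push_back(i + 1);
    }

    // Map a byte offset to a 1-based (line, column) pair.
    std::pair<unsigned, unsigned> locate(size_t offset) const
    {
        const auto it = std::upper_bound(lineStarts.begin(), lineStarts.end(), offset);
        const size_t line = static_cast<size_t>(it - lineStarts.begin());
        return {static_cast<unsigned>(line),
                static_cast<unsigned>(offset - lineStarts[line - 1] + 1)};
    }
};

int main()
{
    const LineTable table("int a;\nint b;\n");
    const auto loc = table.locate(9);   // the 't' of the second "int"
    std::printf("line %u, column %u\n", loc.first, loc.second);  // line 2, column 3
    return 0;
}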
Post by René J.V. Bertin
Post by Milian Wolff
Perf or VTune are just as capable.
But most Linux systems don't come with the required kexts installed, do they?
What are you talking about?!
Post by René J.V. Bertin
Post by Milian Wolff
Also, to make sure: Did you compile KDev*
and everything else you are profiling in RelWithDebInfo mode? If not, then
this profile output is completely useless.
I build with "-O3 -g" if that's what you're asking, everything KF5 and Qt5
itself. Specifically, I use Debuntu's approach of building with a custom,
non-predefined BUILD_TYPE and then set the desired compiler options in
CFLAGS and CXXFLAGS.
Sounds good.
Post by René J.V. Bertin
Post by Milian Wolff
There's N + 1. N for background parsing and one for parsing the active
document, if needed.
So I ought to do this kind of profiling with only a single document open?
That should actually be more representative of what happens when you make
changes to the current document.
I don't know what exactly you are trying to profile. If you want to figure out
why the main thread gets blocked every now and then, you'll need a profiler
that gives you per-thread granularity and then look at what the main thread is
doing. This means doing both on-CPU and off-CPU profiling. With VTune that is
trivial and should show you what's going on. I bet Instruments has similar
capabilities. Learn the tool and report back.
Post by René J.V. Bertin
FWIW, it's still a bit confusing. The current document is also parsed in the
background, so what the option in the settings controls is actually the
number of threads to use for parsing "background documents"?
The N in the N + 1 above.
Post by René J.V. Bertin
Post by Milian Wolff
Post by René J.V. Bertin
if KDevelop is slower running
from one than as/from a regular install the 1st explanation one thinks of
is "something related to the bundling".
Before doing such a claim, back it up with hard numbers. Profiling and
Eh? I didn't claim anything that needs backing up with hard numbers, unless
you mean the number of people who actually decide to check if that 1st
explanation is the right one? :)
I mean it's a waste of time to muse about what could possibly happen. Measure
it and see what's happening instead.
Post by René J.V. Bertin
Post by Milian Wolff
Can you use a
different instruments profiler configuration to find wait time in the code?
That would allow us to get a more accurate image of what's going on. Also,
make sure to profile the same thing.
There's a whole bunch of preconfigured "Instruments", I'll have to see if I
can figure out which would be the appropriate one for wait time.
Post by Milian Wolff
I.e. I suggest you clear the duchain
cache before starting KDevelop with the profiler.
I can't clear the cache while KDevelop is running, can I? That means I'd be
measuring KDevelop's start-up too meaning loads more data or less
fine-grained sampling. I'm also not convinced that measuring the cost of
rebuilding the entire cache is what we're interested in here, that
shouldn't be what causes the reported slowness, right? As to measuring the
same thing: I make sure to force the parsing step 2 or 3 times before
sampling the operation. I don't expect it makes a difference if I do that
2x or 3x but it'd probably be better indeed to standardise.
Just make sure you are measuring something reliably and reproducibly. The
profile you showed simply tells us that most of the time is spent parsing the
file, which isn't surprising if you parse a file...

Bye
--
Milian Wolff
***@milianw.de
http://milianw.de
René J.V. Bertin
2017-05-29 11:05:21 UTC
Permalink
Post by Milian Wolff
Post by René J.V. Bertin
the time the CPU spends waiting for I/O operations. Would a CPU profiler
exclude the CPU cycles spent on I/O just because it's I/O?
No, a kernel switches out a process that is waiting on something, i.e.
Wording... I meant CPU cycles spent by the process. Even when pumping data from a file into a buffer the CPU must be spending *some* cycles in the process itself, no?
Post by Milian Wolff
cycles, these will be attributed to whatever other process gets switched in
(which may also be a pseudo-"idle"-process).
That shows up as "kernel_task" on Mac.
Post by Milian Wolff
Post by René J.V. Bertin
So I ought to do this kind of profiling with only a single document open?
That should actually be more representative of what happens when you make
changes to the current document.
I don't know what exactly you are trying to profile.
I was trying to provide data that would help understand the kind of issue the OP complains about. Now, if that can also be done on Linux with the exact builds that show the issue (= without running a special build) then I don't think my help is indispensable.
Post by Milian Wolff
Post by René J.V. Bertin
FWIW, it's still a bit confusing. The current document is also parsed in the
background, so what the option in the settings controls is actually the
number of threads to use for parsing "background documents"?
The N in the N + 1 above.
I guess that's what I mean: you think you're allocating N threads to something CPU-intensive, but you're actually allowing N+1 threads. The settings dialog in question could be reworded to reflect that, or else show N-1 in the spinner.
Post by Milian Wolff
I mean it's a waste of time to muse about what could possibly happen. Measure
it and see what's happening instead.
Doing that without first establishing a working hypothesis is Sunday-afternoon science at best.

R.
Milian Wolff
2017-05-30 13:18:51 UTC
Permalink
Post by Sven Brauch
Post by René J.V. Bertin
Post by Sven Brauch
Post by René J. V. Bertin
Isn't that because of the unknown declaration fixer thingy which
has a knack of scanning and rescanning a potentially huge number
of include files, on the main/gui thread?
No, different issue, this one is in the parse jobs.
I presume that the parser also scans through headerfiles so the
underlying reason could well be the same.
I don't think so. Kevin and I investigated this problem a bit at the
last sprint. The general issue is roughly, while the parser is running,
no new completion items can be generated; this is why we introduced the
long delay when typing on a single line intentionally (it shouldn't
start running the background parser while you're still typing in a
single line and need completion).
My memory tells me that we also timed what takes so long, and it's
clang generating the completion items, i.e. not in our code. It's just
sometimes very slow and nobody could figure out why yet. I feel like
some code cache (precompiled header cache?) is dropped under some
circumstances which happen on some projects erroneously, but not on
others. Maybe strange paths (non-normalized, ...) have something to do
with it, but that's just guesswork now.
Further investigations are very welcome, but I think one needs to sit
down with a debugger/profiler and a build of llvm with symbols and
figure out what happens. Guessing around on the mailing list won't find
it IMO.
I just profiled a simple file in VTune and there the delay of 3s is really
killing the perceived performance. Simply returning DefaultDelay from
ClangSupport::suggestedReparseDelayForChange speeds things up dramatically.

I actually think that always waiting for 3s is too long. We need to find a
different behavior for this, which may need more changes elsewhere. Here are
some comments on how it should work from my POV:

- user is typing and the default delay (500ms) catches fast typing and only
once the user has finished typing will we wait 500ms until we kick off the
parse job

- user is typing, waits too long, and we kick off a parse job, the user
continues to type -> this case is, afaik, currently unhandled. we should abort
the parse job in such cases, if at all possible. But looking at clang's
Index.h, this doesn't seem to be possible, which is very unfortunate :( At
least not with the current API we are using. I wonder if we may need to use
clang_indexSourceFile instead...

- until the above is fixed, we'll have to reduce the 3s delay based on some
heuristics (a rough sketch follows this list), i.e. we should probably return
the default delay when

-- the file is $small (whatever that means)
-- the file is a .cpp file and we have parsed it before (reparsing is usually
quite fast in my experience)
-- the user entered a string after which he expects code completion to work,
like any of the following operators (and probably more): (,:<>.
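To make that concrete, a rough sketch of such a heuristic; the free-standing
signature, names and thresholds below are invented for illustration and do
not match the actual interface (the real entry point is
ClangSupport::suggestedReparseDelayForChange):

// Rough sketch only; names, thresholds and the simplified signature are
// illustrative, not KDevelop's real API.
#include <string>

enum ReparseDelay { DefaultDelay = 500, LongDelay = 3000 };     // milliseconds

struct ChangeContext {
    std::string fileName;       // e.g. "main.cpp"
    size_t fileSizeBytes;       // size of the edited document
    bool parsedBefore;          // we already have a translation unit for it
    char lastTypedChar;         // last character the user entered
};

inline bool looksLikeCompletionTrigger(char c)
{
    // Characters after which the user typically expects completion to work.
    return c == '(' || c == ',' || c == ':' || c == '<' || c == '>' || c == '.';
}

inline int suggestedReparseDelayMs(const ChangeContext &change)
{
    const bool isSmall = change.fileSizeBytes < 64 * 1024;       // "$small"
    const bool isSource = change.fileName.size() > 4 &&
        change.fileName.compare(change.fileName.size() - 4, 4, ".cpp") == 0;

    if (isSmall
        || (isSource && change.parsedBefore)    // reparses are usually fast
        || looksLikeCompletionTrigger(change.lastTypedChar))
        return DefaultDelay;

    return LongDelay;       // otherwise keep the conservative delay
}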

Sven, Kevin - what did you use for benchmarking back then? I.e. is there a
reproducible test case that we can use to find a good value for the delay
here? 3s clearly isn't a good value.

Thanks
--
Milian Wolff
***@milianw.de
http://milianw.de
René J.V. Bertin
2017-05-30 14:23:55 UTC
Permalink
Post by Milian Wolff
- user is typing, waits too long, and we kick off a parse job, the user
continues to type -> this case is, afaik, currently unhandled. we should abort
the parse job in such cases, if at all possible. But looking at clang's
Yes. I've tried to implement the simplest possible approach to achieve this for unknown declaration problems (got downvoted because it uses a nested event loop but it does make things bearable for me).
Even if you can't abort the parse job (properly short of killing the thread?), can't you simply stop waiting for it to complete and ignore the results whenever they become available?
Post by Milian Wolff
- until the above is fixed, we'll have to reduce the 3s delay based on some
heuristics. i.e. we should probably return the default delay when
I have already found myself bumping that 500ms value significantly (to 2000). Maybe this is completely beside the point, but couldn't it be useful to make the timeout (the default 500ms value) adaptive (optionally or always) within a reasonable range of values? That is, increase the threshold if the timeout is triggered inappropriately too often, and decrease it again when waits become unnecessarily long. There are adaptive staircase methods (e.g. QUEST) from psychophysics research that could probably be ... adapted to serve here, if similar algorithms don't already exist in a GUI context.
A similar thing might be possible with the 3s delay.
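Just to illustrate what I mean by adaptive, here is a toy staircase-style
sketch; the names and step sizes are invented, it is not taken from QUEST nor
from any existing KDevelop code:

// Toy sketch: lengthen the delay when a reparse fires while the user is still
// typing, shorten it again (more slowly) when it doesn't get in the way.
#include <algorithm>

class AdaptiveDelay {
public:
    // Call after each reparse; 'interrupted' means the user kept typing while
    // (or right after) the reparse started, i.e. the delay was too short.
    void recordOutcome(bool interrupted)
    {
        if (interrupted)
            m_delayMs = std::min(m_delayMs + m_stepUpMs, m_maxMs);
        else
            m_delayMs = std::max(m_delayMs - m_stepDownMs, m_minMs);
    }

    int delayMs() const { return m_delayMs; }

private:
    int m_delayMs = 500;        // start at the current default
    int m_minMs = 300;          // never go below a floor ...
    int m_maxMs = 3000;         // ... or above the current worst case
    int m_stepUpMs = 250;       // back off quickly when we get in the way
    int m_stepDownMs = 50;      // relax slowly when things stay calm
};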
Post by Milian Wolff
-- the file is $small (whatever that means)
-- the file is a .cpp file and we have parsed it before (reparsing is usually
quite fast in my experience)
Keep track of how long the last (full) parse took, per file, and use that information in some way?
Post by Milian Wolff
-- the user entered a string after which he expects code completion to work,
like any of the following operators (and probably more): (,:<>.
Define a shortcut to trigger completion, some modifier combination with the Tab key by default?

I probably wouldn't mind an option anyway to make completion manual because too often I find that it finds and selects a (or another) completion just when I hit the enter key (or even just after). I never really had that with other IDEs or completion mechanisms.

R.
Milian Wolff
2017-05-30 14:45:55 UTC
Permalink
Post by René J.V. Bertin
Post by Milian Wolff
- user is typing, waits too long, and we kick off a parse job, the user
continues to type -> this case is, afaik, currently unhandled. we should
abort the parse job in such cases, if at all possible. But looking at
clang's
Yes. I've tried to implement the simplest possible approach to achieve this
for unknown declaration problems (got downvoted because it uses a nested
event loop but it does make things bearable for me).
Again, this has nothing to do with the unknown declaration problems. And your
patch was simply of unacceptable quality.
Post by René J.V. Bertin
Even if you can't
abort the parse job (properly short of killing the thread?), can't you
simply stop waiting for it to complete and ignore the results whenever they
become available?
No, because clang internally keeps track of the parse job; see Sven's mail on
that matter. We will probably need to extend the clang API to get a way to
abort a parse job properly.
Post by René J.V. Bertin
Post by Milian Wolff
- until the above is fixed, we'll have to reduce the 3s delay based on some
heuristics. i.e. we should probably return the default delay when
I have already found myself bumping those 500ms value significantly (to
2000). Maybe this is completely beside the point, but couldn't it be useful
to make the timeout (the default 500ms value) adaptive (optionally or
always) within a reasonable range of values? That is, increase the
threshold if the timeout is triggered inappropriately too often, decrease
again when waits become unnecessarily long. There are adaptive staircase
methods (e.g. QUEST) from psycho-physics research that could probably be
... adapted to serve here if similar algorithms don't already exist in GUI
context. A similar thing might be possible with the 3s delay
KISS says no to this proposal.
Post by René J.V. Bertin
Post by Milian Wolff
-- the file is $small (whatever that means)
-- the file is a .cpp file and we have parsed it before (reparsing is
usually quite fast in my experience)
Keep track of how long the last (full) parse took, per file, and use that
information in some way?
Not reliable, as reparsing is very different from parsing. And if any header
in between was updated, we need to update everything, which again slows
things down. So no.
Post by René J.V. Bertin
Post by Milian Wolff
-- the user entered a string after which he expects code completion to
work, like any of the following operators (and probably more): (,:<>.
Define a shortcut to trigger completion, some modifier combination with the
Tab key by default?
I probably wouldn't mind an option anyway to make completion manual because
too often I find that it finds and selects a (or another) completion just
when I hit the enter key (or even just after). I never really had that with
other IDEs or completion mechanisms.
Ctrl + Space is that shortcut. It's actually a good idea to force-update the
document in that situation.
--
Milian Wolff
***@milianw.de
http://milianw.de
René J.V. Bertin
2017-05-30 15:54:10 UTC
Permalink
Post by Milian Wolff
No, because clang internally keeps track of the parse job, see Sven's mail on
that matter.
In this thread? Seems surprising that you'd be obliged to wait for an asynchronous operation to finish once it's started. I realise that could mean you cannot launch a new parse job in the same thread, but chances are that job would be just as unwanted.
Post by Milian Wolff
KISS says no to this proposal.
The minor South-African political party or the Korean girly pop trio? Or maybe https://en.wikipedia.org/wiki/Kisekae_Set_System (annoying humans in the loop, so yeah you must have meant that O:^) )
Post by Milian Wolff
Ctrl + Space is that shortcut. It's actually a good idea to force-update the
document in that situation.
What situation, when triggering a completion manually? Isn't that another highly appropriate moment to launch such an operation, if it can take seconds to complete? (You'd at least need visual feedback so the user knows to be patient and wait.)

R
Sven Brauch
2017-05-30 15:57:55 UTC
Permalink
Post by René J.V. Bertin
In this thread? Seems surprising that you'd be obliged to wait for an
asynchronous operation to finish once it's started. I realise that
could mean you cannot launch a new parse job in the same thread, but
chances are that job would be just as unwanted.
If you overwrite your cache you cannot just cancel the job and use the
half-overwritten cache ...
René J.V. Bertin
2017-05-30 16:30:38 UTC
Permalink
Post by Sven Brauch
If you overwrite your cache you cannot just cancel the job and use the
half-overwritten cache ...
Ah yes, indeed, that's the bit I missed. Still, is it not possible to let that operation terminate (and then reload the cache) without making the user wait for it, once you know you're not going to be using the information you were planning to wait for? Not a very elegant solution, but it could do as a stop-gap measure while something better is implemented upstream (and it'd work with older clang versions that don't get the patch).
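Something along these lines is what I have in mind; a generic sketch with an
invented ParseScheduler, not KDevelop's actual job classes. The job still runs
to completion (so it never leaves a half-written cache); only the waiting and
the stale result are dropped:

// Sketch: every edit bumps a generation counter; a finished job only publishes
// its result if no newer edit superseded it, otherwise the result is dropped.
#include <atomic>
#include <chrono>
#include <cstdio>
#include <string>
#include <thread>

struct ParseResult { std::string diagnostics; };

// Stand-in for the expensive libclang parse.
static ParseResult parseDocument(const std::string &contents)
{
    std::this_thread::sleep_for(std::chrono::milliseconds(500));
    return {"diagnostics for " + std::to_string(contents.size()) + " bytes"};
}

class ParseScheduler {
public:
    void scheduleParse(std::string contents)
    {
        const unsigned myGeneration = ++m_generation;
        // Fire and forget: nobody waits for a stale job, it simply runs out.
        std::thread([this, myGeneration, contents = std::move(contents)] {
            const ParseResult result = parseDocument(contents);
            if (myGeneration == m_generation.load())
                publish(result);        // still the newest edit: use the result
            // else: silently discard it; a newer job has been scheduled since
        }).detach();
    }

private:
    static void publish(const ParseResult &result)
    {
        std::printf("%s\n", result.diagnostics.c_str());    // would go to the UI
    }

    std::atomic<unsigned> m_generation{0};
};

int main()
{
    ParseScheduler scheduler;
    scheduler.scheduleParse("int main() {");        // superseded almost at once
    scheduler.scheduleParse("int main() { }");      // only this result is shown
    std::this_thread::sleep_for(std::chrono::seconds(1));   // let the jobs finish
    return 0;
}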

I wonder how Xcode approaches this. It must also be based on clang's parser.

R.

Sven Brauch
2017-05-30 15:51:15 UTC
Permalink
Hi,
Post by Milian Wolff
- user is typing, waits too long, and we kick off a parse job, the user
continues to type -> this case is, afaik, currently unhandled. we should abort
the parse job in such cases, if at all possible.
Yes, this is exactly the reason why we introduced the delay back then. I
don't disagree with lowering the delay; we can just try 1.5s and see
what happens ... I don't see how you could build a reproducible test
case for this; it's purely related to how fast and in what pattern you
type on your keyboard.

Best,
Sven
René J.V. Bertin
2017-05-30 16:06:44 UTC
Permalink
Alexander Shaduri
2017-05-03 18:08:50 UTC
Permalink
Hi,

I can reproduce at least some of the slowness using a template
(Qt/C++) project from KDevelop.

Simply opening main.cpp:

#include <QApplication>
// plus the include for the generated testkdev5 widget class

int main(int argc, char *argv[])
{
    QApplication app(argc, argv);
    testkdev5 w;
    w.show();

    return app.exec();
}

and changing the "app" definition (e.g. typing "2" after it so it becomes app2)
takes about 3-4 seconds to re-parse and re-highlight.

It all becomes a lot slower with two other projects I have (one quite small
and one large), with the parser sometimes "giving up" - not highlighting
errors.

Should I file a bug report?

Thanks,
Alexander


On Tue, 2 May 2017 23:53:04 +0200
Post by Sven Brauch
Hi,
Post by Alexander Shaduri
Is this normal behavior, or is my build broken somehow?
I have seen this behaviour with some projects, but not with others
(orthogonal to size). I still don't know what causes it and it's very
annoying. :/
+1
Alexander Zhigalin
2017-05-05 08:40:17 UTC
Permalink
René J.V. Bertin
2017-05-05 09:21:27 UTC
Permalink
On Friday May 05 2017 10:40:17 Alexander Zhigalin wrote:

Hi,
First, try to use clang 3.8 instead of 4
Why that version in particular, is there anything special about it?

FWIW, I have the impression that memory management has improved in clang 4.

R.
Alexander Zhigalin
2017-05-05 09:26:12 UTC
Permalink
Post by René J.V. Bertin
On Friday May 05 2017 10:40:17 Alexander Zhigalin wrote:
Hi,
First, try to use clang 3.8 instead of 4
Why that version in particular, is there anything special about it?
Because it works on my machine :)
Post by René J.V. Bertin
FWIW, I have the impression that memory management has improved in clang 4.
R.
--
Alexander Zhigalin - DevOps
Alexander Shaduri
2017-05-05 10:43:57 UTC
Permalink
OK,

So I tried my distribution's (openSUSE Leap 42.2) supplied
clang 3.8 and also clang 3.8 from OBS devel:tools:compiler.
The example I posted is still problematic, and auto-completion
simply doesn't work in some situations in my projects.

There may be some speedup with auto-complete *when
it works*, but still, renaming "app" to "app2" takes about
4 seconds to catch up.

Note: My clang 4.0.0 was from OBS as well.

Thanks,
Alexander


On Fri, 05 May 2017 11:26:12 +0200
Hi,
First, try to use clang 3.8 instead of 4
Why that version in particular, is there anything special about it?
Because it works on my machine :)
FWIW, I have the impression that memory management has improved in clang 4.
R.
--
Alexander Zhigalin - DevOps
Alexander Shaduri
2017-05-05 10:50:55 UTC
Permalink
Also, looking at the rpm spec file:
https://build.opensuse.org/package/view_file/devel:tools:compiler/llvm4/llvm4.spec?expand=1

It looks like the assertions are off.

Thanks,
Alexander


On Fri, 5 May 2017 14:43:57 +0400
Post by Alexander Shaduri
OK,
So I tried my distribution's (openSUSE Leap 42.2) supplied
clang 3.8 and also clang 3.8 from OBS devel:tools:compiler,
the example I posted is still problematic, and I get inability
to auto-complete in some situations in my projects.
There may be some speedup with auto-complete *when
it works*, but still, renaming "app" to "app2" takes about
4 seconds to catch up.
Note: My clang 4.0.0 was from OBS as well.
Thanks,
Alexander
On Fri, 05 May 2017 11:26:12 +0200
Hi,
First, try to use clang 3.8 instead of 4
Why that version in particular, is there anything special about it?
Because it works on my machine :)
FWIW, I have the impression that memory management has improved in clang 4.
R.
--
Alexander Zhigalin - DevOps
Alexander Shaduri
2017-05-27 11:52:42 UTC
Permalink
Hi,

On Fri, 05 May 2017 10:40:17 +0200
Second, try to use our AppImage (maybe with a live CD).
If the problem persists, file a bug report.
So I tried AppImage of 5.1.1 and the issue is still present there.

Is there some test I can profile with valgrind to detect the offending
function? The easiest test case for this is to create a Qt5 project
from template and rename "app" to "app2" in main.cpp. It takes ~3-4
seconds to refresh the squiggles on "app" usages.

Thanks,
Alexander
Sven Brauch
2017-05-27 11:54:47 UTC
Permalink
Hi,
Post by Alexander Shaduri
So I tried AppImage of 5.1.1 and the issue is still present there.
Is there some test I can profile with valgrind to detect the offending
function? The easiest test case for this is to create a Qt5 project
from template and rename "app" to "app2" in main.cpp. It takes ~3-4
seconds to refresh the squiggles on "app" usages.
Did you see my mail from May 3rd?

Greetings,
Sven
Alexander Shaduri
2017-05-27 12:28:09 UTC
Permalink
Hi Sven,

It seems I missed it somehow.
It would be interesting to know whether the issues I'm facing are the same
as the ones you describe, since for me this problem makes it very difficult
to use KDevelop 5 (I'm using KDevelop 4 for the time being because of this).

Anyway, if there's anything I can do to help you find the exact
cause, I'm ready to do it.

Thanks,
Alexander


On Sat, 27 May 2017 13:54:47 +0200
Post by Sven Brauch
Hi,
Post by Alexander Shaduri
So I tried AppImage of 5.1.1 and the issue is still present there.
Is there some test I can profile with valgrind to detect the offending
function? The easiest test case for this is to create a Qt5 project
from template and rename "app" to "app2" in main.cpp. It takes ~3-4
seconds to refresh the squiggles on "app" usages.
Did you see my mail from May 3rd?
Greetings,
Sven
René J.V. Bertin
2017-05-27 13:21:56 UTC
Permalink
On Saturday May 27 2017 16:28:09 Alexander Shaduri wrote:

It seems it might be useful to build an AppImage for profiling purposes, if indeed this slowness issue is so much worse with that packaging principle than with other kinds of installs.

I take it that the clang parser and all its requirements are bundled inside the AppImage too? Could it be that it does a lot of loading and seeking from inside that image, and that this is what's slowing things down? I don't know how those images work exactly, but if KDevelop is slower running from one than as/from a regular install the 1st explanation one thinks of is "something related to the bundling".
Post by Alexander Shaduri
Anyway, if there's anything I can do to help you find the exact
cause, I'm ready to do it.
What you might also try is installing, in a VM, a rolling-release distribution that follows the latest KF5 versions closely, and seeing whether you have the same slowness problems that way, on the same computer. You're not by any chance running from a disk so full that any new file you install ends up scattered all over the place (or from an SSD that's equally full and hasn't been TRIM'ed for a long time)?

R.
Milian Wolff
2017-05-29 08:56:10 UTC
Permalink
Post by René J.V. Bertin
It seems it might be useful to build an AppImage for profiling purposes, if
indeed this slowness issue is so much worse with that packaging principle
than with other kinds of installs.
I take it that the clang parser and all its requirements are bundled inside
the AppImage too? Could it be that it does a lot of loading and seeking
from inside that image, and that this is what's slowing things down?
No, AppImage is just a loop-mounted image which will get its code paged in.
This does not explain such slowness. Also, so far I haven't seen any
reproducible profile that clearly shows that the AppImage build is slower than
a distro build. But I also don't know how the AppImage code is compiled -
Sven, Kevin - are you using Release or RelWithDebInfo for KDevelop and all the
dependencies shipped in the AppImage?
Post by René J.V. Bertin
I
don't know how those images work exactly, but if KDevelop is slower running
from one than as/from a regular install the 1st explanation one thinks of
is "something related to the bundling".
Before doing such a claim, back it up with hard numbers. Profiling and
performance isn't guesswork. It's a matter of measuring and reliably
attributing costs to specific code functions.
Post by René J.V. Bertin
Post by Alexander Shaduri
Anyway, if there's anything I can do to help you find the exact
cause, I'm ready to do it.
What you might also try doing is to install a rolling-release distribution
that follows the latest KF5 versions closely in a VM, and see if you have
the same slowness problems that way, on the same computer. You're not by
any chance running from a disk that any new file you install ends up
scattered all over the place (or on an SSD that's equally full and hasn't
been TRIM'ed for a long time)?
R.
--
Milian Wolff
***@milianw.de
http://milianw.de
Kevin Funk
2017-05-29 09:26:04 UTC
Permalink
Post by Milian Wolff
Post by René J.V. Bertin
It seems it might be useful to build an AppImage for profiling purposes, if
indeed this slowness issue is so much worse with that packaging principle
than with other kinds of installs.
I take it that the clang parser and all its requirements are bundled inside
the AppImage too? Could it be that it does a lot of loading and seeking
from inside that image, and that this is what's slowing things down?
No, AppImage is just a loop-mounted image which will get its code paged in.
This does not explain such slowness. Also, so far I haven't seen any
reproducible profile that clearly shows that the AppImage build is slower
than a distro build. But I also don't know how the AppImage code is
compiled - Sven, Kevin - are you using Release or RelWithDebInfo for
KDevelop and all the dependencies shipped in the AppImage?
We've used the "empty" CMake build type so far, which does *not* append
`-O2 -g -DNDEBUG`.

I've now fixed this to use the RelWithDebInfo build type:
https://commits.kde.org/kdevelop/b810eee06655822d194f0009740c7a280c0e168c

I'll start a new build, which you can try out afterwards. You can download the
new AppImage under the following URL, once it's done (takes approx. 2 hours):
https://www.kdevelop.org/download -- section "Nightly builds"

Regards,
Kevin
Post by Milian Wolff
Post by René J.V. Bertin
I
don't know how those images work exactly, but if KDevelop is slower running
from one than as/from a regular install the 1st explanation one thinks of
is "something related to the bundling".
Before doing such a claim, back it up with hard numbers. Profiling and
performance isn't guesswork. It's a matter of measuring and reliably
attributing costs to specific code functions.
Post by René J.V. Bertin
Post by Alexander Shaduri
Anyway, if there's anything I can do to help you find the exact
cause, I'm ready to do it.
What you might also try doing is to install a rolling-release distribution
that follows the latest KF5 versions closely in a VM, and see if you have
the same slowness problems that way, on the same computer. You're not by
any chance running from a disk that any new file you install ends up
scattered all over the place (or on an SSD that's equally full and hasn't
been TRIM'ed for a long time)?
R.
--
Kevin Funk | ***@kde.org | http://kfunk.org
Alexander Zhigalin
2017-05-03 06:32:48 UTC
Permalink