ANR机制概述

Post on Jan 01, 2023 by Wei Lin

概述

ANR机制

参考:[安卓稳定性之ANR]第一篇:安卓ANR问题综述——CSDN


在安卓系统中,当应用因为Java Exception与Native Exception发生异常时都会启动对应闪退机制让应用程序死亡。无论是通过DefaultUncatchedExceptionHandler处理的JE,或是通过信号杀死进程的NE,最终都会使应用程序结束,让用户有一个”更好的用户体验”。


但在某些场景,如果应用的某线程发生异常并仅在线程级别终止,在应用和系统看来可能还没有发生异常,但是有的功能却无法使用,这时候只能等用户发现之后自行杀死进程,这是极为不友好的。

与闪退相对应的,安卓中也有无响应机制。无响应与闪退不同,应用程序闪退时一定是发生了某种问题。但是无响应时,可能应用的整个进程没有发生任何问题,只是”慢了”。虽然应用可能没有问题发生,但是依然会弹出”应用程序无响应对话框”,安卓之所以要这样设计,是为了满足”与用户实时的交互”。


应用卡顿,在很多情况下是由于软件程序运行时效率慢了,不能与用户进行实时性的交互。严格一点的卡顿就是掉帧,若手机刷新率为60HZ,则每隔16.67ms绘制一帧。这个绘制在应用主线程执行,如果应用主线程由于某些消息执行耗时导致绘制被延迟,就发生了掉帧。如果应用主线程的某些消息执行了更长的时间(1s、2s或5s),则会导致用户肉眼可见的卡顿。如果消息执行的时间还能更久,或者有可能是永远卡在这里呢?那用户后续的点击事件在此消息执行完毕之前就再也不能被应用响应了。造成的现象就是,用户发现应用彻底卡死了,并无法用任何操作控制应用。


这时候,无响应机制的作用就体现出来了,当系统发现应用卡住了,则会弹出一个对话框,让用户可以选择关闭应用或者继续等待,有些时候等一会应用也会恢复正常,原因是应用主线程的消息没有被永久卡住。

“无响应机制”是安卓系统为应用程序的”慢”、”卡顿”所做的一个兜底策略,如果”卡顿”太久,就会触发”无响应”机制——前台应用弹出无响应对话框,后台应用由于用户无感知直接杀死。无响应机制就像是悬在应用开发者头上的一根利剑,督促着应用开发者优化应用加载、启动、运行时逻辑。常用来对抗”无响应机制”的方案就是异步线程,多进程启动加载等。

ANR原理

ANR(Application Not Responding)的监测原理本质上是消息机制,通过sendMessageAtTime()设定一个delay消息,超时未被移除则触发ANR。具体逻辑处理都在system_server端,包括发送超时消息,移除超时消息,处理超时消息以及ANR弹框展示等步骤。


对于APP而言,触发ANR的条件是主线程阻塞。以下四种场景会引起ANR:

(1) Service Timeout:前台20s,后台200s内未执行完成;

(2) BroadcastQueue Timeout:前台10s,后台60s;

(3) ContentProvider Timeout:10s;

(4) InputDispatching Timeout:输入事件分发超时5s。

ANR类型

关于ANR问题的分析

(1) KeyDispatchTimeout

也称为InputDispatchTimeout,主要是按键、触摸事件、input事件在5S内没有处理完成发生ANR。

发生KeyDispatchTimeout类型的ANR日志中出现的关键字:Input dispatching timed out xxxx


(2) ServiceTimeout

Service的create,start,unbind等在主线程处理耗时,前台Service在20s内,后台Service在200s内没有处理完成发生ANR。

发生ServiceTimeout类型的ANR日志中出现的关键字:Timeout executing service:/executing service XXX


(3) BroadcastTimeout

BroadcastReceiver onReceiver处理事务时前台广播在10S内,后台广播在60s内没有处理完成发生ANR。

发生BroadcastTimeout类型的ANR日志中出现的关键字:Timeout of broadcast XXX,Receiver during timeout:XXX,Broadcast of XXX。


(4) ProcessContentProviderPublishTimedOut

ContentProvider publish在10s内没有处理完成发生ANR。

发生ProcessContentProviderPublishTimedOut类型的ANR日志中出现的关键字:timeout publishing content providers。

appNotResponding()

调用栈

发生ANR后,除了ContentProvider Timeout,其它类型的ANR都会调用到AMS.mAnrHelper.appNotResponding(),之后的调用流程如下:

AnrHelper.appNotResponding() ->
AnrHelper.startAnrConsumerIfNeeded(){
    new AnrConsumerThread().start()
} ->
AnrHelper.AnrConsumerThread.run() ->
AnrRecord.appNotResponding() ->
ProcessRecord.appNotResponding()

ProcessRecord.appNotResponding()

最终调用ProcessRecord.appNotResponding()执行后续操作,在ProcessRecord.appNotResponding()中,依次进行了:

  • 输出CPU使用信息:updateCpuStatsNow();
  • 输出event log:am_anr;
  • 输出堆栈信息:输出到trace文件;
  • 输出main log:ANR in;
  • 再次输出CPU使用信息:updateCpuStatsNow();
  • 弹出ANR对话框:结束或等待ANR。


注: 因为anr发生后dump信息多,处理耗时,导致dump trace的时候耗时消息的现场已经丢失了,经常抓取到的trace是恢复后的正常堆栈,并非Block信息的堆栈。

全部代码

ProcessRecord.java中appNotResponding()代码如下:

void appNotResponding(String activityShortComponentName, ApplicationInfo aInfo,
        String parentShortComponentName, WindowProcessController parentProcess,
        boolean aboveSystem, String annotation, boolean onlyDumpSelf) {
    ArrayList<Integer> firstPids = new ArrayList<>(5);
    SparseArray<Boolean> lastPids = new SparseArray<>(20);

    mWindowProcessController.appEarlyNotResponding(annotation, () -> kill("anr",
              ApplicationExitInfo.REASON_ANR, true));

    long anrTime = SystemClock.uptimeMillis();
    if (isMonitorCpuUsage()) {
        mService.updateCpuStatsNow();
    }

    final boolean isSilentAnr;
    synchronized (mService) {
        // PowerManager.reboot() can block for a long time, so ignore ANRs while shutting down.
        if (mService.mAtmInternal.isShuttingDown()) {
            Slog.i(TAG, "During shutdown skipping ANR: " + this + " " + annotation);
            return;
        } else if (isNotResponding()) {
            Slog.i(TAG, "Skipping duplicate ANR: " + this + " " + annotation);
            return;
        } else if (isCrashing()) {
            Slog.i(TAG, "Crashing app skipping ANR: " + this + " " + annotation);
            return;
        } else if (killedByAm) {
            Slog.i(TAG, "App already killed by AM skipping ANR: " + this + " " + annotation);
            return;
        } else if (killed) {
            Slog.i(TAG, "Skipping died app ANR: " + this + " " + annotation);
            return;
        }

        // In case we come through here for the same app before completing
        // this one, mark as anring now so we will bail out.
        setNotResponding(true);

        // Log the ANR to the event log.
        EventLog.writeEvent(EventLogTags.AM_ANR, userId, pid, processName, info.flags,
                annotation);

        // Dump thread traces as quickly as we can, starting with "interesting" processes.
        firstPids.add(pid);

        // Don't dump other PIDs if it's a background ANR or is requested to only dump self.
        isSilentAnr = isSilentAnr();
        if (!isSilentAnr && !onlyDumpSelf) {
            int parentPid = pid;
            if (parentProcess != null && parentProcess.getPid() > 0) {
                parentPid = parentProcess.getPid();
            }
            if (parentPid != pid) firstPids.add(parentPid);

            if (MY_PID != pid && MY_PID != parentPid) firstPids.add(MY_PID);

            for (int i = getLruProcessList().size() - 1; i >= 0; i--) {
                ProcessRecord r = getLruProcessList().get(i);
                if (r != null && r.thread != null) {
                    int myPid = r.pid;
                    if (myPid > 0 && myPid != pid && myPid != parentPid && myPid != MY_PID) {
                        if (r.isPersistent()) {
                            firstPids.add(myPid);
                            if (DEBUG_ANR) Slog.i(TAG, "Adding persistent proc: " + r);
                        } else if (r.treatLikeActivity) {
                            firstPids.add(myPid);
                            if (DEBUG_ANR) Slog.i(TAG, "Adding likely IME: " + r);
                        } else {
                            lastPids.put(myPid, Boolean.TRUE);
                            if (DEBUG_ANR) Slog.i(TAG, "Adding ANR proc: " + r);
                        }
                    }
                }
            }
        }
    }

    // Log the ANR to the main log.
    StringBuilder info = new StringBuilder();
    info.setLength(0);
    info.append("ANR in ").append(processName);
    if (activityShortComponentName != null) {
        info.append(" (").append(activityShortComponentName).append(")");
    }
    info.append("\n");
    info.append("PID: ").append(pid).append("\n");
    if (annotation != null) {
        info.append("Reason: ").append(annotation).append("\n");
    }
    if (parentShortComponentName != null
            && parentShortComponentName.equals(activityShortComponentName)) {
        info.append("Parent: ").append(parentShortComponentName).append("\n");
    }

    StringBuilder report = new StringBuilder();
    report.append(MemoryPressureUtil.currentPsiState());
    ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(true);

    // don't dump native PIDs for background ANRs unless it is the process of interest
    String[] nativeProcs = null;
    if (isSilentAnr || onlyDumpSelf) {
        for (int i = 0; i < NATIVE_STACKS_OF_INTEREST.length; i++) {
            if (NATIVE_STACKS_OF_INTEREST[i].equals(processName)) {
                nativeProcs = new String[] { processName };
                break;
            }
        }
    } else {
        nativeProcs = NATIVE_STACKS_OF_INTEREST;
    }

    int[] pids = nativeProcs == null ? null : Process.getPidsForCommands(nativeProcs);
    ArrayList<Integer> nativePids = null;

    if (pids != null) {
        nativePids = new ArrayList<>(pids.length);
        for (int i : pids) {
            nativePids.add(i);
        }
    }

    // For background ANRs, don't pass the ProcessCpuTracker to
    // avoid spending 1/2 second collecting stats to rank lastPids.
    StringWriter tracesFileException = new StringWriter();
    // To hold the start and end offset to the ANR trace file respectively.
    final long[] offsets = new long[2];
    File tracesFile = ActivityManagerService.dumpStackTraces(firstPids,
            isSilentAnr ? null : processCpuTracker, isSilentAnr ? null : lastPids,
            nativePids, tracesFileException, offsets);

    if (isMonitorCpuUsage()) {
        mService.updateCpuStatsNow();
        synchronized (mService.mProcessCpuTracker) {
            report.append(mService.mProcessCpuTracker.printCurrentState(anrTime));
        }
        info.append(processCpuTracker.printCurrentLoad());
        info.append(report);
    }
    report.append(tracesFileException.getBuffer());

    info.append(processCpuTracker.printCurrentState(anrTime));

    Slog.e(TAG, info.toString());
    if (tracesFile == null) {
        // There is no trace file, so dump (only) the alleged culprit's threads to the log
        Process.sendSignal(pid, Process.SIGNAL_QUIT);
    } else if (offsets[1] > 0) {
        // We've dumped into the trace file successfully
        mService.mProcessList.mAppExitInfoTracker.scheduleLogAnrTrace(
                pid, uid, getPackageList(), tracesFile, offsets[0], offsets[1]);
    }

    FrameworkStatsLog.write(FrameworkStatsLog.ANR_OCCURRED, uid, processName,
            activityShortComponentName == null ? "unknown": activityShortComponentName,
            annotation,
            (this.info != null) ? (this.info.isInstantApp()
                    ? FrameworkStatsLog.ANROCCURRED__IS_INSTANT_APP__TRUE
                    : FrameworkStatsLog.ANROCCURRED__IS_INSTANT_APP__FALSE)
                    : FrameworkStatsLog.ANROCCURRED__IS_INSTANT_APP__UNAVAILABLE,
            isInterestingToUserLocked()
                    ? FrameworkStatsLog.ANROCCURRED__FOREGROUND_STATE__FOREGROUND
                    : FrameworkStatsLog.ANROCCURRED__FOREGROUND_STATE__BACKGROUND,
            getProcessClassEnum(),
            (this.info != null) ? this.info.packageName : "");
    final ProcessRecord parentPr = parentProcess != null
            ? (ProcessRecord) parentProcess.mOwner : null;
    mService.addErrorToDropBox("anr", this, processName, activityShortComponentName,
            parentShortComponentName, parentPr, annotation, report.toString(), tracesFile,
            null);

    if (mWindowProcessController.appNotResponding(info.toString(), () -> kill("anr",
            ApplicationExitInfo.REASON_ANR, true),
            () -> {
                synchronized (mService) {
                    mService.mServices.scheduleServiceTimeoutLocked(this);
                }
            })) {
        return;
    }

    synchronized (mService) {
        // mBatteryStatsService can be null if the AMS is constructed with injector only. This
        // will only happen in tests.
        if (mService.mBatteryStatsService != null) {
            mService.mBatteryStatsService.noteProcessAnr(processName, uid);
        }

        if (isSilentAnr() && !isDebugging()) {
            kill("bg anr", ApplicationExitInfo.REASON_ANR, true);
            return;
        }

        // Set the app's notResponding state, and look up the errorReportReceiver
        makeAppNotRespondingLocked(activityShortComponentName,
                annotation != null ? "ANR " + annotation : "ANR", info.toString());

        // mUiHandler can be null if the AMS is constructed with injector only. This will only
        // happen in tests.
        if (mService.mUiHandler != null) {
            // Bring up the infamous App Not Responding dialog
            Message msg = Message.obtain();
            msg.what = ActivityManagerService.SHOW_NOT_RESPONDING_UI_MSG;
            msg.obj = new AppNotRespondingDialog.Data(this, aInfo, aboveSystem);

            mService.mUiHandler.sendMessage(msg);
        }
    }
}