Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SVF seems to be incomplete when dealing with virtual calls. #1301

Open
Lqs66 opened this issue Dec 25, 2023 · 23 comments
Open

SVF seems to be incomplete when dealing with virtual calls. #1301

Lqs66 opened this issue Dec 25, 2023 · 23 comments

Comments

@Lqs66
Copy link

Lqs66 commented Dec 25, 2023

I'm using SVF version 2.1.

The omission occurs when I write SVF as a lib to get the callsite of all the virtual functions and their targets. As far as I know this should be complete.

My core code is as follows:

//
// Created by lqs66 on 23-12-24.
//

/**
 *   A driver program of SVF including usages of SVF APIs.
 *   vcall used to get all virtual callsites and targets
*/

#include "SVF-FE/LLVMUtil.h"
#include "Graphs/SVFG.h"
#include "WPA/Andersen.h"
#include "SVF-FE/PAGBuilder.h"
#include "MemoryModel/PointerAnalysis.h"
#include <chrono>

using namespace llvm;
using namespace std;
using namespace SVF;

static llvm::cl::opt<std::string> InputFilename(cl::Positional,
                                                llvm::cl::desc("<input bitcode>"), llvm::cl::init("-"));

int main(int argc, char ** argv)
{
    auto start_time = std::chrono::high_resolution_clock::now();
    int arg_num = 0;
    char **arg_value = new char*[argc];
    std::vector<std::string> moduleNameVec;
    SVFUtil::processArguments(argc, argv, arg_num, arg_value, moduleNameVec);
    cl::ParseCommandLineOptions(arg_num, arg_value,
                                "virtual callsites Analysis\n");

    //parse ir file
    SVFModule* svfModule = LLVMModuleSet::getLLVMModuleSet()->buildSVFModule(moduleNameVec);

    /// Build Program Assignment Graph (PAG)
    PAGBuilder builder;
    PAG* pag = builder.build(svfModule);

    /// Create Andersen's pointer analysis. The analyze() function has been called at creation time. 
    Andersen* ander = AndersenWaveDiff::createAndersenWaveDiff(pag);

    //get pta call graph
    PTACallGraph* pta = ander->getPTACallGraph();

    //get all virtual callsites and targets
    PointerAnalysis::CallEdgeMap temp = pta->getIndCallMap();
    for(auto it = temp.begin(); it != temp.end(); it++)
    {
        const CallBlockNode* vcall_node = it->first;
        const Instruction *vcall_site = vcall_node->getCallSite();
        PointerAnalysis::FunctionSet& targets = it->second;

        if (const CallBase *callInst = dyn_cast<CallBase>(vcall_site)) {
            string valueStr;
            raw_string_ostream valueStream(valueStr);
            callInst->getCalledValue()->print(valueStream);
            std::string resultStr = valueStream.str();
            

            if(resultStr.find("=") == string::npos)
                continue;

            resultStr = resultStr.substr(0, resultStr.find("="));
            //errs() << resultStr << "\n";

            if(resultStr.find("%") != string::npos){
                outs() << "callsite: " << *vcall_site << "\n";
                for(auto it2 = targets.begin(); it2 != targets.end(); it2++)
                {
                    const SVFFunction *target = *it2;
                    outs() << "-------target: " << target->getLLVMFun()->getName() << "\n";
                }
            }
        }   
    }

    // clean up memory
    AndersenWaveDiff::releaseAndersenWaveDiff();
    PAG::releasePAG();
    auto end_time = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::seconds>(end_time - start_time);

    outs() << "Time taken by function: " << duration.count() << " seconds\n";
    return 0;
}

The output is shown in the following file: outs_2.txt

An example of what was missed:
its IR: copter_lib.zip
Its omitted Virtual function callsite:

; Function Attrs: nounwind uwtable
define dso_local void @_ZN6Copter18update_flight_modeEv(%class.Copter* nocapture %this) local_unnamed_addr #0 align 2 {
entry:
  %valid_for_logging.i = getelementptr inbounds %class.Copter, %class.Copter* %this, i64 0, i32 15, i32 3
  store i8 0, i8* %valid_for_logging.i, align 4, !tbaa !2230
  %flightmode = getelementptr inbounds %class.Copter, %class.Copter* %this, i64 0, i32 29
  %0 = load %class.Mode*, %class.Mode** %flightmode, align 8, !tbaa !1491
  %1 = bitcast %class.Mode* %0 to void (%class.Mode*)***
  %vtable = load void (%class.Mode*)**, void (%class.Mode*)*** %1, align 8, !tbaa !1489
  %vfn = getelementptr inbounds void (%class.Mode*)*, void (%class.Mode*)** %vtable, i64 3
  %2 = load void (%class.Mode*)*, void (%class.Mode*)** %vfn, align 8
  **tail call void %2(%class.Mode* %0) #22**
  ret void
}
@yuleisui
Copy link
Collaborator

Would you be able to identify the problem in CHG or virtual call solving?

@Lqs66
Copy link
Author

Lqs66 commented Dec 25, 2023

Would you be able to identify the problem in CHG or virtual call solving?

I will try my best, but I don't know the andersen algorithm very well and may not be able to help.
I can test the andersen analysis on the latest version of SVF, if it has been changed.

@Lqs66
Copy link
Author

Lqs66 commented Dec 26, 2023

Would you be able to identify the problem in CHG or virtual call solving?

I tried SVF-2.7 again, with the same results as SVF-2.1, and the indirect call to targets is still incomplete.
The results are shown in the IndEdgeSolved field below, which is 0.
Screenshot from 2023-12-26 19-50-06

@Lqs66
Copy link
Author

Lqs66 commented Dec 27, 2023

Would you be able to identify the problem in CHG or virtual call solving?

I also tested SVF-2.7's client SUPA and there are still misses, is it possible that this has something to do with the fact that my .bc is incomplete (not the .bc of the whole program)?

@yuleisui
Copy link
Collaborator

Thanks for the update. Please confirm if this is the reason.

@Lqs66
Copy link
Author

Lqs66 commented Dec 27, 2023

Thanks for the update. Please confirm if this is the reason.

Does andersen analysis analyze virtual function calls related to phi instructions?
I'm guessing this is the reason, in my case, the following example of a virtual function call situation was missed.
I look forward to your reply and if this is not the problem, I will continue to look for the cause.

e.g.

  %8 = phi void (%class.Copter*)* [ %memptr.virtualfn, %memptr.virtual ], [ @_ZN6Copter18update_flight_modeEv, %memptr.nonvirtual ], !dbg !7379
  call void %8(%class.Copter* noundef nonnull align 8 dereferenceable(47040) %this.adjusted), !dbg !7379

The complete function is below:

; Function Attrs: mustprogress noinline nounwind optnone uwtable

define linkonce_odr dso_local void @_ZN7FunctorIvJEE14method_wrapperI6CopterXadL_ZNS2_18update_flight_modeEvEEEEvPv(i8* noundef %obj) #1 comdat align 2 !dbg !7368 {

entry:

  %obj.addr = alloca i8*, align 8

  %t = alloca %class.Copter*, align 8

  store i8* %obj, i8** %obj.addr, align 8

  call void @llvm.dbg.declare(metadata i8** %obj.addr, metadata !7372, metadata !DIExpression()), !dbg !7373

  call void @llvm.dbg.declare(metadata %class.Copter** %t, metadata !7374, metadata !DIExpression()), !dbg !7375

  %0 = load i8*, i8** %obj.addr, align 8, !dbg !7376

  %1 = bitcast i8* %0 to %class.Copter*, !dbg !7377

  store %class.Copter* %1, %class.Copter** %t, align 8, !dbg !7375

  %2 = load %class.Copter*, %class.Copter** %t, align 8, !dbg !7378

  %3 = bitcast %class.Copter* %2 to i8*, !dbg !7379

  %4 = getelementptr inbounds i8, i8* %3, i64 0, !dbg !7379

  %this.adjusted = bitcast i8* %4 to %class.Copter*, !dbg !7379

  br i1 false, label %memptr.virtual, label %memptr.nonvirtual, !dbg !7379



memptr.virtual:                                   ; preds = %entry

  %5 = bitcast %class.Copter* %this.adjusted to i8**, !dbg !7379

  %vtable = load i8*, i8** %5, align 8, !dbg !7379

  %6 = getelementptr i8, i8* %vtable, i64 sub (i64 ptrtoint (void (%class.Copter*)* @_ZN6Copter18update_flight_modeEv to i64), i64 1), !dbg !7379, !nosanitize !793

  %7 = bitcast i8* %6 to void (%class.Copter*)**, !dbg !7379, !nosanitize !793

  %memptr.virtualfn = load void (%class.Copter*)*, void (%class.Copter*)** %7, align 8, !dbg !7379, !nosanitize !793

  br label %memptr.end, !dbg !7379



memptr.nonvirtual:                                ; preds = %entry

  br label %memptr.end, !dbg !7379



memptr.end:                                       ; preds = %memptr.nonvirtual, %memptr.virtual

  %8 = phi void (%class.Copter*)* [ %memptr.virtualfn, %memptr.virtual ], [ @_ZN6Copter18update_flight_modeEv, %memptr.nonvirtual ], !dbg !7379

  call void %8(%class.Copter* noundef nonnull align 8 dereferenceable(47040) %this.adjusted), !dbg !7379

  ret void, !dbg !7380

}

@Lqs66
Copy link
Author

Lqs66 commented Dec 28, 2023

Thanks for the update. Please confirm if this is the reason.

I'm sorry to bother you again.
I'm reproducing the problem on a small example.

In my example, the update_mode() function calls the virtual function run() of the base class, and since there is no explicit indication of the subclass object that the base class pointer actually points to, SVF won't handle such an indirect call to the edge. However, I think SVF should add the run() function in a subclass of the base class to the set of edges based on the CHA method.

#include <stdio.h>

namespace test{

    class base{
    public:
        base(void);
        virtual void foo() = 0;
        virtual void run() = 0;

        int a;
    } ;
    class A : public base{
    public:
        A(){}
        void foo() override {
            printf("this is a!\n");
        }
        void run() override {

        }
    };

    class B : public base{
    public:
        B(){}
        void foo() override {
            printf("this is b!\n");
        }
        void run() override {
            
        }
    };


    class C {
    public:
        C(){}
        void update_mode();
        test::base *mode;
        test::A modeA;
        test::B modeB;
    };

}
#include "virtualFunc.h"

using namespace test;

base::base(void) : 
	a(0)
{}


void C::update_mode(){
    mode->run();
}


int main(){
    base *a = new A();
    a->foo();
    base *b = new B();
    b->foo();

    return 0;
}



@yuleisui
Copy link
Collaborator

Thanks for reporting. This looks to be a good test case. Could you upload (1) a minimum c program (try to cut it down a bit), (2) its bc file, and (3) CHG (the incomplete one as you pointed out)

@yuleisui
Copy link
Collaborator

Make the test case as small as possible and would be good to have a single cpp file.

@Lqs66
Copy link
Author

Lqs66 commented Dec 28, 2023

Make the test case as small as possible and would be good to have a single cpp file.

Hi, I have uploaded (1) and (2) as you suggested, but for (3) I don't quite understand what it means.

virtualFuncTest-Suite.zip

tips: The files I uploaded is compiled based on LLVM 10.0.1.

@yuleisui
Copy link
Collaborator

it is the class hierarchy graph (CHG) using the option -dump-chg

@yuleisui
Copy link
Collaborator

@JasonZhongZexin @xudon9 could you take a look at this C++ test case?

@yuleisui
Copy link
Collaborator

Make the test case as small as possible and would be good to have a single cpp file.

Hi, I have uploaded (1) and (2) as you suggested, but for (3) I don't quite understand what it means.

virtualFuncTest-Suite.zip

tips: The files I uploaded is compiled based on LLVM 10.0.1.

better to use llvm-14.0 though it might not make a difference, worth having a try.

@Lqs66
Copy link
Author

Lqs66 commented Dec 28, 2023

it is the class hierarchy graph (CHG) using the option -dump-chg

This is the CHG.

vcall-cha.zip

@Lqs66
Copy link
Author

Lqs66 commented Dec 28, 2023

Make the test case as small as possible and would be good to have a single cpp file.

Hi, I have uploaded (1) and (2) as you suggested, but for (3) I don't quite understand what it means.
virtualFuncTest-Suite.zip
tips: The files I uploaded is compiled based on LLVM 10.0.1.

better to use llvm-14.0 though it might not make a difference, worth having a try.

I've tried this on SVF 2.7 (which relies on LLVM 14) and do have the same problem.

I will upload the LLVM 14 version of bc as soon as I can if I need to provide it.

@yuleisui
Copy link
Collaborator

Make the test case as small as possible and would be good to have a single cpp file.

Hi, I have uploaded (1) and (2) as you suggested, but for (3) I don't quite understand what it means.
virtualFuncTest-Suite.zip
tips: The files I uploaded is compiled based on LLVM 10.0.1.

better to use llvm-14.0 though it might not make a difference, worth having a try.

I've tried this on SVF 2.7 (which relies on LLVM 14) and do have the same problem.

I will upload the LLVM 14 version of bc as soon as I can if I need to provide it.

would be good to have the llvm-14 bc.

@Lqs66
Copy link
Author

Lqs66 commented Dec 28, 2023

Make the test case as small as possible and would be good to have a single cpp file.

Hi, I have uploaded (1) and (2) as you suggested, but for (3) I don't quite understand what it means.
virtualFuncTest-Suite.zip
tips: The files I uploaded is compiled based on LLVM 10.0.1.

better to use llvm-14.0 though it might not make a difference, worth having a try.

I've tried this on SVF 2.7 (which relies on LLVM 14) and do have the same problem.
I will upload the LLVM 14 version of bc as soon as I can if I need to provide it.

would be good to have the llvm-14 bc.

Sorry for the late reply, but here is the compiled bc file for llvm 14.
vcall-test-suite-llvm14.zip

@JasonZhongZexin
Copy link
Contributor

@JasonZhongZexin @xudon9 could you take a look at this C++ test case?

Okay, I will take a look at these test cases and follow up on this issue.

@Lqs66
Copy link
Author

Lqs66 commented Jan 5, 2024

Dear Developers.
I think I know why CHA is not working.
The code below

const Option<bool> Options::ConnectVCallOnCHA(
"v-call-cha",
"connect virtual calls using cha",
false
);
indicates the options supported by SVF, analyzing c++ indirect calls using cha is disabled by default and requires the user to add -v-call-cha to enable cha functionality.
But turning this on affects the results of the andersen analysis (it will add more edges on top of the andersen analysis).
image

This may require adding a handling logic to enable (degenerate to) the CHA method only for those callsites where andersen fails.

The following file contains two call graphs, one generated with the -v-call-cha option added, and one generated without the -v-call-cha option.

two_callgraph.zip

@yuleisui
Copy link
Collaborator

yuleisui commented Jan 5, 2024

Once -v-call-cha is turned on, SVF will use CHA result to build call graph and solve indirect/virtuall calls, rather than using points-to analysis.

Can you check whether the call graph is correct if -v-call-cha is turned on?

@Lqs66
Copy link
Author

Lqs66 commented Jan 6, 2024

-v-call-cha

I have checked the call graph and when both -v-call-cha and -ander options are added, svf adds two edges to the same candidate for the virtual call, one from andersen analysis and one from cha.
The generated call graph I have given, you can have a look at it.

@yuleisui
Copy link
Collaborator

yuleisui commented Jan 6, 2024

Does it mean both call graphs are correct?

@Lqs66
Copy link
Author

Lqs66 commented Jan 6, 2024

Does it mean both call graphs are correct?

Both call graphs are incorrect (one has duplicate edges and one is missing edges).

For call graphs generated with both -ander and -v-call-cha turned on. This simply retains the results of both analyses (andersen and CHA). I think there should be some logic for both analyses, such as enabling the CHA results when the andersen algorithm fails to ensure that there are no false negatives in the results of the VIRTUAL CALL analysis.
For call graphs generated with only -ander enabled. Missing virtual call edge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants