Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ExtAPI: questions about new format #1186

Open
adriaanjacobs opened this issue Sep 5, 2023 · 21 comments
Open

ExtAPI: questions about new format #1186

adriaanjacobs opened this issue Sep 5, 2023 · 21 comments

Comments

@adriaanjacobs
Copy link
Contributor

adriaanjacobs commented Sep 5, 2023

Hi

We've submitted patches in the past to allow extending the ExtAPI programmatically with new entries to describe allocation wrappers (#821, #1083), and would like to continue updating that functionality to support the new format. To that purpose, we have a few questions about the direction of the new format, and how you would like extensibility to be handled:

  • After reading the wiki, it seems that the OVERWRITE attribute will now always clear the function definition in the app module (app_def). This seems different from the previous behaviour of the OVERWRITE_APP flag: we used to be able to keep the original definition in the app, while SVF simply ignored the app definition and used the behaviour provided by the extapi database (now extapi.bc, previously extapi.json) to handle it.
    • Is there a reason for this change? We'd prefer the old way, since we wouldn't have to re-insert the allocation wrapper definitions to actually compile the module.
    • In our understanding, deleting the app body without merging in the extapi definition of the function would result in linker errors by default?
  • We are currently considering to add the option to specify an llvm::Module as the extApiModule instead of a filename, and let the user parse the .bc file themselves. That would allow us to insert our allocation wrappers before SVF ever sees the module. If that's desirable for you, we can send those patches in.
  • The ExtAPI class is now fully empty, and all the relevant parts are contained in LLVMModuleSet instead. Could we refactor this back into ExtAPI to maintain more separation? Ideally, we'd like the parsing of extapi.c & the population of the relevant maps to be done separately from building the LLVMModule(Set), so we can analyze the module up front & generate the necessary changes to extapi.c before we ever invoke the "real" SVF machinery. Edit: the maps are about the relation between the app module & the extAPI module, so probably not a good candidate to move into the ExtAPI which doesn't know about the app module.

As always, happy to send PRs for these changes. Just wanted to confirm whether they are welcome :)

@yuleisui
Copy link
Collaborator

yuleisui commented Sep 5, 2023

Your proposal seems good to me. Could you please make a pr?

@shuangxiangkan could you take a look at this? Make any comments if you have.

@adriaanjacobs
Copy link
Contributor Author

adriaanjacobs commented Sep 6, 2023

Another quick question: it looks like all functions that are annotated with STATIC in extapi.c are also annotated with ALLOC_RET; is that intentional? The aliasing behaviors of those annotations seem contradictory: ALLOC_RET always returns a new object, whereas all calls to a STATIC function should alias.

Most other functions in extapi.c seem to have function-level annotations as well. E.g. "strtok", should that not be implementable as follows:

char* strtok(char * str, const char * delim) {
    static char* old_str = NULL;
    if (str) {
        old_str = str;
        return str;
    }
    return old_str;
}

without any function-level annotation?

@yuleisui
Copy link
Collaborator

yuleisui commented Sep 6, 2023

Though calling the same static function, it will return different new objects. You could have a try to validate it.

If a function in extapi.c has one or more annotations, the function body will be ignored and the side-effect is based on the annotations.

@yuleisui
Copy link
Collaborator

yuleisui commented Sep 6, 2023

Another quick question: it looks like all functions that are annotated with STATIC in extapi.c are also annotated with ALLOC_RET; is that intentional? The aliasing behaviors of those annotations seem contradictory: ALLOC_RET always returns a new object, whereas all calls to a STATIC function should alias.

Most other functions in extapi.c seem to have function-level annotations as well. E.g. "strtok", should that not be implementable as follows:

char* strtok(char * str, const char * delim) {
    static char* old_str = NULL;
    if (str) {
        old_str = str;
        return str;
    }
    return old_str;
}

without any function-level annotation?

(1) If the application code has a declared function, you could provide your own implementated function in extapi.c
(2) If the application code has a function, which you want to overwrite with your own function implementation, you could add an annotation in the function in extapi.c

For your case strtok , you feel free to write your own function in extapi.c since strtok is lib function that does not have a function body. Current SVF simply added an annotation to indicate returning a new object for pointer analysis purposes.

@adriaanjacobs
Copy link
Contributor Author

adriaanjacobs commented Sep 6, 2023

Though calling the same static function, it will return different new objects. You could have a try to validate it.

If a function in extapi.c has one or more annotations, the function body will be ignored and the side-effect is based on > the annotations.

Thanks for your reply.

As far as I know, __errno_location always returns the same pointer (in the same thread), and, depending on the implementation, __ctype_tolower_loc, __ctype_toupper_loc, and __ctype_b_loc do too. All are marked both STATIC and ALLOC_RET in the current extapi.c

But even if they didn't (e.g. strerror allocates a new string in musl, but returns from a static table in some glibc versions), how does SVF decide between those two annotations? If I remember correctly, these functions were marked only "EFT_STAT(2)" in the old extapi.json, so the ALLOC_RET behavior is new to me.

@yuleisui
Copy link
Collaborator

yuleisui commented Sep 6, 2023

I see. I think you are right. STATIC and ALLOC_RET sometimes are contradictory to each other. We will need to sort this out. What suggestions for the fix?

@yuleisui
Copy link
Collaborator

yuleisui commented Sep 6, 2023

Would be good to go through each function with these two annotations. For a function which is not STATIC, the annotation STATIC should be removed. For a function that is STATIC, both annotations should be removed and make the function body return a malloc object so that the application code will share the same malloc site with only one object for all its invocations.

@adriaanjacobs
Copy link
Contributor Author

For a function that is STATIC, both annotations should be removed and make the function body return a malloc object so that the application code will share the same malloc site with only one object for all its invocations.

Would SVF not be able to handle the STATIC annotation alone? If we return a malloc() object, will that not lead SVF to believe that every call to this function would result in a new object?

@yuleisui
Copy link
Collaborator

yuleisui commented Sep 6, 2023

No, if we have a function body returns malloc in extapi. The SVFIR or PAG will treat this malloc as a single object returning to all its call sites that call this function.

If we use annotation like RET_ALLOC, SVF will return different objects for each callsite calls this function

@adriaanjacobs
Copy link
Contributor Author

adriaanjacobs commented Sep 6, 2023

I see. From the point of view of allocation wrappers, we frequently encounter not simply wrapper functions (such as xmalloc), but also initialization functions or functions that contain additional pointer behavior beyond simple allocation, e.g.:

__attribute__((annotate("ALLOC_RET"))
void* init_struct(void* ptr, void** secondptr) {
    struct* ret = malloc(sizeof(struct));
    ret->field = ptr;
    *secondptr = ptr;
    return ret;
}

How could we let SVF capture that init_struct both returns a new object every time, but field and secondptr are also set to ptr? I.e.

void* ptr = ...;
void* secondptr;
struct* obj = init_struct(false, ptr, &secondptr);
assert(mustalias(obj->field, ptr));
assert(mustalias(ptr, secondptr));

With the old extapi format, we were able to generate an extapi entry that captured both the new object allocation and the copy edge between the 2nd parameter and the 1st parameter. We were hoping to also capture the initialization of the struct field in the new format, but how could we achieve this? If we specify "ALLOC_RET", the body of the function will be ignored?

@yuleisui
Copy link
Collaborator

yuleisui commented Sep 6, 2023

In your case, you can simply remove the ALLOC_RET annotation and use your self-defined function body in ExtAPI.c It should work.

@yuleisui
Copy link
Collaborator

yuleisui commented Sep 6, 2023

If you specify "ALLOC_RET", yes, the body of the function will be ignored.

@adriaanjacobs
Copy link
Contributor Author

adriaanjacobs commented Sep 6, 2023

In your case, you can simply remove the ALLOC_RET annotation and use your self-defined function body in ExtAPI.c It should work.

Okay, but would SVF not treat all calls to this function as returning the same object then? Which would defeat the point of adding entries for the allocation wrappers in the first place, because we are trying to have SVF treat calls to the allocation wrapper function as separate objects.

@adriaanjacobs
Copy link
Contributor Author

adriaanjacobs commented Sep 6, 2023

Proposal: would it be possible to "inline" the PAG information obtained from the extapi entries in their callers (1-level context sensitive)? Most extapi functions could then be entirely defined by their bodies (no function-level annotations), and function-level annotations would be mandatory on declarations in extapi (functions without body).

e.g.

// mandatory function-level attribute on functions without body
// svf_extapi_new_obj is an artificial function that just provides the new object node
// the only necessary annotations (I think) are ALLOC_RET and STATIC
__attribute__((annotate("ALLOC_RET"))
void* svf_extapi_new_obj();

// no need for ALLOC_RET here; PAG information of body is inlined into every caller -> no two callsites will alias
void* malloc(size_t bytes) {
    return svf_extapi_new_obj();
}

// no need for ALLOC_ARGi, the PAG of this body will already contain information about where the new object is stored
int asprintf(char **restrict strp, const char *restrict fmt, ...) {
    *strp = svf_extapi_new_obj();
    return 0;
}

// no need for REALLOC_RET, the PAG of this body handles the NULL case
void *realloc(void *ptr, size_t size) {
    if (ptr)
         return ptr;
    return svf_extapi_new_obj();
}

// another artificial function that provides a pointer to the location of the 
// strtok global. all calls to this function will alias
// note: function-level annotation mandatory because the function has no body
__attribute__((annotate("STATIC"))
char** svf_extapi_strtok_global_loc();

// again, no need for an attribute here: the body contains all the necessary information
char *strtok(char * str, const char * delim) {
    char** global_loc = svf_extapi_strtok_global_loc();
    if (str)
        *global_loc = str;
    return *global_loc;
}

// __ctype_b_loc: this function returns a pointer to a STATIC pointer to a STATIC array
// this corresponds to the old EFT_STAT2 entry (multi-level static), which no longer exists in the new extapi format (?)
// none of the entries in the top-level array should alias each other
// when we inline PAG information, we can describe arbitrarily deep/complex datastructures without additional annotations

// "STATIC" means "return value points to static var"
__attribute__((annotate("STATIC"))
const unsigned short* svf_extapi_ctype_b_array();

__attribute__((annotate("STATIC"))
const unsigned short** svf_extapi_ctype_b_loc();

const unsigned short** __ctype_b_loc() {
    const unsigned short** loc = svf_extapi_ctype_b_loc();
    for (int = -128; i < 256; i++)
        loc[i] = svf_extapi_ctype_b_array();
    return loc;
}

Given a few restrictions on extapi entries (no indirect calls, no recursion), we could then express arbitrarily complex pointer behavior in the extapi, including the example from our discussion above:

// no annotation needed
void* init_struct(void* ptr, void** secondptr) {
    struct* ret = svf_extapi_new_obj();
    ret->field = ptr;
    *secondptr = ptr;
    return ret;
}

In my understanding, this format provides the following benefits:

  • There's at least as much flexibility as the previous format (extapi.json), without having to provide attributes for all kinds of pointer behavior.
  • This format would more than serve our needs for handling (complex) allocation wrappers.
  • The inlined PAG information could be simplified more generally, e.g., in the realloc or strtok case, to notice whether the first argument is NULL, without having to supply a dedicated attribute for that ("REALLOC_RET") -> other functions (such as strtok) with behavior depending on values of the arguments could benefit, too.

@yuleisui
Copy link
Collaborator

yuleisui commented Sep 6, 2023

The proposal looks pretty good!

there are two challenges:

  1. 1-level inlining. How to implement this? This is also based on the function body is not too large?
  2. declarations in extapi will be optimised and removed when compiling them into bc if you don’t have a function body. How to address this issue?

@adriaanjacobs
Copy link
Contributor Author

adriaanjacobs commented Sep 6, 2023

re 1: I'm not too familiar with how the SVFIR is constructed, but I believe calls to functions that have an extapi counterpart should be handled specially. When encountering a call to such a function, e.g., malloc, I imagine we should not just add a directcallgraphedge to the callgraph (CallGraphBuilder.cpp), but instead add the ICFG nodes of the callee to the ICFG of the caller (inlining the SVFIR representation of the function). To be clear, I don't think we should inline the actual extapi code into the app module.

re 2. As long as we call the declarations in the extapi, they should not get optimized out, I think. But if it proves to be an issue, we can use the same workaround as in extapi.c right now and add a dummy body. The bodies of functions with annotations on them will then simply be ignored. (note that this is still different from the current format; non-annotated functions can still behave like "ALLOC_RET").

I'm out of my depth on how to implement such "inlining" in SVF; could you point me towards where the SVFIR of the current extapi entries is queried during pointer analysis? I imagine that the changes would have to happen there.

@shuangxiangkan
Copy link
Contributor

re 1: I'm not too familiar with how the SVFIR is constructed, but I believe calls to functions that have an extapi counterpart should be handled specially. When encountering a call to such a function, e.g., malloc, I imagine we should not just add a directcallgraphedge to the callgraph (CallGraphBuilder.cpp), but instead add the ICFG nodes of the callee to the ICFG of the caller (inlining the SVFIR representation of the function). To be clear, I don't think we should inline the actual extapi code into the app module.

I agree with your idea of inline the PAG information from the external API into the caller's PAG. However, constructing the PAG can involve various data structures, and if the body of the external API function is complex, inlining the PAG from the external API into the caller's PAG can be challenging. This is because it's not as simple as adding a copy or store edge at the external callsite, as was done in the previous ExtAPI.json. SVFIR requires the information from the external API to be present at this place:

handleExtCall(cs, svfcallee);

re 2. As long as we call the declarations in the extapi, they should not get optimized out, I think. But if it proves to be an issue, we can use the same workaround as in extapi.c right now and add a dummy body. The bodies of functions with annotations on them will then simply be ignored. (note that this is still different from the current format; non-annotated functions can still behave like "ALLOC_RET").

Even when calling a declaration within extapi.c, while using Clang to compile extapi.c into extapi.bc with optimization flags disabled (-O0), the annotations information still cannot be preserved. Therefore, it is necessary to include an empty function body to retain the annotations information.

@yuleisui
Copy link
Collaborator

yuleisui commented Sep 7, 2023

For the second challenge, welcome any solution to remove the dummy function body for external functions with annotation.

@adriaanjacobs
Copy link
Contributor Author

adriaanjacobs commented Sep 7, 2023

I agree with your idea of inline the PAG information from the external API into the caller's PAG. However, constructing the PAG can involve various data structures, and if the body of the external API function is complex, inlining the PAG from the external API into the caller's PAG can be challenging. This is because it's not as simple as adding a copy or store edge at the external callsite, as was done in the previous ExtAPI.json.

I've looked a little more at how to do this (challenge 1), and I believe it would be good to create separate SVFInstructions per-callsite for the extapi functions. So in createSVFFunction, we would iterate into the extapi call and handle its instructions as if they were part of the caller (app function), creating new SVFInstructions, essentially as if it was a "macro". There would be no call instruction to that extapi function at the SVFInstruction level. The only call instructions left to the extapi would be those to extapi "declarations" (svf_extapi_new_obj, svf_extapi_strtok_old_loc, etc) which are annotated and should be handled similar to how they are handled now; based on their annotation.

All later analyses would observe the contents of the extapi functions as if they were inlined into the app functions, which achieves our goal, I believe.

One (major?) issue i see with this currently are indirect calls to extapi functions. At the SVFFunction construction level, do we know yet an estimation of the targets of indirect calls?

@shuangxiangkan
Copy link
Contributor

In pull request #1188, "attribute((annotate("STATIC")))" has been removed, and instead, returns malloc() within the function body. When functions with the "STATIC" annotation in previous extapi.c are called multiple times, only the same object will be returned.

@yuleisui
Copy link
Collaborator

yuleisui commented Sep 8, 2023

@adriaanjacobs can you take a look at #1188 before I merge it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants