October 28, 2011

Creating a Triangle Mesh with 3ds Max SDK

I have been working on developing 3ds Max plug-ins lately. Comparing to Maya API, 3ds Max SDK is a hell of a mess. Inconsistent naming styles, lots of exposed pointers, and poor documentations. It took me a while to achieve this common task: Creating a triangle mesh. The following code is what I came up with eventually. Most of it came from OpenCOLLADA. Enjoy!

TriObject *createTriangleMesh(const std::vector<Point3> &points,
                              const std::vector<Point3> &normals,
                              const std::vector<Point2> &uvs,
                              const std::vector<int> &triangleVertIndices)
    TriObject *triobj = CreateNewTriObject();
    if (triobj == NULL)
        return NULL;

    assert(points.size() == normals.size() && normals.size() == uvs.size());
    assert(triangleVertIndices.size() % 3 == 0);

    int numVertices = (int) points.size();
    int numTriangles = (int) triangleVertIndices.size() / 3;
    Mesh &mesh = triobj->GetMesh();

    // set vertex positions
    for (int i = 0; i < numVertices; i++)
        mesh.setVert(i, points[i]);
    // set vertex normals
    MeshNormalSpec *normalSpec = mesh.GetSpecifiedNormals();
    for (int i = 0; i < numVertices; i++)
        normalSpec->Normal(i) = normals[i].Normalize();
        normalSpec->SetNormalExplicit(i, true);

    // set UVs
    // TODO: multiple map channels?
    // channel 0 is reserved for vertex color, channel 1 is the default texture mapping
    mesh.setMapSupport(1, TRUE);  // enable map channel
    MeshMap &map = mesh.Map(1);
    for (int i = 0; i < numVertices; i++)
        UVVert &texVert = map.tv[i];
        texVert.x = uvs[i].x;
        texVert.y = uvs[i].y;
        texVert.z = 0.0f;

    // set triangles
    for (int i = 0, j = 0; i < numTriangles; i++, j += 3)
        // three vertex indices of a triangle
        int v0 = triangleVertIndices[j];
        int v1 = triangleVertIndices[j+1];
        int v2 = triangleVertIndices[j+2];

        // vertex positions
        Face &face = mesh.faces[i];
        face.setEdgeVisFlags(1, 1, 1);
        face.setVerts(v0, v1, v2);

        // vertex normals
        MeshNormalFace &normalFace = normalSpec->Face(i);
        normalFace.SetNormalID(0, v0);
        normalFace.SetNormalID(1, v1);
        normalFace.SetNormalID(2, v2);

        // vertex UVs
        TVFace &texFace = map.tf[i];
        texFace.setTVerts(v0, v1, v2);


    return triobj;

July 22, 2011

A Gotcha of C++ map and set

C++ std::map and std::set allow you to define your own less-than comparator for the key type. But one thing to bear in mind: The comparator must follow the rules of strick weak ordering! That is:
  • x < x is always false (irreflexivity)
  • If x < y, then y < x is false (asymmetric)
  • If x < y and y < z, then x < z (transitivity)
  • If x = y and y = z, then x = z (transitivity of equivalence)
Miss any one of these rules, you will get undefined behavior!

To see what it means, let me show you a mistake I made.

I was thinking of keeping a bunch of non-repeated (unique) floats, but I wish to compare the floats with some tolerance. For example, 1.0 and 1.01 should be same. So naturally, I used float as the map key, and the mapped value can be any type I need. Let's say here I use int.

In order to compare floats with tolerance, I wrote a float comparator like this:
struct FloatCmp
    bool operator()(float a, float b)
        const float epsilon = 0.01f;
        if (fabsf(a - b) < epsilon)
            return false; // a == b
        return a < b;
You can see line 6-7 checks if two floats are equal first before using the less-than comparison (line 8). Now, I can initialize my map container happily:
std::map<float, int, FloatCmp> table;
table[1.0f] = 1;
table[1.001f] = 2;   // same as table[1.0f] = 2
table[1.002f] = 3;   // same as table[1.0f] = 3
It all seemed to work perfectly. But I found sometimes it didn't work. It produced some duplicated keys. What did I do wrong?

The answer: I broke the rule of "transitivity of equivalence" (if x = y and y = z then x = z). Both FloatCmp(1.0f, 1.006f) and FloatCmp(1.006f, 1.012f) are true, but FloatCmp(1.0f, 1.012f) is not true! This makes the result depend on how you insert your elements. Consider the following code:
#include <iostream>
#include <map>

int main()
    std::map<float, int, FloatCmp> map1;
    std::map<float, int, FloatCmp> map2;

    float a = 1.000f;
    float b = 1.006f;
    float c = 1.012f;

    map1[a] = 1;
    map1[b] = 1;
    map1[c] = 1;

    map2[b] = 1;
    map2[a] = 1;
    map2[c] = 1;

    std::cout << map1.size() << std::endl;  // print 2
    std::cout << map2.size() << std::endl;  // print 1
    return 0;
After the insertions, map1 contains a and c, and map2 contains b. This is astonishing: Two maps with the same set of elements inserted can have different sizes because you insert them in different orders. It's not hard to figure out why by tracing the code.

So how to fix this? Well, as long as "transitivity of equivalence" is followed, there would be no problems. Finally I came up with this new comparator:
inline float discretize(float a)
    return floorf(a * 100.0f) / 100.0f;
struct FloatCmp
    bool operator()(float a, float b)
        float aa = discretize(a);
        float bb = discretize(b);
        if (aa == bb)
            return false;
        return aa < bb;
At line 9-10, discretize() function truncates (round off) floating-point numbers to 2 decimal digits. For example, discretize(1.1234f) returns 1.12f. And check if they're equal with == operator directly (line 11). In this way, I make sure "transitivity of equivalence" is strictly followed. Now if you relaunch the previous main program with this new comparator, both map1 and map2 will contain two elements as expected.

July 4, 2011

Python shelve vs. sqlite3

I have a huge Python dict that I can't store in the memory. Python provides a shelve module for this purpose. It acts like a regular dict but it can be saved into a file. I wonder its performance comparing to sqlite3, so I run this little test:
import shelve
import sqlite3

def test_sqlite3():
    conn = sqlite3.connect("debug.s3db")
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS [mydict] ("
                "[key] VARCHAR(255) PRIMARY KEY NOT NULL, "
                "[value] VARCHAR(255) NOT NULL)")
    for i in xrange(0, 1000000):
        cur.execute("INSERT INTO [mydict] (key, value) VALUES (?, ?)",
                    (str(i), str(i*2)))
def test_shelve():
    d = shelve.open("debug.shelf")
    for i in xrange(0, 1000000):
        d[str(i)] = str(i*2)
On my computer, test_sqlite3() cost me about 25 seconds, and about 51 seconds for test_shelve(). I also performed some fetch queries on these two, sqlite3 still runs faster than shelve. In conclusion, sqlite3 wins!

July 3, 2011

June 24, 2011

How to check whether an OptiX variable is initialized or not?

In OptiX/CUDA programs (*.cu), simple type variables can always be initialized by putting an assignment operator immediately after rtDeclareVariable:
// in CUDA program
rtDeclareVariable(int, an_integer, , ) = 0;
rtDeclareVariable(float, a_float, , ) = 0.0f;
rtDeclareVariable(float3, a_vector, , ) = { 1.0f, 0.0f, 0.0f };
But for complex type variables, such as TextureSampler and Buffer, the only way to initialize them is to set the variables in host (CPU) programs. To prevent OptiX from recompiling, make sure you initialize your OptiX variables only at the beginning and only ONCE. So to do this, you can use Variable::getType() to check if an OptiX variable is initialized:
// in CUDA program
rtBuffer<float, 1>  a_buffer;
rtTextureSampler<float4, 2> a_texture;
// in host program
Variable buf_var = context["a_buffer"];
if (buf_var->getType() == RT_OBJECTTYPE_UNKNOWN)
    // initialize buf_var...
    Buffer buf = context->createBuffer(...);
Variable tex_var = context["a_texture"];
if (tex_var->getType() == RT_OBJECTTYPE_UNKNOWN)
    // initialize tex_var...
    TextureSampler sampler = context->createTextureSampler(...);

// now you can dereference the variables safely
Buffer buf = context["a_buffer"]->getBuffer();
TextureSampler tex = context["a_tex"]->getTextureSampler();

June 16, 2011

The Fabric of the Cosmos

書名:The Fabric of the Cosmos: Space, Time, and the Texture of Reality
作者:Brian Greene

一本 500 頁的書竟然可以涵蓋相對論、量子力學、弦論等艱深理論。重點是作者還能寫得有深度,不因它是一本科普書而寫得過於表面。

會找到這本書是因為看了作者 Brian Greene 製作的一部科學影片 "The Elegant Universe",這影片用了很多厲害的特效來表現物理學的奧妙,但內容深度不夠。例如,看完影片你知道愛因斯坦說時間會扭曲,量子力學的世界很奇怪,但感覺就是少了什麼,騷不到癢處。當然這也不能怪作者,短短兩個小時的影片根本放不下太多的內容。所以為了滿足好奇心,我就找了這本書來看,書名 "The Fabric of the Cosmos" 取得很有詩意,意思是宇宙的本質就像布料,那樣的飄逸柔軟。






這本書最精采的地方是狹義相對論和量子力學。以前在大學普物課本上讀過狹義相對論,當年的我會用公式解題,但就是少了一些「直覺」(換句話說,就是沒 fu)。這本書作者很厲害,用了一個全新的方式解釋狹義相對論,現在我雖然已經忘了題目怎麼算了,但卻因為這本書有了狹義相對論的「直覺」。

因為實在太酷了!就讓我簡單講一下作者是怎麼解釋狹義相對論的。先假設我們的空間是二維的,而時間仍是一維,時空結合起來可以形成一個三維的長方體,作者把這個長方體比喻成一條長長的時空土司(Spacetime Loaf):

當我們切下一片土司時,那片土司就代表某個時間點的空間。你切土司的方法就是你對時間感受,你每切一刀,就代表你的「現在」。當你靜止不動時,你切土司的方法如下圖左(藍線);而當你運動速度愈來愈接近光速時,你切土司的方法如下圖右(紅線)。這代表什麼?這是說,如果你和另一個人速度不一樣,你看到空間中兩處同時發生的事件(例如:A 和 B 兩人同時中彈身亡),另一個人看到的就不是同時的(即 A  先死,B 才死)!這就是狹義相對論裡一定會先學到的「同時性的相對性」(Relativity of Simultaneity)。


要明顯觀察到「同時性的相對性」效應,必須有兩個速度差距很大(非常大,如光速的 1/4 倍)的東西才行。而我們現在看來很快的東西,無論是噴射機還是火箭,跟光速比還是微不足道,在這種慢速的情況下,時空土司的切法幾乎相等,所以根本感受不到時間的偏差。不過徜若雙方彼此距離遙遠(非常遠,如好幾百萬光年),就算速度慢,還是會造成很大的效應。這是因為「弧長 = 半徑 x 角度」,只要半徑夠大,就算角度很小,弧長還是能大到你要的程度。作者舉了一個例子,想像你坐在椅子,一個距離你一百億光年的外星人也坐在椅子上,若你們彼此沒有相對運動,你們的「現在」就是一致的。但是只要那個外星人站起來走動,他的「現在」可能就比你早或晚上百年,走幾步路竟然可以跨越百年的差距!我們感受到的「現在」其實是幻覺,時間相對的概念下,「現在」根本不存在。



May 31, 2011

Getting NVIDIA GPU Usage in C++

Strangely enough, NVAPI has no functions to get GPU usage/load. It turns out that there are some secret functions in nvapi.dll. You can use QueryInterface function to retrieve them by specifying the memory address of the function. Here's the code for Windows.

// Getting Nvidia GPU Usage
// Reference: Open Hardware Monitor (http://code.google.com/p/open-hardware-monitor)

#include <windows.h>
#include <iostream>

// magic numbers, do not change them

// function pointer types
typedef int *(*NvAPI_QueryInterface_t)(unsigned int offset);
typedef int (*NvAPI_Initialize_t)();
typedef int (*NvAPI_EnumPhysicalGPUs_t)(int **handles, int *count);
typedef int (*NvAPI_GPU_GetUsages_t)(int *handle, unsigned int *usages);

int main()
    HMODULE hmod = LoadLibraryA("nvapi.dll");
    if (hmod == NULL)
        std::cerr << "Couldn't find nvapi.dll" << std::endl;
        return 1;

    // nvapi.dll internal function pointers
    NvAPI_QueryInterface_t      NvAPI_QueryInterface     = NULL;
    NvAPI_Initialize_t          NvAPI_Initialize         = NULL;
    NvAPI_EnumPhysicalGPUs_t    NvAPI_EnumPhysicalGPUs   = NULL;
    NvAPI_GPU_GetUsages_t       NvAPI_GPU_GetUsages      = NULL;

    // nvapi_QueryInterface is a function used to retrieve other internal functions in nvapi.dll
    NvAPI_QueryInterface = (NvAPI_QueryInterface_t) GetProcAddress(hmod, "nvapi_QueryInterface");

    // some useful internal functions that aren't exported by nvapi.dll
    NvAPI_Initialize = (NvAPI_Initialize_t) (*NvAPI_QueryInterface)(0x0150E828);
    NvAPI_EnumPhysicalGPUs = (NvAPI_EnumPhysicalGPUs_t) (*NvAPI_QueryInterface)(0xE5AC921F);
    NvAPI_GPU_GetUsages = (NvAPI_GPU_GetUsages_t) (*NvAPI_QueryInterface)(0x189A1FDF);

    if (NvAPI_Initialize == NULL || NvAPI_EnumPhysicalGPUs == NULL ||
        NvAPI_EnumPhysicalGPUs == NULL || NvAPI_GPU_GetUsages == NULL)
        std::cerr << "Couldn't get functions in nvapi.dll" << std::endl;
        return 2;

    // initialize NvAPI library, call it once before calling any other NvAPI functions

    int          gpuCount = 0;
    int         *gpuHandles[NVAPI_MAX_PHYSICAL_GPUS] = { NULL };
    unsigned int gpuUsages[NVAPI_MAX_USAGES_PER_GPU] = { 0 };

    // gpuUsages[0] must be this value, otherwise NvAPI_GPU_GetUsages won't work
    gpuUsages[0] = (NVAPI_MAX_USAGES_PER_GPU * 4) | 0x10000;

    (*NvAPI_EnumPhysicalGPUs)(gpuHandles, &gpuCount);

    // print GPU usage every second
    for (int i = 0; i < 100; i++)
        (*NvAPI_GPU_GetUsages)(gpuHandles[0], gpuUsages);
        int usage = gpuUsages[3];
        std::cout << "GPU Usage: " << usage << std::endl;

    return 0;