October 28, 2011

Creating a Triangle Mesh with 3ds Max SDK

I have been working on developing 3ds Max plug-ins lately. Comparing to Maya API, 3ds Max SDK is a hell of a mess. Inconsistent naming styles, lots of exposed pointers, and poor documentations. It took me a while to achieve this common task: Creating a triangle mesh. The following code is what I came up with eventually. Most of it came from OpenCOLLADA. Enjoy!

TriObject *createTriangleMesh(const std::vector<Point3> &points,
                              const std::vector<Point3> &normals,
                              const std::vector<Point2> &uvs,
                              const std::vector<int> &triangleVertIndices)
{
    TriObject *triobj = CreateNewTriObject();
    if (triobj == NULL)
        return NULL;

    assert(points.size() == normals.size() && normals.size() == uvs.size());
    assert(triangleVertIndices.size() % 3 == 0);

    int numVertices = (int) points.size();
    int numTriangles = (int) triangleVertIndices.size() / 3;
    Mesh &mesh = triobj->GetMesh();

    // set vertex positions
    mesh.setNumVerts(numVertices);
    for (int i = 0; i < numVertices; i++)
        mesh.setVert(i, points[i]);
    
    // set vertex normals
    mesh.SpecifyNormals();
    MeshNormalSpec *normalSpec = mesh.GetSpecifiedNormals();
    normalSpec->ClearNormals();
    normalSpec->SetNumNormals(numVertices);
    for (int i = 0; i < numVertices; i++)
    {
        normalSpec->Normal(i) = normals[i].Normalize();
        normalSpec->SetNormalExplicit(i, true);
    }

    // set UVs
    // TODO: multiple map channels?
    // channel 0 is reserved for vertex color, channel 1 is the default texture mapping
    mesh.setNumMaps(2);
    mesh.setMapSupport(1, TRUE);  // enable map channel
    MeshMap &map = mesh.Map(1);
    map.setNumVerts(numVertices);
    for (int i = 0; i < numVertices; i++)
    {
        UVVert &texVert = map.tv[i];
        texVert.x = uvs[i].x;
        texVert.y = uvs[i].y;
        texVert.z = 0.0f;
    }

    // set triangles
    mesh.setNumFaces(numTriangles);
    normalSpec->SetNumFaces(numTriangles);
    map.setNumFaces(numTriangles);
    for (int i = 0, j = 0; i < numTriangles; i++, j += 3)
    {
        // three vertex indices of a triangle
        int v0 = triangleVertIndices[j];
        int v1 = triangleVertIndices[j+1];
        int v2 = triangleVertIndices[j+2];

        // vertex positions
        Face &face = mesh.faces[i];
        face.setMatID(1);
        face.setEdgeVisFlags(1, 1, 1);
        face.setVerts(v0, v1, v2);

        // vertex normals
        MeshNormalFace &normalFace = normalSpec->Face(i);
        normalFace.SpecifyAll();
        normalFace.SetNormalID(0, v0);
        normalFace.SetNormalID(1, v1);
        normalFace.SetNormalID(2, v2);

        // vertex UVs
        TVFace &texFace = map.tf[i];
        texFace.setTVerts(v0, v1, v2);
    }

    mesh.InvalidateGeomCache();
    mesh.InvalidateTopologyCache();

    return triobj;
}

July 22, 2011

A Gotcha of C++ map and set

C++ std::map and std::set allow you to define your own less-than comparator for the key type. But one thing to bear in mind: The comparator must follow the rules of strick weak ordering! That is:
  • x < x is always false (irreflexivity)
  • If x < y, then y < x is false (asymmetric)
  • If x < y and y < z, then x < z (transitivity)
  • If x = y and y = z, then x = z (transitivity of equivalence)
Miss any one of these rules, you will get undefined behavior!

To see what it means, let me show you a mistake I made.

I was thinking of keeping a bunch of non-repeated (unique) floats, but I wish to compare the floats with some tolerance. For example, 1.0 and 1.01 should be same. So naturally, I used float as the map key, and the mapped value can be any type I need. Let's say here I use int.

In order to compare floats with tolerance, I wrote a float comparator like this:
struct FloatCmp
{
    bool operator()(float a, float b)
    {
        const float epsilon = 0.01f;
        if (fabsf(a - b) < epsilon)
            return false; // a == b
        return a < b;
    }
};
You can see line 6-7 checks if two floats are equal first before using the less-than comparison (line 8). Now, I can initialize my map container happily:
std::map<float, int, FloatCmp> table;
table[1.0f] = 1;
table[1.001f] = 2;   // same as table[1.0f] = 2
table[1.002f] = 3;   // same as table[1.0f] = 3
It all seemed to work perfectly. But I found sometimes it didn't work. It produced some duplicated keys. What did I do wrong?

The answer: I broke the rule of "transitivity of equivalence" (if x = y and y = z then x = z). Both FloatCmp(1.0f, 1.006f) and FloatCmp(1.006f, 1.012f) are true, but FloatCmp(1.0f, 1.012f) is not true! This makes the result depend on how you insert your elements. Consider the following code:
#include <iostream>
#include <map>

int main()
{
    std::map<float, int, FloatCmp> map1;
    std::map<float, int, FloatCmp> map2;

    float a = 1.000f;
    float b = 1.006f;
    float c = 1.012f;

    map1[a] = 1;
    map1[b] = 1;
    map1[c] = 1;

    map2[b] = 1;
    map2[a] = 1;
    map2[c] = 1;

    std::cout << map1.size() << std::endl;  // print 2
    std::cout << map2.size() << std::endl;  // print 1
    return 0;
}
After the insertions, map1 contains a and c, and map2 contains b. This is astonishing: Two maps with the same set of elements inserted can have different sizes because you insert them in different orders. It's not hard to figure out why by tracing the code.

So how to fix this? Well, as long as "transitivity of equivalence" is followed, there would be no problems. Finally I came up with this new comparator:
inline float discretize(float a)
{
    return floorf(a * 100.0f) / 100.0f;
}
struct FloatCmp
{
    bool operator()(float a, float b)
    {
        float aa = discretize(a);
        float bb = discretize(b);
        if (aa == bb)
            return false;
        return aa < bb;
    }
};
At line 9-10, discretize() function truncates (round off) floating-point numbers to 2 decimal digits. For example, discretize(1.1234f) returns 1.12f. And check if they're equal with == operator directly (line 11). In this way, I make sure "transitivity of equivalence" is strictly followed. Now if you relaunch the previous main program with this new comparator, both map1 and map2 will contain two elements as expected.

July 4, 2011

Python shelve vs. sqlite3

I have a huge Python dict that I can't store in the memory. Python provides a shelve module for this purpose. It acts like a regular dict but it can be saved into a file. I wonder its performance comparing to sqlite3, so I run this little test:
import shelve
import sqlite3

def test_sqlite3():
    conn = sqlite3.connect("debug.s3db")
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS [mydict] ("
                "[key] VARCHAR(255) PRIMARY KEY NOT NULL, "
                "[value] VARCHAR(255) NOT NULL)")
    for i in xrange(0, 1000000):
        cur.execute("INSERT INTO [mydict] (key, value) VALUES (?, ?)",
                    (str(i), str(i*2)))
    conn.commit()
    cur.close()
    conn.close()
    
def test_shelve():
    d = shelve.open("debug.shelf")
    for i in xrange(0, 1000000):
        d[str(i)] = str(i*2)
    d.close()
On my computer, test_sqlite3() cost me about 25 seconds, and about 51 seconds for test_shelve(). I also performed some fetch queries on these two, sqlite3 still runs faster than shelve. In conclusion, sqlite3 wins!

July 3, 2011

June 24, 2011

How to check whether an OptiX variable is initialized or not?

In OptiX/CUDA programs (*.cu), simple type variables can always be initialized by putting an assignment operator immediately after rtDeclareVariable:
// in CUDA program
rtDeclareVariable(int, an_integer, , ) = 0;
rtDeclareVariable(float, a_float, , ) = 0.0f;
rtDeclareVariable(float3, a_vector, , ) = { 1.0f, 0.0f, 0.0f };
But for complex type variables, such as TextureSampler and Buffer, the only way to initialize them is to set the variables in host (CPU) programs. To prevent OptiX from recompiling, make sure you initialize your OptiX variables only at the beginning and only ONCE. So to do this, you can use Variable::getType() to check if an OptiX variable is initialized:
// in CUDA program
rtBuffer<float, 1>  a_buffer;
rtTextureSampler<float4, 2> a_texture;
// in host program
Variable buf_var = context["a_buffer"];
if (buf_var->getType() == RT_OBJECTTYPE_UNKNOWN)
{
    // initialize buf_var...
    Buffer buf = context->createBuffer(...);
    buf_var->setBuffer(buf);
}
Variable tex_var = context["a_texture"];
if (tex_var->getType() == RT_OBJECTTYPE_UNKNOWN)
{
    // initialize tex_var...
    TextureSampler sampler = context->createTextureSampler(...);
    tex_var->setTextureSampler(sampler);
}

// now you can dereference the variables safely
Buffer buf = context["a_buffer"]->getBuffer();
TextureSampler tex = context["a_tex"]->getTextureSampler();

June 16, 2011

The Fabric of the Cosmos


書名:The Fabric of the Cosmos: Space, Time, and the Texture of Reality
作者:Brian Greene
出版年:2004
連結:Amazon

一本 500 頁的書竟然可以涵蓋相對論、量子力學、弦論等艱深理論。重點是作者還能寫得有深度,不因它是一本科普書而寫得過於表面。

會找到這本書是因為看了作者 Brian Greene 製作的一部科學影片 "The Elegant Universe",這影片用了很多厲害的特效來表現物理學的奧妙,但內容深度不夠。例如,看完影片你知道愛因斯坦說時間會扭曲,量子力學的世界很奇怪,但感覺就是少了什麼,騷不到癢處。當然這也不能怪作者,短短兩個小時的影片根本放不下太多的內容。所以為了滿足好奇心,我就找了這本書來看,書名 "The Fabric of the Cosmos" 取得很有詩意,意思是宇宙的本質就像布料,那樣的飄逸柔軟。

這本書講了很多東西,也不是全部都是我有興趣的,遇到沒什麼興趣的章節,例如天文學,就看得很痛苦,但撐過去之後,又是一片海闊天空。這裡就寫一些我覺得有趣的部分。

空間是什麼?空間是真的東西,還是只是我們的幻覺?什麼是完全的空?如果空間沒有任何物質,那還有我們所認知的空間嗎?就牛頓的觀點來看,不管物質存不存在,空間是絕對存在的,而且是靜止不動的,空間是唯一絕對的參考座標系。這種解釋叫「絕對主義」(Absolutism)。「相對主義」(Relationalism)者(如萊布尼茲和馬赫)提出不一樣的看法,他們認為我們在空間的位置和速度都是相對於其他的物質(太空中的星星)。如果空間完全淨空,沒有物質可供參考,就不會有位置和運動的概念。所以用相對主義的觀點來看,空間其實並不存在,是物質決定了一切。

時間是什麼?為什麼我們感受到的時間是單向的?為什麼我們生活中觀察到的現象,似乎都是單向的?雞蛋掉到地上會破,但我們曾沒見過雞蛋從破掉的狀態還原。大部分物理定律裡的時間都是雙向的,正反向時間套到公式都對。實際上,牛頓定律還告訴我們,一個破掉的雞蛋,如果它的每個原子能以它摔落時相反的速度運動,那麼循著物理定律,雞蛋就會還原成完好的狀態。既然物理定律都這麼說了,為什麼我們仍看不到這種「奇蹟」?一種解釋是這種奇蹟可能發生,只是機率微乎其微罷了,畢竟要讓雞蛋每個原子都剛好有某個初速實在太難了!

另一個合理的解釋就是這跟亂度(Entropy)有關。為謂亂度?簡單來說,就是當你去擾動某個系統,整體狀態仍看起來差不多,就代表亂度高。舉個例子,一本書頁次排得好好的,你去隨便調動幾頁,這本書就跟原本差很多了,所以排好頁次的書亂度極低(即非常有秩序)。相反的,你把一本書的每一頁撕下來,洗牌後丟到地上,此時你再怎麼亂動,這本書看起來還是一樣亂,所以這種狀態下亂度就很高。

提高和降低亂度,那個容易?一本書頁次排好只有一種可能,而亂七八糟的排法多到數不清。你能輕易的把東西弄亂,弄整齊卻要花很多時間,所以顯然是提高亂度比較容易。宇宙整體的亂度似乎也是愈來愈高,亂度變高是自然趨勢,不需要理由。因此我們只能看到雞蛋破掉(亂度變高),而看不到雞蛋還原(亂度變低)。

這本書最精采的地方是狹義相對論和量子力學。以前在大學普物課本上讀過狹義相對論,當年的我會用公式解題,但就是少了一些「直覺」(換句話說,就是沒 fu)。這本書作者很厲害,用了一個全新的方式解釋狹義相對論,現在我雖然已經忘了題目怎麼算了,但卻因為這本書有了狹義相對論的「直覺」。

因為實在太酷了!就讓我簡單講一下作者是怎麼解釋狹義相對論的。先假設我們的空間是二維的,而時間仍是一維,時空結合起來可以形成一個三維的長方體,作者把這個長方體比喻成一條長長的時空土司(Spacetime Loaf):


當我們切下一片土司時,那片土司就代表某個時間點的空間。你切土司的方法就是你對時間感受,你每切一刀,就代表你的「現在」。當你靜止不動時,你切土司的方法如下圖左(藍線);而當你運動速度愈來愈接近光速時,你切土司的方法如下圖右(紅線)。這代表什麼?這是說,如果你和另一個人速度不一樣,你看到空間中兩處同時發生的事件(例如:A 和 B 兩人同時中彈身亡),另一個人看到的就不是同時的(即 A  先死,B 才死)!這就是狹義相對論裡一定會先學到的「同時性的相對性」(Relativity of Simultaneity)。


你可能會感覺藍線的切法比較「正」,紅線比較「斜」,但請把空間想成是一個無限大的平面,就不會有所謂那一個比較正的問題了。所有切法都正確,只是觀察角度不同罷了。這就是時間相對的概念。

要明顯觀察到「同時性的相對性」效應,必須有兩個速度差距很大(非常大,如光速的 1/4 倍)的東西才行。而我們現在看來很快的東西,無論是噴射機還是火箭,跟光速比還是微不足道,在這種慢速的情況下,時空土司的切法幾乎相等,所以根本感受不到時間的偏差。不過徜若雙方彼此距離遙遠(非常遠,如好幾百萬光年),就算速度慢,還是會造成很大的效應。這是因為「弧長 = 半徑 x 角度」,只要半徑夠大,就算角度很小,弧長還是能大到你要的程度。作者舉了一個例子,想像你坐在椅子,一個距離你一百億光年的外星人也坐在椅子上,若你們彼此沒有相對運動,你們的「現在」就是一致的。但是只要那個外星人站起來走動,他的「現在」可能就比你早或晚上百年,走幾步路竟然可以跨越百年的差距!我們感受到的「現在」其實是幻覺,時間相對的概念下,「現在」根本不存在。

既然如此,為什麼我們對「現在」的感受這麼深刻?或者應該問,為什麼我們只能感受到「現在」這個瞬間?畢竟在相對論裡,時間和空間、過去和未來應該視為一體,糾纏在愛因斯坦的方程式裡,為什麼我們不能感受過去和未來?這問題目前仍無解答,而且可能永遠是個謎。

先寫到這裡,有時間再來寫量子力學的部分。

May 31, 2011

Getting NVIDIA GPU Usage in C++

Strangely enough, NVAPI has no functions to get GPU usage/load. It turns out that there are some secret functions in nvapi.dll. You can use QueryInterface function to retrieve them by specifying the memory address of the function. Here's the code for Windows.

//
// Getting Nvidia GPU Usage
//
// Reference: Open Hardware Monitor (http://code.google.com/p/open-hardware-monitor)
//

#include <windows.h>
#include <iostream>

// magic numbers, do not change them
#define NVAPI_MAX_PHYSICAL_GPUS   64
#define NVAPI_MAX_USAGES_PER_GPU  34

// function pointer types
typedef int *(*NvAPI_QueryInterface_t)(unsigned int offset);
typedef int (*NvAPI_Initialize_t)();
typedef int (*NvAPI_EnumPhysicalGPUs_t)(int **handles, int *count);
typedef int (*NvAPI_GPU_GetUsages_t)(int *handle, unsigned int *usages);

int main()
{   
    HMODULE hmod = LoadLibraryA("nvapi.dll");
    if (hmod == NULL)
    {
        std::cerr << "Couldn't find nvapi.dll" << std::endl;
        return 1;
    }

    // nvapi.dll internal function pointers
    NvAPI_QueryInterface_t      NvAPI_QueryInterface     = NULL;
    NvAPI_Initialize_t          NvAPI_Initialize         = NULL;
    NvAPI_EnumPhysicalGPUs_t    NvAPI_EnumPhysicalGPUs   = NULL;
    NvAPI_GPU_GetUsages_t       NvAPI_GPU_GetUsages      = NULL;

    // nvapi_QueryInterface is a function used to retrieve other internal functions in nvapi.dll
    NvAPI_QueryInterface = (NvAPI_QueryInterface_t) GetProcAddress(hmod, "nvapi_QueryInterface");

    // some useful internal functions that aren't exported by nvapi.dll
    NvAPI_Initialize = (NvAPI_Initialize_t) (*NvAPI_QueryInterface)(0x0150E828);
    NvAPI_EnumPhysicalGPUs = (NvAPI_EnumPhysicalGPUs_t) (*NvAPI_QueryInterface)(0xE5AC921F);
    NvAPI_GPU_GetUsages = (NvAPI_GPU_GetUsages_t) (*NvAPI_QueryInterface)(0x189A1FDF);

    if (NvAPI_Initialize == NULL || NvAPI_EnumPhysicalGPUs == NULL ||
        NvAPI_EnumPhysicalGPUs == NULL || NvAPI_GPU_GetUsages == NULL)
    {
        std::cerr << "Couldn't get functions in nvapi.dll" << std::endl;
        return 2;
    }

    // initialize NvAPI library, call it once before calling any other NvAPI functions
    (*NvAPI_Initialize)();

    int          gpuCount = 0;
    int         *gpuHandles[NVAPI_MAX_PHYSICAL_GPUS] = { NULL };
    unsigned int gpuUsages[NVAPI_MAX_USAGES_PER_GPU] = { 0 };

    // gpuUsages[0] must be this value, otherwise NvAPI_GPU_GetUsages won't work
    gpuUsages[0] = (NVAPI_MAX_USAGES_PER_GPU * 4) | 0x10000;

    (*NvAPI_EnumPhysicalGPUs)(gpuHandles, &gpuCount);

    // print GPU usage every second
    for (int i = 0; i < 100; i++)
    {
        (*NvAPI_GPU_GetUsages)(gpuHandles[0], gpuUsages);
        int usage = gpuUsages[3];
        std::cout << "GPU Usage: " << usage << std::endl;
        Sleep(1000);
    }

    return 0;
}